Recording video from a TV tuner card

2020/06/06

Just in case anyone would need a hastily written summary of how to record video (... e.g. for archival of VHS tapes) from a v4l2 device, in our case a Leadtek Winfast 2000XP Expert, from... well, 2004.

(... it'll probably work the same with newer, USB-based frame grabber devices, too.)

First of all, install v4l-utils (Ubuntu) or something similarly named, which we'll need for the v4l2-ctl tool. We'll use this to change inputs, inspect input formats, etc. We'll of course also need ffmpeg, which is nice enough to be able to read from v4l2 directly.
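
A quick way to confirm that everything is in place before going further; a minimal sketch, where the function name missing_tools is made up for this example, and arecord (from alsa-utils) is on the list only because it's used for the audio part further down:

```shell
# Print the name of every command from the argument list that is not on PATH.
# (missing_tools is a hypothetical helper name, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage: prints nothing if all three tools are installed.
# missing_tools v4l2-ctl ffmpeg arecord
```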

Next, we'll need to set up which input we'd like to capture video from. TV tuners are a bit more complex, but for composite it's mostly a matter of checking what the card is currently set to; v4l2-ctl --all prints something like:

Driver Info (not using libv4l2):
        Driver name   : cx8800
        Card type     : Leadtek Winfast 2000XP Expert
        Bus info      : PCI:0000:03:06.0
        Driver version: 4.15.18
        Capabilities  : 0x85250011
                Video Capture
                VBI Capture
                Tuner
                Radio
                Read/Write
                Streaming
                Extended Pix Format
                Device Capabilities
        Device Caps   : 0x05210001
                Video Capture
                Tuner
                Read/Write
                Streaming
                Extended Pix Format
Priority: 2
Frequency for tuner 0: 6400 (400.000000 MHz)
Tuner 0:
        Name                 : Television
        Type                 : Analog TV
        Capabilities         : 62.5 kHz stereo lang1 lang2 freq-bands
        Frequency range      : 44.000 MHz - 958.000 MHz
        Signal strength/AFC  : 100%/0
        Current audio mode   : mono
        Available subchannels: mono
Video input : 1 (Composite1: ok)
Video Standard = 0x00000200
        PAL-N
Format Video Capture:
        Width/Height      : 704/576
        Pixel Format      : 'YUYV'
        Field             : Interlaced
        Bytes per Line    : 1408
        Size Image        : 811008
        Colorspace        : SMPTE 170M
        Transfer Function : Default (maps to Rec. 709)
        YCbCr/HSV Encoding: Default (maps to ITU-R 601)
        Quantization      : Default (maps to Limited Range)
        Flags             :
Streaming Parameters Video Capture:
        Frames per second: 25.000 (25/1)
        Read buffers     : 2

User Controls

                     brightness 0x00980900 (int)    : min=0 max=255 step=1 default=127 value=127 flags=slider
                       contrast 0x00980901 (int)    : min=0 max=255 step=1 default=63 value=63 flags=slider
                     saturation 0x00980902 (int)    : min=0 max=255 step=1 default=127 value=127 flags=slider
                            hue 0x00980903 (int)    : min=0 max=255 step=1 default=127 value=127 flags=slider
                         volume 0x00980905 (int)    : min=0 max=63 step=1 default=63 value=63 flags=slider
                        balance 0x00980906 (int)    : min=0 max=127 step=1 default=64 value=64 flags=slider
                           mute 0x00980909 (bool)   : default=1 value=1
                      sharpness 0x0098091b (int)    : min=0 max=4 step=1 default=0 value=0 flags=slider
                     chroma_agc 0x0098091d (bool)   : default=1 value=1
                   color_killer 0x0098091e (bool)   : default=1 value=1
               band_stop_filter 0x00980921 (int)    : min=0 max=1 step=1 default=0 value=0

What we might have to set:

if colors are weird: the video standard; see --list-standards and --set-standard. For Europe, PAL-B/G or PAL-D/K should be fine.
the input source: see -n (--list-inputs) and --set-input
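
Put together, switching to the composite input and a European PAL standard might look like the sketch below. The device path, the input number 1 (taken from the "Video input : 1 (Composite1: ok)" line above) and pal-B are assumptions for this particular card; DRY_RUN and the function names are made up for the example, so that the commands can be previewed without touching the hardware.

```shell
# Run a command, or only print it when DRY_RUN=1 (helper made up for this sketch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# select_source DEVICE INPUT_NUMBER STANDARD
# Sets the input and the video standard, then reads both back so we can
# see whether the driver accepted them.
select_source() {
    run v4l2-ctl -d "$1" --set-input="$2" &&
    run v4l2-ctl -d "$1" --set-standard="$3" &&
    run v4l2-ctl -d "$1" --get-input --get-standard
}

# Usage:
# v4l2-ctl -d /dev/video0 --list-inputs      # which number is composite?
# v4l2-ctl -d /dev/video0 --list-standards   # what the card supports
# select_source /dev/video0 1 pal-B
```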

Note that there are subtle differences between PAL standards. Namely, if you pick PAL-M (which happens to be the Brazilian standard), you'll get a picture that's almost good, except for a lot of saw-like aliasing artifacts. The reason is that for interlaced video, PAL-M uses a "bottom-first" field order, where other PAL versions are top-first... so if you mix these up, you'll get frames with their lines ordered like line 1, line 3, line 2, line 4 etc. This doesn't look overly good.

As for "resolution", PAL is supposed to be around 720x576, so I just went with that. (... also, if you ask ffmpeg for something much larger than the card can do, this is roughly the image size you get back, so it's probably the cleanest one.)

By the way, here is the command to try it out:

          ffplay -f v4l2 -video_size 1024x768 -i /dev/video0

Encoder parameters

If we want to actually encode this... I'll start with the ffmpeg command in case you'd want to just copy this; the explanation follows later.

          ffmpeg -f v4l2 -video_size 1080x1024 -i /dev/video0 \
             -f alsa -ar 44100 -i hw:1,0 \
             -c:v h264 -flags +ildct+ilme -preset veryfast -b:v 10M \
             -c:a aac -ac 1 -b:a 192k \
             "$the_file_name_you_want_to_output"

Note that this probably won't work straight out of the box... there might be some tweaking to do.

Audio

We're using ALSA directly. I guess technically we could go through PulseAudio; nevertheless, given that it also uses ALSA underneath, there is a smaller chance of things breaking with one fewer layer. PulseAudio might, on the other hand, grab the ALSA card and not let you record. Before trying to kill it with pulseaudio -k, you might want to look at /etc/pulse/client.conf, especially at the "autospawn = yes" line, and potentially change it to "autospawn = no", since otherwise PulseAudio might end up restarting automatically. (... of course, this might break things that actually need PulseAudio, e.g. sound in Firefox; pulseaudio --start is still a thing though.)

To grab things from the right ALSA card, you might want to run arecord -l to list the available recording devices. The "1,0" corresponds to card 1, device 0 in a listing like this:

          **** List of CAPTURE Hardware Devices ****
          card 1: CODEC [USB Audio CODEC], device 0: USB Audio [USB Audio]
            Subdevices: 1/1
            Subdevice #0: subdevice #0
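
Turning such a listing into the hw:CARD,DEVICE string that ffmpeg expects can be scripted. A rough sketch, assuming the English-locale arecord -l format shown above (the function name alsa_hw_ids is made up for this example):

```shell
# Read `arecord -l` output on stdin and print one hw:CARD,DEVICE per capture
# device. Only the "card N: ..., device M: ..." lines match; headers and
# subdevice lines are skipped.
alsa_hw_ids() {
    sed -n 's/^[[:space:]]*card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Usage (LC_ALL=C keeps arecord from translating its output):
# LC_ALL=C arecord -l | alsa_hw_ids
```

With more than one capture device this prints several lines, and picking the right one is still up to you.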

Also, 44100 Hz as a sampling rate seems reasonable (CD audio uses this); the other popular option is 48000 Hz. As for the output, we picked aac as the codec, but mp3 should work fine; 192k for a mono file (-ac 1) should be on the "overly too much, and thus definitely enough" side. We're recording mono audio because both the VCR and the camcorder I have only have a single audio output; if you have better equipment, you can definitely keep it stereo here.

... and one more note on sound cards. Maybe it was me who was unlucky, but... I went through about 4 sound cards while trying to get this to work. The onboard input had terrible a/v sync issues. The SB Live! that worked a few years ago in this same machine just didn't record anything anymore, no matter the configuration in alsamixer. Yet another PCI card, a cheap-ish one with an Ensoniq ES1938 chip, worked, but kept producing buffer underruns after a few tens of seconds. Yet another Ensoniq (... an ES1370) worked nicely for about half a minute after each time the machine was switched on, then made the entire machine hang and disappeared from lspci after a reset. So... the device mentioned above is a Behringer U-Phoria UM2; it has actual audio level monitor LEDs, it just works, and it's not too expensive either; highly recommended.

Video

-f v4l2 -i /dev/video0 specifies that we'd like to grab video from the TV tuner card. It might also end up being /dev/video1 etc, if you happen to have multiple TV tuner cards or webcams. For the video size, we specify an unreasonably large value, as mentioned before, so it'll pick the max resolution it can grab; ffmpeg will actually tell you about the size it picked once it starts up.

We're encoding H.264 (a.k.a. AVC) video; this is the same codec that most phones / camcorders record in lately. I'm using -preset veryfast here to decrease the amount of CPU power needed to encode the video; it's definitely needed on the 2011-ish three-core AMD box I'm using.

While this will make compression a bit less efficient, the one thing we're not saving on is bitrate. That 10 megabits for -b:v is probably massive overkill for an SD resolution; it's about the most a DVD is allowed to use for SD video (with the older MPEG-2 codec), and H.264 at this rate can carry decent HD video. However... it's a good idea for multiple reasons:

we have to make up for -preset veryfast; our compression is less good at a given bitrate, so we just add more bits.
this is archive material; we want it in the best possible quality.
hard drives are cheap; if we need something actually portable / streamable, we can re-encode later (... for which it's also nice not to have lost a lot of quality while grabbing it in the first place)
the input video is fairly grainy (... some of it was taken by a not-really-top-of-the-line Video8 camcorder 20 years ago... some others recorded from TV with not-that-great reception); we'd want to preserve the graininess for historical accuracy, but it's basically just noise, which takes up an unreasonably large amount of bandwidth to store.
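
To put numbers on this: the v4l2-ctl output near the top reports 811008 bytes per frame at 25 frames per second. The arithmetic below (plain shell integer math, nothing measured; the variable names are made up) compares that raw data rate with the 10M video bitrate, and works out what the video plus the 192k audio cost in disk space per hour.

```shell
video_bps=10000000                 # -b:v 10M
audio_bps=192000                   # -b:a 192k
raw_bps=$((811008 * 25 * 8))       # bytes per frame * frames per second * 8 bits

raw_mbit=$((raw_bps / 1000000))
# Adding half the divisor first rounds to the nearest integer instead of truncating.
ratio=$(( (raw_bps + video_bps / 2) / video_bps ))
mb_per_hour=$(( (video_bps + audio_bps) / 8 * 3600 / 1000000 ))

echo "raw video:   ${raw_mbit} Mbit/s"
echo "compression: roughly ${ratio}:1"
echo "disk use:    ${mb_per_hour} MB per hour"
```

So even the "overkill" bitrate is roughly a sixteenth of the raw stream (162 Mbit/s), and an hour of tape comes to about 4.6 GB.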

Also, one more thing about "archival": interlacing. Short story: analog TV sends only half of the picture at a time, namely every second line, alternating between the odd and the even lines. What we'd want when viewing these is to fill in the gaps, inferred from the neighboring lines and fields (also known as "deinterlacing"); without this, the result will have comb-like artifacts on computer screens. (Analog TVs were just slow and blurry enough for it to blend in; newer, digital ones have chips in them to do the deinterlacing.) So we can choose: do we do it before encoding or while playing the video back?

Doing it before sounds like a good idea: we actually get a video that's playable without yet another filter, and compression seems easier too if the encoder doesn't have to look at weird comb-y artifacts. However... this doesn't really fit well into our "preserve the original" idea; also, as it turns out, H.264 can keep the video interlaced while encoding (... in fact, many camcorders record in 1080i). As with the weirdly high bitrate, we can just fix this later (... and then do it again with the same archive file but in a different way, if we find a better method). This is where -flags +ildct+ilme comes from.
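
For that "fix it later" step, a viewing copy can be made from the archive file at any time. A sketch, assuming ffmpeg's yadif filter in its send_field mode (each field becomes a full frame, so 25 interlaced frames per second turn into 50 progressive ones); the file names, the -crf 20 and -preset slow settings, DRY_RUN and the function names are placeholders picked for the example, not something the original capture relies on:

```shell
# Run a command, or only print it when DRY_RUN=1 (helper made up for this sketch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# deinterlace_copy ARCHIVE_FILE OUTPUT_FILE
deinterlace_copy() {
    # Video is deinterlaced and re-encoded; the audio track is copied as-is.
    run ffmpeg -i "$1" -vf yadif=mode=send_field \
        -c:v libx264 -preset slow -crf 20 -c:a copy "$2"
}

# Usage:
# DRY_RUN=1 deinterlace_copy archive.mkv viewing-copy.mp4   # preview only
# deinterlace_copy archive.mkv viewing-copy.mp4
```

The archive file itself is left untouched, so a different deinterlacer can be tried later on the same source.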

You can even test this, to make sure that it's not just comb-y video encoded (badly) as progressive (as in: non-interlaced). Just turn the bitrate down to a stupidly low level (200k?), record some video, and look at the result. It'll look terribly blocky, but it should still have crisp interlace comb artifacts on moving parts; if the combs were being encoded as ordinary picture detail, such fine detail would have no chance of fitting into a 200k bitstream.
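
The experiment above as a command; a sketch that keeps the video settings of the capture command, with only the bitrate changed and a 20-second limit added via -t. Audio is left out to keep it short; the device path, the output name, the 20 seconds, DRY_RUN and the function names are arbitrary choices for the example.

```shell
# Run a command, or only print it when DRY_RUN=1 (helper made up for this sketch).
run() { if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi; }

# interlace_test DEVICE OUTPUT_FILE
# Short, deliberately bit-starved, video-only recording.
interlace_test() {
    run ffmpeg -f v4l2 -video_size 1080x1024 -i "$1" \
        -t 20 -c:v h264 -flags +ildct+ilme -preset veryfast -b:v 200k \
        "$2"
}

# Usage:
# DRY_RUN=1 interlace_test /dev/video0 interlace-test.mkv   # preview only
# interlace_test /dev/video0 interlace-test.mkv
```

Wave something in front of the camera (or pick a scene with motion) while recording, then play the file back with deinterlacing switched off in your player.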

Conclusions

... yes, it's a surprising amount of work. However, you'll get high-quality videos that are as close to the original as possible. Definitely worth it.