What is a video file?
Let us start by understanding more about what a video file contains. Find a video file on your computer and look at its metadata. On most systems (Linux, Mac, Windows), you should see some basic information about the video content by selecting something like "properties" from the file inspector (see example in the figure to the right). From this information, you should be able to answer these questions:
- What are the dimensions?
- What is the framerate?
- What type of compression is used?
All of these questions can be answered by looking at the metadata of the file. From the description of our example video, we can see the dimensions (640 pixels wide, 480 pixels tall) and framerate (30 frames per second). We can also see that the video is compressed with the H.264 codec at a bitrate of 1500 kbps. These concepts will be described below.
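The same metadata can also be read programmatically. The sketch below assumes that FFmpeg's `ffprobe` tool is installed; the helper names `ffprobe_command` and `summarise` are made up for this illustration, and the `SAMPLE` string is hand-written to match the example video described above rather than taken from a real file.

```python
import json
from fractions import Fraction


def ffprobe_command(path):
    """Build an ffprobe call that prints the first video stream's metadata as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",  # first video stream only
        "-show_entries", "stream=codec_name,width,height,r_frame_rate,bit_rate",
        "-of", "json",
        path,
    ]


def summarise(ffprobe_json):
    """Extract dimensions, framerate, codec and bitrate from ffprobe's JSON output."""
    stream = json.loads(ffprobe_json)["streams"][0]
    return {
        "width": stream["width"],
        "height": stream["height"],
        # ffprobe reports the framerate as a fraction string such as "30/1"
        "fps": float(Fraction(stream["r_frame_rate"])),
        "codec": stream["codec_name"],
        "kbps": int(stream["bit_rate"]) / 1000,
    }


# Hand-written sample that mirrors the example video (not real ffprobe output).
SAMPLE = (
    '{"streams": [{"codec_name": "h264", "width": 640, "height": 480, '
    '"r_frame_rate": "30/1", "bit_rate": "1500000"}]}'
)
print(summarise(SAMPLE))
```

With FFmpeg installed, the real output can be obtained with `subprocess.run(ffprobe_command("input.mp4"), capture_output=True, text=True).stdout` and passed to `summarise`. Some containers do not report a per-stream bitrate, in which case the `bit_rate` lookup would need a fallback.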
Video as a stream of numbers
A digital video file is a collection of numbers, as illustrated in the figure below. In a typical video file, each pixel is stored with 8-bit resolution. That means that each pixel is represented by a number between 0 and 255, where 0 means black and 255 means white.
Also, video in colour has several planes per pixel. In a four-plane representation, these are the alpha channel (transparency), red, green, and blue (many formats omit the alpha channel and store only three planes). A greyscale video only has one plane per pixel. So uncompressed greyscale data is 1/4 the size of four-plane colour data. Similarly, processing greyscale data takes roughly 1/4 the time of processing colour data. This is important to remember when doing computational video analysis.
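To get a feeling for the numbers involved, the following sketch calculates the uncompressed data rate of the example video (640 by 480 pixels, 30 frames per second, 8 bits per plane) in colour and in greyscale, and compares it with the 1500 kbps bitrate of the compressed file. The function name is made up for this illustration, and the calculation ignores audio and container overhead.

```python
def raw_bytes_per_second(width, height, fps, planes, bits_per_plane=8):
    """Uncompressed video data rate in bytes per second."""
    return width * height * planes * bits_per_plane * fps // 8


colour = raw_bytes_per_second(640, 480, 30, planes=4)     # alpha, red, green, blue
greyscale = raw_bytes_per_second(640, 480, 30, planes=1)  # a single plane

# The compressed example file: 1500 kilobits per second, converted to bytes.
compressed = 1500 * 1000 // 8

print(colour)               # 36864000 bytes per second
print(greyscale)            # 9216000 bytes per second
print(colour // greyscale)  # 4
print(colour / compressed)  # 196.608
```

In other words, the uncompressed four-plane stream is almost 200 times larger than the H.264 version, which gives an idea of how much data the compression removes.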
Container formats
One of the confusing things about video files is that they have both a container and a compression format. The container is often what denotes the file suffix. Apple introduced the .mov format for QuickTime files, and Microsoft used to use .avi files.
Nowadays, there seems to be a convergence towards using MPEG containers and .mp4 files. However, both Apple and Microsoft software (and others) still output other formats. This is confusing and can also lead to various playback issues. For example, many web browsers are not able to play these formats natively. Read more about media container formats.
Compression formats
The compression format denotes how the video data is organized on the inside of a container. Here, too, there are many different formats. The most common today is to use the H.264 format for video and AAC for audio. These are both part of the MPEG-4 standard and can be embedded in .mp4 containers. However, both H.264 and AAC can also be embedded in other containers, such as .mov and .avi files.
The important thing to notice is that both .mov and .avi files may contain H.264 video and AAC audio. In those cases, the inside of such files is identical to the content of a .mp4 file. But since the container is different, the file may still be unplayable in certain software. That is why it is often useful to convert from one container format to another. In practice, that means converting from .mov or .avi to .mp4 files.
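When only the container needs to change, FFmpeg can copy the existing video and audio streams into a new file without re-encoding them. The sketch below builds such a command; the function name and the file name `recording.mov` are made up for this illustration, and it assumes that the target container can hold the codecs in the source file.

```python
from pathlib import Path


def remux_command(src, target_suffix=".mp4"):
    """Build an ffmpeg call that copies all streams unchanged into a new container."""
    dst = Path(src).with_suffix(target_suffix)
    # "-c copy" tells ffmpeg to copy the streams instead of re-encoding them.
    return ["ffmpeg", "-i", str(src), "-c", "copy", str(dst)]


print(" ".join(remux_command("recording.mov")))
# ffmpeg -i recording.mov -c copy recording.mp4
```

Since nothing is re-encoded, this is fast and does not reduce the quality. If the target container cannot hold one of the streams, FFmpeg reports an error, and that stream has to be re-encoded instead.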
The H.264 standard is the most common video compression codec these days. It is a lossy codec, meaning that it throws away lots of data when it compresses the file. The H.264 standard is also a time-based compression codec, meaning that it compares frames over time, and only stores the information that changes between so-called keyframes. This is an efficient way of creating good-looking videos, but it is less ideal for analytical purposes.
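The idea of storing only what changes after a keyframe can be illustrated with a toy example. This is not how H.264 actually works (the real codec is lossy and uses motion-compensated prediction); it is only a lossless sketch of the principle, with each "frame" reduced to a short list of pixel values and all names made up for the illustration.

```python
def encode(frames):
    """Store the first frame whole (the keyframe) and only the changes after it."""
    keyframe = list(frames[0])
    deltas = []
    previous = keyframe
    for frame in frames[1:]:
        # Keep (position, new value) only for pixels that differ from the previous frame.
        changes = [(i, new) for i, (old, new) in enumerate(zip(previous, frame)) if old != new]
        deltas.append(changes)
        previous = frame
    return keyframe, deltas


def decode(keyframe, deltas):
    """Rebuild every frame by applying the stored changes one after another."""
    frames = [list(keyframe)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames


# Three tiny greyscale "frames" in which one bright pixel moves to the right.
frames = [
    [0, 0, 0, 0, 0, 0],
    [0, 255, 0, 0, 0, 0],
    [0, 0, 255, 0, 0, 0],
]
keyframe, deltas = encode(frames)
print(deltas)  # [[(1, 255)], [(1, 0), (2, 255)]]
```

The example also hints at why such compression is less ideal for analysis: to look at one particular frame, the decoder first has to rebuild all the frames since the last keyframe.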
Recording video for analysis
One thing to bear in mind is that a video recording meant for analytical purposes is quite different from a video recording shot for documentary or artistic purposes. The latter type of video is usually based on creating an aesthetically pleasing result, which often includes continuous variation in the shots through changes in the lighting, background, zooming, panning, etc. A video recording for analysis, on the other hand, is quite the opposite: it is best to record it in a controlled studio or lab setting with as few camera changes as possible. This ensures that it is the recording's content, that is, the human motion, which is in focus, not the motion of the camera or the environment.
Even though a controlled environment may be the best choice from a purely scientific point of view, it is possible to obtain useful recordings for analytical purposes also out in the field. This, however, requires some planning and attention to detail. Here are a few things to consider:
- Foreground/background: place the subject in front of a background that is as plain as possible, so it is possible to discern easily between the important and non-important elements in the image. It is essential for computer vision recordings to avoid backgrounds with moving objects, since these may influence the analysis.
- Lighting: avoid changing lights, as they will influence the final video. It may be worth recording with an infrared camera in dark locations, or if the lights are changing rapidly (such as in a disco or club concert). Some consumer cameras come with a “night mode” that serves the same purpose. Even though the visual result of such recordings may be unsatisfactory, they can still work well for computer-based motion analysis.
- Camera placement: place the camera on a tripod, and avoid moving the camera while recording. Both panning and zooming make it more difficult to analyse the content of the recordings later. If both overview images and close-ups are needed, it is better to use two (or more) cameras to capture different parts of the scene in question.
- Image quality: it is always best to record at the highest spatial resolution (number of pixels), temporal resolution (frames per second) and compression quality (format and ratio) the camera allows for. However, the most important thing is to find a balance between image quality, file size and processing time.
As mentioned earlier, a video recording can be used as the starting point for both qualitative and quantitative analysis. We will look at a couple of different possibilities, moving from more qualitative visualisation methods to advanced motion capture techniques.
Preparing video for analysis
The H.264 codec creates video files with high visual quality at a small file size. It is possible to use regular .MP4 files for computer-based video analysis. However, in most cases, computer vision software prefers to work with raw data or other compression formats:
- Video: use MJPEG (Motion JPEG) as the compression format. This compresses each frame individually. Use .AVI as the container, since this is the one that works best on all platforms.
- Audio: use uncompressed audio (16-bit PCM), saved as .WAV files (.AIFF usually also works fine). If you need to use compression, MP3 compression (MPEG-1, Layer 3) is still more versatile than AAC (used in .MP4 files). If you use a bitrate of 192 kbps or higher, you should not get too many artefacts.
FFmpeg is a handy (free) tool for doing all sorts of audio/video manipulation, and it is available for all major platforms. It is somewhat intimidating for beginners, but the trick is to know what works. Here is a one-liner that will convert an .MP4 file into an .AVI file with MJPEG video and PCM audio:
ffmpeg -i input.mp4 -c:a pcm_s16le -c:v mjpeg -q:v 3 -huffman optimal output.avi
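When several recordings need to be converted, the same command can be generated from a script. The sketch below builds the command from the one-liner above for each file in a list; the function name and the file names are made up for this illustration, and actually running the commands assumes that FFmpeg is installed.

```python
from pathlib import Path


def mjpeg_command(src, quality=3):
    """Build the MP4-to-AVI conversion command for one source file."""
    dst = Path(src).with_suffix(".avi")
    return [
        "ffmpeg", "-i", str(src),
        "-c:a", "pcm_s16le",      # audio: uncompressed 16-bit PCM
        "-c:v", "mjpeg",          # video: Motion JPEG, each frame compressed on its own
        "-q:v", str(quality),     # video quality (lower values mean higher quality)
        "-huffman", "optimal",    # optimised Huffman tables for the JPEG frames
        str(dst),
    ]


# Hypothetical file names, only for illustration.
for src in ["take1.mp4", "take2.mp4"]:
    command = mjpeg_command(src)
    print(" ".join(command))
    # To run the conversion: subprocess.run(command, check=True)
```

Building the command as a list, rather than as one long string, avoids problems with spaces in file names when it is passed to `subprocess.run`.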