[linux-dvb] RFC 0.3: MPEG encoding and decoding V4L2/DVB API additions

Hans Verkuil hverkuil at xs4all.nl
Sun Feb 18 17:36:03 CET 2007

RFC MPEG encoding and decoding V4L2/DVB API additions
Version 0.3

(This third revision incorporates the comments and suggestions that 
resulted from discussing this RFC with Ralph Metzler. This should be 
pretty much the final version. I hope.)

This RFC adds new functionality to the V4L2/DVB API in order to properly 
support MPEG hardware encoders and decoders. This is mostly driven by 
the work to get the ivtv driver (www.ivtvdriver.org) into the kernel, 
but it can also benefit other hardware encoders and decoders, which is 
why this RFC is cross-posted to the dxr3-devel mailing list as well.

A general note: while MPEG-1/2/4 is currently the codec family most 
often found, this RFC should also work for other compressed-stream 
formats, possibly with some later additions.

This RFC only deals with the encoding and decoding part. The cx23415 
also supports an On-Screen Display (OSD). Another RFC will appear for 
that later. I need to do some more research on it before I can issue 
that RFC.

This RFC is divided into several sections. The first section describes a 
few additional MPEG compression controls. It is followed by a 
description of the new MPEG Index functionality. Then a description is 
given of the actual MPEG encoding commands (start, stop, pause, resume) 
and how to handle timing information.

This is followed by a description of the MPEG decoding API, in 
particular how the DVB decoding API maps to what is needed for the ivtv 
driver, and how it can be extended to support the functionality of the 

Changes since 0.2:


Part I: MPEG encoding

This API has been reviewed by Mauro and his suggestions have been 
As far as I am concerned this is pretty much the definitive API as far 
as MPEG encoding is concerned.

MPEG compression controls

Type: integer
Description: Mutes the video to a fixed color when capturing. This is 
useful for testing as it creates a fixed and reproducible video 

The supplied 32-bit integer has the following layout:

        Bits    Meaning
         0      '0' = video not muted
                '1' = video muted, creates frames with the YUV color
                      defined below
         1:7    Unused, set to 0.
         8:15   V chrominance information
        16:23   U chrominance information
        24:31   Y luminance information
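
A minimal sketch of how an application might build this value. The 
helper name pack_video_mute() is made up for illustration and is not 
part of the proposed API:

```c
#include <stdint.h>

/* Pack the video mute control value using the bit layout given above.
 * pack_video_mute() is a hypothetical helper, not part of the API. */
static uint32_t pack_video_mute(int muted, uint8_t y, uint8_t u, uint8_t v)
{
	uint32_t value = muted ? 1u : 0u;	/* bit 0: mute on/off */

	/* bits 1:7 are unused and stay 0 */
	value |= (uint32_t)v << 8;		/* bits 8:15:  V chrominance */
	value |= (uint32_t)u << 16;		/* bits 16:23: U chrominance */
	value |= (uint32_t)y << 24;		/* bits 24:31: Y luminance */
	return value;
}
```

For example, pack_video_mute(1, 0x10, 0x80, 0x80) gives 0x10808001, 
which would mute to black if the hardware uses video-range YUV levels 
(an assumption, the cx23415 may interpret the values differently).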

Type: bool
Description: Mutes the audio when capturing. This is not done by muting 
audio hardware, which can still produce a slight hiss, but in the 
encoder itself, guaranteeing a fixed and reproducible audio bitstream.

0 = unmuted, 1 = muted.

Type: bool
Description: This control is specific to the CX23415/6. If set, it 
enables navigation pack insertion for DVD. To be precise: it adds 0xbf 
(private stream 2) packets to the MPEG stream. Each packet is 2048 
bytes long (including the 6-byte header). The payload is zeroed and it 
is up to the application to fill it in. These packets are inserted 
every four frames.

0 = do not insert, 1 = insert DVD navigation packets.
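
An application that wants to fill in these packets first has to find 
them in the captured stream. The sketch below assumes the 6-byte 
header is a standard PES header (start code 00 00 01, stream id 0xbf, 
16-bit big-endian length counting the bytes after the length field). 
That is how private stream 2 packets normally look, but I have not 
verified it against the cx23415 output. is_nav_packet() is a 
hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

#define NAV_PACKET_SIZE	2048	/* total size, header included */
#define NAV_HEADER_SIZE	6	/* 00 00 01 bf + 16-bit length */

/* Return 1 if buf starts with a private stream 2 packet of the size
 * described above, 0 otherwise. Assumes a standard PES header. */
static int is_nav_packet(const uint8_t *buf, size_t len)
{
	unsigned pes_len;

	if (len < NAV_PACKET_SIZE)
		return 0;
	if (buf[0] != 0x00 || buf[1] != 0x00 || buf[2] != 0x01 ||
	    buf[3] != 0xbf)
		return 0;

	/* The length field counts the bytes that follow it. */
	pes_len = ((unsigned)buf[4] << 8) | buf[5];
	return pes_len == NAV_PACKET_SIZE - NAV_HEADER_SIZE;
}
```

The payload to fill in would then start at buf + NAV_HEADER_SIZE.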

MPEG Index

#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)

struct v4l2_enc_idx_entry {
	__u64 offset;
	__u64 pts;
	__u32 length;
	__u32 flags;
	__u32 reserved[2];
};

#define V4L2_ENC_IDX_ENTRIES (64)
struct v4l2_enc_idx {
	__u32 entries;
	__u32 entries_cap;
	__u32 reserved[4];
	struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};

#define VIDIOC_G_ENC_INDEX        _IOR('V', 64, struct v4l2_enc_idx)

Returns MPEG stream indices: each entry states that a frame (I, P or B 
according to the flags) starts at the given offset, with the given PTS 
(Presentation Time Stamp) and length. The offset may never exceed the 
number of bytes actually read, i.e. the ioctl should never return 
'future events'.

'entries' is the number of entries filled in the entry array.
'entries_cap' is the capacity of the index in the driver. This may be 
larger or smaller than V4L2_ENC_IDX_ENTRIES. 'entries' will always be 
less than or equal to min(entries_cap, V4L2_ENC_IDX_ENTRIES).
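
As an illustration of how an application might consume the index, 
here is a sketch that collects the offsets of all I-frames from one 
returned index, e.g. to build a seek table. The structs are local 
copies using <stdint.h> types so the example stands on its own, and 
collect_iframes() is a hypothetical helper:

```c
#include <stdint.h>

#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)
#define V4L2_ENC_IDX_ENTRIES    (64)

/* Local copies of the structs from this RFC. */
struct v4l2_enc_idx_entry {
	uint64_t offset;
	uint64_t pts;
	uint32_t length;
	uint32_t flags;
	uint32_t reserved[2];
};

struct v4l2_enc_idx {
	uint32_t entries;
	uint32_t entries_cap;
	uint32_t reserved[4];
	struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};

/* Store the offsets of all I-frames in idx into out (at most max of
 * them) and return how many were stored. */
static unsigned collect_iframes(const struct v4l2_enc_idx *idx,
				uint64_t *out, unsigned max)
{
	unsigned count = idx->entries;
	unsigned i, n = 0;

	/* Be defensive: never read past the end of the entry array. */
	if (count > V4L2_ENC_IDX_ENTRIES)
		count = V4L2_ENC_IDX_ENTRIES;

	for (i = 0; i < count && n < max; i++) {
		unsigned type = idx->entry[i].flags & V4L2_ENC_IDX_FRAME_MASK;

		if (type == V4L2_ENC_IDX_FRAME_I)
			out[n++] = idx->entry[i].offset;
	}
	return n;
}
```

In a real application idx would be filled in by the 
VIDIOC_G_ENC_INDEX ioctl.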

If this ioctl is called when no capture is in progress, then 'entries' 
is 0 and 'entries_cap' should be set to the capacity. This way 
applications can check beforehand how frequently the index should be 

MPEG Encoding commands

#define V4L2_ENC_CMD_START 	(0)
#define V4L2_ENC_CMD_STOP 	(1)
#define V4L2_ENC_CMD_PAUSE 	(2)
#define V4L2_ENC_CMD_RESUME	(3)

/* Flags for V4L2_ENC_CMD_STOP */
#define V4L2_ENC_CMD_STOP_AT_GOP_END 	(1 << 0)

struct v4l2_encoder_cmd {
	__u32 cmd;
	__u32 flags;
	union {
		struct {
			__u32 data[8];
		} raw;
	};
};

#define VIDIOC_ENCODER_CMD     _IOWR('V', 69, struct v4l2_encoder_cmd)
#define VIDIOC_TRY_ENCODER_CMD _IOWR('V', 70, struct v4l2_encoder_cmd)

Before calling this ioctl the unused fields of v4l2_encoder_cmd must be 

'cmd' is set by the user and is the command for the encoder.
'flags' is currently only used by the STOP command and contains one bit: 
If V4L2_ENC_CMD_STOP_AT_GOP_END is set, then the capture continues 
until the end of the GOP, otherwise it stops immediately.
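
A sketch of how a STOP command could be prepared. The struct is a 
local copy using <stdint.h> types, encoder_cmd_stop() is a 
hypothetical helper, and clearing the whole struct first is my 
assumption of what drivers will expect for the unused fields:

```c
#include <stdint.h>
#include <string.h>

#define V4L2_ENC_CMD_STOP		(1)
#define V4L2_ENC_CMD_STOP_AT_GOP_END	(1 << 0)

/* Local copy of the struct from this RFC. */
struct v4l2_encoder_cmd {
	uint32_t cmd;
	uint32_t flags;
	union {
		struct {
			uint32_t data[8];
		} raw;
	};
};

/* Fill in a STOP command. If at_gop_end is non-zero the capture is
 * asked to continue until the end of the current GOP. */
static void encoder_cmd_stop(struct v4l2_encoder_cmd *c, int at_gop_end)
{
	memset(c, 0, sizeof(*c));	/* assumption: unused fields are 0 */
	c->cmd = V4L2_ENC_CMD_STOP;
	if (at_gop_end)
		c->flags |= V4L2_ENC_CMD_STOP_AT_GOP_END;
}
```

The result would then be passed to the VIDIOC_ENCODER_CMD ioctl on 
the encoder's file descriptor.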

These ioctls will check whether the command is supported (-EINVAL is 
returned if not) and modify any arguments if needed to make it a valid 
call for the available hardware. The modified arguments are returned. 
The VIDIOC_TRY_ENCODER_CMD is identical to VIDIOC_ENCODER_CMD, except 
that the TRY ioctl does not actually execute the command.

Note that a read() to a stopped encoder implies a V4L2_ENC_CMD_START. A 
close() of an encoder that is currently encoding implies an immediate 
V4L2_ENC_CMD_STOP. When the encoder has no more pending data after 
issuing a STOP the read() call will return 0 to indicate that the 
encoder has stopped. The next read will start the encoder again.
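
From the application side the read() semantics above boil down to a 
loop like the following sketch. It works on any pair of file 
descriptors, so nothing here is specific to ivtv. drain_encoder() is 
a hypothetical helper name:

```c
#include <errno.h>
#include <unistd.h>

/* Copy data from enc_fd to out_fd until read() returns 0, which per
 * the text above means the stopped encoder has no more pending data.
 * Returns the number of bytes copied, or -1 on error. */
static long drain_encoder(int enc_fd, int out_fd)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(enc_fd, buf, sizeof(buf));

		if (n == 0)
			return total;		/* encoder has stopped */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}

		/* write() may be partial, so loop until all is written */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

Note that with a real encoder device the application must not call 
read() again after the 0 return unless it wants to restart the 
capture.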

MPEG Timing

The DVB API contains two ioctls: AUDIO_GET_PTS and VIDEO_GET_PTS.
For the Conexant chips the way to obtain PTS values during MPEG encoding 
is through the VIDIOC_G_ENC_INDEX ioctl. The only time when the PTS is 
needed in ivtv is when capturing raw PCM and YUV. Since these two raw 
streams are not in sync you need the actual PTS value from each in 
order to synchronize them. For that you can use the DVB API. The PCM 
device will change to an ALSA device in the future anyway, and this 
feature is of very limited interest.
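
To illustrate what synchronizing on the two PTS values involves, here 
is a sketch that computes the drift between an audio and a video PTS. 
It assumes the values are standard MPEG time stamps (33 bits wide, 
90 kHz clock). This RFC does not specify the units, so treat that as 
an assumption. pts_diff_ms() is a hypothetical helper:

```c
#include <stdint.h>

#define PTS_BITS	33			/* assumed: MPEG PTS width */
#define PTS_MASK	((UINT64_C(1) << PTS_BITS) - 1)
#define PTS_HZ		90000			/* assumed: 90 kHz clock */

/* Return audio_pts - video_pts in milliseconds. A positive result
 * means the audio time stamp is ahead of the video time stamp. */
static int64_t pts_diff_ms(uint64_t audio_pts, uint64_t video_pts)
{
	/* Difference modulo 2^33, so a wrapped counter is handled. */
	int64_t diff = (int64_t)((audio_pts - video_pts) & PTS_MASK);

	/* Map the upper half of the range to negative differences. */
	if (diff >= (INT64_C(1) << (PTS_BITS - 1)))
		diff -= INT64_C(1) << PTS_BITS;

	return diff * 1000 / PTS_HZ;
}
```

For example, pts_diff_ms(90000, 0) is 1000, i.e. the audio is one 
second ahead.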

Part II: MPEG decoding

For MPEG decoding there is a DVB API available (media/video.h). After 
researching this API it's become clear that it can be used for most of 
the ivtv functionality, especially if some small additions can be made.

Together with Ralph Metzler I arrived at the following additions:

MPEG Decoding commands

In this section I will examine how to implement the decoding 
functionality of the Conexant cx23415 in terms of the DVB API, and 
what, if any, additions to that API are needed to support it fully.

1) Start/Stop/Pause/Resume decoding

After discussing this with Ralph it became clear that it was best to add 
two new ioctls (as designed in the first version of this RFC) since the 
existing VIDEO_PLAY/STOP/FREEZE/CONTINUE did not provide the required 
functionality. The existing ioctls can still be used, but only do the 
simple action. For more refined control (and better support for future 
extensions) new VIDEO_COMMAND and VIDEO_TRY_COMMAND ioctls are added. 
This ensures that existing apps won't break, but that the cx23415 is 
still fully supported. Also future extensions are much easier.

#define VIDEO_CMD_PLAY        (0)
#define VIDEO_CMD_STOP        (1)
#define VIDEO_CMD_FREEZE      (2)
#define VIDEO_CMD_CONTINUE    (3)

/* Flags for VIDEO_CMD_FREEZE */
#define VIDEO_CMD_PAUSE_TO_BLACK     (1 << 0)

/* Flags for VIDEO_CMD_STOP */
#define VIDEO_CMD_STOP_TO_BLACK      (1 << 0)

/* Flags for VIDEO_CMD_PLAY */

/* Play input formats: */

/* The decoder has no special format requirements */
#define VIDEO_PLAY_FMT_NONE         (0)
/* The decoder requires full GOPs */
#define VIDEO_PLAY_FMT_GOP          (1)

struct video_command {
        __u32 cmd;
        __u32 flags;
        union {
                struct {
                        __u64 pts;
                } stop;

                struct {
                        /* signed: negative values select reverse play */
                        __s32 speed;
                        __u32 format;
                } play;

                struct {
                        __u32 data[16];
                } raw;
        };
};

#define VIDEO_COMMAND     _IOWR('o', 58, struct video_command)
#define VIDEO_TRY_COMMAND _IOWR('o', 59, struct video_command)

Before calling this ioctl the unused fields of video_command must be 

'cmd' is set by the user and is the command for the decoder.

'flags' is used by several commands:

Pause (FREEZE) and STOP can either leave the last frame on screen or 
clear the output to black at the end, depending on the specified flag.

VIDEO_CMD_PLAY_SPEED_MUTE_AUDIO selects whether the audio should be 
muted when decoding at non-standard speed.

Some extra arguments are available for specific commands:

Stop can set the PTS it should stop at. If pts == 0, then the decoder 
stops accepting new data immediately.
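
A sketch of how a decoder STOP command could be prepared. The struct 
is a local copy using <stdint.h> types (with 'speed' declared signed 
so that negative speeds fit), video_cmd_stop() is a hypothetical 
helper, and clearing the struct first is my assumption of what is 
expected for the unused fields:

```c
#include <stdint.h>
#include <string.h>

#define VIDEO_CMD_STOP		(1)
#define VIDEO_CMD_STOP_TO_BLACK	(1 << 0)

/* Local copy of the struct from this RFC. */
struct video_command {
	uint32_t cmd;
	uint32_t flags;
	union {
		struct {
			uint64_t pts;
		} stop;
		struct {
			int32_t speed;
			uint32_t format;
		} play;
		struct {
			uint32_t data[16];
		} raw;
	};
};

/* Fill in a STOP command. pts == 0 means: stop accepting new data
 * immediately. If to_black is non-zero the output is cleared to black
 * instead of leaving the last frame on screen. */
static void video_cmd_stop(struct video_command *c, uint64_t pts,
			   int to_black)
{
	memset(c, 0, sizeof(*c));	/* assumption: unused fields are 0 */
	c->cmd = VIDEO_CMD_STOP;
	c->stop.pts = pts;
	if (to_black)
		c->flags |= VIDEO_CMD_STOP_TO_BLACK;
}
```

The result would then be passed to the VIDEO_COMMAND ioctl, or to 
VIDEO_TRY_COMMAND to check it first.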

In order to wait until the decoder has finished a new event is added: 
VIDEO_EVENT_DECODER_STOPPED. You can select() or poll() on the video 
device to wait for an exception and use VIDEO_GET_EVENT to query it. 
This is valid for both the stop VIDEO_COMMAND and for the VIDEO_STOP 

Play has a speed setting as extra argument. PLAY can be called again 
when already playing in order to change the speed.

For the speed setting to the play command I suggest that the 
DVB_VIDEO_PLAY proposal from the DVB V4 API document is followed: the 
speed argument would be interpreted as follows:

   speed == 0 || speed == 1000: normal speed
   speed == 1: single step forward
   speed == -1: single step backward
   1 < speed < 1000: slow forward
   speed > 1000: fast forward
   speed == -1000: reverse play at normal speed
   -1000 < speed < -1: slow reverse
   speed < -1000: fast reverse.
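
The table above translates into code along these lines. This is only 
a sketch: the enum and classify_speed() are made-up names, not part of 
the proposed API:

```c
#include <stdint.h>

/* Hypothetical names for the playback modes listed above. */
enum play_mode {
	PLAY_NORMAL,
	PLAY_STEP_FORWARD,
	PLAY_STEP_BACKWARD,
	PLAY_SLOW_FORWARD,
	PLAY_FAST_FORWARD,
	PLAY_REVERSE_NORMAL,
	PLAY_SLOW_REVERSE,
	PLAY_FAST_REVERSE
};

/* Map a speed argument to the playback mode it selects. */
static enum play_mode classify_speed(int32_t speed)
{
	if (speed == 0 || speed == 1000)
		return PLAY_NORMAL;
	if (speed == 1)
		return PLAY_STEP_FORWARD;
	if (speed == -1)
		return PLAY_STEP_BACKWARD;
	if (speed == -1000)
		return PLAY_REVERSE_NORMAL;
	if (speed > 1000)
		return PLAY_FAST_FORWARD;
	if (speed > 1)
		return PLAY_SLOW_FORWARD;	/* 1 < speed < 1000 */
	if (speed < -1000)
		return PLAY_FAST_REVERSE;
	return PLAY_SLOW_REVERSE;		/* -1000 < speed < -1 */
}
```

Note that the speed has to be handled as a signed value for the 
negative cases to work.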

The driver will return the closest actual speed that the driver can 
handle, together with the required input format. E.g. for reverse 
playback the cx23415 requires full GOPs, fed into the decoder in 
reverse order.

An error is returned if the requested feature is completely unsupported 
(e.g. if the hardware cannot do single stepping or reverse playback).

These ioctls will check whether the command is supported (-EINVAL is  
returned if not) and modify any arguments if needed to make it a valid 
call for the available hardware. The modified arguments are returned.  
The VIDEO_TRY_COMMAND is identical to VIDEO_COMMAND, except that the 
TRY ioctl does not actually execute the command.

Note that a write() to a stopped decoder implies a VIDEO_CMD_PLAY. A 
close() of a decoder that is currently decoding implies an immediate 
VIDEO_CMD_STOP. When the decoder stops accepting data after issuing a 
STOP the write() call will return 0 to indicate that the decoder has 
stopped and accepts no more data. The next write will start the decoder 
again.

2) Passthrough

The Passthrough feature of the cx23415 does the following:
if the passthrough mode is started then the video/audio input from the 
MPEG encoder is routed straight to the video/audio output. This is done 
internally in the cx23415. While Passthrough is on, it is still 
possible to record from the input at the same time. It's basically live 
TV functionality.

For this the VIDEO_SELECT_SOURCE is actually a good choice. Selecting 
VIDEO_SOURCE_DEMUX will select passthrough mode, selecting 

3) Timing information on the displayed frame

Use VIDEO_GET_PTS. There is currently no method of retrieving the 
SCR/PCR clock, though. But I don't think anyone is using that. In the 
future it might be possible to use DMX_GET_STC for this.

More problematic is that MythTV is using the frame counter (i.e. how 
many frames have been played back since the start of the stream). For 
that I would need a VIDEO_GET_FRAME_COUNT ioctl:

#define VIDEO_GET_FRAME_COUNT  _IOR('o', 60, __u64)

4) Wait for next frame to be displayed

Several applications need to know when a new frame is displayed. This 
usually triggers some On Screen Display update or something like that. 
This too is easy to implement using an event. All that is needed is a new 

5) Audio mode selection

The cx23415 allows automatic selection of the audio mode (stereo, left, 
right, mono or swapped channels) for both a normal stereo capture and a 
bilingual capture.

The AUDIO_CHANNEL_SELECT ioctl comes close. If the 
audio_channel_select_t enum was extended with AUDIO_MONO and 
added, then this would fully implement this feature.

6) Scaling and positioning of the video

The cx23415 can take the MPEG stream, scale it to an arbitrary width 
and height and position it anywhere on the TV-out screen. So you can 
get effects like having the MPEG output in the top left corner and an 
OSD in the lower right corner.

With VIDIOC_S_FMT I can set the width and height, but there is no 
provision for an x and y coordinate. Can the struct v4l2_pix_format be 
expanded to include this? It would be the logical place for it. For 
most devices the x and y would always be 0, so I don't think it would 
be a problem.

This concludes this RFC. Comments are welcome!


	Hans Verkuil
