[linux-dvb] RFC: MPEG encoding and decoding V4L2/DVB API additions

Hans Verkuil hverkuil at xs4all.nl
Sun Feb 18 14:03:01 CET 2007


RFC MPEG encoding and decoding V4L2/DVB API additions
Version 0.2

(The latest version of this RFC can be found here as well:
http://ivtvdriver.org/viewcvs/ivtv/trunk/doc/)

This RFC adds new functionality to the V4L2/DVB API in order to properly 
support MPEG hardware encoders and decoders. This is mostly driven by 
the work to get the ivtv driver (www.ivtvdriver.org) into the kernel, 
but it can also benefit other hardware encoders and decoders, which is 
why this RFC is cross-posted to the dxr3-devel mailing list as well.

A general note: while MPEG-1/2/4 are currently the codecs most often 
found, this RFC should also work for other compressed-stream formats, 
possibly with some later additions.

This RFC only deals with the encoding and decoding part. The cx23415 
also supports an On-Screen Display (OSD). Another RFC will appear for 
that later; I need to do some more research before I can issue it.

This RFC is divided into several sections. The first section describes a 
few additional MPEG compression controls. It is followed by a 
description of the new MPEG Index functionality. Then a description is 
given of the actual MPEG encoding commands (start, stop, pause, resume) 
and how to handle timing information.

This is followed by a description of the MPEG decoding API, in 
particular how the DVB decoding API maps to what is needed for the ivtv 
driver, and how it can be extended to support the functionality of the 
driver.


Part I: MPEG encoding
=====================

This API has been reviewed by Mauro and his suggestions have been 
incorporated. As far as I am concerned this is pretty much the 
definitive API as far as MPEG encoding is concerned.

MPEG compression controls
-------------------------

V4L2_CID_MPEG_VIDEO_MUTE
Type: integer
Description: Mutes the video to a fixed color when capturing. This is 
useful for testing as it creates a fixed and reproducible video 
bitstream.

The supplied 32-bit integer has the following value:

         0      '0'=video not muted
                '1'=video muted, creates frames with the YUV color
                    defined below
         1:7    Unused, set to 0.
         8:15   V chrominance information
        16:23   U chrominance information
        24:31   Y luminance information
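
As an illustration of the bit layout above, here is a minimal sketch of 
how an application might assemble the control value. The helper name 
video_mute_value() is made up for this example and is not part of the 
proposal; the result would be what the application passes as the value 
of this control.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the RFC): build the 32-bit value for
 * V4L2_CID_MPEG_VIDEO_MUTE from the bit layout described above. */
static uint32_t video_mute_value(int muted, uint8_t y, uint8_t u, uint8_t v)
{
	uint32_t val = muted ? 1u : 0u;  /* bit 0: mute on/off */

	val |= (uint32_t)v << 8;         /* bits 8:15:  V chrominance */
	val |= (uint32_t)u << 16;        /* bits 16:23: U chrominance */
	val |= (uint32_t)y << 24;        /* bits 24:31: Y luminance */
	return val;                      /* bits 1:7 are left at 0 */
}
```

For example, video_mute_value(1, 0x10, 0x80, 0x80) gives 0x10808001, 
i.e. muted with Y=0x10 and U=V=0x80.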

V4L2_CID_MPEG_AUDIO_MUTE
Type: bool
Description: Mutes the audio when capturing. This is not done by muting 
audio hardware, which can still produce a slight hiss, but in the 
encoder itself, guaranteeing a fixed and reproducible audio bitstream.

0 = unmuted, 1 = muted.
 
V4L2_CID_MPEG_CX2341X_STREAM_INSERT_NAV_PACKETS
Type: bool
Description: this control is specific to the CX23415/6. If set, then it 
enables navigation pack insertion for DVD. To be precise: it adds 0xbf 
(private stream 2) packets to the MPEG. The size of these packets is 
2048 bytes (including the 6-byte header). The payload is zeroed and it 
is up to the application to fill them in. These packets are inserted 
every four frames.

0 = do not insert, 1 = insert DVD navigation packets.


MPEG Index
----------

#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)

struct v4l2_enc_idx_entry {
	u64 offset;
	u64 pts;
	u32 length;
	u32 flags;
	u32 reserved[2];
};

#define V4L2_ENC_IDX_ENTRIES (64)
struct v4l2_enc_idx {
	u32 entries;
	u32 entries_cap;
	u32 reserved[4];
	struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};
#define VIDIOC_G_ENC_INDEX        _IOR('V', 64, struct v4l2_enc_idx)

Returns MPEG stream indices. Each entry states that a frame of the 
given type (I/P/B according to the flags) starts at the given offset, 
with the given PTS (Presentation Time Stamp) and length. The offset may 
never exceed the number of bytes actually read, i.e. the ioctl should 
never return 'future events'.

'entries' is the number of entries filled in the entry array.
'entries_cap' is the capacity of the index in the driver. This may be 
larger or smaller than V4L2_ENC_IDX_ENTRIES. 'entries' will always be 
less than or equal to min(entries_cap, V4L2_ENC_IDX_ENTRIES).

If this ioctl is called when no capture is in progress, then 'entries' 
is 0 and 'entries_cap' should be set to the capacity. This way 
applications can check beforehand how frequently the index should be 
obtained. 
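
To make the intended use of the index a bit more concrete, here is a 
rough sketch of how an application might interpret a returned 
struct v4l2_enc_idx, for instance to find a seek point. The structures 
are copied from above with stdint types so that the sketch compiles 
outside the kernel tree. The two helper functions are hypothetical and 
not part of the proposal.

```c
#include <stddef.h>
#include <stdint.h>

#define V4L2_ENC_IDX_FRAME_I    (0)
#define V4L2_ENC_IDX_FRAME_P    (1)
#define V4L2_ENC_IDX_FRAME_B    (2)
#define V4L2_ENC_IDX_FRAME_MASK (0xf)

struct v4l2_enc_idx_entry {
	uint64_t offset;
	uint64_t pts;
	uint32_t length;
	uint32_t flags;
	uint32_t reserved[2];
};

#define V4L2_ENC_IDX_ENTRIES (64)
struct v4l2_enc_idx {
	uint32_t entries;
	uint32_t entries_cap;
	uint32_t reserved[4];
	struct v4l2_enc_idx_entry entry[V4L2_ENC_IDX_ENTRIES];
};

/* Hypothetical helper: frame type of one index entry as a letter. */
static char enc_idx_frame_type(const struct v4l2_enc_idx_entry *e)
{
	switch (e->flags & V4L2_ENC_IDX_FRAME_MASK) {
	case V4L2_ENC_IDX_FRAME_I: return 'I';
	case V4L2_ENC_IDX_FRAME_P: return 'P';
	case V4L2_ENC_IDX_FRAME_B: return 'B';
	default:                   return '?';
	}
}

/* Hypothetical helper: return the I-frame entry with the highest offset
 * that is still at or before byte position 'pos', or NULL if none. */
static const struct v4l2_enc_idx_entry *
enc_idx_find_iframe(const struct v4l2_enc_idx *idx, uint64_t pos)
{
	const struct v4l2_enc_idx_entry *best = NULL;
	uint32_t n = idx->entries;
	uint32_t i;

	if (n > V4L2_ENC_IDX_ENTRIES)    /* never read past the array */
		n = V4L2_ENC_IDX_ENTRIES;

	for (i = 0; i < n; i++) {
		const struct v4l2_enc_idx_entry *e = &idx->entry[i];

		if (e->offset > pos || enc_idx_frame_type(e) != 'I')
			continue;
		if (best == NULL || e->offset >= best->offset)
			best = e;
	}
	return best;
}
```

An application would fill the structure through VIDIOC_G_ENC_INDEX and 
then call these helpers on the result.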


MPEG Encoding commands
----------------------

#define V4L2_ENC_CMD_START 	(0)
#define V4L2_ENC_CMD_STOP 	(1)
#define V4L2_ENC_CMD_PAUSE 	(2)
#define V4L2_ENC_CMD_RESUME	(3)

/* Flags for V4L2_ENC_CMD_STOP */
#define V4L2_ENC_CMD_STOP_AT_GOP_END 	(1 << 0)

struct v4l2_encoder_cmd {
	__u32 cmd;
	__u32 flags;
	union {
		struct {
			__u32 data[8];
		} raw;
	};
};
#define VIDIOC_ENCODER_CMD     _IOWR('V', 69, struct v4l2_encoder_cmd)
#define VIDIOC_TRY_ENCODER_CMD _IOWR('V', 69, struct v4l2_encoder_cmd)

Before calling this ioctl the unused fields of v4l2_encoder_cmd must be 
zeroed.

'cmd' is set by the user and is the command for the encoder.
'flags' is currently only used by the STOP command and contains one bit: 
If V4L2_ENC_CMD_STOP_AT_GOP_END is set, then the capture continues 
until the end of the GOP, otherwise it stops immediately.

These ioctls will check whether the command is supported (-EINVAL is 
returned if not) and modify any arguments if needed to make it a valid 
call for the available hardware. The modified arguments are returned. 
The VIDIOC_TRY_ENCODER_CMD is identical to VIDIOC_ENCODER_CMD, except 
that the TRY ioctl does not actually execute the command.

Note that a read() to a stopped encoder implies a V4L2_ENC_CMD_START. A 
close() of an encoder that is currently encoding implies an immediate 
V4L2_ENC_CMD_STOP. When the encoder has no more pending data after 
issuing a STOP the read() call will return 0 to indicate that the 
encoder has stopped. The next read will start the encoder again.
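
As a sketch of the calling convention described above (zero the unused 
fields, then set 'cmd' and 'flags'), an application could prepare a 
STOP command as follows. The definitions are copied from above with 
stdint types. The helper enc_cmd_init_stop() is hypothetical and only 
serves as an illustration.

```c
#include <stdint.h>
#include <string.h>

#define V4L2_ENC_CMD_START   (0)
#define V4L2_ENC_CMD_STOP    (1)
#define V4L2_ENC_CMD_PAUSE   (2)
#define V4L2_ENC_CMD_RESUME  (3)

/* Flags for V4L2_ENC_CMD_STOP */
#define V4L2_ENC_CMD_STOP_AT_GOP_END (1 << 0)

struct v4l2_encoder_cmd {
	uint32_t cmd;
	uint32_t flags;
	union {
		struct {
			uint32_t data[8];
		} raw;
	};
};

/* Hypothetical helper: prepare a STOP command. All fields are zeroed
 * first, so the unused ones satisfy the "must be zeroed" rule. */
static void enc_cmd_init_stop(struct v4l2_encoder_cmd *c, int at_gop_end)
{
	memset(c, 0, sizeof(*c));
	c->cmd = V4L2_ENC_CMD_STOP;
	if (at_gop_end)
		c->flags |= V4L2_ENC_CMD_STOP_AT_GOP_END;
}
```

The prepared structure would first be passed to VIDIOC_TRY_ENCODER_CMD 
to see whether the hardware accepts it, and then to VIDIOC_ENCODER_CMD 
to actually stop the encoder.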

MPEG Timing
-----------

The DVB API contains two ioctls: AUDIO_GET_PTS and VIDEO_GET_PTS. For 
the Conexant chips the way to obtain PTS values during MPEG encoding is 
through the VIDIOC_G_ENC_INDEX ioctl. The only time when the PTS is 
needed in ivtv is when capturing raw PCM and YUV. Since these two raw 
streams are not in sync you need the actual PTS value from each in 
order to synchronize them. For that you can use the DVB API. The PCM 
device will change to an ALSA device in the future anyway, and this 
feature is of very limited interest.


Part II: MPEG decoding
======================

For MPEG decoding there is a DVB API available (media/video.h). After 
researching this API it's become clear that it can be used for most of 
the ivtv functionality, especially if some small additions can be made.

This has been discussed with Mauro, but needs review from Ralph Metzler 
and Mauro.


MPEG Decoding commands
----------------------

In this section I will examine how to implement the decoding 
functionality of the Conexant cx23415 in terms of the DVB API, and 
what, if any, additions to that API are needed to support it fully.

1) Start decoding

Use VIDEO_PLAY (but see item 5, Speed control, for extra changes).

2) Stop decoding

Use VIDEO_STOP. The cx23415 can keep showing the last frame or go to 
black. That can be implemented by VIDEO_SET_BLANK. However, I would 
suggest an addition to VIDEO_STOP: pass a STOP_TO_BLACK flag as 
argument, that puts this setting in the place where it belongs, instead 
of requiring the application to keep track of the previous SET_BLANK 
setting. ivtv currently uses a mechanism similar to SET_BLANK and it is 
very awkward to work with. In practice you have to first call 
SET_BLANK, followed by STOP to be sure you have the correct BLANK 
setting.

ivtv also has an option to wait until the decoder has finished with all 
pending MPEG data. This can be perfectly implemented using the EVENT 
mechanism. All that is needed is a new event: 
VIDEO_EVENT_DECODER_STOPPED.

You can select() or poll() on that, and it is much better than my 
original proposal.

Finally, you can specify a PTS value at which the decoder should stop. 
There is currently no way of doing that in the DVB API. One option 
might be to add a VIDEO_S_PTS and add a USE_PTS flag that can be 
specified with VIDEO_STOP. Not terribly elegant, though. A 
VIDEO_STOP_AT_PTS ioctl might be better.

3) Pause decoding
 
Use VIDEO_FREEZE. The cx23415 can keep showing the last frame or go to 
black. That can be implemented by VIDEO_SET_BLANK. However, I would 
suggest an addition to VIDEO_FREEZE: pass a PAUSE_TO_BLACK flag as 
argument, that puts this setting in the place where it belongs.

4) Resume decoding

Use VIDEO_CONTINUE.

5) Speed control.

The DVB API has two relevant ioctls: VIDEO_FAST_FORWARD and 
VIDEO_SLOWMOTION. Currently the argument of these ioctls is ignored in 
the av7110 implementation. The cx23415 can do fast forward and backward 
at 1.5 and 2x normal speed, and slow motion at various speeds. It can 
also single step forwards or backwards. Furthermore it can specify 
whether audio should be muted or not (only relevant for 1.5x normal 
speed).

My suggestion would be to follow the DVB_VIDEO_PLAY ioctl as proposed in 
the DVB V4 API document: the VIDEO_PLAY argument would be interpreted 
as follows:

   speed == 0 || speed == 1000: normal speed
   speed == 1: single step forward
   speed == -1: single step backward
   1 < speed < 1000: slow forward
   speed > 1000: fast forward
   speed == -1000: reverse play at normal speed
   -1000 < speed < -1: slow reverse
   speed < -1000: fast reverse.

This change implies that it is possible to call VIDEO_PLAY when already 
playing in order to change the speed/direction.

VIDEO_PLAY will map the speed to the closest speed setting possible. It 
will return an error if the requested functionality is not possible 
(e.g. if no reverse playback is supported, or if there is no single 
step).
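
The speed ranges listed above can be sketched as a small classifier. 
This only illustrates the proposed interpretation of the argument; the 
enum and the function name are invented for this example and are not 
part of any existing API.

```c
/* Hypothetical names, used only to illustrate the proposed ranges. */
enum play_mode {
	PLAY_NORMAL,
	PLAY_STEP_FWD,
	PLAY_STEP_REV,
	PLAY_SLOW_FWD,
	PLAY_FAST_FWD,
	PLAY_NORMAL_REV,
	PLAY_SLOW_REV,
	PLAY_FAST_REV
};

/* Map a VIDEO_PLAY speed argument to the mode it would select. */
static enum play_mode classify_speed(int speed)
{
	if (speed == 0 || speed == 1000)
		return PLAY_NORMAL;
	if (speed == 1)
		return PLAY_STEP_FWD;
	if (speed == -1)
		return PLAY_STEP_REV;
	if (speed == -1000)
		return PLAY_NORMAL_REV;
	if (speed > 1000)
		return PLAY_FAST_FWD;
	if (speed > 1)
		return PLAY_SLOW_FWD;    /* 1 < speed < 1000 */
	if (speed < -1000)
		return PLAY_FAST_REV;
	return PLAY_SLOW_REV;            /* -1000 < speed < -1 */
}
```

A driver would then pick the closest speed its hardware supports within 
the selected mode, or return an error if the mode is not available.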

Should VIDEO_GET_CAPABILITIES return which of the above speed 
combinations are possible? A method of retrieving the actual speed 
would also be nice. Unfortunately, struct video_status has no room for 
additional fields.

The audio mute could be implemented through AUDIO_MUTE. It has a similar 
problem as the STOP_TO_BLACK flag in that it really belongs to the 
VIDEO_PLAY ioctl as an atomic action.

6) Passthrough

The Passthrough feature of the cx23415 does the following: if the 
passthrough mode is started then the video/audio input from the MPEG 
encoder is routed straight to the video/audio output. This is done 
internally in the cx23415. While Passthrough is on, it is still 
possible to record from the input at the same time. It's basically live 
TV functionality.

For this the VIDEO_SELECT_SOURCE is actually a good choice, provided I 
can add VIDEO_SOURCE_ENCODER as new source to the video_stream_source_t 
enum. I think the current _DEMUX source does not have quite the same 
meaning. I might be wrong on that, though.

7) Timing information on the displayed frame

Use VIDEO_GET_PTS. There is currently no method of retrieving the SCR/PCR 
clock, though. But I don't think anyone is using that.

More problematic is that MythTV is using the frame counter (i.e. how 
many frames have been played back since the start of the stream). For 
that I would need a VIDEO_GET_FRAME_COUNT.

8) Wait for next frame to be displayed

Several applications need to know when a new frame is displayed. This 
usually triggers some On Screen Display update or something like that. 
This too is easy to implement using an event. All that is needed is a 
new event VIDEO_EVENT_DECODER_VSYNC.

9) Audio mode selection

The cx23415 allows automatic selection of the audio mode (stereo, left, 
right, mono or swapped channels) for both a normal stereo capture and a 
bilingual capture.

The AUDIO_CHANNEL_SELECT ioctl comes close. If the 
audio_channel_select_t enum was extended with AUDIO_MONO and 
AUDIO_STEREO_SWAPPED and an AUDIO_BILINGUAL_CHANNEL_SELECT ioctl was 
added, then this would fully implement this feature.

An alternative approach is if AUDIO_CHANNEL_SELECT received a bitmask, 
e.g. the low 8 bits are the channel select for a stereo MPEG, and bits 
8-15 are the channel select for a bilingual MPEG.
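
A possible encoding of such a bitmask is sketched below. The helper 
names are hypothetical. The two 8-bit fields would each hold a value 
from the audio_channel_select_t enum.

```c
#include <stdint.h>

/* Hypothetical helpers for the bitmask variant of AUDIO_CHANNEL_SELECT:
 * bits 0-7 select the channel for a stereo MPEG, bits 8-15 for a
 * bilingual MPEG. */
static uint32_t audio_select_pack(uint8_t stereo_sel, uint8_t bilingual_sel)
{
	return (uint32_t)stereo_sel | ((uint32_t)bilingual_sel << 8);
}

static uint8_t audio_select_stereo(uint32_t mask)
{
	return (uint8_t)(mask & 0xff);
}

static uint8_t audio_select_bilingual(uint32_t mask)
{
	return (uint8_t)((mask >> 8) & 0xff);
}
```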

10) Scaling and positioning of the video

The cx23415 can take the MPEG stream, scale it to an arbitrary width 
and height and position it anywhere on the TV-out screen. So you can 
get effects like having the MPEG output in the top left corner and an 
OSD in the lower right corner.

With VIDIOC_S_FMT I can set the width and height, but there is no 
provision for an x and y coordinate. Can the struct v4l2_pix_format be 
expanded to include this? It would be the logical place for it. For 
most devices the x and y would always be 0, so I don't think it would 
be a problem.

This concludes this RFC. Comments are welcome!

Regards,

	Hans Verkuil


