File:  [DVB] / libsoftmpeg / IDEAS
Revision 1.2
Fri Feb 6 12:04:54 2004 UTC (20 years, 4 months ago) by hunold
Branches: MAIN
CVS tags: HEAD
- reformat docs
- add constant offset and a note that we need to fix it
- add fusionsound realtimepriority patch and a note to the docs

1) The problem with audio/video synchronization
-----------------------------------------------

DVB consists of encoding/broadcasting and reception/decoding.
Fortunately, encoding, broadcasting and reception do not bother us.

One problem, however, is that the decoding must be synchronized with
the encoding process. Because the data comes into the system at a
defined rate, you should not consume the data too fast; in that case,
you will get buffer underruns in your system. You shouldn't consume
the data too slowly either, or your buffers will overflow. So you
basically need to decode and display at exactly the speed at which
the broadcaster has encoded the data. To achieve this, the
broadcaster adds a program clock reference (PCR) to the stream. What
basically happens is that the system time clock (STC) of the decoder
is compared to the PCR on a regular basis. If it's running too fast
or too slow, the main clock of the system is adjusted slightly.
Additionally, the decoder has full control over the audio/video
decoding and audio/video display, and all components use the same
clock.
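
A minimal sketch of such a PCR/STC feedback loop (the function name,
units and gain value here are illustrative assumptions, not
libsoftmpeg code):

```python
def adjust_stc_rate(stc_hz, stc_ticks, pcr_ticks, gain=0.01):
    """Nudge the decoder's system time clock towards the broadcaster's
    clock. Both clocks count 90kHz ticks; a positive error means the
    decoder is running behind the encoder."""
    error = pcr_ticks - stc_ticks
    # Apply only a small fraction of the error so the clock is only
    # adjusted slightly; the loop converges over many PCR samples.
    return stc_hz + gain * error

# Decoder running slightly slow: the received PCR is ahead of the STC,
# so the clock rate is increased a little.
rate = adjust_stc_rate(90000.0, stc_ticks=900000, pcr_ticks=900090)
```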

On an x86 system, however, things look really bad at first glance.
You have three different clocks: the system clock, the "clock" inside
your sound card and the "clock" inside your gfx card that drives your
tv out.

You cannot rely on any of these clocks. Let's assume that you have
video and audio prebuffered. Even if you tell your sound card that it
should play back the data at a rate of 48kHz, it might play it out at
48010 or 47990Hz. This depends on the quartz on the sound card, and
there is no way you can find that out, not to speak of adjusting it.
The same goes for video: let's assume that you want to display frames
at a defined rate of 25 full-frames per second, ie. one frame every
40ms. If the video encoder operates at 25.01 frames per second,
sooner or later you'll have to skip a frame. You cannot rely on the
system clock either; you might have noticed that you need to adjust
it by multiple seconds after a few days. Additionally, you cannot
adjust the quartz that drives the clock.
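
The drift figures above can be checked with a bit of arithmetic (the
rates are the illustrative numbers from the text, not measurements):

```python
def seconds_until_slip(nominal_rate, actual_rate, slip_units):
    """Seconds until the rate error accumulates to slip_units
    (frames or samples)."""
    return slip_units / abs(actual_rate - nominal_rate)

# Display at 25.01fps instead of 25fps: one whole frame of drift
# accumulates in about 100 seconds.
video_slip = seconds_until_slip(25.0, 25.01, 1)

# Sound card at 48010Hz instead of 48000Hz: one field's worth of
# audio (20ms = 960 samples at 48kHz) drifts away in 96 seconds.
audio_slip = seconds_until_slip(48000.0, 48010.0, 960)
```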

All this does not really matter for "mplayer" and "xine", for
example. They "only" need to play out the audio and sync the video
frames accordingly. They can skip and double frames as they like and
thus overcome the problem of imprecise audio and video output. They
rely heavily on the fact that they can seek inside the stream, ie. if
the stream is not properly interleaved, they simply use one file
pointer to access the audio data and one file pointer to access the
video data.

For live DVB playback, however, this approach is not feasible. Of
course you can simply prebuffer a few megabytes of data and then use
"mplayer" to play that back, but prebuffering always results in high
channel switching latencies far above one second. For live tv, this is
unbearable.

But even if you don't care, there is still the problem of buffer
underruns and overflows. When the soundcard plays back the audio data
too fast, the buffer will underflow sooner or later. Because the
application cannot simply seek in the stream but needs to take what's
coming off the air, the application will stop playback and prebuffer a
few megabytes of data. Now imagine you're watching the showdown of some
action movie and your tv freezes because data needs to be prebuffered.
Annoying!

"libsoftmpeg" currently takes the following approach for short-term
audio/video synchronization:

Audio is taken as the master source and is simply played back at the
original sampling rate (mostly 48kHz). Fluent audio is most
important; you'll notice glitches and skips in audio more readily
than a few skipped frames in the video playback.

Playback is started when a specified amount of audio data (currently
500ms) is available. In the meantime, the coded mpeg video frames are
cached. For DVB, video is mostly transmitted a few hundred
milliseconds in advance (usually 100-300ms). If we prebuffer 500ms of
audio, we should be able to cache 800ms of video, ie. at least 20
compressed mpeg frames. Audio and video frames carry presentation
time stamps (PTS), so if we know which byte of audio data is
currently being played back by the sound card, we can calculate which
video frame matches best. This is done once for the initial
audio/video sync. This is what I called "sync-to-audio-pts" earlier.
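
The initial "sync-to-audio-pts" step can be sketched like this; the
function names, the 90kHz PTS unit and the stereo 16-bit sample
format are assumptions for illustration:

```python
def current_audio_pts(first_pts, bytes_played, rate=48000, channels=2,
                      bytes_per_sample=2):
    """PTS (in 90kHz ticks) of the audio sample currently leaving the
    sound card, derived from the bytes played back so far."""
    samples = bytes_played // (channels * bytes_per_sample)
    return first_pts + samples * 90000 // rate

def best_video_frame(audio_pts, cached_frames):
    """cached_frames is a list of (pts, frame) pairs; pick the frame
    whose PTS is closest to the current audio position."""
    return min(cached_frames, key=lambda entry: abs(entry[0] - audio_pts))

# After 0.5s of stereo 16-bit audio (96000 bytes), the audio position
# is 45000 ticks past the first PTS.
pts = current_audio_pts(first_pts=0, bytes_played=96000)
```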

After that, video frames are decoded "just in time" and displayed at
the fixed rate of the video encoder, ie. 25fps for PAL video ("free
flowing video display"). Because of the different playback speeds,
sooner or later video and audio will most likely drift apart.

We can always calculate the PTS of the audio frame that's currently
being played back and look at the PTS of the video frame that's going
to be displayed. If we notice that video is playing too fast, we can
double one field (20ms); if it's too slow, we can skip one field and
gain 20ms. It's important not to do this too often, otherwise you
will notice jerky video (look at the news banner on CNN, for
example).
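
A sketch of that decision, assuming PTS values in 90kHz ticks (so one
20ms field is 1800 ticks); the threshold is an assumption:

```python
FIELD_TICKS = 1800  # 20ms at the 90kHz PTS clock

def field_correction(video_pts, audio_pts, threshold=FIELD_TICKS):
    """Return +1 to double a field (video running ahead), -1 to skip
    a field (video running behind), 0 to leave playback alone."""
    drift = video_pts - audio_pts
    if drift > threshold:
        return +1   # video ahead of audio: repeat a field to wait
    if drift < -threshold:
        return -1   # video behind audio: drop a field to catch up
    return 0        # within tolerance: don't correct, avoid jerkiness
```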

This idea works very well and gives a very good short term audio/video
sync.

2) How to achieve long-term playback stability
----------------------------------------------

As I've already explained above, if we consume the audio data too
fast or too slow, our audio buffer will underflow or overflow sooner
or later.

The first idea might be to simply do what a set-top-box does: have a
look at the PCRs in the stream and use that data. One big problem,
however, is that you don't have a chance to really compare the PCR to
anything.

By the time the userspace application sees a transport stream packet
with a PCR in it, the transport packet has already passed through
three buffers: the buffer used for dma transfer to kernel memory, the
ringbuffer used to provide the transport packets to user space and
the buffer of the user application. If you now extract the PCR
information and compare it to your local system time
("gettimeofday()"), you get a huge jitter in that comparison. Even
worse, these measurements are very bursty. Let's assume your
application uses a buffer of 1024 transport stream packets. If
packets 1 and 1023 contain a PCR and your application processes this
buffer in one chunk, then the gettimeofday()s will be close together,
although the PCRs might be several 10ms apart. You would need to
low-pass filter the results heavily, and even then it is questionable
whether this will ever tell you that your system clock is 0.01% too
fast. Worse still, this tells you nothing about the quartz on your
sound card or on your video encoder. 8-(
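
To illustrate the filtering problem: even a heavily damped
exponential low-pass over the (PCR minus gettimeofday) offsets moves
only slowly towards the true value, so a small clock error stays
buried in the measurement noise for a long time (the filter constant
and the numbers are made up):

```python
def lowpass(samples, alpha=0.01):
    """Exponentially weighted moving average of offset samples; with
    alpha=0.01 each new measurement shifts the estimate by only 1% of
    the remaining difference."""
    estimate = samples[0]
    for s in samples[1:]:
        estimate += alpha * (s - estimate)
    return estimate

# A single 100-tick jitter outlier barely moves the estimate ...
jumpy = lowpass([0.0, 100.0])
# ... but even 100 identical measurements of a real offset of 100
# still leave the estimate well short of it.
slow = lowpass([0.0] + [100.0] * 100)
```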

But this is not really a problem: in contrast to net streaming
applications we don't need a PCR. The data rate with DVB is fixed. For
PAL we *know* that audio is coming with 48kHz and video is coming with
25 frames per second.

Because "libsoftmpeg" uses audio as the master sync source and video
is already synced to the audio on a short-term basis, the basic idea
is to use a "buffer fullness strategy" for the audio buffer. If we
can keep the audio buffer roughly half full, then we'll never have
problems with buffer under- or overflow.

The (currently unimplemented) idea is to monitor the fullness of the
audio buffer. If we're in normal operation and the fullness drops
below - say - 25% in x seconds, then we can calculate the rate at
which the sound card is consuming data too fast. To overcome this
problem, we can *slightly* adjust the pitch of the sound to come back
to a fill grade of 50%, perhaps in 5*x seconds. Of course this
adjustment should be only a few Hertz, otherwise watching an opera
can be annoying, too. ;-)
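
A sketch of that controller arithmetic; the 25%/50% fill grades and
the 5*x recovery window follow the text above, while the buffer size,
time span and function name are assumptions:

```python
def corrected_rate(nominal_hz, buffer_samples, fill_before, fill_now,
                   x_seconds, recover_factor=5):
    """New playback rate in Hz: compensate the measured surplus
    consumption, plus a little extra so the buffer returns to its old
    fill grade within recover_factor * x_seconds."""
    drained = (fill_before - fill_now) * buffer_samples  # samples lost
    surplus_hz = drained / x_seconds            # over-consumption rate
    refill_hz = drained / (recover_factor * x_seconds)
    return nominal_hz - surplus_hz - refill_hz

# A 500ms buffer at 48kHz (24000 samples) drops from 50% to 25% fill
# over 10 minutes: the card eats 10 samples/s too many, so we lower
# the pitch by 12Hz in total -- only a few Hertz, as required.
new_rate = corrected_rate(48000.0, 24000, 0.50, 0.25, 600.0)
```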

LinuxTV legacy CVS <linuxtv.org/cvs>