Hi Klaus,
I've tested your patch on x86_64 and ARM (I picked the weakest HW I have). I've gathered some statistics:
Notes: a "good" tuner here means one that returns bigger chunks on average and generally consumes fewer CPU cycles with vanilla (unpatched) VDR. CPU usage means the CPU usage of the TS buffer thread as reported by top -d2. Bit rates are average and coarse. I didn't include dumps for the unconditional delay; they are "smooth", like those 8 and 15 Mbit/s ARM figures. CPU usage on x86_64 is typically 1.5-2 times lower.
Some observations:
1. Small reads are really expensive and must be avoided.
2. "Good" tuners are not that "good" at higher bit rates.
3. A "non-delayed" read() (the last read() in the sequence poll()->read()->sleep()->poll()->read()->poll()->read()) is very likely to return a small chunk.
With bit rates higher than 30 Mbit/s I got device buffer overflows on ARM - the ring buffer processing couldn't catch up.
So, generally, it solves the issue, but is not as efficient as the unconditional delay. Compared to the unconditional patch:
1. CPU usage is higher and not as steady.
2. The read() sizes are not as smooth either.
Those CPU usage percentages (without the patch) may look small, but they amount to 20-35% of the whole VDR CPU usage, and they are several times lower with the patch, conditional or not.
I understand that an unconditional delay here looks a bit scary. How about increasing the threshold value to, say, 100000, or 500 * TS_SIZE, or even higher? In other words, treat a delayed read() as the normal operation and a non-delayed one as the emergency case. I can test it on x86_64 with 50-60 Mbit/s. What do you think?
Best, glenvt18.
2016-03-10 12:41 GMT+03:00 Klaus Schmidinger Klaus.Schmidinger@tvdr.de:
On 10.03.2016 02:54, glenvt18 wrote:
Hi folks,
I've found that with some DVB tuner drivers poll() returns when there is only a small number (1-3) of TS packets available to read(). This results in high CPU usage in the TS buffer thread, which gets busy reading small chunks of data in cTSBuffer::Action(). 2 out of 3 tuners I tested were affected by this issue. Even with a "good" tuner, the TS buffer thread's CPU usage is up to 10% per HD stream on an ARM Cortex-A7. With the proposed patch it is below 1-2% on all ARM and x86_64 platforms I've tested. The delay value of 10 ms can be considered safe:
media/dvb-core/dmxdev.h:109 #define DVR_BUFFER_SIZE (10*188*1024)
It would take a tuner receiving (10*188*1024)*8*(1000/10) / 1000000 = 1540 Mbit/s to overflow the device buffer within a 10 ms interval. A smaller delay is not enough for ARM. cDvbDevice has a ring buffer of 5 MB, which is even larger.
This patch was made against VDR 2.3.1, but it can be applied to VDR 2.2.0 as well.
Please review. glenvt18
Index: b/device.c
--- a/device.c	2015-09-18 01:04:12.000000000 +0300
+++ b/device.c	2016-03-10 03:38:50.078400715 +0300
@@ -1768,6 +1768,8 @@
                     break;
                     }
                  }
+              else
+                 cCondWait::SleepMs(10);
               }
            }
      }
I'm not too fond of the idea of introducing an additional, unconditional wait here. The actual problem should be fixed within the drivers. However, maybe waiting in case there is only a small number of TS packets available is acceptable:
--- device.c	2015/09/05 11:42:17	4.2
+++ device.c	2016/03/10 09:34:11
@@ -1768,6 +1768,8 @@
                     break;
                     }
                  }
+              else if (r < MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE)
+                 cCondWait::SleepMs(10);
               }
            }
      }
The number MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE is just a random pick to avoid using a concrete literal number here. Can you please test if this still solves your problem?
Klaus
vdr mailing list
vdr@linuxtv.org
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr