Hi,
Jon Burgess wrote:
I don't think it is worth a try, as it tests every byte, while the above code mostly tests only every third byte.
I agree that your algorithm is clever and does greatly cut down the number of comparisons as compared to the old code.
The glibc memchr() implementation does the comparisons 4 bytes at a time using a clever algorithm. It also has assembler-optimised variants for some CPUs. I don't think that comparing only every 3rd byte wins you anything over memchr().
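To make the comparison concrete, here is a minimal sketch of the two approaches. It assumes the pattern being searched for is the three-byte sequence 00 00 01, which this mail does not actually state. The function names find_prefix_skip() and find_prefix_memchr() are made up for illustration and are not taken from VDR or from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical skip-ahead scan: looks at buf[i] for i = 2, 5, 8, ... as
 * long as the inspected byte cannot be part of 00 00 01. */
static const unsigned char *find_prefix_skip(const unsigned char *buf, size_t len)
{
    size_t i = 2;
    while (i < len) {
        if (buf[i] > 1) {
            i += 3;                 /* byte fits no position of the pattern */
        } else if (buf[i] == 1) {
            if (buf[i - 1] == 0 && buf[i - 2] == 0)
                return buf + i - 2; /* found 00 00 01 */
            i += 3;                 /* a 01 can only be the last byte */
        } else {
            i += 1;                 /* a 00 may be the first or second byte */
        }
    }
    return NULL;
}

/* Hypothetical memchr() variant: let the C library find the next 01,
 * then check the two bytes in front of it. */
static const unsigned char *find_prefix_memchr(const unsigned char *buf, size_t len)
{
    size_t i = 2;
    while (i < len) {
        const unsigned char *p =
            (const unsigned char *)memchr(buf + i, 0x01, len - i);
        if (p == NULL)
            return NULL;
        if (p[-1] == 0 && p[-2] == 0)
            return p - 2;           /* found 00 00 01 */
        i = (size_t)(p - buf) + 3;  /* next match must end at least 3 later */
    }
    return NULL;
}
```

Both functions return a pointer to the first byte of the match, or NULL if there is none. Which of them is faster on a given machine is exactly the open question here, so the sketch makes no claim about that.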
I believe the bulk of the time taken by the routine is transferring all the data from memory into the CPU. Every byte of the data will have to be read into the CPU caches due to cacheline effects. I believe that the asm optimisations will take into account the possibilities of speculative readahead etc. I've not looked into the assembler to see whether it actually exploits this.
I've attached the quickly hacked up test program that I wrote. The output is the time taken for many iterations of the two different algorithms. For me the difference is within the measurement noise. It certainly isn't any slower. I'd be interested to know whether it makes any difference on your EPIA, both in the test program and in VDR.
You were right. Using memchr() reduces CPU load on my 600 MHz EPIA system by 1 % for channel ZDF and by 4 % for the HDTV channel HDFORUM. The numbers were taken by just running VDR in transfer mode for the mentioned channel (i.e. with no xine attached to VDR).
I also gave memmem() a try, but that change increased the CPU load.
Attached you'll find an updated patch according to your suggestion.
Bye.