Ralf Müller wrote:
Artur Skawina wrote:
well, vdr w/ the recent cUnbufferedFile changes was flushing the data buffers in huge bursts; this was even worse than slowly filling up the caches -- the large (IIRC ~10M) bursts caused latency problems (apps visibly freezing etc).
Does this freezing apply to local disk access or only to network filesystems? My personal VDR is a system dedicated to VDR use, with a local hard disk for storage. So I have no applications running in parallel to vdr that could freeze, nor can I actually test the behaviour on network devices. It seems you have both of these extra features, so it would be nice to know more about this.
the freezing certainly applies to NFS -- it shows clearly if you have some kind of monitor app graphing network traffic. It may just be the huge amount of data shifted and associated cpu load, but the delays are noticeable for non-rt apps running on the same machine. It's rather obvious when eg watching tv using xawtv while recording. As to the local disk case -- i'm not sure of the impact -- most of my vdr data goes over NFS, and this was what made me look at the code. There could be less of a problem w/ local disks, or I simply didn't realize the correlation w/ vdr activity as i, unlike network traffic, do not have a local IO graph on screen :)
(i _think_ i verified w/ vmstat that local disks were not immune to this, but right now i no longer remember the details, so can't really be sure)
For local usage I found that IO interruptions of less than a second (10 MB burst writes on disks which give a hell of a lot more than 10MB/sec) have no negative side effects. But I can imagine that on 10Mbit ethernet it could be hard to have these bursts ... I did not think about this when writing the initial patch ...
it's a problem even on 100mbit -- while the fileserver certainly can accept sustained 10M/s data for several seconds (at least), it's the client, ie vdr-box, that does not behave well -- it sits almost completely idle for minutes (zero network traffic, no writeback at all), and then goes busy for a second or so. I first tried various priority changes, but didn't see any visible improvement. Having vdr running at low prio isn't really an option anyway.
Another issue could be the fsync calls -- at least on ext3 these apparently behave very similarly to sync(2)...
This patch makes vdr use a much more aggressive disk access strategy. Writes are flushed out almost immediately and the IO is more evenly distributed. While recording and/or replaying the caches do not grow and when vdr is done accessing a video file all cached data from that file is dropped.
Actually with the patch you attached my cache _does_ grow. It does not only grow - it displaces the inode cache, and avoiding exactly this is why the initial patch was created. To make it worse - when I cut a recording and replay the newly cut recording at the same time, I get major hangs in replay.
oh, the cutting-trashes-cache-a-bit isn't really such a big surprise -- i was seeing something like that while testing the code -- I had hoped the extra fadvise every 10M would fix that, but i wanted to get the recording and replay cases right first. (the issue when cutting is simply that we need to: a) start the writeback, and b) drop the cached data after it has hit the disk. The problem is that we don't really know when to do b... For low write rates the heuristic seems to work, for high rates it might fail. Yes, fdatasync obviously will work, but this is the sledgehammer approach :) The fadvise(0,0) solution was a first try at using a slightly smaller hammer. Keeping a dirty-list and flushing it after some time would be the next step if fadvise isn't enough.)
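A minimal sketch of the a)/b) idea above, not the code from the patch. The names wb_state, wb_write and FLUSH_WINDOW are made up for illustration, and the sketch assumes the Linux 2.6 behaviour discussed in this thread, where POSIX_FADV_DONTNEED starts writeback of dirty pages and drops the ones that are already clean:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600            /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define FLUSH_WINDOW (1024 * 1024)   /* hypothetical: hand 1M at a time to fadvise */

struct wb_state {
    int   fd;
    off_t pos;       /* current append offset */
    off_t flushed;   /* start of the window not yet handed to fadvise */
    off_t prev;      /* start of the previous window (writeback started earlier) */
};

/* Write 'len' bytes; returns 0 on success, -1 on error. */
int wb_write(struct wb_state *s, const void *buf, size_t len)
{
    const char *p = buf;

    assert(s != NULL && buf != NULL);
    while (len > 0) {
        ssize_t n = write(s->fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        s->pos += n;
    }
    if (s->pos - s->flushed >= FLUSH_WINDOW) {
        /* a) start writeback for the newest window */
        posix_fadvise(s->fd, s->flushed, s->pos - s->flushed, POSIX_FADV_DONTNEED);
        /* b) retry the previous window, which has hopefully hit the disk by now */
        if (s->flushed > s->prev)
            posix_fadvise(s->fd, s->prev, s->flushed - s->prev, POSIX_FADV_DONTNEED);
        s->prev = s->flushed;
        s->flushed = s->pos;
    }
    return 0;
}
```

The second fadvise call is the guess at "when to do b": it simply assumes that one window later the older data is clean. At high write rates that assumption may not hold, which is the failure mode described above.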
How does the cache behave when _not_ cutting? Over here it looks ok, i've done several recordings while playing back others, and the cache was basically staying the same. (as this is not a dedicated vdr box it is however sometimes hard to be sure)
I had a look at your patch - it looked very good. But for whatever reason it doesn't do what it is supposed to do on my VDR. I currently don't know why it doesn't work here for replay - the code there looked good.
in v1 i was using a relatively small readahead window -- maybe for a slow disk it was _too_ small. In v2 it's a little bigger, maybe that will help (i increased it to make sure the readahead worked for fast-forward, but so far i haven't been able to see much difference). But I don't usually replay anything while cutting, so this hasn't really been tested...
(BTW, with the added readahead in the v2 patch, vdr seems to come close to saturating a 100M connection when cutting. Even when _both_ the source and destination are on the same NFSv3 mounted disk, which kind of surprised me. The LocalDisk->NFS rate, and vice versa, seems to be limited by the network. I didn't check localdisk->localdisk (lack of sufficient disk space). Didn't do any real benchmarking, these are estimates based on observing the free disk space decrease rate and network traffic)
I like the heuristics you used to deal with read ahead - but maybe these lead to the leaks I experience here. I will have a look at it. Maybe I can find out something about it ...
Please do. I wrote and posted this to get others to look at that code and hopefully come up w/ a strategy which works for everyone. For cutting I was going to switch to O_DIRECT, until i realized we then would still need a fallback strategy, for old kernels and NFS...
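For illustration only, the O_DIRECT-with-fallback idea could look roughly like the sketch below. open_for_cutting is a hypothetical helper, not something from vdr or the patch, and the sketch assumes that a filesystem without O_DIRECT support rejects the open with EINVAL, as open(2) documents for Linux:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  /* glibc only exposes O_DIRECT with this */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try O_DIRECT first, fall back to a buffered open.
 * *direct is set to 1 if the returned fd bypasses the page cache, else 0. */
int open_for_cutting(const char *path, int flags, mode_t mode, int *direct)
{
    assert(path != NULL && direct != NULL);
    *direct = 0;
#ifdef O_DIRECT
    {
        int fd = open(path, flags | O_DIRECT, mode);
        if (fd >= 0) {
            *direct = 1;
            return fd;
        }
        if (errno != EINVAL)
            return -1;               /* a real error, not "O_DIRECT unsupported" */
    }
#endif
    /* Fallback: ordinary buffered IO; the caller still needs a
     * fadvise-style strategy to keep the cache from filling up. */
    return open(path, flags, mode);
}
```

With *direct == 1 the caller has to use suitably aligned buffers, offsets and sizes for its reads and writes; with *direct == 0 it is back to the fadvise approach, which is why O_DIRECT alone does not remove the need for one.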
The current vdr behavior isn't really acceptable -- at the very least the fsyncs have to be configurable -- even a few hundred megabytes needlessly dirtied by vdr is still much better than the bursts of traffic, disk and cpu usage. I personally don't mind the cache trashing so much; it would be enough to keep vdr happily running in the background without disturbing other tasks. (one of the reasons is that while keeping the recording list in cache seems to help local disks, it doesn't really help for NFS -- you still get lots of NFS traffic every time vdr decides to reread the directory structure. As both the client and server could fit the dir tree in ram the limiting factor becomes the network latency)
I've tested this w/ both local disks and NFS mounted ones, and it seems to do the right thing. Writes get flushed every 1..2s at a rate of .5..1M/s instead of the >10M bursts.
To be honest - I did not find the place where writes get flushed in your patch. posix_fadvise() doesn't seem to influence flushing at all.
Hmm, what glibc/kernel? It works here w/ glibc-2.3.90 and linux-2.6.14.
Here's "vmstat 1" output; vdr (patched 1.3.36) is currently doing a recording to local disk:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   9168 202120    592  22540    0    0     0     0 3584  1350  0  1 99  0
 0  0   9168 202492    592  22052    0    0     0   800 3596  1330  1  0 99  0
 0  0   9168 202368    592  22356    0    0     0     0 3576  1342  1  0 99  0
 0  0   9168 202492    592  21836    0    0     0   804 3628  1350  0  0 100  0
 0  0   9168 202492    592  22144    0    0     0     0 3573  1346  1  1 98  0
 0  0   9168 202244    592  22452    0    0     0     0 3629  1345  1  0 99  0
 1  0   9168 202492    592  21956    0    0     0   800 3562  1350  0  0 100  0
 0  0   9168 202368    592  22260    0    0     0     0 3619  1353  1  0 99  0
 0  0   9168 202120    592  22568    0    0     0     0 3616  1357  1  1 98  0
 0  0   9168 202492    592  22044    0    0     0   952 3617  1336  0  0 100  0
 0  0   9168 202368    596  22352    0    0     0     0 3573  1356  1  0 99  0
 1  0   9168 202616    596  21724    0    0     0   660 3609  1345  0  0 100  0
 0  0   9168 202616    596  22000    0    0     0     0 3569  1338  1  1 98  0
 0  0   9168 202368    596  22304    0    0     0     0 3573  1335  1  0 99  0
 1  0   9168 202492    596  21956    0    0     0   896 3644  1360  0  1 99  0
 0  0   9168 202492    596  22232    0    0     0     0 3592  1327  1  0 99  0
 0  0   9168 202120    596  22536    0    0     0     0 3571  1333  0  0 100  0
 0  0   9168 202616    596  21968    0    0     0   800 3575  1329 11  3 86  0
 0  0   9168 202368    596  22244    0    0     0     0 3604  1350  1  0 99  0
 0  0   9168 202492    596  21756    0    0     0   820 3585  1326  0  1 99  0
 0  0   9168 202492    612  22060    0    0     8   140 3632  1369  1  1 89  9
 0  0   9168 202244    612  22336    0    0     0     0 3578  1328  1  0 99  0
 0  0   9168 202492    612  21796    0    0     0   784 3619  1360  0  0 100  0
 0  0   9168 202492    628  22072    0    0     8   104 3559  1317  2  0 96  2
 0  0   9168 202244    632  22376    0    0     0     0 3604  1348  1  0 99  0
 0  0   9168 202492    632  21904    0    0     0   800 3695  1402  0  0 100  0
 0  0   9168 202368    632  22180    0    0     0     0 3775  1456  1  1 98  0
 0  0   9168 202120    632  22484    0    0     0     0 3699  1416  0  1 99  0
 0  0   9168 202492    632  21992    0    0     0   804 3774  1465  1  0 99  0
 1  0   9168 202236    632  22268   32    0    32     0 3810  1570  3  1 93  3
 0  0   9168 202360    632  21776    0    0     0   820 3896  1690  1  1 98  0
the 'bo' column shows the writeout caused by vdr. Also note the 'free' and 'cache' fields fluctuate a bit, but do not grow. Hmm, now i noticed the slowly growing 'buff' -- is this causing you problems? I didn't mind this here, as there's clearly plenty of free RAM around. Will have to investigate what happens under some memory pressure.
Are you saying you don't get any writeback activity w/ my patch?
With no posix_fadvise and no fdatasync calls in the write path i get almost no writeout, only multi-megabyte bursts every minute (probably triggered by the ext3 journal commit (interval set to 60s) and/or memory pressure).
It only applies to already written buffers. So the normal write
/usr/src/linux/mm/fadvise.c should contain the implementation of the various fadvise modes in a linux 2.6 kernel. It certainly does trigger writeback here. Both in the local disk case, and on NFS, where it causes a similar traffic pattern.
strategy is used with your patch - collect data until the kernel decides to write it to disk. This leads to "collect about 300MB" here, followed by a burst of up to 300MB. That is a bit heavier than the 10MB bursts before ;)
See vmstat output above. Are you sure you have a working posix_fadvise? If not, that would also explain the hang during playback as no readahead was actually taking place... (to be honest, i don't think that you need any manual readahead at all in a normal-playback situation; especially as the kernel will by default do some. It's only when the disk is getting busier that the benefits of readahead show up. At least this is what i saw here) What happens when you start a replay and then end it? is the memory freed immediately?
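A quick way to check the points raised here is sketched below. It is only an illustration: fadvise_works, read_with_readahead and drop_file_cache are hypothetical names, not functions from vdr or the patch. Note that posix_fadvise() returns an error number directly instead of setting errno, and that a zero return only shows the call was accepted, not that the kernel acted on the hint:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600            /* expose posix_fadvise() and pread() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if posix_fadvise() is accepted for this fd, 0 otherwise. */
int fadvise_works(int fd)
{
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_NORMAL);

    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
    return err == 0;
}

/* Hypothetical replay helper: read 'len' bytes at 'pos', then hint that
 * the next 'window' bytes will be needed soon. */
ssize_t read_with_readahead(int fd, void *buf, size_t len, off_t pos, off_t window)
{
    ssize_t n;

    assert(buf != NULL && window > 0);   /* a length of 0 would mean "to end of file" */
    n = pread(fd, buf, len, pos);
    if (n > 0)
        posix_fadvise(fd, pos + n, window, POSIX_FADV_WILLNEED);
    return n;
}

/* When replay of a file ends: ask the kernel to drop its cached pages. */
int drop_file_cache(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

If fadvise_works() fails on the vdr box, neither the writeback nor the readahead hints can have any effect there; if it succeeds, comparing the 'cache' column of "vmstat 1" before and after drop_file_cache() shows whether the memory is really freed when a replay ends.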
Thanks for testing and the feedback.
Regards,
artur