Ralf Müller wrote:
On Monday 21 November 2005 02:15, Artur Skawina wrote:
client, ie vdr-box, that does not behave well -- it sits almost completely idle for minutes (zero network traffic, no writeback at all), and then goes busy for a second or so.
But this very much sounds like a NFS-problem - and much less like a VDR problem ...
this is perfectly normal behavior; it's the same as for the local disk case. The problem is that since the vdr box isn't under any memory pressure it collects all the writes. If not for the fdatasyncs it would start writing the data asynchronously after some time, when it would need some RAM or had too many dirty pages. The problem is that vdr does not let it do that -- after 10M it asks the system to commit all the data to disk and return status. So the box does just that -- flushes the data as fast as possible in order to complete the synchronous request. This is where fadvise(DONTNEED) helps -- it tells the system that we're not going to access the written data any time soon, so it starts committing that buffered data back to disk immediately. Just as it would if it was under memory pressure, except now there is none; and once the data gets to disk it no longer needs to be treated as dirty and can be easily freed.
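To make the difference concrete, here is a minimal sketch of a write path that hints the kernel instead of forcing a synchronous flush. It is not the code from the patch; the helper name write_and_hint is hypothetical, and it assumes the Linux behaviour described above, where POSIX_FADV_DONTNEED on dirty pages starts asynchronous writeback.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write a buffer, then tell the kernel we will not
 * need the file's cached pages again. Unlike fdatasync(), the hint does
 * not wait for the data to reach the disk. */
ssize_t write_and_hint(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n > 0) {
        /* offset 0 with len 0 means "from the start to the end of the file";
         * the hint is advisory, so its result is deliberately ignored */
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }
    return n;
}
```

Calling the hint after every single write would be wasteful in practice; a real recorder would more likely issue it once per megabyte or so.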
[...] I had hoped the extra fadvise every 10M would fix that, but i wanted to get the recording and replay cases right first. (the issue when cutting is simply that we need to: a) start the writeback, and b) drop the cached data after it has hit the disk. The problem is that we don't really know when to do b...
That's exactly the problem here ... without special force my kernel seems to prefer to use memory instead of disk ...
if you have told it to do exactly that, using that reiserfs setting mentioned below, well, i guess it tries to do its best to obey :)
For low write rates the heuristic seems to work, for high rates it might fail. Yes, fdatasync obviously will work, but this is the sledgehammer approach :)
I know. I also don't like this approach. But at least it worked (here).
The fadvise(0,0) solution was a first try at using a slightly smaller hammer. Keeping a dirty-list and flushing it after some time would be the next step if fadvise isn't enough.)
How do you know what is still dirty in case of writes?
The strategy currently is this: after writing some data to the file (~1M) we use fadvise to make the kernel start writing it to disk; after some time we call fadvise on the same data _again_, now hopefully it has already hit the disk, is clean and will be dropped. (I actually call fadvise three, not two, times just to be sure). This seems to work fine for slow sequential writes, such as when recording; for cutting we create the dirty data faster than it can be written back to disk - this is where the global fadvise(DONTNEED) was supposed to help, and in the few cutting tests i did seemed to be enough.
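The repeated-hint strategy described above could be sketched roughly as follows. This is an illustration, not the actual patch: struct wb_state, wb_write, WB_CHUNK and WB_PASSES are hypothetical names, and the 1 MiB chunk size and three passes are taken from the description. Each completed chunk is hinted when it is finished and again while the next two chunks complete, so by the later passes it has hopefully been written back and can be dropped.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define WB_CHUNK  ((off_t)1024 * 1024)  /* hint granularity, ~1M as in the text */
#define WB_PASSES 3                     /* each chunk is hinted this many times */

struct wb_state {
    int   fd;
    off_t written;  /* total bytes written so far */
    off_t hinted;   /* bytes that have had a first hint; multiple of WB_CHUNK */
};

/* Hypothetical write wrapper, assuming sequential writes through this
 * function only: whenever another full chunk has been written, hint that
 * chunk (starts writeback) and re-hint the two chunks before it (they
 * should be clean by now and get dropped from the cache). */
ssize_t wb_write(struct wb_state *s, const void *buf, size_t len)
{
    ssize_t n = write(s->fd, buf, len);
    if (n <= 0)
        return n;
    s->written += n;

    while (s->written - s->hinted >= WB_CHUNK) {
        for (int pass = 0; pass < WB_PASSES; pass++) {
            off_t start = s->hinted - (off_t)pass * WB_CHUNK;
            if (start < 0)
                break;  /* no earlier chunk exists yet */
            (void)posix_fadvise(s->fd, start, WB_CHUNK, POSIX_FADV_DONTNEED);
        }
        s->hinted += WB_CHUNK;
    }
    return n;
}
```

Nothing here checks whether a chunk really is clean before the later passes. That is the weak point when cutting, where dirty data is produced faster than the disk can absorb it.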
How does the cache behave when _not_ cutting? Over here it looks ok, i've done several recordings while playing back others, and the cache was basically staying the same. (as this is not a dedicated vdr box it is however sometimes hard to be sure)
With the active read ahead I even have leaks when only reading - the initiated non-blocking reads of the WILLNEED seem to keep pages in the buffer cache.
maybe another reiserfs issue? does it occur when sequentially reading, ie on normal playback? Or only when also seeking around in the file? In the latter case i was seeing some small leaks too, that was the reason for the fadvise calls every X jumps.
My initial intention when trying to use an active read ahead has been to have no hangs even when another disk needs to spin up. On my system I sometimes have this problem and it is annoying. So a read ahead of several megabytes would be needed here - but even without such a huge read ahead I get these annoying leaks here. For normal operation
hmm, the readahead is only per-file -- do you have filesystems spanning several disks, _some_ of which are spun down?
(replay) they could be avoided by increasing the region which has to be cleared to at least the size of the read ahead.
Isn't this exactly what is currently happening (both w/o and with my patch)?
The current vdr behavior isn't really acceptable -- at the very least the fsyncs have to be configurable -- even a few hundred megabytes needlessly dirtied by vdr is still much better than the bursts of traffic, disk and cpu usage. I personally don't mind the cache trashing so much; it would be enough to keep vdr happily running in the background without disturbing other tasks.
Depends on the use case. You are absolutely right in the NFS case. In the "dedicated to VDR standalone" case this is different. By throwing
A config option "Write strategy: NORMAL|STREAMING|BURST" would be enough for everyone :) (where STREAMING is what my patch does, at least here, BURST is with the fdatasyncs followed by fadvise(DONTNEED), and normal is w/o both)
away the inode cache it makes usage of big recording archives uncomfortable - it takes up to 20 seconds to scan my local recordings directory. That's a long time when you just want to select a recording ...
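As a rough idea of what such an option might look like in code (purely a sketch: the enum, the function names and the option strings are hypothetical, and STREAMING is simplified here to a single whole-file hint rather than the chunked scheme from the patch):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

enum write_strategy { WS_NORMAL, WS_STREAMING, WS_BURST };

/* Parse the (hypothetical) setup value; returns 0 on success, -1 if unknown. */
int parse_write_strategy(const char *s, enum write_strategy *out)
{
    if (strcmp(s, "NORMAL") == 0)    { *out = WS_NORMAL;    return 0; }
    if (strcmp(s, "STREAMING") == 0) { *out = WS_STREAMING; return 0; }
    if (strcmp(s, "BURST") == 0)     { *out = WS_BURST;     return 0; }
    return -1;
}

/* Called every so often from the write path.
 * Returns 0 on success or an errno-style error number. */
int apply_write_strategy(int fd, enum write_strategy ws)
{
    switch (ws) {
    case WS_NORMAL:
        return 0;                         /* leave everything to the kernel */
    case WS_STREAMING:
        /* start async writeback and drop pages that are already clean */
        return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    case WS_BURST:
        if (fdatasync(fd) != 0)           /* synchronous flush first ... */
            return errno;
        /* ... then drop the now-clean pages */
        return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    }
    return EINVAL;
}
```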
It seemed much longer than 20s here :) Now that vdr caches the list, it's not a big problem anymore.
Are you saying you don't get any writeback activity w/ my patch?
Correct. It starts writing back when memory is filled. Not a single second earlier.
With no posix_fadvise and no fdatasync calls in the write path i get almost no writeout with multi-megabyte bursts every minute (triggered probably by ext3 journal commit (interval set to 60s) and/or memory pressure).
Using reiserfs here. I remember having configured it for lazy disk operations ... maybe this is the source for the above results. The idea has been to collect system writes - to not spin up the disks if not absolutely necessary. But this obviously also results in collecting VDR writes ... anyway I think this is a valid case too. At least for dedicated "multimedia" stations ... A bit more control about VDR IO would be a great thing to have.
reiserfs collecting all writes would explain the behavior; whether it's a good thing or not in this scenario i'm not sure. Apparently this does not give you any way to force disk writes, other than a synchronous flush (ie fdatasync)?...
i don't think that you need any manual readahead at all in a normal-playback situation; especially as the kernel will by default do some. It's only when the disk is getting busier that the benefits of readahead show up. At least this is what i saw here)
Remember - you switched off read ahead: POSIX_FADV_RANDOM ;)
Just before posting v2 :) Most tests were w/ POSIX_FADV_SEQUENTIAL, but as we do the readahead manually i decided to see if the kernel wasn't interfering too much. So far haven't seen much difference. What did not work was having a large unconditional readahead -- this fails spectacularly w/ fast-rewind.
Anyway - it seems the small read ahead in your patch didn't have the slightest chance against the multi megabyte write back triggered when the buffer cache was at its limit.
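For reference, the combination discussed here (kernel readahead switched off, readahead requested manually) might look something like the sketch below. The function names are hypothetical and the readahead window is left as a parameter, since a fixed large window is exactly what breaks fast-rewind.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: open a recording for replay and switch off the kernel's
 * own readahead heuristics, since readahead is requested manually. */
int open_for_replay(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd >= 0)
        (void)posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
    return fd;
}

/* Hypothetical: read at 'pos', then ask the kernel to start fetching the
 * next 'ahead' bytes in the background (non-blocking). Pass ahead == 0
 * while rewinding or jumping so nothing useless is pulled in. */
ssize_t read_with_readahead(int fd, void *buf, size_t len, off_t pos, off_t ahead)
{
    ssize_t n = pread(fd, buf, len, pos);
    if (n > 0 && ahead > 0)
        (void)posix_fadvise(fd, pos + n, ahead, POSIX_FADV_WILLNEED);
    return n;
}
```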
well, yes, the readahead is adjusted to the write rate :)
However, one thing that could make a large difference is hardware. I have two local ATA disks in the vdr machine, both seagates, one older 80G and a newer 40G (came w/ the machine, i was too lazy to pull it out so it stayed there). Both are alone on an IDE channel, both have 2M cache, both are AFAICT identically configured, both have ext3 fs. However the 40G disk is significantly slower, and the difference is huge -- you can easily tell when vdr starts using that disk, because the increase in latency for unrelated read requests is so large. OTOH the 80G disk seems not only way faster, but also much more fair to random read requests while writes are going on. Weird.
Regards,
artur