Thanks Alex. I think I've decided to go RAID 1+0 rather than RAID 5 as I'm worried about the write speed.
I often record 3 or 4 channels at once and do see some "slow down" on OSD responsiveness during this.
What's your experience with RAID5?
----- Original Message -----
From: Alex Betis
To: VDR Mailing List
Sent: Friday, November 13, 2009 1:03 AM
Subject: Re: [vdr] mdadm software raid5 arrays?
Simon,
Note that /boot can only be installed on a single disk or on RAID-1, where every disk can actually work as a standalone disk.
I personally decided to use 3 disks: RAID-1 across 3 small partitions for /boot and RAID-5 across the rest. RAID-5 also allows easier expansion in the future.
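A minimal sketch of that layout with mdadm - the device and partition names are only placeholders, and --metadata=0.90 keeps the superblock at the end of each member so legacy grub can read /boot from any single disk:

  # RAID-1 across the three small partitions, so any one disk can still boot
  mdadm --create /dev/md0 --level=1 --metadata=0.90 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
  # RAID-5 across the three large partitions for everything else
  mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc2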
On Tue, Nov 10, 2009 at 8:48 PM, Simon Baxter linuxtv@nzbaxters.com wrote:
Thanks - very useful!
So what I'll probably do is as follows (see the command sketch below)...
* My system has 4x SATA ports on the motherboard, to which I'll connect my 4x 1.5TB drives.
* Currently 1 drive is in use, with ~30G for /, /boot and swap, and ~1.4TB for /media.
* I'll create /dev/md2, using mdadm, as RAID1 across 2 ~1.4TB partitions on 2 drives.
* Move all active recordings (~400G) to /dev/md2.
* Split /dev/md2 and create a RAID 1+0 (/dev/md1) using 4x partitions of ~1.4TB across 4 drives.
At this point I have preserved all my data, and created a raid1+0 for recordings and media.
I should now use the remaining ~100G on each drive for raid protection for (root) / and /boot. I've read lots on the web on this, but what's your recommendation? RAID1 mirror across 2 of the disks for / (/dev/md0) and install grub (/boot) on both so either will boot?
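A rough sketch of that migration with mdadm - the partition names (sd[a-d]2) are only placeholders and this assumes your mdadm will create a degraded RAID10, so treat it as an illustration and back everything up first:

  # temporary RAID1 on two of the big partitions to hold the data
  mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdc2 /dev/sdd2
  mkfs.ext3 /dev/md2        # then copy the ~400G of recordings onto it
  # build the RAID1+0 degraded: one member of each mirror pair is marked "missing"
  mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=4 /dev/sda2 missing /dev/sdb2 missing
  # make a filesystem on /dev/md1, copy the data across, stop the old array,
  # then hand its partitions over to the new array so it can rebuild
  mdadm --stop /dev/md2
  mdadm --add /dev/md1 /dev/sdc2 /dev/sdd2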
On Tue, Nov 10, 2009 at 09:46:52PM +1300, Simon Baxter wrote:
What about a simple raid 1 mirror set?
Ok.. short comparison, using a single disk as baseline.
using 2 disks:
raid0 (striping): ++ double read throughput ++ double write throughput -- half the reliability (read: only use with good backup!)
raid1 (mirroring): ++ double read throughput o same write throughput ++ double the reliability
using 3 disks:
raid0 (striping): +++ triple read performance +++ triple write performance --- a third of the reliability
raid1 (mirroring): +++ triple read performance o same write throughput +++ triple reliability
raid5 (distributed parity): +++ triple read performance - lower write performance (not due to the second write but due to the necessary reads - see the worked accounting after this comparison) + sustains failure of any one drive in the set
using 4 disks:
raid1+0: ++++ four times the read performance ++ double write performance ++ double reliability
please note: these are approximations and depending on your hardware they may be off by quite a bit.
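To make the raid5 write penalty above concrete, here is the classic small-write accounting (an illustration for a write smaller than a full stripe):

  read old data chunk  + read old parity chunk    (2 reads)
  new parity = old parity XOR old data XOR new data
  write new data chunk + write new parity chunk   (2 writes)

So a single logical write costs four disk operations, while a full-stripe write needs no reads at all - which is why the chunk/stripe alignment discussed later in the thread matters.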
cheers -henrik
In general, good experience. I don't record much, so I don't worry about speed. There are many web pages about raid5 speed optimizations. The slowdown in raid5 writes mostly happens when only a part of a stripe (chunk of data) has to be written, so the driver has to read the stripe and write it back. The optimizations talk about aligning the file system block size with the raid stripe size.
Since we're talking about movie recordings (huge files), big file system blocks will not create much waste. A smaller stripe size will probably reduce the read performance a bit, but will increase write speed, since there will be fewer cases where only part of a stripe has to be updated.
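A minimal example of that alignment for ext4, with purely illustrative numbers (64K md chunk, 4K filesystem block, 3-disk raid5, so 2 data disks):

  stride = chunk size / block size = 64K / 4K = 16
  stripe-width = stride * data disks = 16 * 2 = 32
  mkfs.ext4 -b 4096 -E stride=16,stripe-width=32 /dev/md1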
In one sentence: you won't know if it's slow until you try :) RAID 10 will obviously give better write speed, but I'm not yet convinced that raid5 can't handle 4 recordings at the same time.
If we're talking about HD recordings, it's about 3 Gigs/hour, meaning less than a MByte per second. I don't think there should be a problem writing 3-4 MByte/sec even without any raid.
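As a rough check of that figure: 3 GB/hour is about 3072 MB / 3600 s, roughly 0.85 MB/s per recording, so four parallel HD recordings come to something like 3.5 MB/s.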
By the way, I had a very bad experience with LVM on top of raid in the latest distros, so if you want to save some hair on your head, don't try it :)
On Fri, Nov 13, 2009 at 1:06 AM, Simon Baxter linuxtv@nzbaxters.com wrote:
I too have only good experiences with md raid5 and have used it for years on my vdr server. It's not uncommon that I record 7-8 programs simultaneously, including HD channels, and I have never had any problems with that. On the other hand, this is just a VDR/NFS server serving two diskless VDR/XBMC frontends, so maybe it would affect OSD performance if I had it all on the same machine.
/Magnus H
Alex Betis wrote:
I don't record much, so I don't worry about speed.
While there's no denying that RAID5 *at best* has a write speed equivalent to about 1.3x a single disk, and can be a lot slower if you're not careful with stride/block settings, that's no worse for our purposes than, erm, having a single disk in the first place. And reading is *always* faster...
Example: I'm not bothered about write speed (only having 3 tuners), so I didn't get too carried away setting up my 3-active-disk 3TB RAID5 array, accepting all the default values.
Rough speed test:
# dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 13.6778 s, 78.5 MB/s
# dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s
Don't know about anyone else's setup, but if I were to record all streams from all tuners, there would still be I/O bandwidth left. The highest DVB-T channel bandwidth possible appears to be 31.668 Mb/s, so for my 3 tuners that equates to about 95 Mb/s - which is less than 12 MB/s. The 78 MB/s of my RAID5 doesn't seem to be much of an issue then.
Steve
Hi Alex,
On Tue, Nov 17, 2009 at 03:34:59PM +0000, Steve wrote:
Thanks for putting some numbers out there. My estimate was more theory driven. :-)
Depending on the amount of RAM, the cache can screw up your results quite badly. For something a little more realistic try:
sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync
The first sync writes out the fs cache so that you start with a clean cache, and the "conv=fsync" makes sure that "dd" doesn't finish until it has written its data back to disk.
After the write you need to make sure that your read cache is not still full of the data you just wrote. 650 MB/s would mean 223 MB/s per disk, and that sounds a bit too high.
Try to read something different (and big) from that disk before running the second test.
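Alternatively, assuming a reasonably recent kernel, you can simply drop the page cache before the read test (run as root; the dd line just repeats the earlier read test):

  sync; echo 3 > /proc/sys/vm/drop_caches
  dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024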
Well, I guess DVB-S2 has higher bandwidth (numbers, anybody?). But more importantly: the rough speed tests that you used were run under zero I/O load, and I/O load can have some nasty effects, e.g. if your heads have to jump back and forth between an area you are reading from and an area you are recording to. In the case of one read stream and several write streams you could in theory adjust the filesystem's allocation strategy so that free areas near your read region are used for writing (though I doubt that anybody ever implemented this strategy in a mainstream fs), but when you are reading several streams, even caching, smart I/O schedulers, and NCQ cannot completely mask the fact that in raid5 you basically have one set of read/write heads.
In a raid1 setup you have two sets of heads that you can work with. (Or more if you are willing to put in more disks.)
Basically raid5 and raid1+0 scale differently if you add more disks.
If you put more disks into a raid5 you gain:
* more capacity (each additional disk counts fully)
* more linear read performance
If you put more disks into a raid1+0 it depends on where you put the additional disks to work (a sketch of both shapes follows below). If you grow the _number of mirrors_ you get:
* more read performance (linear and random)
* more redundancy
If you grow the _number of stripes_ you get:
* more read and write performance (linear and random)
* more capacity (but only half of the additional capacity for 2-disk mirror sets)
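For illustration, the two shapes expressed with mdadm (device names are placeholders; md's raid10 with --layout=n2 keeps two copies of every block, --layout=n3 keeps three):

  # 4 disks: 2-way mirrors striped over 2 pairs (the classic 1+0)
  mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=4 /dev/sd[abcd]2
  # 6 disks as 3-way mirrors: more redundancy and read speed, same capacity
  mdadm --create /dev/md1 --level=10 --layout=n3 --raid-devices=6 /dev/sd[abcdef]2
  # 6 disks as 2-way mirrors: more capacity and more write speed
  mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=6 /dev/sd[abcdef]2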
cheers -henrik
H. Langos wrote:
Depending on the amount of RAM, the cache can screw up your results quite badly. For something a little more realistic try:
Good point!
sync; dd if=/dev/zero of=foo bs=1M count=1024 conv=fsync
Interestingly, not much difference:
# sync; dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync
1073741824 bytes (1.1 GB) copied, 14.6112 s, 73.5 MB/s
Steve
On 18.11.2009 18:28, H. Langos wrote:
I remember reading some tests of file system write strategies that showed major differences between file systems when writing several file streams in parallel. IIRC the old EXT2/3 was way down at the lower end, while XFS scored near the upper end.
One major point here is to avoid heavy seeking, by massive use of write caching and read-ahead caching. Another one is a smart allocation strategy, so that the files don't get interleaved too much and metadata doesn't have to be read/written too often. (-> extents)
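For what it's worth, XFS can also be told about the raid geometry at mkfs time; a minimal example with illustrative numbers (64K md chunk, 2 data disks):

  mkfs.xfs -d su=64k,sw=2 /dev/md1    # su = stripe unit (md chunk size), sw = data disks per stripe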
In a raid1 setup you have two sets of heads that you can work with. (Or more if you are willing to put in more disks.)
In theory yes, but I would really like to know whether raid systems are actually smart enough to split their read operations between the heads in an efficient way. For example, while reading two data streams it's probably best to use one head for each stream - unless one of the streams needs a higher bandwidth, in which case it would be wiser to use one head exclusively and let the other jump between the streams. And what if there are several small reads in parallel? Which head should be interrupted?
In the end you can probably put a lot of fine-tuning into such a strategy, and there will still be situations where a different strategy would have improved performance - and others where it wouldn't.
Cheers,
Udo
On Tue, Nov 17, 2009 at 03:34:59PM +0000, Steve wrote:
You should use oflag=direct to make it actually write the file to disk.
# dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024
1073741824 bytes (1.1 GB) copied, 1.65427 s, 649 MB/s
And now most probably the file will come from linux kernel cache. Use iflag=direct to read it actually from the disk.
-- Pasi
Pasi Kärkkäinen wrote:
However, in the real world data _is_ going to be cached via the kernel cache, at least (we hope) a stride's worth minimum. We're talking about recording video, aren't we, and that's surely almost always written sequentially, not with random seeks everywhere?
For completeness, the results are:
# dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 oflag=direct
1073741824 bytes (1.1 GB) copied, 25.2477 s, 42.5 MB/s
# dd if=/srv/test/delete.me of=/dev/null bs=1M count=1024 iflag=direct
1073741824 bytes (1.1 GB) copied, 4.92771 s, 218 MB/s
So, still no issue with recording entire transponders; that would use about 1/4 of the available raw bandwidth, even with no buffering.
Interesting stuff, this :)
Steve
On Thu, Nov 19, 2009 at 01:37:46PM +0000, Steve wrote:
True. Video is going to be written and read sequentially. However, the effect of a cache is always a short-term gain. E.g. write caches mask a slow disk by signaling "ready" to the application while in reality the kernel is still holding the data in RAM. If you continue to write at a speed faster than the disk can handle, the cache will fill up, and at some point your application's write requests will be slowed down to what the disk can handle.
If, however, your application writes to the same block again before the cache has been written to disk, then your cache truly has gained you performance even in the long run, by avoiding a write of data that has already been replaced.
Same thing with read caches. They only help if you are reading the same data again.
The effect that you _will_ see is that of reading ahead. That helps if your application reads one block and then another, and the kernel has already looked ahead and fetched more blocks from the disk than originally requested.
This also has the effect of avoiding too many seeks if you are reading from more than one place on the disk at once... but again, the effect on read throughput fades away as you read large amounts of data only once.
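As a small aside (the device name and value here are only examples): the read-ahead on the md device can be inspected and raised with blockdev; the unit is 512-byte sectors:

  blockdev --getra /dev/md1          # show current read-ahead
  blockdev --setra 4096 /dev/md1     # set it to 4096 sectors (2 MB)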
What it boils down to is this:
Caches improve latency, not throughput.
What read-ahead and write caches will do in this scenario is help you mask the effects of seeks on your disk, by reading ahead and by aggregating write requests and sorting them in a way that reduces seek times. In this regard, writing multiple streams is easier than reading. When writing stuff, you can let your kernel decide to keep some of the data in RAM for 10 or 15 seconds before committing it to disk. However, if you are _reading_, you will be pretty miffed if your video stalls for 15 seconds because the kernel found something more interesting to read first :-)
Interesting. The difference between this and the "conv=fsync" test is that in the latter the kernel gets to sort all of the write requests more or less as it wants to. So I guess for recording video the 73 MB/s will be your bandwidth, while this test here shows the performance that a data-integrity-focused application, e.g. a database, will get from your RAID.
Well, whether that 1/4 of the bandwidth is used by one client or shared by multiple clients can make all the difference.
How about running some tests with "cstream"? I only did a quick apt-cache search, but it seems like cstream could be used to simulate clients with various bandwidth needs and to measure the bandwidth that is left.
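A minimal sketch of that idea - the paths and rates are made up, and -i/-o/-t are input, output and a throughput limit in bytes per second:

  # simulate one "recording" at roughly 1 MB/s in the background
  cstream -i /dev/zero -o /srv/test/fake-recording.ts -t 1000000 &
  # meanwhile, see how much sequential write bandwidth is left
  dd if=/dev/zero of=/srv/test/delete.me bs=1M count=1024 conv=fsync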
Interesting stuff, this :)
Very interesting indeed. Thanks for enriching this discussion with real data!
cheers -henrik