Speeding up RAID resync

http://www.cyberciti.biz/tips/linux-raid-increase-resync-rebuild-speed.html

Current RAID settings:
--------------------------------------------------------------------S
root# sysctl -a | grep raid
dev.raid.speed_limit_max = 200000
dev.raid.speed_limit_min = 1000
--------------------------------------------------------------------E

/proc/sys/dev/raid/speed_limit_max is the "goal" rebuild speed for times
when no non-rebuild activity is happening on an array.  My default is
200000.

/proc/sys/dev/raid/speed_limit_min is the "goal" rebuild speed for times
when non-rebuild activity IS happening on an array.  The speed is in
Kbytes per second, and is a per-device rate, not a per-array rate.  My
default is 1000 -- set it to 50000:
--------------------------------------------------------------------S
root# sysctl -w dev.raid.speed_limit_min=50000
--------------------------------------------------------------------E

To make the override permanent, add this to /etc/sysctl.conf:
--------------------------------------------------------------------S
dev.raid.speed_limit_min = 50000
--------------------------------------------------------------------E

Someone with a similar drive setup (5x 2TB in a RAID5) tweaks the drives
and volume like so:
--------------------------------------------------------------------S
#!/bin/bash

# Note: this names seven devices, though only sda-sde are tuned below.
blockdev --setra 16384 /dev/sd[abcdefg]

echo 1024 > /sys/block/sda/queue/read_ahead_kb
echo 1024 > /sys/block/sdb/queue/read_ahead_kb
echo 1024 > /sys/block/sdc/queue/read_ahead_kb
echo 1024 > /sys/block/sdd/queue/read_ahead_kb
echo 1024 > /sys/block/sde/queue/read_ahead_kb

echo 256 > /sys/block/sda/queue/nr_requests
echo 256 > /sys/block/sdb/queue/nr_requests
echo 256 > /sys/block/sdc/queue/nr_requests
echo 256 > /sys/block/sdd/queue/nr_requests
echo 256 > /sys/block/sde/queue/nr_requests

# Set read-ahead.
echo "Setting read-ahead to 64 MiB for /dev/md0"
blockdev --setra 65536 /dev/md0

# Set stripe_cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md0"
echo 16384 > /sys/block/md0/md/stripe_cache_size
# stripe_cache_active is a read-only counter on most kernels, so this
# write probably fails:
echo 8192 > /sys/block/md0/md/stripe_cache_active

# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
echo 1 > /sys/block/sda/device/queue_depth
echo 1 > /sys/block/sdb/device/queue_depth
echo 1 > /sys/block/sdc/device/queue_depth
echo 1 > /sys/block/sdd/device/queue_depth
echo 1 > /sys/block/sde/device/queue_depth

exit 0
--------------------------------------------------------------------E

Ran this version for now:
--------------------------------------------------------------------S
#!/bin/sh
# raise some limits on RAID drives
export PATH=/bin:/sbin:/usr/sbin:/usr/bin

# contradicts read_ahead_kb below: --setra counts 512-byte sectors, so
# this asks for 8192 KB, and the 1024 KB written in the loop wins.
blockdev --setra 16384 /dev/sd[cdefghi]

for dsk in sdc sdd sde sdf sdg sdh sdi
do
    echo 256  > /sys/block/$dsk/queue/nr_requests
    echo 1024 > /sys/block/$dsk/queue/read_ahead_kb
done
exit 0
--------------------------------------------------------------------E
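To see which read-ahead value actually stuck (and check the other queue
settings), read them back after the script runs.  A minimal sketch:
--------------------------------------------------------------------S
#!/bin/sh
# Sketch: report the queue settings after the tuning script has run.
# blockdev --getra reports 512-byte sectors; read_ahead_kb is Kbytes.
for dsk in sdc sdd sde sdf sdg sdh sdi
do
    echo "$dsk: getra=$(blockdev --getra /dev/$dsk)" \
         "read_ahead_kb=$(cat /sys/block/$dsk/queue/read_ahead_kb)" \
         "nr_requests=$(cat /sys/block/$dsk/queue/nr_requests)"
done
exit 0
--------------------------------------------------------------------E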
Set up partitions
--------------------------------------------------------------------S
root# cat bigfs
#!/bin/ksh -x
# make filesystems with fewer inodes, larger files.
export PATH=/sbin:/bin:/usr/bin
date; mkfs.ext3 /dev/sdj1
date; mkswap -L SWAP-sdj2 /dev/sdj2
date; mkfs.ext3 -J size=400 -i 65536 -m 2 /dev/sdj3; date
exit 0

root# ./bigfs
+ PATH=/sbin:/bin:/usr/bin
+ export PATH
+ date
Wed Apr 13 23:57:13 EDT 2011
+ mkfs.ext3 /dev/sdj1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
2375680 inodes, 4751215 blocks
237560 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
145 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 32 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ date
Wed Apr 13 23:59:32 EDT 2011
+ mkswap -L SWAP-sdj2 /dev/sdj2
Setting up swapspace version 1, size = 1028153 kB
LABEL=SWAP-sdj2, no uuid
+ date
Wed Apr 13 23:59:32 EDT 2011
+ mkfs.ext3 -J size=400 -i 65536 -m 2 /dev/sdj3
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
22577152 inodes, 361205460 blocks
7224109 blocks (2.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
11024 block groups
32768 blocks per group, 32768 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848

Writing inode tables: done
Creating journal (102400 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
+ date
Thu Apr 14 00:10:27 EDT 2011
+ exit 0
--------------------------------------------------------------------E

Duplicate partition layout over all RAID drives (the sfdisk dump was
also saved to a file named sdj, which the sed loop reads):
--------------------------------------------------------------------S
root# sfdisk -d /dev/sdj
# partition table of /dev/sdj
unit: sectors

/dev/sdj1 : start=       63, size=  38009727, Id=83, bootable
/dev/sdj2 : start= 38009790, size=   2008125, Id=82
/dev/sdj3 : start= 40017915, size=2889643680, Id=83
/dev/sdj4 : start=        0, size=         0, Id= 0

root# foreach x (c d e f g h i)
foreach> sed -e "s/sdj/sd$x/" sdj > sd$x
foreach> end

root# cat sdc sde
# partition table of /dev/sdc
unit: sectors

/dev/sdc1 : start=       63, size=  38009727, Id=83, bootable
/dev/sdc2 : start= 38009790, size=   2008125, Id=82
/dev/sdc3 : start= 40017915, size=2889643680, Id=83
/dev/sdc4 : start=        0, size=         0, Id= 0
# partition table of /dev/sde
unit: sectors

/dev/sde1 : start=       63, size=  38009727, Id=83, bootable
/dev/sde2 : start= 38009790, size=   2008125, Id=82
/dev/sde3 : start= 40017915, size=2889643680, Id=83
/dev/sde4 : start=        0, size=         0, Id= 0
--------------------------------------------------------------------E

Resulting partitions look like this (1M blocks; each filesystem was
mounted on /mnt in turn):
--------------------------------------------------------------------S
Filesystem     1M-blocks  Used  Available  Use%  Mounted on
/dev/sdj1          18269   173      17169    1%  /mnt
/dev/sdj3        1408111   470    1379422    1%  /mnt
--------------------------------------------------------------------E

Used the new partition tables to set up all remaining RAID drives.
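In retrospect the intermediate files aren't needed: sfdisk -d can be
piped straight back into sfdisk.  A sketch of the equivalent loop (it
rewrites partition tables, so double-check the device list first):
--------------------------------------------------------------------S
#!/bin/sh
# Sketch: clone sdj's partition table onto the other RAID drives,
# rewriting the device name in the dump on the fly.
for x in c d e f g h i
do
    sfdisk -d /dev/sdj | sed "s/sdj/sd$x/" | sfdisk /dev/sd$x
done
exit 0
--------------------------------------------------------------------E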
Left the warning "If you created or changed a DOS partition..." out of
all but the first drive's output:
--------------------------------------------------------------------S
root# foreach x (c d e f g h i)
foreach> sfdisk /dev/sd$x < sd$x
foreach> end
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdc: 182363 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End    #cyls     #blocks  Id  System
/dev/sdc1   *      0+     16      17-      136521  83  Linux
/dev/sdc2         17    2507    2491    20008957+  83  Linux
/dev/sdc3       2508    2632     125     1004062+  82  Linux swap / Solaris
/dev/sdc4       2633  182362  179730  1443681225    5  Extended
/dev/sdc5       2633+   3629     997-    8008371   83  Linux
/dev/sdc6       3630+ 182362  178733- 1435672791   fd  Linux raid autodetect
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start        End    #sectors  Id  System
/dev/sdc1   *        63   38009789    38009727  83  Linux
/dev/sdc2      38009790   40017914     2008125  82  Linux swap / Solaris
/dev/sdc3      40017915 2929661594  2889643680  83  Linux
/dev/sdc4             0          -           0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

[ The output for sdd through sdi was identical apart from the device
  names. ]
--------------------------------------------------------------------E
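A quick sanity check that the clones really match (a sketch; it compares
each drive's dump against sdj's with the device names normalized):
--------------------------------------------------------------------S
#!/bin/sh
# Sketch: confirm each cloned table matches the master, modulo names.
sfdisk -d /dev/sdj | sed 's/sdj/sdX/g' > /tmp/master
for x in c d e f g h i
do
    if sfdisk -d /dev/sd$x | sed "s/sd$x/sdX/g" | cmp -s - /tmp/master
    then echo "sd$x: matches"
    else echo "sd$x: DIFFERS"
    fi
done
exit 0
--------------------------------------------------------------------E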
The kernel log for sdc looks like this; the others are similar:
--------------------------------------------------------------------S
Apr 14 16:01:33 bk002 kernel: SCSI device sdc: 2929666048 512-byte hdwr sectors (1499989 MB)
Apr 14 16:01:33 bk002 kernel: sdc: Write Protect is off
Apr 14 16:01:33 bk002 kernel: sdc: Mode Sense: 23 00 00 00
Apr 14 16:01:33 bk002 kernel: SCSI device sdc: drive cache: none
Apr 14 16:01:33 bk002 kernel: sdc: sdc1 sdc2 sdc3 sdc4 < sdc5 sdc6 >
Apr 14 16:01:36 bk002 kernel: SCSI device sdc: 2929666048 512-byte hdwr sectors (1499989 MB)
Apr 14 16:01:36 bk002 kernel: sdc: Write Protect is off
Apr 14 16:01:36 bk002 kernel: sdc: Mode Sense: 23 00 00 00
Apr 14 16:01:36 bk002 kernel: SCSI device sdc: drive cache: none
Apr 14 16:01:36 bk002 kernel: sdc: sdc1 sdc2 sdc3
--------------------------------------------------------------------E

Simple performance test: 1 GB of direct I/O with large and small blocks.
--------------------------------------------------------------------S
me% dd if=/dev/zero of=test.dat bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 94.4423 s, 11.4 MB/s

me% dd if=test.dat of=/dev/null bs=1M iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 5.68675 s, 189 MB/s

me% dd if=/dev/zero of=test.dat bs=4K count=256K oflag=direct
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 2301.75 s, 466 kB/s

me% dd if=test.dat of=/dev/null bs=4K iflag=direct
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 35.7054 s, 30.1 MB/s
--------------------------------------------------------------------E

Compare to a plain partition:
--------------------------------------------------------------------S
me% dd if=/dev/zero of=test.dat bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 173.661 s, 6.2 MB/s

me% dd if=test.dat of=/dev/null bs=1M iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 9.43734 s, 114 MB/s

me% dd if=/dev/zero of=test.dat bs=4K count=256K oflag=direct
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 2209.6 s, 486 kB/s

me% dd if=test.dat of=/dev/null bs=4K iflag=direct
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 33.5308 s, 32.0 MB/s
--------------------------------------------------------------------E

The kernlog directory holds the kernel-log output captured during testing.
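Note that oflag=direct and iflag=direct bypass the page cache, so every
4K request goes to the array synchronously -- that's why the small-block
write numbers are so dismal.  For a buffered comparison, something like
this sketch (conv=fsync makes dd flush before reporting, so the write
timing stays honest; dropping caches before the read needs root):
--------------------------------------------------------------------S
me% dd if=/dev/zero of=test.dat bs=4K count=256K conv=fsync
root# sync; echo 3 > /proc/sys/vm/drop_caches
me% dd if=test.dat of=/dev/null bs=4K
--------------------------------------------------------------------E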
Apparently I'm tickling a bug:
--------------------------------------------------------------------S
kernel: md: bug in file drivers/md/md.c, line 1659
--------------------------------------------------------------------E
According to similar reports that Google turns up, a counter reaches
zero and confuses the rest of the system.  It's not really a bug, but it
still might be time to try a later version of mdadm.  Then try creating
a filesystem using stride, stripe-width, and a larger journal.

Test script
--------------------------------------------------------------------S
#!/bin/ksh
# time tests with small and large blocks
export PATH=/usr/local/bin:/bin:/usr/bin:/sbin

runs='
bs=1M   count=1K
bs=256K count=4K
bs=64K  count=16K
bs=16K  count=64K
bs=4K   count=256K
'
echo "$runs" |
while read bs count
do
    test -z "$bs" && continue
    echo $bs $count
    dd if=/dev/zero of=test.dat $bs $count oflag=direct
    dd if=test.dat of=/dev/null $bs iflag=direct
    echo
done
exit 0
--------------------------------------------------------------------E

Try a smaller stripesize and a higher commit time:
--------------------------------------------------------------------S
root# lvcreate --size 4G --name lvt3 --stripesize 128 --stripes 4 vg1
  Logical volume "lvt3" created
root# mkfs.ext3 -v -i 65536 /dev/vg1/lvt3
root# mount -o commit=60 /dev/vg1/lvt3 /lv3
--------------------------------------------------------------------E

Results:
--------------------------------------------------------------------S
bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 50.4096 s, 21.3 MB/s    [ 0.8 min ]
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 4.47079 s, 240 MB/s

bs=256K count=4K
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB) copied, 84.8414 s, 12.7 MB/s    [ 1.4 min ]
4096+0 records in
4096+0 records out
1073741824 bytes (1.1 GB) copied, 5.69598 s, 189 MB/s

bs=64K count=16K
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 157.154 s, 6.8 MB/s    [ 2.6 min ]
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 8.3078 s, 129 MB/s

bs=16K count=64K
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB) copied, 807.51 s, 1.3 MB/s    [ 13.5 min ]
65536+0 records in
65536+0 records out
1073741824 bytes (1.1 GB) copied, 15.2816 s, 70.3 MB/s
--------------------------------------------------------------------E

OK, we have a good stripesize: 64, the default used when lv2 was created.

Next, test copying over the network to any drive.  Then create two
physical volumes instead of one: md1-2 for the first, md3-4 for the
second (see the sketch below).  This has two advantages:

* A drive failure in md1-2 does NOT affect the other physical volume, so
  I could do something like move logical volumes elsewhere until the
  drive was repaired.

* The two sets of 4 drives can be treated as separate devices, so I
  should be able to run two sets of network copies at the same time, as
  long as I aim them at different volumes.
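A sketch of that split (the text above says "physical volumes", but the
isolation described maps to separate volume groups; the array and VG
names here are made up):
--------------------------------------------------------------------S
#!/bin/sh
# Sketch only: two independent volume groups, each backed by md arrays
# built on its own set of 4 drives, so a drive failure in one set
# cannot touch the logical volumes in the other group.
pvcreate /dev/md1 /dev/md2
pvcreate /dev/md3 /dev/md4
vgcreate vgfirst  /dev/md1 /dev/md2
vgcreate vgsecond /dev/md3 /dev/md4
--------------------------------------------------------------------E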