  1. Introduction
  2. USB storage devices
  3. Testing the drive
  4. Configuring USB devices
  5. Enabling NTFS
  6. Using ports on FreeBSD
  7. Installing the FUSE/NTFS port
  8. Add an entry to /etc/fstab
  9. Copy the backup files
  10. Create some parity files
  11. Repairing a file: regular hard-drive
  12. Repairing a file: SSD
  13. Create multiple archive files
  14. Copy the full backup
  15. Copy incrementals
  16. Copy parity files
  17. Unmount the drive and clean up
  18. Scripts
  19. Feedback

1. Introduction

This is how I copied a full backup with parity files to a removable drive. The instructions are for a FreeBSD system, but other than the USB stuff, they should be (more or less) usable on Linux.

All the scripts I used are available at the end of the article.

Here's the removable drive I bought:

https://www.westerndigital.com/products/portable-drives/wd-easystore-portable-3-0-hdd

Capacity:  1 TB
Interface: USB 3.2 Gen 1
Connector: Micro B
Model:     WDBAJN0010BBK-WESE

System Requirements
    * Formatted NTFS
    * Windows 10+
    * Reformatting required for macOS 11+.

Note: Compatibility may vary depending on user's hardware configuration
and operating system.

Dimensions (L x W x H): 4.33" x 3.21" x 0.5"

In The Box
    * Portable hard drive
    * SuperSpeed USB-A cable (5Gbps)
    * Software for device management and backup
    * Quick install guide

2. USB storage devices

Support for USB storage devices is built into the GENERIC kernel. For a custom kernel, be sure that the following lines are present in the kernel configuration file:

device scbus    # SCSI bus (required for ATA/SCSI)
device da       # Direct Access (disks)
device pass     # Passthrough device (direct ATA/SCSI access)
device uhci     # provides USB 1.x support
device ohci     # provides USB 1.x support
device ehci     # provides USB 2.0 support
device xhci     # provides USB 3.0 support
device usb      # USB Bus (required)
device umass    # Disks/Mass storage - Requires scbus and da
device cd       # needed for CD and DVD burners

Local check:

me% uname -snvm
FreeBSD hairball FreeBSD 13.2-RELEASE-p4 GENERIC amd64

me% ls -l /usr/src/sys/amd64/conf
-rw-r--r-- 1 root wheel   412 06-Apr-2023 20:34:41 DEFAULTS
-rw-r--r-- 1 root wheel 15260 06-Apr-2023 20:34:41 GENERIC
-rw-r--r-- 1 root wheel    68 06-Apr-2023 20:34:41 GENERIC-KASAN
...
-rw-r--r-- 1 root wheel  5451 06-Apr-2023 20:34:41 MINIMAL
-rw-r--r-- 1 root wheel 19154 06-Apr-2023 20:34:41 NOTES

me% cat /tmp/devs
device  cd
device  da
device  ehci
device  ohci
device  pass
device  scbus
device  uhci
device  umass
device  usb
device  xhci

me% expand -1 /usr/src/sys/amd64/conf/GENERIC | grep -f /tmp/devs -
device  scbus  # SCSI bus (required for ATA/SCSI)
device  da     # Direct Access (disks)
device  cd     # CD
device  pass   # Passthrough device (direct ATA/SCSI access)
device  uhci   # UHCI PCI->USB interface
device  ohci   # OHCI PCI->USB interface
device  ehci   # EHCI PCI->USB interface (USB 2.0)
device  xhci   # XHCI PCI->USB interface (USB 3.0)
device  usb    # USB Bus (required)
device  umass  # Disks/Mass storage - Requires scbus and da
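The same comparison can be wrapped in a small function that names whatever is missing. This is a sketch, not part of the original procedure: check_usb_devices is a hypothetical name, and it only greps a configuration file for the ten device lines listed above (commented-out lines don't count).

```shell
#!/bin/sh
# check_usb_devices CONFFILE: report which USB-storage "device" lines
# are missing from a kernel configuration file.
# Returns 0 if all are present, 1 otherwise.
check_usb_devices () {
    conf=$1
    missing=0
    for dev in scbus da pass uhci ohci ehci xhci usb umass cd
    do
        # "device", whitespace, the name, then whitespace, a comment, or end of line.
        if ! grep -Eq "^[[:space:]]*device[[:space:]]+$dev([[:space:]#]|\$)" "$conf"
        then
            echo "missing: device $dev"
            missing=1
        fi
    done
    return $missing
}
```

For example, check_usb_devices /usr/src/sys/amd64/conf/GENERIC should print nothing and return 0; point it at your own configuration file when building a custom kernel.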

3. Testing the drive

To test the USB configuration, plug in the USB device. Use dmesg to confirm that the drive appears in the system message buffer:

[566112] usb_msc_auto_quirk: UQ_MSC_NO_TEST_UNIT_READY set for USB
    mass storage device Western Digital easystore 2648 (0x1058:0x2648)
[566113] usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass
    storage device Western Digital easystore 2648 (0x1058:0x2648)
[566113] ugen0.3: <Western Digital easystore 2648> at usbus0
[566113] umass1 on uhub1
[566113] umass1: <Western Digital easystore 2648, class 0/0, rev
    3.20/10.34, addr 2> on usbus0
[566113] umass1:  SCSI over Bulk-Only; quirks = 0x8001
[566113] umass1:7:1: Attached to scbus7
[566113] pass7 at umass-sim1 bus 1 scbus7 target 0 lun 0
[566113] pass7: <WD easystore 2648 1034> Fixed Direct Access SPC-4
    SCSI device
[566113] pass7: Serial Number 575833324137314556444435
[566113] pass7: 400.000MB/s transfers
[566113] da1 at umass-sim1 bus 1 scbus7 target 0 lun 0
da1: <WD easystore 2648 1034> Fixed Direct Access SPC-4 SCSI device
da1: Serial Number 575833324137314556444435
da1: 400.000MB/s transfers
da1: 953837MB (1953458176 512 byte sectors)
da1: quirks=0x2<NO_6_BYTE>
[566114] da1: Delete methods: <UNMAP(*),ZERO>
[566114] GEOM: new disk da1
[566114] pass8 at umass-sim1 bus 1 scbus7 target 0 lun 1
[566114] pass8: <WD SES Device 1034> Fixed Enclosure Services SPC-4
    SCSI device
[566114] pass8: Serial Number 575833324137314556444435
[566114] pass8: 400.000MB/s transfers
[566114] ses1 at umass-sim1 bus 1 scbus7 target 0 lun 1
ses1: <WD SES Device 1034> Fixed Enclosure Services SPC-4 SCSI device
ses1: Serial Number 575833324137314556444435
ses1: 400.000MB/s transfers
ses1: SES Device

Bracketed numbers are seconds since last reboot. I track reboots in /var/log/reboot:

me% cat -n /var/log/reboot
 1  freebsd-13.2-release 2023-06-27 01:57:48 -0400
 2  freebsd-13.2-release 2023-06-27 03:19:50 -0400
...
24  freebsd-13.2-release-p4 2024-06-01 04:33:56 -0400
25  freebsd-13.2-release-p4 2024-06-01 14:21:26 -0400

me% date -d '2024-06-01 14:21:26' '+%s'
1717266086

me% echo 1717266086 + 566112 | bc
1717832198

me% date -d @1717832198
Sat Jun  8 03:36:38 EDT 2024
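The three steps above fit in one function. A minimal sketch, assuming GNU date as used above (FreeBSD's base date(1) parses dates with -j -f instead); dmesg_time is a hypothetical name. On FreeBSD, the kern.boottime sysctl also reports the kernel's own boot time, which may be a closer starting point than a line logged during startup.

```shell
#!/bin/sh
# dmesg_time BOOT SECONDS [FORMAT]: turn a dmesg "[seconds]" stamp into
# wall-clock time. BOOT is any date string GNU date can parse.
# Assumes GNU date (-d and @epoch syntax).
dmesg_time () {
    boot=$(date -d "$1" '+%s') || return 1
    # Add the uptime offset and print, optionally in a caller-supplied format.
    date -d "@$((boot + $2))" ${3:+"$3"}
}
```

Run in the same time zone as the log, dmesg_time '2024-06-01 14:21:26' 566112 should print the same Sat Jun  8 03:36:38 EDT 2024 as above.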

It's just about right: 03:36:38 is within about 20 seconds of the timestamps on the device nodes. The system sees the device:

root# ls -lF /dev/da1*
crw-r----- 1 root operator 0, 195 08-Jun-2024 03:36:16 /dev/da1
crw-r----- 1 root operator 0, 196 08-Jun-2024 03:36:16 /dev/da1p1

4. Configuring USB devices

Since the USB device is seen as a SCSI one, camcontrol can be used to list the USB storage devices attached to the system:

root# camcontrol devlist
...
<WD easystore 2648 1034>     at scbus7 target 0 lun 0 (da1,pass7)
<WD SES Device 1034>         at scbus7 target 0 lun 1 (ses1,pass8)

Alternatively, usbconfig can be used to list the device. Refer to usbconfig(8) for more information about this command.

root# usbconfig
ugen0.3: <Western Digital easystore 2648> at usbus0, cfg=0 md=HOST
    spd=SUPER (5.0Gbps) pwr=ON (224mA)
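When scripting the mount, it can help to pull the da(4) name out of the camcontrol listing instead of hard-coding da1, since the number depends on what else is attached. A sketch of a hypothetical helper (da_for is not a system command) that reads camcontrol devlist output on standard input:

```shell
#!/bin/sh
# da_for PATTERN: read `camcontrol devlist` output on stdin and print the
# da device name from the first line whose text matches PATTERN.
da_for () {
    awk -v pat="$1" '$0 ~ pat {
        list = $NF                     # peripheral list, e.g. (da1,pass7)
        gsub(/[()]/, "", list)         # strip the parentheses
        n = split(list, names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] ~ /^da[0-9]+$/) { print names[i]; exit }
    }'
}
```

For example: dev=/dev/$(camcontrol devlist | da_for easystore)p1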

5. Enabling NTFS

NTFS is the most portable and reliable filesystem I've found for writing to a removable drive. FreeBSD no longer includes a native NTFS driver, so use the sysutils/fusefs-ntfs port (NTFS-3G, which runs under FUSE).

Before I can use FUSE for anything, I have to make sure it's loaded. Previous versions of FreeBSD required "fuse.ko"; now it's "fusefs.ko":

root# kldstat
Id Refs Address                Size Name
 1   40 0xffffffff80200000  1f3e2d0 kernel
 2    1 0xffffffff8213f000     3728 coretemp.ko
 3    1 0xffffffff82143000   59e008 zfs.ko
 4    1 0xffffffff826e2000     2870 accf_data.ko
 5    1 0xffffffff826e5000     a4a0 cryptodev.ko
 6    1 0xffffffff82ce5000     3530 fdescfs.ko
 7    1 0xffffffff82ce9000     3060 mac_portacl.ko
 8    1 0xffffffff82ced000    31a80 linux.ko
 9    1 0xffffffff82d1f000     be88 linux_common.ko
10    1 0xffffffff82d2b000    14b98 netlink.ko
11    1 0xffffffff82d40000     3250 ichsmb.ko
12    1 0xffffffff82d44000     2180 smbus.ko
13    1 0xffffffff82d47000     2a08 mac_ntpd.ko

root# kldload fusefs

root# kldstat
Id Refs Address                Size Name
 1   42 0xffffffff80200000  1f3e2d0 kernel
 2    1 0xffffffff8213f000     3728 coretemp.ko
...
13    1 0xffffffff82d47000     2a08 mac_ntpd.ko
14    1 0xffffffff82d4a000    11cd8 fusefs.ko

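If you need fusefs(5) during boot, put this in /boot/loader.conf:

fusefs_load="YES"

If it only needs to be available once the system is up, load it from /etc/rc.conf instead:

kld_list="fusefs"

If kld_list is already set, add fusefs to the existing list (for example with "sysrc kld_list+=fusefs") rather than replacing it.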

6. Using ports on FreeBSD

I use a function called "bsdpath" to switch to a BSD-style build environment, because my default environment uses GNU make, and the ports framework needs BSD make:

bsdpath () {
    x="$HOME/.bsdpath"
    if test -f "$x"
    then
        echo "Setting BSD-build PATH"
        # First line of the file is "NAME value"; export NAME=value.
        read -r name value < "$x"
        export "$name=$value"
        export BSDPATH=1
    else
        echo "Sorry, $x not found"
    fi
    echo "PATH=$PATH"
}

Here's $HOME/.bsdpath, a Bernstein-style configuration file (a variable name followed by its value):

PATH /sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin:/root/bin

Use this for building ports or anything else that needs BSD make.

The "defpath" function moves me back to my default build environment:

defpath () {
    x="$HOME/.path"
    if test -f "$x"
    then
        echo "Setting default PATH"
        # First line of the file is "NAME value"; export NAME=value.
        read -r name value < "$x"
        export "$name=$value"
        unset BSDPATH
    else
        echo "Sorry, $x not found"
    fi
    echo "PATH=$PATH"
}

Here's $HOME/.path:

PATH /root/bin:/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin:/usr/bin

7. Installing the FUSE/NTFS port

Set up the build environment and run make:

root# bsdpath

root# echo $PATH
/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin:/root/bin

root# cd /usr/ports/sysutils/fusefs-ntfs

root# make

You can see the full make output here

root# make package
==>  Building package for fusefs-ntfs-2022.10.3

root# ls -l work/pkg
total 1
-rw-r--r--  1 root  wheel  556432 Jun  8 04:07 fusefs-ntfs-2022.10.3.pkg

root# mv work/pkg/fusefs-ntfs-2022.10.3.pkg /usr/packages/local

root# make install
==>  Installing for fusefs-ntfs-2022.10.3
==>  Checking if fusefs-ntfs is already installed
==>   Registering installation for fusefs-ntfs-2022.10.3
Installing fusefs-ntfs-2022.10.3...

Jun  8 04:07:43 hairball pkg-static[68428]: fusefs-ntfs-2022.10.3 installed
NTFS-3G has been installed.  It requires fusefs(5) support to operate,
so issue the ``kldload fusefs'' command or ``sysrc kld_list+=fusefs''
to make it load automatically when the system starts.

For further information, implementation details, and known issues see
the FreeBSD README (/usr/local/share/doc/ntfs-3g/README.FreeBSD) in
addition to the official README (contains some Linux-specific parts).

root# make clean
==>  Cleaning for libublio-20070103_3
==>  Cleaning for fusefs-libs-2.9.9_2
==>  Cleaning for fusefs-ntfs-2022.10.3

root# defpath

8. Add an entry to /etc/fstab

To mount the drive through /etc/fstab, the entry has to use ntfs-3g as its mount program:

/dev/da1p1 /media/usb ntfs mountprog=/usr/local/bin/ntfs-3g,noauto,late,rw 0 0

After plugging the drive in, I tried mounting:

root# mkdir -p /media/usb

root# ls -lF /media
drwxr-xr-x 2 root wheel 2 08-Jun-2024 04:27:09 usb/

root# ntfs-3g /dev/da1p1 /media/usb

root# df /media/usb
Filesystem 1M-blocks   Used  Avail Capacity  Mounted on
/dev/da1p1    953834    115 953719     0%    /media/usb

root# mount|grep usb
/dev/da1p1 on /media/usb (fusefs)

root# cd /media/usb
root# ls
'Install Western Digital Software for Mac.dmg'
'Install Western Digital Software for Windows.exe'*

ALWAYS unmount the filesystem before unplugging the drive, or you're damned likely to lose anything copied to it!
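Since it's easy to lose track of whether the drive is mounted, a small wrapper can check mount(8) output before mounting. A sketch under these assumptions: it runs as root, the device and mount point default to the /dev/da1p1 and /media/usb used here, and is_mounted and mount_usb are hypothetical names.

```shell
#!/bin/sh
# is_mounted DIR: succeed if DIR is listed as a mount point by mount(8).
# Handles "dev on /dir (type)" (FreeBSD) and "dev on /dir type t" (Linux).
is_mounted () {
    mount | awk -v dir="$1" '$2 == "on" && $3 == dir { found = 1 }
                             END { exit !found }'
}

# mount_usb [DEV] [DIR]: mount the NTFS drive with ntfs-3g unless DIR
# is already a mount point.
mount_usb () {
    dev=${1:-/dev/da1p1}
    dir=${2:-/media/usb}
    if is_mounted "$dir"
    then
        echo "$dir is already mounted"
        return 0
    fi
    mkdir -p "$dir" || return 1
    ntfs-3g "$dev" "$dir" || return 1
    is_mounted "$dir"      # confirm the mount actually took
}
```

The same is_mounted test is handy after umount, to confirm the drive is really gone before unplugging it.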

9. Copy the backup files

Next, do a full backup to that drive. I want belt and suspenders: regular tarballs on the drive, plus PAR2 parity files to repair them if they get corrupted.

I installed par2cmdline from FreeBSD ports -- you can probably find a package for your system or just build from source:

root# cd /usr/ports/archivers/par2cmdline
root# make
root# make install
root# make clean

My backups are compressed tarballs under /archive/tmp:

-rw-r--r--  1 vogelke mis 26325997473 09-May-2024 17:15:22 doc.tgz
-rw-r--r--  1 vogelke mis 34948146727 09-May-2024 17:44:15 home.tgz
-rw-r--r--  1 vogelke mis  4374343284 08-May-2024 10:46:18 root.tgz
-rw-r--r--  1 vogelke sys 10154228294 13-May-2024 03:55:35 search.tgz
-rw-r--r--  1 vogelke mis 22855671471 11-May-2024 05:45:44 src1.tgz
-rw-r--r--  1 vogelke mis 18746714446 11-May-2024 06:08:28 src2.tgz
-rw-r--r--  1 vogelke mis 15312047928 11-May-2024 06:20:16 src3.tgz
-rw-r--r--  1 vogelke mis     1657280 11-May-2024 05:32:33 src4.tgz
-rw-r--r--  1 vogelke mis  8043896411 09-May-2024 17:58:45 usr-local.tgz
-rw-r--r--  1 vogelke mis  5308775187 09-May-2024 18:05:15 usr-ports.tgz
-rw-r--r--  1 vogelke mis   260641384 09-May-2024 18:05:34 var-locate.tgz
-rw-r--r--  1 vogelke mis   730648893 09-May-2024 19:03:25 var-log.tgz

I couldn't store a single tarball for my source directory (not enough room), so I split it into four: src1.tgz through src4.tgz. I used a script similar to this for my other datasets:

me% cat bkup
cd /backup/full/2022/1222 || exit 1
date; mkdir -p doc
date; ( cd doc && tar xzf /archive/tmp/doc.tgz )
date; mkdir -p home
date; ( cd home && tar xzf /archive/tmp/home.tgz )
date; mkdir -p search
date; ( cd search && tar xzf /archive/tmp/search.tgz )
date; mkdir -p src
date; ( cd src && tar xzf /archive/tmp/src.tgz )
date; mkdir -p usr/local
date; ( cd usr/local && tar xzf /archive/tmp/usr-local.tgz )
date; mkdir -p var/locate
date; ( cd var/locate && tar xzf /archive/tmp/var-locate.tgz )
date; mkdir -p var/log
date; ( cd var/log && tar xzf /archive/tmp/var-log.tgz )
date; exit 0

A directory called PERMS holds files recording directory permissions, since those are what most often need fixing after a restore from any kind of archive.

See this for how it's done.

10. Create some parity files

I moved the tarballs into their own directories so I could create parity files for each one.

/archive/tmp
    +--PERMS
    |   +--doc
    |   +--home
    |   +--root
    |   +--search
    |   +--src
    |   +--usr-local
    |   +--usr-ports
    |   +--var-locate
    |   +--var-log
    +--doc
    |   +--doc.tgz
    +--home
    |   +--home.tgz
    +--root
    |   +--root.tgz
    +--search
    |   +--search.tgz
    +--src
    |   +--src1.tgz
    |   +--src2.tgz
    |   +--src3.tgz
    |   +--src4.tgz
    +--usr-local
    |   +--usr-local.tgz
    +--usr-ports
    |   +--usr-ports.tgz
    +--var-locate
    |   +--var-locate.tgz
    +--var-log
    |   +--var-log.tgz
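One way to do that move, as a sketch: sort_tarballs is a hypothetical name, and it assumes the tarballs start out together in one directory and that the split source archives (src1.tgz through src4.tgz) all belong in src.

```shell
#!/bin/sh
# sort_tarballs DIR: move each NAME.tgz in DIR into DIR/NAME/,
# sending src1.tgz, src2.tgz, ... to DIR/src/.
# Runs in a subshell so the cd doesn't affect the caller.
sort_tarballs () (
    cd "$1" || exit 1
    for f in *.tgz
    do
        [ -f "$f" ] || continue        # no matches: skip the literal "*.tgz"
        d=${f%.tgz}
        case $d in
            src[0-9]*) d=src ;;
        esac
        mkdir -p "$d" && mv "$f" "$d/"
    done
)
```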

I want 30% redundancy for each tarball. Try one of the smaller tarballs first (usr-local.tgz, about 8 GB):

root# mkdir /archive/work
root# chmod 1777 /archive/work
me% cp /archive/tmp/usr-local/usr-local.tgz /archive/work/tst.tgz

me% cd /archive/work
me% ls -l
-rw-r--r-- 1 vogelke wheel 8043896411 09-Jun-2024 03:54:20 tst.tgz

I ran the first par2create test in the same directory, on a regular hard drive, to get an idea of elapsed time. I'll compare that against a 1-TB SSD.

me% date; par2create -r30 tst.tgz; date
Sun Jun  9 03:55:50 EDT 2024

Block size: 4021952
Source file count: 1
Source block count: 2000
Recovery block count: 600
Recovery file count: 10

Opening: tst.tgz
Computing Reed Solomon matrix.
Constructing: done.
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 134217600 bytes to disk
Wrote 131472000 bytes to disk
Writing recovery packets
Writing verification packets
Done
Sun Jun  9 04:07:15 EDT 2024

Files created:

 8043896411 09-Jun-2024 03:54:20 tst.tgz
      40404 09-Jun-2024 04:07:15 tst.tgz.par2
    4062424 09-Jun-2024 04:07:15 tst.tgz.vol000+001.par2
    8124744 09-Jun-2024 04:07:15 tst.tgz.vol001+002.par2
   16209084 09-Jun-2024 04:07:15 tst.tgz.vol003+004.par2
   32337464 09-Jun-2024 04:07:15 tst.tgz.vol007+008.par2
   64553924 09-Jun-2024 04:07:15 tst.tgz.vol015+016.par2
  128946544 09-Jun-2024 04:07:15 tst.tgz.vol031+032.par2
  257691484 09-Jun-2024 04:07:15 tst.tgz.vol063+064.par2
  515141064 09-Jun-2024 04:07:15 tst.tgz.vol127+128.par2
 1029999924 09-Jun-2024 04:07:15 tst.tgz.vol255+256.par2
  358241984 09-Jun-2024 04:07:15 tst.tgz.vol511+089.par2
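These numbers follow from a little arithmetic. par2create splits the file into 2000 source blocks by default, so the block size is the file size divided by 2000, rounded up to a multiple of 4 (the PAR2 format requires that). 30% of 2000 gives 600 recovery blocks, and by default the recovery files double in size (1, 2, 4, ... blocks) until all 600 are used, which is why the last one is vol511+089. A sketch that reproduces this for the single-file, default-option case; it's an estimate that matches the output here, not par2cmdline's own code, and par2_plan is a hypothetical name.

```shell
#!/bin/sh
# par2_plan SIZE [PERCENT] [BLOCKS]: estimate the block size, recovery
# block count, and recovery file names par2create -rPERCENT would use
# for one file of SIZE bytes (assumes power-of-two recovery file sizing).
par2_plan () {
    size=$1 pct=${2:-30} blocks=${3:-2000}
    # Smallest multiple of 4 that is >= size / blocks.
    bsize=$(( (size + blocks - 1) / blocks ))
    bsize=$(( (bsize + 3) / 4 * 4 ))
    rec=$(( blocks * pct / 100 ))
    echo "block size: $bsize"
    echo "recovery blocks: $rec"
    # Files hold 1, 2, 4, ... blocks; the last one gets whatever is left.
    start=0 count=1
    while [ "$start" -lt "$rec" ]
    do
        n=$count
        [ $((start + n)) -gt "$rec" ] && n=$((rec - start))
        printf 'vol%03d+%03d\n' "$start" "$n"
        start=$((start + n))
        count=$((count * 2))
    done
}
```

par2_plan 8043896411 should print a block size of 4021952, 600 recovery blocks, and the same ten vol names listed above.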

11. Repairing a file: regular hard-drive

Now corrupt the file and check it:

me% md5sum tst.tgz
3248ba7cd057ae3a26380b7885ea8b72  tst.tgz

me% dd if=/dev/zero seek=1000   bs=1 count=10000 of=tst.tgz conv=notrunc
10000+0 records in
10000+0 records out
10000 bytes (10 kB, 9.8 KiB) copied, 0.0258182 s, 387 kB/s

me% dd if=/dev/zero seek=10000  bs=1 count=10000 of=tst.tgz conv=notrunc
10000+0 records in
10000+0 records out
10000 bytes (10 kB, 9.8 KiB) copied, 0.0256597 s, 390 kB/s

me% dd if=/dev/zero seek=100000 bs=1 count=10000 of=tst.tgz conv=notrunc
10000+0 records in
10000+0 records out
10000 bytes (10 kB, 9.8 KiB) copied, 0.0255272 s, 392 kB/s

me% md5sum tst.tgz
a171d62aa0e6568f8e30d5116944d133  tst.tgz

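All three 10,000-byte patches fall inside the first 4,021,952-byte block, so par2 should find exactly one damaged block. See if it can be fixed.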

me% par2verify tst.tgz.par2
Loading "tst.tgz.par2".
Loaded 4 new packets
Loading "tst.tgz.vol127+128.par2".
Loaded 128 new packets including 128 recovery blocks
Loading "tst.tgz.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "tst.tgz.vol015+016.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "tst.tgz.vol007+008.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "tst.tgz.vol063+064.par2".
Loaded 64 new packets including 64 recovery blocks
Loading "tst.tgz.vol003+004.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "tst.tgz.vol511+089.par2".
Loaded 89 new packets including 89 recovery blocks
Loading "tst.tgz.vol031+032.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "tst.tgz.vol001+002.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "tst.tgz.vol255+256.par2".
Loaded 256 new packets including 256 recovery blocks
Loading "tst.tgz.par2".
No new packets found

There are 1 recoverable files and 0 other files.
The block size used was 4021952 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 8043896411 bytes.

Verifying source files:

Opening: "tst.tgz"
Target: "tst.tgz" - damaged. Found 1999 of 2000 data blocks.

Scanning extra files:

Repair is required.
1 file(s) exist but are damaged.
You have 1999 out of 2000 data blocks available.
You have 600 recovery blocks available.
Repair is possible.
You have an excess of 599 recovery blocks.
1 recovery blocks will be used to repair.

I have a ton of spare repair capacity, which is good. Try to repair the archive:

me% date; par2repair tst.tgz.par2; date
Sun Jun  9 04:18:33 EDT 2024
Loading "tst.tgz.par2".
Loaded 4 new packets
Loading "tst.tgz.vol127+128.par2".
Loaded 128 new packets including 128 recovery blocks
Loading "tst.tgz.vol000+001.par2".
Loaded 1 new packets including 1 recovery blocks
Loading "tst.tgz.vol015+016.par2".
Loaded 16 new packets including 16 recovery blocks
Loading "tst.tgz.vol007+008.par2".
Loaded 8 new packets including 8 recovery blocks
Loading "tst.tgz.vol063+064.par2".
Loaded 64 new packets including 64 recovery blocks
Loading "tst.tgz.vol003+004.par2".
Loaded 4 new packets including 4 recovery blocks
Loading "tst.tgz.vol511+089.par2".
Loaded 89 new packets including 89 recovery blocks
Loading "tst.tgz.vol031+032.par2".
Loaded 32 new packets including 32 recovery blocks
Loading "tst.tgz.vol001+002.par2".
Loaded 2 new packets including 2 recovery blocks
Loading "tst.tgz.vol255+256.par2".
Loaded 256 new packets including 256 recovery blocks
Loading "tst.tgz.par2".
No new packets found

There are 1 recoverable files and 0 other files.
The block size used was 4021952 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 8043896411 bytes.

Verifying source files:

Opening: "tst.tgz"
Target: "tst.tgz" - damaged. Found 1999 of 2000 data blocks.

Scanning extra files:

Repair is required.
1 file(s) exist but are damaged.
You have 1999 out of 2000 data blocks available.
You have 600 recovery blocks available.
Repair is possible.
You have an excess of 599 recovery blocks.
1 recovery blocks will be used to repair.

Computing Reed Solomon matrix.
Constructing: done.
Solving: done.

Wrote 8043896411 bytes to disk

Verifying repaired files:

Opening: "tst.tgz"
Target: "tst.tgz" - found.

Repair complete.
Sun Jun  9 04:21:43 EDT 2024

About 3 minutes to repair an 8 GB file on spinning rust isn't bad.

12. Repairing a file: SSD

Mess up the file more. Since I have about 30% redundancy, copy 120 MiB of random junk over four spots in the file (480 MiB in all) and see what happens.

me% cd /tmp
me% dd if=/dev/urandom of=junk bs=1M count=30
me% cat junk junk > 2 ; mv 2 junk
me% cat junk junk > 2 ; mv 2 junk

me% ls -l junk
-rw-r--r-- 1 vogelke wheel 125829120 09-Jun-2024 04:30:54 /tmp/junk

Creating parity files in one directory for data files in another was too much trouble; par2 kept asking me to specify a recovery file or a list of data files. It was easier to copy everything to /var/tmp/work.

par2create took about 10 minutes here versus about 11.5 on the hard drive, so the job looks more CPU-bound than I/O-bound, which is fine. Test script:

me% cat doit
date; par2create -r30 tst.tgz; date

dd if=/tmp/junk seek=500  bs=1M of=tst.tgz conv=notrunc
dd if=/tmp/junk seek=1000 bs=1M of=tst.tgz conv=notrunc
dd if=/tmp/junk seek=1500 bs=1M of=tst.tgz conv=notrunc
dd if=/tmp/junk seek=2000 bs=1M of=tst.tgz conv=notrunc

date; par2verify tst.tgz.par2; date
date; par2repair tst.tgz.par2; date

Abbreviated results from running the script:

me% sh -x ./doit
+ date
Sun Jun  9 06:02:17 EDT 2024
+ par2create -r30 tst.tgz

Block size: 4021952
Source file count: 1
Source block count: 2000
Recovery block count: 600
Recovery file count: 10

Opening: tst.tgz
Computing Reed Solomon matrix.
Constructing: done.
Wrote 134217600 bytes to disk
[...]
Wrote 131472000 bytes to disk
Writing recovery packets
Writing verification packets
Done
+ date
Sun Jun  9 06:12:27 EDT 2024

+ dd 'if=/tmp/junk' 'seek=500' 'bs=1M' 'of=tst.tgz' 'conv=notrunc'
120+0 records in
120+0 records out
125829120 bytes (126 MB, 120 MiB) copied, 0.0285935 s, 4.4 GB/s
[...]
+ date
Sun Jun  9 06:12:27 EDT 2024

+ par2verify tst.tgz.par2
Loading "tst.tgz.par2".
Loaded 4 new packets
Loading "tst.tgz.vol063+064.par2".
Loaded 64 new packets including 64 recovery blocks
[...]

There are 1 recoverable files and 0 other files.
The block size used was 4021952 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 8043896411 bytes.

Verifying source files:

Opening: "tst.tgz"
Target: "tst.tgz" - damaged. Found 1872 of 2000 data blocks.

Scanning extra files:

Repair is required.
1 file(s) exist but are damaged.
You have 1872 out of 2000 data blocks available.
You have 600 recovery blocks available.
Repair is possible.
You have an excess of 472 recovery blocks.
128 recovery blocks will be used to repair.
+ date
Sun Jun  9 06:13:33 EDT 2024

+ par2repair tst.tgz.par2
Loading "tst.tgz.par2".
Loaded 4 new packets
Loading "tst.tgz.vol063+064.par2".
Loaded 64 new packets including 64 recovery blocks
[...]

There are 1 recoverable files and 0 other files.
The block size used was 4021952 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 8043896411 bytes.

Verifying source files:

Opening: "tst.tgz"
Target: "tst.tgz" - damaged. Found 1872 of 2000 data blocks.

Scanning extra files:
Repair is required.
1 file(s) exist but are damaged.
You have 1872 out of 2000 data blocks available.
You have 600 recovery blocks available.
Repair is possible.
You have an excess of 472 recovery blocks.
128 recovery blocks will be used to repair.

Computing Reed Solomon matrix.
Constructing: done.
Solving: done.
[...]
Repair complete.
+ date
Sun Jun  9 06:18:27 EDT 2024

Here's the complete script output.

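30% seems to be overkill: 480 MiB of junk damaged only 128 of the 2000 blocks (6.4%), leaving 472 recovery blocks unused. If space is a problem, use 20% instead. The parity files cost about 30% extra disk space (roughly 2.4 GB for the 8 GB test file), which makes sense -- no free lunch.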
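The damage counts come from simple division. With bs=1M, seek=N starts the write N x 1,048,576 bytes into the file, and a write hits every 4,021,952-byte block between its first and last byte. Each 120 MiB write spans 32 blocks, so the four writes cost 128 blocks, matching par2verify's "1872 of 2000". A small sketch of that calculation (blocks_hit is a hypothetical helper; it needs 64-bit shell arithmetic, which FreeBSD sh, bash, and zsh provide on amd64):

```shell
#!/bin/sh
# blocks_hit OFFSET LENGTH BLOCKSIZE: print the first and last block
# (numbered from 0) that a write of LENGTH bytes at byte OFFSET touches,
# followed by how many blocks that is.
blocks_hit () {
    first=$(( $1 / $3 ))
    last=$(( ($1 + $2 - 1) / $3 ))
    echo "$first $last $(( last - first + 1 ))"
}
```

For example, blocks_hit $((500 * 1048576)) 125829120 4021952 prints "130 161 32", and each of the three 10,000-byte patches from the hard-drive test lands in block 0.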

13. Create multiple archive files

I wrote a script called doparity to handle multiple archive files. It writes short lines to syslog and full output under /var/log/usb.

Jun 10 01:37:02 doparity[9273]: start
Jun 10 01:37:02 doparity[10124]: par2create -r30 var-locate.tgz
Jun 10 01:37:24 doparity[12120]: par2verify var-locate.tgz.par2
Jun 10 01:37:26 doparity[12765]: par2create -r30 var-log.tgz
Jun 10 01:38:20 doparity[14475]: par2verify var-log.tgz.par2
Jun 10 01:38:25 doparity[15129]: done
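The doparity script itself isn't reproduced in this section. A minimal sketch of the same idea, not the original: it logs a short line to syslog with logger(1) and appends full par2 output to a log file whose name (LOGFILE below) is an assumption, and note and doparity are illustrative names.

```shell
#!/bin/sh
# Hypothetical doparity-style loop. For each tarball named on the command
# line: create 30% parity files, verify them, and log each step.
LOGFILE=${LOGFILE:-/var/log/usb/doparity.log}   # assumed location

# note MESSAGE: short line to syslog, timestamped line to LOGFILE.
note () {
    logger -i -t doparity "$*"
    printf '%s doparity: %s\n' "$(date '+%F %T')" "$*" >> "$LOGFILE"
}

doparity () {
    note start
    for f in "$@"
    do
        note "par2create -r30 $f"
        par2create -r30 "$f" >> "$LOGFILE" 2>&1 ||
            { note "par2create failed: $f"; return 1; }
        note "par2verify $f.par2"
        par2verify "$f.par2" >> "$LOGFILE" 2>&1 ||
            { note "par2verify failed: $f"; return 1; }
    done
    note done
}
```

Run it from the directory holding the tarball, e.g. (cd /archive/tmp/doc && doparity doc.tgz), so the parity files land next to it.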

par2 uses carriage returns to redraw its progress lines in place, which makes the log hard to read, and "-q" suppresses the useful output along with the progress, so I clean it up with a sed script. Here are the results.
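The sed script isn't shown in this section. As a hypothetical stand-in, this awk function keeps only the text after the last carriage return on each line, which is what the screen would have shown once the progress display finished:

```shell
#!/bin/sh
# crclean [FILE...]: for each line, keep only the text after the last
# carriage return, dropping in-place progress updates.
crclean () {
    awk '{
        sub(/\r+$/, "")              # ignore trailing carriage returns
        n = split($0, part, "\r")    # pieces between carriage returns
        if (n == 0) print ""
        else print part[n]
    }' "$@"
}
```

For example, crclean /var/log/usb/somefile > clean.txt, where the input name is whatever log the loop wrote.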

Running this on a big archive (doc.tgz, 26 GB) gives my system a workout:

me% w
 2:45AM  up 8 days, 12:24, 1 user, load averages: 3.07, 1.53, 0.83

me% top
...
  PID USERNAME   THR PRI NICE   SIZE    RES STATE       WCPU COMMAND
32836 vogelke      4  52    0   220M   142M zio->i   380.68% par2

Script started at 0239, finished at 0757. Abbreviated output:

2024-06-10 02:39:16.476-04 doparity: par2create -r30 doc.tgz
Block size: 13163000
Source file count: 1
Source block count: 2000
Recovery block count: 600
Recovery file count: 10

Opening: doc.tgz
Computing Reed Solomon matrix.
Writing recovery packets
Writing verification packets
Done

2024-06-10 03:23:59.006-04 doparity: par2verify doc.tgz.par2
Loading "doc.tgz.par2".
Loading "doc.tgz.vol003+004.par2".
...
Loading "doc.tgz.vol007+008.par2".
Loading "doc.tgz.par2".

There are 1 recoverable files and 0 other files.
The block size used was 13163000 bytes.
There are a total of 2000 data blocks.
The total size of the data files is 26325997473 bytes.

Verifying source files:
Opening: "doc.tgz"
All files are correct, repair is not required.

[...]

2024-06-10 03:31:32.439-04 doparity: par2create -r30 home.tgz
2024-06-10 05:05:30.548-04 doparity: par2create -r30 root.tgz
2024-06-10 05:15:20.715-04 doparity: par2create -r30 search.tgz
2024-06-10 05:36:40.943-04 doparity: par2create -r30 src1.tgz
2024-06-10 06:25:33.170-04 doparity: par2create -r30 src2.tgz
2024-06-10 07:03:40.108-04 doparity: par2create -r30 src3.tgz
2024-06-10 07:34:13.293-04 doparity: par2create -r30 src4.tgz
2024-06-10 07:34:13.729-04 doparity: par2create -r30 usr-local.tgz
2024-06-10 07:49:01.738-04 doparity: par2create -r30 usr-ports.tgz
2024-06-10 07:56:50.077-04 doparity: par2create -r30 var-locate.tgz
2024-06-10 07:56:53.829-04 doparity: par2create -r30 var-log.tgz
2024-06-10 07:57:06.615-04 doparity: done

14. Copy the full backup

Copied new files to the removable drive. I have to mount it as root, but I can copy files as myself.

me% ntfs-3g /dev/da1p1 /media/usb
Error opening read-only '/dev/da1p1': Permission denied
Failed to mount '/dev/da1p1': Permission denied
Please check '/dev/da1p1' and the ntfs-3g binary permissions,
and the mounting user ID. More explanation is provided at
https://github.com/tuxera/ntfs-3g/wiki/NTFS-3G-FAQ

me% sudo /usr/local/bin/ntfs-3g /dev/da1p1 /media/usb

me% df /media/usb
Filesystem 1M-blocks Used  Avail Capacity  Mounted on
/dev/da1p1    953834  115 953719     0%    /media/usb

I tried cpio first because that's my go-to for copies. I got as far as home.tgz after 30 minutes:

me% cd /archive/bkup
me% find PERMS doc home root search src usr-local \
    usr-ports var-locate var-log |
    sort | cpio -pdumv /media/usb/full/2024-0513
/media/usb/full/2024-0513/PERMS
/media/usb/full/2024-0513/PERMS/doc
/media/usb/full/2024-0513/PERMS/home
/media/usb/full/2024-0513/PERMS/root
/media/usb/full/2024-0513/PERMS/search
/media/usb/full/2024-0513/PERMS/src
/media/usb/full/2024-0513/PERMS/usr-local
/media/usb/full/2024-0513/PERMS/usr-ports
/media/usb/full/2024-0513/PERMS/var-locate
/media/usb/full/2024-0513/PERMS/var-log
/media/usb/full/2024-0513/doc
/media/usb/full/2024-0513/doc/doc.tgz
/media/usb/full/2024-0513/doc/doc.tgz.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol000+001.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol001+002.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol003+004.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol007+008.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol015+016.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol031+032.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol063+064.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol127+128.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol255+256.par2
/media/usb/full/2024-0513/doc/doc.tgz.vol511+089.par2
/media/usb/full/2024-0513/home
/media/usb/full/2024-0513/home/home.tgz^C

Using cp went faster: it copied doc in about 16 minutes and finished everything in under an hour and a half:

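me% cat dosave
#!/bin/sh
# Copy each backup directory to the USB drive, logging a timestamp per step.
dest=/media/usb/full/2024-0513
list='PERMS doc home root search src usr-local usr-ports var-locate var-log'
mkdir -p "$dest" || exit 1
for d in $list
do
    printf '%s: %s\n' "$(date '+%F %T')" "$d"
    # --preserve=timestamps is GNU cp syntax; FreeBSD's cp has -p instead.
    cp -r --preserve=timestamps "$d" "$dest" || exit 1
done

printf '%s: done\n' "$(date '+%F %T')"
exit 0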

me% sh ./dosave
2024-06-10 22:40:55: PERMS
2024-06-10 22:40:56: doc
2024-06-10 22:57:03: home
2024-06-10 23:18:22: root
2024-06-10 23:21:00: search
2024-06-10 23:27:02: src
2024-06-10 23:57:41: usr-local
2024-06-11 00:02:18: usr-ports
2024-06-11 00:04:59: var-locate
2024-06-11 00:05:06: var-log
2024-06-11 00:05:24: done

I compared hashes of the copied .par2 index files; if cp had failed, it should have said something:

me% cd /archive/bkup
me% find . -name '*.tgz.par2' -print | sort | xargs md5sum > /tmp/sums

me% cd /media/usb/full/2024-0513
me% md5sum -c /tmp/sums
./doc/doc.tgz.par2: OK
./home/home.tgz.par2: OK
./root/root.tgz.par2: OK
./search/search.tgz.par2: OK
./src/src1.tgz.par2: OK
./src/src2.tgz.par2: OK
./src/src3.tgz.par2: OK
./src/src4.tgz.par2: OK
./usr-local/usr-local.tgz.par2: OK
./usr-ports/usr-ports.tgz.par2: OK
./var-locate/var-locate.tgz.par2: OK
./var-log/var-log.tgz.par2: OK
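Hashing only the .par2 index files is a spot check; the tarballs and recovery volumes themselves aren't covered. A sketch of a fuller comparison that hashes every regular file in a source tree and checks the copy against that list. make_sums and check_sums are hypothetical names, and this assumes GNU md5sum as used above. Reading everything back over USB takes a while; running par2verify against the copies on the drive would also check the tarballs.

```shell
#!/bin/sh
# make_sums DIR: print md5sum lines for every regular file under DIR,
# with paths relative to DIR, sorted by path.
make_sums () (
    cd "$1" || exit 1
    find . -type f -exec md5sum {} + | sort -k 2
)

# check_sums SUMFILE DIR: verify the files under DIR against SUMFILE.
# SUMFILE must be an absolute path, since this cd's into DIR first.
check_sums () (
    cd "$2" || exit 1
    md5sum -c "$1"
)
```

For example: make_sums /archive/bkup/doc > /tmp/docsums, then check_sums /tmp/docsums /media/usb/full/2024-0513/doc.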

me% cd
me% df /media/usb
Filesystem 1M-blocks   Used  Avail Capacity  Mounted on
/dev/da1p1    953834 182507 771327    19%    /media/usb

Next, put the incrementals made since the full backups ran into their own, smaller directory, copy those over, and unmount the drive.

15. Copy incrementals

Next steps: merge the daily incrementals into monthly tarballs, make the parity files, and copy everything over.

me% cd /archive/bkup/inc
me% ls -l
-rw-r--r-- 1 vogelke wheel  8108871680 12-Jun-2024 03:17:34 2024-01.tar
-rw-r--r-- 1 vogelke wheel  8011304960 12-Jun-2024 03:19:42 2024-02.tar
-rw-r--r-- 1 vogelke wheel 12224337920 12-Jun-2024 03:21:28 2024-03.tar
-rw-r--r-- 1 vogelke wheel 14371584000 12-Jun-2024 03:23:30 2024-04.tar
-rw-r--r-- 1 vogelke wheel  5588633600 12-Jun-2024 03:24:16 2024-05.tar
-rw-r--r-- 1 vogelke wheel  1546833920 12-Jun-2024 03:24:29 2024-06.tar

me% foreach x (*.tar)
foreach> echo $x
foreach> date
foreach> par2create -r30 $x
foreach> end
2024-01.tar
Wed Jun 12 06:32:30 EDT 2024

Block size: 4054436
Source file count: 1
Source block count: 2000
Recovery block count: 600
Recovery file count: 10

Opening: 2024-01.tar
Computing Reed Solomon matrix.
Constructing: done.
Wrote 134217600 bytes to disk
Wrote 16744800 bytes to disk
Writing recovery packets
Writing verification packets
Done

[...]
2024-06.tar
Wed Jun 12 07:53:05 EDT 2024

Block size: 773420
Source file count: 1
Source block count: 2000
Recovery block count: 600
Recovery file count: 10

Opening: 2024-06.tar
Computing Reed Solomon matrix.
Constructing: done.
Wrote 134217600 bytes to disk
Wrote 61399200 bytes to disk
Writing recovery packets
Writing verification packets
Done

Wed Jun 12 07:55:10 EDT 2024

I use a small awk script ("bytes") to compare total sizes; the parity files add about 30%, roughly 15 GB on top of 49.9 GB of tarballs:

me% bytes *.tar
6 files 49851566080 bytes

me% bytes *.tar*
72 files 64820112016 bytes
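The bytes script isn't shown here. A hypothetical stand-in that sums the size column of ls -ln for regular files; it uses %.0f for the total because some awk implementations mishandle %d with values above 2^31:

```shell
#!/bin/sh
# bytes FILE...: print the number of regular files and their total size.
bytes () {
    command ls -ln -- "$@" | awk '
        /^-/ { n++; total += $5 }    # regular files only; field 5 is the size
        END  { printf "%d files %.0f bytes\n", n, total }'
}
```

With the totals above, 64820112016 / 49851566080 is about 1.30, which is the 30% redundancy requested with -r30.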

16. Copy parity files

This took just under 90 minutes:

me% mkdir -p /media/usb/inc/2024-0611
me% cd /archive/bkup/inc

me% cp -v --preserve=timestamps * /media/usb/inc/2024-0611
'2024-01.tar' -> '/media/usb/inc/2024-0611/2024-01.tar'
'2024-01.tar.par2' -> '/media/usb/inc/2024-0611/2024-01.tar.par2'
'2024-01.tar.vol000+001.par2' ->
    '/media/usb/inc/2024-0611/2024-01.tar.vol000+001.par2'
[...]
'2024-01.tar.vol511+089.par2' ->
    '/media/usb/inc/2024-0611/2024-01.tar.vol511+089.par2'
'2024-02.tar' -> '/media/usb/inc/2024-0611/2024-02.tar'
[...]
'2024-06.tar' -> '/media/usb/inc/2024-0611/2024-06.tar'
[...]
'2024-06.tar.vol511+089.par2' ->
    '/media/usb/inc/2024-0611/2024-06.tar.vol511+089.par2'

Finished at 0920. Abbreviated directory tree:

me% df /media/usb
Filesystem 1M-blocks   Used  Avail Capacity  Mounted on
/dev/da1p1    953834 244324 709510    26%    /media/usb

me% dtree -as /media/usb
/media
|   +--usb
|   |   +--full
|   |   |   +--2024-0513
|   |   |   |   +--doc
|   |   |   |   |   +--doc.tgz
|   |   |   |   |   +--doc.tgz.par2
|   |   |   |   |   +--doc.tgz.vol000+001.par2
|   |   |   |   |   +--doc.tgz.vol001+002.par2
|   |   |   |   |   +--doc.tgz.vol003+004.par2
|   |   |   |   |   +--doc.tgz.vol007+008.par2
|   |   |   |   |   +--doc.tgz.vol015+016.par2
|   |   |   |   |   +--doc.tgz.vol031+032.par2
|   |   |   |   |   +--doc.tgz.vol063+064.par2
|   |   |   |   |   +--doc.tgz.vol127+128.par2
|   |   |   |   |   +--doc.tgz.vol255+256.par2
|   |   |   |   |   +--doc.tgz.vol511+089.par2
|   |   |   |   +--home
|   |   |   |   |   +--home.tgz
|   |   |   |   |   +--home.tgz.par2
|   |   |   |   |   +--home.tgz.vol000+001.par2
|   |   |   |   |   +--home.tgz.vol001+002.par2
|   |   |   |   |   +--home.tgz.vol003+004.par2
|   |   |   |   |   +--home.tgz.vol007+008.par2
|   |   |   |   |   +--home.tgz.vol015+016.par2
|   |   |   |   |   +--home.tgz.vol031+032.par2
|   |   |   |   |   +--home.tgz.vol063+064.par2
|   |   |   |   |   +--home.tgz.vol127+128.par2
|   |   |   |   |   +--home.tgz.vol255+256.par2
|   |   |   |   |   +--home.tgz.vol511+089.par2
[...]
|   |   |   |   +--var-log
|   |   |   |   |   +--var-log.tgz
|   |   |   |   |   +--var-log.tgz.par2
|   |   |   |   |   +--var-log.tgz.vol000+001.par2
|   |   |   |   |   +--var-log.tgz.vol001+002.par2
|   |   |   |   |   +--var-log.tgz.vol003+004.par2
|   |   |   |   |   +--var-log.tgz.vol007+008.par2
|   |   |   |   |   +--var-log.tgz.vol015+016.par2
|   |   |   |   |   +--var-log.tgz.vol031+032.par2
|   |   |   |   |   +--var-log.tgz.vol063+064.par2
|   |   |   |   |   +--var-log.tgz.vol127+128.par2
|   |   |   |   |   +--var-log.tgz.vol255+256.par2
|   |   |   |   |   +--var-log.tgz.vol511+089.par2
|   |   +--inc
|   |   |   +--2024-0611
|   |   |   |   +--2024-01.tar
|   |   |   |   +--2024-01.tar.par2 [...]
|   |   |   |   +--2024-02.tar
|   |   |   |   +--2024-02.tar.par2
|   |   |   |   +--2024-03.tar
|   |   |   |   +--2024-03.tar.par2
|   |   |   |   +--2024-04.tar
|   |   |   |   +--2024-04.tar.par2
|   |   |   |   +--2024-05.tar
|   |   |   |   +--2024-05.tar.par2
|   |   |   |   +--2024-06.tar
|   |   |   |   +--2024-06.tar.par2

17. Unmount the drive and clean up

We're done.

root# umount /media/usb

18. Scripts

Here are the scripts I used.

Hope someone finds this useful.

19. Feedback

Feel free to send comments.


Generated from article.t2t by txt2tags
$Revision: 1.5 $
$UUID: fae996a7-8575-31a8-911d-3ff83a67bcd9 $