This is how I copied a full backup with parity files to a removable drive. The instructions are for a FreeBSD system, but other than the USB stuff, they should be (more or less) usable on Linux.
All the scripts I used are available at the end of the article.
Here's the removable drive I bought:
https://www.westerndigital.com/products/portable-drives/wd-easystore-portable-3-0-hdd

    Capacity:   1 TB
    Interface:  USB 3.2 Gen 1
    Connector:  Micro B
    S/N:        WDBAJN0010BBK-WESE

    System Requirements
      * Formatted NTFS
      * Windows 10+
      * Reformatting required for macOS 11+.
    Note: Compatibility may vary depending on user's hardware
    configuration and operating system.

    Dimensions (L x W x H): 4.33" x 3.21" x 0.5"

    In The Box
      * Portable hard drive
      * SuperSpeed USB-A cable (5Gbps)
      * Software for device management and backup
      * Quick install guide
Support for USB storage devices is built into the GENERIC kernel. For a custom kernel, be sure that the following lines are present in the kernel configuration file:
    device scbus    # SCSI bus (required for ATA/SCSI)
    device da       # Direct Access (disks)
    device pass     # Passthrough device (direct ATA/SCSI access)
    device uhci     # provides USB 1.x support
    device ohci     # provides USB 1.x support
    device ehci     # provides USB 2.0 support
    device xhci     # provides USB 3.0 support
    device usb      # USB Bus (required)
    device umass    # Disks/Mass storage - Requires scbus and da
    device cd       # needed for CD and DVD burners
Local check:
    me% uname -snvm
    FreeBSD hairball FreeBSD 13.2-RELEASE-p4 GENERIC amd64

    me% ls -l /usr/src/sys/amd64/conf
    -rw-r--r-- 1 root wheel   412 06-Apr-2023 20:34:41 DEFAULTS
    -rw-r--r-- 1 root wheel 15260 06-Apr-2023 20:34:41 GENERIC
    -rw-r--r-- 1 root wheel    68 06-Apr-2023 20:34:41 GENERIC-KASAN
    ...
    -rw-r--r-- 1 root wheel  5451 06-Apr-2023 20:34:41 MINIMAL
    -rw-r--r-- 1 root wheel 19154 06-Apr-2023 20:34:41 NOTES

    me% cat /tmp/devs
    device cd
    device da
    device ehci
    device ohci
    device pass
    device scbus
    device uhci
    device umass
    device usb
    device xhci

    me% expand -1 /usr/src/sys/amd64/conf/GENERIC | grep -f /tmp/devs -
    device scbus # SCSI bus (required for ATA/SCSI)
    device da # Direct Access (disks)
    device cd # CD
    device pass # Passthrough device (direct ATA/SCSI access)
    device uhci # UHCI PCI->USB interface
    device ohci # OHCI PCI->USB interface
    device ehci # EHCI PCI->USB interface (USB 2.0)
    device xhci # XHCI PCI->USB interface (USB 3.0)
    device usb # USB Bus (required)
    device umass # Disks/Mass storage - Requires scbus and da
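The same check can be wrapped in a small function that names whatever is missing instead of leaving you to eyeball the grep output. This is only a sketch: the function name missing_devices is made up for illustration, and it assumes the configuration file uses the usual "device NAME" layout shown above.

```shell
# Report any "device NAME" lines missing from a kernel configuration file.
# Usage: missing_devices CONFIG NAME...
# (missing_devices is a hypothetical helper, not part of FreeBSD.)
missing_devices() {
    conf=$1
    shift
    rc=0
    for dev in "$@"; do
        # "device", whitespace, the exact name, then whitespace or end of line
        if ! grep -Eq "^device[[:space:]]+${dev}([[:space:]]|\$)" "$conf"; then
            echo "missing: $dev"
            rc=1
        fi
    done
    return $rc
}
```

For example, "missing_devices /usr/src/sys/amd64/conf/GENERIC scbus da umass xhci" should print nothing and return 0 on a stock GENERIC configuration.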
To test the USB configuration, plug in the USB device. Use dmesg to confirm that the drive appears in the system message buffer:
    [566112] usb_msc_auto_quirk: UQ_MSC_NO_TEST_UNIT_READY set for USB mass storage device Western Digital easystore 2648 (0x1058:0x2648)
    [566113] usb_msc_auto_quirk: UQ_MSC_NO_PREVENT_ALLOW set for USB mass storage device Western Digital easystore 2648 (0x1058:0x2648)
    [566113] ugen0.3: <Western Digital easystore 2648> at usbus0
    [566113] umass1 on uhub1
    [566113] umass1: <Western Digital easystore 2648, class 0/0, rev 3.20/10.34, addr 2> on usbus0
    [566113] umass1: SCSI over Bulk-Only; quirks = 0x8001
    [566113] umass1:7:1: Attached to scbus7
    [566113] pass7 at umass-sim1 bus 1 scbus7 target 0 lun 0
    [566113] pass7: <WD easystore 2648 1034> Fixed Direct Access SPC-4 SCSI device
    [566113] pass7: Serial Number 575833324137314556444435
    [566113] pass7: 400.000MB/s transfers
    [566113] da1 at umass-sim1 bus 1 scbus7 target 0 lun 0
    da1: <WD easystore 2648 1034> Fixed Direct Access SPC-4 SCSI device
    da1: Serial Number 575833324137314556444435
    da1: 400.000MB/s transfers
    da1: 953837MB (1953458176 512 byte sectors)
    da1: quirks=0x2<NO_6_BYTE>
    [566114] da1: Delete methods: <UNMAP(*),ZERO>
    [566114] GEOM: new disk da1
    [566114] pass8 at umass-sim1 bus 1 scbus7 target 0 lun 1
    [566114] pass8: <WD SES Device 1034> Fixed Enclosure Services SPC-4 SCSI device
    [566114] pass8: Serial Number 575833324137314556444435
    [566114] pass8: 400.000MB/s transfers
    [566114] ses1 at umass-sim1 bus 1 scbus7 target 0 lun 1
    ses1: <WD SES Device 1034> Fixed Enclosure Services SPC-4 SCSI device
    ses1: Serial Number 575833324137314556444435
    ses1: 400.000MB/s transfers
    ses1: SES Device
Bracketed numbers are seconds since last reboot. I track reboots in /var/log/reboot:
    me% cat -n /var/log/reboot
         1  freebsd-13.2-release 2023-06-27 01:57:48 -0400
         2  freebsd-13.2-release 2023-06-27 03:19:50 -0400
        ...
        24  freebsd-13.2-release-p4 2024-06-01 04:33:56 -0400
        25  freebsd-13.2-release-p4 2024-06-01 14:21:26 -0400

    me% date -d '2024-06-01 14:21:26' '+%s'
    1717266086

    me% echo 1717266086 + 566112 | bc
    1717832198

    me% date -d @1717832198
    Sat Jun 8 03:36:38 EDT 2024
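The arithmetic above is simple enough to keep as a one-line helper. A minimal sketch, assuming you already have the boot time as seconds since the epoch; the name dmesg_to_epoch is made up for this example.

```shell
# Convert a dmesg timestamp (seconds since boot) to seconds since the epoch.
# Usage: dmesg_to_epoch BOOT_EPOCH SECONDS_SINCE_BOOT
# (dmesg_to_epoch is a hypothetical helper name.)
dmesg_to_epoch() {
    echo $(( $1 + $2 ))
}
```

With the numbers from this article, "dmesg_to_epoch 1717266086 566112" prints 1717832198. The "date -d @N" form used above is GNU date; the date that ships with FreeBSD takes "date -r N" for the same conversion.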
That's just about right. The system sees the device:

    root# ls -lF /dev/da1*
    crw-r----- 1 root operator 0, 195 08-Jun-2024 03:36:16 /dev/da1
    crw-r----- 1 root operator 0, 196 08-Jun-2024 03:36:16 /dev/da1p1
Since the USB device is seen as a SCSI one, camcontrol can be used to list the USB storage devices attached to the system:
    root# camcontrol devlist
    ...
    <WD easystore 2648 1034> at scbus7 target 0 lun 0 (da1,pass7)
    <WD SES Device 1034> at scbus7 target 0 lun 1 (ses1,pass8)
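If you want the da device name in a script rather than reading it off the screen, the parenthesized pair at the end of each line can be picked apart with sed. This is a sketch under the assumption that the output always looks like the lines above; da_devices is a made-up name.

```shell
# Read "camcontrol devlist" output on stdin and print each da device name.
# The name must follow "(" or "," so that "ada0" is not mistaken for "da0".
# (da_devices is a hypothetical helper name.)
da_devices() {
    sed -n 's/.*[(,]\(da[0-9][0-9]*\)[,)].*/\1/p'
}
```

Run as root, "camcontrol devlist | da_devices" would print da1 for the listing above.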
Alternatively, usbconfig can be used to list the device. Refer to usbconfig(8) for more information about this command.

    root# usbconfig
    ugen0.3: <Western Digital easystore 2648> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (224mA)
NTFS is the most portable and reliable filesystem I've seen for a removable drive. FreeBSD has dropped native support for NTFS, so use the sysutils/fusefs-ntfs port.
Before I can use FUSE for anything, I have to make sure it's loaded. Previous versions of FreeBSD required "fuse.ko"; now it's "fusefs.ko":
    root# kldstat
    Id Refs Address                Size Name
     1   40 0xffffffff80200000  1f3e2d0 kernel
     2    1 0xffffffff8213f000     3728 coretemp.ko
     3    1 0xffffffff82143000   59e008 zfs.ko
     4    1 0xffffffff826e2000     2870 accf_data.ko
     5    1 0xffffffff826e5000     a4a0 cryptodev.ko
     6    1 0xffffffff82ce5000     3530 fdescfs.ko
     7    1 0xffffffff82ce9000     3060 mac_portacl.ko
     8    1 0xffffffff82ced000    31a80 linux.ko
     9    1 0xffffffff82d1f000     be88 linux_common.ko
    10    1 0xffffffff82d2b000    14b98 netlink.ko
    11    1 0xffffffff82d40000     3250 ichsmb.ko
    12    1 0xffffffff82d44000     2180 smbus.ko
    13    1 0xffffffff82d47000     2a08 mac_ntpd.ko

    root# kldload fusefs

    root# kldstat
    Id Refs Address                Size Name
     1   42 0xffffffff80200000  1f3e2d0 kernel
     2    1 0xffffffff8213f000     3728 coretemp.ko
    ..
    13    1 0xffffffff82d47000     2a08 mac_ntpd.ko
    14    1 0xffffffff82d4a000    11cd8 fusefs.ko
If you need fusefs(5) during boot, put this in /boot/loader.conf:
fusefs_load="YES"
If you don't need it at boot time, use this in /etc/rc.conf:
kld_list="fusefs"
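For a script that mounts the drive on demand, it helps to test whether the module is already there before trying to load it. A minimal sketch that parses the kldstat listing format shown earlier; module_listed is a made-up name, and it only looks at the last column.

```shell
# Succeed if the kldstat output on stdin lists MODULE.ko in its Name column.
# Usage: kldstat | module_listed MODULE
# (module_listed is a hypothetical helper name.)
module_listed() {
    awk -v want="$1.ko" '$NF == want { found = 1 } END { exit !found }'
}
```

As root, "kldstat | module_listed fusefs || kldload fusefs" would then load the module only when it is missing.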
I use a function called "bsdpath" to switch to a BSD-style build environment, because my default setup includes GNU make and that won't work:
    bsdpath () {
        x="$HOME/.bsdpath"
        if test -f "$x"
        then
            echo "Setting BSD-build PATH"
            eval setenv $(head -1 "$x")
            export BSDPATH=1
        else
            echo "Sorry, $x not found"
        fi
        echo "PATH=$PATH"
    }
Here's $HOME/.bsdpath -- it's a Bernstein-style configuration file:
    PATH /sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin:/root/bin

    Use this for compiling ports, etc. or anything requiring BSD make.
The "defpath" function moves me back to my default build environment:
    defpath () {
        x="$HOME/.path"
        if test -f "$x"
        then
            echo "Setting default PATH"
            eval setenv $(head -1 "$x")
            unset BSDPATH
        else
            echo "Sorry, $x not found"
        fi
        echo "PATH=$PATH"
    }
Here's $HOME/.path:
PATH /root/bin:/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/bin:/usr/bin
Set up the build environment and run make:
    root# bsdpath
    root# echo $PATH
    /sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin:/root/bin
    root# cd /usr/ports/sysutils/fusefs-ntfs
    root# make
You can see the full make output here
    root# make package
    ==> Building package for fusefs-ntfs-2022.10.3

    root# ls -l work/pkg
    total 1
    -rw-r--r-- 1 root wheel 556432 Jun 8 04:07 fusefs-ntfs-2022.10.3.pkg

    root# mv work/pkg/fusefs-ntfs-2022.10.3.pkg /usr/packages/local

    root# make install
    ==> Installing for fusefs-ntfs-2022.10.3
    ==> Checking if fusefs-ntfs is already installed
    ==> Registering installation for fusefs-ntfs-2022.10.3
    Installing fusefs-ntfs-2022.10.3...
    Jun 8 04:07:43 hairball pkg-static[68428]: fusefs-ntfs-2022.10.3 installed

    NTFS-3G has been installed. It requires fusefs(5) support to operate,
    so issue the ``kldload fusefs'' command or ``sysrc kld_list+=fusefs''
    to make it load automatically when the system starts.

    For further information, implementation details, and known issues
    see the FreeBSD README (/usr/local/share/doc/ntfs-3g/README.FreeBSD)
    in addition to the official README (contains some Linux-specific parts).

    root# make clean
    ==> Cleaning for libublio-20070103_3
    ==> Cleaning for fusefs-libs-2.9.9_2
    ==> Cleaning for fusefs-ntfs-2022.10.3

    root# defpath
To mount the drive from /etc/fstab, the entry must name ntfs-3g as the mount program:
/dev/da1p1 /mnt ntfs mountprog=/usr/local/bin/ntfs-3g,noauto,late,rw 0 0
After plugging the drive in, I tried mounting:
    root# mkdir -p /media/usb
    root# ls -lF /media
    drwxr-xr-x 2 root wheel 2 08-Jun-2024 04:27:09 usb/

    root# ntfs-3g /dev/da1p1 /media/usb
    root# df /media/usb
    Filesystem 1M-blocks Used  Avail Capacity Mounted on
    /dev/da1p1    953834  115 953719     0%   /media/usb

    root# mount|grep usb
    /dev/da1p1 on /media/usb (fusefs)

    root# cd /media/usb
    root# ls
    'Install Western Digital Software for Mac.dmg'
    'Install Western Digital Software for Windows.exe'*
ALWAYS unmount the filesystem, or you're damned likely to lose anything copied to that drive!
Next, do a full backup to that drive. I want belt and suspenders: copy regular tarballs to the drive, and add PAR2 parity files to protect against corruption.
I installed par2cmdline from FreeBSD ports -- you can probably find a package for your system or just build from source:
    root# cd /usr/ports/archivers/par2cmdline
    root# make
    root# make install
    root# make clean
My backups are compressed tarballs under /archive/tmp:
    -rw-r--r-- 1 vogelke mis 26325997473 09-May-2024 17:15:22 doc.tgz
    -rw-r--r-- 1 vogelke mis 34948146727 09-May-2024 17:44:15 home.tgz
    -rw-r--r-- 1 vogelke mis  4374343284 08-May-2024 10:46:18 root.tgz
    -rw-r--r-- 1 vogelke sys 10154228294 13-May-2024 03:55:35 search.tgz
    -rw-r--r-- 1 vogelke mis 22855671471 11-May-2024 05:45:44 src1.tgz
    -rw-r--r-- 1 vogelke mis 18746714446 11-May-2024 06:08:28 src2.tgz
    -rw-r--r-- 1 vogelke mis 15312047928 11-May-2024 06:20:16 src3.tgz
    -rw-r--r-- 1 vogelke mis     1657280 11-May-2024 05:32:33 src4.tgz
    -rw-r--r-- 1 vogelke mis  8043896411 09-May-2024 17:58:45 usr-local.tgz
    -rw-r--r-- 1 vogelke mis  5308775187 09-May-2024 18:05:15 usr-ports.tgz
    -rw-r--r-- 1 vogelke mis   260641384 09-May-2024 18:05:34 var-locate.tgz
    -rw-r--r-- 1 vogelke mis   730648893 09-May-2024 19:03:25 var-log.tgz
I couldn't store a single tarball for my source directory -- not enough room, so I split it up. I used a script similar to this for my other datasets:
    me% cat bkup
    cd /backup/full/2022/1222 || exit 1
    date; mkdir -p doc
    date; ( cd doc && tar xzf /archive/tmp/doc.tgz )
    date; mkdir -p home
    date; ( cd home && tar xzf /archive/tmp/home.tgz )
    date; mkdir -p search
    date; ( cd search && tar xzf /archive/tmp/search.tgz )
    date; mkdir -p src
    date; ( cd src && tar xzf /archive/tmp/src.tgz )
    date; mkdir -p usr/local
    date; ( cd usr/local && tar xzf /archive/tmp/usr-local.tgz )
    date; mkdir -p var/locate
    date; ( cd var/locate && tar xzf /archive/tmp/var-locate.tgz )
    date; mkdir -p var/log
    date; ( cd var/log && tar xzf /archive/tmp/var-log.tgz )
    date; exit 0
A directory called PERMS holds files storing the directory permissions. Those are the ones that most often need fixing after restoring from any type of archive.
See this for how it's done.
I moved the tarballs into their own directories so I could create parity files for each one.
    /archive/tmp
    +--PERMS
    |  +--doc
    |  +--home
    |  +--root
    |  +--search
    |  +--src
    |  +--usr-local
    |  +--usr-ports
    |  +--var-locate
    |  +--var-log
    +--doc
    |  +--doc.tgz
    +--home
    |  +--home.tgz
    +--root
    |  +--root.tgz
    +--search
    |  +--search.tgz
    +--src
    |  +--src1.tgz
    |  +--src2.tgz
    |  +--src3.tgz
    |  +--src4.tgz
    +--usr-local
    |  +--usr-local.tgz
    +--usr-ports
    |  +--usr-ports.tgz
    +--var-locate
    |  +--var-locate.tgz
    +--var-log
    |  +--var-log.tgz
I want 30% redundancy for each tarball. Try a small file first.
    root# mkdir /archive/work
    root# chmod 1777 /archive/work
    me% cp /archive/tmp/usr-local/usr-local.tgz /archive/work/tst.tgz
    me% cd /archive/work
    me% ls -l
    -rw-r--r-- 1 vogelke wheel 8043896411 09-Jun-2024 03:54:20 tst.tgz

I ran the first par2create test in the same directory to get an idea of elapsed time. I'll compare using a 1-TB SSD.
    me% date; par2create -r30 tst.tgz; date
    Sun Jun 9 03:55:50 EDT 2024
    Block size: 4021952
    Source file count: 1
    Source block count: 2000
    Recovery block count: 600
    Recovery file count: 10
    Opening: tst.tgz
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 134217600 bytes to disk
    Wrote 131472000 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
    Sun Jun 9 04:07:15 EDT 2024
Files created:
    8043896411 09-Jun-2024 03:54:20 tst.tgz
         40404 09-Jun-2024 04:07:15 tst.tgz.par2
       4062424 09-Jun-2024 04:07:15 tst.tgz.vol000+001.par2
       8124744 09-Jun-2024 04:07:15 tst.tgz.vol001+002.par2
      16209084 09-Jun-2024 04:07:15 tst.tgz.vol003+004.par2
      32337464 09-Jun-2024 04:07:15 tst.tgz.vol007+008.par2
      64553924 09-Jun-2024 04:07:15 tst.tgz.vol015+016.par2
     128946544 09-Jun-2024 04:07:15 tst.tgz.vol031+032.par2
     257691484 09-Jun-2024 04:07:15 tst.tgz.vol063+064.par2
     515141064 09-Jun-2024 04:07:15 tst.tgz.vol127+128.par2
    1029999924 09-Jun-2024 04:07:15 tst.tgz.vol255+256.par2
     358241984 09-Jun-2024 04:07:15 tst.tgz.vol511+089.par2
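The block size and recovery block count that par2create reported can be reproduced with a little arithmetic, which is handy for estimating parity size before a long run. This is a sketch that matches the figures in this article (file size divided into 2000 blocks, rounded up to a multiple of 4 bytes); it is an assumption that par2cmdline behaves this way in every case, and both function names are made up.

```shell
# Estimated PAR2 block size for a file of SIZE bytes split into COUNT blocks
# (default 2000): ceil(SIZE / COUNT), rounded up to a multiple of 4.
# (par2_block_size and par2_recovery_blocks are hypothetical helper names.)
par2_block_size() {
    size=$1
    count=${2:-2000}
    bs=$(( (size + count - 1) / count ))
    echo $(( (bs + 3) / 4 * 4 ))
}

# Number of recovery blocks for COUNT source blocks at PCT percent redundancy.
par2_recovery_blocks() {
    echo $(( $1 * $2 / 100 ))
}
```

For tst.tgz, "par2_block_size 8043896411" prints 4021952 and "par2_recovery_blocks 2000 30" prints 600, the same numbers par2create printed. Multiplying the two gives 2413171200 bytes of recovery data, the sum of the "Wrote ... bytes to disk" lines in the run.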
Now corrupt the file and check it:
    me% md5sum tst.tgz
    3248ba7cd057ae3a26380b7885ea8b72 tst.tgz

    me% dd if=/dev/zero seek=1000 bs=1 count=10000 of=tst.tgz conv=notrunc
    10000+0 records in
    10000+0 records out
    10000 bytes (10 kB, 9.8 KiB) copied, 0.0258182 s, 387 kB/s

    me% dd if=/dev/zero seek=10000 bs=1 count=10000 of=tst.tgz conv=notrunc
    10000+0 records in
    10000+0 records out
    10000 bytes (10 kB, 9.8 KiB) copied, 0.0256597 s, 390 kB/s

    me% dd if=/dev/zero seek=100000 bs=1 count=10000 of=tst.tgz conv=notrunc
    10000+0 records in
    10000+0 records out
    10000 bytes (10 kB, 9.8 KiB) copied, 0.0255272 s, 392 kB/s

    me% md5sum tst.tgz
    a171d62aa0e6568f8e30d5116944d133 tst.tgz
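The three dd commands differ only in their offset, so the corruption step can be wrapped in a function for repeated tests. A minimal sketch; zero_range is a made-up name, and like the commands above it relies on conv=notrunc to keep the file length unchanged.

```shell
# Overwrite COUNT bytes of FILE with zeros starting at byte OFFSET,
# leaving the file's length unchanged.
# Usage: zero_range FILE OFFSET COUNT
# (zero_range is a hypothetical helper name.)
zero_range() {
    dd if=/dev/zero of="$1" bs=1 seek="$2" count="$3" conv=notrunc 2>/dev/null
}
```

The equivalent of the session above would be "zero_range tst.tgz 1000 10000" and so on. All three damaged ranges end before byte 110000, well inside the first 4021952-byte block, so only a single data block should turn up damaged.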
See if it can be fixed.
    me% par2verify tst.tgz.par2
    Loading "tst.tgz.par2".
    Loaded 4 new packets
    Loading "tst.tgz.vol127+128.par2".
    Loaded 128 new packets including 128 recovery blocks
    Loading "tst.tgz.vol000+001.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "tst.tgz.vol015+016.par2".
    Loaded 16 new packets including 16 recovery blocks
    Loading "tst.tgz.vol007+008.par2".
    Loaded 8 new packets including 8 recovery blocks
    Loading "tst.tgz.vol063+064.par2".
    Loaded 64 new packets including 64 recovery blocks
    Loading "tst.tgz.vol003+004.par2".
    Loaded 4 new packets including 4 recovery blocks
    Loading "tst.tgz.vol511+089.par2".
    Loaded 89 new packets including 89 recovery blocks
    Loading "tst.tgz.vol031+032.par2".
    Loaded 32 new packets including 32 recovery blocks
    Loading "tst.tgz.vol001+002.par2".
    Loaded 2 new packets including 2 recovery blocks
    Loading "tst.tgz.vol255+256.par2".
    Loaded 256 new packets including 256 recovery blocks
    Loading "tst.tgz.par2".
    No new packets found

    There are 1 recoverable files and 0 other files.
    The block size used was 4021952 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 8043896411 bytes.

    Verifying source files:
    Opening: "tst.tgz"
    Target: "tst.tgz" - damaged. Found 1999 of 2000 data blocks.

    Scanning extra files:

    Repair is required.
    1 file(s) exist but are damaged.
    You have 1999 out of 2000 data blocks available.
    You have 600 recovery blocks available.
    Repair is possible.
    You have an excess of 599 recovery blocks.
    1 recovery blocks will be used to repair.
I have a ton of spare repair capacity, which is good. Try to repair the archive:
    me% date; par2repair tst.tgz.par2; date
    Sun Jun 9 04:18:33 EDT 2024
    Loading "tst.tgz.par2".
    Loaded 4 new packets
    Loading "tst.tgz.vol127+128.par2".
    Loaded 128 new packets including 128 recovery blocks
    Loading "tst.tgz.vol000+001.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "tst.tgz.vol015+016.par2".
    Loaded 16 new packets including 16 recovery blocks
    Loading "tst.tgz.vol007+008.par2".
    Loaded 8 new packets including 8 recovery blocks
    Loading "tst.tgz.vol063+064.par2".
    Loaded 64 new packets including 64 recovery blocks
    Loading "tst.tgz.vol003+004.par2".
    Loaded 4 new packets including 4 recovery blocks
    Loading "tst.tgz.vol511+089.par2".
    Loaded 89 new packets including 89 recovery blocks
    Loading "tst.tgz.vol031+032.par2".
    Loaded 32 new packets including 32 recovery blocks
    Loading "tst.tgz.vol001+002.par2".
    Loaded 2 new packets including 2 recovery blocks
    Loading "tst.tgz.vol255+256.par2".
    Loaded 256 new packets including 256 recovery blocks
    Loading "tst.tgz.par2".
    No new packets found

    There are 1 recoverable files and 0 other files.
    The block size used was 4021952 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 8043896411 bytes.

    Verifying source files:
    Opening: "tst.tgz"
    Target: "tst.tgz" - damaged. Found 1999 of 2000 data blocks.

    Scanning extra files:

    Repair is required.
    1 file(s) exist but are damaged.
    You have 1999 out of 2000 data blocks available.
    You have 600 recovery blocks available.
    Repair is possible.
    You have an excess of 599 recovery blocks.
    1 recovery blocks will be used to repair.

    Computing Reed Solomon matrix.
    Constructing: done.
    Solving: done.
    Wrote 8043896411 bytes to disk

    Verifying repaired files:
    Opening: "tst.tgz"
    Target: "tst.tgz" - found.

    Repair complete.
    Sun Jun 9 04:21:43 EDT 2024

About 3 minutes to repair 8 GB on spinning rust isn't bad.

Mess up the file more. Since I have about 30% redundancy, overwrite a much bigger part of the file with random junk (four 120-MiB writes in the test script further down) and see what happens.
    me% cd /tmp
    me% dd if=/dev/urandom of=junk bs=1M count=30
    me% cat junk junk > 2 ; mv 2 junk
    me% cat junk junk > 2 ; mv 2 junk
    me% ls -l junk
    -rw-r--r-- 1 vogelke wheel 125829120 09-Jun-2024 04:30:54 /tmp/junk
Trying to create repair files in one directory for data files in another directory is too much trouble -- I kept getting messages asking me to specify a repair file or list of data files. Easier to just copy them to /var/tmp/work.
Looks like this is more CPU-bound than IO-bound, which is fine. Test script:
    me% cat doit
    date; par2create -r30 tst.tgz; date
    dd if=/tmp/junk seek=500 bs=1M of=tst.tgz conv=notrunc
    dd if=/tmp/junk seek=1000 bs=1M of=tst.tgz conv=notrunc
    dd if=/tmp/junk seek=1500 bs=1M of=tst.tgz conv=notrunc
    dd if=/tmp/junk seek=2000 bs=1M of=tst.tgz conv=notrunc
    date; par2verify tst.tgz.par2; date
    date; par2repair tst.tgz.par2; date
Abbreviated results from running the script:
    me% sh -x ./doit
    + date
    Sun Jun 9 06:02:17 EDT 2024
    + par2create -r30 tst.tgz
    Block size: 4021952
    Source file count: 1
    Source block count: 2000
    Recovery block count: 600
    Recovery file count: 10
    Opening: tst.tgz
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 134217600 bytes to disk
    [...]
    Wrote 131472000 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
    + date
    Sun Jun 9 06:12:27 EDT 2024
    + dd 'if=/tmp/junk' 'seek=500' 'bs=1M' 'of=tst.tgz' 'conv=notrunc'
    120+0 records in
    120+0 records out
    125829120 bytes (126 MB, 120 MiB) copied, 0.0285935 s, 4.4 GB/s
    [...]
    + date
    Sun Jun 9 06:12:27 EDT 2024
    + par2verify tst.tgz.par2
    Loading "tst.tgz.par2".
    Loaded 4 new packets
    Loading "tst.tgz.vol063+064.par2".
    Loaded 64 new packets including 64 recovery blocks
    [...]
    There are 1 recoverable files and 0 other files.
    The block size used was 4021952 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 8043896411 bytes.

    Verifying source files:
    Opening: "tst.tgz"
    Target: "tst.tgz" - damaged. Found 1872 of 2000 data blocks.

    Scanning extra files:

    Repair is required.
    1 file(s) exist but are damaged.
    You have 1872 out of 2000 data blocks available.
    You have 600 recovery blocks available.
    Repair is possible.
    You have an excess of 472 recovery blocks.
    128 recovery blocks will be used to repair.
    + date
    Sun Jun 9 06:13:33 EDT 2024
    + par2repair tst.tgz.par2
    Loading "tst.tgz.par2".
    Loaded 4 new packets
    Loading "tst.tgz.vol063+064.par2".
    Loaded 64 new packets including 64 recovery blocks
    [...]
    There are 1 recoverable files and 0 other files.
    The block size used was 4021952 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 8043896411 bytes.

    Verifying source files:
    Opening: "tst.tgz"
    Target: "tst.tgz" - damaged. Found 1872 of 2000 data blocks.

    Scanning extra files:

    Repair is required.
    1 file(s) exist but are damaged.
    You have 1872 out of 2000 data blocks available.
    You have 600 recovery blocks available.
    Repair is possible.
    You have an excess of 472 recovery blocks.
    128 recovery blocks will be used to repair.

    Computing Reed Solomon matrix.
    Constructing: done.
    Solving: done.
    [...]
    Repair complete.
    + date
    Sun Jun 9 06:18:27 EDT 2024
Here's the complete script output.
30% seems to be overkill. If space is a problem, use 20% instead. The parity files add about 30% to the size of each archive, which makes sense -- no free lunch.
I wrote a script called doparity to handle multiple archive files. It writes short lines to syslog and full output under /var/log/usb.
    Jun 10 01:37:02 doparity[9273]: start
    Jun 10 01:37:02 doparity[10124]: par2create -r30 var-locate.tgz
    Jun 10 01:37:24 doparity[12120]: par2verify var-locate.tgz.par2
    Jun 10 01:37:26 doparity[12765]: par2create -r30 var-log.tgz
    Jun 10 01:38:20 doparity[14475]: par2verify var-log.tgz.par2
    Jun 10 01:38:25 doparity[15129]: done

The output contains carriage returns, which make it hard to read, and using "-q" removes any useful output, so I clean it up with a sed script. Here are the results.
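For readers without the author's sed script, here is one way such a cleanup could look. It is a sketch, not the script used for this article: it assumes the progress updates are carriage-return-terminated lines ending in a percentage (for example "Constructing: 45.3%"), and clean_par2_log is a made-up name.

```shell
# Turn carriage returns into newlines, then drop percentage-progress
# lines and empty lines from par2 output read on stdin.
# (clean_par2_log is a hypothetical helper; the progress-line format
# is an assumption.)
clean_par2_log() {
    tr '\r' '\n' | sed -e '/[0-9]%$/d' -e '/^$/d'
}
```

Something like "par2create -r30 doc.tgz 2>&1 | clean_par2_log" would then leave only the summary lines. Note that this version also removes the blank lines par2 prints between sections.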
Running this on a big archive (26 GB) gives my system a workout:

    me% w
    2:45AM up 8 days, 12:24, 1 user, load averages: 3.07, 1.53, 0.83

    me% top
    ...
      PID USERNAME THR PRI NICE SIZE  RES STATE   WCPU    COMMAND
    32836 vogelke    4  52    0 220M 142M zio->i  380.68% par2
Script started at 0239, finished at 0757. Abbreviated output:
    2024-06-10 02:39:16.476-04 doparity: par2create -r30 doc.tgz
    Block size: 13163000
    Source file count: 1
    Source block count: 2000
    Recovery block count: 600
    Recovery file count: 10
    Opening: doc.tgz
    Computing Reed Solomon matrix.
    Writing recovery packets
    Writing verification packets
    Done

    2024-06-10 03:23:59.006-04 doparity: par2verify doc.tgz.par2
    Loading "doc.tgz.par2".
    Loading "doc.tgz.vol003+004.par2".
    ...
    Loading "doc.tgz.vol007+008.par2".
    Loading "doc.tgz.par2".
    There are 1 recoverable files and 0 other files.
    The block size used was 13163000 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 26325997473 bytes.
    Verifying source files:
    Opening: "doc.tgz"
    All files are correct, repair is not required.
    [...]

    2024-06-10 03:31:32.439-04 doparity: par2create -r30 home.tgz
    2024-06-10 05:05:30.548-04 doparity: par2create -r30 root.tgz
    2024-06-10 05:15:20.715-04 doparity: par2create -r30 search.tgz
    2024-06-10 05:36:40.943-04 doparity: par2create -r30 src1.tgz
    2024-06-10 06:25:33.170-04 doparity: par2create -r30 src2.tgz
    2024-06-10 07:03:40.108-04 doparity: par2create -r30 src3.tgz
    2024-06-10 07:34:13.293-04 doparity: par2create -r30 src4.tgz
    2024-06-10 07:34:13.729-04 doparity: par2create -r30 usr-local.tgz
    2024-06-10 07:49:01.738-04 doparity: par2create -r30 usr-ports.tgz
    2024-06-10 07:56:50.077-04 doparity: par2create -r30 var-locate.tgz
    2024-06-10 07:56:53.829-04 doparity: par2create -r30 var-log.tgz
    2024-06-10 07:57:06.615-04 doparity: done
Copied new files to the removable drive. I have to mount it as root, but I can copy files as myself.
    me% ntfs-3g /dev/da1p1 /media/usb
    Error opening read-only '/dev/da1p1': Permission denied
    Failed to mount '/dev/da1p1': Permission denied
    Please check '/dev/da1p1' and the ntfs-3g binary permissions,
    and the mounting user ID. More explanation is provided at
    https://github.com/tuxera/ntfs-3g/wiki/NTFS-3G-FAQ

    me% sudo /usr/local/bin/ntfs-3g /dev/da1p1 /media/usb
    me% df /media/usb
    Filesystem 1M-blocks Used  Avail Capacity Mounted on
    /dev/da1p1    953834  115 953719     0%   /media/usb
I tried cpio first because that's my go-to for copies. I got as far as home.tgz after 30 minutes:
    me% cd /archive/bkup
    me% find PERMS doc home root search src usr-local \
          usr-ports var-locate var-log | sort | cpio -pdumv /media/usb/full/2024-0513
    /media/usb/full/2024-0513/PERMS
    /media/usb/full/2024-0513/PERMS/doc
    /media/usb/full/2024-0513/PERMS/home
    /media/usb/full/2024-0513/PERMS/root
    /media/usb/full/2024-0513/PERMS/search
    /media/usb/full/2024-0513/PERMS/src
    /media/usb/full/2024-0513/PERMS/usr-local
    /media/usb/full/2024-0513/PERMS/usr-ports
    /media/usb/full/2024-0513/PERMS/var-locate
    /media/usb/full/2024-0513/PERMS/var-log
    /media/usb/full/2024-0513/doc
    /media/usb/full/2024-0513/doc/doc.tgz
    /media/usb/full/2024-0513/doc/doc.tgz.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol000+001.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol001+002.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol003+004.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol007+008.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol015+016.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol031+032.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol063+064.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol127+128.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol255+256.par2
    /media/usb/full/2024-0513/doc/doc.tgz.vol511+089.par2
    /media/usb/full/2024-0513/home
    /media/usb/full/2024-0513/home/home.tgz^C
Using cp went a bit faster:
    me% cat dosave
    #!/bin/sh
    # Copy each backup directory to the removable drive, logging start times.
    list='PERMS doc home root search src usr-local usr-ports var-locate var-log'
    for d in $list
    do
        t=$(date "+%F %T")
        printf '%s: %s\n' "$t" "$d"
        cp -r --preserve=timestamps "$d" /media/usb/full/2024-0513
    done
    t=$(date "+%F %T")
    printf '%s: done\n' "$t"
    exit 0

    me% sh ./dosave
    2024-06-10 22:40:55: PERMS
    2024-06-10 22:40:56: doc
    2024-06-10 22:57:03: home
    2024-06-10 23:18:22: root
    2024-06-10 23:21:00: search
    2024-06-10 23:27:02: src
    2024-06-10 23:57:41: usr-local
    2024-06-11 00:02:18: usr-ports
    2024-06-11 00:04:59: var-locate
    2024-06-11 00:05:06: var-log
    2024-06-11 00:05:24: done

I compared hashes of the par2 index files as a spot check -- if cp had failed, it should have said something:
    me% cd /archive/bkup
    me% find . -name '*.tgz.par2' -print | sort | xargs md5sum > /tmp/sums
    me% cd /media/usb/full/2024-0513
    me% md5sum -c /tmp/sums
    ./doc/doc.tgz.par2: OK
    ./home/home.tgz.par2: OK
    ./root/root.tgz.par2: OK
    ./search/search.tgz.par2: OK
    ./src/src1.tgz.par2: OK
    ./src/src2.tgz.par2: OK
    ./src/src3.tgz.par2: OK
    ./src/src4.tgz.par2: OK
    ./usr-local/usr-local.tgz.par2: OK
    ./usr-ports/usr-ports.tgz.par2: OK
    ./var-locate/var-locate.tgz.par2: OK
    ./var-log/var-log.tgz.par2: OK

    me% cd
    me% df /media/usb
    Filesystem 1M-blocks   Used  Avail Capacity Mounted on
    /dev/da1p1    953834 182507 771327    19%   /media/usb
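The check above only covers the small .tgz.par2 index files. If you want to compare every file that was copied, a byte-for-byte comparison with cmp is one option. This is a sketch rather than what was run for this article; compare_trees is a made-up name, and the destination has to be given as an absolute path.

```shell
# Compare every regular file under SRC with the same path under DST.
# DST must be an absolute path because the comparison runs from inside SRC.
# Prints the relative path of each missing or differing file; returns 1
# if there are any, 2 if either directory does not exist.
# (compare_trees is a hypothetical helper name.)
compare_trees() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ -d "$dst" ] || return 2
    bad=$(cd "$src" && find . -type f -exec sh -c '
        dst=$1
        shift
        for f in "$@"; do
            cmp -s "$f" "$dst/$f" 2>/dev/null || printf "%s\n" "$f"
        done' sh "$dst" {} +)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

Here that would be "compare_trees /archive/bkup /media/usb/full/2024-0513". It reads every byte on both sides, so expect it to take at least as long as the copy did.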
Now create a smaller directory for any incrementals since the full backups were run. Copy those over, unmount the drive.
Next steps: copy the daily incrementals over, merge them into monthly tarballs, and make the parity files.
    me% cd /archive/bkup/inc
    me% ls -l
    -rw-r--r-- 1 vogelke wheel  8108871680 12-Jun-2024 03:17:34 2024-01.tar
    -rw-r--r-- 1 vogelke wheel  8011304960 12-Jun-2024 03:19:42 2024-02.tar
    -rw-r--r-- 1 vogelke wheel 12224337920 12-Jun-2024 03:21:28 2024-03.tar
    -rw-r--r-- 1 vogelke wheel 14371584000 12-Jun-2024 03:23:30 2024-04.tar
    -rw-r--r-- 1 vogelke wheel  5588633600 12-Jun-2024 03:24:16 2024-05.tar
    -rw-r--r-- 1 vogelke wheel  1546833920 12-Jun-2024 03:24:29 2024-06.tar

    me% foreach x (*.tar)
    foreach> echo $x
    foreach> date
    foreach> par2create -r30 $x
    foreach> end
    2024-01.tar
    Wed Jun 12 06:32:30 EDT 2024
    Block size: 4054436
    Source file count: 1
    Source block count: 2000
    Recovery block count: 600
    Recovery file count: 10
    Opening: 2024-01.tar
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 134217600 bytes to disk
    Wrote 16744800 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
    [...]
    2024-06.tar
    Wed Jun 12 07:53:05 EDT 2024
    Block size: 773420
    Source file count: 1
    Source block count: 2000
    Recovery block count: 600
    Recovery file count: 10
    Opening: 2024-06.tar
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 134217600 bytes to disk
    Wrote 61399200 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
    Wed Jun 12 07:55:10 EDT 2024
I use a small awk script to show the relative sizes:
    me% bytes *.tar
    6 files 49851566080 bytes

    me% bytes *.tar*
    72 files 64820112016 bytes
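The bytes script itself is not shown, so here is a rough stand-in that produces the same style of output, plus a helper for turning the two totals into an overhead percentage. Both are sketches with made-up internals: this bytes sums the size column of "ls -ln" and expects plain files as arguments, not directories.

```shell
# Print the number of files given and their combined size in bytes.
# Arguments must be regular files. (A stand-in for the author's script,
# which may work differently.)
bytes() {
    ls -ln -- "$@" |
        awk '{ n++; s += $5 } END { printf "%d files %.0f bytes\n", n, s }'
}

# Whole-number percentage by which TOTAL exceeds DATA.
# Usage: overhead_pct DATA TOTAL   (hypothetical helper name)
overhead_pct() {
    echo $(( ($2 - $1) * 100 / $1 ))
}
```

With the totals above, "overhead_pct 49851566080 64820112016" prints 30, which is the redundancy asked for with -r30.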
This took just under 90 minutes:
    me% mkdir -p /media/usb/inc/2024-0611
    me% cd /archive/bkup/inc
    me% cp -v --preserve=timestamps * /media/usb/inc/2024-0611
    '2024-01.tar' -> '/media/usb/inc/2024-0611/2024-01.tar'
    '2024-01.tar.par2' -> '/media/usb/inc/2024-0611/2024-01.tar.par2'
    '2024-01.tar.vol000+001.par2' -> '/media/usb/inc/2024-0611/2024-01.tar.vol000+001.par2'
    [...]
    '2024-01.tar.vol511+089.par2' -> '/media/usb/inc/2024-0611/2024-01.tar.vol511+089.par2'
    '2024-02.tar' -> '/media/usb/inc/2024-0611/2024-02.tar'
    [...]
    '2024-06.tar' -> '/media/usb/inc/2024-0611/2024-06.tar'
    [...]
    '2024-06.tar.vol511+089.par2' -> '/media/usb/inc/2024-0611/2024-06.tar.vol511+089.par2'
Finished at 0920. Abbreviated directory tree:
    me% df /media/usb
    Filesystem 1M-blocks   Used  Avail Capacity Mounted on
    /dev/da1p1    953834 244324 709510    26%   /media/usb

    me% dtree -as /media/usb
    /media
    |  +--usb
    |  |  +--full
    |  |  |  +--2024-0513
    |  |  |  |  +--doc
    |  |  |  |  |  +--doc.tgz
    |  |  |  |  |  +--doc.tgz.par2
    |  |  |  |  |  +--doc.tgz.vol000+001.par2
    |  |  |  |  |  +--doc.tgz.vol001+002.par2
    |  |  |  |  |  +--doc.tgz.vol003+004.par2
    |  |  |  |  |  +--doc.tgz.vol007+008.par2
    |  |  |  |  |  +--doc.tgz.vol015+016.par2
    |  |  |  |  |  +--doc.tgz.vol031+032.par2
    |  |  |  |  |  +--doc.tgz.vol063+064.par2
    |  |  |  |  |  +--doc.tgz.vol127+128.par2
    |  |  |  |  |  +--doc.tgz.vol255+256.par2
    |  |  |  |  |  +--doc.tgz.vol511+089.par2
    |  |  |  |  +--home
    |  |  |  |  |  +--home.tgz
    |  |  |  |  |  +--home.tgz.par2
    |  |  |  |  |  +--home.tgz.vol000+001.par2
    |  |  |  |  |  +--home.tgz.vol001+002.par2
    |  |  |  |  |  +--home.tgz.vol003+004.par2
    |  |  |  |  |  +--home.tgz.vol007+008.par2
    |  |  |  |  |  +--home.tgz.vol015+016.par2
    |  |  |  |  |  +--home.tgz.vol031+032.par2
    |  |  |  |  |  +--home.tgz.vol063+064.par2
    |  |  |  |  |  +--home.tgz.vol127+128.par2
    |  |  |  |  |  +--home.tgz.vol255+256.par2
    |  |  |  |  |  +--home.tgz.vol511+089.par2
    [...]
    |  |  |  |  +--var-log
    |  |  |  |  |  +--var-log.tgz
    |  |  |  |  |  +--var-log.tgz.par2
    |  |  |  |  |  +--var-log.tgz.vol000+001.par2
    |  |  |  |  |  +--var-log.tgz.vol001+002.par2
    |  |  |  |  |  +--var-log.tgz.vol003+004.par2
    |  |  |  |  |  +--var-log.tgz.vol007+008.par2
    |  |  |  |  |  +--var-log.tgz.vol015+016.par2
    |  |  |  |  |  +--var-log.tgz.vol031+032.par2
    |  |  |  |  |  +--var-log.tgz.vol063+064.par2
    |  |  |  |  |  +--var-log.tgz.vol127+128.par2
    |  |  |  |  |  +--var-log.tgz.vol255+256.par2
    |  |  |  |  |  +--var-log.tgz.vol511+089.par2
    |  |  +--inc
    |  |  |  +--2024-0611
    |  |  |  |  +--2024-01.tar
    |  |  |  |  +--2024-01.tar.par2
    [...]
    |  |  |  |  +--2024-02.tar
    |  |  |  |  +--2024-02.tar.par2
    |  |  |  |  +--2024-03.tar
    |  |  |  |  +--2024-03.tar.par2
    |  |  |  |  +--2024-04.tar
    |  |  |  |  +--2024-04.tar.par2
    |  |  |  |  +--2024-05.tar
    |  |  |  |  +--2024-05.tar.par2
    |  |  |  |  +--2024-06.tar
    |  |  |  |  +--2024-06.tar.par2
We're done.
root# umount /media/usb
Here are the scripts I used.
Hope someone finds this useful.
Feel free to send comments.
Generated from article.t2t by txt2tags
$Revision: 1.5 $
$UUID: fae996a7-8575-31a8-911d-3ff83a67bcd9 $