  1. Introduction
  2. Description
  3. Source code
  4. Timing results on spinning rust
  5. Basic idea
  6. Feedback

1. Introduction

I monitor some medium-to-large (multi-GB) files for changes, and I'd rather not run a full hash over each one. That's time-consuming, and if the files aren't on a ZFS filesystem, I can't rely on its automatic checksumming to warn me about corruption.

I use a script called *chunkhash* that reads blocks at intervals through the file, stores their SHA-1 hashes, and outputs a final hash generated from those intermediate hashes. I'm not looking for cryptographic-strength security, just an indication that something has changed.
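Here's a minimal Python sketch of the "hash of hashes" step. The function name `combine` is hypothetical, and two details are assumptions rather than a description of the real script: the raw per-chunk digests are concatenated before the final SHA-1, and the result is printed as unpadded base64, which is what the 27-character strings in the sample output below look like.

```python
import base64
import hashlib


def combine(chunks):
    """SHA-1 each chunk, then SHA-1 the concatenated raw digests.

    Returns the final digest as unpadded base64 text (an assumed output format).
    """
    inner = b"".join(hashlib.sha1(chunk).digest() for chunk in chunks)
    outer = hashlib.sha1(inner).digest()
    return base64.b64encode(outer).decode("ascii").rstrip("=")


before = combine([b"first chunk", b"second chunk"])
after = combine([b"first chunk", b"second chunk, edited"])
print(len(before), before != after)  # 27 True
```

Any change inside a sampled chunk changes that chunk's digest, and therefore the final hash.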

2. Description

For large files (256 MB and up):

open the file
read and hash 1 MB
skip 63 MB
read and hash 1 MB
skip 63 MB
lather, rinse, repeat...
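As a rough Python sketch, the read/skip loop visits the byte ranges below. The helper name `sample_ranges` is hypothetical, "MB" is assumed to mean 2^20 bytes, and the extra read of the file's last block (see section 5) is left out.

```python
MIB = 1024 * 1024  # assumption: "MB" here means 2**20 bytes


def sample_ranges(size, read_len=MIB, skip_len=63 * MIB):
    """Yield (offset, length) for each chunk the read/skip loop hashes."""
    offset = 0
    while offset < size:
        # The last chunk may be short if it runs into end-of-file.
        yield offset, min(read_len, size - offset)
        offset += read_len + skip_len


ranges = list(sample_ranges(256 * MIB))
print([offset // MIB for offset, _ in ranges])  # [0, 64, 128, 192]
```

For a 256 MB file that's four 1 MB reads, or 1/64 of the data.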

For intermediate files (4 MB up to 256 MB), it reads 256 KB and skips 2 MB. Small files (under 4 MB) are hashed completely.
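The three size tiers can be sketched as a small lookup. This is an illustration only: `chunk_params` is a hypothetical name, "k" and "MB" are assumed to mean 2^10 and 2^20 bytes, and the boundaries follow the text (4 MB and 256 MB belong to the larger tier).

```python
KIB = 1024
MIB = 1024 * KIB  # assumption: "k" and "MB" mean 2**10 and 2**20 bytes


def chunk_params(size):
    """Return (read_len, skip_len) for a file of `size` bytes.

    Returns None for small files, which are hashed in full.
    """
    if size < 4 * MIB:
        return None
    if size < 256 * MIB:
        return 256 * KIB, 2 * MIB  # intermediate: 256 KB out of every 2.25 MB
    return MIB, 63 * MIB           # large: 1 MB out of every 64 MB


for size in (MIB, 100 * MIB, 2048 * MIB):
    params = chunk_params(size)
    if params is None:
        print(f"{size // MIB} MB: hashed in full")
    else:
        read_len, skip_len = params
        print(f"{size // MIB} MB: ~{read_len / (read_len + skip_len):.1%} read")
```

So an intermediate file has roughly 11% of its bytes read, and a large file roughly 1.6%.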

This idea is certainly not original with me, but maybe it'll scratch an itch for someone out there. Checking 393 GB of archives took about 90 seconds:

me% date; chunkhash */*.tgz; date
Sat Mar 28 04:44:14 EDT 2026
69t3+P4ZfcHUR5QtbS764e+dsf0  archive-iso/part.01.tgz
Rp3kNmgfIGH4whjjZYkcIXGixDM  archive-iso/part.02.tgz
9bqyWAteNYuCFF3Vo+SLl+20UMo  archive-iso/part.03.tgz
Ph1KMSvK8lj421jFWQcbiOl2gGU  archive-iso/part.04.tgz
...
sBa9CvupF9Qw23nAWHWapCx0Itk  var-log/part.01.tgz
J9HbZau8M5ZMvVs1y7jl5ETS0vU  var-log/part.02.tgz
bfDv1AjS2TB9AvmooORcJZHTwds  var-log/part.03.tgz
k+xj9H8cvNOeQoiJrLsMl9T/gsg  var-tmp/part.01.tgz
Sat Mar 28 04:45:46 EDT 2026
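A back-of-the-envelope check on that run, written as Python: the two `date` stamps are 92 seconds apart. Assuming every archive falls in the large-file tier (1 MB read per 64 MB) and ignoring the per-file tail reads, only about 6 GB is actually read. The function name `sampled_throughput` is hypothetical.

```python
def sampled_throughput(total_gb, elapsed_s, fraction_read):
    """Estimate GB actually read and the effective read rate in MB/s."""
    read_gb = total_gb * fraction_read
    return read_gb, read_gb * 1000 / elapsed_s


# Seconds between the two `date` stamps, 04:44:14 and 04:45:46.
elapsed = (4 * 3600 + 45 * 60 + 46) - (4 * 3600 + 44 * 60 + 14)

# Assumption: every archive is in the large-file tier (1 MB read per 64 MB).
read_gb, rate = sampled_throughput(393, elapsed, 1 / 64)
print(f"{elapsed} s, ~{read_gb:.1f} GB read, ~{rate:.0f} MB/s")  # 92 s, ~6.1 GB read, ~67 MB/s
```

Roughly 67 MB/s is low for sequential reads from a hard disk, which suggests that seeking between chunks, rather than transfer, sets the pace.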

3. Source code

Download the script here.

4. Timing results on spinning rust

 Size (bytes)   Filename              Chunkhash        real/user/sys (s)
========================================================================
 1,860,173,824  AlmaLinux-9.3min.iso  w+kS5...4Jl0JC8  0.59/0.07/0.01
 2,038,824,960  debian-11.7.0.iso     9CDN8...gpih2s0  0.67/0.08/0.00
 2,085,617,664  AlmaLinux-8.5min.iso  8VhtZ...oyoNBYM  0.54/0.08/0.00
 2,260,729,856  MX-25.1.iso           26eL5...cJdJGkQ  0.67/0.08/0.01
 2,401,763,328  latest-nixos.iso      B6Dhm...QU1O2tw  0.70/0.08/0.02
 2,678,560,768  24.01.1-XFCE.iso      akSGM...oMbrfnA  0.10/0.10/0.00
 2,773,874,688  live-server.iso       WDKjt...YYxF4TM  0.74/0.10/0.00
 2,960,867,328  lmde-7-cinnamon.iso   rzgpN...igttJZU  0.85/0.09/0.02
 2,997,185,540  FreeBSD-13.2.iso.xz   8uqWO...79jG+KU  0.92/0.10/0.01
 3,033,710,592  linuxmint-22.3.iso    sfyp3...RiColh8  0.85/0.11/0.00
 3,343,018,440  FreeBSD-12.3.iso.xz   L48tY...Gp8NC5c  0.85/0.11/0.02
 3,413,260,180  amd64-dvd1.iso.xz     59hOa...1xxe7M4  0.86/0.10/0.03
 3,461,158,896  FreeBSD-12.4.iso.xz   49DSY...fSxHR+A  0.88/0.10/0.03
 3,659,560,960  FreeBSD-11.3.iso      8Kuup...j7v6QK0  0.84/0.12/0.02
 3,888,513,024  debian-11.3.0-1.iso   +J9NH...aravDY4  0.85/0.14/0.00
 4,415,711,232  debian-11.3.0-3.iso   h/9R9...Wzyv1ko  1.31/0.16/0.00
 4,684,036,096  debian-11.3.0-2.iso   +3Y4E...8g8OLaE  1.01/0.16/0.00
 4,692,592,640  debian-11.3.0-4.iso   DQo99...CL2FYM4  1.15/0.15/0.01
 4,693,522,432  debian-11.3.0-5.iso   E1SS/...unk0LGw  1.05/0.16/0.00
 4,786,749,440  OracleLinux-7.8.iso   iCimI...qQTUiSk  1.29/0.17/0.00
 4,857,004,032  OracleLinux-7.9.iso   XIJPh...c9Z39s4  1.18/0.18/0.00
 5,954,863,104  Qubes-R4.1.0.iso      hsZAy...3MNd4yA  1.35/0.20/0.01
 6,203,355,136  desktop-amd64.iso     czY1X...qzeLbJ0  2.40/0.22/0.00
10,627,317,760  AlmaLinux-8.5.iso     97mCQ...o0q68l4  2.81/0.34/0.04
10,916,724,736  AlmaLinux-9.3.iso     +1npY...4CyXw94  3.02/0.33/0.04
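Dividing the real time by the number of reads gives a rough per-read cost. This sketch assumes the large-file tier (1 MB read per 64 MB, with MB = 2^20 bytes) plus one extra read of the last block, as in section 5; `reads_for` is a hypothetical helper, and the three rows are copied from the table above.

```python
import math

MIB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def reads_for(size, read_len=MIB, skip_len=63 * MIB):
    """Count chunk reads for a large file: one per stride, plus the tail block."""
    return math.ceil(size / (read_len + skip_len)) + 1


# (size in bytes, real seconds) copied from three rows of the table
rows = {
    "AlmaLinux-9.3min.iso": (1_860_173_824, 0.59),
    "debian-11.3.0-3.iso": (4_415_711_232, 1.31),
    "AlmaLinux-9.3.iso": (10_916_724_736, 3.02),
}

for name, (size, real) in rows.items():
    n = reads_for(size)
    print(f"{name}: {n} reads, {1000 * real / n:.1f} ms per read")
```

For these three rows it works out to roughly 18 to 20 ms per read, which is plausible for a seek plus a 1 MB transfer on a hard disk.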

5. Basic idea

begin

open file

while NOT DONE
do
    sysread shortbuf

    if (return == 0)
    then
        we're done
        break out of while
    fi

    if (return < 0)
    then
        error exit
    fi

    tmp = hash of shortbuf
    append tmp to array holding hashes

    skip ahead in file
    if (skip ahead fails)
    then
        we're done
        break out of while
    fi
done

skip to end of file or error exit

skip back length of shortbuf or error exit

sysread shortbuf
if (return > 0)
then
    tmp = hash of shortbuf
    append tmp to array holding hashes
fi

close file

if (array holding hashes is not empty)
then
    get sha1sum of array holding hashes
    result = that sha1sum plus filename
else
    result = error message
fi

print result

end
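For illustration, here's a minimal runnable Python translation of the pseudocode. It is not the downloadable script from section 3, and its output won't necessarily match the real chunkhash: the size tiers come from section 2, but concatenating raw digests before the final SHA-1 and printing unpadded base64 are assumptions, and `chunk_params`, `chunkhash`, and `main` are hypothetical names.

```python
import base64
import hashlib
import os
import sys

KIB = 1024
MIB = 1024 * KIB  # assumption: "k" and "MB" mean 2**10 and 2**20 bytes


def chunk_params(size):
    """(read_len, skip_len) for the size tiers in section 2; None = hash whole file."""
    if size < 4 * MIB:
        return None
    if size < 256 * MIB:
        return 256 * KIB, 2 * MIB
    return MIB, 63 * MIB


def chunkhash(path):
    """Return a sampled SHA-1 of the file at `path` as unpadded base64."""
    size = os.path.getsize(path)
    params = chunk_params(size)
    digests = []
    with open(path, "rb") as f:
        if params is None:
            # Small file: hash the whole thing as a single chunk.
            digests.append(hashlib.sha1(f.read()).digest())
        else:
            read_len, skip_len = params
            while True:
                buf = f.read(read_len)
                if not buf:  # end of file: we're done
                    break
                digests.append(hashlib.sha1(buf).digest())
                # Seeking past EOF is allowed; the next read just returns b"".
                f.seek(skip_len, os.SEEK_CUR)
            # Also hash the last read_len bytes, so changes at the tail show up.
            f.seek(-read_len, os.SEEK_END)
            digests.append(hashlib.sha1(f.read(read_len)).digest())
    combined = hashlib.sha1(b"".join(digests)).digest()
    return base64.b64encode(combined).decode("ascii").rstrip("=")


def main(paths):
    for path in paths:
        try:
            print(f"{chunkhash(path)}  {path}")
        except OSError as err:
            # Corresponds to "result = error message" in the pseudocode.
            print(f"chunkhash: {path}: {err}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage mirrors the transcript in section 2, e.g. `python3 chunkhash.py */*.tgz`. Note the trade-off this design accepts: a change that falls entirely inside a skipped gap goes unnoticed.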

6. Feedback

Feel free to send comments.

Generated from article.t2t by txt2tags
$Revision: 1.2 $
$UUID: 10d96439-5333-376c-a66b-3a4f0f992392 $