You're working away, and suddenly your edit session goes to hell because /tmp is full.
It's one thing when it happens on your workstation; it's a much bigger deal on a server with actual paying customers. Here are some scripts to make your life easier.
Use something like this for your crontab file if you want to check disk space every 5 minutes around the clock:
# Set these environment variables within cron:
# Scripts know when they're being run via cron.
CRON=yes
# Who to send mail to. Leave blank if no mail is to be sent.
# MAILTO=yourname
#
# Environment variables set by cron:
#   SHELL=/bin/sh
#   USER=yourname
#   PATH=/usr/bin:/bin
#   PWD=/home/yourname
#   SHLVL=1
#   HOME=/home/yourname
#   LOGNAME=yourname
#
# To test, uncomment this line:
## * * * * * /bin/env > /tmp/env$$
#===================================================================
# Everything on a line is separated by blanks or tabs.
#
#+----------------------------- Minute (0-59)
#|    +------------------------ Hour (0-23)
#|    |    +------------------- Day (1-31)
#|    |    |    +-------------- Month (1-12)
#|    |    |    |    +--------- Day of week (0-6, 0=Sunday)
#|    |    |    |    |    +---- Command to be run
#|    |    |    |    |    |
#v    v    v    v    v    v
#===================================================================
# Keep an eye on drives and disk space. Run every 5 minutes.
4-54/5 *   *    *    *    $HOME/cron/checkdrives
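If you're not sure which minutes a field like 4-54/5 actually matches, it's easy to expand it by hand. The function below is only a sketch: expand_minutes is a made-up name, and it handles nothing but the START-END/STEP form used above. It shows that 4-54/5 fires at 4, 9, ... 54, so there is a ten-minute gap across the top of each hour; 4-59/5 would add a run at minute 59.

```shell
# expand_minutes FIELD: print the minutes matched by a cron minute
# field of the form START-END/STEP (the only form this sketch handles).
expand_minutes () {
    range=${1%/*}       # "4-54/5" -> "4-54"
    step=${1#*/}        # "4-54/5" -> "5"
    m=${range%-*}       # first minute in the range
    end=${range#*-}     # last minute in the range
    out=
    while [ "$m" -le "$end" ]; do
        out="$out${out:+ }$m"
        m=$((m + step))
    done
    echo "$out"
}

expand_minutes 4-54/5   # the field used in the crontab above
expand_minutes 4-59/5   # same thing with minute 59 included
```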
#!/bin/ksh
#
# $Revision: 1.3 $ $Date: 2010-11-10 13:20:42-05 $
# $UUID: ca583930-f781-3100-b878-5542c05bace9 $
#
#<checkdrives: send mail if a filesystem gets too full
# Try to avoid depending on GNU software being installed.

PATH=/bin:/usr/bin
BLOCKSIZE=1m
BLOCK_SIZE=1048576
export PATH BLOCKSIZE BLOCK_SIZE
tag=${0##*/}

# Portability and configuration stuff here.

subject='drives getting full'
to='admin-urgent'
host=$(hostname | cut -f1 -d.)
work=$HOME/var/drives
max=96      # this percent or more == drive is too full.

# What df should we use?

case "$(uname -s)" in
    SunOS)   DF='/usr/xpg4/bin/df -F ufs' ;;
    FreeBSD) DF='df -t ufs' ;;
    *)       DF='df' ;;
esac

# Real work starts here. Run df, skip the header, kill %-sign,
# and list filesystems that are too full.

filesys=$($DF | sed -n -e '2,$p' | tr -d '%' |
    awk -v max=$max '$5 >= max {print $6}')

case "X$filesys" in
    X) exit 0 ;;
    *) ;;
esac

# Keep current and previous drive status.

if test ! -d $work; then
    mkdir -p $work 2> /dev/null
    if test ! -d $work; then
        echo "$host: $tag: mkdir $work failed" | mailx $to
        exit 1
    fi
fi

# Don't send the same message repeatedly.

cd $work
(echo $host; $DF $filesys) > cur

if test -f prev; then
    cmp -s cur prev || mailx -s "$host $subject" $to < cur
else
    mailx -s "$host $subject" $to < cur
fi

mv cur prev
exit 0
Notice that the script uses mail to tell you about problems; just replace mailx with something to send a popup message if you're running this on the same host that's being checked.
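Before trusting checkdrives with your mail, you can try its filtering step on canned input. This is a sketch: too_full is a name I made up for the sed/tr/awk pipeline from the script, and the sample lines are copied from the df listing shown later in this article. With a threshold of 80, only /rd01 (80%) and /rd07 (82%) should come out.

```shell
# too_full MAX: read df-style output on stdin and print the mount
# point of every filesystem whose Use% is at or above MAX.
too_full () {
    sed -n -e '2,$p' | tr -d '%' |
        awk -v max="$1" '$5 >= max {print $6}'
}

# A few lines from the df listing later in this article.
sample='Filesystem 1M-blocks Used Available Use% Mounted
/dev/sda1 15873 9821 5234 66% /
/dev/sda6 335728 252096 66303 80% /rd01
/dev/sdf6 341144 263800 60015 82% /rd07
/dev/sde6 341144 28520 295295 9% /rd05'

echo "$sample" | too_full 80
```

Lowering the threshold like this is also a cheap way to force a test message without filling a real drive.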
If you have several hosts to keep track of, it's better to set up a mail address that will automatically send you a popup message or alert of some type upon receipt of a message. Procmail will handle that very nicely.
Popup messages can be incredibly annoying, so I don't use them unless something requires immediate attention. If you use the X Window System, have a look at the xalarm package. If not, "write" will do the trick:
#!/bin/ksh
#
# $Revision: 1.3 $ $Date: 2011-09-25 19:51:09-04 $
# $UUID: 97362b8a-57af-3b67-b751-ce8712d62c27 $
#
#<popup: send a quick popup message.

export PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
export USER=yourname

# If the user isn't taking calls, exit.
test -f "$HOME/.nopopup" && exit 0

# If no message, exit.
case "$#" in
    0) exit 0 ;;
    *) str=${1+"$@"} ;;
esac

# If running under X use xalarm, else use write.
case "$DISPLAY" in
    "") # Pick the first pts terminal belonging to $USER, not
        # just the first one listed by who.
        set X $(who | grep "^$USER " | grep pts/ | head -1)
        tty="$3"
        echo "$str" | write $USER $tty
        ;;

    *)  set X $(date)
        today="$4 $3 $5"
        msg=$(echo "$today @ $str" | tr '@' '\012')
        export DISPLAY
        xalarm -name xmemo -time +0 -geometry +20-40 -nowarn "$msg"
        ;;
esac

exit 0
If you know your system was fine a few hours ago, it's handy to have a timeline to see where things started going to hell. The examples below are run under Linux, but you only need trivial changes to use them under Solaris or FreeBSD.
Since "adm" is usually responsible for accounting stuff, I run these scripts under that userid. Here's the crontab file:
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=yourname
HOME=/var/log/sa
#
# Need this for performance log archives
PERFLOG=/var/log/perflog
#===================================================================
# Everything on a line is separated by blanks or tabs.
#
#+----------------------------- Minute (0-59)
#|    +------------------------ Hour (0-23)
#|    |    +------------------- Day (1-31)
#|    |    |    +-------------- Month (1-12)
#|    |    |    |    +--------- Day of week (0-6, 0=Sunday)
#|    |    |    |    |    +---- Command to be run
#|    |    |    |    |    |
#v    v    v    v    v    v
#===================================================================
# Run performance log every 10 min.
1-51/10 *  *    *    *    /usr/local/cron/perflog $PERFLOG
#-------------------------------------------------------------------
# Summarize files just before midnight.
55   23    *    *    *    run-parts /etc/cron.perflog
My /var/log/perflog directory looks like this:
/var/log/perflog:
drwxr-s---   3 adm mis  4096 Sep 24 00:01 2011/
drwxrwsr-x   2 adm mis  4096 Sep 23 23:55 2011.n/

/var/log/perflog/2011:
drwxr-s--- 145 adm mis  4096 Sep 24 23:51 0924/

/var/log/perflog/2011/0924:
drwxr-s---   2 adm mis  4096 Sep 24 00:01 0001/
drwxr-s---   2 adm mis  4096 Sep 24 00:11 0011/
drwxr-s---   2 adm mis  4096 Sep 24 00:21 0021/
[...]
drwxr-s---   2 adm mis  4096 Sep 24 23:51 2351/

/var/log/perflog/2011/0924/0001:
-rw-r-----   1 adm mis   754 Sep 24 00:01 cache
-rw-r-----   1 adm mis  1424 Sep 24 00:01 df
-rw-r-----   1 adm mis   848 Sep 24 00:01 ifconfig
-rw-r-----   1 adm mis   771 Sep 24 00:01 meminfo
-rw-r-----   1 adm mis   171 Sep 24 00:01 netstat
-rw-r-----   1 adm mis  1198 Sep 24 00:01 ping
-rw-r-----   1 adm mis 10724 Sep 24 00:01 ps
-rw-r-----   1 adm mis  3245 Sep 24 00:01 smbstatus
-rw-r-----   1 adm mis   104 Sep 24 00:01 swap
-rw-r-----   1 adm mis    84 Sep 24 00:01 uname
-rw-r-----   1 adm mis    71 Sep 24 00:01 uptime

/var/log/perflog/2011/0924/0011:
-rw-r-----   1 adm mis   754 Sep 24 00:11 cache
-rw-r-----   1 adm mis  1424 Sep 24 00:11 df
-rw-r-----   1 adm mis   848 Sep 24 00:11 ifconfig
-rw-r-----   1 adm mis   771 Sep 24 00:11 meminfo
-rw-r-----   1 adm mis   171 Sep 24 00:11 netstat
-rw-r-----   1 adm mis  1197 Sep 24 00:11 ping
-rw-r-----   1 adm mis 10776 Sep 24 00:11 ps
-rw-r-----   1 adm mis  3568 Sep 24 00:11 smbstatus
-rw-r-----   1 adm mis   104 Sep 24 00:11 swap
-rw-r-----   1 adm mis    84 Sep 24 00:11 uname
-rw-r-----   1 adm mis    71 Sep 24 00:11 uptime

/var/log/perflog/2011/0924/0021:
-rw-r-----   1 adm mis   754 Sep 24 00:21 cache
-rw-r-----   1 adm mis  1424 Sep 24 00:21 df
-rw-r-----   1 adm mis   848 Sep 24 00:21 ifconfig
-rw-r-----   1 adm mis 58362 Sep 24 01:19 iostat
-rw-r-----   1 adm mis   771 Sep 24 00:21 meminfo
-rw-r-----   1 adm mis 10752 Sep 24 01:20 mpstat
-rw-r-----   1 adm mis   171 Sep 24 00:21 netstat
-rw-r-----   1 adm mis  1197 Sep 24 00:21 ping
-rw-r-----   1 adm mis 10580 Sep 24 00:21 ps
-rw-r-----   1 adm mis  3435 Sep 24 00:21 smbstatus
-rw-r-----   1 adm mis   104 Sep 24 00:21 swap
-rw-r-----   1 adm mis    84 Sep 24 00:21 uname
-rw-r-----   1 adm mis    71 Sep 24 00:21 uptime
-rw-r-----   1 adm mis 10654 Sep 24 01:19 vmstat

[...]

/var/log/perflog/2011/0924/2351:
-rw-r-----   1 adm mis   754 Sep 24 23:51 cache
-rw-r-----   1 adm mis  1424 Sep 24 23:51 df
-rw-r-----   1 adm mis   848 Sep 24 23:51 ifconfig
-rw-r-----   1 adm mis   771 Sep 24 23:51 meminfo
-rw-r-----   1 adm mis   171 Sep 24 23:51 netstat
-rw-r-----   1 adm mis  1197 Sep 24 23:51 ping
-rw-r-----   1 adm mis 10281 Sep 24 23:51 ps
-rw-r-----   1 adm mis  2651 Sep 24 23:41 smbstatus
-rw-r-----   1 adm mis   104 Sep 24 23:51 swap
-rw-r-----   1 adm mis    84 Sep 24 23:51 uname
-rw-r-----   1 adm mis    72 Sep 24 23:51 uptime
Each file holds output from one specific command.
For example, the file /var/log/perflog/2011/0924/0001/cache holds output from "vmstat -s" at 12:01am, 9/24/2011:
      1943948 total memory
      1892424 used memory
        49772 active memory
      1806208 inactive memory
        51524 free memory
         6592 buffer memory
      1819552 swap cache
      2096472 total swap
        75892 used swap
      2020580 free swap
    137464830 non-nice user cpu ticks
        45415 nice user cpu ticks
      7180476 system cpu ticks
    651831801 idle cpu ticks
     69403053 IO-wait cpu ticks
        66205 IRQ cpu ticks
       962645 softirq cpu ticks
            0 stolen cpu ticks
   1345948021 pages paged in
    967598390 pages paged out
      4522472 pages swapped in
      4535965 pages swapped out
   1806195040 interrupts
   1877621550 CPU context switches
   1312589033 boot time
      2675983 forks
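The collector itself (/usr/local/cron/perflog) isn't listed here, so what follows is only my guess at its general shape, not the real script. The names snapshot and save are made up, and only four of the commands are shown. The idea is one directory per run, named YYYY/MMDD/HHMM, and one file per command.

```shell
# save FILE CMD [ARGS...]: run CMD and keep its output in $dir/FILE.
# A missing or failing command leaves its error text in the file
# instead of stopping the whole run.
save () {
    file=$1; shift
    "$@" > "$dir/$file" 2>&1 || :
}

# snapshot BASE: create BASE/YYYY/MMDD/HHMM, fill it with one file
# per command, and print the directory name.
snapshot () {
    dir="$1/$(date '+%Y/%m%d/%H%M')"
    mkdir -p "$dir" || return 1
    save uname  uname -a
    save uptime uptime
    save df     df
    save cache  vmstat -s
    echo "$dir"
}

# Usage, assuming the crontab above:  snapshot "$PERFLOG"
```

Because each command gets its own file, adding another one is a single line, and comparing two points in time is just a diff between two directories.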
Every 20 minutes, output from iostat and mpstat is included:
Linux ... (server.com)  09/24/11  _i686_  (2 CPU)

avg-cpu:   %user   %nice %system %iowait  %steal   %idle
           15.86    0.01    0.95    8.01    0.00   75.18

Device:       r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz    await    svctm    %util
sda         60.90     2.41   323.08   119.35     6.99     0.59     9.36     1.67    10.60
sdb         28.92     0.31   789.33    20.64    27.71     0.88    30.22     3.53    10.31
sdc          1.22     0.17   173.42    60.09   168.97     0.06    45.14     2.78     0.38
sdd         38.05     0.28   914.60    68.24    25.64     0.20     5.32     3.40    13.03
sde          0.13     0.14    22.63    51.03   271.50     0.05   166.80     3.48     0.09
sdf          6.47     0.14   675.25    55.67   110.54     0.10    14.80     2.60     1.72
sdg         17.62     0.22   705.99    71.35    43.56     0.59    33.05     4.55     8.11
Just before midnight, I like to jam the day's entries into one file to reduce storage space. One easy way is to use "head":
==> 0924/0001/df <==
Filesystem 1M-blocks   Used Available Use% Mounted
/dev/sda1      15873   9821      5234  66% /
/dev/sda2       7933   2628      4896  35% /var
/dev/sda5       7933    247      7277   4% /home
/dev/sda6     335728 252096     66303  80% /rd01
tmpfs            950      0       950   0% /dev/shm
tmpfs            950     64       887   7% /tmp
/dev/sdb6     341144 233036     90780  72% /rd02
/dev/sdc6     341144 217470    106345  68% /rd03
/dev/sdd6     341144 225758     98058  70% /rd04
/dev/sdf6     341144 263800     60015  82% /rd07
/dev/sdg6     341144 244507     79308  76% /rd08
/dev/sde6     341144  28520    295295   9% /rd05

Filesystem   Inodes  IUsed    IFree IUse% Mounted
/dev/sda1   4198176 446524  3751652   11% /
/dev/sda2   2097152   5429  2091723    1% /var
/dev/sda5   2097152   1387  2095765    1% /home
/dev/sda6  88735744 232375 88503369    1% /rd01
tmpfs        191235      1   191234    1% /dev/shm
tmpfs        191235     12   191223    1% /tmp
/dev/sdb6  44367872 187957 44179915    1% /rd02
/dev/sdc6  44367872  44983 44322889    1% /rd03
/dev/sdd6  44367872 276423 44091449    1% /rd04
/dev/sdf6  44367872 196609 44171263    1% /rd07
/dev/sdg6  44367872 147284 44220588    1% /rd08
/dev/sde6  44367872   6438 44361434    1% /rd05

==> 0924/0001/ifconfig <==
[...]
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:539988 errors:0 dropped:0 overruns:0 frame:0
          TX packets:539988 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:44423496 (42.3 MiB)  TX bytes:44423496 (42.3 MiB)

==> 0924/0001/meminfo <==
MemTotal:      1943948 kB
MemFree:         53076 kB
Buffers:          6180 kB
Cached:        1798304 kB
SwapCached:       3256 kB
[...]
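The "==> name <==" separators come for free: head prints one ahead of each file when you hand it more than one. Here's a rough sketch of the merge step. It is not the actual 100.perf-reduce script; merge_day is a made-up name, and the 100000-line limit is just an assumption that no single snapshot file is longer than that.

```shell
# merge_day DAYDIR: print every file under DAYDIR in sorted order,
# each preceded by head's "==> name <==" header.
# Assumes file names without blanks and at least two files per day.
merge_day () {
    find "$1" -type f -print | sort | xargs head -n 100000
}

# Usage: run it from the year directory so the headers come out as
# relative names like 0924/0001/df:
#   cd /var/log/perflog/2011 && merge_day 0924 > ../2011.n/0924
```

Since each header carries the MMDD/HHMM path, a grep for one file name in the merged file gives you the timeline for that command.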
These files compress very nicely under a separate directory tree:
/var/log/perflog/2011.n:
-rw-r--r-- 1 adm mis  405844 Jun  1 23:55 0601.xz
-rw-r--r-- 1 adm mis  398520 Jun  2 23:55 0602.xz
[...]
-rw-r--r-- 1 adm mis  394736 Sep 21 23:55 0921.xz
-rw-r--r-- 1 adm mis  429668 Sep 22 23:55 0922.xz
-rw-r--r-- 1 adm mis 4629315 Sep 23 23:55 0923
To create the summaries and do the cleanup, I have three scripts that run under their own directory:
me% cd /etc/cron.perflog
me% ls -lF
-rwxr-xr-x 1 root mis 1506 Mar 12 2011 100.perf-reduce*
-rwxr-xr-x 1 root mis 1462 Mar 12 2011 110.perf-clean*
-rwxr-xr-x 1 root mis 1161 Mar 12 2011 120.perf-compress*

me% grep '#<' * | cut -f2 -d'<'
100.perf-reduce: merges separate perflog files to save space.
110.perf-clean: removes old perflog directory if "reduce" worked.
120.perf-compress: runs xz on yesterday's logfile.
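The important part of the cleanup is the "if reduce worked" check, since you don't want to remove a day's worth of snapshots when the merged file never got written. A minimal sketch of that guard follows. The name clean_day is made up, this is not the real 110.perf-clean, and "worked" is assumed to mean nothing more than "the merged file exists and isn't empty".

```shell
# clean_day MERGED DAYDIR: remove DAYDIR only if MERGED exists and
# is not empty; otherwise complain and leave everything alone.
clean_day () {
    merged=$1
    daydir=$2
    if test ! -s "$merged"; then
        echo "keeping $daydir: $merged is missing or empty" >&2
        return 1
    fi
    test -d "$daydir" && rm -rf "$daydir"
}

# Usage (paths follow the layout shown above):
#   clean_day /var/log/perflog/2011.n/0924 /var/log/perflog/2011/0924
```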
If you want to keep an eye on who (or what) uses the most space over time, you can put this script under /etc/cron.daily.
This writes du summary output to a file named after the current date.
#!/bin/ksh
#
# $Revision: 1.4 $ $Date: 2010-11-09 15:11:35-05 $
# $UUID: 31a46065-7dd1-3f41-b27c-bc96ce22c12d $
#
#<dirsize: see how big each top-level group directory is.
# usage: dirsize [etc-file [output-file]]

export PATH=/usr/local/bin:/bin:/usr/bin
export BLOCKSIZE=1m
export BLOCK_SIZE=1048576
# BLOCK* sets du output to Mbytes.
umask 022

tag=$(basename $0)
host=$(hostname | cut -f1 -d.)
out='/var/adm/sa/du'

# Format output in consistent-width columns.
# Argument is the number of columns you want.

layout () {
    case "$#" in
        0) k=1 ;;
        *) k=$1 ;;
    esac

    case "$k" in
        [1-9]) ;;
        *)     echo 'layout botch'; exit 1 ;;
    esac

    awk '{printf "%6s %s\n", $1, $2}' | pr -o1 -w88 -${k}t | expand
}

say () {
    echo; echo "$(date '+%Y-%m-%d %T'): $*"; echo
}

warn () {
    echo "WARN: $(date '+%Y-%m-%d %T'): $*"
}

logmsg () {
    logger -t $tag -p local1.info "$@"
}

die () {
    logmsg "FATAL: $*"; exit 1
}

# Check the input settings file. Set an optional output file.

ofile=
case "$#" in
    0) ifile="/usr/local/etc/$tag" ;;
    1) ifile="$1" ;;
    2) ifile="$1"; ofile="$2" ;;
esac

test -f "$ifile" || die "$ifile not found"

# Figure out the date.

logmsg start
set X $(date "+%Y %m%d"); shift
yr=$1
mday=$2

# Set up the output file.

test -d "$out/$yr" || mkdir -p $out/$yr
test -d "$out/$yr" || die "unable to mkdir $out/$yr"

# Redirect all stdout and stderr output.

case "$ofile" in
    "") ofile="$out/$yr/$mday" ;;
    *)  ;;
esac

exec > $ofile
exec 2>&1

# Real work starts here. Read the directories, columns, etc.

grep '^[1-9]' $ifile |
    while read depth columns dir
    do
        if test -d "$dir"
        then
            say Directory $dir
        else
            warn "$dir: not a directory"
            continue
        fi

        # NOTE: after awk, we could put "sort -nr" or "cat" depending
        # on whether you wanted output sorted by directory size.
        # Ignore anything under 10 Mb.

        (
            cd $dir
            find . -mindepth $depth -maxdepth $depth -type d -print |
                sort |
                tr '\012' '\000' |
                xargs -0 du -s |
                awk '{ if ($1 > 9) print }' |
                layout $columns |
                sed -e 's! ./! !g'
        )
    done

say done
logmsg done
exit 0
Some sample output from 9/24/2011:
2011-09-24 04:27:56: Directory /fs1b/server5/2008

**   935 0104      495 0325      381 0530      407 0807      296 1015
     897 0107      223 0328      230 0602      228 0813      260 1021
     502 0110      441 0331      544 0605      789 0819      646 1024
     435 0116      480 0403      282 0611      197 0822      387 1027
     790 0122      440 0409      245 0617      276 0825      446 1030
     561 0125      138 0415      231 0620      286 0828      177 1105
     599 0128      277 0418      204 0623      308 0903      263 1114
     425 0131      246 0421      131 0626     2602 0909      864 1117
     409 0206      660 0424      461 0702      164 0912      576 1120
     396 0212      283 0430      264 0708      322 0915      352 1126
     556 0215      938 0506      513 0711      435 0918      596 1202
     620 0221      358 0509      713 0714       37 0921      746 1205
     574 0227      554 0512      338 0717      132 0924      431 1208
     503 0304      688 0515      326 0723      625 0930      745 1211
     204 0307       11 0518      252 0729      440 1003      288 1217
     591 0310      355 0521      376 0801      544 1006      118 1223
     307 0313      512 0527      284 0804     1069 1009      126 1229
     435 0319

2011-09-24 04:32:28: Directory /fs1b/server5/2009

     594 0707      174 0812      121 0921      627 1027      113 1202
     536 0715      194 0820      203 0925      186 1104      112 1210
      22 0719      345 0824      280 0929      275 1112      315 1214
     227 0723      267 0828      252 1007      174 1116      104 1218
     311 0727      672 0901      494 1015      322 1120       76 1222
     469 0731      136 0909      188 1019      283 1124       38 1230
     427 0804      303 0917      840 1023 **
For example, the directory holding backups for server5 on 1/4/2008 takes up 935 MB. The directory holding backups for server5 on 10/23/2009 takes up 840 MB. Both entries are marked with "**" in the sample output.
"dirsize" reads directory and layout information from the file /usr/local/etc/dirsize:
# $Revision: 1.1 $ $Date: 2011-08-09 18:52:30-04 $
# $UUID: d9f49e1d-111c-35dd-9265-8d81644455c8 $
#
# Expand this list into additional directories to check.
#   Field 1: min/max depth of directories to traverse
#   Field 2: number of columns to print
#   Field 3: starting directory
#
# EXAMPLE:
#   "2 3 /usr" runs "cd /usr; find . -mindepth 2 -maxdepth 2 -type d"
#   and prints 3-column output.

1 5 /fs1b/server5/2008
1 5 /fs1b/server5/2009
The lines for "server5" tell the script to descend one level into the given directory and print the results in 5 columns.
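If you want to check a settings file before letting cron loose on it, you can run the same grep and read loop that dirsize uses and just print what would happen. This is a sketch; show_plan is a made-up name and it runs nothing, it only echoes.

```shell
# show_plan FILE: for each settings line (anything starting with a
# digit 1-9), print the find command dirsize would run and the
# number of output columns. Comment lines are skipped by the grep.
show_plan () {
    grep '^[1-9]' "$1" | while read depth columns dir
    do
        echo "cd $dir; find . -mindepth $depth -maxdepth $depth -type d ($columns columns)"
    done
}

# Usage:  show_plan /usr/local/etc/dirsize
```

For the file above, that should print one line for each of the two server5 directories, with depth 1 and 5 columns.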
If I were doing this over again, I'd divvy up the work a bit differently. Instead of writing the report in one script, I'd store raw du output without any formatting in one directory, and have separate scripts to read that and write something suitable for a webpage display or database import.
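Here's roughly what that split might look like. It's a sketch of the idea, not something I run: collect_raw and report_mb are made-up names, only one directory level is handled, and directory names are assumed to have no blanks. The collector stores plain "du -sk" output (kilobytes, which every du understands), and the formatter is a separate filter that converts to Mbytes and drops anything under 10.

```shell
# Step 1: collect raw data, one "kbytes<TAB>directory" line for each
# immediate subdirectory of the argument. No formatting at all.
collect_raw () {
    for d in "$1"/*/; do
        if test -d "$d"; then
            du -sk "$d"
        fi
    done
}

# Step 2: a separate filter reads the raw lines and prints Mbytes,
# skipping anything smaller than MIN Mbytes (default 10).
report_mb () {
    awk -v min="${1:-10}" '{
        mb = int($1 / 1024)
        if (mb >= min) printf "%6d %s\n", mb, $2
    }'
}

# Usage:
#   collect_raw /fs1b/server5/2008 > raw.2008
#   report_mb < raw.2008
```

With the raw numbers kept on disk, a webpage or database loader can be written later as just another reader of the same files.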
Feel free to send comments.
Generated from disk-space.t2t by txt2tags. $Revision: 1.8 $