# Monitoring System Statistics - Shell Scripting

One of the core responsibilities of the Linux system administrator is to ensure that the system is running properly. To accomplish this task, there are lots of different system statistics that you must monitor. Creating automated shell scripts to monitor specific situations can be a lifesaver.
This section shows how to create simple shell scripts that monitor and report problems as soon as possible, without you even having to be logged in to the Linux system.

## Monitoring disk free space

One of the biggest problems with multi-user Linux systems is the amount of available disk space. In some situations, such as in a file sharing server, disk space can fill up almost immediately just because of one careless user. This shell script will monitor the available disk space on a specific volume, and send out an e-mail message if the available disk space goes below a set threshold.

### The required functions

To automatically monitor the available disk space, you’ll first need to use a command that can display that value. The best command to monitor disk space is the df command.
The basic output of the df command looks like this:

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              3197228   2453980    580836  81% /
varrun                  127544       100    127444   1% /var/run
varlock                 127544         4    127540   1% /var/lock
udev                    127544        44    127500   1% /dev
devshm                  127544         0    127544   0% /dev/shm
/dev/hda3               801636    139588    621328  19% /home
$

The df command shows the current disk space statistics for all of the real and virtual disks on the system. For the purposes of this exercise, we’ll monitor the size of the root filesystem. We’ll first need to parse the results of the df command to extract only the line for the root filesystem.

There are a couple of different ways to do this. Probably the easiest is to use the sed command to search for the line that ends with a forward slash:

/dev/hda1 3197228 2453980 580836 81% /

The sed script to do this uses the dollar sign (to indicate the end of the line) and the forward slash (which you'll need to escape, since it's also the sed pattern delimiter). That will look like this:

$ df | sed -n '/\/$/p'
/dev/hda1 3197228 2453980 580836 81% /
$

Now that you've got the statistics for the root filesystem volume, the next step is to isolate the percentage used value in the line. To do that, you'll need to use the gawk command:

$ df | sed -n '/\/$/p' | gawk '{print $5}'
81%
$

This is close, but there's still one small problem. The gawk command lets you filter out the fifth data field, but the value also includes the percent symbol. You'll need to remove that so you can use the value in a mathematical equation. That's easily accomplished by using the sed command again:

$ df | sed -n '/\/$/p' | gawk '{print $5}' | sed 's/%//'
81
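Since df output differs from system to system, you can verify the extraction logic itself against a canned df line (the device name and numbers below are just illustrative):

```shell
# Run the same filter chain against a fixed, made-up df line.
# Plain awk suffices here; the chapter's gawk behaves identically.
fake_df='/dev/hda1 3197228 2453980 580836 81% /'
used=$(echo "$fake_df" | sed -n '/\/$/p' | awk '{print $5}' | sed 's/%//')
echo "$used"    # prints the used percentage: 81
```

The same three-stage pipeline works unchanged when you swap the canned line for real df output.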
$

Now you're in business! The next step is to use this information to create the script.

### Creating the script

Now that you know how to extract the used disk space value, you can use that formula to store the value to a variable. You can then check the variable against a predetermined number to indicate when the used space has exceeded your set limit. Here's an example of code using this technique:

$ cat diskmon
#!/bin/bash
# monitor available disk space
SPACE=$(df | sed -n '/\/$/p' | gawk '{print $5}' | sed 's/%//')
if [ $SPACE -ge 90 ]
then
    echo "Disk space on root at $SPACE% used" | mail -s "Disk warning" rich
fi
$

And there you have it. A simple shell script that'll check the available disk space on the root filesystem and send an e-mail message if the used space is at 90% or more.

### Running the script

Before having the diskmon script run automatically, you'll want to test it out a few times manually to ensure that it does what you think it should do. To test it, change the value that it checks against to a value lower than the current disk usage percentage:

if [ $SPACE -ge 40 ]
When you run the script, it should send you a mail message:
$ ./diskmon
$ mail
Mail version 8.1.2 01/15/2001. Type ? for help.
"/var/mail/rich": 1 message 1 new
>N  1 rich@testbox     Tue Feb  5 06:22   16/672   Disk warning
&
Message 1:
From rich@testbox Tue Feb 5 06:22:26 2008
Date: Tue, 5 Feb 2008 06:22:26 -0500
From: rich <rich@testbox>
To: rich@localhost.localdomain
Subject: Disk warning
Disk space on root at 81% used
& q
$

It worked! Now you can set the shell script to execute at a set number of times to monitor the disk activity. You do this using the cron table. How often you need to run this script depends on how active your file server is. For a low-volume file server, you may only have to run the script once a day:

30 0 * * * /home/rich/diskmon

This cron table entry runs the shell script every day at 12:30 AM. For a high-volume file server environment, you may have to monitor this a few times a day:

30 0,8,12,16 * * * /home/rich/diskmon

This cron table entry runs the shell script four times a day, at 12:30 AM, 8:30 AM, 12:30 PM, and 4:30 PM.

## Catching disk hogs

If you're responsible for a Linux server with lots of users, one problem that you'll often bump up against is figuring out who's using all of the disk space. This age-old administration question is sometimes harder to answer than you'd expect. Unfortunately, despite the importance of tracking user disk space usage, there's no one Linux command that can provide that information for you. Instead, you need to write a shell script that pieces other commands together to extract the information you're looking for. This section walks you through this process.

### The required functions

The first tool you'll need to use is the du command. This command displays the disk usage for individual files and directories. The -s option lets you summarize totals at the directory level. This will come in handy when calculating the total disk space used by an individual user. Just use this command for the /home directory contents to summarize for each user's $HOME directory:

# du -s /home/*
40 /home/barbara
9868 /home/jessica
40 /home/katie
40 /home/lost+found
107340 /home/rich
5124 /home/test
#

Okay, that's a start. You can now see the total listing (in KB) for each $HOME directory. Depending on how your /home directory is mounted, you may or may not also see a special directory called lost+found, which isn't a user account. To get rid of that, we use the grep command with the -v option, which prints all of the lines except ones that contain the specified text:

# du -s /home/* | grep -v lost
40      /home/barbara
9868    /home/jessica
40      /home/katie
107340  /home/rich
5124    /home/test
#

Next, let's get rid of the full pathname so all we see are the user names. This sounds like a job for the sed command:

# du -s /home/* | grep -v lost | sed 's/\/home\///'
40      barbara
9868    jessica
40      katie
107340  rich
5124    test
#

Much better. Now, let's sort this output so that it appears in descending order:

# du -s /home/* | grep -v lost | sed 's/\/home\///' | sort -g -r
107340  rich
9868    jessica
5124    test
40      katie
40      barbara
#

The sort command sorts numerical values when you use the -g option, and sorts in descending order when you include the -r option. There's just one more piece of information that you're interested in, and that's the total amount of space used for all of the users. Here's how to get that value:

# du -s /home
122420  /home
#

This is all the information you'll need to create the disk hogs report. Now you're ready to push this into a shell script.

### Creating the script

Now that you know how to extract the raw data for the report, it's time to figure out a script that can read the report, parse the raw data values, and display it in a format that's presentable to a user. The easiest way to manipulate data for reports is with the gawk command.
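Before handing the data to gawk, the pipeline built so far can be sanity-checked on canned du output (the usernames and sizes here are illustrative, and plain awk-era tools are all that's needed):

```shell
# Simulated `du -s /home/*` output, piped through the same filters as above:
# drop lost+found, strip the /home/ prefix, sort numerically descending.
printf '40\t/home/barbara\n9868\t/home/jessica\n40\t/home/lost+found\n107340\t/home/rich\n' |
    grep -v lost | sed 's/\/home\///' | sort -g -r
```

The lost+found entry drops out, the /home/ prefix is stripped, and rich sorts to the top with 107340 blocks.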
The report will have three sections:

• A header with text identifying the columns
• The body of the report, showing the user, their total disk space used, and the percentage of the total they're consuming
• A footer that shows the total disk space usage for all users

The gawk command can perform all of these functions as a single command, using the BEGIN and END tags. Here's the diskhogs script that puts all of the elements together:

# cat diskhogs
#!/bin/bash
# calculate disk usage and report per user
TEMP=$(mktemp -t tmp.XXXXXX)
du -s /home/* | grep -v lost | sed 's/\/home\///' | sort -g -r > $TEMP
TOTAL=$(du -s /home | gawk '{print $1}')
cat $TEMP | gawk -v n="$TOTAL" '
BEGIN {
    print "Total Disk Usage by user";
    print "User\tSpace\tPercent"
}
{
    printf "%s\t%d\t%6.2f%%\n", $2, $1, ($1/n)*100
}
END {
    print "--------------------------";
    printf "Total\t%d\n", n
}'
rm -f $TEMP
#

The script sends the result from the formula used to generate the raw data to a temporary file, then stores the result of the total disk space formula in the variable $TOTAL.

Next, the script retrieves the raw data in the temporary file and sends it to the gawk command. The gawk command retrieves the $TOTAL value and assigns it to a local variable called n. The code in the gawk command first creates the report header in the BEGIN section:

BEGIN {
    print "Total Disk Usage by user";
    print "User\tSpace\tPercent"
}

It then uses the printf command to format and rearrange the text from the raw data:

{
    printf "%s\t%d\t%6.2f%%\n", $2, $1, ($1/n)*100
}

This is the section that processes each line of output from the du command. The printf command allows us to format the output to make a nice table. If you happen to have long usernames on your system, you may need to fudge the formatting some to get it to turn out.

The other trick here is that the diskhogs script passes the $TOTAL variable value to the gawk script via the gawk command line parameter: -v n=$TOTAL

Now the gawk variable n is equal to the total user disk space, and you can then use that value anywhere in the gawk script.
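The -v handoff is easy to verify in isolation; here n comes from a shell variable and is used inside the awk program (the username and sizes are made up, and plain awk behaves the same as gawk for this):

```shell
# Pass a shell value into the awk program as the variable n,
# then use it to compute a percentage of the total.
total=122420
echo "107340 rich" | awk -v n="$total" '{ printf "%s\t%6.2f%%\n", $2, ($1/n)*100 }'
# prints the user name and its share of the total: 87.68%
```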

Finally, the script ends the output display by showing the total amount of user disk spaced used:

END {
print "--------------------------";
printf "Total\t%d\n", n
}’

This uses the n variable, which contains the value from the $TOTAL shell variable.

### Running the script

Putting it all together, when you run the script you should get a nice report:

# ./diskhogs
Total Disk Usage by user
User	Space	Percent
rich	107340	 87.68%
jessica	9868	  8.06%
test	5124	  4.19%
katie	40	  0.03%
barbara	40	  0.03%
--------------------------
Total	122420
#

Now you've got a report that you can send off to your boss and feel proud of!

## Monitoring CPU and memory usage

The core statistics of any Linux system are the CPU and memory usage. If these values start getting out of control, things can go wrong very quickly on the system. This section demonstrates how to write scripts to help you monitor and track the CPU and memory usage on your Linux system, using a couple of basic shell scripts.

### The required functions

As with the other scripts shown so far in this chapter, the first step is to determine exactly what data you want to produce with your scripts. There are a few different commands that you can use to extract CPU and memory information for the system. The most basic system statistics command is the uptime command:

$ uptime
09:57:15 up 3:22, 3 users, load average: 0.00, 0.08, 0.28
$

The uptime command gives us a few different basic pieces of information that we can use:

• The current time
• The number of days, hours, and minutes the system has been operational
• The number of users currently logged into the system
• The one, five, and fifteen minute load averages

Another great command for extracting system information is the vmstat command. Here's an example of the output from the vmstat command:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 178660 13524 4316 72076 8 10 80 22 127 124 3 1 92 4 0
$

The first time you run the vmstat command, it displays the average values since the last reboot. To get the current statistics, you must run the vmstat command with command line parameters:

$ vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- --system-- ------cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 178660 13524 4316 72076 8 10 80 22 127 124 3 1 92 4 0
0 0 178660 12845 4245 71890 8 10 80 22 127 124 3 1 86 10 0
$

The second line contains the current statistics for the Linux system. As you can see, the output from the vmstat command is somewhat cryptic. The table below explains what each of the symbols means.

Table: The vmstat Output Symbols

This is a lot of information. You probably don't need to record all of the values from the vmstat command; just a few will do. The free memory and percent of CPU time spent idle should give you a good snapshot of the system status (and by now you can probably guess exactly how we'll extract those values from the output). You may have also noticed that the output from the vmstat command includes table heading information, which we obviously don't want in our data. To solve that problem, you can use the sed command to display only lines that have a numerical value in them:

$ vmstat 1 2 | sed -n '/[0-9]/p'
1 0 172028 8564 5276 62244 9 13 98 28 160 157 5 2 89 5 0
0 0 178660 12845 4245 71890 8 10 80 22 127 124 3 1 86 10 0
$That’s better, but now we need to get only the second line of data. Another call to the sed editor can solve that problem:$ vmstat | sed -n ’/[0-9]/p’ | sed -n ’2p’
0 0 178660 12845 4245 71890 8 10 80 22 127 124 3 1 86 10 0
$

Now you can easily extract the data value you want using the gawk program. Finally, you'll want to tag each data record with a date and timestamp to indicate when the snapshots were taken. The date command is handy, but the default output from the date command might be a little cumbersome. You can simplify the date command output by specifying another format:

$ date +"%m/%d/%Y %k:%M:%S"
02/05/2008 19:19:26
$

That should look much better in our output. Speaking of the output, you should also consider how you want to record the data values. For data that you sample on a regular basis, it's often best to output the data directly to a log file. You can create the log file in your $HOME directory, appending data each time you run the shell script. When you want to see the results, you can just view the log file.

You should also spend some time considering the format of the log file. You'll want to ensure that the data in the log file can be read easily (after all, that's the whole purpose of this script). There are many different methods you can use to format the data in the log file. A popular format is comma-separated values (CSV). This format places each record of data on a separate line and separates the data fields in the record with commas. This is a popular format for people who love spreadsheets, as it's easily imported into a spreadsheet.
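As a quick illustration of the append-to-log approach with CSV records (the statistics values here are invented, and the log goes to a throwaway temp file rather than $HOME):

```shell
# Append two sample CSV records to a log file, then read it back.
LOG=$(mktemp)
echo "02/06/2008,10:39:57,4,0.26,57076,87" >> "$LOG"
echo "02/06/2008,10:41:52,4,0.14,46292,88" >> "$LOG"
cat "$LOG"      # both records, one per line, fields comma-separated
rm -f "$LOG"
```

Because >> appends rather than overwrites, every sampling run adds one record without disturbing the earlier ones.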

However, staring at a CSV file of data is not the most exciting thing in the world. If you want to provide a more aesthetically appealing report, you can create an HTML document. HTML has been the standard method for formatting Web pages for years. It uses simple tags to delineate data types within the Web page. However, HTML is not just for Web pages. You’ll often find HTML used in e-mail messages as well. Depending on the MUA client, you may or may not be able to view an embedded HTML e-mail document. A better solution is to create the HTML report, and attach it to the e-mail message.

The script will save data in a CSV-formatted file, so you can always access the raw data to import into a spreadsheet. When the system administrator runs the report script, it reformats the data into an HTML report. How cool is that?

### Creating the capture script

Since you need to sample system data at a regular interval, you’ll need two separate scripts. One script will capture the data and save it to the log file. This script should be run on a regular basis from the cron table. The frequency depends on how busy your Linux system is. For most systems, running the script once an hour should be fine.

The second script should output the report data and e-mail it to the appropriate individual(s). Most likely you won't want to e-mail a new report every time you get a new data sampling. You'll probably want to run the second script as a cron job at a lower frequency, such as once a day, first thing in the morning.

The first script, used to capture the data, is called capstats. Here’s what it looks like:

$ cat capstats
#!/bin/bash
# script to capture system statistics
OUTFILE=/home/rich/capstats.csv
DATE=$(date +%m/%d/%Y)
TIME=$(date +%k:%M:%S)
TIMEOUT=$(uptime)
VMOUT=$(vmstat 1 2)
USERS=$(echo $TIMEOUT | gawk '{print $4}')
LOAD=$(echo $TIMEOUT | gawk '{print $9}' | sed 's/,//')
# quote $VMOUT so its line breaks survive for the sed line selection
FREE=$(echo "$VMOUT" | sed -n '/[0-9]/p' | sed -n '2p' | gawk '{print $4}')
IDLE=$(echo "$VMOUT" | sed -n '/[0-9]/p' | sed -n '2p' | gawk '{print $15}')
echo "$DATE,$TIME,$USERS,$LOAD,$FREE,$IDLE" >> $OUTFILE
$

This script mines the statistics from the uptime and vmstat commands and saves them in variables. The script then writes the values to the file defined by the $OUTFILE variable. For this example, I just saved the file in my $HOME directory. You should modify this location to what suits your environment best. After creating the capstats script, you should probably test it from the command line before having it run regularly from your cron table:

$ ./capstats
$ cat capstats.csv
02/06/2008,10:39:57,4,0.26,57076,87
$

The script created the new file, and placed the statistic values in the proper places. Just to make sure that subsequent runs of the script don’t overwrite the file, test it again:

$ ./capstats
$ cat capstats.csv
02/06/2008,10:39:57,4,0.26,57076,87
02/06/2008,10:41:52,4,0.14,46292,88
$

As hoped, the second time the script ran it appended the new statistics data to the end of the file. Now you're ready to place this in the cron table. To run it once every hour, create this cron table entry:

0 * * * * /home/rich/capstats

You'll need to use the full pathname of where you place your capstats shell script file for the cron table. Now you're capturing statistics once every hour without having to do anything else!

### Generating the report script

Now that you have a file full of raw data, you can start working on the script to generate a fancy report for your boss. The best tool for this is the gawk command. The gawk command allows you to extract raw data from a file and present it in any manner necessary. First, test this from the command line, using the new capstats.csv file created by the capstats script:

$ cat capstats.csv | gawk -F, '{printf "%s %s - %s\n", $1, $2, $4}'
02/06/2008 10:39:57 - 0.26
02/06/2008 10:41:52 - 0.14
02/06/2008 10:50:01 - 0.06
02/06/2008 11:00:01 - 0.18
02/06/2008 11:10:01 - 0.03
02/06/2008 11:20:01 - 0.07
02/06/2008 11:30:01 - 0.03
$

You need to use the -F option for the gawk command to define the comma as the field separator character in your data. After that, you can retrieve each individual data field and display it as you need.
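For instance, to report the free memory and idle CPU fields instead of the load average, you would pull fields five and six (the record below is a made-up sample, and plain awk works the same as gawk here):

```shell
# Select different CSV fields: $5 is free memory, $6 is CPU idle percent.
echo '02/06/2008,10:39:57,4,0.26,57076,87' |
    awk -F, '{ printf "%s %s - %s KB free, %s%% idle\n", $1, $2, $5, $6 }'
# prints: 02/06/2008 10:39:57 - 57076 KB free, 87% idle
```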

For the report, we’ll be using HTML format. This allows us to create a nicely formatted report with a minimum amount of work. The browser that displays the report will do all the hard work of formatting and displaying the report. All you need to do is insert the appropriate HTML tags to format the data.

The easiest way to display spreadsheet data in HTML is using the <table> tag. The table tag allows you to create a table with rows and cells (table data in HTML-speak). You define the start of a row using the <tr> tag, and the end of the row with the </tr> tag. Similarly, you define cells using the <td> and </td> tag pair.
The HTML for a full table looks like this:

<html>
<body>
<h2>Report title</h2>
<table border="1">
<tr>
<td>Date</td><td>Time</td><td>Users</td>
<td>Load</td><td>Free Memory</td><td>%CPU Idle</td>
</tr>
<tr>
<td>02/05/2008</td><td>11:00:00</td><td>4</td>
<td>0.26</td><td>57076</td><td>87</td>
</tr>
</table>
</body>
</html>

Each data record is part of a <tr>/</tr> tag pair. Each data field is within its own <td>/</td> tag pair. When you display the HTML report in a browser, the browser creates the table automatically for you, as shown in the figure below.

Figure: Displaying data in an HTML table

For the script, all you need to do is generate the HTML heading code by using echo commands, generate the data HTML code by using the gawk command, then close out the table, again by using echo commands.

Once you have your HTML report generated, you’ll want to redirect it to a file for mailing. The mutt command is a great tool for easily sending e-mail attachments.

Here’s the reportstats script, which will generate the HTML report and mail it off:

$ cat reportstats
#!/bin/bash
# parse capstats data into daily report
FILE=/home/rich/capstats.csv
TEMP=/home/rich/capstats.html
MAIL=$(which mutt)
DATE=$(date +"%A, %B %d, %Y")
echo "<html><body><h2>Report for $DATE</h2>" > $TEMP
echo "<table border=\"1\">" >> $TEMP
echo "<tr><td>Date</td><td>Time</td><td>Users</td>" >> $TEMP
echo "<td>Load</td><td>Free Memory</td><td>%CPU Idle</td></tr>" >> $TEMP
cat $FILE | gawk -F, '{
    printf "<tr><td>%s</td><td>%s</td><td>%s</td>", $1, $2, $3;
    printf "<td>%s</td><td>%s</td><td>%s</td></tr>\n", $4, $5, $6;
}' >> $TEMP
echo "</table></body></html>" >> $TEMP
$MAIL -a $TEMP -s "Stat report for $DATE" rich < /dev/null
rm -f $TEMP
$

Since the mutt command uses the file name of the attached file as the attachment file name, it’s best not to create the report file using the mktemp command. Instead, I gave the file a more descriptive name. The script deletes the file at the end, so it’s not too important where you create the file.
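The gawk stage of the script can be exercised on its own with a single canned record, confirming that each CSV field lands in its own table cell (the values are a made-up sample, and plain awk behaves identically):

```shell
# Convert one CSV record into an HTML table row.
echo '02/06/2008,10:39:57,4,0.26,57076,87' |
    awk -F, '{ printf "<tr><td>%s</td><td>%s</td><td>%s</td>", $1, $2, $3;
               printf "<td>%s</td><td>%s</td><td>%s</td></tr>\n", $4, $5, $6 }'
```

The output is a single <tr> row, ready to drop between the <table> tags that the script's echo commands emit.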

### Running the script

After creating the reportstats script, give it a test run from the command line and see what happens:

$ ./reportstats
$

Well, that wasn't too exciting. The real test now is to view your mail message, preferably in a graphical e-mail client such as KMail or Evolution. The figure below shows the message viewed in the Evolution e-mail client.

The Evolution e-mail client provides the option of viewing the attachment either separately from the client window or within it. The figure below shows the attached report within the client window. Notice how the data is all nicely formatted using the HTML tables, just as it was when viewing from the browser!

Figure: Viewing the report attachment in Evolution