Performance tools: top

The top command is one of the most familiar performance tools. Most system administrators run top to see how their Linux and UNIX systems are performing. The top utility provides a great way to monitor the performance of processes and Linux as a whole. top can be run as a normal user as well as root.

The top display has two parts. The first third or so shows information about Linux as a whole. The remaining lines are filled with individual process information. If the window is stretched, more processes are shown to fill the screen.

Much of this system-wide information can be obtained from several other commands, but it is convenient to have it all on one screen from one command. The first line shows the load average for the last one, five, and fifteen minutes. Load average indicates how many processes, on average, are running on a CPU or waiting to run; on Linux it also counts processes in uninterruptible sleep, typically waiting on disk I/O. The uptime command displays load averages as well. Next comes process information, followed by CPU, memory, and swap. The memory and swap information is similar to the output of the free command. Once we know how much memory and CPU are in use, the next question is which processes are using them.
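A load average is easier to judge relative to the number of CPUs in the machine. The following is a minimal sketch rather than anything top provides; the helper name load_per_cpu is made up for illustration, and the commented usage assumes a Linux system with /proc/loadavg and a getconf that supports _NPROCESSORS_ONLN.

```shell
#!/bin/sh
# load_per_cpu LOAD NCPU
# Hypothetical helper: divide a load average by the CPU count
# and print the result with two decimal places.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Example use on a live system (1-minute load average):
#   read -r one five fifteen rest < /proc/loadavg
#   load_per_cpu "$one" "$(getconf _NPROCESSORS_ONLN)"
```

As a rough rule of thumb, a per-CPU value that stays near or above 1.0 suggests that processes are queuing for CPU time, although I/O-bound work can inflate the figure as well.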

Most of the process information can be obtained from the ps command too, but top provides a nicer format that is easier to read. The most useful interactive top command is h for help, which lists top’s other interactive commands.

Output Explained

Let’s take a look at what the information from top means. We’ll use the following output from top as an example:

16:30:30  up 16 days,  7:35,  2 users,  load average: 0.54, 0.30, 0.11
73 processes: 72 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait     idle
           total   13.3%    0.0%   20.9%   0.0%   0.0%    0.0%    65.7%
Mem:   511996k av,  498828k used,   13168k free,  0k shrd,  59712k buff
                    387576k actv,   68516k in_d,  9508k in_c
Swap:  105832k av,    2500k used,  103332k free            343056k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
10250 dave      20   0  1104 1104   888 R     3.8  0.2   0:00   0 top
10252 root      23   0   568  568   492 S     0.9  0.1   0:00   0 sleep
    1 root      15   0   512  512   452 S     0.0  0.1   0:04   0 init

The first line from top displays the load average information:

16:30:30 up 16 days, 7:35, 2 users, load average: 0.54, 0.30, 0.11

This output is similar to the output from uptime. You can see how long Linux has been up, the time, and the number of users. The 1-, 5-, and 15-minute load averages are displayed as well. Next, the process summary is displayed:

73 processes: 72 sleeping, 1 running, 0 zombie, 0 stopped

We see 73 total processes. Of those, 72 are sleeping, and one is running. There are no zombies or stopped processes. A process becomes a zombie when it exits and its parent has not yet waited for it with the wait(2) or waitpid(2) system calls. This usually means the parent is still running but has not collected the child’s exit status; if the parent exits first, the child is adopted by init, which waits for it. Zombies don’t take up resources other than the entry in the process table. Stopped processes are processes that have been sent a stop signal such as SIGSTOP. See the signal(7) man page for more information.
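When the zombie count is not zero, the next step is usually to find out which parent is failing to wait for its children. The sketch below filters ps output for processes whose state begins with Z; the function name list_zombies is an assumption made for this example, and the column layout assumes ps is invoked with the -o pid,ppid,stat,comm format shown in the comment.

```shell
#!/bin/sh
# list_zombies
# Hypothetical helper: read "PID PPID STAT COMMAND" lines on stdin and
# print the PID, parent PID, and command of every zombie (state Z).
list_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Example use on a live system:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

The second column of the result is the parent PID, which is the process to investigate, because a zombie itself cannot be killed.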

Next up is the CPU information:

CPU states:  cpu  user   nice  system   irq  softirq  iowait   idle
           total 13.3%   0.0%   20.9%  0.0%     0.0%    0.0%  65.7%

The CPU lines describe how the CPUs spend their time. The top command reports the percentage of CPU time spent in user mode (user), running niced processes (nice), in kernel mode (system), and idle (idle). The iowait column shows the percentage of time that the processor was waiting for I/O to complete while no process was executing on the CPU. The irq and softirq columns indicate time spent servicing hardware and software interrupts. Linux kernels earlier than 2.6 don’t report irq, softirq, and iowait.
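Percentages like these can be derived from the cumulative counters on the cpu line of /proc/stat by sampling twice and comparing. The following is a sketch of that arithmetic, not top’s actual implementation; the function name cpu_states is hypothetical, it assumes the field order user, nice, system, idle, iowait, irq, softirq, and it ignores any later fields such as steal.

```shell
#!/bin/sh
# cpu_states "PREV_LINE" "CUR_LINE"
# Hypothetical helper: given two samples of the "cpu" line from /proc/stat,
# print the share of time spent in each state between the samples.
cpu_states() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) prev[i] = $i; next }
        {
            total = 0
            # Missing fields (older kernels) evaluate to 0.
            for (i = 2; i <= 8; i++) { d[i] = $i - prev[i]; total += d[i] }
            if (total == 0) exit 1
            printf "user %.1f%% nice %.1f%% system %.1f%% idle %.1f%% iowait %.1f%% irq %.1f%% softirq %.1f%%\n",
                100 * d[2] / total, 100 * d[3] / total, 100 * d[4] / total,
                100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                100 * d[8] / total
        }'
}

# Example use on a live system, three seconds apart:
#   prev=$(grep '^cpu ' /proc/stat); sleep 3; cur=$(grep '^cpu ' /proc/stat)
#   cpu_states "$prev" "$cur"
```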

The memory information is next:

Mem: 511996k av, 498828k used, 13168k free,   0k shrd,  59712k buff
                 387576k actv, 68516k in_d,   9508k in_c

The first three metrics give a summary of memory usage. They list total usable memory, used memory, and free memory. These are all you need to determine whether Linux is low on memory.

The next five metrics identify how the used memory is allocated. The shrd field shows shared memory usage and buff is memory used in buffers. Memory that has been allocated to the kernel or user processes can be in three different states: active, inactive dirty, and inactive clean. Active, actv in top, indicates that the memory has been used recently. Inactive dirty, in_d in top, indicates that the memory has not been used recently and may be reclaimed. In order for the memory to be reclaimed, its contents must be written to disk. This process is called “laundering” and can be called a fourth temporary state for memory. Once laundered, the inactive dirty memory becomes inactive clean, in_c in top.

The swap information is next:

Swap: 105832k av, 2500k used, 103332k free   343056k cached

The av field is the total amount of swap that is available for use, followed by the amount used and amount free. Last is the amount of memory used for cache by the kernel.
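The used figure on the Mem line includes buffers and cache, which the kernel can generally give back when applications need memory. A rough estimate of the memory held by applications therefore subtracts both. The sketch below shows that arithmetic with the values from the sample output; the function name app_used_kb is an assumption for this example, and the result is only an approximation because not all cached memory is reclaimable.

```shell
#!/bin/sh
# app_used_kb USED BUFF CACHED   (all values in kilobytes)
# Hypothetical helper: estimate memory used by applications by
# subtracting buffers and cache from the used total.
app_used_kb() {
    echo $(( $1 - $2 - $3 ))
}

# Values from the sample Mem and Swap lines:
#   app_used_kb 498828 59712 343056
```

With the sample values, roughly 96060k of the 498828k reported as used is attributable to applications; the rest is buffers and cache.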

The rest of the top display is process information:

  PID USER    PRI NI SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
10250 dave     20  0 1104 1104   888 R     3.8  0.2   0:00   0 top
10252 root     23  0  568  568   492 S     0.9  0.1   0:00   0 sleep
    1 root     15  0  512  512   452 S     0.0  0.1   0:04   0 init

Saving Customization

A very nice top feature is the capability to save the current configuration. Change the display as you please using the interactive commands and then press w to save the view. top writes a .toprc file in the user’s home directory that saves the configuration. The next time this user starts top, the same display options are used.

top also looks for a default configuration file, /etc/toprc. This file is a global configuration file and is read by top when any user runs the utility. This file can be used to cause top to run in secure mode and also to set the refresh delay. Secure mode prevents non-root users from killing or changing the nice value of processes. It also prevents non-root users from changing the refresh value of top. A sample /etc/toprc file for our Red Hat Enterprise Linux ES release 3 looks like the following:

$ cat /etc/toprc
s3

The s indicates secure mode, and the 3 specifies three-second refresh intervals. Other distributions may have different formats for /etc/toprc.

The capability to kill processes is a pretty nice feature. If some user has a runaway process, the top command makes it easy to find and kill. Run top, show only that user’s processes with the u command, and then use k to kill the runaway process. top is not only a good performance monitoring tool; it can also be used to improve performance by killing those offending processes.
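The same search can be approximated outside of top when a script needs it. The following sketch picks the process with the highest %CPU from ps output; the function name busiest_pid is hypothetical, the user name dave is taken from the sample output above, and the input is assumed to be in the pid,pcpu,comm format shown in the comment.

```shell
#!/bin/sh
# busiest_pid
# Hypothetical helper: read "PID %CPU COMMAND" lines (with a header line)
# on stdin and print the PID with the highest %CPU above zero.
busiest_pid() {
    awk 'NR > 1 && $2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}

# Example use on a live system:
#   ps -u dave -o pid,pcpu,comm | busiest_pid
```

It is worth inspecting the reported PID before passing it to kill, because the busiest process is not necessarily a runaway one.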

Batch Mode

top can also be run in batch mode. Try running the following command:

$ top -n 1 -b >/tmp/top.out

The -n 1 tells top to only show one iteration, and the -b option indicates that the output should be in text suitable for writing to a file or piping to another program such as less. Something like the following two-line script would make a nice cron job:

# cat /home/dave/top_metrics.sh
echo "**** " `date` " ****" >> /var/log/top/top.`date +%d`.out
/usr/bin/top -n 1 -b >> /var/log/top/top.`date +%d`.out

We could add it to crontab and collect output every 15 minutes.

# crontab -l
*/15 * * * * /home/dave/top_metrics.sh

The batch output makes it easy to take a thorough look at what is running while enjoying a good cup of coffee. All the processes are listed, and the output isn’t refreshing every five seconds. If a .toprc configuration file exists in the user’s home directory, it is used to format the display. The following output came from the top batch mode running on a multi-CPU Linux server. Note that we don’t show all 258 processes from the top output.

10:17:21  up 125 days, 10:10,  4 users,  load average: 3.60, 3.46, 3.73
258 processes: 252 sleeping, 6 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   41.0%    0.0%   21.4%   0.4%     0.4%    0.0%   36.5%
           cpu00   36.7%    0.0%   22.6%   1.8%     0.0%    0.0%   38.6%
           cpu01   46.2%    0.0%   17.9%   0.0%     0.9%    0.0%   34.9%
           cpu02   32.0%    0.0%   28.3%   0.0%     0.0%    0.0%   39.6%
           cpu03   49.0%    0.0%   16.9%   0.0%     0.9%    0.0%   33.0%
Mem:  4357776k av, 4321156k used,   36620k free,       0k shrd,  43860k buff
                   3261592k actv,  625088k in_d,   80324k in_c
Swap: 1048536k av,  191848k used,  856688k free          3920940k cached

  PID USER    PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM     TIME CPU COMMAND
17599 wwrmn    21   0  9160 6900  1740 R    12.2  0.1    0:01    1 logsw
 1003 coedev   15 -10 71128  65M 66200 S <   8.0  1.5  414:42    2 vmware-vmx
17471 wwrmn    15   0 10116 7868  1740 S     6.8  0.1    0:12    2 logsw
17594 wwrmn    18   0  9616 7356  1740 R     4.4  0.1    0:01    0 logsw
 6498 coedev   25   0 43108  36M 33840 R     4.0  0.8   9981m    1 vmware-vmx
17595 wwrmn    17   0  8892 6632  1740 S     3.0  0.1    0:01    3 logsw
17446 wwrmn    15   0 10196 7960  1740 S     2.8  0.1    0:13    3 logsw
17473 wwrmn    15   0  9196 6948  1740 S     2.8  0.1    0:02    1 logsw
17477 wwrmn    15   0  9700 7452  1740 S     2.3  0.1    0:04    2 logsw
  958 coedev   15 -10 71128  65M 66200 S <   2.1  1.5   93:53    3 vmware-vmx
 7828 coedev   15 -10 38144  33M 33524 S <   1.8  0.7   4056m    1 vmware-vmx
 6505 coedev   25   0     0    0     0 RW    1.8  0.0   3933m    1 vmware-rtc
 7821 coedev   15 -10 38144  33M 33524 S <   1.6  0.7   6766m    1 vmware-vmx
 6478 coedev   15 -10 43108  36M 33840 S <   1.6  0.8   6224m    0 vmware-vmx
17449 wwrmn    15   0  9820 7572  1740 S     1.6  0.1    0:07    3 logsw
 7783 coedev   15   0 47420  15M  1632 S     1.4  0.3   1232m    3 vmware
 6497 coedev   15 -10 43108  36M 33840 S <   0.9  0.8   3905m    1 vmware-vmx
 1002 coedev   15 -10 71128  65M 66200 S <   0.9  1.5   59:54    2 vmware-vmx
17600 jtk      20   0  1276 1276   884 R     0.9  0.0    0:00    2 top
 7829 coedev   25   0 38144  33M 33524 R     0.7  0.7   6688m    0 vmware-vmx
    1 root     15   0   256  228   200 S     0.0  0.0    2:25    0 init
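Because every batch iteration starts with the uptime-style header line, a file of collected output can be mined later for trends. The sketch below pulls the timestamp and the three load averages out of each header; the function name load_averages is an assumption for this example, and the pattern assumes the header format shown in this section (newer versions of top prefix the line differently and would need an adjusted pattern).

```shell
#!/bin/sh
# load_averages
# Hypothetical helper: read top batch output on stdin and print
# "TIME 1min 5min 15min" for every header line it finds.
load_averages() {
    sed -n 's/^ *\([0-9:]*\) .*load average: \([0-9.]*\), \([0-9.]*\), \([0-9.]*\).*/\1 \2 \3 \4/p'
}

# Example use on a file written by the cron job above:
#   load_averages < /var/log/top/top.15.out
```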

By now you can see why top is such a popular performance tool. The interactive nature of top and the ability to easily customize the output make it a great resource for identifying problems.