Note: This article originally ran in Full Circle Magazine Issue #61, May, 2012.
<h2>Use The TOP Command</h2>
One of the great advantages of using Linux is that there are some great tools available to help you understand what is going on with your computer and diagnose possible problems. One of the most useful is the top command. I am going to cover some of the things you can do, and maybe mention one or two other commands as well.
First of all, just to get it out of the way, there is an alternative called htop, and I do plan to cover it later. But htop generally needs to be installed before you can use it, while top should already be on your system, making it a good starting point. Usage of the command is simple: just open a terminal/console and type top. The result will be something like the image shown here.
There is a lot of information on this screen, so it will take us a little time to go through all of the options. What you can see right away is that this is listing processes running on your computer, and you can see the Process ID for each one, how much RAM each one is using, what percentage of the CPU each one is using, the owner of each process, etc. Then you can see all sorts of cryptic numbers above this listing. We will cover all of it either in this article or one to follow, but to get there we need to get going!
By default, top lists processes in order of the amount of CPU each one is using, expressed as a percentage of the total available. This is important to know, since if your CPU is maxing out you will see degraded performance. This can show up as lags in responding to keyboard and/or mouse input, jerkiness on audio or video playback, etc. On my Kubuntu desktop, I have a side panel set up with monitors for CPU usage, CPU temperature, Memory usage, Swap usage, and network traffic – so that I can monitor these critical functions and prevent problems from getting out of hand. I have seen situations where the CPU usage maxed out at 100% and stayed there (usually as a result of Flash, which cannot die soon enough, but that is a rant for another day). When that happens, the top command lets me quickly check and see what application is problematic so I can kill it.
One of the nice things about the top command is that it is interactive as long as you have it up in the terminal. So, you can kill a process quite easily by simply typing a k with the terminal open and top running. This will open a prompt above the process list asking you which process to kill. Enter the Process ID of the misbehaving application, then press Enter again to accept the default signal (15, SIGTERM), and it will be gone.
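Pressing k in top amounts to sending a signal to a process, which the kill command can also do directly. Here is a minimal sketch, assuming a POSIX shell; the sleep process is only a harmless stand-in for a misbehaving application:

```shell
# Start a harmless stand-in for a misbehaving process.
sleep 300 &
pid=$!

# Ask it to terminate. SIGTERM (15) is the default signal for kill,
# and the default that top offers at its k prompt.
kill -TERM "$pid"

# Collect the exit status; 128 + 15 = 143 means "ended by SIGTERM".
status=0
wait "$pid" || status=$?
echo "process $pid exited with status $status"
```

If a process ignores SIGTERM, signal 9 (SIGKILL) cannot be caught or ignored, but it also gives the program no chance to clean up, so it is best kept as a last resort.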
Now, if you are looking at the screenshot of top running on my computer you may have noticed something. I said that it gave the CPU usage as a percentage of the total. And if you looked carefully you might have seen that the percentages add up to more than 100%. How can that be, you ask? Well, the answer is that it is looking at these as percentages of the core that the process is running on. Since this computer is a dual-core machine, it has two processors and can distribute individual processes to whichever core it wishes. So I could theoretically see up to 200% if I added up the numbers here (though that would be bad since it would indicate I was maxed out). If I had a quad-core, I could have up to 400%, etc.
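The arithmetic is simple enough to put in a one-line helper. In this sketch, cpu_ceiling is a made-up name for illustration, not a standard command:

```shell
# Upper bound for the sum of the %CPU column, given a core count.
cpu_ceiling() {
    echo $(( $1 * 100 ))
}

cpu_ceiling 2   # dual-core, as in the article: prints 200
cpu_ceiling 4   # quad-core: prints 400
```

On a system with GNU coreutils you could feed it the real core count with cpu_ceiling "$(nproc)".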
Priority and Niceness
The idea of niceness is to determine which processes should get more goodies when running, and which should be put in the background. In other words, to set some priorities of access to the CPU. This is done by using a niceness number, which appears in the column NI. In the screen capture you see that all of these processes are running at a niceness number of 0. What that means is that they are running at a default priority which has not been altered in any way. Niceness numbers run from -20 to +19, with -20 being the highest priority. I said that 0 is the default, but you can check it on your system by running the command nice without any arguments. What is returned is the default niceness level. I will leave this topic here for now, but if you want to know more there is a good web page on this topic here. This article will explain how you can change niceness levels for certain processes if you wish to do so.
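As a small illustration of the nice command, the sketch below starts a command with a higher niceness (lower priority). It assumes the nice utility from coreutils or an equivalent. The command being started is nice itself, so it simply reports the niceness it was given:

```shell
# Niceness of the current shell (0 unless something changed it).
base=$(nice)

# Run a command with niceness raised by 5. The command here is nice
# with no arguments, which prints the niceness it is running at.
raised=$(nice -n 5 nice)

echo "shell niceness: $base, child niceness: $raised"
```

In real use you would put a long-running job in place of the inner nice, for example a backup or a video encode that should not get in the way of interactive work.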
Next to the column on CPU usage in the screenshot is a column for memory usage, again expressed as a percentage of the total available. In this case, it happens that the process using the most CPU is also the one using the most memory, which is not unusual. But suppose you wanted to see your processes sorted in the order of the memory they consume? Well, as I mentioned before, the top command is interactive. To change the sort order, just press an upper-case letter O while the command is running in the terminal. This brings up a very useful screenful of sort options:
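With this screen open you can press a lower-case letter n, then Enter, and get a listing in order of memory percentage used. Or you can sort in other ways if needed. If your version of top does not show this screen, pressing an upper-case M sorts by memory percentage directly.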
Again, this is useful if you find you are running out of memory and need to know where it is going. If one process is using a lot of memory unexpectedly, that would definitely be an indication. That does imply you have some idea of what constitutes normal in these situations. The best way to build a sense of that is to check periodically, and observe what is going on. In my case, I build my machines with 16GB of RAM these days, so I don’t expect to see very high percentage usage in most cases. For instance, right now I have my bottom panel filled with program icons for programs I have open (18 of them right now), and a quick scan of the output of top shows I am using somewhere in the 35-40% range of my total memory.
In the screenshot, you do see one big memory hog, but that is actually expected. I had VirtualBox open and running a virtual machine at the time, and I had configured it to use 4GB of RAM (plus other resources, of course.) So, in this case, I did see what I expected to see. But if I saw Firefox using that much memory, I would know it was a problem and I would shut it down promptly.
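You can also do the same kind of memory ranking outside of top. This is a rough sketch, not the way top does it: by_mem is a hypothetical helper that expects lines of "PID %MEM COMMAND", and the sample lines are invented for illustration:

```shell
# Print the N lines with the largest value in column 2 (here: %MEM).
by_mem() {
    sort -k2,2nr | head -n "${1:-5}"
}

# Invented sample data in "PID %MEM COMMAND" form.
printf '101 0.5 bash\n202 25.3 VirtualBox\n303 3.1 firefox\n' | by_mem 2
```

On a live system with the procps version of ps, something like ps -eo pid,pmem,comm --no-headers | by_mem 5 should list the five biggest memory users.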
q for Quit
If you have top running in your terminal, you might want to know how to get out of it, and that is simple, just press the letter q (for Quit), and you will be back at your terminal prompt. You can get more information in either of two ways: the old-school way is to type man top in the terminal, but the new, improved, way is to type info top. Though I think you will find the same result either way. The point is that this is a rich command with a lot of options.
Interpretation of System Data
We’ve looked at some basics of the top command, and focused on looking at the process listings to spot and correct possible problems. This is still a very useful thing, of course, but there is also a lot of system data in the output that is useful. Recall the screenshot we used last time to display the output of this command (shown above).
Now we want to focus on those numbers at the top that are presenting some very useful system data. So let’s start at the top (literally):
Line 1, the Top line
On the first line (above), we have the uptime. Actually, this is information you could get using the uptime command as well:
So this is a clue that the top command is gathering information that is available individually from other commands and bringing it together in one package of awesome goodness. Very convenient that is.
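On Linux, the raw uptime is kept as a number of seconds in the first field of /proc/uptime. The sketch below turns such a number into the familiar days-and-hours form. fmt_uptime is a made-up helper name, and the sample value is chosen only to come out at 17 days and a bit:

```shell
# Format a number of seconds roughly the way the uptime field reads.
fmt_uptime() {
    secs=$1
    printf '%d days, %d:%02d\n' \
        $(( secs / 86400 )) $(( secs % 86400 / 3600 )) $(( secs % 3600 / 60 ))
}

fmt_uptime 1480320   # illustrative value: 17 days, 3 hours, 12 minutes
```

On a live system, fmt_uptime "$(cut -d. -f1 /proc/uptime)" would format the real figure.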
So in order we see that the time is 15:52:05, and the computer has been up over 17 days. It has two users right now, which is normal. One of the users is root, but you should never run as root for ordinary activities. That is a dangerous and insecure practice. As long as you are logged in as root, any software that runs on your system has root-level privileges. The preferred way to run is to create an ordinary user who does not have quite so high a level of rights, and run as that user. In this case, that user is kevin. By the way, Ubuntu makes it very difficult for you to do anything other than this procedure. If you need higher rights, you use the sudo command to give yourself temporary privileges.
The last part of this line is called load. These three numbers are giving the load for the previous 1, 5, and 15 minutes. But what is load? It is the average number of processes that are runnable, or are uninterruptible. Basically, without getting too technical, it is how occupied the CPU is most of the time. But the wrinkle is that it is not adjusted (normalized) for the number of CPUs. What this means is that a single CPU system with a load of 1 is loaded all of the time. But on my dual core system, I never got to 2, so I am OK. If you had a quad-core, the magic number would be 4, and so on.
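Since the load figures are not normalized, it can help to divide them by the number of cores yourself. This is a minimal sketch assuming awk is available; load_per_core is a hypothetical helper and the load value is made up:

```shell
# Divide a load average by the number of cores to get load per core.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

load_per_core 1.50 2   # made-up load on a dual-core machine: prints 0.75
```

A result below 1.00 means there was spare capacity on average. On a live Linux system with GNU coreutils, load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" would use the real 1-minute load and core count.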
Line 2, Tasks
There is nothing interesting to see here. That last category, zombie, sounds like it ought to be at least interesting, but it really isn’t. Zombie processes are processes that have finished running but still have an entry in the process table; they disappear as soon as the parent process collects their exit status.
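If you ever want to count zombies yourself, they show up with a state code beginning with Z. In this sketch, count_zombies is a made-up helper that reads one state code per line, and the sample input is invented:

```shell
# Count lines whose process state starts with Z (zombie).
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_zombies() {
    grep -c '^ *Z' || true
}

# Invented sample: two ordinary states and two zombies.
printf 'S\nZ\nR+\nZ+\n' | count_zombies
```

With the procps version of ps installed, ps -eo stat= | count_zombies should give the same figure top shows on the Tasks line.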
Line 3, CPU(s)
This is worth a look or two. Last time we looked at the processes on the bottom of the top command’s output to see if any one process was hogging things. On this line, instead of looking at the individual processes, we are looking at the total picture of what is going on. And here we don’t need to worry about how many cores we have, these numbers aggregate all of the data for all cores.
The first statistic is %us, which in this case is 32.0%. This is the percentage of CPU time taken up by user processes. That does not necessarily mean processes that a person started; it includes processes kicked off by Apache, MySQL, etc. If this percentage is very high, it can be an indication of a problem, since we have other demands to consider. For example, the next statistic is %sy, which is the percentage of CPU time spent in the kernel. Obviously you need to have some cycles available for this or you won’t have a functioning computer. The next one worth knowing, %id, is the percentage of time the CPU is idle, and the higher the better here (within reason, you need to actually use the computer!). As long as you have some reasonable idle time available, you probably don’t have a problem. Also worth a look is %wa. This is the percentage of time the CPU sat idle waiting for input/output, usually disk access, to complete. In this case, .2% is good. You won’t be likely to see this at 0.0% too much, since most systems are doing at least a little disk I/O, but a high number here would definitely indicate a problem, typically a slow disk or heavy swapping.
The rest of the statistics are pretty ignorable, as they deal with really obscure issues, but you can look them up in the man page for top.
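These percentages come from counters the Linux kernel keeps in /proc/stat. The sketch below shows the arithmetic on a single aggregate "cpu" line. cpu_breakdown is a hypothetical helper, and the counter values are made up to give round numbers. Note one difference from top: applied to the real /proc/stat it reports averages since boot, while top reports the change between screen refreshes.

```shell
# Read an aggregate "cpu" line in /proc/stat format and print the share
# of time spent in user, system, idle, and I/O-wait states.
# Fields: $2 user, $3 nice, $4 system, $5 idle, $6 iowait, then
# irq, softirq, steal. Later guest fields are already counted in user.
cpu_breakdown() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        printf "us=%.1f sy=%.1f id=%.1f wa=%.1f\n",
            100 * $2 / total, 100 * $4 / total,
            100 * $5 / total, 100 * $6 / total
    }'
}

# Made-up counters chosen to give round numbers.
echo "cpu 3200 0 1000 5780 20 0 0 0" | cpu_breakdown
```

On a live Linux system, cpu_breakdown < /proc/stat runs the same calculation on the real counters.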
Lines 4 & 5, Memory and Swap
These two lines are best addressed together, since you need to combine this information to tell a complete story. What we need to know is how much memory is being used, and how much is available, at any one time. This is important because lack of RAM is the most common cause of a slow, sluggish computer. This can sometimes look like a different problem altogether, which is why it is important to look at the actual data. For instance, if you noticed your hard drive was constantly “chattering” (known as thrashing), you might think you had a hard drive or I/O problem, but in fact this is most commonly caused by a lack of RAM. When there is not enough RAM to hold all of the program code and data currently in use, some of it gets copied out to the hard drive (called paging) to free up space for other code and data. The place where this data gets copied is called the swap area. So when your hard drive is constantly thrashing, it usually means that code and data is constantly being written to and read from the swap area, and more RAM would eliminate this problem.
Now, one of the things you need to understand to interpret this data is that writing to the hard drive and reading from it is approximately 4 gazillion times slower than reading and writing to RAM. So you want to minimize the use of swap for performance reasons. But because RAM is so much faster than the hard drive, the operating system will prefer to use it whenever possible. One way to speed things up is to keep code in memory even when you have closed the program. After all, you might open it up again, and pulling it from RAM will speed it up a lot. So the operating system caches a lot of code in RAM that is not currently being actively used. Because of this, the reported RAM usage will look like you are on the verge of running out, but this may not be the case. You need to look at all of the data to assess this.
In this case, we start off by noting that this machine has 15,949,272k of RAM. In other words, 16GB, which I knew because that is what I installed in this box. And the next number says that practically all of this 16GB is being used. Is this a problem? Not really. If you look at the second line, you see that I have 6GB of swap space, but hardly any of it is being used (I am using just under 2MB of swap here). And the last number tells the story. Of my 16GB of RAM, fully half of it, 8GB, is being used to cache code. If I wanted to open a program that was already in the cache, great, the code is already there and it will open quickly. If I want to open some other program, the operating system will discard some of the code that is in cache to free up the space, so there is no problem.
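To put a number on "memory that looks used but can be given back", you can add the free, buffer, and cache figures together. This is a rough sketch assuming the /proc/meminfo format found on Linux. reclaimable_kb is a made-up helper and the sample figures are invented, not taken from the screenshot:

```shell
# Sum MemFree, Buffers, and Cached (all in kB) from /proc/meminfo-style input.
reclaimable_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { printf "%d\n", sum }'
}

# Invented sample figures in kB.
printf 'MemTotal: 16000000 kB\nMemFree: 200000 kB\nBuffers: 300000 kB\nCached: 8000000 kB\n' \
    | reclaimable_kb
```

On a live system, reclaimable_kb < /proc/meminfo gives the real figure. Newer kernels also report a MemAvailable line in the same file, which is the kernel's own, more careful estimate.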
Htop, the Alternative
I actually prefer top, but some people like htop better, and I think you will see why. For some things it is easier to work with, particularly if you need to do some tasks related to processes. But note that it is not installed by default on many systems, so you will need to install it first. On Ubuntu machines, install it with:
sudo apt-get install htop
First, you can see that it presents much the same data on individual processes as the top command. Processes are still listed in order of CPU usage by default, you still see the process ID, User, CPU%, and MEM%, just as before. You can see the command that launched the process, instead of just the program name. Unlike top, htop lets you scroll horizontally using the arrow keys.
On htop, you do have one interesting addition, which is a separate graphical display of the CPU usage for each CPU or core that you have, in this case 1 and 2 since it is a dual-core machine. And you can see the memory and swap usage in ways that you might find easier to read. Uptime, load averages, and tasks are shown on the top right.
The real advantage of htop comes when you want to do something to one or more of your processes. You simply use the up and down arrow to highlight the process, then use one of the function keys shown on the bottom. For instance, if you highlight a process and then press F9 you will kill the process. Pressing F7 (Nice -) will lower the nice number, thus increasing the priority (yes, this is not intuitive). And pressing F8 will reduce the priority by raising the nice number. But be aware that to give a really high priority to a process you would need to have root access, perhaps by using the command
Personally, I don’t have much reason to mess with this, but the worst that could happen is that you would need to reboot your computer if you really screw it up.
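The same adjustment can be made from the command line with renice. This is a minimal sketch, assuming a Linux system where the shell itself runs at the default niceness of 0; the sleep process is only a stand-in for a real job:

```shell
# Stand-in for a process you want to deprioritize.
sleep 300 &
pid=$!

# Set its niceness to 5 (lower priority). Raising the nice number
# needs no special rights; lowering it generally requires root.
renice 5 -p "$pid" > /dev/null

# Field 19 of /proc/PID/stat is the nice value (Linux-specific).
niceness=$(awk '{ print $19 }' "/proc/$pid/stat")
echo "niceness of $pid is now $niceness"

# Clean up the stand-in process.
kill "$pid"
```

Going the other way, to a negative nice number, would normally need sudo in front of the renice command.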
Other function keys let you quickly change the sort order, change the field to sort on, and so on.
In summary, I think htop is very useful, but I tend to use top more often for two reasons. First is that I like the more detailed information it gives me. And second is that I know it will be available on any system I am likely to sit down to, while htop will need to be installed, and that means a working Internet connection, which I might not have. But, in general, these two commands do much the same thing, and are a crucial addition to your Linux tool kit.