Methods for Analyzing CPU Usage

2017-05-03

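How to analyze CPU usage

To judge whether a system has enough CPU resources, you first need to know who is using the CPU, how much of it, and for how long. The following are commonly used metrics for measuring how busy or idle the CPU is: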

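1) CPU time used by users

Running regular user processes

Running niced processes

Running real-time processes

2) CPU time used by the system

System calls

I/O management: interrupts and drivers

Memory management: paging and swapping

Process management: context switches and process start-up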

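3) WIO: the percentage of time the CPU is idle because processes are waiting for I/O, mainly block I/O, raw I/O, and VM paging/swap-ins;

4) CPU idle rate: idle time other than the WIO above;

5) Percentage of CPU time spent on context switches (Context Switch CPU utilization)

6) nice

7) real-time

8) Run queue length: the number of processes in the runnable state; what really matters is how long these processes wait to be scheduled on a CPU;

9) Load average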
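Tools such as sar report the user, system, WIO and idle metrics above as percentages of a sampling interval. The sketch below shows how those percentages follow from two snapshots of cumulative per-mode counters. It is a minimal illustration under an assumption: the platform exposes such counters (for example, clock ticks per mode), and the CpuTicks structure and its field names are hypothetical, not a real API.

```python
from dataclasses import dataclass


@dataclass
class CpuTicks:
    """Cumulative CPU time counters for one snapshot (hypothetical fields)."""
    usr: int   # user mode, including niced and real-time processes
    sys: int   # system mode: system calls, interrupts, paging, context switches
    wio: int   # idle while some process waits for I/O
    idle: int  # otherwise idle


def cpu_breakdown(before: CpuTicks, after: CpuTicks) -> dict[str, float]:
    """Return the percentage of the interval spent in each mode."""
    deltas = {
        "usr": after.usr - before.usr,
        "sys": after.sys - before.sys,
        "wio": after.wio - before.wio,
        "idle": after.idle - before.idle,
    }
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("the second snapshot must be taken after the first")
    return {mode: 100.0 * ticks / total for mode, ticks in deltas.items()}


if __name__ == "__main__":
    # Illustrative counter values, not real measurements.
    t0 = CpuTicks(usr=1000, sys=400, wio=100, idle=8500)
    t1 = CpuTicks(usr=1600, sys=600, wio=150, idle=8650)
    print(cpu_breakdown(t0, t1))  # {'usr': 60.0, 'sys': 20.0, 'wio': 5.0, 'idle': 15.0}
```

Working from differences rather than raw counters matters because the counters grow from boot time; only the change over the interval says how busy the CPU was during that interval.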

Symptoms that the CPU has become a performance bottleneck

The CPU is like the brain: it carries out all the tasks handed to it. When there are too many tasks, it cannot keep up and its efficiency drops. Just as an illness shows typical symptoms, a CPU that has become the system's performance bottleneck shows some typical symptoms too:

Slow response time

Zero percent idle CPU

High percent user CPU

High percent system CPU

Large run queue size sustained over time

Processes blocked on priority

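Note that these symptoms do not necessarily mean the CPU is insufficient. In fact, some of them may well be caused by a shortage of other resources. For example, when memory is short, the CPU is kept busy with memory management; on the surface CPU utilization is 100% and may even seem inadequate, but concluding from this that adding CPUs will solve the problem would be a serious mistake.

So, as always, you must analyze the system with different tools and from different angles before drawing a conclusion, and even then experience plays an irreplaceable role.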

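Which processes are the heavy CPU consumers?

In an operating system, not every process uses CPU resources in the same way. Some processes typically need more CPU time slices than others to complete their work. The following are typical heavy CPU consumers:

Process creation

Terminal character processes (MUX- and LAN-based)

Compute-intensive processes and real-time processes

X terminals and X servers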

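Using sar to analyze CPU utilization

The command forms for analyzing CPU utilization with sar:

#sar -u: reports the data that sa1 collects periodically in the background;

#sar -u 5 100: takes a sample every 5 seconds, 100 samples in total;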

sar -u: Report CPU utilization (the default); portion of time running in one of several modes. On a multi-processor system, if the -M option is used together with the -u option, per-CPU utilization as well as the average CPU utilization of all the processors are reported. If the -M option is not used, only the average CPU utilization of all the processors is reported:

cpu: cpu number (only on a multi-processor system with the -M option);

%usr: user mode;

%sys: system mode;

%wio: idle with some process waiting for I/O (only block I/O, raw I/O, or VM pageins/swapins indicated);

%idle: otherwise idle;
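To work with these columns programmatically, a minimal parsing sketch follows. It assumes output without -M: a header row that contains %usr, followed by rows whose first field is a timestamp or "Average" and whose remaining fields are numbers in the header's column order. The function name is hypothetical and the sample text is illustrative, not captured from a real system.

```python
def parse_sar_u(text: str) -> list[dict[str, object]]:
    """Parse sar -u style output (without -M) into one dict per row."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "%usr" in fields:
            # Header row: keep the metric column names after the time column.
            columns = fields[fields.index("%usr"):]
            continue
        if columns is None or len(fields) < len(columns) + 1:
            continue  # banner lines or anything before the header
        try:
            numbers = [float(v) for v in fields[-len(columns):]]
        except ValueError:
            continue  # not a data row
        row: dict[str, object] = {"time": fields[0]}
        row.update(zip(columns, numbers))
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """
10:00:00    %usr    %sys    %wio   %idle
10:00:05      12       6       1      81
10:00:10      30      10       3      57
Average       21       8       2      69
"""
    for row in parse_sar_u(sample):
        print(row)
```

The last row, labeled Average, summarizes the whole run and is usually the first one to look at.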

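Analyzing the results

First, look at the %idle column. If it is close to zero, check the corresponding %wio value. If %wio is greater than 7, the disks or other I/O may have a problem, and further analysis is needed:

Use iostat to analyze how busy each disk is, e.g. #iostat -t 5 2, which samples every 5 seconds, 2 samples in total;

Use sar -d to analyze the activity of each block device (disks, tapes);

Use sar -b to analyze the activity of the system buffer cache;

Use sar -w to analyze process deactivation/reactivation and the switching activities of the system;

If %idle is small and %wio is also small, look at the %usr and %sys columns. A large %usr means user processes are consuming a lot of CPU time; a large %sys means a lot of time is going to system management work. Further analysis is needed:

Use GlancePlus to examine the process consuming the most CPU time on its own and find out why it uses so much.

If %sys is large, use sar -c to break the system calls down further and see what they are mainly doing. Also check for other bottlenecks: paging, for example, can also push %sys up. In that case, use sar -q to check the run queue length, and GlancePlus and vmstat to check memory usage;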
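The decision steps above can be sketched as a small function that takes one sar -u sample and returns the follow-up checks. The %wio threshold of 7 comes from the text; the idle_floor value of 5 standing in for "close to zero" is an assumption, and comparing %usr against %sys to decide which is "large" is a simplification of this sketch, not a rule from sar. The function name is hypothetical.

```python
def next_steps(usr_pct: float, sys_pct: float, wio_pct: float, idle_pct: float,
               idle_floor: float = 5.0, wio_limit: float = 7.0) -> list[str]:
    """Suggest follow-up checks for one sar -u sample (all values in %).

    wio_limit=7 follows the rule of thumb in the text; idle_floor, the
    cut-off for "%idle close to zero", is an assumed value.
    """
    if idle_pct > idle_floor:
        return []  # CPU still has idle headroom: no CPU-side follow-up
    if wio_pct > wio_limit:
        # Idle time is being lost to I/O waits: check disks, devices, buffers.
        return ["iostat -t 5 2", "sar -d", "sar -b", "sar -w"]
    if usr_pct >= sys_pct:
        # User processes dominate (simplification): examine the heaviest one.
        return ["GlancePlus: examine the process using the most CPU"]
    # System time dominates: break down system calls and rule out paging.
    return ["sar -c", "sar -q", "vmstat", "GlancePlus: check memory usage"]


if __name__ == "__main__":
    print(next_steps(usr_pct=20, sys_pct=75, wio_pct=3, idle_pct=2))
```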

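Using sar to analyze run queue length

The command forms for analyzing run queue length with sar:

#sar -q: reports the data that sa1 collects periodically in the background;

#sar -q 5 100: takes a sample every 5 seconds, 100 samples in total;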

sar -q: Report average queue length while occupied, and percent of time occupied. On a multi-processor machine, if the -M option is used together with the -q option, the per-CPU run queue as well as the average run queue of all the processors are reported. If the -M option is not used, only the average run queue information of all the processors is reported:

cpu: cpu number (only on a multi-processor system with the -M option);

runq-sz: Average length of the run queue(s) of processes (in memory and runnable);

%runocc: The percentage of time the run queue(s) were occupied by processes (in memory and runnable);

swpq-sz: Average length of the swap queue of runnable processes (processes swapped out but ready to run);

%swpocc: The percentage of time the swap queue of runnable processes (processes swapped out but ready to run) was occupied.

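Analyzing the results:

Smaller values are better.

If runq-sz is greater than 4, or %swpocc is greater than 5, the CPU or memory may have a problem, and further analysis is needed:

Use sar -u to analyze CPU utilization;

Use sar -w to analyze process deactivation/reactivation and the switching activities of the system;

GlancePlus can also be used;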
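Since a single busy sample proves little (the symptom list speaks of a run queue that stays long over time), the sketch below applies the runq-sz > 4 / %swpocc > 5 rule of thumb across a series of sar -q samples and raises an alarm only if the rule is broken in a given fraction of them. The function name and the sustained fraction of 0.5 are assumptions for illustration, not sar behavior.

```python
def run_queue_alarm(samples: list[tuple[float, float]],
                    runq_limit: float = 4.0, swpocc_limit: float = 5.0,
                    sustained: float = 0.5) -> bool:
    """samples: (runq_sz, swpocc_pct) pairs taken from sar -q.

    Returns True when runq-sz > runq_limit or %swpocc > swpocc_limit in at
    least `sustained` of the samples; that fraction is an assumed parameter.
    """
    if not samples:
        return False
    bad = sum(1 for runq, swpocc in samples
              if runq > runq_limit or swpocc > swpocc_limit)
    return bad / len(samples) >= sustained


if __name__ == "__main__":
    # Illustrative values, not real measurements.
    print(run_queue_alarm([(5.0, 0.0), (6.0, 0.0), (1.0, 0.0)]))  # True
```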

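Using sar to analyze system calls

The command forms for analyzing system calls with sar:

#sar -c: reports the data that sa1 collects periodically in the background;

#sar -c 5 100: takes a sample every 5 seconds, 100 samples in total;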

sar -c: Report system calls:

scall/s: Number of system calls of all types per second;

sread/s: Number of read() and/or readv() system calls per second;

swrit/s: Number of write() and/or writev() system calls per second;

fork/s: Number of fork() and/or vfork() system calls per second;

exec/s: Number of exec() system calls per second;

rchar/s: Number of characters transferred by read system calls (block devices only) per second;

wchar/s: Number of characters transferred by write system calls (block devices only) per second.

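Analyzing the results:

If scall/s is very high, the reason for so many system calls must be analyzed carefully.

Check the fork/s and exec/s columns to see whether the system is creating a large number of new processes.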
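One way to see what a high scall/s is made of is to express the individual per-second rates as shares of it; a large fork and exec share points to heavy process creation. This is a minimal sketch with a hypothetical function name, assuming that read, write, fork and exec calls are all counted within scall/s (they are system calls of "all types"); the remainder is lumped together as "other".

```python
def syscall_mix(scall: float, sread: float, swrit: float,
                fork: float, exec_: float) -> dict[str, float]:
    """Express one sar -c sample's per-second rates as % of scall/s."""
    if scall <= 0:
        raise ValueError("scall/s must be positive")
    other = scall - sread - swrit - fork - exec_
    return {
        "read": 100.0 * sread / scall,
        "write": 100.0 * swrit / scall,
        "fork": 100.0 * fork / scall,
        "exec": 100.0 * exec_ / scall,
        "other": 100.0 * other / scall,
    }


if __name__ == "__main__":
    # Illustrative rates, not real measurements.
    print(syscall_mix(scall=1000, sread=400, swrit=200, fork=50, exec_=50))
```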

Using the time command to measure how efficiently a command or program runs

The time command measures how efficiently a command runs. The syntax is:

time command

command is executed. Upon completion, time prints the elapsed time during the command, the time spent in the system, and the time spent executing the command. Times are reported in seconds.

Execution time can depend on the performance of the memory in which the program is running.

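When a process seems to perform poorly, the simplest first step is to run it under time to see how its execution time is distributed, and then analyze it further with other tools.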
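The same real/user/system split that time prints can be collected from a script, which is handy when timing many runs. A minimal sketch, Unix only: it measures wall-clock time with time.perf_counter() and takes child CPU time from resource.getrusage(RUSAGE_CHILDREN). Because that call accumulates over all waited-for children, the sketch takes the difference before and after the run. The function name time_command is hypothetical.

```python
import resource
import subprocess
import sys
import time


def time_command(argv: list[str]) -> dict[str, float]:
    """Run argv and return real, user and sys time in seconds (Unix only)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)  # waits for the child to finish
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,                               # elapsed wall-clock time
        "user": after.ru_utime - before.ru_utime,   # CPU time in user mode
        "sys": after.ru_stime - before.ru_stime,    # CPU time in system mode
    }


if __name__ == "__main__":
    # Demo: time a short CPU-bound Python one-liner.
    result = time_command([sys.executable, "-c", "sum(range(10**6))"])
    for name, seconds in result.items():
        print(f"{name:>4}: {seconds:.3f} s")
```

As with time itself, a real time much larger than user plus sys suggests the process spent most of its life waiting (for I/O, locks or the CPU) rather than computing.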

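Using top to find the processes consuming the most CPU

top shows the processes consuming the most CPU, and its listing changes dynamically as each process's CPU usage changes.

Its syntax is:

top [-s time] [-d count] [-q] [-u] [-h] [-n number]

The options are:

-s time: refresh the screen every time seconds; the default is 5 seconds;

-d count: top exits on its own after refreshing the screen count times;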

-q: This option runs the top program at the same priority as if it is executed via a nice -20 command so that it will execute faster (see nice(1)). This can be very useful in discovering any system problem when the system is very sluggish. This option is accessible only to users who have appropriate privileges.

-u: User ID (uid) numbers are displayed instead of usernames. This improves execution speed by eliminating the additional time required to map uid numbers to user names.

-h: Hides the individual CPU state information for systems having multiple processors. Only the average CPU status will be displayed.

-n number: Show only number processes per screen. Note that this option is ignored if number is greater than the maximum number of processes that can be displayed per screen.

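While top is running, the following keys page through the display:

j: next page;

k: previous page;

t: back to the first page;

Analyzing the results:

top gives a quick picture of current CPU usage, and the processes using the most CPU are the ones to focus on.

The RES column (the current size of the process resident in memory) shows how much memory each process is using.

The NICE column shows whether the system is using nice values to balance this process's workload.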
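The two checks described above (who uses the most CPU, and whose nice value has been changed) can be sketched over rows of a top-like listing. The Proc structure and both function names are hypothetical; how the rows are obtained is left open. The default nice value of 20 is an assumption matching the traditional System V range of 0 to 39; on Linux the default is 0.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """One row of a top-like process listing (hypothetical structure)."""
    pid: int
    cpu_pct: float  # share of CPU, like top's %CPU column
    res_kb: int     # resident memory, like top's RES column
    nice: int       # nice value, like top's NICE column
    command: str


def top_consumers(procs: list[Proc], n: int = 5) -> list[Proc]:
    """Return the n processes using the most CPU, highest first."""
    return sorted(procs, key=lambda p: p.cpu_pct, reverse=True)[:n]


def reniced(procs: list[Proc], default_nice: int = 20) -> list[Proc]:
    """Return processes whose nice value differs from the assumed default."""
    return [p for p in procs if p.nice != default_nice]


if __name__ == "__main__":
    # Illustrative rows, not real measurements.
    sample = [
        Proc(101, 2.5, 4096, 20, "sshd"),
        Proc(202, 87.0, 65536, 20, "batch_job"),
        Proc(303, 11.0, 8192, 30, "report_gen"),
    ]
    for p in top_consumers(sample, n=2):
        print(p.pid, p.cpu_pct, p.command)
    print([p.command for p in reniced(sample)])
```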

Using uptime to view the overall state of the system

uptime prints the current time, the length of time the system has been up, the number of users logged on to the system, and the average number of jobs in the run queue over the last 1, 5, and 15 minutes.

w is linked to uptime and prints the same output as uptime -w, displaying a summary of the current activity on the system.

Its syntax is:

uptime [-hlsuw] [user]

w [-hlsuw] [user]

The options are:

-h: Suppress the first line and the heading line. This option should not be used with the -u option. This option assumes the use of the -w option to uptime.

-l: Use long output. This option assumes the use of the -w option to uptime.

-s: Use the short form of output for displaying terminal information. The terminal name is abbreviated; the login time and CPU times are suppressed.

-u: Print only the first line describing the overall state of the system. This is the default for the uptime command.

-w: Print a summary of the current activity on the system for each user. This is the default for the w command.
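To track the load averages printed by uptime over time, they can be pulled out of its first line. A minimal sketch with hypothetical function names: the regular expression accepts both the "load average: a, b, c" form and a comma-less "load averages: a b c" form. Dividing by the CPU count is a common rule of thumb added here, not something uptime does; on Unix, os.getloadavg() returns the same three numbers directly.

```python
import os
import re
from typing import Optional, Tuple

# Matches "load average: 0.08, 0.09, 0.10" and "load averages: 1.20 1.30 1.40".
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load_average(uptime_line: str) -> Tuple[float, float, float]:
    """Extract the 1-, 5- and 15-minute load averages from uptime output."""
    match = _LOAD_RE.search(uptime_line)
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    one, five, fifteen = (float(g) for g in match.groups())
    return one, five, fifteen


def per_cpu(load: Tuple[float, float, float],
            cpus: Optional[int] = None) -> Tuple[float, ...]:
    """Divide each load average by the CPU count (default: os.cpu_count())."""
    count = cpus or os.cpu_count() or 1
    return tuple(round(x / count, 2) for x in load)


if __name__ == "__main__":
    # os.getloadavg() is Unix only.
    print(per_cpu(os.getloadavg()))
```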
