Troubleshoot metric issues

Available in Classic and VPC

You might run into the following problems when using Cloud Insight. Find their causes and possible solutions below.

Troubleshoot metric collection failure issues

A metric suddenly stops being collected.

Cause

Cloud Insight Agent may stop working properly once the disk usage of the root (/) path exceeds 99%.

Solution

Check the disk capacity of the root (/) path and free up space if needed. Then restart the Agent and check whether metric collection works properly.
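The disk check can be sketched with standard tools as follows. This is only an illustration: the function name root_usage_pct is hypothetical, and the 99 threshold is taken from the cause described above. The command for restarting the Agent is not shown because it depends on your installation.

```shell
# Print the usage percentage of the root (/) filesystem as a bare integer.
# df -P forces the POSIX output format, where column 5 is the capacity (e.g. "42%").
root_usage_pct() {
  df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(root_usage_pct)
# 99 is the level named in the cause above; adjust it if you want an earlier warning.
if [ "$pct" -ge 99 ]; then
  echo "root (/) is ${pct}% full: free up space, then restart the Agent"
else
  echo "root (/) is ${pct}% full"
fi
```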

Troubleshoot inconsistency issues between Server's proc_mem_usert and Memory’s mem_usert

The proc_mem_usert in the server is higher than mem_usert in the memory.

Cause

The description for each metric is as follows.

Item	Description
SERVER/proc_mem_usert	Memory usage rate for the entire process in the server
MEMORY/mem_usert	Memory usage rate for the whole server

Generally, the server memory is used by multiple elements other than processes, so MEMORY/mem_usert tends to be greater than SERVER/proc_mem_usert.

SERVER/proc_mem_usert can be larger than MEMORY/mem_usert in the following cases.

SERVER/proc_mem_usert is based on the sum of the RSS of all processes, where the RSS of a process is the local memory it occupies plus the shared memory it references. If multiple processes reference the same shared memory page, that page is counted once for each process, so the sum of RSS can be higher than the actual memory usage rate.
The RSS value is updated only while a process is using the CPU. When the CPU load is very high, the CPU may not be assigned to every process, so some RSS values may not be updated properly. As a result, the sum of all RSS values may be greater than the actual memory usage rate.
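To see the effect of summing RSS on a Linux server, you can compare the sum of every process's RSS with the total physical memory. The following is a minimal sketch, not how Cloud Insight computes its metrics: the function names are hypothetical, and it assumes the VmRSS field of /proc/{pid}/status and the MemTotal field of /proc/meminfo.

```shell
# Sum the RSS (in KiB) of every process by reading VmRSS from /proc/{pid}/status.
# Processes that exit while the loop runs, and kernel threads without VmRSS, add nothing.
sum_rss_kib() {
  for f in /proc/[0-9]*/status; do
    awk '/^VmRSS:/ { print $2 }' "$f" 2>/dev/null
  done | awk '{ sum += $1 } END { print sum + 0 }'
}

# Total physical memory (in KiB) as reported by the kernel.
mem_total_kib() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

rss=$(sum_rss_kib)
total=$(mem_total_kib)
# Shared pages are counted once per process, so this ratio can overstate real usage.
awk -v r="$rss" -v t="$total" \
  'BEGIN { printf "sum of RSS: %d KiB (%.1f%% of MemTotal)\n", r, 100 * r / t }'
```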

Solution

To check the exact memory usage rate, use MEMORY/mem_usert rather than SERVER/proc_mem_usert.

Troubleshoot inconsistency issues between CPU's used_rto and Server’s avg_cpu_used_rto

The CPU's used_rto is higher or lower than the server's avg_cpu_used_rto.

Cause

The description for each metric is as follows.

Item	Description
CPU/used_rto	Usage rate of each individual vCPU. For example, if there are 4 vCPUs, the usage rate of one of cpu_idx: 0–3.
SERVER/avg_cpu_used_rto	The average CPU usage rate in the entire server

Due to the characteristics of the Linux architecture, a specific process tends to use a specific CPU more heavily rather than using all CPUs equally. In such a case, the CPU/used_rto of an individual vCPU may appear higher or lower than SERVER/avg_cpu_used_rto.
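A small worked example makes the difference concrete. The sketch below averages per-vCPU usage rates given as arguments; the function name avg_cpu_used and the sample values are hypothetical and are not taken from Cloud Insight.

```shell
# Average the per-vCPU usage rates passed as arguments (hypothetical sample values).
avg_cpu_used() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# One busy vCPU (cpu_idx 0) and three mostly idle ones.
avg_cpu_used 90 10 10 10   # prints 30.0
```

Here the vCPU at 90% reads far above the 30% average, while the other three read below it, even though both views describe the same server.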

Solution

To check the exact average CPU usage rate of the server, use SERVER/avg_cpu_used_rto rather than CPU/used_rto.

Troubleshoot inclusion issues for the metric uncollected during widget creation

A metric that is not being collected is included in the metrics list during widget creation.

Cause

While you create a widget, the list of all metrics provided by the selected product is displayed. A metric that is not currently collected can still be added to the widget, and it is displayed on the widget once it starts being collected.
As a result, the list may include metrics that are not collected in the following cases: the metric requires additional settings for collection (such as detailed monitoring or plugin settings), the target resource does not support the metric, or, for Server(Classic) and Server(VPC), the metric is not supported on some OSs.

Solution

Check which of the causes above applies to the metric, and see the metric's description provided on the console.


Troubleshoot Process Plugin metric collection failure issues

I've registered Process Plugin, but the related metric is not collected.

Cause

The process name may not have been properly registered.

Solution

Check the PID of the process.

Check the PID of the process using the ps -ef | grep <process name> command.

Example:
If the process is java -jar myapp.jar --port=8080, run the following command.

ps -ef | grep java

Check the PID of the process in the result.

Check the exact run commands.

After checking the PID, you can view the run command through /proc/{pid}/cmdline.
In the cmdline file, the arguments are separated by null bytes (\0).
To see the whole command with the null bytes replaced by spaces, run the following command.

cat /proc/{pid}/cmdline | tr '\000' '\040'

For more information, see the official man documentation.

It is recommended to first register the process name with a broad pattern using a wildcard (*), and then check whether the metrics are collected properly.

Example:

To collect metrics for all processes that include java, register with *java*.
After checking if the metrics are collected properly, register with a narrower range, like *java -jar myapp.jar*.
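Before registering a pattern, you can try it locally against the command line you read from /proc/{pid}/cmdline. The sketch below assumes that the wildcard (*) behaves like a shell glob matched against the whole command line, which this guide does not confirm, so treat the result as a rough check only. The function name matches_pattern is hypothetical.

```shell
# Return success if the full command line matches the wildcard pattern.
# Assumption: the pattern is interpreted like a shell glob over the whole command line.
matches_pattern() {
  pattern=$1
  cmdline=$2
  case "$cmdline" in
    $pattern) return 0 ;;   # unquoted on purpose so that * acts as a wildcard
    *) return 1 ;;
  esac
}

cmdline='java -jar myapp.jar --port=8080'
for pattern in '*java*' '*java -jar myapp.jar*' 'java -jar myapp.jar'; do
  if matches_pattern "$pattern" "$cmdline"; then
    echo "match:    $pattern"
  else
    echo "no match: $pattern"
  fi
done
```

Under this assumption, the last pattern does not match because it has no wildcard to cover the trailing --port=8080 argument, which is the kind of mismatch that starting with a broad pattern helps to rule out.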

Use metrics for exact monitoring.

For more precise monitoring, the process_count metric is more useful than the is_process_up metric.
For more information, see the Guide.
