Available in Classic and VPC
You might run into the following problems when using Cloud Insight. Review the causes and possible solutions for each.
Troubleshoot metric collection failure issues
A metric suddenly stops being collected.
Cause
Cloud Insight Agent may stop working properly once the disk usage of the root (/) path exceeds 99%.
Solution
Check the disk capacity of the path and free up space if needed, then restart Agent and check whether metric collection works properly.
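The disk check above can be sketched in shell as follows. This is a minimal illustration, not part of Cloud Insight: the function names `root_usage_percent` and `disk_state` are hypothetical, and because `df` reports whole percentages, the sketch conservatively treats 99 or more as at risk.

```shell
#!/bin/sh
# Print the usage percentage of the filesystem that holds the given path.
root_usage_percent() {
    # -P forces the POSIX one-line-per-filesystem format; field 5 is the capacity (e.g. 42%).
    df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Classify a whole-number usage percentage against the 99% threshold.
# Assumption: 99 or more is flagged, since df reports whole percentages only.
disk_state() {
    if [ "$1" -ge 99 ]; then
        echo "critical"
    else
        echo "ok"
    fi
}

usage=$(root_usage_percent /)
echo "root usage: ${usage}% ($(disk_state "$usage"))"
```

If the state is reported as critical, free up space on the root path before restarting Agent.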
Troubleshoot inconsistency issues between Server's proc_mem_usert and Memory’s mem_usert
The Server's proc_mem_usert value is higher than the Memory's mem_usert value.
Cause
The description for each metric is as follows.
| Item | Description |
|---|---|
| SERVER/proc_mem_usert | Memory usage rate of all processes on the server |
| MEMORY/mem_usert | Memory usage rate for the whole server |
Generally, the server memory is used by multiple elements other than processes, so MEMORY/mem_usert tends to be greater than SERVER/proc_mem_usert.
SERVER/proc_mem_usert can be larger than MEMORY/mem_usert in the following cases.
- SERVER/proc_mem_usert is the sum, across all processes, of the local memory occupied by each process plus the shared memory it references (RSS). If multiple processes reference the same shared memory page, that page is counted once in each process's RSS, so the sum of RSS values can be higher than the actual memory usage rate.
- The RSS value is only updated when a process is using the CPU. When the CPU load is very high, a process may not be assigned CPU time, and its RSS value may not be updated properly. As a result, the sum of all RSS values may be greater than the actual memory usage rate.
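The double counting of shared memory can be illustrated with a small calculation. The figures below are hypothetical (three processes that all reference the same 300 MiB shared segment), and the function names are for illustration only; this is not how Cloud Insight Agent is implemented.

```shell
#!/bin/sh
# Hypothetical figures in MiB: local memory of three processes,
# all of which reference the same shared segment.
local_mib="100 150 50"
shared_mib=300

# Sum of RSS: each process contributes its local memory plus the shared pages it references.
rss_sum() {
    total=0
    for l in $1; do
        total=$((total + l + $2))
    done
    echo "$total"
}

# Actual use: all local memory plus the shared segment counted once.
actual_use() {
    total=$2
    for l in $1; do
        total=$((total + l))
    done
    echo "$total"
}

echo "sum of RSS: $(rss_sum "$local_mib" "$shared_mib") MiB"
echo "actual use: $(actual_use "$local_mib" "$shared_mib") MiB"
```

With these figures the sum of RSS is twice the memory actually in use, because the shared segment is counted three times instead of once.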
Solution
To check the exact memory usage rate, use MEMORY/mem_usert rather than SERVER/proc_mem_usert.
Troubleshoot inconsistency issues between CPU's used_rto and Server’s avg_cpu_used_rto
The CPU's used_rto is higher or lower than the server's avg_cpu_used_rto.
Cause
The description for each metric is as follows.
| Item | Description |
|---|---|
| CPU/used_rto | Each vCPU’s usage rate |
| SERVER/avg_cpu_used_rto | The average CPU usage rate in the entire server |
Due to the characteristics of the Linux architecture, a specific process tends to use a specific CPU more, rather than using all CPUs equally. In such a case, the CPU/used_rto of an individual vCPU may appear higher or lower than SERVER/avg_cpu_used_rto.
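A short calculation shows how individual vCPU values can sit far from the server average. The per-vCPU figures are hypothetical, and `avg_usage` is an illustrative helper rather than a Cloud Insight function.

```shell
#!/bin/sh
# Average a list of per-vCPU usage percentages passed as arguments.
avg_usage() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum / NR }'
}

# Hypothetical CPU/used_rto values for four vCPUs: one busy core, three mostly idle.
per_cpu="90 10 5 15"

echo "per-vCPU usage: $per_cpu"
echo "server average: $(avg_usage $per_cpu)"
```

Here one vCPU reports 90 while the average is 30.0, so a single CPU/used_rto value can be well above or below the server-wide figure.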
Solution
To check the exact average CPU usage rate of the server, use SERVER/avg_cpu_used_rto rather than CPU/used_rto.
Troubleshoot uncollected metrics appearing in the metrics list during widget creation
A metric that is not being collected appears in the metrics list during widget creation.
Cause
While creating the widget, the list of all metrics provided by the selected product is displayed. If you add a metric that is not currently collected, it is displayed on the widget once it is collected later.
However, the list may include metrics that are not collected in the following cases: additional settings are required for metric collection (such as detailed monitoring or plugin settings), the metric does not support the target resource, or, for Server(Classic) and Server(VPC), the metric is not supported on some OSs.
Solution
Check which of the causes above applies, and see the metric's description provided on the console.
Troubleshoot Process Plugin metric collection failure issues
I've registered Process Plugin, but the related metric is not collected.
Cause
The process name may not have been properly registered.
Solution
- Check the PID of the process.
Check the PID of the process using the ps -ef | grep <process name> command.
Example:
If the process is java -jar myapp.jar --port=8080, run the following command.
ps -ef | grep java
Check the PID of the process in the result.
- Check the exact run commands.
After checking the PID, you can view the run commands through /proc/{pid}/cmdline.
In the cmdline file, the arguments are separated by null bytes (\0).
To view the full command with the null bytes converted to spaces, run the following command.
cat /proc/{pid}/cmdline | tr '\000' '\040'
For more information, see the official man documentation.
- Register the process name using a wildcard.
It is recommended to first register the process name broadly using a wildcard (*), and then check whether the metrics are collected properly.
Example:
- To collect metrics for all processes that include java, register with *java*.
- After checking that the metrics are collected properly, register with a narrower range, like *java -jar myapp.jar*.
- Use metrics for exact monitoring.
- For more precise monitoring, the process_count metric is more useful than the is_process_up metric.
- For more information, see the Guide.
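The command lookup and wildcard steps above can be sketched as a local check before registering a name. This is a minimal sketch under one stated assumption: that the wildcard (*) behaves like a shell glob, matching any run of characters. Cloud Insight's exact matching rules are not described here, and the helper names `nul_to_spaces`, `proc_cmdline`, and `matches_pattern` are hypothetical.

```shell
#!/bin/sh
# Convert the NUL-separated content of a cmdline file (read from stdin) into one line.
nul_to_spaces() {
    tr '\000' ' ' | sed 's/ *$//'
}

# Print the full command line of the given PID.
proc_cmdline() {
    nul_to_spaces < "/proc/$1/cmdline"
}

# Succeed if the command line ($1) matches the wildcard pattern ($2).
# Assumption: "*" matches any run of characters, as in a shell glob.
matches_pattern() {
    case "$1" in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example command line from this guide; on a real server use: cmd=$(proc_cmdline "$pid")
cmd='java -jar myapp.jar --port=8080'
for pattern in '*java*' '*java -jar myapp.jar*' '*python*'; do
    if matches_pattern "$cmd" "$pattern"; then
        echo "match:    $pattern"
    else
        echo "no match: $pattern"
    fi
done
```

Starting with the broad pattern and narrowing it only after metrics appear makes it easier to tell a registration typo from a genuinely missing process.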