Available in Classic and VPC
You might run into the following problems when using Cloud Insight. This guide describes their causes and possible solutions.
Event occurrence despite no issues during monitoring with Server (VPC)'s is_process_up
An event occurs while monitoring with is_process_up of Server (VPC), even though there is no issue.
Cause
The is_process_up data of Plugin Process is collected based on the PIDs created for the process name you registered. If you registered a process name that includes an asterisk (*), the PIDs of all matching processes are tracked as one list.
The conditions under which is_process_up fluctuates are as follows:
- is_process_up = 1: when the PID list is maintained, or new PIDs are added
- is_process_up = 0: when some or all of the PID list disappears
Therefore, is_process_up can be 0 even though the main process is running normally in the following cases:
- If a sub process of the main process is temporarily created and then deleted
- If a sub process of the main process is temporarily deleted and then created again
- If the number of sub processes of the main process decreases
Example: PID changes over time and the resulting is_process_up and process_count metric values when you register *httpd* as a process name
| Time | PID (Main) | PIDs (sub) | is_process_up | process_count | Detail |
|---|---|---|---|---|---|
| 12:00 | 123 | - | 1 | 1 | No sub process exists. |
| 12:01 | 123 | 124, 125 | 1 | 3 | Sub processes are created. |
| 12:02 | 123 | 124 | 0 | 2 | Some sub processes are deleted. |
| 12:03 | 123 | 124, 126 | 1 | 3 | A sub process is created. |
| 12:04 | 123 | 124, 127 | 0 | 3 | A sub process is replaced (126 is deleted, 127 is created). |
| 12:05 | 123 | - | 0 | 1 | All sub processes are deleted. |
| 12:06 | - | - | 0 | 0 | The main process is deleted. |
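The behavior in the table can be reproduced with a minimal sketch. It assumes that is_process_up compares the PID list of the current collection with that of the previous collection; the function names and the comparison rule are illustrative assumptions, not the actual agent implementation.

```python
def is_process_up(prev_pids, curr_pids):
    """Assumed rule: 1 if no previously seen PID disappeared and at least
    one PID matches; 0 if some or all of the PID list disappeared."""
    if not curr_pids:
        return 0
    return 1 if prev_pids <= curr_pids else 0  # subset check on sets


def simulate(snapshots):
    """Return (time, is_process_up, process_count) for each snapshot."""
    rows = []
    prev = set()
    for time, pids in snapshots:
        rows.append((time, is_process_up(prev, pids), len(pids)))
        prev = pids
    return rows


# PID snapshots taken from the *httpd* example table (main PID is 123).
snapshots = [
    ("12:00", {123}),
    ("12:01", {123, 124, 125}),
    ("12:02", {123, 124}),
    ("12:03", {123, 124, 126}),
    ("12:04", {123, 124, 127}),
    ("12:05", {123}),
    ("12:06", set()),
]

for time, up, count in simulate(snapshots):
    print(time, up, count)
```

In this sketch, is_process_up drops to 0 at 12:02, 12:04, and 12:05 while the main process is still alive, whereas process_count reaches 0 only at 12:06, when the main process itself is gone.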
Solution
Process names such as *httpd* are often monitored to check whether the Apache service is running normally. In such cases, is_process_up may trigger events even though the service is healthy, because sub processes are created and deleted during normal operation. If the Apache service is terminated, the process_count of *httpd* becomes 0. Therefore, we recommend monitoring with a condition such as process_count == 0.
Dimensions of deleted file, process, and port plugins displayed during Event Rule creation
Dimensions of file, process, and port plugins that have been deleted continue to be displayed when creating an event rule.
Cause
Dimensions of a deleted file, process, or port plugin may be displayed on the Event Rule creation screen for up to 2 days after deletion. When a file, process, or port plugin is deleted, the metric information collected for it is deleted immediately. Dimension information, however, is deleted only after the metric has remained uncollected for 2 days.
Solution
The dimension is not deleted until its metric has remained uncollected for 2 days. Check again after that period has passed.
Mismatch between the total rule count value in Event Rule and the values of monitoring targets and monitoring items
I have created an Event Rule, but the total rule count value does not match the values of monitoring targets and monitoring items.
Cause
The total rule count of an Event Rule includes only the rules that are actually created. Whether a rule is actually created depends on whether the configured monitoring target is collecting metrics for the monitoring items.
For example, if you have 3 monitoring targets but the monitoring item metrics are being collected for only 2 of them, the total rule count is displayed as 2, not 3.
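The counting rule in this example can be illustrated with a small sketch. The server names and the `total_rule_count` helper are hypothetical and only mirror the description above; they are not part of any Cloud Insight API.

```python
def total_rule_count(collecting_by_target):
    """Count only the targets that are collecting the monitoring item metric."""
    return sum(1 for collecting in collecting_by_target.values() if collecting)


# Hypothetical targets: True means the monitoring item metric is being collected.
targets = {"server-a": True, "server-b": True, "server-c": False}

print(len(targets))               # number of configured monitoring targets
print(total_rule_count(targets))  # number of rules actually created
```

If server-c later starts collecting the metric, setting its value to True raises the count to 3, which corresponds to a target being automatically added to the total rule count.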
Solution
Check whether any of the following cases, in which monitoring item metrics are not collected for some of the monitoring targets, applies:
- When the dimensions set for the monitoring item metric are not being collected for some of the monitoring target servers
- When detailed monitoring is not configured for some of the monitoring target servers, although it is required because the monitoring item metric is of the Extended type
- When metric collection has stopped because some of the monitoring target servers have stopped
- When metric collection is not working properly due to an internal firewall or a firewall solution
- When metric collection for some of the monitoring target servers is not working normally due to Agent problems
- When the metric is later collected for monitoring targets that were originally excluded from the total rule count because the metric was not being collected when the event rule was created (in this case, the target is automatically added to the total rule count)
Event occurrence despite failure to meet the event trigger conditions
I changed the conditions after the event occurrence, but an event occurred again although the changed conditions were not met.
Cause
If you change the conditions of an event rule while an event is in progress, the existing event ends, and the end event notification is issued with the conditions configured at that time, that is, the changed conditions.
The following example ignores elements such as duration:
| Time | process_count | Condition | Description |
|---|---|---|---|
| 00:00 | 0 | process_count = 1 | No event occurs. |
| 00:01 | 1 | process_count = 1 | The process_count = 1 event notification occurs. |
| 00:02 | 1 | process_count = 0 | The process_count = 0 end (resolve) event notification occurs. |
| 00:03 | 0 | process_count = 0 | The process_count = 0 event notification occurs. |
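The sequence in the table can be traced with a simplified sketch. It assumes a rule of the form process_count = N, ignores duration, and assumes that editing the condition while an event is open ends that event with a notification labeled with the new condition. The `simulate` function is a hypothetical model, not the service's actual logic.

```python
def simulate(samples):
    """samples: list of (time, process_count, expected), where the rule
    configured at that time is process_count = expected."""
    notifications = []
    event_open = False
    prev_expected = None
    for time, count, expected in samples:
        label = f"process_count = {expected}"
        if event_open and expected != prev_expected:
            # Condition was edited while an event was open: the event ends,
            # and the end notification carries the condition configured now.
            notifications.append((time, "end", label))
            event_open = False
        met = count == expected
        if met and not event_open:
            notifications.append((time, "start", label))
            event_open = True
        elif not met and event_open:
            notifications.append((time, "end", label))
            event_open = False
        prev_expected = expected
    return notifications


# Values taken from the example table.
samples = [("00:00", 0, 1), ("00:01", 1, 1), ("00:02", 1, 0), ("00:03", 0, 0)]
for notification in simulate(samples):
    print(notification)
```

The notification at 00:02 is an end notification labeled process_count = 0 even though the metric value is 1, which is why it can look like an event that occurred without its condition being met.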
Solution
For end events that occurred because the conditions were changed, check the conditions you actually configured in Services > Management & Governance > Cloud Insight > Event on the NAVER Cloud Platform console.
Event occurrence despite the CPU usage below the event rule condition
An event occurred although the CPU usage was lower than the event rule condition.
Cause
The CPU/used_rto metric has cpu_idx:0~N dimensions depending on the number of CPUs. If you create an event rule without selecting a dimension, the metrics for all dimensions are targeted, and an event occurs if the metric for any one dimension meets the condition.
Example: if the server has 2 CPUs and the event rule and metric values are as follows, the overall CPU/used_rto value is 45, but an event occurs because the value for the dimension cpu_idx: 0 is 60, which meets the condition.
- Monitoring items and conditions
  - Metric: CPU/used_rto
  - Dimension: not selected
  - Condition: >=50
  - Aggregation method: AVG
  - Duration: 1 minute
- Min1 data at a certain time
| Time | CPU/used_rto (cpu_idx: 0) | CPU/used_rto (cpu_idx: 1) | CPU/used_rto |
|---|---|---|---|
| 00:01 | 60 | 30 | 45 |
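The difference between per-dimension evaluation and the overall average can be checked with a short sketch that uses the values from this example. The `triggered_dimensions` helper is hypothetical and only illustrates the evaluation described above.

```python
def triggered_dimensions(values_by_dimension, threshold):
    """Return the dimensions whose value meets the >= threshold condition."""
    return [dim for dim, value in values_by_dimension.items() if value >= threshold]


# Min1 values from the example: one entry per cpu_idx dimension.
per_cpu = {"cpu_idx:0": 60, "cpu_idx:1": 30}
threshold = 50

overall = sum(per_cpu.values()) / len(per_cpu)
print(overall)                                   # overall CPU/used_rto
print(overall >= threshold)                      # condition on the overall value
print(triggered_dimensions(per_cpu, threshold))  # dimensions that trigger the event
```

The overall value stays below 50, but cpu_idx:0 alone meets the condition, so a rule created without a dimension still produces an event.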
Solution
If you need to configure the event according to the server's average CPU usage, use the SERVER/avg_cpu_used_rto metric.
Mismatch between event occurrence content and the data displayed on the Event menu
Event occurrence content differs from the data displayed on the Event menu.
Cause
The data aggregation interval of the graph you can view in Services > Management & Governance > Cloud Insight > Event on the NAVER Cloud Platform console varies depending on the start and end times of the event (example: Min5). To view the data that actually triggered the event rule, you must view the data with an aggregation interval of Min1.
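As an illustration of why a coarser interval can hide the triggering value, the sketch below averages hypothetical Min1 values into a single Min5 value. The sample values, the threshold, and the use of AVG as the aggregation method are assumptions made for this example only.

```python
def aggregate_avg(values, window):
    """Average consecutive groups of `window` values (for example, Min1 to Min5)."""
    groups = [values[i:i + window] for i in range(0, len(values), window)]
    return [sum(group) / len(group) for group in groups]


# Hypothetical Min1 values for five consecutive minutes.
min1 = [10, 10, 90, 10, 10]
min5 = aggregate_avg(min1, 5)
threshold = 50

print(min5)                                # Min5 view of the same period
print(any(v >= threshold for v in min1))   # Min1 data meets the condition
print(any(v >= threshold for v in min5))   # Min5 data does not show it
```

Here a single Min1 value of 90 triggers the rule, while the Min5 graph shows only the averaged value for the same period.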
Solution
Set the view period to less than 1 hour and view the Min1 data, either by configuring a separate Dashboard or by viewing the event rule details in the Event Rule menu.
No server data with higher CPU usage on the CPU usage widget
When viewing the TOP 10 widget data in the Service Dashboard, server data with higher CPU usage is not displayed on the CPU usage widget.
Cause
The list of Service Dashboard TOP 10 is chosen according to the following criteria.
- For the view period (startTime and endTime), the top 10 are selected by sorting the most recently collected metric values for each resource based on endTime.
If there are more than 10 resources, a resource whose metric value is not included in the top 10 according to the above criteria may not be displayed on the Service Dashboard. In addition, although there are resources with high metric values during the view period, the resources may not be displayed on the Service Dashboard because the most recently collected values based on endTime are compared.
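The selection criterion can be sketched as follows. The data layout, the server names, and the `top_n_by_latest` function are hypothetical, and the example uses a top 2 instead of a top 10 to keep the data short.

```python
def top_n_by_latest(series_by_resource, start_time, end_time, n=10):
    """Rank resources by their most recently collected value within
    [start_time, end_time], not by their peak value in that period."""
    latest = {}
    for resource, points in series_by_resource.items():
        in_range = [(t, v) for t, v in points if start_time <= t <= end_time]
        if in_range:
            # Keep the value of the point with the latest timestamp.
            latest[resource] = max(in_range, key=lambda point: point[0])[1]
    return sorted(latest, key=latest.get, reverse=True)[:n]


# Hypothetical (timestamp, cpu_usage) points per server.
series = {
    "server-a": [(1, 95), (2, 20)],  # high peak, low latest value
    "server-b": [(1, 40), (2, 45)],
    "server-c": [(1, 30), (2, 50)],
}

print(top_n_by_latest(series, start_time=1, end_time=2, n=2))
```

In this sketch, server-a is left out of the list even though it had the highest value during the view period, because only its latest value (20) is compared.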
To check the exact metrics for a certain resource, check the data by selecting a certain resource on the widget data, or check the metrics for a certain resource by configuring a separate dashboard and widget.
Solution
This is not a malfunction. The servers with higher CPU usage were excluded by the selection criteria of the Service Dashboard TOP 10. For more information, see the Cause section.