Cloud Insight FAQ

    The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.

    Available in Classic and VPC

    This document provides answers to frequently asked questions (FAQs) about Cloud Insight.

    If you can't find the answer to your question in the FAQs below, search the user guide for the information you need.

    Q. Which services' performance indicators can I view through Cloud Insight?

    A. To find out which services' performance indicators are available through Cloud Insight, see Services with performance indicators provided.

    Q. What do Metric and Dimension mean?

    A. A Metric refers to the value the user wants to handle (monitor), and a Dimension describes the Metric's properties. You can use Dimensions to define which server a Metric belongs to, where it is located, or what the value represents.

    Q. How long are the data collection and aggregation intervals?

    • The data collection interval of a Metric is 1 minute. The collection interval is the interval at which the target sends data to Cloud Insight, and it is independent of the aggregation interval.
      • The data is stored in Cloud Insight as collected, and is then computed with various aggregation methods at each aggregation interval.
    • Aggregations are executed in the intervals of 1 minute (Min1), 5 minutes (Min5), 30 minutes (Min30), 2 hours (Hour2), and 1 day (Day1).
      • Aggregation functions such as AVG (average value), MIN (minimum value), MAX (maximum value), COUNT (number of collections), and SUM (total) of the current aggregation interval are supported.

      • <example> Assuming the data below was collected between 00:01 and 00:05, the expected values for the 1-minute (Min1) and 5-minute (Min5) aggregation intervals are as shown in the tables below.

        00:01:00 - 1
        00:02:00 - 2
        00:03:00 - 3
        00:04:00 - 4
        00:05:00 - 5
        

        Aggregation interval: 1 minute (Min1)

        Time  | AVG (average value) | MIN (minimum value) | MAX (maximum value) | COUNT (number of collections) | SUM (total)
        00:01 | 1 | 1 | 1 | 1 | 1
        00:02 | 2 | 2 | 2 | 1 | 2
        00:03 | 3 | 3 | 3 | 1 | 3
        00:04 | 4 | 4 | 4 | 1 | 4
        00:05 | 5 | 5 | 5 | 1 | 5

        Aggregation interval: 5 minutes (Min5)

        Time  | AVG (average value) | MIN (minimum value) | MAX (maximum value) | COUNT (number of collections) | SUM (total)
        00:01 | 3 | 1 | 5 | 5 | 15

    Q. How do I create and use Custom Schema?

    A. Cloud Insight supports various Metric types and indicators, but the Metric you want may not be supported. In this case, you can use Custom Schema and SendData API to freely aggregate and collect the metrics you want, and use them in Cloud Insight.

    Note

    For more information on Custom Schema and SendData API, see the following guides:

    The detailed scenario for using Custom Schema and the SendData API is as follows:

    1. Create Custom Schema

    See Custom Schema user guide to create Custom Schema.
    After creating Custom Schema, click the [Example of data transfer] button to check the [Sample Data format to be transferred].

    The following is an example of a Custom Schema for collecting Filesystem usage. (Since Cloud Insight already provides Filesystem-type metrics, note that this example is for illustrative purposes only.)

    Examples of the input value when Custom Schema is created

    Product Type : CustomFilesystem
      Set collection target:
        ID Dimension : instanceName
        Data Type : String
      Metrics :
      - Metric : totalSize
        Data Type : Integer
        AggregationCycle : Min1, Min5, Min30
        Aggregation : AVG
        Unit : MB
      - Metric : usedSize
        Data Type : Integer
        AggregationCycle : Min1, Min5, Min30
        Aggregation : AVG
        Unit : MB
      - Metric : availSize
        Data Type : Integer
        AggregationCycle : Min1, Min5, Min30
        Aggregation : AVG
        Unit : MB
      Dimensions :
      - Dimension : mountPoint
        Data Type : String
    

    Examples of the Sample Data type after Custom Schema is created

    {
    	"cw_key": "801142312146182144",
    	"data": {
    		"instanceName": "fe79g8ahkab",
    		"totalSize": 893,
    		"availSize": 260,
    		"usedSize": 405,
    		"mountPoint": "gh1apxl4it9"
    	}
    }
    

    2. Aggregate the metrics you want

    Aggregate the metric values directly in a format that matches the Custom Schema data types. Write a script that connects to the target server and derives the desired values.

    The following is an example of writing a script that follows the example above.

    #!/bin/bash

    # Mount point to monitor (matches the mountPoint Dimension in the Custom Schema example)
    MOUNTPOINT="/userDevice"

    # Disk usage in MB; keep only the line whose mount point matches exactly
    USAGES=$(df -m | grep " $MOUNTPOINT$")

    # Extract the total, used, and available sizes (MB) from the df output
    totalSize=$(echo $USAGES | awk '{print $2}')
    usedSize=$(echo $USAGES | awk '{print $3}')
    availSize=$(echo $USAGES | awk '{print $4}')
    

    3. Transfer Custom Metric Data through the SendData API

    Organize the metric values you have aggregated directly according to the data transfer format of Custom Schema, and transfer them to Cloud Insight using the SendData API.

    The following is an example of the Custom Schema data transfer format that follows the example above.

    {
    	"cw_key": "801142312146182144",
    	"data": {
    		"instanceName": "myServer",
    		"totalSize": 1180,
    		"availSize": 1150,
    		"usedSize": 30,
    		"mountPoint": "/userDevice"
    	}
    }
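
    For reference, the data above could be transferred with a small shell script such as the sketch below. The request URI and endpoint used here are assumptions (placeholders), so take the actual SendData API URL from the Cloud Insight API guide and use your own API authentication key pair; the sketch also assumes it runs after the step 2 script, so $totalSize, $usedSize, and $availSize are already set.

    #!/bin/bash
    # Sketch only: SENDDATA_URI and SENDDATA_URL are placeholders. Take the actual
    # SendData API endpoint from the Cloud Insight API guide, and replace cw_key
    # with the value generated for your own Custom Schema.
    ACCESS_KEY="YOUR_ACCESS_KEY"
    SECRET_KEY="YOUR_SECRET_KEY"
    SENDDATA_URI="/cw_collector/real/data"                      # assumed request URI
    SENDDATA_URL="https://cw.apigw.ntruss.com${SENDDATA_URI}"   # assumed endpoint
    TIMESTAMP="$(date +%s)000"                                  # epoch time in milliseconds

    # API Gateway signature: base64(HMAC-SHA256("POST {URI}\n{TIMESTAMP}\n{ACCESS_KEY}"))
    SIGNATURE=$(printf 'POST %s\n%s\n%s' "$SENDDATA_URI" "$TIMESTAMP" "$ACCESS_KEY" \
      | openssl dgst -sha256 -hmac "$SECRET_KEY" -binary | base64)

    curl -s -X POST "$SENDDATA_URL" \
      -H "Content-Type: application/json" \
      -H "x-ncp-apigw-timestamp: $TIMESTAMP" \
      -H "x-ncp-iam-access-key: $ACCESS_KEY" \
      -H "x-ncp-apigw-signature-v2: $SIGNATURE" \
      -d '{
            "cw_key": "801142312146182144",
            "data": {
              "instanceName": "myServer",
              "totalSize": '"$totalSize"',
              "availSize": '"$availSize"',
              "usedSize": '"$usedSize"',
              "mountPoint": "/userDevice"
            }
          }'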
    

    4. Check the data collected in Cloud Insight

    You can view the Custom Metric data transferred to Cloud Insight when creating a Dashboard Widget, Event Rule, or Template in the Cloud Insight console.

    5. Repeat it every minute

    Once you have confirmed that the Custom Product Type, ID Dimension, Dimensions, and Metric appear normally in Cloud Insight, repeat steps 2 and 3 above every minute (using an appropriate scheduler such as crontab; see the sketch below) so that Cloud Insight keeps collecting the metric values.
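
    The following is a minimal crontab sketch for this step. The script path and log file below are hypothetical examples; point them at wherever you saved your own steps 2 and 3 script.

    # Run the aggregation + SendData script (hypothetical path) every minute
    * * * * * /home1/nbpmon/scripts/send_filesystem_metrics.sh >> /home1/nbpmon/scripts/send_filesystem_metrics.log 2>&1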

    Q. What is agent_status metric?

    A. The agent_status metric lets you monitor the status of the Cloud Insight Agent.
    The agent_status values and their conditions are as follows:

    • 0: when the agent is normal
    • 1: when data are not collected for three minutes but the ping check succeeds
    • 2: when data are not collected for three minutes and the ping check fails

    The agent_status value changes discretely rather than passing through each state in order. If the server is stopped while the agent is normal, the agent_status value changes directly from 0 to 2 rather than from 0 through 1 to 2.
    Note that the ping check is performed against your server by a separate management server (ping check monitoring server). Since a ping check failure is not the same as a server failure, if the agent_status value is 2, check the network as well as the agent and server status.

    Q. What is the difference in data between Process data and Plugin Process data in server (VPC)?

    A. Process refers to the data related to the server's TOP 10 processes. Plugin process refers to the data related to the specific processes set by the user. Accordingly, we recommend that you use the plugin process feature to monitor specific processes.

    Q. How can I use the server (VPC)'s Plugin (File/Process/Port) features?

    A. To use the Plugin feature, you must set up monitoring for specific files, processes, or ports through API first.

    See the following for the Plugin configuration and view APIs.

    Plugin (File/Process/Port) metrics are Extended metrics and require the server's detailed monitoring to be enabled.

    See the detailed examples below.

    (The description below is based on Plugin Process; it applies similarly to Plugin File and Plugin Port.)

    1. Check if the detailed monitoring is enabled on the server.

    2. Register the process you want to monitor with Cloud Insight through the AddProcessPlugin API (an example call sketch is shown at the end of this answer).
      For the configList values of the Payload, refer to the output of ps -ef on Linux or tasklist on Windows.

      Payload examples

      payload = {
        "configList": [ "*httpd*", "*java*" ],
        "instanceNo": "1234567",
        "type": "VPCServer"
      }
      
      Note

      The asterisk (*) can only be used when setting plugin processes. When a process name is set with a string containing an asterisk (*), the PIDs of all matching processes become the monitoring target.

      Note

      When calling the AddProcessPlugin API, only one instanceNo can be registered at a time. To target multiple instanceNos, call the API multiple times.

    3. Check if the Plugin Process configList is normally registered in Cloud Insight through GetAllProcessPlugin.

    4. If the Plugin Process configList is registered normally, you can check the registered process names in the Cloud Insight console after about 2 to 3 minutes. When setting up a Dashboard widget, the process names registered for the target instance (Target InstanceName) are exposed as Dimensions.

    5. If you need to change or delete the Plugin Process, use UpdateProcessPlugin or RemoveProcessPlugin.

    Note

    If you delete a Plugin Process, it does not disappear from the Dimensions right away. For more information, see Troubleshooting Cloud Insight.
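
    As a rough sketch, the step 2 payload could be submitted with curl in the same way as the SendData sketch earlier in this document. The endpoint variable below is a placeholder: take the actual AddProcessPlugin URL from the Cloud Insight API guide, and compute TIMESTAMP, ACCESS_KEY, and SIGNATURE as shown in that earlier sketch.

    # Sketch only: ADD_PROCESS_PLUGIN_URL is a placeholder for the AddProcessPlugin endpoint;
    # TIMESTAMP, ACCESS_KEY, and SIGNATURE are computed as in the SendData sketch above.
    curl -s -X POST "$ADD_PROCESS_PLUGIN_URL" \
      -H "Content-Type: application/json" \
      -H "x-ncp-apigw-timestamp: $TIMESTAMP" \
      -H "x-ncp-iam-access-key: $ACCESS_KEY" \
      -H "x-ncp-apigw-signature-v2: $SIGNATURE" \
      -d '{
            "configList": [ "*httpd*", "*java*" ],
            "instanceNo": "1234567",
            "type": "VPCServer"
          }'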

    Q. What is the default value if I don't select the Metric Dimension?

    • The Dimension selection differs according to the Metric.
      <example> When the Metric is Server: there is only one Dimension, so there is no optional Dimension. When the Metric is CPU: you can select the Dimension of cpu_idx: 0~N according to the number of CPUs.

    • If you did not select a Dimension when there were Dimensions you could select from, the value computed according to the Aggregation setting across all selectable Dimensions is returned.
      <example> If no Dimension was selected under the following conditions,

      Metric : CPU/used_rto
      Dimension : cpu_idx: 0, cpu_idx: 1
      Aggregation : AVG
      

      then it is set as the average used_rto of cpu_idx: 0 and cpu_idx: 1 according to the Aggregation settings.

    Q. An event occurred when the CPU usage was below the event rule condition. Why did the event occur?

    A. The CPU/used_rto metric has cpu_idx:0~N dimension(s) depending on the number of CPUs.
    If you create an event rule without selecting a dimension, the metrics for all dimensions are targeted, and if any of the metrics for each dimension meet the condition, an event occurs.
    <example> If the server has 2 CPUs and the event rule and metric values are as shown below, the CPU/used_rto value is 45, but an event is raised because the corresponding value for the dimension cpu_idx: 0 is 60, which satisfies the condition.

    • Monitoring items and conditions:
    Metric: CPU/used_rto
    Dimension: not selected
    Condition: >= 50
    Aggregation method: AVG
    Duration: 1 minute
    
    • Min1 data at one point in time:
    Time  | CPU/used_rto (cpu_idx: 0) | CPU/used_rto (cpu_idx: 1) | CPU/used_rto
    00:01 | 60 | 30 | 45

    So if you need to set an event for the average CPU usage of your server, use the SERVER/avg_cpu_used_rto metric.

    Q. If I have set the monitoring items and conditions of the Event Rule as Conditions of multiple Metrics, do all the conditions need to be met for the Event to occur?

    A. If multiple Conditions of multiple Metrics are set in the Event Rule, each Condition operates under the OR condition. In other words, if the Condition for an individual Metric added to the monitoring items and conditions is met, then the Event occurs.

    In Cloud Insight, if multiple Metrics are selected as the monitoring items and conditions when configuring an Event Rule, then as many Event Rules as the number of selected Targets × Metrics are actually created. If you click the [View all Rules] button when creating an Event Rule, or after selecting a created Event Rule, you can view all the Event Rules that were actually created.

    <example> If 2 Conditions are configured to the Event Rule for 1 VM, and the Auto Scaling policy is set as the action, then 2 Event Rules are actually created as follows:

    • If the VM's avg_cpu_used_rto > 50%, execute the Auto Scaling policy
    • If the VM's mem_usert > 50%, execute the Auto Scaling policy

    Thus, when avg_cpu_used_rto > 50%, or when mem_usert > 50%, the Event is triggered and the Auto Scaling policy is executed.

    Q. How is mem_usert of the Server (VPC) collected?

    A. The mem_usert value refers to the percentage of memory used compared to the total memory, and the calculation formula is as follows:

    used_mem_mb = total_mem_mb - free_mem_mb - buffer_mb - cache_mb - slab_reclaimable_mb
    mem_usert = used_mem_mb / total_mem_mb * 100
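
    For reference, a value corresponding to this formula can be cross-checked on a Linux server from /proc/meminfo. This is only a sketch for understanding the formula, not the Agent's actual collection logic.

    #!/bin/bash
    # Rough cross-check of the mem_usert formula using /proc/meminfo (values are in kB)
    awk '
      /^MemTotal:/     {total=$2}
      /^MemFree:/      {free=$2}
      /^Buffers:/      {buffers=$2}
      /^Cached:/       {cached=$2}
      /^SReclaimable:/ {sreclaimable=$2}
      END {
        used = total - free - buffers - cached - sreclaimable
        printf "used_mem_mb=%.0f total_mem_mb=%.0f mem_usert=%.2f\n",
               used / 1024, total / 1024, used / total * 100
      }' /proc/meminfo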
    

    Q. How is the Filesystem Type metric collected?

    A. Metrics in the Filesystem Type are registered with Mountpoint Name as a Dimension and can be collected when the following criteria are met.

    • A separate partition or device formatted with one of the following file systems: ext3, ext4, xfs (based on UUID)

      > blkid
      /dev/xvda1: UUID="f95bed0a-11af-4b2c-bfcc-4afb91a68fc1" TYPE="xfs"
      /dev/xvda2: UUID="0692fdb8-bb3c-4094-83f0-fe95a339b8c1" TYPE="xfs"
      
    • Actually mounted

      > df -h
      /dev/xvda2       49G  3.6G   46G   8% /
      /dev/xvda1     1014M  183M  832M  18% /boot
      
    Note

    If the Filesystem is not formatted with ext3, ext4, or xfs, you can register it in /etc/fstab and mount it.

    > cat /etc/fstab
    /dev/xvdb    /mnt/vol     vfat      defaults     0   0
    

    The mountpoint recorded in /etc/fstab must exactly match the mountpoint resulting from the actual df -h command.

    <example>
    /logs/ != /logs
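
    For a quick check of whether a mount point meets these criteria, something like the following sketch could be run on the server. MOUNTPOINT below is an example value; replace it with your own mount point.

      #!/bin/bash
      # Quick check sketch: does the mount point meet the collection criteria above?
      MOUNTPOINT="/mnt/vol"   # example value

      # 1. The device is formatted with a supported filesystem and has a UUID
      blkid | grep -E 'TYPE="(ext3|ext4|xfs)"'

      # 2. The mount point is actually mounted (last column of the df output)
      df -h | awk -v mp="$MOUNTPOINT" '$NF == mp'

      # 3. The /etc/fstab entry matches the df mount point exactly (no trailing slash)
      awk -v mp="$MOUNTPOINT" '$2 == mp' /etc/fstab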

    Q. How do I install the Agent?

    A. Connect to your VPC server and follow the method for your OS. (A combined command sketch for Linux follows the list below.)

    Note

    The installation domains are only accessible from VPC servers. To access them from the Internet, use the NAVER Cloud Platform open site.

    • Linux

      • Download installation package: https://nsight.ncloud.com/agent_controller_linux_ncp.tar.gz
      • Unzip in /home1/nbpmon/: tar -zxvf agent_controller_linux_ncp.tar.gz
      • Run Agent with root permissions: /home1/nbpmon/agent_controller_linux/install_agent.sh pub
    • Linux Bare Metal

      • Download installation package: https://nsight.ncloud.com/agent_controller_linux_pub_bm.tar.gz
      • Unzip in /home1/nbpmon/: tar -zxvf agent_controller_linux_pub_bm.tar.gz
      • Run Agent with root permissions: /home1/nbpmon/agent_controller_linux/bm_install_agent.sh
    • Windows

      • Download installation package: https://nsight.ncloud.com/agent_controller_windows_ncp.zip
      • Unzip: unzip agent_controller_windows_ncp.zip
      • Run agent: agent_controller_windows/install_agent.bat pub
      Caution

      After downloading or unzipping, the installation folder must be under the NBP folder.
      The following is an example of a wrong installation path:
      C:\Program Files (x86)\NBP\agent_controller_windows_ncp\agent_controller_windows
      The following is the correct installation path:
      C:\Program Files (x86)\NBP\agent_controller_windows
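
    For reference, the Linux (VPC server) installation steps above can be combined into the following sequence. This sketch assumes wget is available and that the commands are run with root permissions.

      # Combined sketch of the Linux installation steps above (run as root on the VPC server)
      cd /home1/nbpmon/
      wget https://nsight.ncloud.com/agent_controller_linux_ncp.tar.gz
      tar -zxvf agent_controller_linux_ncp.tar.gz
      /home1/nbpmon/agent_controller_linux/install_agent.sh pub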

    Q. Where can I download the Agent script file for Linux?

    A. You can download it by clicking to_stop_start_uninstall_agent.zip. Unzip the downloaded file and then locate the script files in the Agent directory (/home1/nbpmon/agent_controller_linux/). You can start, stop, install, or delete the Agent through the script.

    Q. Do I have to have the Agent installed to monitor data in Server (VPC)?

    A. You need the Agent to collect performance indicators in Server (VPC). However, since the Agent is built in by default when creating servers, users don't need to install it separately. Please note that if the Agent is deleted or does not run due to the user settings, it is impossible to collect data through Cloud Insight.

    Q. How do I check if the Agent is in operation?

    A. See the method for your OS.

    • Linux
      Check whether the Agent process is alive through ps -ef | grep agent. If the agent_updater.py and agent.py processes are running, the agent is working normally.
    • Windows
      Check the nsight2_agent service's status. If the service has started, then it means that the Agent is in operation.

    Q. How do I stop or start the Agent?

    A. See how to stop or start the Agent for your OS.

    • Linux

      • Stop the Agent: run /home1/nbpmon/agent_controller_linux/stop_agent.sh.
      • Start the Agent: run /home1/nbpmon/agent_controller_linux/start_agent.sh.
      • Restart the Agent: run /home1/nbpmon/agent_controller_linux/restart_agent.sh.
    • Windows

      • Stop the Agent: run C:\Program Files (x86)\NBP\agent_controller_windows\agent.bat stop.
      • Start the Agent: run C:\Program Files (x86)\NBP\agent_controller_windows\agent.bat start.

    Q. How do I delete the Agent?

    A. See how to delete the Agent for your OS.

    Q. How do I reinstall Agent?

    A. If installation has not been completed properly, you can reinstall Agent in the following way:

    • Linux

      1. Pause Agent
        Run /home1/nbpmon/agent_controller_linux/stop_agent.sh.

      2. Delete Agent
        Run /home1/nbpmon/agent_controller_linux/uninstall_agent.sh.

      3. Delete Agent installation path
        Delete /home1/nbpmon/agent_controller_linux. Be sure to back up any necessary files.

      4. Install Agent
        For how to install Agent, see Q. How do I install Agent?

    • Windows

      1. Pause Agent
        Run the following command:
      C:\Program Files (x86)\NBP\agent_controller_windows\agent.bat stop
      
      2. Delete Agent
        Run the following command:
      C:\Program Files (x86)\NBP\agent_controller_windows\agent.bat uninstall
      
      3. Delete Agent installation path
        Delete C:\Program Files (x86)\NBP\agent_controller_windows. Be sure to back up any necessary files.

      4. Install Agent
        For how to install Agent, see Q. How do I install Agent?

    Q. How do I check the Agent's logs?

    A. The log files can be viewed as follows, depending on your OS.

    • Linux
      You can check log files in /home1/nbpmon/agent_controller_linux/logs.

    • Windows
      You can check log files in C:\Program Files (x86)\NBP\agent_controller_windows\logs.

    Q. What should I do to adjust the log size of the Agent and number of backups?

    A. You can adjust the log size of the Agent and the number of backups as follows:

    1. Check the logger.py file according to your OS.

      • Linux
        /home1/nbpmon/agent_controller_linux/logger.py
      • Windows
        C:\Program Files (x86)\NBP\agent_controller_windows\logger.py
    2. Edit the LOG_SIZE_IN_BYTES and LOG_BACKUP_COUNT values in the logger.py file (see the sketch after these steps).

      ...
      def get_logger(name, logfile=DEFAULT_LOG, max_bytes=LOG_SIZE_IN_BYTES, backup_count=LOG_BACKUP_COUNT):
          logger = logging.getLogger(name)
          setup_logger(logger, logfile, max_bytes, backup_count)
          return logger
      
    3. Restart Agent after editing the logger.py file.
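
    For example, on Linux the two values could be changed in place with sed, as sketched below. The 10 MB size and 5 backups are only illustrative values, and the commands assume the constants are assigned at the start of a line in logger.py, so back up the file and verify the result before restarting the Agent.

      # Illustrative values only; back up logger.py first and adjust the path for your OS
      cp /home1/nbpmon/agent_controller_linux/logger.py /home1/nbpmon/agent_controller_linux/logger.py.bak
      sed -i 's/^LOG_SIZE_IN_BYTES *=.*/LOG_SIZE_IN_BYTES = 10485760  # 10 MB/' \
          /home1/nbpmon/agent_controller_linux/logger.py
      sed -i 's/^LOG_BACKUP_COUNT *=.*/LOG_BACKUP_COUNT = 5/' \
          /home1/nbpmon/agent_controller_linux/logger.py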

    Q. Is it necessary to understand how the actions are linked to each other to define permissions by actions using User Created policies?

    A. When you select specific actions to grant to a sub account, the system offers a feature to automatically select the related actions as well.

    Q. When receiving Event information in SMS, what contents are contained in the SMS?

    A. Cloud Insight provides an SMS alarm function for when an Event occurs, when the Event remains unresolved (reminder), and when the Event ends.
    The message format for each situation is as follows:

    Send status | SMS format
    Event occurs | [Ncloud] ${RuleName} ${Level} ${InstanceName} ${Condition}
    Remind Event | [Ncloud][Remind] ${RuleName} ${Level} ${InstanceName} ${Condition}
    End Event | [Ncloud][Resolve] ${RuleName} ${InstanceName} ${Condition}

    Due to SMS message length limitations, the SMS contains only minimal information.
    If you need more information, we recommend using Integration.

    Q. Cloud DB products are in use. How should I interpret the contents of the SMS that are automatically transferred when an event occurs?

    A. Since the metrics provided differ by Cloud DB type, check the console screen for details. The most commonly used metrics are as follows:

    Product | Metric | SMS sample | Description
    Cloud DB for MySQL(VPC) | mysql_active | [Ncloud] DB Down:0, Threshold:== 0, Duration:1min WARNING test mysql_active=0 | The test DB server is down
    Cloud DB for MySQL(VPC) | mysql_slavedelay | [Ncloud] DB Down:0, Threshold:== 0, Duration:1min WARNING test mysql_slavedelay | Replication of the latest data from Master to Slave is delayed (data up to 1 minute ago is reflected)
    Cloud DB for MySQL(VPC) | mysql_slaverun | [Ncloud] DB Down:0, Threshold:== 0, Duration:1min WARNING test mysql_slaverun=0 | The Slave server of the test DB is not synchronized

    Q. It seems that there is a metric that is not collected in the metric list when creating the Widget. What are the criteria for the displayed metrics?

    A. When creating a Widget, the list shows all metrics provided by the selected product. Even if you add a metric that is not currently collected to the Widget, it will be displayed on the Widget once the metric starts being collected. However, the list may include metrics that are not collected, for example when additional settings (detailed monitoring, Plugin settings, and so on) are required to collect the metric, when the metric is not supported by the resource in question (Server (Classic)), or when the metric is not provided by the OS (Server (VPC)). See the metric descriptions for the metrics provided by your OS.

    Q. The data on the Events page are different from actual event data. Why is this?

    A. The graph displayed when you view an event on the Events page in the console uses a different aggregation interval (e.g., Min5) for the displayed data, depending on the event's start and end dates.
    To see the data that actually triggered the event rule, you need to view the data with a Min1 aggregation interval.
    You can therefore check the Min1 data by configuring a Dashboard separately, or by opening the event rule on the Event Rule page and setting the view period to within 1 hour in the Details menu.

    Q. What are the criteria for the process name collecting ProcessPlugin?

    A. In the case of ProcessPlugin, information on matching process names is collected based on /proc/{pid}/stat or /proc/{pid}/cmdline.

    Q. Is there a way to stop event rule actions at specific times?

    A. You can use the Planned Maintenance feature to stop actions following the occurrence of an Event.
    Please set the dimension for each product related to the event rule you wish to disable.

    Q. When viewing the TOP 10 widget data in Service Dashboard, server data with high CPU usage is not displayed in the CPU usage widget. Why is this?

    A. The criteria for selecting the Service Dashboard Top 10 list are as follows:

    • Search and sort the Min1 metric values of the last 10 minutes and select the top 10
    • If there is no Min1 data for the last 10 minutes, select 10 values randomly

    Because the widget shows a TOP 10, the list of servers to display must first be selected. The data used for this selection is the data from the (endTime - 10 minutes, endTime) period; it is used only internally and is not displayed on the dashboard.
    If a server with high CPU usage is not listed in the TOP 10, its CPU usage Min1 values for the (endTime - 10 minutes, endTime) period may not have been in the top 10 based on the criteria above.
    In other words, because the TOP 10 is selected from the 10 minutes of data preceding the end of the set search period (endTime), the list may differ from what you would expect for the entire search period.

    Q. After the event occurred, the conditions were changed. Although the changed conditions are not met, an event occurred. Why did the event occur?

    A. If you change the conditions of an event rule while an event is open, the existing event ends and an end event notification is sent based on the conditions that were set at that time.
    Therefore, to check the actual conditions that were in effect for an end event caused by changed conditions, view the event on the console Events page.

    Please note that the example below is a simplified example for reference and does not take duration and other settings into account.

    Time  | process_count | Condition | Event occurrence and contents
    00:00 | 0 | process_count = 1 | Non-occurrence of the event
    00:01 | 1 | process_count = 1 | The process_count = 1 event alarm occurs.
    00:02 | 1 | process_count = 0 | The process_count = 0 end (resolve) event occurs.
    00:03 | 0 | process_count = 0 | The process_count = 0 event alarm occurs.
