Hadoop monitoring with Cloud Insight
Available in VPC
You can use NAVER Cloud Platform's Cloud Insight to monitor Hadoop's performance and operation indicators, and to detect and respond to failures quickly.
Preparations
- Create a Cloud Hadoop cluster.
- Please refer to Getting started with Cloud Hadoop for more information about creating Cloud Hadoop.
- Request subscription to Cloud Insight.
- Please refer to the Cloud Insight Guide for more information about requesting subscription to Cloud Insight.
Dashboard configuration
If you've completed preparations, then you can create a dashboard in the Cloud Insight console screen and add widgets for monitoring Cloud Hadoop.
The following describes how to create a Cloud Insight dashboard and add widgets for monitoring Cloud Hadoop.
- From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
- Click the [Create dashboard] button.
- Enter the dashboard name and description, and then click the [Create] button.
- Click the [Add widget] button.
- Enter the widget's name, select the widget type, and then click the [Next] button.
- This example uses the Time Series widget for the explanation.
- Enter the widget settings as below, and then click the [Next] button.
- Product Type: Cloud Hadoop(VPC)
- Target: Select All available resources, and then select the cluster to monitor (Refer to Set target group when selecting group)
- Metric: Select All metrics, select items to monitor, and then click the [Add selected item] button (Refer to Set rule template when selecting template)
- Set data list: Select Dimension (properties), Interval (aggregation interval), and Aggregation (aggregation function) of the selected monitoring item
- After checking the set widget details, click the [Create] button.
- The widget will be added to the dashboard. You can monitor the Cloud Hadoop cluster through the added widget.
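To illustrate how the Interval and Aggregation settings of a widget relate to the data collected every minute, here is a minimal Python sketch. It is not Cloud Insight's implementation: the function name `aggregate`, the aggregation names in `AGGREGATIONS`, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregation names; the options offered in the console may differ.
AGGREGATIONS = {"AVG": mean, "MIN": min, "MAX": max, "SUM": sum}


def aggregate(samples, interval_min, aggregation):
    """Group (minute, value) samples into interval_min-wide buckets and reduce each bucket."""
    reduce_fn = AGGREGATIONS[aggregation]
    buckets = defaultdict(list)
    for minute, value in samples:
        # Align each sample to the start of the interval it falls into.
        bucket_start = minute - (minute % interval_min)
        buckets[bucket_start].append(value)
    return [(start, reduce_fn(values)) for start, values in sorted(buckets.items())]


# Ten made-up one-minute samples of apps_running: (minute offset, value).
samples = [(0, 2), (1, 3), (2, 3), (3, 4), (4, 3),
           (5, 1), (6, 1), (7, 0), (8, 2), (9, 1)]

print(aggregate(samples, 5, "AVG"))  # one averaged point per 5-minute interval
print(aggregate(samples, 5, "MAX"))  # one peak value per 5-minute interval
```

A longer Interval produces fewer points on the widget, and the Aggregation determines whether each point reflects the typical value (average) or the extremes (minimum, maximum) within that interval.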
Group and template settings
You can group specific monitoring targets or save specific monitoring items (metrics) as a template so you can manage monitoring settings and widgets easily.
Set target group
The following describes how to create a target group and group specific monitoring targets.
- From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
- Click the Configuration > Template menus, in that order.
- Click the [Target Group] tab, and then click the [Create target group] button.
- Enter the group settings as below, and then click the [Create] button.
- Product Type: Cloud Hadoop(VPC)
- Group name, Group description: Enter the group name and description
- Available monitor targets: Select all monitoring targets to include in the group, and then click
Set rule template
The following describes how to set a rule template and save specific monitoring items as a template.
From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
Click the Configuration > Template menus, in that order.
Click the [Rule Template] tab, and then click the [Create rule template] button.
Enter the template settings as below, and then click the [Next] button.
- Product Type: Cloud Hadoop(VPC)
- Template name, Description: Enter the template name and description
- Select the monitoring items (metrics) to include in the template from each category tab
Enter the monitoring conditions of each monitoring item by referring to the following, and then click the [Save] button.
- Dimension: Properties of the monitoring item
- Level: Level of the event raised when the condition is met
- Condition: Event triggering condition
- Method: Aggregation function of the monitoring item
- Duration: How long the condition must continue to hold before the event is triggered
Note: For example, you can set an Info level event to trigger if the Cloud Hadoop (VPC)'s CPU/user_rto (cpu_idx:1) value continues to be 0 for one minute.
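The way Condition and Duration combine can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Cloud Insight evaluates rules internally: the function `event_triggered`, the operator set, and the sample values are hypothetical, and the input is assumed to be one value per minute that has already been reduced by the rule's Method.

```python
import operator

# Illustrative comparison operators; the console's condition options may differ.
OPERATORS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def event_triggered(values, condition, threshold, duration_min):
    """Return True if the latest duration_min one-minute values all satisfy the condition."""
    if duration_min <= 0 or len(values) < duration_min:
        # Not enough data yet to cover the requested duration.
        return False
    compare = OPERATORS[condition]
    recent = values[-duration_min:]
    return all(compare(value, threshold) for value in recent)


# Made-up CPU/user_rto readings, oldest first, one per minute.
readings = [0.4, 0.2, 0.0]

# Condition "== 0" held for the last 1 minute.
print(event_triggered(readings, "==", 0, 1))
# Condition "== 0" did not hold for the last 2 minutes.
print(event_triggered(readings, "==", 0, 2))
```

A longer Duration makes a rule less sensitive to short spikes, because every value inside the window has to satisfy the condition.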
Set event
You can select monitoring targets and items, create events by setting monitoring conditions and notification actions, and view the status of created events.
This guide explains how to use Send notification message as an event's notification action. Please refer to Cloud Insight Guide for information about other notification actions including Integration, Cloud Functions, and Auto Scaling policies.
The following shows how to set an event.
- From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
- Click the Configuration > Event Rule menus, in that order.
- Click the [Event Rules] button.
- Select Cloud Hadoop (VPC) from the Select monitored product item, and then click the [Next] button.
- Select the individual monitoring target or monitoring group, and then click the [Next] button.
- To create a new group, refer to Set target group.
- Select the individual monitoring item or monitoring template, and then click the [Next] button.
- To create a new template, refer to Set rule template.
- Select the notification recipient group from the [Send notification message] tab, and then click the [Next] button.
- To create a new notification recipient group, refer to Create notification recipient group.
- After checking the set event details, click the [Create] button.
View event status
The following describes how to view the created event's status.
- From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
- Click the Event menu.
Create notification recipient group
The following describes how to create a notification recipient group to send event notification messages and add recipients.
- From the VPC environment of the NAVER Cloud Platform console, click the Services > Management & Governance > Cloud Insight (Monitoring) menus, in that order.
- Click the Notification Recipient menu.
- Click the button from the Target group list, enter the name of the group to create, and then click the button.
- Click All targets from the Target group list.
- Select targets to assign to the created group, and then click the [Assign] button.
- To add new targets, click the [Add target] button, and add targets by referring to Cloud Insight Guide.
- Enter the information of the notification recipient to add, complete the identity authentication process, and then click the [Register] button.
Cloud Hadoop Metric
You can monitor the indicators below for all clusters created. Cloud Insight collects data for metrics at 1-minute intervals.
If the cluster's HDFS and YARN do not operate normally, metrics are not collected and cannot be checked on the dashboard.
Indicators | Type | Unit | Explanation |
---|---|---|---|
active_nodes | INTEGER | num | number of nodes presently running MapReduce tasks or jobs |
allocated_container | INTEGER | num | number of resource containers allocated by the ResourceManager |
allocated_mb | INTEGER | MB | amount of memory allocated to the cluster |
allocated_v_cores | INTEGER | num | number of virtual cores (vcores) currently allocated |
apps_completed | INTEGER | num | number of applications submitted to YARN that have completed |
apps_failed | INTEGER | num | number of applications submitted to YARN that have failed to complete |
apps_killed | INTEGER | num | number of applications submitted to YARN that have been killed |
apps_pending | INTEGER | num | number of applications submitted to YARN that are in a pending state |
apps_running | INTEGER | num | number of applications submitted to YARN that are running |
apps_submitted | INTEGER | num | number of applications submitted to YARN |
available_mb | INTEGER | MB | amount of memory available to be allocated |
capacity_remaining_gb | INTEGER | GB | amount of remaining HDFS disk capacity |
corrupt_blocks | INTEGER | num | number of blocks that HDFS reports as corrupted |
decommissioned_nodes | INTEGER | num | number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state |
hdfs_bytes_read | INTEGER | Bytes | number of bytes read from HDFS |
hdfs_bytes_written | INTEGER | Bytes | number of bytes written to HDFS |
hdfs_utilization | FLOAT | % | percentage of HDFS storage currently used |
lost_nodes | INTEGER | num | number of nodes allocated to MapReduce that have been marked in a LOST state |
missing_blocks | INTEGER | num | number of blocks in which HDFS has no replicas |
num_live_data_nodes | INTEGER | num | number of data nodes that are receiving work from Hadoop |
pending_containers | INTEGER | num | number of containers in the queue that have not yet been allocated |
pending_deletion_blocks | INTEGER | num | number of blocks marked for deletion |
pending_replication_blocks | INTEGER | num | number of blocks waiting to be replicated |
pending_v_cores | INTEGER | num | number of virtual cores (vcores) waiting to be allocated |
rebooted_nodes | INTEGER | num | number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state |
reserved_containers | INTEGER | num | number of containers reserved |
reserved_mb | INTEGER | MB | amount of memory reserved |
total_load | INTEGER | num | total number of concurrent data transfers |
total_mb | INTEGER | MB | total amount of memory in the cluster |
total_nodes | INTEGER | num | number of nodes presently available to MapReduce jobs |
under_replicated_blocks | INTEGER | num | number of blocks that need to be replicated one or more times |
unhealthy_nodes | INTEGER | num | number of nodes available to MapReduce jobs marked in an UNHEALTHY state |
yarn_memory_available_percentage | FLOAT | % | percentage of remaining memory available to YARN (= available_mb / total_mb) |
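
The relationship between `yarn_memory_available_percentage`, `available_mb`, and `total_mb` in the table can be checked with a short calculation. The following Python sketch assumes that the ratio is multiplied by 100 to express it in the `%` unit shown in the table; the function name and the sample values are made up for illustration.

```python
def yarn_memory_available_percentage(available_mb, total_mb):
    """Return available YARN memory as a percentage of total memory, or None if total is not positive."""
    if total_mb <= 0:
        # No memory is registered, for example when no node managers are reporting.
        return None
    # Assumption: the table's ratio (available_mb / total_mb) is scaled by 100 to get a percentage.
    return available_mb / total_mb * 100.0


# Made-up values: 24 GB available out of 96 GB total.
print(yarn_memory_available_percentage(24576, 98304))
```

A value that stays low while `apps_pending` or `pending_containers` keeps growing suggests that the cluster does not have enough memory for the submitted workload.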