Hadoop monitoring with Cloud Insight

Available in VPC

NAVER Cloud Platform's Cloud Insight service lets you monitor the performance and operational metrics of your Cloud Hadoop clusters so that you can quickly identify and respond to failures.

Preparations

Create a Cloud Hadoop cluster.
- For more information about creating a Cloud Hadoop cluster, see the Getting started with Cloud Hadoop guide.
Subscribe to Cloud Insight.
- For more information about subscribing to Cloud Insight, see the Cloud Insight user guide.

Configure dashboards

You can create dashboards and add widgets to monitor Cloud Hadoop from the Cloud Insight console screen.

To create a dashboard and add widgets in Cloud Insight, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click the [Create dashboard] button.
Enter a name and description for the dashboard and click [Create].
Click the [Add widget] button.
Enter a name for the widget, select a widget type, and click [Next].
- This example uses the Time Series widget.
Enter the widget settings as follows and click [Next].
- Product Type: Cloud Hadoop(VPC)
- Target: select All owned resources, and then select the cluster you want to monitor.
  If you select Group instead, see Target Group settings.
- Metric: select All metrics, select the items to monitor, and click [Add selection]. For example, on the [CPU] tab, select the CPU/used_rto and CPU/user_rto check boxes and click [Add selection].
  If you select Template instead, see Rule Template settings.
- Setup data list: set the Dimension (properties), Interval (aggregation interval), and Aggregation (aggregation function) for each selected monitoring item.
After reviewing the widget settings, click [Create].
- The added widget appears on the dashboard, and you can use it to monitor your Cloud Hadoop cluster.
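Conceptually, the Interval and Aggregation settings reduce the per-minute samples of a metric to one value per interval. The following Python sketch illustrates that idea locally; it does not call Cloud Insight, and the aggregation names (AVG, MAX, MIN, SUM) and the `aggregate` helper are assumptions for illustration, not the console's exact options.

```python
from statistics import mean

# Hypothetical aggregation functions. The names are assumptions and may not
# match the exact Aggregation options offered in the Cloud Insight console.
AGGREGATIONS = {
    "AVG": mean,
    "MAX": max,
    "MIN": min,
    "SUM": sum,
}


def aggregate(samples, interval_minutes, aggregation):
    """Group 1-minute samples into consecutive buckets of `interval_minutes`
    and reduce each bucket with the chosen aggregation function.

    `samples` is a list of (minute_index, value) pairs, one per minute.
    Returns a sorted list of (bucket_start_minute, aggregated_value) pairs.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    func = AGGREGATIONS[aggregation]
    buckets = {}
    for minute, value in samples:
        # Align each sample to the start of its interval bucket.
        start = (minute // interval_minutes) * interval_minutes
        buckets.setdefault(start, []).append(value)
    return [(start, func(values)) for start, values in sorted(buckets.items())]


# Example: five minutes of hypothetical CPU/used_rto values.
cpu_used_rto = [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0), (4, 50.0)]
print(aggregate(cpu_used_rto, 5, "AVG"))  # [(0, 30.0)]
print(aggregate(cpu_used_rto, 2, "MAX"))  # [(0, 20.0), (2, 40.0), (4, 50.0)]
```

A longer interval smooths short spikes, while MAX keeps them visible; which combination suits a widget depends on what you want to spot.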

Group and template settings

To make it easier to manage monitoring settings and widgets, you can group specific monitoring targets or save specific monitoring items (metrics) as templates.

Target Group settings

To create a target group and organize specific monitoring targets into groups, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click Configuration > Template.
Click the [Target Group] tab and click [Create target group].
Enter the group settings as shown below and click [Create].

- Product Type: Cloud Hadoop(VPC)
- Group name, Group description: enter a group name and description.
- Selectable monitoring targets: select all the monitoring targets to include in the group and add them to the group.

Rule Template settings

To set up a rule template to save specific monitoring items as a template, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click Configuration > Template.
Click the [Rule Template] tab and click [Create rule template].
Enter the template settings as shown below and click [Next].
- Product Type: Cloud Hadoop(VPC)
- Template name, Description: enter a template name and description.
- On each classification tab, locate and select the monitoring items (metrics) you want to include in the template.
Enter the monitoring conditions for each monitoring item as shown below and click [Save].
- Dimension: properties of the monitoring item
- Level: severity level assigned to the event (for example, info)
- Condition: condition under which the event occurs
- Method: aggregation function applied to the monitoring item
- Duration: how long the condition must persist before the event occurs
Note

For example, you can set an info-level event that occurs when the value of CPU/user_rto (cpu_idx: 1) in Cloud Hadoop (VPC) remains 0 for 1 minute.
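To make the interaction between Condition and Duration concrete, the sketch below models a single rule-template entry and checks whether a per-minute series satisfies it for the required duration. The class `RuleCondition`, its fields, and the operator symbols are hypothetical and are not Cloud Insight's data format. Method is not modeled, so the values are assumed to be already aggregated once per minute.

```python
import operator
from dataclasses import dataclass

# Hypothetical condition operators; the console's actual operator list may differ.
OPERATORS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


@dataclass
class RuleCondition:
    """Hypothetical model of one monitoring item in a rule template."""
    metric: str            # e.g. "CPU/user_rto"
    dimension: dict        # e.g. {"cpu_idx": "1"}
    level: str             # e.g. "info"
    op: str                # key into OPERATORS
    threshold: float
    duration_minutes: int

    def fires(self, values):
        """Return True if the most recent `duration_minutes` values
        (one per minute, oldest first) all satisfy the condition."""
        if self.duration_minutes <= 0 or len(values) < self.duration_minutes:
            return False
        compare = OPERATORS[self.op]
        recent = values[-self.duration_minutes:]
        return all(compare(v, self.threshold) for v in recent)


# The example above: CPU/user_rto (cpu_idx: 1) equal to 0 for 1 minute.
rule = RuleCondition(metric="CPU/user_rto", dimension={"cpu_idx": "1"},
                     level="info", op="=", threshold=0, duration_minutes=1)
print(rule.fires([3.5, 0.0]))  # True
print(rule.fires([0.0, 2.1]))  # False
```

Requiring every value in the window to match, rather than just the latest one, is what keeps a longer Duration from firing on a single outlier.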

Set event

You can select a monitoring target and item, set monitoring conditions and notification actions to create an event, and check the status of the created event.

Note

This guide explains how to use Send Notification Messages as the notification action for an event. For more information on other notification actions, such as Integration, Cloud Functions, and Auto Scaling Policies, see the Cloud Insight user guide.

To set up an event, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click Configuration > Event Rule.
Click [Create Event Rules].
Under Select a product to monitor, select Cloud Hadoop (VPC) and click [Next].
Select an individual monitoring target or monitoring group and click [Next].
- To create a new target group, see Target Group settings.
Select an individual monitoring item or monitoring template and click [Next].
- To create a new template, see Rule Template settings.
In the [Send Notification Messages] tab, select a recipient group to be notified and click [Next].
- To create a new notification recipient group, see Create recipient groups.
After reviewing the event settings, click [Create].
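An event rule ties together the pieces configured in the steps above: a product, monitoring targets, monitoring items, and a notification recipient group. The sketch below is a hypothetical model of that combination, not Cloud Insight's data format or API; `EventRule`, `evaluate`, the cluster names, and the recipient group name are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class EventRule:
    """Hypothetical model of an event rule: the product, the monitoring
    targets (individual clusters, or a target group expanded to a list),
    the monitored items, and the recipient group to notify."""
    product: str
    targets: list          # cluster names
    items: dict            # metric name -> predicate on the latest value
    recipient_group: str


def evaluate(rule, latest):
    """Return (target, metric, recipient_group) tuples for every
    target/metric pair whose latest value satisfies its predicate.
    `latest` maps (target, metric) to the most recent collected value;
    pairs with no collected value are skipped."""
    events = []
    for target in rule.targets:
        for metric, predicate in rule.items.items():
            value = latest.get((target, metric))
            if value is not None and predicate(value):
                events.append((target, metric, rule.recipient_group))
    return events


rule = EventRule(
    product="Cloud Hadoop (VPC)",
    targets=["hadoop-prod", "hadoop-dev"],       # hypothetical cluster names
    items={
        "missing_blocks": lambda v: v > 0,
        "unhealthy_nodes": lambda v: v > 0,
    },
    recipient_group="hadoop-oncall",             # hypothetical recipient group
)
latest = {
    ("hadoop-prod", "missing_blocks"): 2,
    ("hadoop-prod", "unhealthy_nodes"): 0,
    ("hadoop-dev", "missing_blocks"): 0,
}
print(evaluate(rule, latest))  # [('hadoop-prod', 'missing_blocks', 'hadoop-oncall')]
```

The nested loop mirrors why groups and templates help: adding one cluster to a target group or one metric to a template extends every rule that references it.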

Check event status

To check the status of an event you have created, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click Event. When an event occurs under an event rule, you can review its details on this page.

Create recipient groups

To create a notification recipient group to receive event notification messages and add recipients, follow these steps:

In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Management & Governance > Cloud Insight (Monitoring).
Click the Notification Recipient menu.
In the recipient group list, add a new group by entering the name of the group you want to create and saving it.
In the Recipient Group list, click All Recipients.
Select the recipients you want to assign to the created group and click [Assign].
- To add a new recipient, click [Add recipient], enter the recipient's information, complete identity verification, and click [Register]. For more information, see the Cloud Insight user guide.

Cloud Hadoop metrics

You can monitor the following metrics for every cluster you create. Cloud Insight collects these metrics once per minute.

Note

If HDFS and YARN in the cluster do not operate normally, metrics are not collected and cannot be viewed on the dashboard.
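Because metrics are collected once per minute, a gap longer than one minute between consecutive data points suggests that collection stopped, for example while HDFS or YARN was not operating normally. The sketch below flags such gaps in a sorted list of collection timestamps. It assumes you already have those timestamps from some source; `find_gaps` is a hypothetical helper, not part of Cloud Insight.

```python
from datetime import datetime, timedelta

# Expected spacing between data points (metrics are collected once per minute).
COLLECTION_INTERVAL = timedelta(minutes=1)


def find_gaps(timestamps, interval=COLLECTION_INTERVAL):
    """Given sorted collection timestamps, return (last_seen, next_seen)
    pairs where consecutive points are more than `interval` apart,
    i.e. where at least one expected data point is missing."""
    gaps = []
    for prev, curr in zip(timestamps, timestamps[1:]):
        if curr - prev > interval:
            gaps.append((prev, curr))
    return gaps


# Hypothetical series with data missing between 09:02 and 09:06.
base = datetime(2024, 1, 1, 9, 0)
ts = [base + timedelta(minutes=m) for m in (0, 1, 2, 6, 7)]
print([(a.strftime("%H:%M"), b.strftime("%H:%M")) for a, b in find_gaps(ts)])
# [('09:02', '09:06')]
```

Real collection times can drift by a few seconds, so you may want to pass a slightly larger interval, such as 90 seconds, to avoid false positives.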

Metric	Type	Unit	Description
active_nodes	INTEGER	num	number of nodes presently running MapReduce tasks or jobs
allocated_container	INTEGER	num	number of resource containers allocated by the ResourceManager
allocated_mb	INTEGER	MB	amount of memory allocated to the cluster
allocated_v_cores	INTEGER	num	number of virtual cores allocated
apps_completed	INTEGER	num	number of applications submitted to YARN that have completed
apps_failed	INTEGER	num	number of applications submitted to YARN that have failed to complete
apps_killed	INTEGER	num	number of applications submitted to YARN that have been killed
apps_pending	INTEGER	num	number of applications submitted to YARN that are in a pending state
apps_running	INTEGER	num	number of applications submitted to YARN that are running
apps_submitted	INTEGER	num	number of applications submitted to YARN
available_mb	INTEGER	MB	amount of memory available to be allocated
capacity_remaining_gb	INTEGER	GB	amount of remaining HDFS disk capacity
corrupt_blocks	INTEGER	num	number of blocks that HDFS reports as corrupted
decommissioned_nodes	INTEGER	num	number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state
hdfs_bytes_read	INTEGER	Bytes	number of bytes read from HDFS
hdfs_bytes_written	INTEGER	Bytes	number of bytes written to HDFS
hdfs_utilization	FLOAT	%	percentage of HDFS storage currently used
lost_nodes	INTEGER	num	number of nodes allocated to MapReduce that have been marked in a LOST state
missing_blocks	INTEGER	num	number of blocks in which HDFS has no replicas
num_live_data_nodes	INTEGER	num	number of data nodes that are receiving work from Hadoop
pending_containers	INTEGER	num	number of containers in the queue that have not yet been allocated
pending_deletion_blocks	INTEGER	num	number of blocks marked for deletion
pending_replication_blocks	INTEGER	num	number of blocks pending replication
pending_v_cores	INTEGER	num	number of virtual cores waiting to be allocated
rebooted_nodes	INTEGER	num	number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state
reserved_containers	INTEGER	num	number of containers reserved
reserved_mb	INTEGER	MB	amount of memory reserved
total_load	INTEGER	num	total number of concurrent data transfers
total_mb	INTEGER	MB	total amount of memory in the cluster
total_nodes	INTEGER	num	number of nodes presently available to MapReduce jobs
under_replicated_blocks	INTEGER	num	number of blocks that need to be replicated one or more times
unhealthy_nodes	INTEGER	num	number of nodes available to MapReduce jobs marked in an UNHEALTHY state
yarn_memory_available_percentage	FLOAT	%	percentage of remaining memory available to YARN (= available_mb / total_mb × 100)
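Some metrics in the table are related. The table defines yarn_memory_available_percentage as available_mb / total_mb with unit %, so the sketch below multiplies the ratio by 100; it also lists which HDFS block-health metrics are non-zero. Both helpers are hypothetical, and treating any non-zero block count as a problem is an assumption you should adjust for your cluster (for example, under_replicated_blocks can be briefly non-zero while HDFS re-replicates data).

```python
def yarn_memory_available_percentage(available_mb, total_mb):
    """Percentage of YARN memory still available: available_mb / total_mb,
    expressed as a percentage."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return available_mb / total_mb * 100


def hdfs_block_problems(metrics):
    """Return the names of block-health metrics whose value is non-zero.
    Flagging any non-zero value is an illustrative assumption."""
    keys = ("corrupt_blocks", "missing_blocks", "under_replicated_blocks")
    return [k for k in keys if metrics.get(k, 0) > 0]


print(yarn_memory_available_percentage(24576, 98304))  # 25.0
print(hdfs_block_problems({"corrupt_blocks": 0, "missing_blocks": 3,
                           "under_replicated_blocks": 12}))
# ['missing_blocks', 'under_replicated_blocks']
```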