Using the Monitoring console

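When a monitored metric exceeds a threshold or meets a specified condition, you can have it recognized as an event and have an alarm sent to the user. For details on configuring events and alarms, see the Monitoring Cloud Hadoop with Cloud Insight guide.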
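The threshold idea can be sketched in a few lines of Python. This is only an illustration of the comparison logic, not the Cloud Insight API: the `EventRule` class, the `evaluate` function, and the threshold values are hypothetical, and only the metric names are taken from the HADOOP Dashboard table later in this guide.

```python
from dataclasses import dataclass
import operator

# Comparison operators an event rule may use (illustrative set).
_OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class EventRule:
    metric: str       # metric name, e.g. "hdfs_utilization"
    op: str           # key into _OPS
    threshold: float  # hypothetical threshold chosen by the user


def evaluate(rule: EventRule, sample: dict) -> bool:
    """Return True when the sample satisfies the rule, i.e. an event would fire."""
    value = sample.get(rule.metric)
    if value is None:
        return False  # metric missing from this sample: no event
    return _OPS[rule.op](value, rule.threshold)


# Hypothetical rules and a hypothetical one-minute sample.
rules = [
    EventRule("hdfs_utilization", ">=", 80.0),
    EventRule("missing_blocks", ">", 0),
]
sample = {"hdfs_utilization": 83.5, "missing_blocks": 0}

fired = [rule.metric for rule in rules if evaluate(rule, sample)]
print(fired)
```

Here only the `hdfs_utilization` rule matches, because the sample reports no missing blocks. Actual event and alarm rules are configured in Cloud Insight as described in the guide referenced above.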

Getting started with Monitoring

In the NAVER Cloud Platform console, click Services > Big Data & Analytics > Cloud Hadoop in that order.

Click the [Create cluster] button and create a Cloud Hadoop cluster.

For details on creating a cluster, see the Getting started with Cloud Hadoop guide.

Click Cloud Hadoop > Monitoring in the left menu.

In the Cloud Hadoop cluster list, click the cluster you want to monitor.

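Monitoring screen

The basics of using Monitoring are as follows.

In the left pane, you can select from the Cloud Hadoop clusters currently in operation and the servers in each cluster.

Clicking a cluster name displays the HADOOP Dashboard in the right pane, and clicking a server under the cluster name displays the OS Dashboard.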

Viewing Monitoring dashboards

The dashboards provided by Monitoring are made up of multiple charts. For each cluster, you can display only the information you want on the dashboard of interest and read it at a glance. The dashboards are used as follows.

HADOOP Dashboard

Click a cluster in the Cloud Hadoop cluster list on the left to view its HADOOP Dashboard on the right.

Data for the HADOOP Dashboard is collected every minute.
Monitoring values are averages, and the query interval varies with the selected period type.
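As a rough illustration of what average-based values over a coarser query interval mean, the sketch below averages per-minute samples into fixed-size buckets. The `downsample` function, the 5-minute interval, and the sample values are assumptions made for this example; the guide does not state which interval each period type uses.

```python
from statistics import mean


def downsample(samples, interval_min):
    """Average per-minute (minute, value) samples into buckets of interval_min minutes."""
    buckets = {}
    for minute, value in samples:
        start = minute - minute % interval_min  # first minute of the bucket
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Ten one-minute readings of a hypothetical metric, at minutes 0 through 9.
per_minute = list(enumerate([4, 6, 5, 5, 10, 2, 2, 2, 2, 2]))

# Two 5-minute buckets: one starting at minute 0, one at minute 5.
print(downsample(per_minute, 5))
```

Note how the short spike to 10 in the first bucket is smoothed out by averaging, which is worth keeping in mind when reading charts over long periods.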

The metrics available in each group are as follows.

Group	Metric	Unit	Description
Apps	apps_completed	num	number of applications submitted to YARN that have completed
Apps	apps_failed	num	number of applications submitted to YARN that have failed to complete
Apps	apps_killed	num	number of applications submitted to YARN that have been killed
Apps	apps_pending	num	number of applications submitted to YARN that are in a pending state
Apps	apps_running	num	number of applications submitted to YARN that are running
Apps	apps_submitted	num	number of applications submitted to YARN
Blocks	corrupt_blocks	num	number of blocks that HDFS reports as corrupted
Blocks	missing_blocks	num	number of blocks for which HDFS has no replicas
Blocks	pending_deletion_blocks	num	number of blocks marked for deletion
Blocks	pending_replication_blocks	num	status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests
Blocks	under_replicated_blocks	num	number of blocks that need to be replicated one or more times
Containers	allocated_container	num	number of resource containers allocated by the ResourceManager
Containers	pending_containers	num	number of containers in the queue that have not yet been allocated
Containers	reserved_containers	num	number of containers reserved
HDFS capacity(GB)	capacity_remaining_gb	GB	amount of remaining HDFS disk capacity
HDFS read/write(bytes)	hdfs_bytes_read	bytes	number of bytes read from HDFS
HDFS read/write(bytes)	hdfs_bytes_written	bytes	number of bytes written to HDFS
HDFS utilization(%)	hdfs_utilization	%	percentage of HDFS storage currently used
Memory(MB)	allocated_mb	MB	amount of memory allocated to the cluster
Memory(MB)	available_mb	MB	amount of memory available to be allocated
Memory(MB)	reserved_mb	MB	amount of memory reserved
Memory(MB)	total_mb	MB	total amount of memory in the cluster
Nodes	num_live_data_nodes	num	number of data nodes that are receiving work from Hadoop
Nodes	unhealthy_nodes	num	number of nodes available to MapReduce jobs marked in an UNHEALTHY state
Nodes	active_nodes	num	number of nodes presently running MapReduce tasks or jobs
Nodes	decommissioned_nodes	num	number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state
Nodes	lost_nodes	num	number of nodes allocated to MapReduce that have been marked in a LOST state
Nodes	rebooted_nodes	num	number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state
Nodes	total_nodes	num	number of nodes presently available to MapReduce jobs
V_cores	allocated_v_cores	num	number of virtual cores (vcores) currently allocated
V_cores	pending_v_cores	num	number of virtual cores (vcores) waiting to be allocated
Data transfers	total_load	num	total number of concurrent data transfers
YARN memory(%)	yarn_memory_available_percentage	%	percentage of remaining memory available to YARN (= available_mb / total_mb)
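The formula given for yarn_memory_available_percentage can be checked with a short calculation. The sketch below assumes the ratio is multiplied by 100 because the unit is %; the function name, the zero-memory fallback, and the snapshot values are hypothetical.

```python
def yarn_memory_available_percentage(available_mb: float, total_mb: float) -> float:
    """available_mb / total_mb expressed as a percentage."""
    if total_mb <= 0:
        return 0.0  # assumption: report 0% when no memory is registered
    return available_mb / total_mb * 100.0


# Hypothetical snapshot of the Memory(MB) group.
snapshot = {
    "allocated_mb": 24576,
    "available_mb": 8192,
    "reserved_mb": 0,
    "total_mb": 32768,
}

pct = yarn_memory_available_percentage(snapshot["available_mb"], snapshot["total_mb"])
print(f"{pct:.1f}%")  # prints "25.0%"
```

With 8192 MB available out of 32768 MB in total, one quarter of the YARN memory remains available.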

You can monitor changes in cluster metrics in real time.

For example, the dashboard shows how the metrics change when the number of data nodes in the cluster is reduced.
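One way to reason about such a change is to compare two readings of the same metrics. The sketch below is illustrative only: the `metric_deltas` function is hypothetical, and the before and after values are made up rather than taken from a real cluster, so which metrics actually move when data nodes are removed should be confirmed on the dashboard.

```python
def metric_deltas(before: dict, after: dict) -> dict:
    """Return {metric: after - before} for metrics in both snapshots that changed."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] != before[name]
    }


# Hypothetical readings before and after removing two data nodes.
before = {"num_live_data_nodes": 5, "total_nodes": 5, "total_mb": 81920, "apps_running": 1}
after = {"num_live_data_nodes": 3, "total_nodes": 3, "total_mb": 49152, "apps_running": 1}

for name, delta in metric_deltas(before, after).items():
    print(f"{name}: {delta:+d}")
```

Metrics that did not change, such as `apps_running` in this made-up data, are left out of the result.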

You can zoom a graph in and out by placing the mouse cursor directly on it, and after setting a query period you can view the metrics for that period on the dashboard.

Click the icon on a chart to print the chart or to download the graph as a file in one of several formats. Select the format you want and download the data.
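Downloaded data can then be processed offline. The sketch below assumes that a CSV export is among the available formats and that the file has a time column followed by one column per metric. Neither the format list nor the column layout is documented in this guide, so the sample text and the `column_peak` helper are hypothetical.

```python
import csv
import io

# Stand-in for the contents of a downloaded file. The real column names
# and layout are assumed here for illustration only.
downloaded = """time,apps_running,apps_pending
2024-01-01 00:00,3,1
2024-01-01 00:01,4,0
2024-01-01 00:02,2,2
"""


def column_peak(text: str, column: str) -> float:
    """Return the largest value found in one metric column of CSV text."""
    rows = csv.DictReader(io.StringIO(text))
    return max(float(row[column]) for row in rows)


print(column_peak(downloaded, "apps_running"))  # prints 4.0
```

For a real file, replace the in-memory string with the contents read from disk and adjust the column names to match the export.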

OS Dashboard

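On the monitoring page, select a server under a cluster rather than the cluster name to view the OS Dashboard.

Data for the OS Dashboard is collected every minute.
Monitoring values are averages, and the query interval varies with the selected period type.

You can see the master nodes, edge nodes, and data nodes that make up the Cloud Hadoop cluster, and for each node you can check the CPU Usage, LoadAverage, Memory, Disk I/O, Disk usage, and Network I/O metrics.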