Using the Monitoring console
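Available in a VPC environment.
Monitoring provides two dashboards that let users conveniently view a range of monitoring information related to Cloud Hadoop performance and records. The Monitoring service is included in NAVER Cloud Platform's Cloud Hadoop and is available at no additional charge.
Monitoring provides the following dashboards:
- HADOOP Dashboard: monitoring information about the running Cloud Hadoop cluster
- OS Dashboard: hardware and network information for each server of the running Cloud Hadoop cluster
With these two dashboards, users can view Cloud Hadoop information and per-server hardware and network metrics for the most recent two months. Each dashboard is made up of charts. Users can print a specific chart or download it to a local PC in a variety of file formats, which helps them work more efficiently.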
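Note
You can configure monitoring so that an event is recognized and a notification is sent to users when a specific metric in the monitoring results exceeds a threshold or meets a specific condition. For details on configuring events and notifications, see Monitoring Cloud Hadoop with Cloud Insight.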
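To make the idea of a threshold event concrete, here is a minimal Python sketch of the rule "a metric value above a threshold counts as an event". It does not use the Cloud Insight API. The names `Event` and `check_threshold`, and the sample threshold of 80, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """Hypothetical record describing a threshold violation."""
    metric: str
    value: float
    threshold: float


def check_threshold(metric: str, value: float, threshold: float) -> Optional[Event]:
    """Return an Event when value is strictly above threshold, otherwise None."""
    if value > threshold:
        return Event(metric, value, threshold)
    return None


if __name__ == "__main__":
    # Assumed sample reading; real values would come from the monitoring service.
    event = check_threshold("hdfs_utilization", 91.5, 80.0)
    if event is not None:
        print(f"{event.metric} = {event.value} exceeded {event.threshold}")
```

In the actual service, the comparison and the notification are configured in Cloud Insight rather than written by hand. The sketch only shows the condition being evaluated.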
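Starting Monitoring
- In the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Cloud Hadoop menus in order.
- Click the [Create cluster] button and create a Cloud Hadoop cluster.
- For details on creating a cluster, see the Getting started with Cloud Hadoop guide.
- Click the Cloud Hadoop > Monitoring menu on the left.
- In the Cloud Hadoop cluster list, click the cluster you want to monitor.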
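Monitoring screen
The basics of using Monitoring are as follows:
- In the left pane, you can select a currently running Cloud Hadoop cluster and the servers of each cluster.
- Clicking a cluster name displays the HADOOP Dashboard in the right pane. Clicking a server under the cluster name displays the OS Dashboard.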
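Viewing Monitoring dashboards
The dashboards provided by Monitoring consist of multiple charts. For each cluster, users can open the dashboard they want and view the required information at a glance. The dashboards are used as follows: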
HADOOP Dashboard
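- Click the desired cluster in the Cloud Hadoop cluster list on the left to display the HADOOP Dashboard on the right.
- Data on the HADOOP Dashboard is collected on a per-minute basis.
- Monitoring information is based on average values, and the query interval varies with the type of period selected.
- The metrics available in each group are as follows: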
Group | Metric | Unit | Description |
---|---|---|---|
Apps | apps_completed | num | number of applications submitted to YARN that have completed |
Apps | apps_failed | num | number of applications submitted to YARN that have failed to complete |
Apps | apps_killed | num | number of applications submitted to YARN that have been killed |
Apps | apps_pending | num | number of applications submitted to YARN that are in a pending state |
Apps | apps_running | num | number of applications submitted to YARN that are running |
Apps | apps_submitted | num | number of applications submitted to YARN |
Blocks | corrupt_blocks | num | number of blocks that HDFS reports as corrupted |
Blocks | missing_blocks | num | number of blocks in which HDFS has no replicas |
Blocks | pending_deletion_blocks | num | number of blocks marked for deletion |
Blocks | pending_replication_blocks | num | status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests |
Blocks | under_replicated_blocks | num | number of blocks that need to be replicated one or more times |
Containers | allocated_container | num | number of resource containers allocated by the ResourceManager |
Containers | pending_containers | num | number of containers in the queue that have not yet been allocated |
Containers | reserved_containers | num | number of containers reserved |
HDFS capacity(GB) | capacity_remaining_gb | GB | amount of remaining HDFS disk capacity |
HDFS read/write(bytes) | hdfs_bytes_read | num | number of bytes read from HDFS |
HDFS read/write(bytes) | hdfs_bytes_written | num | number of bytes written to HDFS |
HDFS utilization(%) | hdfs_utilization | % | percentage of HDFS storage currently used |
Memory(MB) | allocated_mb | MB | amount of memory allocated to the cluster |
Memory(MB) | available_mb | MB | amount of memory available to be allocated |
Memory(MB) | reserved_mb | MB | amount of memory reserved |
Memory(MB) | total_mb | MB | total amount of memory in the cluster |
Nodes | num_live_data_nodes | num | number of data nodes that are receiving work from Hadoop |
Nodes | unhealthy_nodes | num | number of nodes available to MapReduce jobs marked in an UNHEALTHY state |
Nodes | active_nodes | num | number of nodes presently running MapReduce tasks or jobs |
Nodes | decommissioned_nodes | num | number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state |
Nodes | lost_nodes | num | number of nodes allocated to MapReduce that have been marked in a LOST state |
Nodes | rebooted_nodes | num | number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state |
Nodes | total_nodes | num | number of nodes presently available to MapReduce jobs |
V_cores | allocated_v_cores | num | number of virtual cores (vcores) currently allocated |
V_cores | pending_v_cores | num | number of virtual cores (vcores) waiting to be allocated |
Data transfers | total_load | num | total number of concurrent data transfers |
YARN memory(%) | yarn_memory_available_percentage | % | percentage of remaining memory available to YARN (= available_mb / total_mb) |
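- You can monitor changes in cluster metrics in real time.
- The figure below shows how the metrics change when the number of data nodes in the cluster decreases.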
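- You can hover the mouse cursor over a chart to zoom in or out manually, or specify a query period to view the metrics for that period on the dashboard.
- If you click the icon on a chart, you can print the chart or download it in a variety of file formats, as shown in the figure. Select the desired format to download the data.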
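The yarn_memory_available_percentage metric is described in the table as available_mb / total_mb. A minimal Python sketch of that relationship follows. Multiplying by 100 is an assumption based on the metric's unit being %, and the function name and sample values are hypothetical.

```python
def yarn_memory_available_percentage(available_mb: float, total_mb: float) -> float:
    """Share of YARN memory still available, as a percentage.

    Follows the table's definition (available_mb / total_mb); scaling by 100
    is assumed because the metric's unit is listed as %.
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return available_mb / total_mb * 100.0


if __name__ == "__main__":
    # Assumed sample values: 4096 MB available out of 16384 MB in total.
    print(yarn_memory_available_percentage(4096, 16384))
```

A value that keeps falling while apps_pending rises may suggest that the cluster is short on memory, although how to interpret the metrics depends on the workload.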
OS Dashboard
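- On the Monitoring page, select a server under the cluster rather than the cluster name. The OS Dashboard is then displayed.
- Data on the OS Dashboard is collected on a per-minute basis.
- Monitoring information is based on average values, and the query interval varies with the type of period selected.
- You can view the master nodes, edge nodes, and data nodes that make up the Cloud Hadoop cluster, and check metrics such as CPU Usage, LoadAverage, Memory, Disk I/O, Disk usage, and Network I/O for each of these nodes.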
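Both dashboards collect data per minute and display averages over an interval that depends on the selected period. The Python sketch below shows one way per-minute samples could be averaged into coarser intervals. The console's actual intervals and aggregation logic are not documented here, so the function name `average_by_interval`, the 5-minute interval, and the sample data are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def average_by_interval(samples, interval_minutes):
    """Average (minute_offset, value) samples into fixed-width intervals.

    Returns a dict that maps each interval's starting minute to the mean
    of the values that fall inside that interval.
    """
    buckets = defaultdict(list)
    for minute, value in samples:
        # Round the minute down to the start of its interval.
        start = minute // interval_minutes * interval_minutes
        buckets[start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Hypothetical per-minute CPU usage readings: (minute offset, percent).
    cpu_samples = [(0, 10.0), (1, 20.0), (4, 30.0), (5, 40.0), (9, 60.0)]
    print(average_by_interval(cpu_samples, 5))
```

A longer query period would typically use a wider interval, so each point on the chart summarizes more raw samples.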