Cloud Hadoop troubleshooting


Available in VPC

You might run into the following problems when using Cloud Hadoop. Learn about their causes and possible solutions.

OOM (Out of Memory) occurred

An OOM (Out of Memory) caused the server to hang.

Cause

When system memory usage increases rapidly, the kernel’s OOM Killer terminates processes that consume a large amount of memory. If this results in a kernel process being terminated, the server can hang.
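Whether the OOM Killer fired can usually be confirmed from the kernel log. The sketch below parses a sample log line to extract which process was killed; the exact log wording varies by kernel version, and the sample line is illustrative, not taken from a real cluster:

```shell
# Look for OOM killer activity in the kernel ring buffer (requires root):
#   dmesg -T | grep -i "killed process"
# A matching line looks roughly like the sample below; we parse out the
# PID and the name of the killed process.
sample='Out of memory: Killed process 12345 (java) total-vm:8388608kB'
pid=$(echo "$sample" | sed -n 's/.*Killed process \([0-9][0-9]*\).*/\1/p')
proc=$(echo "$sample" | sed -n 's/.*(\([^)]*\)).*/\1/p')
echo "killed pid=$pid process=$proc"
```

If the killed process name is a kernel or system daemon rather than a Hadoop JVM, that matches the server-hang scenario described above.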

Solution

How to respond when a server hang occurs
Contact Customer Support to request a VM reboot.

How to prevent server hangs

  • Set up a monitoring batch for ping checks and process supervision to periodically check node status.
  • Scale out the edge nodes or master nodes that run jobs to distribute load.
  • Scale up node specifications to increase memory capacity.
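As a rough sketch of the first bullet, a cron-driven batch could ping each node and log the result. The hostnames, schedule, and log path below are placeholders, not values from this guide:

```shell
#!/bin/sh
# Minimal monitoring-batch sketch (hostnames and paths are placeholders).
# Run it from cron, e.g.:
#   */5 * * * * /home/hadoop/check_nodes.sh >> /var/log/node_check.log 2>&1

NODES="master-001 master-002 edge-001"   # placeholder hostnames

ping_node() {
    # Report OK if the node answers a single ping within 2 seconds.
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo "$1 OK"
    else
        echo "$1 DOWN"
    fi
}

for node in $NODES; do
    ping_node "$node"
done
```

A real batch would typically also check key processes (for example with `pgrep`) and alert on failure; this sketch only covers the ping check.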

Cluster operation error after changing settings

After changing settings in Ambari, the cluster does not operate normally.

Cause

Changing settings through Ambari can unintentionally affect related settings and cause the cluster to behave abnormally.

Solution

Ambari stores cluster settings with version numbers in chronological order. You can roll back to the version that was in use before the abnormal behavior began, and then restart the service.

The following explains how to roll back HDFS settings to a previous version.

  1. In Ambari, click the Services > HDFS > Configs tab.
  2. Click the compare button to compare with the previous version.
  3. Check the comparison screen for Version 2 (working settings) and Version 3 (misconfigured settings).
  4. Click Version 2, the version to switch to.
  5. To roll back to the previous settings, click [Make current].
  6. After confirming that a new version number has been assigned, click [Restart].
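The same version history can also be inspected over Ambari's REST API, which is handy for auditing what changed between versions. This is a hedged sketch: the host, credentials, and cluster name are placeholders, and the endpoint shown is Ambari's standard `service_config_versions` resource:

```shell
# Placeholders -- substitute your Ambari host, admin credentials, and
# cluster name before running.
AMBARI="http://ambari.example.com:8080"
CLUSTER="my-cluster"
URL="$AMBARI/api/v1/clusters/$CLUSTER/configurations/service_config_versions?service_name=HDFS"

# List the stored HDFS configuration versions (JSON response):
curl -s -u admin:password "$URL" || echo "Ambari not reachable from here"
```

Each entry in the response carries a `service_config_version` number corresponding to the versions shown in the Configs tab.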

Forgot the Ambari login password

You have forgotten the Ambari login password.

Cause

You no longer remember the cluster administrator account and password you entered when creating the cluster.

Solution

Refer to Initialize cluster admin password and reset the password.

Zeppelin Notebook access errors

You are using a Spark cluster but cannot connect to Zeppelin Notebook.

Cause

  • Zeppelin Notebook is not running.
  • SSH tunneling is misconfigured.

Solution

  • Access the Ambari Web UI and verify that Zeppelin Notebook is running properly.
  • If Zeppelin Notebook is running but you still cannot connect, check your tunneling configuration.
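If the tunneling setup is the issue, the port-forwarding command generally looks like the sketch below. The key path, user name, edge node address, and Zeppelin port are all placeholders (check your cluster's actual values); the command is printed rather than executed here:

```shell
# Placeholders only -- replace with your cluster's values.
EDGE_NODE="192.168.0.10"      # edge node reachable from your machine
ZEPPELIN_PORT="9996"          # Zeppelin's web port on your cluster
SSH_KEY="$HOME/cluster-key.pem"

# Forward the Zeppelin port to localhost; afterwards open
# http://localhost:<port> in a browser.
CMD="ssh -i $SSH_KEY -L $ZEPPELIN_PORT:localhost:$ZEPPELIN_PORT sshuser@$EDGE_NODE"
echo "$CMD"                   # printed rather than executed in this sketch
```

A common mistake is forwarding a different local port than the one you then open in the browser; the `-L local:host:remote` triplet must match the URL you use.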

Security vulnerability report

The security.datanode.protocol.acl setting is set to *, and it was reported as a security vulnerability.

Cause

security.datanode.protocol.acl is a property key that specifies the users and groups that can access data nodes. By default it is *, which allows access for all users, but you can change the permission.

Solution

Starting with Cloud Hadoop 2.3, security.datanode.protocol.acl is set to hdfs hadoop by default.

If you created your cluster with an earlier version of Cloud Hadoop, or you operate a cluster where the value is still *, you can modify the access rules as follows.

  • Separate users and groups with a space ( ).
  • Separate entries in the user list and the group list with commas (,).

Example:
To allow the users alice and bob and the groups users and wheel, configure a rule like this:
alice,bob users,wheel

Because the users that Cloud Hadoop automatically creates for each component belong to the hadoop group, you can configure it as follows:
security.datanode.protocol.acl=hdfs,custom_user1,custom_user2 hadoop,custom_group1,custom_group2
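To illustrate the rule, the sketch below assembles the same value from its user and group lists (the custom_* names are examples from this guide, not real accounts). The property lives in hadoop-policy.xml; after changing it in Ambari, the service-level ACLs on a running cluster can be reloaded with hdfs dfsadmin:

```shell
# Example entries from this guide; adjust to your cluster's accounts.
ACL_USERS="hdfs,custom_user1,custom_user2"
ACL_GROUPS="hadoop,custom_group1,custom_group2"
ACL_VALUE="$ACL_USERS $ACL_GROUPS"   # user list and group list separated by a space
echo "security.datanode.protocol.acl=$ACL_VALUE"

# Reload service-level ACLs without restarting HDFS (run as the hdfs user):
# hdfs dfsadmin -refreshServiceAcl
```

Keeping hdfs and the hadoop group in the value ensures the components Cloud Hadoop created continue to reach the data nodes after you tighten the rule.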

Learning resources

We offer various materials for you to explore. To learn more about Cloud Hadoop, check out these helpful links:

Note

If you're still having trouble finding what you need, click on the feedback icon and send us your thoughts and requests. We'll use your feedback to improve this guide.