Available in VPC
Kerberos integrates with the Hadoop cluster to provide strong authentication for users and services.
This guide describes how to configure the authentication system installed in Cloud Hadoop in order to set up Secure Hadoop.
Before configuring Secure Hadoop, check whether the cluster is integrated with the Data Catalog service. If the cluster is integrated with external Hive Metastore through Data Catalog, some services may not function properly when Kerberos is applied to the cluster.
Configuration
The cluster administrator can configure detailed authentication for Cloud Hadoop, including not only integrated management of users and groups but also user authentication and permission management through Kerberos.
Multi-Master configuration
- To maintain service continuity, LDAP and Kerberos service redundancy configurations are installed and provided by default on two Cloud Hadoop master nodes.
- On the master nodes, slapd, krb5kdc, kadmin daemons run for authentication services.
| Master 1 | Master 2 |
|---|---|
| LDAP (slapd) | LDAP (slapd) |
| Kerberos (krb5kdc / kadmin) | Kerberos (krb5kdc / kadmin) |
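On a systemd-based master node image, one way to confirm the redundant authentication daemons are running is a status check like the following (a sketch; the service unit names assume typical OpenLDAP and MIT Kerberos packages):

```bash
# Run on each master node: verify the LDAP and Kerberos daemons are active.
systemctl status slapd      # OpenLDAP server
systemctl status krb5kdc    # Kerberos KDC
systemctl status kadmin     # Kerberos admin server
```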
Authentication Workflow
Cloud Hadoop is designed to authenticate via Kerberos. Its authentication system combines Kerberos with LDAP, and users and services must authenticate to the system before use.

Every Hadoop service on every node has a Kerberos principal for authentication. Each service has a keytab file stored on the server, and the keytab file contains a randomized password. Users typically must obtain a Kerberos ticket via the `kinit` command before they can interact with a service.
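For example, a user might obtain and inspect a ticket as follows (a minimal sketch; `user1` is a hypothetical principal, and the realm reuses the NAVERCORP.COM example shown later in this guide):

```bash
# Obtain a ticket-granting ticket for the user (prompts for the password).
kinit user1@NAVERCORP.COM
# Show the tickets now held in the credential cache.
klist
```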
Kerberos principal
In Kerberos, a user is referred to as a principal. A Hadoop deployment consists of user principals and service principals. User principals are usually synchronized with the Key Distribution Center (KDC), and each user principal represents an actual user. Service principals vary by server and service: each service on each server has its own unique principal.
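As an illustration, principal names follow the `name/host@REALM` convention (the host names below are hypothetical; the realm reuses the NAVERCORP.COM example used later in this guide):

```
user1@NAVERCORP.COM                    <- user principal (one per actual user)
nn/master1.example.com@NAVERCORP.COM   <- service principal (HDFS NameNode on master1)
dn/worker1.example.com@NAVERCORP.COM   <- service principal (HDFS DataNode on worker1)
```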
Keytab file
Keytab files contain Kerberos principals and keys. They allow users and services to authenticate to Hadoop services using keytabs without having to use interactive tools or enter passwords. Hadoop generates a service principal for each node's service. These principals are stored in a keytab file on the Hadoop node.
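For instance, a service keytab can be inspected and used for passwordless authentication like this (a sketch; the keytab path follows the common Ambari default layout and may differ on your cluster):

```bash
# List the principals and key versions stored in the NameNode keytab.
klist -kt /etc/security/keytabs/nn.service.keytab
# Authenticate as the service principal using the keytab instead of a password.
kinit -kt /etc/security/keytabs/nn.service.keytab nn/$(hostname -f)@NAVERCORP.COM
```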
Preliminary tasks for Kerberize
- Ensure that the ambari-agent is running on all nodes within the cluster, including the ambari-server.
- All nodes managed by Ambari (except for the 2 master servers) should have the krb5-workstation package installed; see the example commands after this list. When setting Advanced kerberos-env in 2. Configure Kerberos below, make sure to uncheck "Install OS-specific Kerberos client package(s)" before proceeding with Kerberize.

- Kerberize requires a complete shutdown of the cluster, so it is recommended to perform it before the cluster goes into operation.
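The pre-checks above can be scripted roughly as follows (a sketch assuming a RHEL/CentOS-based node image):

```bash
# On every node: confirm the Ambari agent is running.
ambari-agent status
# On every non-master node: confirm the Kerberos client package is installed.
rpm -q krb5-workstation || sudo yum install -y krb5-workstation
```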
Ambari Kerberize settings
Start Kerberos settings
- Access the Ambari UI and click Cluster Admin > Kerberos at the bottom left.
- Click the [ENABLE KERBEROS] button.

- Check the details in the warning pop-up window, and click the [PROCEED ANYWAY] button.

Kerberos setup wizard
1. Get Started
For What type of KDC do you plan on using?, select Existing MIT KDC. Then select all 3 checkboxes as shown below and click the [Next] button.

2. Configure Kerberos
- Set each of the following items and click the [Next] button.
KDC
- KDC hosts: enter the host names (FQDNs) of the 2 master nodes where the KDC is installed, separated by a comma (,).
- Realm name: enter the Realm set during the Cloud Hadoop installation.
- Click the Test KDC Connection button to test the connectivity.
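If you prefer to verify reachability manually in addition to the Test KDC Connection button, a simple port check from a cluster node works (a sketch; replace the placeholder FQDNs with your master node host names):

```bash
# The KDC listens on port 88; both master nodes should be reachable.
nc -vz master1-fqdn 88
nc -vz master2-fqdn 88
```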
Kadmin
- Kadmin host: enter the host name (FQDN) of only 1 master node. If you are unsure which master node's FQDN to enter, run `kadmin -p admin/admin -q "listprincs"` on a master node and enter the FQDN shown in the `kadmin/FQDN@REALM` entry of the output (see the sketch after this list).
- Admin principal: enter `admin/admin`.
- Select the Save Admin Credentials checkbox.
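For example, the Kadmin host can be identified like this (using the NAVERCORP.COM example realm; the FQDN in the output is illustrative):

```bash
# Run on a master node. Look for the kadmin/<FQDN>@<REALM> entry in the output
# (alongside kadmin/admin and kadmin/changepw); its FQDN is the Kadmin host.
kadmin -p admin/admin -q "listprincs" | grep '^kadmin/'
# e.g. kadmin/master1-fqdn@NAVERCORP.COM -> enter master1-fqdn as the Kadmin host
```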

Make sure to select the Save Admin Credentials checkbox. Otherwise, use of the Cloud Hadoop service may be restricted.
Advanced kerberos-env
- Uncheck Install OS-specific Kerberos client package(s) before proceeding with Kerberize.

- Change the Encryption Types to `aes256-cts aes128-cts`.
- Add `+requires_preauth` to Principal Attributes.
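Once Kerberize completes, you can optionally confirm that newly created principals carry this attribute (a hypothetical check; the principal name is an example):

```bash
# The Attributes line in the output should include REQUIRES_PRE_AUTH.
kadmin -p admin/admin -q "getprinc nn/master1-fqdn@NAVERCORP.COM" | grep -i attributes
```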

Advanced krb5.conf
- If Kerberos settings were configured for use during the creation of the Cloud Hadoop cluster, you must uncheck the Manage Kerberos client krb5.conf checkbox (a quick way to inspect the existing client configuration follows below). After unchecking it, click the [NEXT] button.
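Before unchecking the box, you can confirm that the client configuration provisioned at cluster creation is in place (an illustrative check):

```bash
# Show the realm and KDC entries already managed by Cloud Hadoop.
grep -A 5 '\[realms\]' /etc/krb5.conf
```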

3. Install and Test Kerberos Client
Once the Kerberos configuration task is completed, Install Kerberos Client and Test Kerberos Client will start automatically.
Installation is complete when the Kerberos service has been installed and tested successfully message appears on the screen. Once installation is complete, click the [Next] button.

In case of an Admin session expiration error, enter the following and click the [SAVE] button.
- Admin principal: `admin/admin`
- Admin password: the KDC admin account password set during cluster creation
- Select the Save Admin Credentials checkbox
If you enter as above yet the error persists, check if there has been a change to the KDC admin account password.
4. Configure Identities
This step configures the principals and keytab locations for service users and Hadoop services.
Check the list of settings that are automatically added by the Ambari Wizard and click the [Next] button.

5. Confirm Configuration
After checking the setup information, click the [Next] button.

6. Stop Services
Once the setup information is verified, the cluster shutdown process will automatically begin. Once the shutdown process is complete, click the [Next] button.

7. Kerberize Cluster
The process consists of 7 sequential steps. Once all the steps are completed, click the [Next] button.

8. Start and Test Services
This step starts and tests the Hadoop services. Once it is completed, click the [Next] button.

9. Check Admin - Kerberos Enabled status
Once the Kerberos security is enabled message is displayed on the screen, the cluster has successfully completed the Kerberize task.

Check Kerberize application
To check whether Kerberize has been applied, check the Hadoop service principals and test by executing the `hadoop fs` command.
The following example assumes that Kerberos settings were configured for use when the Cloud Hadoop cluster was created (ex. Realm: NAVERCORP.COM).
- After completing the Ambari Kerberize settings above, run the `kadmin -p admin/admin -q "listprincs"` command.
  - You can see that the Hadoop service principals have been created, and executing the `hadoop fs` command results in an error.

  Note: If Kerberize has not been applied in Ambari, running the `kadmin -p admin/admin -q "listprincs"` command on the master node displays output as follows, and you can execute the `hadoop fs` command with the default `sshuser` account to see the results without any permission checks.

- Obtain the admin account ticket using the `kinit` command and execute the `hadoop fs` command again.
  - The results should be displayed correctly.
  - Deleting the ticket with the `kdestroy` command and re-executing the `hadoop fs` command confirms that an error occurs.
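The whole check can be condensed into the following sequence (a sketch; `-ls /` is an arbitrary example operation, and the admin password is the one set during cluster creation):

```bash
hadoop fs -ls /                   # fails: no Kerberos ticket in the cache
kinit admin/admin@NAVERCORP.COM   # obtain the admin ticket (prompts for password)
hadoop fs -ls /                   # now succeeds
kdestroy                          # discard the ticket
hadoop fs -ls /                   # fails again, confirming Kerberos is enforced
```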

Change HDFS Log Level
In Hadoop, you can manage and control logs created in various components (HDFS, YARN, MapReduce, and so on) by using log4j.
By adjusting the log level, you can collect less or more information on HDFS tasks. There are 7 adjustable log levels: ALL, DEBUG, INFO, WARN, ERROR, FATAL, and OFF. Adjust the level as necessary; depending on the level, unnecessary information may be collected and take up storage space.
The following describes how to change the log level of the HDFS audit.log.

- Select Ambari Web UI > HDFS > Configs > Advanced > Advanced hadoop-env.
- In the hadoop-env template, change the `-Dhdfs.audit.logger` value of `SHARED_HDFS_NAMENODE_OPTS` to the log level you want (see the sketch below).
- Click the [SAVE] button and restart the service.
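For reference, the relevant line in the hadoop-env template (a shell script) looks roughly like the following (a sketch; the surrounding JVM options are abbreviated, and the DRFAAUDIT appender name may differ on your cluster):

```bash
# Change INFO to the desired level, e.g. WARN or ERROR.
export SHARED_HDFS_NAMENODE_OPTS="... -Dhdfs.audit.logger=INFO,DRFAAUDIT ..."
```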