Configuring Secure Hadoop (optional)


Available in VPC

Kerberos integrates with the Hadoop cluster to provide strong authentication for users and services.

This guide describes how to configure the authentication system installed in Cloud Hadoop for Secure Hadoop configuration.

Note

Before configuring Secure Hadoop, check whether the cluster is integrated with the Data Catalog service. If the cluster is integrated with external Hive Metastore through Data Catalog, some services may not function properly when Kerberos is applied to the cluster.

Configuration

The cluster administrator can configure detailed authentication for Cloud Hadoop, including not only integrated management of users and groups but also user authentication and permission management through Kerberos.

Multi-Master configuration

  • To maintain service continuity, LDAP and Kerberos service redundancy configurations are installed and provided by default on two Cloud Hadoop master nodes.
  • On the master nodes, slapd, krb5kdc, kadmin daemons run for authentication services.
    Master 1: LDAP (slapd), Kerberos (krb5kdc / kadmin)
    Master 2: LDAP (slapd), Kerberos (krb5kdc / kadmin)

Authentication Workflow

Cloud Hadoop is designed to authenticate via Kerberos. Its authentication system combines Kerberos with LDAP, and both users and services must authenticate to the system.

chadoop-3-7-00_ko

Every Hadoop service on each node has a Kerberos principal for authentication. Each service stores a keytab file on its server, and the keytab file contains a randomized key that serves as the password. Typically, users must obtain a Kerberos ticket with the kinit command before interacting with a service.
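As a sketch of this workflow (the principal name and realm below are illustrative examples, not values from this guide), a typical user session looks like:

```shell
# Obtain a Kerberos ticket for the user principal (prompts for the password).
# The principal and realm are hypothetical examples.
kinit example-user@NAVERCORP.COM

# Verify the ticket was granted and inspect its lifetime.
klist

# With a valid ticket, Hadoop commands authenticate transparently.
hadoop fs -ls /
```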

Kerberos principals

In Kerberos, an identity is referred to as a principal. A Hadoop deployment consists of user principals and service principals. User principals are usually synchronized to the Key Distribution Center (KDC), and each user principal represents an actual user. Service principals differ by server and service: each service on each server has its own unique principal.

Keytab file

Keytab files contain Kerberos principals and their keys. They allow users and services to authenticate to Hadoop services without interactive tools or password entry. Hadoop generates a service principal for each service on each node, and these principals are stored in a keytab file on that node.
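For illustration, a service or user can authenticate from a keytab like this. The keytab path follows a common Ambari default and the principal name is an assumption, not a value taken from this guide:

```shell
# List the principals stored in a keytab (path is a typical Ambari default).
klist -kt /etc/security/keytabs/nn.service.keytab

# Authenticate non-interactively using the keytab instead of a password.
# The principal name below is an illustrative example.
kinit -kt /etc/security/keytabs/nn.service.keytab nn/master-node-fqdn@NAVERCORP.COM
```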

Preliminary tasks for Kerberize

  • Ensure that the ambari-agent is running on all nodes within the cluster, including the ambari-server.
  • All nodes managed by Ambari (except the 2 master servers) must have the krb5-workstation package installed. When setting Advanced kerberos-env in 2. Configure Kerberos below, make sure to uncheck "Install OS-specific Kerberos client package(s)" before proceeding with Kerberize.
    chadoop-3-7-03-02-0_ko
  • Kerberize requires a complete shutdown of the cluster. It is recommended to complete it before the cluster goes into operation.
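The checks above can be sketched as shell commands, run on each node (the package query assumes an RPM-based OS):

```shell
# Confirm the ambari-agent is running on this node.
ambari-agent status

# Confirm the Kerberos client package is installed (non-master nodes only).
rpm -q krb5-workstation
```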

Ambari Kerberize settings

Start Kerberos settings

  1. Access the Ambari UI and click Cluster Admin > Kerberos at the bottom left.
  2. Click the [ENABLE KERBEROS] button.
    chadoop-3-7-02-01_ko
  3. Check the details in the warning pop-up window, and click the [PROCEED ANYWAY] button.
    chadoop-3-7-02-02_ko

Kerberos setup wizard

1. Get Started

Under What type of KDC do you plan on using?, select Existing MIT KDC. Then select all 3 checkboxes as shown below and click the [Next] button.
chadoop-3-7-03-01_ko

2. Configure Kerberos

  1. Set each of the following items and click the [Next] button.

KDC

  • KDC hosts: enter the host names (FQDN) of the 2 master nodes where the KDC is installed, separated by a comma (,).
  • Realm name: enter the Realm set during the Cloud Hadoop installation.
  • Click the Test KDC Connection button to test the connectivity.

Kadmin

  • Kadmin host: enter the host name (FQDN) of only 1 master node. If unsure which master node's FQDN to enter, run kadmin -p admin/admin -q "listprincs" on a master node and enter the FQDN that appears in the kadmin/FQDN@REALM principal.
  • Admin principal : enter admin/admin.
  • Select the Save Admin Credentials checkbox.
    chadoop-3-7-03-02-01_ko
Caution

Make sure to select the Save Admin Credentials checkbox. Otherwise, use of the Cloud Hadoop service may be restricted.
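Finding the Kadmin host FQDN from the principal list can be sketched as below. The sample principals fed in by printf are hypothetical; on a real master node, pipe the output of kadmin -p admin/admin -q "listprincs" instead:

```shell
# Extract the Kadmin host FQDN from the kadmin/FQDN@REALM principal.
# The printf below stands in for real `kadmin -q "listprincs"` output.
printf '%s\n' \
  'kadmin/hadoopmn001.example.com@NAVERCORP.COM' \
  'krbtgt/NAVERCORP.COM@NAVERCORP.COM' \
  'nn/hadoopmn001.example.com@NAVERCORP.COM' \
  | sed -n 's|^kadmin/\([^@]*\)@.*|\1|p'
# → hadoopmn001.example.com
```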

Advanced kerberos-env

  • Uncheck Install OS-specific Kerberos client package(s) before proceeding with Kerberize.
    chadoop-3-7-03-02-0_ko
  • Change the Encryption Types to aes256-cts aes128-cts.
  • Add +requires_preauth to Principal Attributes.
    chadoop-3-7-03-02-1_ko

Advanced krb5.conf

  • If Kerberos details were set to be used when the Cloud Hadoop cluster was created, you must uncheck the Manage Kerberos client krb5.conf checkbox. After unchecking it, click the [NEXT] button.
    chadoop-3-7-03-02-2_ko

3. Install and Test Kerberos Client

Once the Kerberos configuration task is completed, Install Kerberos Client and Test Kerberos Client will start automatically.
Installation is complete when the Kerberos service has been installed and tested successfully message appears on the screen. Once installation is complete, click the [Next] button.

chadoop-3-7-03-03_ko

Note

In case of an Admin session expiration error, enter the following and click the [SAVE] button.

  • Admin principal : admin/admin
  • Admin password: the KDC admin account password set during cluster creation
  • Select the Save Admin Credentials checkbox

If the error persists after entering the above, check whether the KDC admin account password has been changed.
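One quick way to check whether the admin password itself still works is sketched below, using the admin credentials described above:

```shell
# Attempt to get a ticket as the KDC admin principal; this prompts for the
# KDC admin password set during cluster creation and fails if it has changed.
kinit admin/admin

# Discard the ticket after the check.
kdestroy
```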

4. Configure Identities

This step configures the principals and keytab locations for service users and Hadoop services.
Check the list of settings that are automatically added by the Ambari Wizard and click the [Next] button.
chadoop-3-7-03-04_ko

5. Confirm Configuration

After checking the setup information, click the [Next] button.
chadoop-3-7-03-05_ko

6. Stop Services

Once the setup information is verified, the cluster shutdown process will automatically begin. Once the shutdown process is complete, click the [Next] button.
chadoop-3-7-03-06_ko

7. Kerberize Cluster

The process consists of 7 sequential steps. Once all the steps are completed, click the [Next] button.
chadoop-3-7-03-07_ko

8. Start and Test Services

This step involves starting and checking the Hadoop services. Once it is completed, click the [Next] button.
chadoop-3-7-03-08_ko

9. Check Admin - Kerberos Enabled status

Once the Kerberos security is enabled message is displayed on the screen, the cluster has successfully completed the Kerberize task.
chadoop-3-7-04-1_ko

Check Kerberize application

To check whether Kerberize has been applied, check the Hadoop service principals and run a hadoop fs command as a test.
The following example assumes that Kerberos information was set to be used when the Cloud Hadoop cluster was created. (e.g., Realm: NAVERCORP.COM)

  1. After completing the Ambari Kerberize settings above, run the kadmin -p admin/admin -q "listprincs" command.

    • You can see that the Hadoop service principals have been created; at this point, executing the hadoop fs command without a ticket results in an error.

    chadoop-3-7-05-02_ko

    Note

    If Kerberize has not been applied in Ambari, running the kadmin -p admin/admin -q "listprincs" command on the master node displays output like the following. In that case, you can run the hadoop fs command with the default sshuser account and see the results without any permission check, as shown below:

    chadoop-3-7-05-01_ko

  2. Obtain the admin account ticket using the kinit command and execute the hadoop fs command again.

    • The result values should be displayed correctly.
    • Deleting the ticket with the kdestroy command and re-executing the hadoop fs command will confirm that an error occurs.

    chadoop-3-7-05-03_ko
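The verification steps above can be sketched as follows (the realm is the example value from this guide; the admin principal is the one configured in the wizard):

```shell
# 1. Obtain a ticket for the admin account (prompts for the KDC admin password).
kinit admin/admin@NAVERCORP.COM

# 2. With a valid ticket, the command succeeds and lists HDFS contents.
hadoop fs -ls /

# 3. Destroy the ticket; the same command should now fail with a
#    Kerberos authentication error.
kdestroy
hadoop fs -ls /
```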

Change HDFS Log Level

In Hadoop, you can manage and control logs created in various components (HDFS, YARN, MapReduce, and so on) by using log4j.
By adjusting the log level, you can collect more or less information on HDFS tasks. There are 7 adjustable log levels: ALL, DEBUG, INFO, WARN, ERROR, FATAL, and OFF. Adjust the level as necessary, because depending on the level, unnecessary information may be collected and take up storage space.

The following describes how to change the log level of the HDFS audit.log.
chadoop-3-7-06-02

  1. Select Ambari Web UI > HDFS > Configs > Advanced > Advanced hadoop-env.
  2. In the hadoop-env template, change the -Dhdfs.audit.logger value in SHARED_HDFS_NAMENODE_OPTS to the log level you want.
  3. Click the [SAVE] button and restart the service.
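As a sketch, the relevant fragment of the hadoop-env template looks roughly like this. The surrounding JVM options are omitted, the DRFAAUDIT appender name follows the common Hadoop log4j default, and WARN is just an example target level:

```shell
# Excerpt from the hadoop-env template (Ambari > HDFS > Configs > Advanced).
# Change the level in -Dhdfs.audit.logger (here INFO -> WARN); the "..." marks
# other JVM options in the variable that are omitted from this sketch.
export SHARED_HDFS_NAMENODE_OPTS="... -Dhdfs.audit.logger=WARN,DRFAAUDIT ..."
```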