Secure Hadoop configuration (optional)
Available in VPC
Kerberos integrates with the Hadoop cluster to provide strong authentication for users and services.
This guide describes how to configure the authentication system installed in Cloud Hadoop for a Secure Hadoop configuration.
Before configuring Secure Hadoop, check whether the cluster is integrated with the Data Catalog service. If the cluster is integrated with external Hive Metastore through Data Catalog, some services may not function properly when Kerberos is applied to the cluster.
Configuration
Cluster administrators can use Kerberos to manage users and groups and to control user authentication and permissions, enabling fine-grained authentication configurations in Cloud Hadoop.
Multi-Master configuration
- To maintain service continuity, the LDAP and Kerberos services are configured redundantly by default and are installed on two Cloud Hadoop master nodes.
- The slapd, krb5kdc, and kadmin daemons run on the master nodes to provide the authentication services.
| Master 1 | Master 2 |
|---|---|
| LDAP (slapd) | LDAP (slapd) |
| Kerberos (krb5kdc / kadmin) | Kerberos (krb5kdc / kadmin) |
Authentication workflow
Cloud Hadoop is designed to authenticate via Kerberos. It is configured with both Kerberos and LDAP authentication systems, and both users and services must be authenticated by the system.
Every Hadoop service on every node has a Kerberos principal for authentication. Each service has a keytab file stored on the server, which contains a randomized password. Users typically must obtain a Kerberos ticket with the `kinit` command to interact with a service.
Kerberos principal
In Kerberos, users are referred to as principals. A Hadoop deployment consists of user principals and service principals. User principals are usually synchronized to the KDC (Key Distribution Center), and one user principal represents one real user. Service principals vary by server and service: each service on a server has its own unique principal.
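The principal structure described above can be illustrated by splitting a sample service principal into its parts; the principal value below is a hypothetical example, not one taken from a real cluster:

```shell
# A Kerberos principal has the form primary/instance@REALM.
# The value below is a hypothetical NameNode service principal.
principal='nn/master001.hadoop.example.com@NAVERCORP.COM'

primary=${principal%%/*}    # service name, e.g. nn (NameNode)
rest=${principal#*/}
instance=${rest%@*}         # FQDN of the host the service runs on
realm=${principal##*@}      # Kerberos realm

echo "$primary $instance $realm"
```

For a user principal such as `alice@NAVERCORP.COM` the instance part is simply absent, which is why one user principal can represent one real user across the whole cluster.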
Keytab files
Keytab files contain Kerberos principals and keys. They allow users and services to authenticate to Hadoop services using keytabs without having to use interactive tools or enter passwords. Hadoop generates a service principal for each node's service. These principals are stored in a keytab file on the Hadoop node.
Preparations for Kerberize
- Ensure that the ambari-agent is running on all nodes within the cluster, including the ambari-server.
- All nodes managed by Ambari (except the two master servers) must have the krb5-workstation package installed. In the 2. Configure Kerberos step, under Advanced kerberos-env, make sure to uncheck Install OS-specific Kerberos client package(s) before proceeding with Kerberize.
- Kerberize requires a complete shutdown of the cluster, so it is recommended to perform it before the cluster goes into operation.
Ambari Kerberize configuration
Start Kerberos configuration
- Access the Ambari UI and click Cluster Admin > Kerberos in the bottom left corner.
- Click the [ENABLE KERBEROS] button.
- After reviewing the content in the warning pop-up window, click the [PROCEED ANYWAY] button.
Kerberos configuration wizard
1. Get Started
For What type of KDC do you plan on using?, select Existing MIT KDC. Then select all three checkboxes below and click the [Next] button.
2. Configure Kerberos
- Configure the following items and click the [Next] button.
KDC
- KDC hosts: enter the host names (FQDN) of the two master nodes where the KDC is installed, separated by a comma (,).
- Realm name: enter the realm set during the Cloud Hadoop installation.
- Click the Test KDC Connection button to test connectivity.
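If you later need to verify the client-side configuration by hand, the `[realms]` section of `/etc/krb5.conf` should list both master-node KDCs entered above. A minimal sketch, assuming hypothetical master host names and the example realm used later in this guide:

```ini
# Sketch of the [realms] section in /etc/krb5.conf.
# Host names are hypothetical examples, not actual cluster values.
[realms]
  NAVERCORP.COM = {
    kdc = master001.hadoop.example.com
    kdc = master002.hadoop.example.com
    admin_server = master001.hadoop.example.com
  }
```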
Kadmin
- Kadmin host: enter the host name (FQDN) of only one master node. If you are unsure which master node's FQDN to enter, run `kadmin -p admin/admin -q "listprincs"` on a master node, then enter the FQDN that appears in the `kadmin/FQDN@REALM` principal.
- Admin principal: enter `admin/admin`.
- Select the Save Admin Credentials checkbox.
It is crucial to select the Save Admin Credentials checkbox. If it is not selected, there may be some limitations when using the Cloud Hadoop service.
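The kadmin-host lookup described above boils down to stripping the `kadmin/` prefix and the `@REALM` suffix from the matching `listprincs` line. The principal below is hypothetical sample output used only for illustration:

```shell
# Hypothetical line from the output of:
#   kadmin -p admin/admin -q "listprincs"
line='kadmin/master002.hadoop.example.com@NAVERCORP.COM'

# Strip "kadmin/" and "@REALM" to get the FQDN for the Kadmin host field.
fqdn=${line#kadmin/}
fqdn=${fqdn%@*}
echo "$fqdn"
```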
Advanced kerberos-env
- Proceed with Kerberize after unchecking Install OS-specific Kerberos client package(s).
- Change the Encryption Types to `aes256-cts aes128-cts`.
- Add `+requires_preauth` to Principal Attributes.
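On the KDC side, the Encryption Types value corresponds roughly to a setting like the `kdc.conf` sketch below; the file path and surrounding values are assumptions. Note that `+requires_preauth` is not a `kdc.conf` entry: it is a flag applied when principals are created, forcing pre-authentication for them.

```ini
# Sketch of /var/kerberos/krb5kdc/kdc.conf (path and values are assumptions).
[realms]
  NAVERCORP.COM = {
    supported_enctypes = aes256-cts:normal aes128-cts:normal
  }
```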
Advanced krb5.conf
- If Kerberos details were set to be used during the creation of Cloud Hadoop, you must uncheck the Manage Kerberos client krb5.conf checkbox. Uncheck it, then click the [Next] button.
3. Install and Test Kerberos Client
Once the Kerberos configuration task is completed, Install Kerberos Client and Test Kerberos Client will start automatically.
Installation is complete when the message Kerberos service has been installed and tested successfully appears on the screen. When the installation is complete, click the [Next] button.
In case of an Admin session expiration error, enter the following and click the [SAVE] button.
- Admin principal: `admin/admin`
- Admin password: the KDC admin account password configured during cluster creation
- Select the Save Admin Credentials checkbox
If the error persists after entering the above, check whether the KDC admin account password has been changed.
4. Configure Identities
This step configures the principals and keytab locations for the service users and Hadoop services.
Check the list of settings that are automatically added by the Ambari Wizard and click the [Next] button.
5. Confirm Configuration
After checking the configuration, click the [Next] button.
6. Stop Services
Once the configuration information is verified, the cluster shutdown process will automatically begin. Click the [Next] button once the shutdown is complete.
7. Kerberize Cluster
The process consists of 7 sequential steps. Click the [Next] button upon completion.
8. Start and Test Services
This step involves starting and verifying the Hadoop service. Click the [Complete] button upon completion.
9. Check Admin - Kerberos Enabled status
Once the message Kerberos security is enabled is displayed on the screen, the cluster has successfully completed the Kerberize task.
Check Kerberize application
To check whether Kerberize has been applied, verify the Hadoop service principals and test by executing `hadoop fs` commands.
The following example assumes that Kerberos information was set to be used during the creation of Cloud Hadoop (e.g., Realm: NAVERCORP.COM).
1. After completing the Ambari Kerberize configuration, run the `kadmin -p admin/admin -q "listprincs"` command.
   - You can see that the Hadoop service principals have been created, and an error occurs when you run the `hadoop fs` command.
   - Note: If Kerberize has not been applied in Ambari, the same command on the master node will not show the Hadoop service principals, and you can execute the `hadoop fs` command with the default `sshuser` account and see the results without any permission checks.
2. Obtain the admin account ticket using the `kinit` command and execute the `hadoop fs` command again.
   - The results should be displayed correctly.
3. Delete the ticket with the `kdestroy` command and re-execute the `hadoop fs` command to confirm that an error occurs.
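The verification sequence above relies on the ticket cache: `hadoop fs` succeeds only while a valid ticket exists, and fails again after `kdestroy`. That check can be sketched as a small guard function; `run_hdfs_ls` and the `hadoop fs -ls /` invocation are illustrative assumptions, not part of the product, and the function is meant to be run on a cluster node.

```shell
# Sketch (hypothetical helper): run a hadoop fs command only when a valid
# Kerberos ticket is present. `klist -s` exits non-zero when the ticket
# cache is missing or expired, which is why `hadoop fs` fails after kdestroy.
run_hdfs_ls() {
  if klist -s 2>/dev/null; then
    hadoop fs -ls /   # succeeds only with a valid ticket (after kinit)
  else
    echo "no valid Kerberos ticket; run kinit first" >&2
    return 1
  fi
}
```

On a Kerberized cluster node, `kinit admin/admin` followed by `run_hdfs_ls` should list the HDFS root, while running it after `kdestroy` prints the warning and returns a non-zero status.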