Getting started with Data Forest

    Available in VPC

    If you've reviewed the application specifications provided by Data Forest and the usage scenarios, you're ready to start using Data Forest. This guide describes how to create a notebook and configure the client environment to access the Data Forest cluster and Data Forest apps.

    Create notebook

    The following describes how to create a notebook.

    Preparations

    Create a VPC and a subnet to establish effective network access control.

    1. Click the Services > Big Data & Analytics > Data Forest menus, in that order.
    2. Click the [Create notebook] button from Notebooks.
    3. Enter the notebook settings information, and then click the [Next] button.
      • Account name: enter the account name (e.g., df123)
      • Notebook name: enter the notebook name (e.g., my-notebook)
      • VPC/Subnet: select the VPC and subnet you created during the preparations
    4. If user settings are required, enter the relevant information.
    5. Select an existing authentication key from Set authentication key or create a new one, and then click the [Next] button.
    6. After the final review, click the [Create] button.

    Set up development environments from notebook nodes

    Once the notebook has been created, you can configure a development environment that provides easy access to the Data Forest cluster and its apps through Docker containers in the VPC environment.

    Note

    This scenario assumes that the host is CentOS 7.3.

    Step 1. Connect to notebook node and docker

    There are two ways to access the Docker container running on the notebook node.

    • Access the container through the notebook web UI
    • Connect to the notebook node via SSH and attach to the running container (as sketched after the note below)
    Note
    • For more information about how to connect to the notebook, see Create and manage notebook.
    • The notebook Docker container provided by Data Forest has a dedicated overlay network configured so that it can reach the network where the Data Forest apps run.
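
    For the SSH method, a minimal sketch looks like the following; the node address and container name are placeholders, so check the actual values in the console and with docker ps on the node.

    $ ssh forest@<notebook-node-address>           # connect to the notebook node
    $ docker ps                                    # list running containers to find the notebook container
    $ docker exec -it <container-name> /bin/bash   # open a shell inside the container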

    Step 2. Confirm and authenticate keytab

    To access Data Forest components, you must complete Kerberos authentication. Use the keytab file downloaded from the access information after creating the account.

    When a notebook node is created, the keytab file of the Data Forest account is downloaded into the Docker container. The file is located at the following path.

    • User keytab download path
      • /home/forest/keytabs, under the home directory of the forest account
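
    Before authenticating, you can check which principal the keytab contains by listing its entries with klist (the path matches the session below):

    $ klist -kt /home/forest/keytabs/df.example.keytab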

    Run commands as follows to authenticate.

    [forest@0242f09990ad ~][df]$ cd keytabs/
    [forest@0242f09990ad keytabs][df]$ ll
    total 4
    -rw-r--r-- 1 forest forest 218 Dec 21 15:19 df.example.keytab
    [forest@0242f09990ad keytabs][df]$ kinit example -kt df.example.keytab
    [forest@0242f09990ad keytabs][df]$ klist
    Ticket cache: FILE:/tmp/krb5cc_500
    Default principal: example@KR.DF.NAVERNCP.COM
    
    Valid starting       Expires              Service principal
    12/21/2020 17:07:42  12/22/2020 17:07:42  krbtgt/KR.DF.NAVERNCP.COM@KR.DF.NAVERNCP.COM
    	renew until 12/28/2020 17:07:42
    

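    The ticket above is valid for one day and renewable until 12/28. Within that window, and assuming the KDC policy permits renewal, you can extend the ticket without re-reading the keytab:

    [forest@0242f09990ad keytabs][df]$ kinit -R
    [forest@0242f09990ad keytabs][df]$ klist
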
    Run the kdestroy command to delete the authentication history.

    [forest@0242f09990ad keytabs][df]$ kdestroy
    [forest@0242f09990ad keytabs][df]$ klist
    klist: No credentials cache found (filename: /tmp/krb5cc_500)
    
    Note

    Without a keytab file, user authentication can't be completed, and all operations may fail with permission errors.

    Step 3. Use development environment

    1. Confirm environment variables

    The environment variables required for using commands such as hadoop, yarn, and spark-submit have already been specified.

    [forest@0242f09990ad keytabs][df]$ cat /etc/profile.d/zz-df-env.sh
    # DO NOT EDIT THIS LINE
    # FOR CLUSTER df ENVIRONMENTS
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export HIVE_CONF_DIR=/etc/hive/conf
    export SPARK_CONF_DIR=/etc/spark2/conf
    ...
    

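    You can quickly verify that the variables are loaded in your shell, for example:

    [forest@0242f09990ad keytabs][df]$ echo $HADOOP_CONF_DIR
    /etc/hadoop/conf
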
    2. Run various commands

    [forest@0242f09990ad keytabs][df]$ hadoop fs -touch /user/example/test.txt
    [forest@0242f09990ad keytabs][df]$ hadoop fs -ls
    Found 4 items
    drwxr-xr-x   - example       services          0 2020-12-21 16:33 .sparkStaging
    drwxr-x---   - example       services          0 2020-12-21 15:21 .yarn
    -rw-------   3 example       services          0 2020-12-21 17:10 test.txt
    
    Note
    • You can't access files located outside the user's HDFS home directory (/user/${USER}).
    • If user authentication hasn't been completed, an authentication error message appears when running commands. To authenticate, see Step 2. Confirm and authenticate keytab.
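
    Other routine HDFS operations work the same way within your home directory. For example (file names here are placeholders):

    [forest@0242f09990ad keytabs][df]$ hadoop fs -put local.txt /user/example/
    [forest@0242f09990ad keytabs][df]$ hadoop fs -cat /user/example/test.txt
    [forest@0242f09990ad keytabs][df]$ hadoop fs -rm /user/example/test.txt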

    You can view the applications you've created and change their status as follows:

    [forest@0242f09990ad keytabs][df]$ yarn app -list
    20/12/21 17:11:43 INFO client.AHSProxy: Connecting to Application History server at rm1.kr.df.naverncp.com/10.213.198.24:10200
    Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):1
                    Application-Id	    Application-Name	    Application-Type	      User     Queue	             State	       Final-State	       Progress	                       Tracking-URL
    application_1608526482493_0002	                 dev	        yarn-service	   example       dev	           RUNNING	         UNDEFINED	           100%	                                N/A
    
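    The yarn app command also provides lifecycle subcommands to change an app's status. For example, the yarn-service app dev from the listing above could be stopped and restarted as follows; note that stopping an app terminates its containers, so don't stop an app your current session depends on:

    [forest@0242f09990ad keytabs][df]$ yarn app -stop dev
    [forest@0242f09990ad keytabs][df]$ yarn app -start dev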

    You can view Oozie jobs as follows:

    [forest@0242f09990ad keytabs][df]$ oozie jobs
    Job ID                                   App Name     Status    User      Group     Started                 Ended
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-201125175300661-oz-df-W          no-op-wf     SUCCEEDED example -         2020-11-25 08:56 GMT    2020-11-25 08:56 GMT
    ------------------------------------------------------------------------------------------------------------------------------------
    

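    You can inspect an individual job using its ID from the listing, for example:

    [forest@0242f09990ad keytabs][df]$ oozie job -info 0000000-201125175300661-oz-df-W
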
    You can run commands using spark-shell as follows:

    [forest@f095a749f891 ~][df]$ spark-shell --master local
    Warning: Ignoring non-spark config property: history.server.spnego.keytab.file=/etc/security/keytabs/spnego.service.keytab
    Warning: Ignoring non-spark config property: history.server.spnego.kerberos.principal=HTTP/_HOST@KR.DF.NAVERNCP.COM
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://f095a749f891:4040
    Spark context available as 'sc' (master = local, app id = local-1608542188370).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.0.0-78
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> sc
    res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@b90c5a5
    
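    For a quick sanity check inside the shell, you can, for example, count the lines of the Spark README that's also used as input later in this guide (the call returns the line count as a Long):

    scala> sc.textFile("file:///usr/hdp/current/spark2-client/README.md").count()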

    You can use spark-submit to submit JAR files. In the example below, the application was built as example.jar, and Spark2's README.md file was used as the input text file. You can check the word count result in the stdout log of the application.

    [forest@090aea7192a2 ~][df]$ spark-submit --class com.naverncp.example.SparkWordCount \
    --master yarn --deploy-mode cluster --executor-memory 1g --name wordcount --conf "spark.app.id=wordcount" \
    example.jar file:///usr/hdp/current/spark2-client/README.md
    
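    Once the application finishes, you can fetch its stdout log, which contains the word count output, with yarn logs; replace the placeholder with the application ID printed by spark-submit:

    [forest@090aea7192a2 ~][df]$ yarn logs -applicationId <application-id> -log_files stdout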

    The SparkWordCount.scala code is as follows:

    package com.naverncp.example
    
    import org.apache.spark.{SparkConf, SparkContext}
    
    object SparkWordCount {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCount Example"))
        // Read the input file passed as the first argument and split each line into words
        val tokenized = sc.textFile(args(0)).flatMap(_.split(" "))
        // Pair each word with 1, then sum the counts per word
        val wordCounts = tokenized.map((_, 1)).reduceByKey(_ + _)
        // Print the (word, count) pairs to stdout
        println(wordCounts.collect().mkString(", "))
      }
    }
    
    
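    One way to produce example.jar is an sbt project with Spark marked as a provided dependency. The build definition below is an illustrative sketch, not part of Data Forest:

    $ cat build.sbt
    name := "spark-wordcount-example"
    scalaVersion := "2.11.8"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.2" % "provided"
    $ sbt package   # writes the JAR under target/scala-2.11/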

    3. Configure client for app

    The steps so far covered the client configuration for the multi-tenant cluster. This section explains how to configure the client for the HBASE-2.0.0, HBASE-2.2.3, and KAFKA-2.4.0 apps; additional environment variables must be set up before configuring the client.

    Run get-app-env.sh to automatically set up the client environment variables for a Data Forest app. Replace ${APP_NAME} with the app's name and ${DIR} with the directory to install the client into.

    $ pwd
    /home/forest
    $ mkdir ${DIR}
    $ sh /home/forest/get-app-env.sh ${APP_NAME} ~/${DIR}
    

    HBASE-2.0.0
    The following describes how to configure the client for the HBASE-2.0.0 app. (app name: secure-hbase)

    [forest@0242f09990ad ~][df]$ mkdir secure-hbase
    [forest@0242f09990ad ~][df]$ sh /home/forest/get-app-env.sh secure-hbase ~/secure-hbase
    [/home/forest/get-app-env.sh] Apptype: HBASE-2.0.0
    [/home/forest/get-app-env.sh] Download install-client script for HBASE-2.0.0
    [/home/forest/get-app-env.sh] Install client on /home/forest/secure-hbase
    current secure-hbase: .yarn/services/secure-hbase/components/v1
    HBase-2.0.0 Client has been installed on /home/forest/secure-hbase
    ==============================================================================================
    kinit <user>
    export HBASE_CONF_DIR=/home/forest/secure-hbase
    hbase shell
    ==============================================================================================
    
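    Following the instructions printed above, a typical first session might look like this (table listing output omitted):

    [forest@0242f09990ad ~][df]$ kinit example -kt ~/keytabs/df.example.keytab
    [forest@0242f09990ad ~][df]$ export HBASE_CONF_DIR=/home/forest/secure-hbase
    [forest@0242f09990ad ~][df]$ hbase shell
    hbase(main):001:0> list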

    HBASE-2.2.3
    The following describes how to configure the client for the HBASE-2.2.3 app. (app name: unsecure-hbase)

    $ mkdir unsecure-hbase
    $ sh /home/forest/get-app-env.sh unsecure-hbase ~/unsecure-hbase
    

    KAFKA-2.4.0
    The following describes how to configure the client for the KAFKA-2.4.0 app. (app name: kafka)

    $ mkdir kafka
    $ sh /home/forest/get-app-env.sh kafka ~/kafka
    
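    With the client configuration downloaded, you can run the standard Kafka CLI tools against the app. For example, listing topics; the broker address is a placeholder, so check the connection information for your app, and depending on its security settings you may also need to pass client properties via --command-config:

    $ kafka-topics.sh --bootstrap-server <broker-host>:9092 --list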
