Using Dev
Available in VPC
The Dev app plays the role of a client for all services provided in Data Forest. You can use Hadoop commands or submit Spark jobs to YARN clusters in the Dev app.
For more information about creating apps, see Create and manage apps.
Check Dev app details
When the app creation is completed, you can view the details. When the Status is Stable under the app's details, it means the app is running normally.
The following describes how to check the app details.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Data Forest > Apps menus on the left.
- Select an account.
- Click the app whose details you want to view.
- View the app details.
- Quick links: you can connect to the following quick link addresses.
  - AppMaster: URL for viewing container logs. When an app is created, it is submitted to the YARN queue, and YARN provides a web UI for checking the details of each app.
  - Supervisor: supervisor URL where you can monitor and manage the container's app processes
  - Shell: URL that provides access to a GNU/Linux terminal (TTY) through a web browser
- Component: the DEV-1.0.0 type is composed of one shell component.
  - shell: memory and CPU are set to the defaults, and the number of containers is the recommended minimum
For information on how to log in to the AppMaster UI and view the logs of each container, see Access quick links.
The shell screen after connection is as follows:
Kerberos authentication
Kerberos authentication must take place before executing a Hadoop command or submitting a Spark job to a cluster. A keytab file is used for Kerberos authentication. However, since the Dev app can't access the user's local file system, you must upload the keytab to HDFS so that the Dev app can download it from there.
Download keytab
The following describes how to download the keytab.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > Accounts menus, in that order.
- Select an account and then click [Cluster access information] > Download Kerberos keytab.
- When the Download keytab window appears, click the [Download] button.
- Store the downloaded file safely.
Upload keytab to HDFS
The following describes how to upload the keytab to HDFS.
From the file browser of the koya namespace, upload the keytab `df.{username}.keytab` under the path `/user/{username}/`.
Download the keytab from HDFS, and run the Kerberos authentication using the keytab.

- Enter the password set up when creating the account as a string in `$PASSWORD`.
- Special characters must be enclosed in single quotes (`' '`).

$ curl -s -L -u test01:$PASSWORD -o df.test01.keytab "https://sso.kr.df.naverncp.com/gateway/koya-auth-basic/webhdfs/v1/user/test01/df.test01.keytab?op=OPEN"
$ ls -al
total 20
drwxr-s---  4 test01 hadoop  138 Dec 16 17:57 .
drwxr-s---  4 test01 hadoop   74 Dec 16 17:44 ..
-rw-r--r--  1 test01 hadoop  231 Dec 16 17:36 .bashrc
-rw-------  1 test01 hadoop  302 Dec 16 17:36 container_tokens
-rw-r--r--  1 test01 hadoop  245 Dec 16 17:57 df.test01.keytab
lrwxrwxrwx  1 test01 hadoop  101 Dec 16 17:36 gotty -> /data1/hadoop/yarn/local/usercache/test01/appcache/application_1607671243914_0024/filecache/10/gotty
-rwx------  1 test01 hadoop 6634 Dec 16 17:36 launch_container.sh
drwxr-S---  3 test01 hadoop   19 Dec 16 17:53 .pki
drwxr-s---  2 test01 hadoop    6 Dec 16 17:36 tmp
$ kinit test01 -kt df.test01.keytab
$ klist
Ticket cache: FILE:/tmp/krb5cc_20184
Default principal: test01@KR.DF.NAVERNCP.COM

Valid starting       Expires              Service principal
12/16/2020 17:39:57  12/17/2020 17:39:56  krbtgt/KR.DF.NAVERNCP.COM@KR.DF.NAVERNCP.COM
        renew until 12/23/2020 17:39:56
The error message `kinit: Password incorrect while getting initial credentials` occurs when the given keytab does not match the account.
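The curl command used to download the keytab follows a fixed WebHDFS pattern. The following Python sketch builds the same URL for any account; the helper name and the gateway default are illustrative, not part of Data Forest:

```python
# Build the WebHDFS OPEN URL used above to download a keytab from HDFS.
# The function name and default gateway value are illustrative only.
def keytab_download_url(
    username: str,
    gateway: str = "https://sso.kr.df.naverncp.com/gateway/koya-auth-basic",
) -> str:
    # The keytab is stored as /user/{username}/df.{username}.keytab in HDFS
    path = f"/user/{username}/df.{username}.keytab"
    return f"{gateway}/webhdfs/v1{path}?op=OPEN"

print(keytab_download_url("test01"))
```

For the `test01` account this reproduces the URL shown in the curl example above.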
Check environment variables
The environment variables required to use the Data Forest cluster's services are already specified in the Dev app.
The following describes how to view the environment variables.
$ echo $HADOOP_HOME
/usr/hdp/current/hadoop-client
$ echo $SPARK_HOME
/usr/hdp/current/spark2-client
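Scripts that wrap Hadoop or Spark commands can fail obscurely when these variables are missing. A minimal Python sketch that checks them up front; the variable list comes from the `echo` output above, and the helper itself is hypothetical:

```python
import os

# Environment variables the Dev app is expected to provide
# (taken from the `echo` output above); extend the list as needed.
REQUIRED_VARS = ["HADOOP_HOME", "SPARK_HOME"]

def missing_vars(env=os.environ):
    """Return the required variables that are not set in `env`."""
    return [name for name in REQUIRED_VARS if name not in env]

# Example with a fake environment resembling a partially configured shell
fake_env = {"HADOOP_HOME": "/usr/hdp/current/hadoop-client"}
print(missing_vars(fake_env))  # -> ['SPARK_HOME']
```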
Use Hadoop dfs command
The `dfs` command executes the file system shell. The file system shell provides various shell-like commands that directly interact with HDFS as well as the other file systems Hadoop supports, such as the local FS, WebHDFS, and S3 FS. `dfs` can be executed in three different formats: `hdfs dfs`, `hadoop fs`, and `hadoop dfs`. The following describes how to execute a file system job.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ hadoop fs -ls
Found 30 items
…
-rw-r--r-- 3 test01 services 215 2021-04-09 11:35 df.test01.keytab
drwx------ - test01 services 0 2021-05-11 12:21 grafana
drwx------ - test01 services 0 2021-05-07 14:55 hue
…
For more information about file system shells, see here.
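When post-processing a listing like the one above in a script, the fixed column layout of `hadoop fs -ls` can be split positionally. A hedged Python sketch, assuming the 8-column format shown above (the parser is illustrative, not a Data Forest utility):

```python
def parse_ls_line(line: str) -> dict:
    """Parse one `hadoop fs -ls` entry into a dict.

    Assumes the 8-column layout shown above: permissions, replication,
    owner, group, size, date, time, path. The path may contain spaces.
    """
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "permissions": perms,
        # replication is "-" for directories
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }

entry = parse_ls_line("-rw-r--r--   3 test01 services 215 2021-04-09 11:35 df.test01.keytab")
print(entry["path"])  # -> df.test01.keytab
```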
Submit Spark job using spark-shell
Since the Dev app already has the client settings for Data Forest configured, you can run a REPL such as spark-shell or PySpark right away.
The following describes how to submit a Spark job to a cluster using spark-shell.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ spark-shell
Warning: Ignoring non-spark config property: history.server.spnego.keytab.file=/etc/security/keytabs/spnego.service.keytab
Warning: Ignoring non-spark config property: history.server.spnego.kerberos.principal=HTTP/_HOST@KR.DF.NAVERNCP.COM
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://shell-0.dev.test01.kr.df.naverncp.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1619078733441_0566).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.2.3.1.0.0-78
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val rdd1 = sc.textFile("file:///usr/hdp/current/spark2-client/README.md")
rdd1: org.apache.spark.rdd.RDD[String] = file:///usr/hdp/current/spark2-client/README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> val rdd2 = rdd1.flatMap(_.split(" "))
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:25
scala> val rdd3= rdd2.map((_, 1))
rdd3: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:25
scala> val rdd4 = rdd3.reduceByKey(_+_)
rdd4: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
scala> rdd4.take(10)
res0: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/en/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (page](http://spark.apache.org/documentation.html).,1), (cluster.,1), ([run,1), (its,1), (YARN,,1))
scala> rdd4.saveAsTextFile("hdfs://koya/user/test01/result")
...
org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1478)
... 49 elided
The following describes how to check the result using the Hadoop command.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ hadoop fs -ls /user/test01/result
Found 3 items
-rw------- 3 test01 services 0 2021-04-21 14:06 /user/test01/result/_SUCCESS
-rw------- 3 test01 services 886 2021-04-21 14:06 /user/test01/result/part-00000.gz
-rw------- 3 test01 services 888 2021-04-21 14:06 /user/test01/result/part-00001.gz
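The RDD chain in the spark-shell session above is a standard word count: split each line into words, map each word to 1, and sum by key. The same logic in plain Python (no Spark needed), shown here only to clarify what each RDD step computes:

```python
from collections import Counter

lines = ["to be or", "not to be"]

# flatMap(_.split(" "))  -> one flat stream of words
words = [w for line in lines for w in line.split(" ")]

# map((_, 1)) followed by reduceByKey(_+_)  -> per-word counts
counts = Counter(words)

print(counts["to"])  # -> 2
```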
Access HiveServer2
The following describes how to access HS2.
You can access common HS2 and individual HS2 through the following command format:
$ beeline -u {JDBC connection string} -n {username} -p {password}
- For common HS2, enter the JDBC connection string that matches the HiveServer2 (Batch)/(Interactive) type shown in Quick links of App details > View access details.
- For individual HS2, enter the connection string by referring to Quick links > Connection String in the HS2 app's details.
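A wrapper script can assemble the beeline invocation by substituting the three placeholders above. A hedged Python sketch; the helper and the sample connection string are illustrative, and the real string must come from the app's quick links:

```python
def beeline_command(jdbc_url: str, username: str, password: str) -> list:
    """Return the argument list for the format shown above:
    beeline -u {JDBC connection string} -n {username} -p {password}
    """
    return ["beeline", "-u", jdbc_url, "-n", username, "-p", password]

# Hypothetical connection string; use the one from your app's quick links
cmd = beeline_command("jdbc:hive2://example-hs2:10000/default", "test01", "secret")
print(" ".join(cmd))
```

Building an argument list (rather than one shell string) avoids quoting problems when the password contains special characters.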
Configure client environment for apps
The following describes how to configure the client environment for an app.
- Create a directory called secure-hbase.
- Enter `sh /home/forest/get-app-env.sh {user's hbase app name} {directory name}` as in the following example:

$ mkdir secure-hbase
$ sh /home/forest/get-app-env.sh hbase ~/secure-hbase
[/home/forest/get-app-env.sh] Apptype: HBASE-2.2.3
[/home/forest/get-app-env.sh] Download install-client script for HBASE-2.2.3
[/home/forest/get-app-env.sh] Install client on /data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase
current hbase: .yarn/services/hbase/components/v1
--2021-05-20 14:37:51--  http://dist.kr.df.naverncp.com/repos/release/hbase/hbase-2.2.3-client-bin.tar.gz
Resolving dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)... 10.213.208.69
Connecting to dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)|10.213.208.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 233293221 (222M) [application/octet-stream]
Saving to: '/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client-bin.tar.gz'
100%[=============================================================================================>] 233,293,221 390MB/s in 0.6s
2021-05-20 14:37:51 (390 MB/s) - '/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client-bin.tar.gz' saved [233293221/233293221]
HBase-2.2.3 Client has been installed on /data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client
==============================================================================================
export HBASE_HOME=/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client
$HBASE_HOME/bin/hbase shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.3, rUnknown, Wed Jan 29 22:11:21 KST 2020
Took 0.0025 seconds
hbase(main):001:0>
hbase(main):002:0* version
2.2.3, rUnknown, Wed Jan 29 22:11:21 KST 2020
Took 0.0007 seconds
hbase(main):003:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
Took 0.5934 seconds