Using Dev
Available in VPC
The Dev app plays the role of a client for all services provided in Data Forest. You can use Hadoop commands or submit Spark jobs to YARN clusters in the Dev app.
For more information about creating apps, see Create and manage apps.
Check Dev app details
When the app creation is completed, you can view the details. When the Status is Stable under the app's details, it means the app is running normally.
The following describes how to check the app details.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Data Forest > Apps menus on the left.
- Select an account.
- Click the app whose details you want to view.
- View the app details.
- Quick links: you can connect to the following quick link addresses.
  - AppMaster: URL for viewing container logs. When an app is created, it is submitted to the YARN queue, and YARN provides a web UI for checking the details of each app.
  - Supervisor: supervisor URL where you can monitor and manage the container's app processes
  - Shell: URL that provides access to a GNU/Linux terminal (TTY) through a web browser
- Component: the DEV-1.0.0 type is composed of one shell component.
  - shell: memory and CPU are set to the defaults, and the number of containers is the recommended minimum
For information on how to log in to the AppMaster UI and view the logs of each container, see Access quick links.
The shell screen after connection is as follows:
Kerberos authentication
Kerberos authentication must take place before executing a Hadoop command or submitting a Spark job to a cluster. A keytab file is used for Kerberos authentication. However, since the Dev app can't access the user's local file system, you must upload the keytab to HDFS so that the Dev app can download it from there.
Download keytab
The following describes how to download the keytab.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > Accounts menus, in that order.
- Select an account and then click [Cluster access information] > Download Kerberos keytab.
- When the Download keytab window appears, click the [Download] button.
- Store the downloaded file safely.
Upload keytab to HDFS
The following describes how to upload the keytab to HDFS.
From the file browser of the koya namespace, upload the keytab `df.{username}.keytab` under the path `/user/{username}/`.
Download the keytab from HDFS, and run the Kerberos authentication using the keytab.

- Enter the password set up when creating the account as a string in `$PASSWORD`.
- Special characters must be enclosed in single quotes (`' '`).

$ curl -s -L -u test01:$PASSWORD -o df.test01.keytab "https://sso.kr.df.naverncp.com/gateway/koya-auth-basic/webhdfs/v1/user/test01/df.test01.keytab?op=OPEN"
$ ls -al
total 20
drwxr-s---  4 test01 hadoop  138 Dec 16 17:57 .
drwxr-s---  4 test01 hadoop   74 Dec 16 17:44 ..
-rw-r--r--  1 test01 hadoop  231 Dec 16 17:36 .bashrc
-rw-------  1 test01 hadoop  302 Dec 16 17:36 container_tokens
-rw-r--r--  1 test01 hadoop  245 Dec 16 17:57 df.test01.keytab
lrwxrwxrwx  1 test01 hadoop  101 Dec 16 17:36 gotty -> /data1/hadoop/yarn/local/usercache/test01/appcache/application_1607671243914_0024/filecache/10/gotty
-rwx------  1 test01 hadoop 6634 Dec 16 17:36 launch_container.sh
drwxr-S---  3 test01 hadoop   19 Dec 16 17:53 .pki
drwxr-s---  2 test01 hadoop    6 Dec 16 17:36 tmp
$ kinit test01 -kt df.test01.keytab
$ klist
Ticket cache: FILE:/tmp/krb5cc_20184
Default principal: test01@KR.DF.NAVERNCP.COM

Valid starting       Expires              Service principal
12/16/2020 17:39:57  12/17/2020 17:39:56  krbtgt/KR.DF.NAVERNCP.COM@KR.DF.NAVERNCP.COM
        renew until 12/23/2020 17:39:56
The error message `kinit: Password incorrect while getting initial credentials` occurs when the given keytab does not match the account.
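The curl command used to download the keytab follows a fixed WebHDFS pattern. The following Python sketch builds the same URL for any account; the helper name and the gateway default are illustrative, not part of Data Forest:

```python
# Build the WebHDFS OPEN URL used above to download a keytab from HDFS.
# The function name and default gateway value are illustrative only.
def keytab_download_url(
    username: str,
    gateway: str = "https://sso.kr.df.naverncp.com/gateway/koya-auth-basic",
) -> str:
    # The keytab is stored as /user/{username}/df.{username}.keytab in HDFS
    path = f"/user/{username}/df.{username}.keytab"
    return f"{gateway}/webhdfs/v1{path}?op=OPEN"

print(keytab_download_url("test01"))
```

For the `test01` account this reproduces the URL shown in the curl example above.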
Check environment variables
The environment variables required to use the Data Forest cluster's services are already specified in the Dev app.
The following describes how to view the environment variables.
$ echo $HADOOP_HOME
/usr/hdp/current/hadoop-client
$ echo $SPARK_HOME
/usr/hdp/current/spark2-client
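Scripts that wrap Hadoop or Spark commands can fail obscurely when these variables are missing. A minimal Python sketch that checks them up front; the variable list comes from the `echo` output above, and the helper itself is hypothetical:

```python
import os

# Environment variables the Dev app is expected to provide
# (taken from the `echo` output above); extend the list as needed.
REQUIRED_VARS = ["HADOOP_HOME", "SPARK_HOME"]

def missing_vars(env=os.environ):
    """Return the required variables that are not set in `env`."""
    return [name for name in REQUIRED_VARS if name not in env]

# Example with a fake environment resembling a partially configured shell
fake_env = {"HADOOP_HOME": "/usr/hdp/current/hadoop-client"}
print(missing_vars(fake_env))  # -> ['SPARK_HOME']
```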
Use Hadoop dfs command
The `dfs` command executes the file system shell. The file system shell provides various shell-like commands that directly interact with HDFS as well as the other file systems Hadoop supports, such as the local FS, WebHDFS, and S3 FS. `dfs` can be executed in three different formats: `hdfs dfs`, `hadoop fs`, and `hadoop dfs`. The following describes how to execute a file system job.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ hadoop fs -ls
Found 30 items
…
-rw-r--r-- 3 test01 services 215 2021-04-09 11:35 df.test01.keytab
drwx------ - test01 services 0 2021-05-11 12:21 grafana
drwx------ - test01 services 0 2021-05-07 14:55 hue
…
For more information about file system shells, see here.
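When post-processing a listing like the one above in a script, the fixed column layout of `hadoop fs -ls` can be split positionally. A hedged Python sketch, assuming the 8-column format shown above (the parser is illustrative, not a Data Forest utility):

```python
def parse_ls_line(line: str) -> dict:
    """Parse one `hadoop fs -ls` entry into a dict.

    Assumes the 8-column layout shown above: permissions, replication,
    owner, group, size, date, time, path. The path may contain spaces.
    """
    perms, repl, owner, group, size, date, time, path = line.split(None, 7)
    return {
        "permissions": perms,
        # replication is "-" for directories
        "replication": None if repl == "-" else int(repl),
        "owner": owner,
        "group": group,
        "size": int(size),
        "modified": f"{date} {time}",
        "path": path,
    }

entry = parse_ls_line("-rw-r--r--   3 test01 services 215 2021-04-09 11:35 df.test01.keytab")
print(entry["path"])  # -> df.test01.keytab
```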
Submit Spark job using spark-shell
Since the Dev app already has the client settings for Data Forest configured, you can run a REPL such as spark-shell or PySpark right away.
The following describes how to submit a Spark job to a cluster using spark-shell.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ spark-shell
Warning: Ignoring non-spark config property: history.server.spnego.keytab.file=/etc/security/keytabs/spnego.service.keytab
Warning: Ignoring non-spark config property: history.server.spnego.kerberos.principal=HTTP/_HOST@KR.DF.NAVERNCP.COM
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://shell-0.dev.test01.kr.df.naverncp.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1619078733441_0566).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.2.3.1.0.0-78
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val rdd1 = sc.textFile("file:///usr/hdp/current/spark2-client/README.md")
rdd1: org.apache.spark.rdd.RDD[String] = file:///usr/hdp/current/spark2-client/README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> val rdd2 = rdd1.flatMap(_.split(" "))
rdd2: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[2] at flatMap at <console>:25
scala> val rdd3= rdd2.map((_, 1))
rdd3: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[3] at map at <console>:25
scala> val rdd4 = rdd3.reduceByKey(_+_)
rdd4: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:25
scala> rdd4.take(10)
res0: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/en/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (page](http://spark.apache.org/documentation.html).,1), (cluster.,1), ([run,1), (its,1), (YARN,,1))
scala> rdd4.saveAsTextFile("hdfs://koya/user/test01/result")
...
org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1478)
... 49 elided
The following describes how to check the result using the Hadoop command.
[test01@shell-0.dev.test01.kr.df.naverncp.com ~][df]$ hadoop fs -ls /user/test01/result
Found 3 items
-rw------- 3 test01 services 0 2021-04-21 14:06 /user/test01/result/_SUCCESS
-rw------- 3 test01 services 886 2021-04-21 14:06 /user/test01/result/part-00000.gz
-rw------- 3 test01 services 888 2021-04-21 14:06 /user/test01/result/part-00001.gz
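The RDD chain in the spark-shell session above is a standard word count: split each line into words, map each word to 1, and sum by key. The same logic in plain Python (no Spark needed), shown here only to clarify what each RDD step computes:

```python
from collections import Counter

lines = ["to be or", "not to be"]

# flatMap(_.split(" "))  -> one flat stream of words
words = [w for line in lines for w in line.split(" ")]

# map((_, 1)) followed by reduceByKey(_+_)  -> per-word counts
counts = Counter(words)

print(counts["to"])  # -> 2
```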
Access HiveServer2
The following describes how to access HS2.
You can access common HS2 and individual HS2 through the following command format:
$ beeline -u {JDBC connection string} -n {username} -p {password}
- For common HS2, enter the JDBC connection string that matches the HiveServer2 (Batch)/(Interactive) type shown in Quick links of App details > View access details.
- For individual HS2, enter the connection string by referring to Quick links > Connection String in the HS2 app's details.
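A wrapper script can assemble the beeline invocation by substituting the three placeholders above. A hedged Python sketch; the helper and the sample connection string are illustrative, and the real string must come from the app's quick links:

```python
def beeline_command(jdbc_url: str, username: str, password: str) -> list:
    """Return the argument list for the format shown above:
    beeline -u {JDBC connection string} -n {username} -p {password}
    """
    return ["beeline", "-u", jdbc_url, "-n", username, "-p", password]

# Hypothetical connection string; use the one from your app's quick links
cmd = beeline_command("jdbc:hive2://example-hs2:10000/default", "test01", "secret")
print(" ".join(cmd))
```

Building an argument list (rather than one shell string) avoids quoting problems when the password contains special characters.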
Configure client environment for apps
The following describes how to configure the client environment for an app.
- Create a directory called secure-hbase.
- Enter `sh /home/forest/get-app-env.sh {user's hbase app name} {directory name}` as in the following example:

$ mkdir secure-hbase
$ sh /home/forest/get-app-env.sh hbase ~/secure-hbase
[/home/forest/get-app-env.sh] Apptype: HBASE-2.2.3
[/home/forest/get-app-env.sh] Download install-client script for HBASE-2.2.3
[/home/forest/get-app-env.sh] Install client on /data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase
current hbase: .yarn/services/hbase/components/v1
--2021-05-20 14:37:51--  http://dist.kr.df.naverncp.com/repos/release/hbase/hbase-2.2.3-client-bin.tar.gz
Resolving dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)... 10.213.208.69
Connecting to dist.kr.df.naverncp.com (dist.kr.df.naverncp.com)|10.213.208.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 233293221 (222M) [application/octet-stream]
Saving to: '/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client-bin.tar.gz'
100%[=============================================================================================>] 233,293,221 390MB/s in 0.6s
2021-05-20 14:37:51 (390 MB/s) - '/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client-bin.tar.gz' saved [233293221/233293221]
HBase-2.2.3 Client has been installed on /data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client
==============================================================================================
export HBASE_HOME=/data10/hadoop/yarn/local/usercache/test01/appcache/application_1619078733441_0563/container_e84_1619078733441_0563_01_000002/secure-hbase/hbase-2.2.3-client
$HBASE_HOME/bin/hbase shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.3, rUnknown, Wed Jan 29 22:11:21 KST 2020
Took 0.0025 seconds
hbase(main):001:0>
hbase(main):002:0* version
2.2.3, rUnknown, Wed Jan 29 22:11:21 KST 2020
Took 0.0007 seconds
hbase(main):003:0> status
1 active master, 0 backup masters, 3 servers, 0 dead, 0.6667 average load
Took 0.5934 seconds