Using Spark History Server
Available in VPC
You can create a personal Spark History Server with the Spark History Server app so that you can view only the jobs you have run. Data Forest supports the SPARK-HISTORYSERVER-3.1.2 app type.
Check Spark History Server app details
When the app creation is completed, you can view its details. When the Status under the app's details is Stable, the app is running normally.
The following describes how to check the app details.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
- Click the Data Forest > Apps menu on the left.
- Select the account that owns the app.
- Click the app whose details you want to view.
- View the app details.
- Quick links
- AppMaster: URL where the container log can be viewed. When created, all apps are submitted to the YARN queue, and YARN provides a web UI where each app's details can be viewed.
- Spark History REST API: REST API provided by Spark History Server
- Spark History UI: URL that can access Spark history web UI
- shell-shs-0: Web Shell URL for the container where Spark History Server is installed. Log in with your account name and password.
- supervisor-shs-0: Web Shell URL for the container where the Supervisor is installed. Log in with your account name and password.
- Component: the SPARK-HISTORYSERVER-3.1.2 type consists of one shs component.
- shs: It requests 1 core and 4 GB memory by default to run.
Access Spark History Server
The following is the screen displayed when you access the Spark History UI URL from the Quick links:
Spark History Server provides a REST API.
Access the Spark History REST API URL from the Quick links in the app details.
The following is the screen after connection.
If you are using the REST API through the web shell, you can call the Spark History REST API address identified above as follows. The following is an example for the dataforest-test user.
$ curl -i -u dataforest-test https://dataforest-test--sparkhs-new--shs--18080.proxy.kr.df.naverncp.com/api/v1/version
Enter host password for user 'dataforest-test':
HTTP/1.1 200 OK
Server: nginx/1.14.0
Date: Fri, 14 Oct 2022 08:14:24 GMT
Content-Type: application/json
Content-Length: 25
Connection: keep-alive
Set-Cookie: hadoop.auth="u=dataforest-test&p=dataforest-test&t=authnz-ldap&e=1665771263843&s=v37ewQQe7TSTjntpg5rqUfZsRrRuCvfQux0P2onFy7I="; HttpOnly
Cache-Control: no-cache, no-store, must-revalidate
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Vary: Accept-Encoding, User-Agent
{
"spark" : "3.1.2-1"
}
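Beyond the version endpoint, the Spark monitoring REST API serves its resources under the `/api/v1/` path shown above. The following is a minimal Python sketch of working with that API; the base URL is the dataforest-test example from above, and the helper function is hypothetical (replace the URL with the Spark History REST API address from your app's Quick links):

```python
import json
from urllib.parse import urljoin

# Base URL of the personal Spark History REST API (dataforest-test example value;
# replace with the Spark History REST API URL shown in your app's Quick links).
BASE_URL = "https://dataforest-test--sparkhs-new--shs--18080.proxy.kr.df.naverncp.com/api/v1/"

def endpoint(path: str) -> str:
    """Build a full REST API URL for a given endpoint path (hypothetical helper)."""
    return urljoin(BASE_URL, path.lstrip("/"))

# The /version endpoint returns a small JSON document like the response shown above.
sample_response = '{ "spark" : "3.1.2-1" }'
version = json.loads(sample_response)["spark"]

print(endpoint("version"))
print(version)  # → 3.1.2-1
```

In practice you would fetch the URL with your account name and password (HTTP basic authentication), exactly as the `curl -u` example above does.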
Set Spark jobs
To use the personal Spark History Server, configure the following Spark job settings.
- spark.eventLog.enabled: true
- spark.eventLog.dir: equivalent to the spark.history.fs.logDirectory setting in the Spark History Server app. The default value is hdfs://koya/user/{USER}/spark2-history/; replace {USER} with your account name.
- spark.yarn.historyServer.address: the history server's address. After creating the app, enter the Spark History UI URL from Quick links.
The following is an example for the dataforest-test user:
| Property Name | Value |
|---|---|
| spark.eventLog.enabled | true |
| spark.eventLog.dir | hdfs://koya/user/dataforest-test/spark2-history/ |
| spark.yarn.historyServer.address | https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com |
If you submit a job after changing the settings, you can view the submitted job information on your personal Spark History Server.
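The same three properties can also be passed per job on the spark-submit command line instead of a config file. The following is a small sketch that renders them as `--conf` flags; the values are the dataforest-test example from the table above, and `my_job.py` is a hypothetical script name:

```python
# The three history-server properties from the table above
# (dataforest-test example values).
conf = {
    "spark.eventLog.enabled": "true",
    "spark.eventLog.dir": "hdfs://koya/user/dataforest-test/spark2-history/",
    "spark.yarn.historyServer.address":
        "https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com",
}

# Render each property as a spark-submit --conf flag.
flags = " ".join(f"--conf {key}={value}" for key, value in conf.items())
print(f"spark-submit {flags} my_job.py")
```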
Changing the personal Spark settings
Below is how to add settings for your personal Spark job:
$ vi $SPARK_CONF_DIR/spark-defaults.conf
...
spark.eventLog.dir hdfs://koya/user/dataforest-test/spark2-history/
spark.eventLog.enabled true
spark.yarn.historyServer.address {Spark History UI}
...
If you are using a web shell, edit the previously deployed configuration file as follows.
$ cd ~/conf
$ vi spark-defaults.conf # Configuration change
Run PySpark and spark-shell
Here's how to run PySpark and spark-shell.
- When executing PySpark, add the following options to run.
$ pyspark --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
--conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro
- Use the command below to execute spark-shell as well.
$ spark-shell --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
--conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
--conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
--conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
--conf spark.kerberos.access.hadoopFileSystems=hdfs://<Specify the name node to be used>
When using Zeppelin
You can use the personal Spark History Server app in Apache Zeppelin's Spark interpreter. To add history server-related settings, refer to how spark.yarn.queue is configured in Using Zeppelin > Interpreter settings.
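For reference, the properties to add to the Spark interpreter are the same three from the job settings section above; a sketch with the dataforest-test example values (verify the property names against your interpreter settings page):

```
spark.eventLog.enabled            true
spark.eventLog.dir                hdfs://koya/user/dataforest-test/spark2-history/
spark.yarn.historyServer.address  https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com
```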