Using Spark History Server
    Available in VPC

    You can create a personal Spark History Server with the Spark History Server app, so that only the jobs you have run are visible to you. Data Forest supports the SPARK-HISTORYSERVER-3.1.2 app type.

    Check Spark History Server app details

    Once the app has been created, you can view its details. If the Status shown in the app's details is Stable, the app is running normally.
    The following describes how to check the app details.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
    2. Click the Data Forest > Apps menu on the left.
    3. Select the account that owns the app.
    4. Click the app whose details you want to view.
    5. View the app details.
      • Quick links
        • AppMaster: URL where the container logs can be viewed. When apps are created, they are all submitted to a YARN queue, and YARN provides a web UI where each app's details can be viewed.
        • Spark History REST API: REST API endpoint provided by Spark History Server.
        • Spark History UI: URL for accessing the Spark History web UI.
        • shell-shs-0: Web Shell URL for the container where Spark History Server is installed. Log in with your account name and password.
        • supervisor-shs-0: Web Shell URL for the container where the Supervisor is installed. Log in with your account name and password.
      • Component: the SPARK-HISTORYSERVER-3.1.2 type consists of one shs component.
        • shs: requests 1 core and 4 GB of memory by default.

    Access Spark History Server

    The following screen appears when you access the Spark History UI URL from the Quick links:


    Spark History Server provides a REST API.
    Access the Spark History REST API URL from the Quick links in the app details.
    The following is the screen after connection.


    If you are using the REST API from the Web Shell, you can call the Spark History REST API address identified above as follows. The following example uses the dataforest-test user.

    $ curl -i -u dataforest-test https://dataforest-test--sparkhs-new--shs--18080.proxy.kr.df.naverncp.com/api/v1/version
    
    Enter host password for user 'dataforest-test':
    HTTP/1.1 200 OK
    Server: nginx/1.14.0
    Date: Fri, 14 Oct 2022 08:14:24 GMT
    Content-Type: application/json
    Content-Length: 25
    Connection: keep-alive
    Set-Cookie: hadoop.auth="u=dataforest-test&p=dataforest-test&t=authnz-ldap&e=1665771263843&s=v37ewQQe7TSTjntpg5rqUfZsRrRuCvfQux0P2onFy7I="; HttpOnly
    Cache-Control: no-cache, no-store, must-revalidate
    X-Frame-Options: SAMEORIGIN
    X-XSS-Protection: 1; mode=block
    X-Content-Type-Options: nosniff
    Vary: Accept-Encoding, User-Agent
    
    {
      "spark" : "3.1.2-1"
    }
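
    The version endpoint shown above is only one of the API's resources; Spark's monitoring REST API also exposes application and job listings under the same base path. A hedged sketch, using the example user's address from above (substitute your own Spark History REST API URL from the Quick links):

    ```shell
    # Base URL taken from the app's Quick links (example user's address;
    # substitute your own Spark History REST API URL).
    BASE="https://dataforest-test--sparkhs-new--shs--18080.proxy.kr.df.naverncp.com/api/v1"

    # On the cluster, list completed applications (curl prompts for the password):
    #   curl -u dataforest-test "$BASE/applications?status=completed"
    # Then fetch the jobs of one application by the ID returned above:
    #   curl -u dataforest-test "$BASE/applications/<application-id>/jobs"
    echo "$BASE/applications?status=completed"
    ```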
    

    Set Spark jobs

    To use the personal Spark History Server, enter the following in Spark to complete the job settings.

    • spark.eventLog.enabled: true
    • spark.eventLog.dir: must match the spark.history.fs.logDirectory setting of the Spark History Server app. The default value is hdfs://koya/user/{USER}/spark2-history/; replace {USER} with your account name.
    • spark.yarn.historyServer.address: the history server's address. After creating the app, enter the Spark History UI URL from the Quick links.

    The following is an example for the dataforest-test user:

    Property Name                        Info
    spark.eventLog.enabled               true
    spark.eventLog.dir                   hdfs://koya/user/dataforest-test/spark2-history/
    spark.yarn.historyServer.address     https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com

    If you submit a job after changing the settings, you can view the submitted job information on your personal Spark History Server.
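
    The same three settings can also be passed on the command line when submitting, instead of editing spark-defaults.conf. A minimal sketch using the dataforest-test example values (substitute your own account name and Spark History UI URL; your_job.py is a hypothetical job script):

    ```shell
    # History-server settings as spark-submit arguments (example user's values).
    CONF=(
      --conf spark.eventLog.enabled=true
      --conf spark.eventLog.dir=hdfs://koya/user/dataforest-test/spark2-history/
      --conf spark.yarn.historyServer.address=https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com
    )
    # On the cluster you would run: spark-submit "${CONF[@]}" your_job.py
    printf '%s\n' "${CONF[@]}"
    ```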

    Changing the personal Spark settings

    Below is how to add settings for your personal Spark job:

    $ vi $SPARK_CONF_DIR/spark-defaults.conf
    ...
    spark.eventLog.dir hdfs://koya/user/dataforest-test/spark2-history/
    spark.eventLog.enabled true
    spark.yarn.historyServer.address {Spark History UI}
    ...
    
    Note

    If you are using the Web Shell, edit the settings file that was distributed in advance, as shown below.

      $ cd ~/conf
      $ vi spark-defaults.conf # Configuration change
    

    Execute PySpark, spark-shell

    Here's how to run Pyspark and spark-shell.

    1. When executing PySpark, add the following options:
    $ pyspark --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
    --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro
    
    2. Use the command below to execute spark-shell as well.
    $ spark-shell --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
    --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
    --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
    --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_MOUNTS=/usr/hdp:/usr/hdp:ro \
    --conf spark.kerberos.access.hadoopFileSystems=hdfs://<Specify the name node to be used>
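
    After a job submitted this way finishes, you can confirm that an event log was written before opening the UI. A minimal sketch from the Web Shell, assuming the dataforest-test example user's default log directory:

    ```shell
    # Default event-log directory for the example user; substitute your account name.
    LOG_DIR="hdfs://koya/user/dataforest-test/spark2-history/"

    # On the cluster, list the logs; each completed application appears as one
    # file named after its YARN application ID:
    #   hdfs dfs -ls "$LOG_DIR"
    echo "$LOG_DIR"
    ```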
    

    When using Zeppelin

    You can use the personal Spark History Server app with Apache Zeppelin's Spark interpreter. Refer to the spark.yarn.queue configuration method in Using Zeppelin > Interpreter settings, and add the history server-related settings in the same way.
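
    As a sketch, the interpreter properties to add would be the same three job settings, here with the dataforest-test example values (substitute your own account name and Spark History UI URL):

    ```
    spark.eventLog.enabled            true
    spark.eventLog.dir                hdfs://koya/user/dataforest-test/spark2-history/
    spark.yarn.historyServer.address  https://dataforest-test--spark-historyserver--shs--18080.proxy.kr.df.naverncp.com
    ```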

