Using Zeppelin

Available in VPC

The ZEPPELIN-0.10.1 app supports Apache Zeppelin. Zeppelin is a data visualization tool that facilitates data analysis. Each user can run their own Zeppelin instance.

Check Zeppelin app details

Once the app is created, you can view its details. If the Status in the app details is Stable, the app is running normally.

To view app details:

In the VPC environment on the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Data Forest.
Click Data Forest > Apps on the left.
Select an account.
Click the app to view its details.
Review the app details.
- Quick links
  - shell: The web shell gives you access to the Docker environment where Zeppelin is running, so you can inspect its internals and modify environment settings as needed. Log in using the account name and password used to create the app.
  - supervisor: URL for managing Zeppelin
  - zeppelin: Log in using the account name and password used to create the app.
- Component: The ZEPPELIN-0.10.1 type consists of a single zeppelin component.
  - zeppelin: The default values are the recommended resources. By default, it requests 1 core and 12 GB of memory at startup.

Example:

The following shows the shell access interface.
df-zeppelin_5_vpc_ko

The following shows the Zeppelin access interface.
df-zeppelin_06_vpc_ko

Note

If you need to adjust detailed settings when running tasks, see Interpreters in Apache Zeppelin.

Configure Interpreter

Spark

Spark 3.0.1 is set as the default interpreter, so you can create a notebook and start using it right away. By default, tasks run in Zeppelin are submitted to the queue assigned for Dev. To run a task in another queue, search for Spark in Interpreters, click [edit], and then add the spark.yarn.queue property under Properties.
df-zeppelin_07_vpc_ko(1)
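
As a minimal sketch of the property described above, the entry added under Properties is a name and value pair like the following. The queue name my_queue is a hypothetical placeholder and not a value from this guide; replace it with a queue your account is authorized to use.

```
# Interpreter property for the spark interpreter (name, then value)
# my_queue is a hypothetical queue name used only for illustration
spark.yarn.queue    my_queue
```

After saving the change, Zeppelin normally prompts you to restart the interpreter so that the new value applies to subsequent tasks.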

Note

If you submit a task to a queue that you are not authorized to use, it may fail.

Note

To use the existing Spark2 version, select "spark248" in Default Interpreter when creating a notebook.

JDBC

To use Hive, enter %jdbc(hive) at the top of the paragraph.

Note

For more information on Hive rules and permissions, see Using private Hive.

The following example shows how to query the test02__db_test database.

df-zeppelin_08_vpc_ko

%jdbc(hive)
use test02__db_test;
show tables;
select * from test;

Notebook backup

In the Zeppelin app, notebooks and some settings are backed up together, allowing notebooks and settings to stay synchronized even if the host running Zeppelin changes. The backup is performed every 10 minutes.

To perform a backup manually, access the web shell and run backup.sh. The notebooks and settings are then backed up immediately.
Access the Zeppelin container to view the backup log in the hdfs://koya/user/${USER}/zeppelin/${SERVICE_NAME}/backup directory.
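
The path above contains two placeholders, ${USER} and ${SERVICE_NAME}. The following is a minimal sketch of how the full path could be assembled in the web shell. The helper name backup_dir and the sample values example-user and my-zeppelin are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build the HDFS backup directory for a Zeppelin app.
# $1 = account name (USER), $2 = app name (SERVICE_NAME)
backup_dir() {
    printf 'hdfs://koya/user/%s/zeppelin/%s/backup\n' "$1" "$2"
}

# Print the path for sample (hypothetical) values.
backup_dir "example-user" "my-zeppelin"
```

Assuming the hdfs client is available in the Zeppelin container, the resulting path could then be passed to a command such as hdfs dfs -ls to list the contents of the directory.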