Using Zeppelin

Available in VPC

Zeppelin Notebook comes preinstalled on Cloud Hadoop in NAVER Cloud Platform.
This guide explains how to access the Zeppelin Notebook UI and how to run a simple example.
For more information about Zeppelin, see the official Apache Zeppelin homepage.

Access Zeppelin Notebook UI

You can access the Zeppelin Notebook UI in any of the following ways:

Connect via the console's web UI list

You can access the Zeppelin Notebook UI through [View by application] on the Cloud Hadoop console. For more information, see View by application.

Direct access via web browser

Open a web browser and enter the following URL in the address bar. Use the domain address assigned to the cluster.

https://{domain address}:9996

Access via Ambari Web UI

To access via the Ambari Web UI:

Access the Ambari UI.
- For more information about accessing Ambari UI, see Ambari UI guide.
In the Ambari UI, navigate to Zeppelin Notebook > Quick Links > Zeppelin UI.
- For details on accessing the Zeppelin UI, see the guide on Accessing Web UI using tunneling.
Once the login page is displayed in the browser, log in with the admin account and password set when the cluster was created.
- If the connection succeeds, a green dot appears next to [Login] at the top right of the Zeppelin page.

Getting started

You can create a Zeppelin notebook, enter data, and view the results as graphs.
This guide is based on the Zeppelin Tutorial (Basic Features) notebook provided with Zeppelin Notebook.

Create Notebook

To create a Notebook:

Click [Notebook] > Create new note at the top of the Zeppelin page.
Enter the note name and the other note information, and then click [Create].
- The Default Interpreter can be changed after creating a note.

Load data to table

The following sample code downloads the bank data (bank-full.csv) and loads it into the bank table.

%spark.spark
import org.apache.commons.io.IOUtils
import java.net.URL
import java.nio.charset.Charset

// Zeppelin creates and injects sc (SparkContext) and sqlContext (HiveContext or SqlContext),
// so you don't need to create them manually.

// Download the bank data and split it into lines
val bankText = sc.parallelize(
  IOUtils.toString(
    new URL("https://raw.githubusercontent.com/selva86/datasets/refs/heads/master/bank-full.csv"),
    Charset.forName("utf8")).split("\n"))

// Only the first four columns of each row are used
case class Bank(age: Integer, job: String, marital: String, education: String)

// Split each line on ";", drop the header row, and strip the quotes from the string fields
val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", "")
  )
).toDF()

// Register the DataFrame as a temporary table named "bank"
// (registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView is its replacement)
bank.registerTempTable("bank")
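
To see what the parsing step in the sample does to each line, the following minimal sketch applies the same split, header filter, and quote stripping to a small in-memory list instead of an RDD. The rows in sampleLines are hypothetical values written in the same semicolon-separated shape as the CSV file; they are an assumption for illustration and are not read from the actual data set.

```scala
// Same case class as in the sample: only the first four columns are kept.
case class Bank(age: Integer, job: String, marital: String, education: String)

// Hypothetical rows in the same shape as the CSV file (header + two records).
val sampleLines = Seq(
  "\"age\";\"job\";\"marital\";\"education\"",
  "58;\"management\";\"married\";\"tertiary\"",
  "44;\"technician\";\"single\";\"secondary\""
)

val parsed = sampleLines
  .map(line => line.split(";"))              // split each line into fields
  .filter(fields => fields(0) != "\"age\"")  // drop the header row
  .map(fields => Bank(
    fields(0).toInt,                         // age is numeric and unquoted
    fields(1).replaceAll("\"", ""),          // strip the surrounding quotes
    fields(2).replaceAll("\"", ""),
    fields(3).replaceAll("\"", "")))

parsed.foreach(println)
```

Because this sketch uses only the Scala standard library, it can be tried in any Scala REPL as well as in a Zeppelin Spark paragraph.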

Run code and view results

To run code in a Zeppelin notebook and check the result:

Run the code by pressing [Shift] + [Enter] or clicking the run button of the paragraph.
- The code has run successfully if the paragraph shows the FINISHED status together with the elapsed time (for example, Took 4 sec).
In a new paragraph, write a Spark SQL query against the table, and then press [Shift] + [Enter] or click the run button to run it.
- The query result is displayed on the screen. You can use the graph buttons to view the SQL result as various types of graph.
```
%spark.sql
select age, count(1) value
from bank
where age < 30
group by age
order by age
```
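
The same result can also be produced from a Scala paragraph with the DataFrame API. The following is a minimal sketch, assuming the bank table from the earlier step is already registered and that the paragraph runs under the Spark interpreter, where Zeppelin injects sqlContext. The variable name youngByAge is introduced here for illustration only.

```scala
// Run this in a paragraph bound to the Spark interpreter (for example, one that starts with %spark.spark).
// Assumes the "bank" temporary table has already been registered in the same Spark session.
import org.apache.spark.sql.functions.{col, count, lit}

val youngByAge = sqlContext.table("bank")
  .filter(col("age") < 30)          // where age < 30
  .groupBy("age")                   // group by age
  .agg(count(lit(1)).as("value"))   // count(1) value
  .orderBy("age")                   // order by age

// Print the aggregated rows as a text table in the paragraph output
youngByAge.show()
```

The SQL paragraph is usually the more convenient choice when you want Zeppelin's built-in charts; the DataFrame form is useful when the result feeds further Scala code.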

Back up Zeppelin Notebook

Zeppelin notebooks are stored on Server 1 of the cluster's master node. Therefore, if you delete a cluster, its notebooks are also deleted.
To use the same notebook in another cluster, you must export the notebook after you finish your work.

To back up a Zeppelin notebook:

Click the export button at the upper left of the notebook interface.
Specify the file name and path on your local PC, and then save the file.
- The exported file is saved in JSON format.

Note

Zeppelin notebooks are backed up one notebook at a time; each export contains a single notebook.

Note

To install the JDBC interpreter on Zeppelin, you must first edit the jdbc version in the /etc/zeppelin/conf/interpreter-list file on the edge server.

Before change: jdbc org.apache.zeppelin:zeppelin-jdbc:0.11.0-SNAPSHOT Jdbc interpreter
After change: jdbc org.apache.zeppelin:zeppelin-jdbc:0.10.1 Jdbc interpreter

For more information, check the library dependency details (version, library name, and so on) of the JDBC interpreter in Maven.