Using Notebooks
Available in VPC
Cloud Hadoop Notebook provides a serverless Jupyter Notebook environment where you can run the queries and code needed to analyze your data.
You can create and delete notebook nodes through the Notebooks console.
You can proceed with data analysis by accessing the JupyterLab and Jupyter Notebook web pages of the created notebook node.
Queries and code used in notebooks are executed through the kernel of the Cloud Hadoop cluster, and stored as notebook files in Object Storage for flexible reuse.
Notebook screen
The basics of using the Notebook service are as follows.
| Area | Description |
|---|---|
| ① Create notebook | Create a new notebook |
| ② Delete | Delete the notebook in use |
| ③ Open in JupyterLab | Access the JupyterLab web UI |
| ④ Open in Jupyter | Access the Jupyter web UI |
| ⑤ Notebook list | View the list of created notebooks and their details |
Prerequisite
- Create Object Storage
Before creating a Cloud Hadoop cluster, you must first create an Object Storage bucket for storing and retrieving data.
  - For more information on creating Object Storage, see the Object Storage Guide.
- Create a Cloud Hadoop cluster
A Cloud Hadoop cluster to link with the notebook node must have been created.
  - For more information on creating a Cloud Hadoop cluster, see Getting Started with Cloud Hadoop.
- Node type selection
Choose your node type in advance considering your expected usage.
Create notebook
The following describes how to create a notebook.
You can create multiple notebook nodes in a single Cloud Hadoop cluster, and each notebook node can be linked with the Cloud Hadoop cluster.
- Connect to the NAVER Cloud Platform console.
- Click VPC from the Platform menu to switch to the VPC environment.
- Click the Services > Big Data & Analytics > Cloud Hadoop menus in that order in the NAVER Cloud Platform console.
- Click the Notebooks menu.
- Click the [Create notebook] button.
- When the Create notebook page appears, proceed with the following steps in order.
1. Set notebook
Specify the notebook settings information, and then click the [Next] button.
- Notebook name: enter the name of the notebook node.
- Notebook version: select the notebook version. Currently, only the 1.0 version is provided for notebooks.
- Notebook component: you can check the component information by version.
- Cluster: select the Cloud Hadoop cluster you want the notebook to work with. Notebooks can only be linked with Cloud Hadoop version 1.8 or later.
- ACG settings: Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. If you want to set up a network ACL, you can edit the rules by selecting the automatically created ACG when creating the notebook. For more information on ACG settings, see the Set firewall (ACG) guide.
2. Set storage and server
Specify the storage and node server settings information, and then click the [Next] button.
- Object Storage bucket: select the Object Storage bucket created in the prerequisites. The notebook reads and writes data in this bucket.
- Notebook node subnet: select the subnet where the notebook node will be located.
- If you create a notebook node in a public subnet, web access based on the node's public IP is available.
- If you create a notebook node in a private subnet, web access is available only via SSL VPN.
- Notebook node server type: select the server type to be used for the notebook node. The server type can't be changed after the notebook node is created. For the specifications of servers that can be used as notebook nodes, see Supported server specifications by cluster node.
- Number of notebook nodes: the number of notebook nodes is fixed at 1.
- Add notebook node storage: you can attach and use separate block storage.
- Notebook node storage type: select the storage type. You can select either SSD or HDD. The storage type can't be changed after the notebook node is created.
- Notebook node storage capacity: select the storage capacity. You can select from 100 GB to 6 TB, and specify the capacity in 10 GB increments.
- Pricing plan: the pricing plan selected at account creation is applied. For more information on fees, see pricing information.
3. Set authentication key
To directly access the notebook node via SSH, you need to set the authentication key (.pem).
When creating a notebook, select an authentication key that you have or create a new one, and then click the [Next] button.
- To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.
The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.
4. Final confirmation
After checking the request details, click the [Create] button.
- Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. If you want to set up a network ACL, you can edit the rules by selecting the automatically created ACG. For more information on ACG settings, see the Set firewall (ACG) guide.
- It takes about 5 to 10 minutes for a notebook to be created. Once the notebook is created and starts running, Running is displayed in the Status column of the notebook list.
Access notebook
Access notebook web page
Click the [Open in JupyterLab] button or the [Open in Jupyter] button in the Cloud Hadoop (Notebooks) console to access the web page of JupyterLab or Jupyter Notebook installed on the notebook node.
- Add allowed port 8889 for JupyterLab and allowed port 8888 for the Jupyter web page to the Cloud Hadoop Notebook ACG.
- For notebook nodes created in a public subnet, web access is possible based on the public IP.
- For notebook nodes created in a private subnet, web access is possible only when connected through SSL VPN.
For detailed descriptions of setting up SSL VPN and ACG, see the guide on setting UI access and password by service.
Directly access the notebook node via SSH
When creating a notebook, you can directly access the notebook node via SSH using the authentication key set in the authentication key setting step. For more information, see the Access cluster node via SSH guide.
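For illustration only, the following is a minimal Python sketch using the third-party paramiko library to open an SSH session with the saved authentication key; the host IP, user name, and key path are placeholder values, not defaults of this service.

```python
# Minimal sketch using the third-party paramiko library (pip install paramiko).
# The host, user name, and key path below are placeholders; replace them with your own values.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept the node's host key on first connect
client.connect(
    hostname="203.0.113.10",          # IP of the notebook node (placeholder)
    username="sshuser",               # SSH account name (placeholder)
    key_filename="/path/to/key.pem",  # authentication key (.pem) saved when creating the notebook
)
_, stdout, _ = client.exec_command("hostname")
print(stdout.read().decode())
client.close()
```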
Use notebook
You can use the notebook by integrating it in various ways.
- Integrating Object Storage data in notebooks
- Integrating Spark on Cloud Hadoop cluster from your notebook
- Learn TensorFlow MNIST in your notebook
Prerequisite
ACG permissions between the Cloud Hadoop cluster and the notebook node to be linked are required. Configure the ACG as follows so that the ACG of the Cloud Hadoop cluster to be linked allows the ACG of the notebook node.
- In the Default ACG of the Cloud Hadoop cluster you want to integrate with, click the [Inbound] tab. Add the ACG of the notebook node as the access source, add all ports (1 - 65535) as the allowed ports, and then click the [Apply] button.
We recommend creating the Cloud Hadoop cluster and the notebook node in the same subnet so that they can communicate within the same VPC.
Integrating Object Storage data in notebooks
Connect to the Object Storage bucket by accessing the Jupyter Notebook Web UI and entering your Object Storage information.
- You can check the Access key ID and Secret key information for Object Storage on NAVER Cloud Platform portal's My Page > [Authentication Key Management]. For more information, see Getting started with Object Storage.
<Python3 example code>
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import required boto3 module
import boto3

# Enter Object Storage information
service_name = 's3'
endpoint_url = 'https://kr.object.private.ncloudstorage.com'
region_name = 'kr-standard'
access_key = "user's access key"
secret_key = "user's secret key"

# Integrating Object Storage using boto3
if __name__ == "__main__":
    s3 = boto3.client(service_name,
                      endpoint_url=endpoint_url,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.upload_file("my_model.h5", "best", "model/my_model.h5")
```
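Continuing with the s3 client created above, the same connection can read data back from the bucket; this is a sketch using the bucket and object names from the upload example (the downloaded file name is illustrative).

```python
# Download the uploaded object back to the notebook node (local file name is illustrative)
s3.download_file("best", "model/my_model.h5", "my_model_downloaded.h5")

# List the objects stored under the model/ prefix of the bucket
response = s3.list_objects_v2(Bucket="best", Prefix="model/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```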
Integrating Spark on Cloud Hadoop cluster from your notebook
After connecting to the Jupyter Notebook web UI, you can use PySpark to integrate with the Cloud Hadoop cluster.
<PySpark example code>
Proceed by selecting PySpark as the kernel configuration in the notebook.
```python
import os
import pyspark
import socket
from pyspark.sql import SQLContext, SparkSession

sc = SparkSession \
    .builder \
    .appName("SparkFromJupyter") \
    .getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

print("Spark Version: " + sc.version)
print("PySpark Version: " + pyspark.__version__)

df = sqlContext.createDataFrame(
    [(1, 'foo'), (2, 'bar')],  # records
    ['col1', 'col2']           # column names
)
df.show()
print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
```
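If the cluster's Hadoop S3A connector is already configured for Object Storage (fs.s3a endpoint and credentials, which this guide does not cover), data in a bucket can also be loaded straight into a DataFrame. Here sc is the SparkSession created in the example above; the bucket and file names below are hypothetical.

```python
# Hypothetical sketch: assumes fs.s3a.* is already configured for Object Storage on the cluster
csv_df = sc.read.csv("s3a://best/data/sample.csv", header=True, inferSchema=True)
csv_df.printSchema()
csv_df.show(5)
```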
Learn TensorFlow MNIST in your notebook
After accessing the Jupyter Notebook Web UI, you can perform TensorFlow MNIST Training using Python3.
<Python3 example code>
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the tensorflow library
# If you need to install TensorFlow:
# !pip install tensorflow==<version>
import tensorflow as tf

mnist = tf.keras.datasets.mnist

# Load the MNIST data set
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model by adding layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

# Save the trained model as an HDF5 file
model.save('my_model.h5')
```
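The saved my_model.h5 file is the same file uploaded to Object Storage in the boto3 example earlier. As a minimal check, the model can be reloaded and evaluated again in the same session:

```python
# Reload the saved model and confirm it evaluates the same as before
restored_model = tf.keras.models.load_model('my_model.h5')
restored_model.evaluate(x_test, y_test)
```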
Delete notebook
You can delete notebooks that you have finished using. The notebook files (files with the .ipynb extension) used in Jupyter Notebook are stored under the Object Storage bucket of the Cloud Hadoop cluster.
Even if you delete a notebook, the notebook files that were used are not deleted.
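As a quick way to confirm this, you can list the .ipynb objects remaining in the bucket with the same boto3 approach shown earlier; the bucket name below is illustrative.

```python
# List notebook files (.ipynb) remaining in the bucket after the notebook is deleted
import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://kr.object.private.ncloudstorage.com',
                  aws_access_key_id="user's access key",
                  aws_secret_access_key="user's secret key")
for obj in s3.list_objects_v2(Bucket='best').get('Contents', []):
    if obj['Key'].endswith('.ipynb'):
        print(obj['Key'])
```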