Create and manage notebook
The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in VPC
This page describes information about notebooks provided by Data Forest and how to create and manage one. Data Forest provides you with an easy and convenient data analysis environment through a notebook.
Preparations
Object Storage creation
Before creating a notebook, you must have created an Object Storage bucket for storing and retrieving data. For more details, see the Object Storage guide.

VPC, subnet creation

Create a VPC and subnet in Networking > VPC from the NAVER Cloud Platform console. For more details, see the VPC user guide. Regardless of the number of notebooks, at least 1 VPC is required; multiple notebooks can share the same VPC. In a private VPC environment, VPCs can be created only in the KR-2 Region, and only public subnets are supported when creating a notebook.

Select server specifications

Select the server specifications in advance, considering the expected usage.
If the Object Storage bucket is locked or has access restrictions set, issues may occur when integrating it with a notebook.
Create notebook
The following describes how to create a notebook.
Create an account that will own the notebook.
- For how to create accounts, see Create and manage account.
From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
Click the Notebooks > Create notebook button on the left.
Enter the details of the notebook you want to create on the notebook settings screen.
| Name | Description |
| --- | --- |
| Account name | Account to own the notebook |
| Notebook type | Select the notebook type to create. Only the JupyterLab type is provided as of now |
| Notebook name | Designate the name of the notebook to create. You can enter 3 to 15 characters, using only lowercase letters, numbers, and hyphens (-) |
| VPC | Select the VPC created in preparations |
| Subnet | Select the subnet for locating the notebook node. Currently, notebooks can be created only in public subnets with available IPs in the Korea (KR-2) Region (zone), and the web domain is accessed based on the public IP |
| Server specifications | Select a server type to use as the notebook node. The server type can't be changed after the notebook node is created |
| ACG | Data Forest Notebook ACGs are automatically created whenever you create a notebook |
| Additional storage | You can add separate Block Storage for use |
| Additional storage type | Select a storage type. You can choose between SSD and HDD. The storage type can't be changed after the notebook is created |
| Additional storage capacity | Enter the storage capacity. You can set from 100 GB to 6 TB, specified in 10 GB units |
| Object Storage buckets | Select the Object Storage bucket created in preparations |

When setting is complete, click the [Next] button.
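As an illustration only (not part of the console or any Data Forest API), the naming and storage-capacity rules above can be pre-checked on the client side before submitting the form. The function name and the 6 TB = 6000 GB conversion are assumptions:

```python
import re

def validate_notebook_settings(name: str, storage_gb: int) -> list:
    """Hypothetical pre-check mirroring the console rules described above."""
    errors = []
    # Notebook name: 3 to 15 characters; lowercase letters, numbers, hyphens only
    if not re.fullmatch(r'[a-z0-9-]{3,15}', name):
        errors.append('invalid notebook name')
    # Additional storage: 100 GB to 6 TB (assumed 6000 GB), in 10 GB units
    if not (100 <= storage_gb <= 6000 and storage_gb % 10 == 0):
        errors.append('invalid storage capacity')
    return errors

print(validate_notebook_settings('my-notebook', 100))  # []
print(validate_notebook_settings('My_Notebook', 6010))
```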
Enter the user settings values for the components of the notebook you want to create on the user settings screen.
| Component | Name | Description |
| --- | --- | --- |
| JupyterLab | Access Password | Password to use when accessing the JupyterLab web UI |

When setting is complete, click the [Next] button.
On the authentication key setup screen, enter the authentication key information required when directly accessing the notebook node.
Select an authentication key you have or create a new one and click the [Next] button.
To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.

Note: The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.
After final confirmation, click the [Create] button.
It takes about 5 to 10 minutes for a new notebook to be created. Once it is successfully created, you can check it from the Notebooks console.
Check notebook details
The following describes how to check the notebook details.
From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
Click the Notebooks menu on the left. You can see the list of the notebooks you've created.
| Name | Description |
| --- | --- |
| Notebook name | Name of the created notebook |
| Account | Account that owns the notebook |
| Notebook type | Type of the created notebook. Only the JupyterLab type is provided as of now |
| Status | Notebook node's status |
| Server specifications | Server specifications of the notebook node |
| VPC | VPC in which the notebook is created |
| Subnet | Subnet applied to the notebook node |
| Creation time | Date and time when the notebook was created |

Click the details icon at the end of the notebook list to see the notebook details.
| Item | Description |
| --- | --- |
| Account name | Account that owns the notebook |
| Notebook type | Type of the created notebook. Only the JupyterLab type is provided as of now |
| Notebook name | Name of the created notebook |
| Notebook ID | Unique notebook ID |
| Server specifications | Server specifications of the notebook node |
| VPC | VPC in which the notebook is created |
| Subnet | Subnet applied to the notebook node |
| ACG | ACG applied to the notebook node |
| Additional storage | Additional storage information |
| Domain | Domain assigned to the public IP |
| Authentication key name | Name of the authentication key applied to the notebook |
| SSH access account | OS account name for directly accessing the notebook node using SSH |
| Set user | Information on user settings applied to the notebook |
| Bucket | Object Storage bucket information |
Access notebook
The following describes how to access a notebook.
Access notebook web page
The following describes how to access a notebook's web page:
- Before proceeding, ensure that JupyterLab's port 80 is added to the ACG of the notebook.
- From the Notebooks menu, click Go to domain from the notebook's details screen.
- If the notebook nodes are created within a public subnet, web pages can be accessed directly using the public IP without requiring additional tunneling settings.
- Once the login screen of the JupyterLab web page appears, enter your password to log in.
- The password is the one set in the Access Password field on the user settings screen during the notebook creation process.
- If you forgot your password or need to change your password, click the user settings [Details/reset] button on the details screen of the notebook to reset your password.
Access notebook node directly via SSH
When creating a notebook, you can directly access the notebook node via SSH using the authentication key set in the authentication key setting step.
Preparations
The following describes how to add a new fixed IP to the notebook ACG.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Go to ACG icon from the details page of the notebook you want to access.
- Select the ACG of the notebook you want to access and click [ACG settings].
- Enter the following 4 items to add an ACG rule.
- Protocol: TCP
- Access source: IP of the local equipment used for SSH communication
- Allowed port: 22
- Note (optional)
SSH connection in macOS
This section describes how to connect using iTerm2. The steps are similar with other terminal programs.
You can find the notebook's domain information on its details screen.

```shell
chmod 400 </path/to/pem-key>
ssh -i </path/to/pem-key> forest@<notebook-domain>
```
SSH connection in Windows
This section describes how to connect using the PuTTY client. The steps are similar with other SSH clients.
Proceed with the following steps in order:
1. Authentication key (.pem) conversion
PuTTY doesn't natively support the private key format (.pem) generated by Data Forest. Use the PuTTYgen application provided with PuTTY to convert the authentication key into PuTTY's format (.ppk); the private key must be converted to the .ppk format before PuTTY can connect to the notebook node.
- Run PuTTYgen. (Download puttygen)
- Select RSA in Type of key to generate and click the [Load] button.
- Select the authentication key (*.pem), and then click the [Open] button.
- To find a file in the PEM format, select the option to display all file types.
- The PEM file is the authentication key file currently applied to the notebook. This PEM file must be stored on the user's local PC.
- If the PEM file is missing, the authentication key for connection can be changed from the [Manage server connection] > [Change authentication key] menu in the console.
- Check the details in the completion confirmation pop-up window, and then click the [OK] button.
- Click the [Save private key] button and save it as a ppk format file that can be used in PuTTY.
- If PuTTYgen displays a warning message about saving a key without a password, then select the [Yes] button.
- Save it with the same name as the previously generated authentication key. PuTTY automatically adds the .ppk file extension.
You can save the session in PuTTY and load it when needed.
- To save a session, enter the session name in the Saved sessions field of the Load, save or delete a stored session, and then click [Save].
- To run a session, select the session from the Saved Sessions list, and then click the [Open] button.
2. Connect to notebook node
- Run PuTTY. (Download PuTTY)
- Select Session from the Category window and enter the following in the settings fields.
- Host Name (or IP address): forest@notebook-domain (enter the domain address of the notebook in notebook-domain)
- Port: 22
- In the Category window, select Connection > SSH to expand, and then click the Auth item.
- Click the [Browse] button to select the PPK file created by converting the PEM file, and then click the [Open] button.
- Make sure that you are successfully connected to the notebook node.
Manage notebook
Change authentication key
If you forgot or want to change the authentication key designated during the notebook creation process, the key can be changed.
The following describes how to change the authentication key.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- After selecting the account and the notebook to change the authentication key for, click the [Manage server connection] > [Change authentication key] button.
- After verifying the user identity, the authentication key can be changed by using a different authentication key that you already own or by creating a new authentication key.
Reset user settings
The following describes how to edit the user settings specified when creating a notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account, and from the list of notebooks, select the notebook whose user settings you wish to change to go to the detailed view screen.
- Click the [View details/Reset] button of a user settings item.
- Enter the values of the settings that need to be reset, and click the [Reset] button.
When you edit the user settings, the notebook restarts and the contents of the Docker home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.
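Because the home directory is reset on restart, it can help to copy notebook files into the mounted bucket path first. A minimal sketch, assuming the paths described in this guide (/home/forest for the home directory, ~/data/{bucket_name} for the bucket mount); the function name and the backup folder are hypothetical:

```python
import glob
import os
import shutil

def backup_ipynb_files(home_dir: str, bucket_mount: str) -> list:
    """Copy all .ipynb files under home_dir into a backup folder in the mounted bucket."""
    backup_dir = os.path.join(bucket_mount, 'backup')
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    # '**' with recursive=True also matches files directly under home_dir
    for path in glob.glob(os.path.join(home_dir, '**', '*.ipynb'), recursive=True):
        shutil.copy2(path, backup_dir)
        copied.append(os.path.basename(path))
    return copied

# Example (bucket name is hypothetical):
# backup_ipynb_files('/home/forest', '/home/forest/data/my-bucket')
```

Files with the same name in different subdirectories would overwrite each other in this flat backup folder; preserve the directory layout if that matters for your notebooks.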
Edit bucket
The following describes how to edit Object Storage bucket.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select the account and the notebook to edit the bucket from the notebook list, and then go to the notebook details screen.
- Click the [Edit] button from the bucket list.
- Select the bucket for editing and click the [Apply] button.
- A bucket that is already integrated with another notebook of the same account can't be integrated again.
- When you edit the bucket, the notebook restarts and the contents of the Docker home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.
Restart notebook
The following describes how to restart the notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account and the notebook to restart from the notebook list, and then click the [Restart] button.
Delete notebook
The following describes how to delete a notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account and the notebook to delete from the notebook list, and then click the [Delete] button.
When you delete a notebook, the locally stored notebook files are lost. Make sure to save necessary notebook files before deleting the notebook.
Use notebook
Use JupyterLab Extension
- Object Storage Browser - Object Storage integration and bucket information
- The Data Forest account that created the notebook is automatically integrated with the Object Storage buckets it owns.
- The file information within each bucket can be viewed.
- File Browser - home directory information of the Data Forest user connected to the Docker container
- The access account name is forest, and the home directory is /home/forest.
- The keytab file of the Data Forest account is already uploaded under ~/keytab.
- If you added additional storage when creating the notebook, you can check that it has been mounted on the ~/data path.
- The Object Storage bucket selected when creating the notebook is mounted on the ~/data/{bucket_name} folder.
- Running Terminals and Kernels - currently running terminals and kernel information
- Git - Git repository reset and integration information
- Terminal - connects to the Docker container running on the notebook
- If you need a backup, save data or files in the ~/data/{bucket_name} folder where the Object Storage bucket is mounted; data saved in Object Storage is preserved even if the notebook is restarted or deleted.
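The mount points listed above can be checked quickly from a notebook cell. A minimal sketch, assuming the paths described in this guide (~/keytab and ~/data); the function name is hypothetical:

```python
import os

def check_notebook_paths(home: str) -> dict:
    """Report which of the paths described in this guide exist under the home directory."""
    expected = {
        'keytab': os.path.join(home, 'keytab'),  # Data Forest account keytab
        'data': os.path.join(home, 'data'),      # additional storage / bucket mounts
    }
    return {name: os.path.exists(path) for name, path in expected.items()}

# Example (home directory as described in this guide):
# check_notebook_paths('/home/forest')
```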
Run code
After accessing the JupyterLab Web UI, you can use PySpark to integrate with the Data Forest cluster.
Proceed by selecting PySpark as the kernel setting in the notebook.
```python
import os
import socket

import pyspark
from pyspark.sql import SQLContext, SparkSession

# Create a Spark session
sc = SparkSession \
    .builder \
    .appName("SparkFromJupyter") \
    .getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

print("Spark Version: " + sc.version)
print("PySpark Version: " + pyspark.__version__)

# Create and display a sample DataFrame
df = sqlContext.createDataFrame(
    [(1, 'foo'), (2, 'bar')],  # records
    ['col1', 'col2']           # column names
)
df.show()

print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
```
Integrate with Object Storage and backup notebook files
- If an Object Storage bucket is selected in the notebook settings step when creating a notebook, it is mounted to the /home/forest/data/{bucket_name} path when the notebook's Docker container runs. You can back up notebook files (.ipynb files) stored on the local disk by uploading them to the integrated bucket path.
- The user can check the automatically integrated Object Storage bucket information by accessing the JupyterLab Web.
To integrate the specified Object Storage bucket when creating a notebook, refer to the following code.
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the required boto3 module
import boto3

# Enter the Object Storage information
service_name = 's3'
endpoint_url = 'https://kr.object.private.ncloudstorage.com'
region_name = 'kr-standard'
access_key = "user's access key"
secret_key = "user's secret key"

# Integrate with Object Storage using boto3
if __name__ == "__main__":
    s3 = boto3.client(service_name,
                      endpoint_url=endpoint_url,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.upload_file("local_file_name", "bucket_name", "s3_path/file_name")
```
Train Tensorflow MNIST
After accessing the JupyterLab Web UI, you can perform TensorFlow MNIST Training using Python3.
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the TensorFlow library
# If you need to install TensorFlow:
# !pip install tensorflow
import tensorflow as tf

# Load the MNIST data set
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model by adding layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

# Save the trained model as an HDF5 file
model.save('my_model.h5')
```
Git integration
After accessing the JupyterLab Web UI, you can integrate with Git by entering the information of the Git repository to integrate with in the left menu.