Create and manage notebook
The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.
Available in VPC
This page describes information about notebooks provided by Data Forest and how to create and manage one. Data Forest provides you with an easy and convenient data analysis environment through a notebook.
Preparations
Object Storage creation
Before creating a notebook, you must have created an Object Storage bucket for storing and retrieving data. For more details, see the Object Storage guide.

VPC, subnet creation

Create a VPC and subnet in Networking > VPC from the NAVER Cloud Platform console. For more details, see the VPC user guide. Regardless of the number of notebooks, at least 1 VPC is required; multiple notebooks can share the same VPC. In a private VPC environment, VPCs can be created only in the KR-2 Region, and only public subnets are supported when creating a notebook.

Select server specifications

Select the server specifications in advance, considering the expected usage.
If the Object Storage bucket is locked or has access restrictions set, issues may occur when integrating it with a notebook.
Create notebook
The following describes how to create a notebook.
Create an account that will own the notebook.
- For how to create accounts, see Create and manage account.
From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
Click the Notebooks > Create notebook button on the left.
Enter the details of the notebook you want to create on the notebook settings screen.
| Name | Description |
| --- | --- |
| Account name | Account to own the notebook |
| Notebook type | Select the notebook type to create. Only the JupyterLab type is provided as of now |
| Notebook name | Designate the name of the notebook to create. You can enter 3 to 15 characters, using only lowercase letters, numbers, and hyphens (-) |
| VPC | Select the VPC created in preparations |
| Subnet | Select the subnet for locating the notebook node. Currently, notebooks can be created only in public subnets with available IPs in the Korea (KR-2) Region (zone), and the web domain is accessed based on the public IP |
| Server specifications | Select a server type to use as the notebook node. The server type can't be changed after the notebook node is created |
| ACG | Data Forest Notebook ACGs are automatically created whenever you create a notebook |
| Additional storage | You can add separate Block Storage for use |
| Additional storage type | Select a storage type. You can choose between SSD and HDD. The storage type can't be changed after the notebook is created |
| Additional storage capacity | Enter the storage capacity. You can set from 100 GB to 6 TB, specified in 10 GB units |
| Object Storage buckets | Select the Object Storage bucket created in preparations |

When setting is complete, click the [Next] button.
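As an illustration only (not part of the console or any Data Forest API), the naming and storage-capacity rules above can be pre-checked on the client side before submitting the form. The function name and the 6 TB = 6000 GB conversion are assumptions:

```python
import re

def validate_notebook_settings(name: str, storage_gb: int) -> list:
    """Hypothetical pre-check mirroring the console rules described above."""
    errors = []
    # Notebook name: 3 to 15 characters; lowercase letters, numbers, hyphens only
    if not re.fullmatch(r'[a-z0-9-]{3,15}', name):
        errors.append('invalid notebook name')
    # Additional storage: 100 GB to 6 TB (assumed 6000 GB), in 10 GB units
    if not (100 <= storage_gb <= 6000 and storage_gb % 10 == 0):
        errors.append('invalid storage capacity')
    return errors

print(validate_notebook_settings('my-notebook', 100))  # []
print(validate_notebook_settings('My_Notebook', 6010))
```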
Enter the user settings values for the components of the notebook you want to create on the user settings screen.
| Component | Name | Description |
| --- | --- | --- |
| JupyterLab | Access Password | Password to use when accessing the JupyterLab web UI |

When setting is complete, click the [Next] button.
On the authentication key setup screen, enter the authentication key information required when directly accessing the notebook node.
Select an authentication key you have or create a new one and click the [Next] button.
To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.

Note: The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.
After final confirmation, click the [Create] button.
It takes about 5 to 10 minutes for a new notebook to be created. Once it is successfully created, you can check it from the Notebooks console.
Check notebook details
The following describes how to check the notebook details.
From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
Click the Notebooks menu on the left. You can see the list of the notebooks you've created.
| Name | Description |
| --- | --- |
| Notebook name | Name of the created notebook |
| Account | Account that owns the notebook |
| Notebook type | Type of the created notebook. Only the JupyterLab type is provided as of now |
| Status | Notebook node's status |
| Server specifications | Server specifications of the notebook node |
| VPC | VPC in which the notebook is created |
| Subnet | Subnet applied to the notebook node |
| Creation time | Date and time when the notebook was created |

Click the details icon at the end of the notebook list to see the notebook details.
| Item | Description |
| --- | --- |
| Account name | Account that owns the notebook |
| Notebook type | Type of the created notebook. Only the JupyterLab type is provided as of now |
| Notebook name | Name of the created notebook |
| Notebook ID | Unique notebook ID |
| Server specifications | Server specifications of the notebook node |
| VPC | VPC in which the notebook is created |
| Subnet | Subnet applied to the notebook node |
| ACG | ACG applied to the notebook node |
| Additional storage | Additional storage information |
| Domain | Domain assigned to the public IP |
| Authentication key name | Name of the authentication key applied to the notebook |
| SSH access account | OS account name for directly accessing the notebook node using SSH |
| Set user | Information on user settings applied to the notebook |
| Bucket | Object Storage bucket information |
Access notebook
The following describes how to access a notebook.
Access notebook web page
The following describes how to access a notebook's web page:
- Before proceeding, ensure that JupyterLab's port 80 is added to the ACG of the notebook.
- From the Notebooks menu, click Go to domain from the notebook's details screen.
- If the notebook nodes are created within a public subnet, web pages can be accessed directly using the public IP without requiring additional tunneling settings.
- Once the login screen of the JupyterLab web page appears, enter your password to log in.
- The password is the one set in the Access Password field on the user settings screen during the notebook creation process.
- If you forgot your password or need to change your password, click the user settings [Details/reset] button on the details screen of the notebook to reset your password.
Access notebook node directly via SSH
When creating a notebook, you can directly access the notebook node via SSH using the authentication key set in the authentication key setting step.
Preparations
The following describes how to add a new fixed IP to the notebook ACG.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Go to ACG icon from the details page of the notebook you want to access.
- Select the ACG of the notebook you want to access and click [ACG settings].
- Enter the following 4 items to add an ACG rule.
- Protocol: TCP
- Access source: IP of the local equipment used for SSH communication
- Allowed port: 22
- Note (optional)
SSH connection in macOS
This section describes how to connect using iTerm2. The steps are similar with other terminal programs.
You can find the notebook's domain information on its details screen.

```shell
chmod 400 </path/to/pem-key>
ssh -i </path/to/pem-key> forest@<notebook-domain>
```
SSH connection in Windows
This section describes how to connect using the PuTTY client. The steps are similar with other SSH clients.
Proceed with the following steps in order:
1. Authentication key (.pem) conversion
PuTTY doesn't natively support the private key format (.pem) generated by Data Forest. Use the PuTTYgen application provided with PuTTY to convert the authentication key into PuTTY's format (.ppk); the private key must be converted to the .ppk format before PuTTY can connect to the notebook node.
- Run PuTTYgen. (Download puttygen)
- Select RSA in Type of key to generate and click the [Load] button.
- Select the authentication key (*.pem), and then click the [Open] button.
- To find a file in the PEM format, select the option to display all file types.
- The PEM file is the authentication key file currently applied to the notebook. This PEM file must be stored on the user's local PC.
- If the PEM file is missing, the authentication key for connection can be changed from the [Manage server connection] > [Change authentication key] menu in the console.
- Check the details in the completion confirmation pop-up window, and then click the [OK] button.
- Click the [Save private key] button and save it as a ppk format file that can be used in PuTTY.
- If PuTTYgen displays a warning message about saving a key without a password, then select the [Yes] button.
- Save it with the same name as the previously generated authentication key. PuTTY automatically adds the .ppk file extension.
You can save the session in PuTTY and load it when needed.
- To save a session, enter the session name in the Saved sessions field of the Load, save or delete a stored session, and then click [Save].
- To run a session, select the session from the Saved Sessions list, and then click the [Open] button.
2. Connect to notebook node
- Run PuTTY. (Download PuTTY)
- Select Session from the Category window and enter the following in the settings fields.
- Host Name (or IP address): forest@notebook-domain (enter the domain address of the notebook in notebook-domain)
- Port: 22
- In the Category window, select Connection > SSH to expand, and then click the Auth item.
- Click the [Browse] button to select the PPK file created by converting the PEM file, and then click the [Open] button.
- Make sure that you are successfully connected to the notebook node.
Manage notebook
Change authentication key
If you forgot or want to change the authentication key designated during the notebook creation process, the key can be changed.
The following describes how to change the authentication key.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- After selecting the account and the notebook to change the authentication key for, click the [Manage server connection] > [Change authentication key] button.
- After verifying the user identity, the authentication key can be changed by using a different authentication key that you already own or by creating a new authentication key.
Reset user settings
The following describes how to edit the user settings specified when creating a notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account, and from the list of notebooks, select the notebook whose user settings you wish to change to go to the detailed view screen.
- Click the [View details/Reset] button of a user settings item.
- Enter the values of the settings that need to be reset, and click the [Reset] button.
When you edit the user settings, the notebook restarts and the contents of the Docker home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.
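Because the home directory is reset on restart, it can help to copy notebook files into the mounted bucket path first. A minimal sketch, assuming the paths described in this guide (/home/forest for the home directory, ~/data/{bucket_name} for the bucket mount); the function name and the backup folder are hypothetical:

```python
import glob
import os
import shutil

def backup_ipynb_files(home_dir: str, bucket_mount: str) -> list:
    """Copy all .ipynb files under home_dir into a backup folder in the mounted bucket."""
    backup_dir = os.path.join(bucket_mount, 'backup')
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    # '**' with recursive=True also matches files directly under home_dir
    for path in glob.glob(os.path.join(home_dir, '**', '*.ipynb'), recursive=True):
        shutil.copy2(path, backup_dir)
        copied.append(os.path.basename(path))
    return copied

# Example (bucket name is hypothetical):
# backup_ipynb_files('/home/forest', '/home/forest/data/my-bucket')
```

Files with the same name in different subdirectories would overwrite each other in this flat backup folder; preserve the directory layout if that matters for your notebooks.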
Edit bucket
The following describes how to edit Object Storage bucket.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select the account and the notebook to edit the bucket from the notebook list, and then go to the notebook details screen.
- Click the [Edit] button from the bucket list.
- Select the bucket for editing and click the [Apply] button.
- A bucket that is already integrated with another notebook of the same account can't be integrated again.
- When you edit the bucket, the notebook restarts and the contents of the Docker home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.
Restart notebook
The following describes how to restart the notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account and the notebook to restart from the notebook list, and then click the [Restart] button.
Delete notebook
The following describes how to delete a notebook.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
- Click the Notebooks menu on the left.
- Select an account and the notebook to delete from the notebook list, and then click the [Delete] button.
When you delete a notebook, the locally stored notebook files are lost. Make sure to save necessary notebook files before deleting the notebook.
Use notebook
Use JupyterLab Extension
- Object Storage Browser - Object Storage integration and bucket information
- The Data Forest account that created the notebook is automatically integrated with the Object Storage buckets it owns.
- The file information within each bucket can be viewed.
- File Browser - home directory information of the Data Forest user connected to the Docker container
- The access account name is forest, and the home directory is /home/forest.
- The keytab file of the Data Forest account is already uploaded under ~/keytab.
- If you added additional storage when creating the notebook, you can check that it has been mounted on the ~/data path.
- The Object Storage bucket selected when creating the notebook is mounted on the ~/data/{bucket_name} folder.
- Running Terminals and Kernels - currently running terminals and kernel information
- Git - Git repository reset and integration information
- Terminal - connects to the Docker container running on the notebook
- If you need a backup, save data or files in the ~/data/{bucket_name} folder where the Object Storage bucket is mounted; data saved in Object Storage is preserved even if the notebook is restarted or deleted.
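The mount points listed above can be checked quickly from a notebook cell. A minimal sketch, assuming the paths described in this guide (~/keytab and ~/data); the function name is hypothetical:

```python
import os

def check_notebook_paths(home: str) -> dict:
    """Report which of the paths described in this guide exist under the home directory."""
    expected = {
        'keytab': os.path.join(home, 'keytab'),  # Data Forest account keytab
        'data': os.path.join(home, 'data'),      # additional storage / bucket mounts
    }
    return {name: os.path.exists(path) for name, path in expected.items()}

# Example (home directory as described in this guide):
# check_notebook_paths('/home/forest')
```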
Run code
After accessing the JupyterLab Web UI, you can use PySpark to integrate with the Data Forest cluster.
Proceed by selecting PySpark as the kernel setting in the notebook.
```python
import os
import socket

import pyspark
from pyspark.sql import SQLContext, SparkSession

# Create a Spark session
sc = SparkSession \
    .builder \
    .appName("SparkFromJupyter") \
    .getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

print("Spark Version: " + sc.version)
print("PySpark Version: " + pyspark.__version__)

# Create and display a sample DataFrame
df = sqlContext.createDataFrame(
    [(1, 'foo'), (2, 'bar')],  # records
    ['col1', 'col2']           # column names
)
df.show()

print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
```
Integrate with Object Storage and backup notebook files
- If an Object Storage bucket is selected in the notebook settings step when creating a notebook, it is mounted to the /home/forest/data/{bucket_name} path when the notebook's Docker container runs. You can back up notebook files (.ipynb files) stored on the local disk by uploading them to the integrated bucket path.
- The user can check the automatically integrated Object Storage bucket information by accessing the JupyterLab Web.
To integrate the specified Object Storage bucket when creating a notebook, refer to the following code.
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the required boto3 module
import boto3

# Enter the Object Storage information
service_name = 's3'
endpoint_url = 'https://kr.object.private.ncloudstorage.com'
region_name = 'kr-standard'
access_key = "user's access key"
secret_key = "user's secret key"

# Integrate with Object Storage using boto3
if __name__ == "__main__":
    s3 = boto3.client(service_name,
                      endpoint_url=endpoint_url,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.upload_file("local_file_name", "bucket_name", "s3_path/file_name")
```
Train Tensorflow MNIST
After accessing the JupyterLab Web UI, you can perform TensorFlow MNIST Training using Python3.
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the TensorFlow library
# If you need to install TensorFlow:
# !pip install tensorflow
import tensorflow as tf

# Load the MNIST data set
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model by adding layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

# Save the trained model as an HDF5 file
model.save('my_model.h5')
```
Git integration
After accessing the JupyterLab Web UI, you can integrate with Git by entering the information of the Git repository to integrate with in the left menu.