Using Notebooks
Available in VPC
Cloud Hadoop Notebook provides a serverless Jupyter Notebook environment where you can run the queries and code needed to analyze your data.
You can create and delete notebook nodes through the Notebooks console.
You can proceed with data analysis by accessing the JupyterLab and Jupyter Notebook web pages of the created notebook node.
Queries and code used in notebooks are executed through the kernel of the Cloud Hadoop cluster, and stored as notebook files in Object Storage for flexible reuse.
Notebook screen
The basics of using the Notebook service are as follows.
| Area | Description |
|---|---|
| ① Create notebook | Create a new notebook |
| ② Delete | Delete the notebook in use |
| ③ Open in JupyterLab | Access the JupyterLab web UI |
| ④ Open in Jupyter | Access the Jupyter web UI |
| ⑤ Notebook list | View the list of created notebooks and their details |
Prerequisite
- Create Object Storage
Before creating a Cloud Hadoop cluster, you must first create an Object Storage bucket for storing and retrieving data.
  - For more information on creating Object Storage, see the Object Storage Guide.
- Create a Cloud Hadoop cluster
A Cloud Hadoop cluster to link with the notebook node must have been created.
  - For more information on creating a Cloud Hadoop cluster, see Getting Started with Cloud Hadoop.
- Node type selection
Choose your node type in advance considering your expected usage.
Create notebook
The following describes how to create a notebook.
You can create multiple notebook nodes in a single Cloud Hadoop cluster, and each notebook node can be linked with the Cloud Hadoop cluster.
- Connect to the NAVER Cloud Platform console.
- Click VPC from the Platform menu to switch to the VPC environment.
- Click the Services > Big Data & Analytics > Cloud Hadoop menus in that order in the NAVER Cloud Platform console.
- Click the Notebooks menu.
- Click the [Create notebook] button.
- When the Create notebook page appears, proceed with the following steps in order.
1. Set notebook
Specify the notebook settings information, and then click the [Next] button.
- Notebook name: enter the name of the notebook node.
- Notebook version: select the notebook version. Currently, only the 1.0 version is provided for notebooks.
- Notebook component: you can check the component information by version.
- Cluster: select the Cloud Hadoop cluster you want the notebook to work with. Notebooks can only be linked with Cloud Hadoop version 1.8 or later.
- ACG settings: Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. If you want to set up a network ACL, you can edit the rules by selecting the automatically created ACG when creating the notebook. For more information on ACG settings, see the Set firewall (ACG) guide.
2. Set storage and server
Specify the storage and node server settings information, and then click the [Next] button.
- Object Storage bucket: select the Object Storage bucket created in the prerequisites. The notebook reads and writes data in this bucket.
- Notebook node subnet: select the subnet where the notebook node will be located.
- If you create a notebook node in a public subnet, web access based on the node's public IP is available.
- If you create a notebook node in a private subnet, web access is available only via SSL VPN.
- Notebook node server type: select the server type to be used for the notebook node. The server type can't be changed after the notebook node is created. For the specifications of servers that can be used as notebook nodes, see Supported server specifications by cluster node.
- Number of notebook nodes: the number of notebook nodes is fixed at 1.
- Add notebook node storage: you can attach and use separate block storage.
- Notebook node storage type: select the storage type. You can select either SSD or HDD. The storage type can't be changed after the notebook node is created.
- Notebook node storage capacity: select the storage capacity. You can select from 100 GB to 6 TB, and specify the capacity in 10 GB increments.
- Pricing plan: the pricing plan selected at account creation is applied. For more information on fees, see pricing information.
3. Set authentication key
To directly access the notebook node via SSH, you need to set the authentication key (.pem).
When creating a notebook, select an authentication key that you have or create a new one, and then click the [Next] button.
- To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.
The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.
4. Final confirmation
After checking the request details, click the [Create] button.
- Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. If you want to set up a network ACL, you can edit the rules by selecting the automatically created ACG. For more information on ACG settings, see the Set firewall (ACG) guide.
- It takes about 5 to 10 minutes for a notebook to be created. Once the notebook is created and starts running, Running is displayed in the Status column of the notebook list.
Access notebook
Access notebook web page
Click the [Open in JupyterLab] button or the [Open in Jupyter] button in the Cloud Hadoop (Notebooks) console to access the web page of JupyterLab or Jupyter Notebook installed on the notebook node.
- Add allowed port 8889 for JupyterLab and allowed port 8888 for the Jupyter web page to the Cloud Hadoop Notebook ACG.
- For notebook nodes created in a public subnet, web access is possible based on the public IP.
- For notebook nodes created in a private subnet, web access is possible only when connected through SSL VPN.
For detailed descriptions of setting up SSL VPN and ACG, see the guide on setting UI access and password by service.
Directly access the notebook node via SSH
When creating a notebook, you can directly access the notebook node via SSH using the authentication key set in the authentication key setting step. For more information, see the Access cluster node via SSH guide.
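For illustration only, the following is a minimal Python sketch using the third-party paramiko library to open an SSH session with the saved authentication key; the host IP, user name, and key path are placeholder values, not defaults of this service.

```python
# Minimal sketch using the third-party paramiko library (pip install paramiko).
# The host, user name, and key path below are placeholders; replace them with your own values.
import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # accept the node's host key on first connect
client.connect(
    hostname="203.0.113.10",          # IP of the notebook node (placeholder)
    username="sshuser",               # SSH account name (placeholder)
    key_filename="/path/to/key.pem",  # authentication key (.pem) saved when creating the notebook
)
_, stdout, _ = client.exec_command("hostname")
print(stdout.read().decode())
client.close()
```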
Use notebook
You can use the notebook by integrating it in various ways.
- Integrating Object Storage data in notebooks
- Integrating Spark on Cloud Hadoop cluster from your notebook
- Learn TensorFlow MNIST in your notebook
Prerequisite
ACG permissions between the Cloud Hadoop cluster and the notebook node to be linked are required. Configure the ACG as follows so that the ACG of the Cloud Hadoop cluster to be linked allows the ACG of the notebook node.
- In the Default ACG of the Cloud Hadoop cluster you want to integrate with, click the [Inbound] tab. Add the ACG of the notebook node as the access source, add all ports (1 - 65535) as the allowed ports, and then click the [Apply] button.
We recommend creating the Cloud Hadoop cluster and the notebook node in the same subnet so that they can communicate within the same VPC.
Integrating Object Storage data in notebooks
Connect to the Object Storage bucket by accessing the Jupyter Notebook Web UI and entering your Object Storage information.
- You can check the Access key ID and Secret key information for Object Storage on NAVER Cloud Platform portal's My Page > [Authentication Key Management]. For more information, see Getting started with Object Storage.
<Python3 example code>
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import required boto3 module
import boto3

# Enter Object Storage information
service_name = 's3'
endpoint_url = 'https://kr.object.private.ncloudstorage.com'
region_name = 'kr-standard'
access_key = "user's access key"
secret_key = "user's secret key"

# Integrating Object Storage using boto3
if __name__ == "__main__":
    s3 = boto3.client(service_name,
                      endpoint_url=endpoint_url,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.upload_file("my_model.h5", "best", "model/my_model.h5")
```
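Continuing with the s3 client created above, the same connection can read data back from the bucket; this is a sketch using the bucket and object names from the upload example (the downloaded file name is illustrative).

```python
# Download the uploaded object back to the notebook node (local file name is illustrative)
s3.download_file("best", "model/my_model.h5", "my_model_downloaded.h5")

# List the objects stored under the model/ prefix of the bucket
response = s3.list_objects_v2(Bucket="best", Prefix="model/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```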
Integrating Spark on Cloud Hadoop cluster from your notebook
After connecting to the Jupyter Notebook web UI, you can use PySpark to integrate with the Cloud Hadoop cluster.
<PySpark example code>
Proceed by selecting PySpark as the kernel configuration in the notebook.
```python
import os
import pyspark
import socket
from pyspark.sql import SQLContext, SparkSession

sc = SparkSession \
    .builder \
    .appName("SparkFromJupyter") \
    .getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

print("Spark Version: " + sc.version)
print("PySpark Version: " + pyspark.__version__)

df = sqlContext.createDataFrame(
    [(1, 'foo'), (2, 'bar')],  # records
    ['col1', 'col2']           # column names
)
df.show()
print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
```
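If the cluster's Hadoop S3A connector is already configured for Object Storage (fs.s3a endpoint and credentials, which this guide does not cover), data in a bucket can also be loaded straight into a DataFrame. Here sc is the SparkSession created in the example above; the bucket and file names below are hypothetical.

```python
# Hypothetical sketch: assumes fs.s3a.* is already configured for Object Storage on the cluster
csv_df = sc.read.csv("s3a://best/data/sample.csv", header=True, inferSchema=True)
csv_df.printSchema()
csv_df.show(5)
```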
Learn TensorFlow MNIST in your notebook
After accessing the Jupyter Notebook Web UI, you can perform TensorFlow MNIST Training using Python3.
<Python3 example code>
Proceed by selecting Python3 as the kernel setting from the notebook.
```python
# Import the tensorflow library
# If you need to install TensorFlow:
# !pip install tensorflow==<version>
import tensorflow as tf

mnist = tf.keras.datasets.mnist

# Load the MNIST data set
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define the model by adding layers
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train and evaluate the model
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)

# Save the trained model as an HDF5 file
model.save('my_model.h5')
```
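The saved my_model.h5 file is the same file uploaded to Object Storage in the boto3 example earlier. As a minimal check, the model can be reloaded and evaluated again in the same session:

```python
# Reload the saved model and confirm it evaluates the same as before
restored_model = tf.keras.models.load_model('my_model.h5')
restored_model.evaluate(x_test, y_test)
```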
Delete notebook
You can delete notebooks that you have finished using. The notebook files (files with the .ipynb extension) used in Jupyter Notebook are stored under the Object Storage bucket of the Cloud Hadoop cluster.
Even if you delete a notebook, the notebook files that were used are not deleted.
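As a quick way to confirm this, you can list the .ipynb objects remaining in the bucket with the same boto3 approach shown earlier; the bucket name below is illustrative.

```python
# List notebook files (.ipynb) remaining in the bucket after the notebook is deleted
import boto3

s3 = boto3.client('s3',
                  endpoint_url='https://kr.object.private.ncloudstorage.com',
                  aws_access_key_id="user's access key",
                  aws_secret_access_key="user's secret key")
for obj in s3.list_objects_v2(Bucket='best').get('Contents', []):
    if obj['Key'].endswith('.ipynb'):
        print(obj['Key'])
```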