Using Notebooks

    Available in VPC

    Cloud Hadoop Notebook provides a "serverless" Jupyter Notebook environment for running the queries and code needed to analyze your data.
    You can create and delete notebook nodes through the Notebooks console, and perform data analysis by accessing the JupyterLab and Jupyter Notebook web pages of a created notebook node.

    Queries and code used in notebooks are executed through the kernel of the Cloud Hadoop cluster, and stored as notebook files in Object Storage for flexible reuse.

    Notebook screen

    The basics of using the Notebook service are as follows.

    notebook_console_1_ko

    Area | Description
    ① Create notebook | Create a new notebook
    ② Delete | Delete the notebook in use
    ③ Open in JupyterLab | Access the JupyterLab web UI
    ④ Open in Jupyter | Access the Jupyter web UI
    ⑤ Notebook list | View the list of created notebooks and their detailed information

    Prerequisites

    1. Create Object Storage
      Before creating a Cloud Hadoop cluster, you must first create an Object Storage bucket for storing and retrieving data.
    2. Create a Cloud Hadoop cluster
      A Cloud Hadoop cluster to work with the notebook node must have been created.
    3. Node type selection
      Choose your node type in advance considering your expected usage.

    Create notebook

    The following describes how to create a notebook.

    You can create multiple notebook nodes for a single Cloud Hadoop cluster, and each notebook node can be used in integration with a Cloud Hadoop cluster.

    1. Connect to the NAVER Cloud Platform console.
    2. Click VPC from the Platform menu to switch to the VPC environment.
    3. Click the Services > Big Data & Analytics > Cloud Hadoop menus in that order in the NAVER Cloud Platform console.
    4. Click the Notebooks menu.
    5. Click the [Create notebook] button.
    6. When the Create notebook page appears, proceed with the following steps in order.

    1. Set notebook

    Specify the notebook settings information, and then click the [Next] button.

    • Notebook name: enter the name of the notebook node.
    • Notebook version: select the notebook version. Currently, only version 1.0 is provided for notebooks.
    • Notebook component: you can check the component information by version.
    • Cluster: select the Cloud Hadoop cluster you want the notebook to work with. Notebooks can only be linked with Cloud Hadoop version 1.8 or later.
    • ACG settings: a Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. To configure network access control, edit the rules of the automatically created ACG when creating the notebook. For more information on ACG settings, see the Set firewall (ACG) guide.

    2. Set storage and server

    Specify the storage and node server settings information, and then click the [Next] button.

    • Object Storage bucket: the notebook reads and writes data in the Object Storage bucket created in the prerequisites. When creating the notebook, select the bucket you created.
    • Notebook node subnet: select the subnet where the notebook node will be located.
      • If you create a notebook node on a public subnet, then web domain access based on public IP becomes available.
      • If you create a notebook node on a private subnet, then web domain access becomes available via SSL VPN.
    • Notebook node server type: select the server type to be used for the notebook node. The server type can't be changed after the notebook node is created. For the specifications of servers that can be used as notebook nodes, see Supported server specifications by cluster node.
    • Number of notebook nodes: the number of notebook nodes is fixed at 1.
    • Add notebook node storage: choose whether to add separate block storage to the notebook node.
    • Notebook node storage type: select the storage type. You can select either SSD or HDD. The storage type can't be changed after creation.
    • Notebook node storage capacity: select the storage capacity. You can select from 100 GB to 6 TB, and specify the capacity in 10 GB increments.
    • Pricing plan: the pricing plan selected at account creation is applied. For more information on fees, see pricing information.

    3. Set authentication key

    To directly access the notebook node via SSH, you need to set the authentication key (.pem).
    When creating a notebook, select an authentication key that you have or create a new one, and then click the [Next] button.

    • To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.
      hadoop-use-notebooks-pemkey-vpc_ko
    Note

    The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.

    4. Final confirmation

    After checking the request details, click the [Create] button.

    Note
    • A Cloud Hadoop Notebook ACG is automatically created whenever you create a notebook. To configure network access control, edit the rules of the automatically created ACG. For more information on ACG settings, see the Set firewall (ACG) guide.
    • It takes about 5 to 10 minutes for a notebook to be created. Once the notebook is created and starts running, Running is displayed in the Status column of the notebook list.

    Access notebook

    Access notebook web page

    Click the [Open in JupyterLab] or [Open in Jupyter] button in the Cloud Hadoop (Notebooks) console to access the web page of the JupyterLab or Jupyter Notebook installed on the notebook node.

    • Add allowed port 8889 for JupyterLab and allowed port 8888 for the Jupyter web page to the Cloud Hadoop Notebook ACG.
    • For notebook nodes created in a public subnet, web access is possible based on the public IP.
    • For notebook nodes created in a private subnet, web access is possible only when connected through SSL VPN.
    Note

    For detailed descriptions on setting up SSL VPN and ACG, see the guide on setting UI access and password by service.

    Access the notebook node directly via SSH

    You can directly access the notebook node via SSH using the authentication key set in the Set authentication key step when creating the notebook. For more information, see the Access cluster node via SSH guide.
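
    If you want to script the SSH connection from Python instead of using a terminal client, the following is a minimal sketch using the third-party paramiko library (not installed by default). The host IP, account name, and PEM file path are placeholders for illustration; use the actual values for your notebook node as described in the Access cluster node via SSH guide.

    # Minimal SSH sketch using paramiko (install with: pip install paramiko)
    # The host, username, and key path below are placeholders, not fixed values.
    import paramiko

    host = '10.0.0.10'                      # notebook node IP (example value)
    username = 'root'                       # assumed SSH account; check your environment
    key_path = '/path/to/notebook-key.pem'  # authentication key saved when creating the notebook

    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=username, key_filename=key_path)

    # Run a simple command to confirm the connection
    stdin, stdout, stderr = client.exec_command('hostname')
    print(stdout.read().decode().strip())

    client.close()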

    Use notebook

    You can use the notebook by integrating it in various ways.

    Prerequisite

    ACG permission between the Cloud Hadoop cluster and the notebook node to be integrated is required. Configure the ACG as follows, adding the ACG of the notebook node to the ACG of the Cloud Hadoop cluster it will be integrated with.

    • In Default ACG of the Cloud Hadoop cluster you want to integrate with, click the [Inbound] tab. Add the ACG of the notebook node to Access source, add all ports 1 - 65535 to Allowed port, and then click the [Apply] button.
      notebook_acg_ko
    Note

    We recommend creating the Cloud Hadoop cluster and the notebook node in the same subnet so that they can communicate within the same VPC.

    Integrating Object Storage data in notebooks

    Connect to the Object Storage bucket by accessing the Jupyter Notebook Web UI and entering your Object Storage information.

    • You can check the Access key ID and Secret key information for Object Storage on NAVER Cloud Platform portal's My Page > [Authentication Key Management]. For more information, see Getting started with Object Storage.
      cloudhadoop-mypage-authkey1_ko

    <Python3 example code>

    Proceed by selecting Python3 as the kernel setting from the notebook.

    # Import the required boto3 module
    import boto3

    # Enter your Object Storage information
    service_name = 's3'
    endpoint_url = 'https://kr.object.private.ncloudstorage.com'
    region_name = 'kr-standard'
    access_key = "user's access key"
    secret_key = "user's secret key"

    # Create an Object Storage client using boto3
    if __name__ == "__main__":
        s3 = boto3.client(service_name,
                          endpoint_url=endpoint_url,
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key)

        # Upload a local file to the "best" bucket under the "model/" prefix
        s3.upload_file("my_model.h5", "best", "model/my_model.h5")
    
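    To confirm that the upload succeeded, or to read data back into the notebook, you can continue with the same client. The sketch below reuses the s3 client, the example bucket name ("best"), and the object key from the example above; adjust them to your own bucket.

    # List the objects under the "model/" prefix in the example bucket
    response = s3.list_objects_v2(Bucket="best", Prefix="model/")
    for obj in response.get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Download the uploaded object back to a local file on the notebook node
    s3.download_file("best", "model/my_model.h5", "my_model_downloaded.h5")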

    Integrating Spark on the Cloud Hadoop cluster from your notebook

    After connecting to the Jupyter Notebook web UI, you can use PySpark to integrate with the Cloud Hadoop cluster.

    <PySpark example code>

    Proceed by selecting PySpark as the kernel setting in the notebook.

    import os
    import socket

    import pyspark
    from pyspark.sql import SQLContext, SparkSession

    # Create (or reuse) a SparkSession connected to the Cloud Hadoop cluster
    spark = SparkSession \
        .builder \
        .appName("SparkFromJupyter") \
        .getOrCreate()

    sqlContext = SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)
    print("Spark Version: " + spark.version)
    print("PySpark Version: " + pyspark.__version__)

    # Create a small DataFrame and display it
    df = sqlContext.createDataFrame(
        [(1, 'foo'), (2, 'bar')],  # records
        ['col1', 'col2']           # column names
    )
    df.show()

    # Print the host name and IP of the node running the notebook kernel
    print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
    

    notebook_pyspark_example
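
    Because the DataFrame is processed on the Cloud Hadoop cluster, you can also query it with Spark SQL. The sketch below continues from the example above; the view name demo is an arbitrary example.

    # Register the DataFrame as a temporary view and query it with Spark SQL
    df.createOrReplaceTempView("demo")

    result = sqlContext.sql("SELECT col2, COUNT(*) AS cnt FROM demo GROUP BY col2")
    result.show()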

    Train a TensorFlow MNIST model in your notebook

    After accessing the Jupyter Notebook Web UI, you can perform TensorFlow MNIST Training using Python3.

    <Python3 example code>

    Proceed by selecting Python3 as the kernel setting from the notebook.

    # Import the TensorFlow library
    # If TensorFlow is not installed, uncomment the following line
    # !pip install tensorflow
    
    import tensorflow as tf
    mnist = tf.keras.datasets.mnist
    
    # Load MNIST data set
    (x_train, y_train),(x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    
    # Define model by adding layer
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # Train and evaluate model
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)
    
    # Save learned model as HDF5 file
    model.save('my_model.h5')
    

    notebook_tensorflow_mnist_example
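
    After saving the model, you can copy it to Object Storage with the same boto3 approach shown in the Object Storage example above. The bucket name ("best") and the model/ prefix below are carried over from that example as assumptions; replace them with your own bucket and path.

    # Upload the saved model file to Object Storage, reusing the settings from the boto3 example
    import boto3

    s3 = boto3.client('s3',
                      endpoint_url='https://kr.object.private.ncloudstorage.com',
                      aws_access_key_id="user's access key",
                      aws_secret_access_key="user's secret key")

    # Copy my_model.h5 into the example bucket under the "model/" prefix
    s3.upload_file('my_model.h5', 'best', 'model/my_model.h5')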

    Delete notebook

    You can delete notebooks whose usage is complete. The notebook files (files with .ipynb extension) used in Jupyter Notebook are stored under the Object Storage bucket of the Cloud Hadoop cluster.

    Note

    Even if you delete a notebook, the notebook files that were used are not deleted.

