Create and manage notebook

    The latest service changes have not yet been reflected in this content. We will update the content as soon as possible. Please refer to the Korean version for information on the latest updates.

    Available in VPC

    This page describes the notebooks provided by Data Forest and how to create and manage them. Data Forest provides an easy and convenient data analysis environment through notebooks.

    Preparations

    1. Object Storage creation
      Before creating a notebook, you must have created an Object Storage bucket for storing and retrieving data. For more details, see the Object Storage guide. (A scripted alternative is sketched after this list.)

    2. VPC, subnet creation
      Create a VPC and subnet in Networking > VPC from the NAVER Cloud Platform console. For more details, see the VPC user guide. Regardless of the number of notebooks, at least 1 VPC is required, and you can run multiple notebooks in the same VPC. In a private VPC environment, VPCs can be created only in the KR-2 Region, and only public subnets are supported when creating a notebook.

    3. Select server specifications
      Select the server specifications in advance considering the expected usage.
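
    If you prefer to script the bucket preparation in step 1, it can also be done through the S3-compatible API. The following is a minimal sketch, assuming boto3 is installed; the endpoint URL, access key, secret key, and bucket name (my-forest-bucket) are placeholder values to replace with your own, and the public endpoint shown is an assumption that may differ by environment.

    # Minimal sketch: create an Object Storage bucket via the S3-compatible API.
    # Assumptions: boto3 is installed; all values below are placeholders.
    import boto3

    endpoint_url = 'https://kr.object.ncloudstorage.com'  # assumed public endpoint
    access_key = "user's access key"
    secret_key = "user's secret key"

    s3 = boto3.client('s3', endpoint_url=endpoint_url,
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_key)
    s3.create_bucket(Bucket='my-forest-bucket')  # hypothetical bucket name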

    Note

    If the Object Storage bucket is locked or configured with access restrictions, issues may occur when integrating it with a notebook.

    Create notebook

    The following describes how to create a notebook.

    1. Create an account that will own the notebook.

    2. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.

    3. Click the Notebooks > Create notebook button on the left.

    4. Enter the details of the notebook you want to create on the notebook settings screen.

      • Account name: Account to own the notebook
      • Notebook type: Select the notebook type to create. Only the JupyterLab type is currently provided
      • Notebook name: Designate the name of the notebook to create
        • You can enter 3 to 15 characters, using only lowercase letters, numbers, and hyphens (-)
      • VPC: Select the VPC created in preparations
      • Subnet: Select the subnet where the notebook node will be located
        • Currently, notebooks can be created only in public subnets with available IPs in the Korea (KR-2) Region, and the web domain is accessed via public IP
      • Server specifications: Select a server type to use as the notebook node. The server type can't be changed after the notebook node is created
      • ACG: A Data Forest Notebook ACG is automatically created whenever you create a notebook
      • Additional storage: You can add separate Block Storage for use
      • Additional storage type: Select a storage type, either SSD or HDD. The storage type can't be changed after the notebook is created
      • Additional storage capacity: Enter the storage capacity, from 100 GB to 6 TB in 10 GB units
      • Object Storage buckets: Select the Object Storage bucket created in preparations
    5. When setting is complete, click the [Next] button.

    6. Enter the user settings values for the components of the notebook you want to create on the user settings screen.

      • JupyterLab - Access Password: Password to use when accessing the JupyterLab Web UI
    7. When setting is complete, click the [Next] button.

    8. On the authentication key setup screen, enter the authentication key information required when directly accessing the notebook node.
      Select an authentication key you have or create a new one and click the [Next] button.
      To create a new authentication key, select Create new authentication key, enter the authentication key name, and then click the [Create and save authentication key] button.

      Note

      The authentication key is required to get the admin password. Keep the saved PEM file in a safe location on your PC.

    9. After final confirmation, click the [Create] button.
      It takes about 5 to 10 minutes for a new notebook to be created. Once it is successfully created, you can check it from the Notebooks console.

    Check notebook details

    The following describes how to check the notebook details.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.

    2. Click the Notebooks menu on the left. You can see the list of the notebooks you've created.

      • Notebook name: Name of the created notebook
      • Account: Account that owns the notebook
      • Notebook type: Type of the created notebook. Only the JupyterLab type is currently provided
      • Condition: The notebook node's status
      • Server specifications: Server specifications of the notebook node
      • VPC: VPC in which the notebook is created
      • Subnet: Subnet applied to the notebook node
      • Creation time: Date and time when the notebook was created
    3. Click the view-details icon at the end of the notebook list to see the notebook details.

      • Account name: Account that owns the notebook
      • Notebook type: Type of the created notebook. Only the JupyterLab type is currently provided
      • Notebook name: Name of the created notebook
      • Notebook ID: Unique notebook ID
      • Server specifications: Server specifications of the notebook node
      • VPC: VPC in which the notebook is created
      • Subnet: Subnet applied to the notebook node
      • ACG: ACG applied to the notebook node
      • Additional storage: Additional storage information
      • Domain: Domain assigned to the public IP
      • Authentication key name: Name of the authentication key applied to the notebook
      • SSH access account: OS account name for directly accessing the notebook node via SSH
      • Set user: Information on user settings applied to the notebook
      • Bucket: Object Storage bucket information

    Access notebook

    The following describes how to access a notebook.

    Access notebook web page

    The following describes how to access a notebook's web page:

    1. Before proceeding, ensure that JupyterLab's port 80 is added to the ACG of the notebook.
    2. From the Notebooks menu, click Go to domain from the notebook's details screen.
      • If the notebook nodes are created within a public subnet, web pages can be accessed directly using the public IP without requiring additional tunneling settings.
    3. Once the login screen of the JupyterLab web page appears, enter your password to log in.
      • The password is the one set in the Access Password field on the user settings screen during the notebook creation process.
      • If you forgot your password or need to change it, click the user settings [View details/Reset] button on the notebook's details screen to reset your password.

    Access directly to notebook node via SSH

    You can access the notebook node directly via SSH using the authentication key set in the authentication key setup step when creating the notebook.

    Preparations

    The following describes how to add an access source IP to the notebook ACG.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Go to ACG icon from the details page of the notebook you want to access.
    3. Select the ACG of the notebook you want to access and click [ACG settings].
    4. Enter the four items below and add an ACG rule.
      • Protocol: TCP
      • Access source: IP of the local equipment used for SSH communication
      • Allowed port: 22
      • Note (optional)

    SSH connection in macOS

    The following describes the connection method using iTerm2. Other terminal programs follow the same steps.
    You can find the notebook's domain information from its details screen.

    # Restrict permissions on the authentication key file (required by ssh)
    chmod 400 </path/to/pem-key>
    # Connect to the notebook node as the forest user
    ssh -i </path/to/pem-key> forest@<notebook-domain>
    

    SSH connection in Windows

    The following describes the connection method using the PuTTY client. Other SSH clients follow the same steps.

    Proceed with the following steps in order:

    1. Authentication key (.pem) conversion

    PuTTY doesn't natively support the private key format (.pem) generated by Data Forest. Use the PuTTYgen application provided with PuTTY to convert the authentication key into the format (.ppk) used by PuTTY. The private key must be converted to .ppk before PuTTY can connect to the notebook node.

    1. Run PuTTYgen. (Download puttygen)
    2. Select RSA in Type of key to generate and click the [Load] button.
    3. Select the authentication key (*.pem), and then click the [Open] button.
      • To find a file in the PEM format, select the option to display all file types.
      • The PEM file is the authentication key file currently applied to the notebook. This PEM file must be stored on the user's local PC.
      • If the PEM file is missing, the authentication key for connection can be changed from the console's [Manage server connection] > [Change authentication key] menu.
    4. Check the details in the completion confirmation pop-up window, and then click the [OK] button.
    5. Click the [Save private key] button and save it as a ppk format file that can be used in PuTTY.
      • If PuTTYgen displays a warning message about saving a key without a password, then select the [Yes] button.
      • Save it with the same name as the previously generated authentication key. PuTTY automatically adds the .ppk file extension.
    Note

    Once a connection has been configured, you can save the session in PuTTY and load it when needed.

    • To save a session, enter the session name in the Saved Sessions field under Load, save or delete a stored session, and then click [Save].
    • To run a session, select the session from the Saved Sessions list, and then click the [Open] button.

    2. Connect to notebook node

    1. Run PuTTY. (Download PuTTY)
    2. Select Session from the Category window and enter info in each of the settings fields as below.
      • Host Name (or IP address): forest@<notebook-domain> (enter the notebook's domain address in <notebook-domain>)
      • Port: 22
    3. In the Category window, select Connection > SSH to expand, and then click the Auth item.
    4. Click the [Browse] button to select the PPK file created by converting the PEM file, and then click the [Open] button.
    5. Make sure that you are successfully connected to the notebook node.

    Manage notebook

    Change authentication key

    You can change the authentication key designated during the notebook creation process if you have lost it or want to replace it.
    The following describes how to change the authentication key.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Notebooks menu on the left.
    3. After selecting the account and the notebook to change the authentication key for, click the [Manage server connection] > [Change authentication key] button.
    4. After verifying the user identity, the authentication key can be changed by using a different authentication key that you already own or by creating a new authentication key.

    Reset user settings

    The following describes how to edit the user settings specified when creating a notebook.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Notebooks menu on the left.
    3. Select an account, and from the list of notebooks, select the notebook whose user settings you wish to change to go to the detailed view screen.
    4. Click the [View details/Reset] button of a user settings item.
    5. Enter the values of the settings that need to be reset, and click the [Reset] button.
    Note

    When you edit the user settings, the notebook restarts and the contents of the Docker container's home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.

    Edit bucket

    The following describes how to edit the Object Storage bucket integration.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Notebooks menu on the left.
    3. Select the account and the notebook to edit the bucket from the notebook list, and then go to the notebook details screen.
    4. Click the [Edit] button from the bucket list.
    5. Select the bucket for editing and click the [Apply] button.
    Note
    • A bucket that is already integrated with a notebook of the same account can't be integrated again.
    • When you edit the bucket, the notebook restarts and the contents of the Docker container's home directory (/home/forest/) are reset. Make sure to save necessary data before restarting the notebook.

    Restart notebook

    The following describes how to restart the notebook.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Notebooks menu on the left.
    3. Select an account and the notebook to restart from the notebook list, and then click the [Restart] button.

    Delete notebook

    The following describes how to delete a notebook.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menu, in that order.
    2. Click the Notebooks menu on the left.
    3. Select an account and the notebook to delete from the notebook list, and then click the [Delete] button.
    Note

    When you delete a notebook, the locally stored notebook files are lost. Make sure to save necessary notebook files before deleting the notebook.

    Use notebook

    Use JupyterLab Extension


    1. Object Storage Browser - Object Storage integration and bucket information
      • The Data Forest account that created the notebook is automatically integrated with the Object Storage buckets it owns.
      • The file information within each bucket can be viewed.
    2. File Browser - home directory of the Data Forest user inside the Docker container
      • The access account name is forest, and the home directory is /home/forest.
      • The keytab file of the Data Forest account is already uploaded under ~/keytab.
      • If you added additional storage when creating the notebook, you can check that it has been mounted at the ~/data path.
      • The Object Storage bucket selected when creating the notebook is mounted at the ~/data/{bucket_name} folder.
    3. Running Terminals and Kernels - currently running terminal and kernel information
    4. Git - Git repository integration information
    5. Terminal - connects to the Docker container running on the notebook
      • If you need a backup, save data or files to the ~/data/{bucket_name} folder where the Object Storage bucket was mounted during notebook creation; anything saved in Object Storage is preserved even if the notebook is restarted or deleted (see the sketch below).
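
    Because the mounted bucket path survives restarts while the home directory does not, a backup can be as simple as copying files into that path. The following is a minimal sketch run inside the notebook, assuming a hypothetical notebook file work.ipynb in the home directory and a hypothetical bucket named my-forest-bucket mounted at ~/data/my-forest-bucket.

    # Minimal backup sketch: copy a notebook file into the mounted bucket path.
    # Assumptions: "work.ipynb" and "my-forest-bucket" are placeholders.
    import os
    import shutil

    src = os.path.expanduser("~/work.ipynb")                        # local notebook file
    dst = os.path.expanduser("~/data/my-forest-bucket/work.ipynb")  # mounted bucket path
    shutil.copy(src, dst)  # the copy is persisted in Object Storage
    print("backed up to", dst)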

    Run code

    After accessing the JupyterLab Web UI, you can use PySpark to integrate with the Data Forest cluster.

    Proceed by selecting PySpark as the kernel in the notebook.

    import os
    import pyspark
    import socket
    from pyspark.sql import SQLContext, SparkSession
    
    sc = SparkSession \
            .builder \
            .appName("SparkFromJupyter") \
            .getOrCreate()

    sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
    print("Spark Version: " + sc.version)
    print("PySpark Version: " + pyspark.__version__)

    df = sqlContext.createDataFrame(
        [(1, 'foo'), (2, 'bar')],  # records
        ['col1', 'col2']  # column names
    )
    df.show()
    
    print(socket.gethostname(), socket.gethostbyname(socket.gethostname()))
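
    If an Object Storage bucket was mounted when the notebook was created, the same session can also read files from the mounted path. The following is a brief sketch, assuming a hypothetical CSV file sample.csv in a bucket named my-forest-bucket mounted at /home/forest/data/my-forest-bucket:

    # Sketch: read a CSV from the mounted bucket path with the session above.
    # Assumptions: "my-forest-bucket" and "sample.csv" are placeholders.
    csv_df = sc.read.csv("file:///home/forest/data/my-forest-bucket/sample.csv", header=True)
    csv_df.show()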
    


    Integrate with Object Storage and backup notebook files

    • If an Object Storage bucket is selected in the notebook settings during notebook creation, it is mounted to the /home/forest/data/{bucket_name} path when the notebook's Docker container starts. You can back up notebook files (.ipynb files) stored on the local disk by uploading them to the integrated bucket path.
    • You can check the automatically integrated Object Storage bucket information by accessing the JupyterLab Web UI.


    To access the Object Storage bucket specified when creating the notebook, refer to the following code.
    Proceed by selecting Python3 as the kernel in the notebook.

    # Import the required boto3 module
    import boto3

    # Enter Object Storage information
    service_name = 's3'
    endpoint_url = 'https://kr.object.private.ncloudstorage.com'
    region_name = 'kr-standard'
    access_key = "user's access key"
    secret_key = "user's secret key"

    # Integrate with Object Storage using boto3
    if __name__ == "__main__":
        s3 = boto3.client(service_name, endpoint_url=endpoint_url,
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key)

        # Upload a local file to the bucket (all arguments are placeholders)
        s3.upload_file("local_file_name", "bucket_name", "s3_path/file_name")
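
    To confirm the upload, the objects in the bucket can be listed with the same client; a brief sketch (bucket_name remains a placeholder):

    # List objects in the bucket to verify the upload (sketch)
    response = s3.list_objects(Bucket="bucket_name")
    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])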
    
    

    Train TensorFlow MNIST

    After accessing the JupyterLab Web UI, you can perform TensorFlow MNIST Training using Python3.

    Proceed by selecting Python3 as the kernel in the notebook.

    # Import the tensorflow library
    # If you need to install TensorFlow:
    # !pip install tensorflow==<version>
    
    import tensorflow as tf
    mnist = tf.keras.datasets.mnist
    
    # Load MNIST data set
    (x_train, y_train),(x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0
    
    # Define model by adding layer
    model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      tf.keras.layers.Dense(128, activation='relu'),
      tf.keras.layers.Dropout(0.2),
      tf.keras.layers.Dense(10, activation='softmax')
    ])
    
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    
    # Train and evaluate model
    model.fit(x_train, y_train, epochs=5)
    model.evaluate(x_test, y_test)
    
    # Save trained model as HDF5 file
    model.save('my_model.h5')
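
    To reuse the trained model later, it can be loaded back and evaluated; a brief sketch using the Keras API (my_model.h5 is the file saved above):

    # Reload the saved model and check it against the test set
    restored = tf.keras.models.load_model('my_model.h5')
    restored.evaluate(x_test, y_test)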
    

    Git integration

    After accessing the JupyterLab Web UI, you can integrate with Git by entering the information of the Git repository to integrate with in the left menu.

