Configure data box access


Available in VPC

Once your data box is created, you will be notified by email. After receiving the notification, access the NAVER Cloud Platform console and complete the SSL VPN user configuration. Then, access the infrastructure services with the data box access credentials, review the sample data, and install the modules you need for analysis. Because all external network connections are blocked once a data supply request has been made, install the necessary modules and download essential data before submitting your data supply request. After external network connections are blocked, files can only be imported by uploading them to an Object Storage bucket and requesting a file import.

1. Configure SSL VPN user

You must complete the SSL VPN user configuration before you can start using the data box. To configure an SSL VPN user, follow these steps:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space.
  2. Go to the tab of the data box you want to configure and click the [Configure SSL VPN user] button.
  3. Check the number of users that can be registered. Enter the user name, password, email address, and mobile phone number to be used for SMS authentication, then click the [Add] button.
  4. When the user configuration is completed, click the [Close] button.
Note

For more information on how to change the number of SSL VPN user accounts, how to delete an account, and how to reset the password, see Databox management.

2. Check the access information of infrastructure services

In this stage, you check the access details of the data box's infrastructure services. To check the access details, follow these steps:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space.
  2. Select the data box you created and click the [View server details] button.
  3. Go to the Infrastructure tab and review the IPs and the IDs assigned to each product.
    • Click the expand button in the Cloud Hadoop or TensorFlow row to see more details.

3. Access SSL VPN and Connect Server

To access the data box's infrastructure services, you must first connect to the SSL VPN and then to the Connect Server.

Caution
  • Running SSL VPN Agent while another VPN connection is enabled may result in a crash. Make sure that the other VPN connection is disabled before you run SSL VPN Agent.

To access SSL VPN and the Connect Server, follow these steps:

  1. Install SSL VPN Agent.
    • For more information on how to install the SSL VPN Agent, see Install SSL VPN Agent in the SSL VPN user's guide (VPC).
  2. Run the BIG-IP Edge Client.
    • For more information on how to access the BIG-IP Edge Client, see Access SSL VPN Agent in the SSL VPN user's guide (VPC).
    • Server address: https://sslvpn-kr-vpc-01.ncloud.com
  3. Enter the user name and the password registered during the stage described in 1. Configure SSL VPN user and click the [Log on] button.
  4. Enter the OTP code sent to your mobile phone or email account and click the [Log on] button.
  5. Because Connect Server is a Windows server, you must use Remote Desktop Connection on your PC to access it. Enter the Connect Server's IP address, click the [Connect] button, and then enter the user name and password.
    • If you forgot the password for Connect Server, Ncloud TensorFlow Server, or the Hadoop cluster, or if you received a password reset email, go to the Cloud Data Box > My Space > Details screen and click the [Reset password] button to update your password.
Note

Once the data supply request has been made, all external network connections are blocked, leaving you unable to install a module using a command such as "pip install." To install a module, you must download the installation file and import it to your data box through the "file import" request process. As such, we encourage you to write the analysis code using the sample data and install all the necessary modules before you submit the data supply request.
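The two-step workflow described above can be sketched as follows. The package name (`requests`) and directory are illustrative placeholders, and the download step is guarded behind a flag so the snippet is safe to run without network access; the final install command is only printed here, since it is meant to be run inside the data box after the wheels have been imported.

```shell
# Sketch of the offline-module workflow (placeholders: PKG, WHEEL_DIR).
PKG=requests
WHEEL_DIR=./offline-pkgs

# Step 1 (on an internet-connected machine, BEFORE the data supply request):
# fetch the package and its dependencies as wheel files.
if [ "${HAVE_INTERNET:-0}" = "1" ]; then
  pip download "$PKG" -d "$WHEEL_DIR"
fi

# Step 2 (inside the data box, after importing WHEEL_DIR via a file import
# request): install strictly from the local directory, never contacting PyPI.
echo "pip install --no-index --find-links=$WHEEL_DIR $PKG"
```

The `--no-index` flag forbids pip from contacting the package index, and `--find-links` points it at the directory of uploaded wheel files instead.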

4. Access and use Cloud Hadoop server

You can access the Hadoop cluster through the Chrome browser or the PuTTY program installed on the Connect Server.

Note

Cloud Hadoop gives users direct access to the server and to the management tool (Ambari) so that they can manage the cluster themselves. This guide only covers how to access the Hadoop cluster. For more information on using Cloud Hadoop, see the Cloud Hadoop user guides.

Access cluster node through SSH

To access Hadoop edge node through SSH using a PPK file, follow these steps:

  1. Run PuTTY in the Connect Server and enter the access credentials.

    • Host Name: sshuser@HadoopEdgeNodeIP
    • Port: 22
    • Connection type: SSH
    Note

    You can view the Hadoop edge node's IP in the Infrastructure information section on the NAVER Cloud Platform console.

  2. Click Connection > SSH > Auth. Then, click the [Browse] button and select the PPK file.

  3. To access Hadoop cluster edge node, click the [Open] button.

  4. To list all of the nodes, access the Hadoop cluster and enter the following command:

    $ cat /etc/hosts 
    
    • Edge node: starts with e-001.
    • Master node: starts with m-001, m-002.
    • Worker node: starts with d-001, numbered up to the number of worker nodes created
  5. To access a master node or a worker node, access the edge node and enter the following command. For the master or worker node name, enter m-00# or d-00# (# is the node number) and press the Tab key to autocomplete the full name. When asked to confirm the host's authenticity, enter yes.
    To move to another node, enter exit to return to the edge node first, and then connect to the next node.

    • "m-001-xxx" is the name of the node viewed in the above example.
    $ ssh sshuser@m-001-xxx 
    $ ...
    $ exit
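
The naming convention above can be checked with a quick filter over the /etc/hosts output. The snippet below uses an illustrative excerpt (the IPs and hostnames are placeholders, not real cluster output) so that it is self-contained; on the edge node you would pipe `cat /etc/hosts` instead.

```shell
# Illustrative /etc/hosts excerpt (placeholder hostnames).
hosts_sample='10.0.1.10 e-001-abc
10.0.1.11 m-001-abc
10.0.1.12 m-002-abc
10.0.1.20 d-001-abc
10.0.1.21 d-002-abc'

# Classify each host by the prefix convention described above:
# e- = edge node, m- = master node, d- = worker node.
echo "$hosts_sample" | awk '
  $2 ~ /^e-/ {print "edge:   " $2}
  $2 ~ /^m-/ {print "master: " $2}
  $2 ~ /^d-/ {print "worker: " $2}'
```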
    

Check Hadoop data storage

Once the data box has been created, you can see that the NAS you applied for importing/exporting files has been mounted. Once you have submitted the data supply request, you can see that the NAS containing the requested data is mounted as read-only.

  • NAS for file import/export requests: /mnt/nasw# ("#" is a number.)
    The NAS you applied for when creating the data box is mounted in a subdirectory under the edge node's "/mnt" directory. You can see it after accessing the Hadoop edge node through SSH and running the following command:

    $ df -h
    
    Note

    The NAS you applied for is mounted on both the Hadoop cluster and the Ncloud TensorFlow Server, so data can be shared between them. When you upload a file to your bucket and request a file import, the file is stored in the NAS. When you upload a file to the NAS and request a file export, the file is stored in your Object Storage bucket after a review.

  • Sample data: /user/ncp/sample
    The sample data is uploaded to the following HDFS after the data box has been created:

    $ hdfs dfs -ls /user/ncp/sample
    
  • NAS for provided data: /mnt/xx ("xx" is the data's mount directory.)
    Once the data is supplied, the provided data, including NAVER search, shopping, and AI data, is mounted in a subdirectory under the "/mnt" directory. You can see it after accessing the Hadoop edge node through SSH and running the following command:

    $ df -h
    
    Note
    • To use the data you requested, upload the necessary data to Hadoop yourself. For more information, see Upload the provided data to Hadoop cluster.
    • Before you upload new data to Hadoop, make sure that there is enough storage space. If there is not enough storage space, delete unnecessary data.
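
A free-space check like the one recommended above can be scripted. In the sketch below, MOUNT_POINT and the threshold are illustrative placeholders; on the edge node you would point MOUNT_POINT at the target path under /mnt, while /tmp is used here so the snippet runs anywhere.

```shell
# Sketch: verify free space before uploading new data.
MOUNT_POINT=/tmp   # placeholder; e.g., /mnt/nasw1 on the edge node
NEED_KB=1024       # illustrative minimum free space, in 1K blocks

# df -P prints POSIX-format output: one header line, then the filesystem
# line whose 4th field is the available space in 1K blocks.
avail_kb=$(df -P "$MOUNT_POINT" | awk 'NR==2 {print $4}')

if [ "$avail_kb" -ge "$NEED_KB" ]; then
  echo "OK: ${avail_kb} KB available on $MOUNT_POINT"
else
  echo "Not enough space on $MOUNT_POINT; delete unnecessary data first" >&2
fi
```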

5. Access and use Ncloud TensorFlow Server

You can access the TensorFlow CPU and TensorFlow GPU server through Chrome browser or PuTTY from the Connect Server.

Access Jupyter Notebook

To access Jupyter Notebook through Chrome browser from the Connect Server, follow these steps:

  1. Double-click the Chrome icon on the desktop of the Connect Server.
  2. Enter the following address and access Jupyter Notebook.
    • http://ServerIP:18888
    • pw: the password entered when creating the data box
    • You must use HTTP, not HTTPS, to access it. The first access may take some time to load.
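
Before opening the address in Chrome, you can confirm that something is answering on the Jupyter Notebook port. In the sketch below, SERVER_IP stands in for the Ncloud TensorFlow Server IP from the Infrastructure tab; a throwaway local HTTP server is started only so the snippet is self-contained and runnable anywhere.

```shell
# Stand-in server on the Jupyter port, purely for demonstration.
python3 -m http.server 18888 --bind 127.0.0.1 >/dev/null 2>&1 &
srv_pid=$!
sleep 1

SERVER_IP=127.0.0.1   # placeholder for the real Ncloud TensorFlow Server IP
# Fetch only the HTTP status code; 200 means the page is reachable.
status=$(curl -s -o /dev/null -w '%{http_code}' "http://${SERVER_IP}:18888/")
echo "HTTP status: ${status}"

kill "$srv_pid" 2>/dev/null
```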
Note

To see Ncloud TensorFlow Server IP, click the [Details] button on the data box page and go to the [Infrastructure] tab.

Check Ncloud TensorFlow Server data storage

To see the provided data storage, log in to Jupyter Notebook in Chrome. The storage directories appear on the Home screen.

① Additional Block Storage: /home/ncp/workspace/blocks
  The TensorFlow Server offers an additional 2 TB of Block Storage by default. We recommend migrating frequently used provided data from the NAS to Block Storage for better performance.
② NAS for file import/export requests: /home/ncp/workspace/nasw# ("#" is a number.)
  When you request a file import or export, the data is transferred through this NAS. Because the NAS you applied for when creating the data box is also mounted on the Hadoop cluster, its data can be shared directly.
③ Sample data: /home/ncp/workspace/sample
  The location where the sample data is placed when you create a data box. You can use the sample data to install necessary modules and configure the analysis environment.
④ Provided data (read-only): /home/ncp/workspace/xx ("xx" refers to the data's mount directory.)
  Once the data is supplied, the provided data, including NAVER search, shopping, and AI data, is mounted under Jupyter Notebook's home directory for your use. The requested data's NAS is provided as read-only.
Note

"/home/ncp/workspace" is the Jupyter Notebook's home directory.

Access server through SSH

To access Ncloud TensorFlow Server, follow these steps:

  1. Run PuTTY in the Connect Server and enter the access credentials.
    • Host Name: root@ServerIP
    • Port: 22
    • Connection type: SSH
  2. Click [Open].
Note

To see Ncloud TensorFlow Server IP, click the [Details] button on the data box page and go to the [Infrastructure] tab.

Restart Ncloud TensorFlow Server docker

To restart the TensorFlow docker container, enter one of the following commands. Then, restart Jupyter Notebook.

  • Restart TensorFlow CPU
    docker restart tf-server-mkl  
    
  • Restart TensorFlow GPU
    docker restart tf-server-gpu
    

Restart Jupyter Notebook

To restart Jupyter Notebook, enter the following commands:

  $ jup restart

Alternatively, stop and then start it:

  $ jup stop
  $ jup start