Configure data box access


Available in VPC

Once your data box is created, you will be notified by email. After receiving the notification, access the NAVER Cloud Platform console and complete the SSL VPN user configuration. Then, access the infrastructure services with the data box access credentials, review the sample data, and install the modules you need for analysis. Because all external network connections are blocked once a data supply request has been made, install the necessary modules and download essential data before submitting your data supply request. After external network connections are blocked, files can only be imported by uploading them to an Object Storage bucket and requesting a file import.

1. Configure SSL VPN user

You must complete the SSL VPN user configuration before you can start using the data box. To configure an SSL VPN user, follow these steps:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space.
  2. Go to the tab of the data box you want to configure and click the [Configure SSL VPN user] button.
  3. Check the number of users that can be registered. Enter the user name, password, email address, and mobile phone number to be used for SMS authentication, then click the [Add] button.
  4. When the user configuration is completed, click the [Close] button.
Note

For more information on how to change the number of SSL VPN user accounts, how to delete an account, and how to reset the password, see Databox management.

2. Check the access information of infrastructure services

In this stage, you check the access details of the data box's infrastructure services. To check the access details, follow these steps:

  1. In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space.
  2. Select the data box you created and click the [View server details] button.
  3. Go to the Infrastructure tab and review the IPs and the IDs assigned to each product.
    • Click the expand button in the Cloud Hadoop or TensorFlow row to see more details.

3. Access SSL VPN and Connect Server

To access the data box's infrastructure services, you must first connect to the SSL VPN and then to the Connect Server.

Caution
  • Running SSL VPN Agent while another VPN connection is enabled may result in a crash. Make sure that the other VPN connection is disabled before you run SSL VPN Agent.

To access SSL VPN and the Connect Server, follow these steps:

  1. Install SSL VPN Agent.
    • For more information on how to install the SSL VPN Agent, see Install SSL VPN Agent in the SSL VPN user's guide (VPC).
  2. Run the BIG-IP Edge Client.
    • For more information on how to access the BIG-IP Edge Client, see Access SSL VPN Agent in the SSL VPN user's guide (VPC).
    • Server address: https://sslvpn-kr-vpc-01.ncloud.com
  3. Enter the user name and the password registered during the stage described in 1. Configure SSL VPN user and click the [Log on] button.
  4. Enter the OTP code sent to your mobile phone or email account and click the [Log on] button.
  5. Because Connect Server is a Windows server, you must use Remote Desktop Connection on your PC to access it. Enter the Connect Server's IP address, click the [Connect] button, and then enter the user name and password.
    • If you forgot the password for Connect Server, Ncloud TensorFlow Server, or the Hadoop cluster, or if you received a password reset email, go to the Cloud Data Box > My Space > Details screen and click the [Reset password] button to update your password.
Note

Once the data supply request has been made, all external network connections are blocked, leaving you unable to install a module using a command such as "pip install." To install a module, you must download the installation file and import it to your data box through the "file import" request process. As such, we encourage you to write the analysis code using the sample data and install all the necessary modules before you submit the data supply request.
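The two-step workflow described above can be sketched as follows. The package name (`requests`) and directory are illustrative placeholders, and the download step is guarded behind a flag so the snippet is safe to run without network access; the final install command is only printed here, since it is meant to be run inside the data box after the wheels have been imported.

```shell
# Sketch of the offline-module workflow (placeholders: PKG, WHEEL_DIR).
PKG=requests
WHEEL_DIR=./offline-pkgs

# Step 1 (on an internet-connected machine, BEFORE the data supply request):
# fetch the package and its dependencies as wheel files.
if [ "${HAVE_INTERNET:-0}" = "1" ]; then
  pip download "$PKG" -d "$WHEEL_DIR"
fi

# Step 2 (inside the data box, after importing WHEEL_DIR via a file import
# request): install strictly from the local directory, never contacting PyPI.
echo "pip install --no-index --find-links=$WHEEL_DIR $PKG"
```

The `--no-index` flag forbids pip from contacting the package index, and `--find-links` points it at the directory of uploaded wheel files instead.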

4. Access and use Cloud Hadoop server

You can access the Hadoop cluster through the Chrome browser or the PuTTY program installed on the Connect Server.

Note

Cloud Hadoop gives users direct access to the server and to the management tool (Ambari) so that they can manage the cluster themselves. This guide only covers how to access the Hadoop cluster. For more information on using Cloud Hadoop, see the Cloud Hadoop user guides.

Access cluster node through SSH

To access Hadoop edge node through SSH using a PPK file, follow these steps:

  1. Run PuTTY in the Connect Server and enter the access credentials.

    • Host Name: sshuser@HadoopEdgeNodeIP
    • Port: 22
    • Connection type: SSH
    Note

    You can view the Hadoop edge node's IP in the Infrastructure information section on the NAVER Cloud Platform console.

  2. Click Connection > SSH > Auth. Then, click the [Browse] button and select the PPK file.

  3. To access Hadoop cluster edge node, click the [Open] button.

  4. To list all of the nodes, access the Hadoop cluster and enter the following command:

    $ cat /etc/hosts 
    
    • Edge node: starts with e-001.
    • Master node: starts with m-001, m-002.
    • Worker node: starts with d-001, numbered up to the number of worker nodes created
  5. To access a master node or a worker node, access the edge node and enter the following command. For the master or worker node name, enter m-00# or d-00# (# is the node number) and press the Tab key to autocomplete the full name. When asked to confirm the host's authenticity, enter yes.
    To move to another node, enter exit to return to the edge node first, and then connect to the next node.

    • "m-001-xxx" is the name of the node viewed in the above example.
    $ ssh sshuser@m-001-xxx 
    $ ...
    $ exit
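
The naming convention above can be checked with a quick filter over the /etc/hosts output. The snippet below uses an illustrative excerpt (the IPs and hostnames are placeholders, not real cluster output) so that it is self-contained; on the edge node you would pipe `cat /etc/hosts` instead.

```shell
# Illustrative /etc/hosts excerpt (placeholder hostnames).
hosts_sample='10.0.1.10 e-001-abc
10.0.1.11 m-001-abc
10.0.1.12 m-002-abc
10.0.1.20 d-001-abc
10.0.1.21 d-002-abc'

# Classify each host by the prefix convention described above:
# e- = edge node, m- = master node, d- = worker node.
echo "$hosts_sample" | awk '
  $2 ~ /^e-/ {print "edge:   " $2}
  $2 ~ /^m-/ {print "master: " $2}
  $2 ~ /^d-/ {print "worker: " $2}'
```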
    

Check Hadoop data storage

Once the data box has been created, you can see that the NAS you applied for importing/exporting files has been mounted. Once you have submitted the data supply request, you can see that the NAS containing the requested data is mounted as read-only.

  • NAS for file import/export requests: /mnt/nasw# ("#" is a number.)
    The NAS you applied for when creating the data box is mounted in a subdirectory under the edge node's "/mnt" directory. You can see it after accessing the Hadoop edge node through SSH and running the following command:

    $ df -h
    
    Note

    The NAS you applied for is mounted on both the Hadoop cluster and the Ncloud TensorFlow Server, so data can be shared between them. When you upload a file to your bucket and request a file import, the file is stored in the NAS. When you upload a file to the NAS and request a file export, the file is stored in your Object Storage bucket after a review.

  • Sample data: /user/ncp/sample
    The sample data is uploaded to the following HDFS after the data box has been created:

    $ hdfs dfs -ls /user/ncp/sample
    
  • NAS for provided data: /mnt/xx ("xx" is the data's mount directory.)
    Once the data is supplied, the provided data, including NAVER search, shopping, and AI data, is mounted in a subdirectory under the "/mnt" directory. You can see it after accessing the Hadoop edge node through SSH and running the following command:

    $ df -h
    
    Note
    • To use the data you requested, upload the necessary data to Hadoop yourself. For more information, see Upload the provided data to Hadoop cluster.
    • Before you upload new data to Hadoop, make sure that there is enough storage space. If there is not enough storage space, delete unnecessary data.
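
A free-space check like the one recommended above can be scripted. In the sketch below, MOUNT_POINT and the threshold are illustrative placeholders; on the edge node you would point MOUNT_POINT at the target path under /mnt, while /tmp is used here so the snippet runs anywhere.

```shell
# Sketch: verify free space before uploading new data.
MOUNT_POINT=/tmp   # placeholder; e.g., /mnt/nasw1 on the edge node
NEED_KB=1024       # illustrative minimum free space, in 1K blocks

# df -P prints POSIX-format output: one header line, then the filesystem
# line whose 4th field is the available space in 1K blocks.
avail_kb=$(df -P "$MOUNT_POINT" | awk 'NR==2 {print $4}')

if [ "$avail_kb" -ge "$NEED_KB" ]; then
  echo "OK: ${avail_kb} KB available on $MOUNT_POINT"
else
  echo "Not enough space on $MOUNT_POINT; delete unnecessary data first" >&2
fi
```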

5. Access and use Ncloud TensorFlow Server

You can access the TensorFlow CPU and TensorFlow GPU server through Chrome browser or PuTTY from the Connect Server.

Access Jupyter Notebook

To access Jupyter Notebook through Chrome browser from the Connect Server, follow these steps:

  1. Double-click the Chrome icon on the desktop of the Connect Server.
  2. Enter the following address and access Jupyter Notebook.
    • http://ServerIP:18888
    • pw: the password entered when creating the data box
    • You must use HTTP, not HTTPS, to access it. The first access may take some time to load.
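
Before opening the address in Chrome, you can confirm that something is answering on the Jupyter Notebook port. In the sketch below, SERVER_IP stands in for the Ncloud TensorFlow Server IP from the Infrastructure tab; a throwaway local HTTP server is started only so the snippet is self-contained and runnable anywhere.

```shell
# Stand-in server on the Jupyter port, purely for demonstration.
python3 -m http.server 18888 --bind 127.0.0.1 >/dev/null 2>&1 &
srv_pid=$!
sleep 1

SERVER_IP=127.0.0.1   # placeholder for the real Ncloud TensorFlow Server IP
# Fetch only the HTTP status code; 200 means the page is reachable.
status=$(curl -s -o /dev/null -w '%{http_code}' "http://${SERVER_IP}:18888/")
echo "HTTP status: ${status}"

kill "$srv_pid" 2>/dev/null
```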
Note

To see Ncloud TensorFlow Server IP, click the [Details] button on the data box page and go to the [Infrastructure] tab.

Check Ncloud TensorFlow Server data storage

To see the provided data storage, log in to Jupyter Notebook in Chrome. The storage directories appear on the Home screen.

① Additional Block Storage: /home/ncp/workspace/blocks
  The TensorFlow Server offers an additional 2 TB of Block Storage by default. We recommend migrating frequently used provided data from the NAS to Block Storage for better performance.
② NAS for file import/export requests: /home/ncp/workspace/nasw# ("#" is a number.)
  When you request a file import or export, the data is transferred through this NAS. Because the NAS you applied for when creating the data box is also mounted on the Hadoop cluster, its data can be shared directly.
③ Sample data: /home/ncp/workspace/sample
  The location where the sample data is placed when you create a data box. You can use the sample data to install necessary modules and configure the analysis environment.
④ Provided data (read-only): /home/ncp/workspace/xx ("xx" refers to the data's mount directory.)
  Once the data is supplied, the provided data, including NAVER search, shopping, and AI data, is mounted under Jupyter Notebook's home directory for your use. The requested data's NAS is provided as read-only.
Note

"/home/ncp/workspace" is the Jupyter Notebook's home directory.

Access server through SSH

To access Ncloud TensorFlow Server, follow these steps:

  1. Run PuTTY in the Connect Server and enter the access credentials.
    • Host Name: root@ServerIP
    • Port: 22
    • Connection type: SSH
  2. Click [Open].
Note

To see Ncloud TensorFlow Server IP, click the [Details] button on the data box page and go to the [Infrastructure] tab.

Restart Ncloud TensorFlow Server docker

To restart the TensorFlow docker container, enter one of the following commands. Then, restart Jupyter Notebook.

  • Restart TensorFlow CPU
    docker restart tf-server-mkl  
    
  • Restart TensorFlow GPU
    docker restart tf-server-gpu
    

Restart Jupyter Notebook

To restart Jupyter Notebook, enter the following commands:

  $ jup restart

Alternatively, stop and then start it:

  $ jup stop
  $ jup start