Available in VPC
1. Access Box
To analyze the target data, the analyst accesses the Box using the received Box information. This guide explains how to access the Box and the analysis environment inside it.
An SSL VPN connection is required before accessing the Box. For more information on SSL VPN, see Configure SSL-VPN.
Access Connect Server
To use the Cloud Hadoop and TensorFlow environments installed in the Box, you must access the Connect Server. This guide describes how to access the Connect Server.
- Since the Connect Server runs Windows Server, use Remote Desktop on your PC to access it. Enter the Connect Server's IP address, click the [Connect] button, and then enter the user name and password.
- If you forget your Connect Server password, or an invalid password is entered 5 times, the administrator can reset the password from the SSL VPN user settings page.
Check data storage
Every Connect Server has the NAS mounted on the same drive letters, assigned by the following rules:
- NAS for file import/export requests: mounted onto the F, E, B, and A drives, in mounting order.
- NAS with provided data: mounted read-only onto the Z through G drives, in mounting order.
Cautions when using Connect Server NAS
NAS use in the Connect Server is currently limited. Thoroughly review the following items before use:
Mismatch of character encoding method
The NAS provided by Data Box Frame uses the NFS protocol, which is designed for Linux servers. The following issues can occur because Linux and the Windows operating system (OS) use different character encodings:
- When a file is given a Korean name, the name may display incorrectly depending on the OS.
- When a file with a Korean name is created on the NAS from Windows, the file name appears in an unrecognizable format in both Windows and Linux.
- When a file with a Korean name is created on the NAS from Linux, the file name appears in an unrecognizable format in Windows.
- When an imported file has a Korean name, the file name appears in an unrecognizable format in Windows.
- When a file is exported, the file name shown in Linux is used as the standard. A Korean file name created in Windows appears in an unrecognizable format in Linux, so the file cannot be exported.
- The difference in newline characters between Windows and Linux can cause file content to display differently on each OS.
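The newline mismatch above can be reproduced and fixed from any Linux shell. A minimal sketch (the file names are illustrative):

```shell
# Windows tools write CRLF (\r\n) line endings; Linux tools expect LF (\n).
printf 'line1\r\nline2\r\n' > crlf.txt   # Windows-style endings

# Each CRLF line carries one extra byte compared with LF
wc -c < crlf.txt    # 14 bytes

# Strip the carriage returns to normalize to Linux-style endings
tr -d '\r' < crlf.txt > lf.txt
wc -c < lf.txt      # 12 bytes
```

Normalizing files this way before moving them between the Connect Server and Linux nodes avoids content appearing differently on each OS.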
Actions usable for files and folders
The actions available for files and folders on the NAS in the Connect Server are as follows:
File
| Action | GUI (File Explorer) | Command prompt |
|---|---|---|
| Creation | O (unable to specify name) | O |
| Read | O | O |
| Copy | O | O |
| Move | O | O |
| Delete | O | O |
| Edit name | X | O |
| Edit file content | O | - |
Folder
| Action | GUI (File Explorer) | Command prompt |
|---|---|---|
| Creation | O (unable to specify name) | O |
| Copy | O | O |
| Move | O | X |
| Delete | O | O |
| Edit name | X | X |
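As the tables show, folder renaming is blocked in both the GUI and the command prompt, while copy and delete work. A rename can therefore be emulated as copy-then-delete; a minimal sketch in POSIX shell syntax (directory names are illustrative):

```shell
# Folder renaming is not supported on this NAS, so emulate it:
# copy the folder under the new name, then remove the original.
mkdir -p old_name
echo 'sample' > old_name/data.txt

cp -r old_name new_name   # copy under the new name
rm -r old_name            # remove the original after verifying the copy
```

On the Connect Server's Windows command prompt, the corresponding built-ins are `xcopy /E` for the copy and `rd /S /Q` for the delete.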
Use and access Cloud Hadoop
You can access the Hadoop cluster through the Chrome browser or the PuTTY program installed on the Connect Server.
Cloud Hadoop gives the user direct access both to the servers and to the management tool (Ambari), so the user can manage the cluster directly. This guide only covers how to access the Hadoop cluster. For more information on using Cloud Hadoop, see the Cloud Hadoop user guides.
PEM key conversion
To access a Hadoop cluster node using PuTTY, you must first convert the provided PEM file to PPK format.
To convert a PEM file to PPK format, follow these steps:
- Enter puttygen in Search Windows at the bottom of the screen in the Connect Server, and run PuTTY Key Generator.
- Check that "RSA" is selected under "Type of key to generate," and click the [Load] button.
- Select "All Files (*.*)" as the file type, and select the provided PEM file in "C:\Users\Public\Desktop."
- Click the [Save private key] button.
- The key is saved in a format that PuTTY can use.
If the PEM file is not visible in the Connect Server, select View > Hidden items in File Explorer to show hidden files.
Access cluster node through SSH
To access Hadoop edge node through SSH using a PPK file, follow these steps:
- Run PuTTY in the Connect Server and enter the access credentials.
- Host Name: sshuser@HadoopEdgeNodeIP
- Port : 22
- Connection type : SSH
You can view the Hadoop edge node's IP in the Infrastructure information section on the NAVER Cloud Platform console.
- Click Connection > SSH > Auth in order. Then, click the [Browse] button and select the PPK file.
- For more information on how to create a PPK file, see PEM key conversion.
- Click the [Open] button to access the Hadoop cluster edge node.
- To see all the nodes, access the Hadoop cluster and enter the following command:
$ cat /etc/hosts
- Edge node: the name starts with e-001.
- Master node: the name starts with m-001 or m-002.
- Worker node: the names run from d-001 up to the number of created worker nodes.
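The node-naming rules above can be filtered out of /etc/hosts with grep. A sketch against sample output (the IP addresses and the -xxx name suffixes are hypothetical):

```shell
# Sample of what `cat /etc/hosts` might show on a cluster node;
# the addresses and -xxx suffixes below are hypothetical.
cat > hosts.sample <<'EOF'
127.0.0.1  localhost
10.0.0.4   e-001-xxx
10.0.0.5   m-001-xxx
10.0.0.6   m-002-xxx
10.0.0.7   d-001-xxx
10.0.0.8   d-002-xxx
EOF

# Count each node type using the naming rules above
grep -c ' m-00' hosts.sample   # number of master nodes: 2
grep -c ' d-00' hosts.sample   # number of worker nodes: 2
```

Dropping the `-c` flag prints the matching lines instead, which is a quick way to read off the exact node names to use with ssh.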
- To access a master node or a worker node, access the edge node and enter the following commands. For the master or worker node name, enter m-00# or d-00# (where # is the node's number) and press the Tab key to autocomplete. Enter yes at the access confirmation when connecting to a node other than the edge node.
To access yet another node, enter exit to return to the edge node first, and then connect to the next node.
- m-001-xxx is the node name checked above.
$ ssh sshuser@m-001-xxx
$ ...
$ exit
Check data storage
You can check that the NAS requested for file import/export is mounted within the Box, and that the NAS with the requested data is mounted read-only once the data supply request is completed.
- NAS for file import/export requests: /mnt/nasw*
- This NAS is mounted under the /mnt directory of the edge node. Access the Hadoop edge node through SSH and check it with the following command:
$ df -h
- Provided data NAS: /mnt/nasr/pub*
- Data provided by Data Box Frame is mounted read-only under the /mnt directory after the data supply. Access the Hadoop edge node through SSH and check it with the following command:
$ df -h
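In the `df -h` output, the two NAS types can be told apart by their mount points. A sketch against sample output (the filesystem names, sizes, and export paths are hypothetical):

```shell
# Sample `df -h` output resembling what the edge node might show;
# the filesystem names, sizes, and export paths are hypothetical.
cat > df.sample <<'EOF'
Filesystem          Size  Used Avail Use% Mounted on
/dev/xvda1           50G   12G   38G  24% /
10.0.1.10:/vol0001  500G   10G  490G   2% /mnt/nasw1
10.0.1.11:/vol0002  300G  290G   10G  97% /mnt/nasr/pub1
EOF

# Import/export NAS mounts appear under /mnt/nasw*
grep ' /mnt/nasw' df.sample

# Provided (read-only) data NAS mounts appear under /mnt/nasr/pub*
grep ' /mnt/nasr/pub' df.sample
```

Running the same greps against live `df -h` output on the edge node confirms whether each requested NAS has actually been mounted.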
Use and access TensorFlow
You can access the TensorFlow CPU and TensorFlow GPU servers through the Chrome browser or PuTTY from the Connect Server.
Access Jupyter Notebook
To access Jupyter Notebook through Chrome browser from the Connect Server, follow these steps:
- Double-click the Chrome icon on the Connect Server desktop.
- Enter the following address to access Jupyter Notebook:
- http://ServerIP:18888
- Password: the password entered when creating the Data Box
- You must use HTTP to access it. The first access may take some time to load.
Access the server through SSH
To access Ncloud TensorFlow Server, follow these steps:
- Run PuTTY in the Connect Server and enter the access credentials.
- Host Name: root@ServerIP
- Port: 22
- Connection type: SSH
- Click [Open].
Restart Ncloud TensorFlow Server docker
If you need to restart the TensorFlow Docker container, enter the applicable command below, and then restart Jupyter Notebook:
- Restart TensorFlow CPU:
docker restart tf-server-mkl
- Restart TensorFlow GPU:
docker restart tf-server-gpu
Restart Jupyter Notebook
To restart Jupyter Notebook, enter the following command:
jup restart
Alternatively, run jup stop, and then jup start.
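The `jup` command is the Box's own wrapper. As a rough illustration of the restart behavior described above (restart = stop followed by start), here is a hypothetical sketch; it is not the actual tool:

```shell
# Hypothetical sketch of a jup-style wrapper; illustrative only.
jup_demo() {
    case "$1" in
        stop)    echo "stopping jupyter" ;;   # the real tool stops the notebook server
        start)   echo "starting jupyter" ;;   # the real tool starts it again
        restart) jup_demo stop; jup_demo start ;;
        *)       echo "usage: jup_demo {start|stop|restart}"; return 1 ;;
    esac
}

jup_demo restart | tee jup.log   # restart performs stop, then start
```

The point of the sketch is simply that `jup restart` and the `jup stop` / `jup start` pair in the guide are equivalent.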
2. Shared data analysis
You can analyze the existing data in the shared NAS provided by the Data Box Frame administrator right away, or load it onto HDFS for use. The shared data is located in the following directory:
- /mnt/nasr/pub*
- If the shared data does not exist, ask the Data Box Frame administrator to check the shared data view status.
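Before starting an analysis, it helps to confirm the shared data directory exists and is non-empty. A sketch of that check (SHARED_DIR here is a local stand-in; on the cluster it would be one of the /mnt/nasr/pub* mounts, and the file name is hypothetical):

```shell
# Pre-analysis check: confirm the shared data directory is present and
# non-empty. SHARED_DIR is a local stand-in for a /mnt/nasr/pub* mount.
SHARED_DIR="demo_pub"
mkdir -p "$SHARED_DIR"
touch "$SHARED_DIR/shared_dataset.csv"   # hypothetical shared file

if [ -n "$(ls -A "$SHARED_DIR")" ]; then
    echo "shared data available"
else
    echo "no shared data; ask the Data Box Frame administrator to check the view status"
fi
```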
3. Save analysis results
You can export results from the data inside the Box to the outside through a separate storage. The separate storage is located in the following directory: