Data analysis of each Box


Available in VPC

1. Access Box

To analyze the target data, the analyst accesses the Box using the Box information they received. This section explains how to access the Box and the analysis environment within it.

Note

The analyst needs an SSL VPN connection before accessing the Box. For more information on SSL VPN, see Configure SSL-VPN.

Access Connect Server

To use the Cloud Hadoop and TensorFlow environments installed in the Box, you must first access the Connect Server. This section describes how to access the Connect Server.

  1. Because the Connect Server is a Windows Server, access it with Remote Desktop from your PC. Enter the Connect Server's IP address, click the [Connect] button, and then enter the username and password.
  2. If you forget your Connect Server password, or an invalid password is entered 5 times, the administrator can reset the password from the SSL VPN user settings page.
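As an aside, the Remote Desktop client can also be launched from a command line on the analyst's Windows PC. A minimal sketch; the IP address below is a placeholder, and the command string is only printed (not executed) so the snippet is self-contained:

```shell
# Placeholder IP for illustration only; use the Connect Server IP you received
CONNECT_SERVER_IP="10.0.0.5"

# mstsc is the built-in Windows Remote Desktop client; /v: sets the target host.
# Constructing and printing the command rather than running it:
RDP_CMD="mstsc /v:${CONNECT_SERVER_IP}"
echo "${RDP_CMD}"
```

Running the printed command in cmd.exe or PowerShell opens the Remote Desktop connection dialog pre-filled with the target address.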

Check data storage

Every Connect Server has NAS mounted on the same drives, and the drive locations follow these rules:

  • NAS for file import/export requests: mounted onto the F, E, B, and A drives, in that order.
  • NAS with provided data: mounted read-only onto drives Z through G, in that order.

Cautions when using Connect Server NAS

Caution

NAS use in the Connect Server currently has limitations. Review the following items thoroughly before use:

Character encoding mismatch

NAS provided by Data Box Frame uses the NFS protocol, which is designed for Linux servers. The following issues can occur because the Linux and Windows operating systems (OS) use different character encodings:

  • When a file name contains Korean characters, it can appear garbled on each OS.
    • A file created on the NAS from Windows with a Korean name appears garbled in both Windows and Linux.
    • A file created on the NAS from Linux with a Korean name appears garbled in Windows.
  • When an imported file has a Korean name, the name appears garbled in Windows.
  • When a file is exported, the file name as it appears in Linux is used as the standard. A file created in Windows with a Korean name appears garbled in Linux, so it cannot be exported.
  • The difference in newline characters between Windows (CRLF) and Linux (LF) can make the same file content appear differently on each OS.
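The newline issue in the last bullet can be detected and fixed on the Linux side before processing. A minimal sketch, using illustrative file names:

```shell
# Simulate a text file created on Windows (CRLF line endings)
printf 'line1\r\nline2\r\n' > sample_windows.txt

# Remove the carriage returns (roughly what the dos2unix tool does)
tr -d '\r' < sample_windows.txt > sample_unix.txt

# Verify: the converted file should contain no CR characters
if grep -q $'\r' sample_unix.txt; then
  echo "still CRLF"
else
  echo "converted to LF"
fi
```

Normalizing line endings this way before loading files into Linux-side tools avoids the mismatched-content symptom described above.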

Available actions for files and folders

The actions available for files and folders on the NAS in the Connect Server are as follows:

File

Action             GUI (File Explorer)         Command prompt
Creation           O (unable to specify name)  O
Read               O                           O
Copy               O                           O
Move               O                           O
Delete             O                           O
Edit name          X                           O
Edit file content  O                           -

Folder

Action     GUI (File Explorer)         Command prompt
Creation   O (unable to specify name)  O
Copy       O                           O
Move       O                           X
Delete     O                           O
Edit name  X                           X

Use and access Cloud Hadoop

You can access the Hadoop cluster through the Chrome browser or the PuTTY program installed on the Connect Server.

Note

Cloud Hadoop gives the user direct access to the servers and to the management tool (Ambari), so the user can manage the cluster directly. This guide only covers how to access the Hadoop cluster. For more information on using Cloud Hadoop, see the Cloud Hadoop user guides.

PEM key conversion

To access a Hadoop cluster node using PuTTY, you must convert the provided PEM file to PPK format.
To convert a PEM file to PPK format, follow these steps:

  1. Enter puttygen in the Windows search box at the bottom of the Connect Server screen, and run PuTTY Key Generator.
  2. Check if "RSA" is selected in "Type of key to generate" and click the [Load] button.
  3. Select "All Files (*.*)", then select the provided PEM file in "C:\Users\Public\Desktop."
  4. Click the [Save private key] button.
  5. The key is saved in PPK format, which PuTTY can use.
Note

If the PEM file is not visible in the Connect Server, select Hidden items in the View menu of File Explorer to show hidden files.

Access cluster node through SSH

To access Hadoop edge node through SSH using a PPK file, follow these steps:

  • Run PuTTY on the Connect Server and enter the access credentials.
    • Host Name: sshuser@HadoopEdgeNodeIP
    • Port: 22
    • Connection type: SSH
Note

You can view the Hadoop edge node's IP in the Infrastructure information section on the NAVER Cloud Platform console.

  • Click Connection > SSH > Auth in order. Then, click the [Browse] button and select the PPK file.

    • For more information on how to create a PPK file, see PEM key conversion.
  • To access the Hadoop cluster edge node, click the [Open] button.

  • To see all the nodes, access the Hadoop cluster and enter the following command:

    $ cat /etc/hosts 
    
    • Edge node: starts with e-001.
    • Master nodes: start with m-001 and m-002.
    • Worker nodes: d-001 through the number of created worker nodes.
  • To access a master node or a worker node, first access the edge node, and then enter the following commands. For the master or worker node name, type m-00# or d-00# (# is the node's number) and press the Tab key to autocomplete. Enter yes at the host key confirmation when accessing a node other than the edge node for the first time.
    To move to another node, enter exit to return to the edge node first, and then connect to the other node.

    # m-001-xxx is a node name from the list viewed above
    $ ssh sshuser@m-001-xxx
    $ ...
    $ exit
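The node-name conventions above can be used to group the hosts file output by role. A minimal sketch, using made-up sample entries in place of real cluster output (real node names carry cluster-specific suffixes):

```shell
# Sample /etc/hosts-style entries; addresses and suffixes are illustrative
hosts_sample='10.0.1.6 e-001-example
10.0.1.7 m-001-example
10.0.1.8 m-002-example
10.0.1.9 d-001-example
10.0.1.10 d-002-example'

# Group hosts by role using the name prefixes described above
echo "$hosts_sample" | awk '$2 ~ /^e-/ {print "edge:  ", $2}
                            $2 ~ /^m-/ {print "master:", $2}
                            $2 ~ /^d-/ {print "worker:", $2}'
```

On the real edge node, piping `cat /etc/hosts` through the same awk filter lists each node with its role at a glance.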

Check data storage

When importing or exporting files within the Box, you can check that the requested NAS is mounted. After a data supply request is completed, you can also check that the NAS with the requested data is mounted read-only.

  • NAS used for requesting file import/export: /mnt/nasw*

    • The NAS used for file import/export requests is mounted under the /mnt directory of the edge node. Access the Hadoop edge node through SSH and check it as follows:
    $ df -h
    
  • Provided data NAS: /mnt/nasr/pub*

    • Data provided by Data Box Frame is mounted read-only under the /mnt directory after data supply. Access the Hadoop edge node through SSH and check it as follows:
    $ df -h
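Since both NAS types land under /mnt, the `df -h` output can be filtered down to just the NAS mounts. A minimal sketch against sample df output (device names and sizes are made up):

```shell
# Sample df -h output; real device addresses and sizes will differ
df_sample='Filesystem          Size  Used Avail Use% Mounted on
/dev/xvda1           50G   12G   38G  24% /
nas.example:/nasw1  500G  100G  400G  20% /mnt/nasw1
nas.example:/pub1   1.0T  800G  200G  80% /mnt/nasr/pub1'

# Keep only rows whose mount point (last field) is under /mnt/nas
echo "$df_sample" | awk '$NF ~ /^\/mnt\/nas/ {print $NF}'
```

On the edge node, `df -h | awk '$NF ~ /^\/mnt\/nas/'` shows the same rows with sizes, which is a quick way to confirm both the import/export NAS and the provided-data NAS are mounted.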
    

Use and access TensorFlow

You can access the TensorFlow CPU and TensorFlow GPU servers through the Chrome browser or PuTTY on the Connect Server.

Access Jupyter Notebook

To access Jupyter Notebook through Chrome browser from the Connect Server, follow these steps:

  1. Double-click the Chrome icon on the desktop of the Connect Server.
  2. Enter the following address to access Jupyter Notebook.
    • http://ServerIP:18888
    • Password: the password entered when creating the Data Box
    • You must use HTTP (not HTTPS) to access it. The first access may take some time to load.

Access the server through SSH

To access Ncloud TensorFlow Server, follow these steps:

  1. Run PuTTY on the Connect Server and enter the access credentials.
    • Host Name: root@ServerIP
    • Port: 22
    • Connection type: SSH
  2. Click [Open].

Restart Ncloud TensorFlow Server Docker

To restart the TensorFlow Docker container, enter one of the following commands, and then restart Jupyter Notebook:

  • Restart TensorFlow CPU
    docker restart tf-server-mkl  
    
  • Restart TensorFlow GPU
    docker restart tf-server-gpu
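The two restart commands can be wrapped in a small helper that picks the container by server type. A sketch only: the `restart_tf` function and `DRY_RUN` switch are illustrative (with `DRY_RUN=1`, the default here, the command is printed rather than executed); the container names come from the guide above.

```shell
# Restart the TensorFlow container for the given server type (cpu or gpu).
restart_tf() {
  case "$1" in
    cpu) container="tf-server-mkl" ;;
    gpu) container="tf-server-gpu" ;;
    *)   echo "usage: restart_tf cpu|gpu" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the command that would be executed
    echo "docker restart ${container}"
  else
    docker restart "${container}"
  fi
}

restart_tf cpu
restart_tf gpu
```

On the real server, run it with `DRY_RUN=0` so the `docker restart` actually executes, then restart Jupyter Notebook as described below.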
    

Restart Jupyter Notebook

To restart Jupyter Notebook, enter the following commands:

  • Restart directly:
    $ jup restart
  • Or stop and then start:
    $ jup stop
    $ jup start

2. Shared data analysis

You can immediately analyze the existing data in the shared NAS provided by the Data Box Frame administrator, or load it onto HDFS and use it there. The shared data is located in the following directory:

  • /mnt/nasr/pub*

Note

  • If the shared data does not exist, ask the Data Box Frame administrator to check the shared data's view status.
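Whether the shared NAS really is mounted read-only can be confirmed from the mount options. A sketch against a sample /proc/mounts-style line (the device address is made up):

```shell
# Sample /proc/mounts-style entry for the shared NAS; real values will differ
mounts_sample='nas.example:/pub1 /mnt/nasr/pub1 nfs ro,relatime 0 0'

# Report read-only vs read-write for mounts under /mnt/nasr:
# field 2 is the mount point, field 4 the comma-separated mount options
echo "$mounts_sample" | awk '$2 ~ /^\/mnt\/nasr/ {
  print $2, ($4 ~ /(^|,)ro(,|$)/ ? "read-only" : "read-write")
}'
```

On the edge node, running the same awk filter over the real `/proc/mounts` shows whether each shared-data mount carries the `ro` option.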

3. Save analysis results

You can export analysis results from inside the Box to the outside through separate storage. The separate storage is located in the following directory: