Data analysis of each Box


Available in VPC

1. Access Box

To analyze the target data, the analyst accesses the Box using the Box information they received. This section explains how to access the Box and the analysis environment within it.

Note

The analyst needs an SSL VPN connection before accessing the Box. For more information on SSL VPN, see Configure SSL-VPN.

Access Connect Server

To use the Cloud Hadoop and TensorFlow environments installed in the Box, you must first access the Connect Server. This section describes how to access the Connect Server.

  1. Because the Connect Server is a Windows Server, access it with Remote Desktop from your PC. Enter the Connect Server's IP address, click the [Connect] button, and then enter the username and password.
  2. If you forget your Connect Server password, or an invalid password is entered 5 times, the administrator can reset the password from the SSL VPN user settings page.
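As an aside, the Remote Desktop client can also be launched from a command line on the analyst's Windows PC. A minimal sketch; the IP address below is a placeholder, and the command string is only printed (not executed) so the snippet is self-contained:

```shell
# Placeholder IP for illustration only; use the Connect Server IP you received
CONNECT_SERVER_IP="10.0.0.5"

# mstsc is the built-in Windows Remote Desktop client; /v: sets the target host.
# Constructing and printing the command rather than running it:
RDP_CMD="mstsc /v:${CONNECT_SERVER_IP}"
echo "${RDP_CMD}"
```

Running the printed command in cmd.exe or PowerShell opens the Remote Desktop connection dialog pre-filled with the target address.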

Check data storage

Every Connect Server has NAS mounted on the same drives, and the drive locations follow these rules:

  • NAS for file import/export requests: mounted onto the F, E, B, and A drives, in that order.
  • NAS with provided data: mounted read-only onto drives Z through G, in that order.

Cautions when using Connect Server NAS

Caution

NAS use in the Connect Server currently has limitations. Review the following items thoroughly before use:

Character encoding mismatch

NAS provided by Data Box Frame uses the NFS protocol, which is designed for Linux servers. The following issues can occur because the Linux and Windows operating systems (OS) use different character encodings:

  • When a file name contains Korean characters, it can appear garbled on each OS.
    • A file created on the NAS from Windows with a Korean name appears garbled in both Windows and Linux.
    • A file created on the NAS from Linux with a Korean name appears garbled in Windows.
  • When an imported file has a Korean name, the name appears garbled in Windows.
  • When a file is exported, the file name as it appears in Linux is used as the standard. A file created in Windows with a Korean name appears garbled in Linux, so it cannot be exported.
  • The difference in newline characters between Windows (CRLF) and Linux (LF) can make the same file content appear differently on each OS.
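The newline issue in the last bullet can be detected and fixed on the Linux side before processing. A minimal sketch, using illustrative file names:

```shell
# Simulate a text file created on Windows (CRLF line endings)
printf 'line1\r\nline2\r\n' > sample_windows.txt

# Remove the carriage returns (roughly what the dos2unix tool does)
tr -d '\r' < sample_windows.txt > sample_unix.txt

# Verify: the converted file should contain no CR characters
if grep -q $'\r' sample_unix.txt; then
  echo "still CRLF"
else
  echo "converted to LF"
fi
```

Normalizing line endings this way before loading files into Linux-side tools avoids the mismatched-content symptom described above.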

Available actions for files and folders

The actions available for files and folders on the NAS in the Connect Server are as follows:

File

Action             GUI (File Explorer)         Command prompt
Creation           O (unable to specify name)  O
Read               O                           O
Copy               O                           O
Move               O                           O
Delete             O                           O
Edit name          X                           O
Edit file content  O                           -

Folder

Action     GUI (File Explorer)         Command prompt
Creation   O (unable to specify name)  O
Copy       O                           O
Move       O                           X
Delete     O                           O
Edit name  X                           X

Use and access Cloud Hadoop

You can access the Hadoop cluster through the Chrome browser or the PuTTY program installed on the Connect Server.

Note

Cloud Hadoop gives the user direct access to the servers and to the management tool (Ambari), so the user can manage the cluster directly. This guide only covers how to access the Hadoop cluster. For more information on using Cloud Hadoop, see the Cloud Hadoop user guides.

PEM key conversion

To access a Hadoop cluster node using PuTTY, you must convert the provided PEM file to PPK format.
To convert a PEM file to PPK format, follow these steps:

  1. Enter puttygen in the Windows search box at the bottom of the Connect Server screen, and run PuTTY Key Generator.
  2. Check if "RSA" is selected in "Type of key to generate" and click the [Load] button.
  3. Select "All Files (*.*)", then select the provided PEM file in "C:\Users\Public\Desktop."
  4. Click the [Save private key] button.
  5. The key is saved in PPK format, which PuTTY can use.
Note

If the PEM file is not visible in the Connect Server, select Hidden items in the View menu of File Explorer to show hidden files.

Access cluster node through SSH

To access Hadoop edge node through SSH using a PPK file, follow these steps:

  • Run PuTTY on the Connect Server and enter the access credentials.
    • Host Name: sshuser@HadoopEdgeNodeIP
    • Port: 22
    • Connection type: SSH
Note

You can view the Hadoop edge node's IP in the Infrastructure information section on the NAVER Cloud Platform console.

  • Click Connection > SSH > Auth in order. Then, click the [Browse] button and select the PPK file.

    • For more information on how to create a PPK file, see PEM key conversion.
  • To access the Hadoop cluster edge node, click the [Open] button.

  • To see all the nodes, access the Hadoop cluster and enter the following command:

    $ cat /etc/hosts 
    
    • Edge node: starts with e-001.
    • Master nodes: start with m-001 and m-002.
    • Worker nodes: d-001 through the number of created worker nodes.
  • To access a master node or a worker node, first access the edge node, and then enter the following commands. For the master or worker node name, type m-00# or d-00# (# is the node's number) and press the Tab key to autocomplete. Enter yes at the host key confirmation when accessing a node other than the edge node for the first time.
    To move to another node, enter exit to return to the edge node first, and then connect to the other node.

    # m-001-xxx is a node name from the list viewed above
    $ ssh sshuser@m-001-xxx
    $ ...
    $ exit
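The node-name conventions above can be used to group the hosts file output by role. A minimal sketch, using made-up sample entries in place of real cluster output (real node names carry cluster-specific suffixes):

```shell
# Sample /etc/hosts-style entries; addresses and suffixes are illustrative
hosts_sample='10.0.1.6 e-001-example
10.0.1.7 m-001-example
10.0.1.8 m-002-example
10.0.1.9 d-001-example
10.0.1.10 d-002-example'

# Group hosts by role using the name prefixes described above
echo "$hosts_sample" | awk '$2 ~ /^e-/ {print "edge:  ", $2}
                            $2 ~ /^m-/ {print "master:", $2}
                            $2 ~ /^d-/ {print "worker:", $2}'
```

On the real edge node, piping `cat /etc/hosts` through the same awk filter lists each node with its role at a glance.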

Check data storage

When importing or exporting files within the Box, you can check that the requested NAS is mounted. After a data supply request is completed, you can also check that the NAS with the requested data is mounted read-only.

  • NAS used for requesting file import/export: /mnt/nasw*

    • The NAS used for file import/export requests is mounted under the /mnt directory of the edge node. Access the Hadoop edge node through SSH and check it as follows:
    $ df -h
    
  • Provided data NAS: /mnt/nasr/pub*

    • Data provided by Data Box Frame is mounted read-only under the /mnt directory after data supply. Access the Hadoop edge node through SSH and check it as follows:
    $ df -h
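Since both NAS types land under /mnt, the `df -h` output can be filtered down to just the NAS mounts. A minimal sketch against sample df output (device names and sizes are made up):

```shell
# Sample df -h output; real device addresses and sizes will differ
df_sample='Filesystem          Size  Used Avail Use% Mounted on
/dev/xvda1           50G   12G   38G  24% /
nas.example:/nasw1  500G  100G  400G  20% /mnt/nasw1
nas.example:/pub1   1.0T  800G  200G  80% /mnt/nasr/pub1'

# Keep only rows whose mount point (last field) is under /mnt/nas
echo "$df_sample" | awk '$NF ~ /^\/mnt\/nas/ {print $NF}'
```

On the edge node, `df -h | awk '$NF ~ /^\/mnt\/nas/'` shows the same rows with sizes, which is a quick way to confirm both the import/export NAS and the provided-data NAS are mounted.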
    

Use and access TensorFlow

You can access the TensorFlow CPU and TensorFlow GPU servers through the Chrome browser or PuTTY on the Connect Server.

Access Jupyter Notebook

To access Jupyter Notebook through Chrome browser from the Connect Server, follow these steps:

  1. Double-click the Chrome icon on the desktop of the Connect Server.
  2. Enter the following address to access Jupyter Notebook.
    • http://ServerIP:18888
    • Password: the password entered when creating the Data Box
    • You must use HTTP (not HTTPS) to access it. The first access may take some time to load.

Access the server through SSH

To access Ncloud TensorFlow Server, follow these steps:

  1. Run PuTTY on the Connect Server and enter the access credentials.
    • Host Name: root@ServerIP
    • Port: 22
    • Connection type: SSH
  2. Click [Open].

Restart Ncloud TensorFlow Server Docker

To restart the TensorFlow Docker container, enter one of the following commands, and then restart Jupyter Notebook:

  • Restart TensorFlow CPU
    docker restart tf-server-mkl  
    
  • Restart TensorFlow GPU
    docker restart tf-server-gpu
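The two restart commands can be wrapped in a small helper that picks the container by server type. A sketch only: the `restart_tf` function and `DRY_RUN` switch are illustrative (with `DRY_RUN=1`, the default here, the command is printed rather than executed); the container names come from the guide above.

```shell
# Restart the TensorFlow container for the given server type (cpu or gpu).
restart_tf() {
  case "$1" in
    cpu) container="tf-server-mkl" ;;
    gpu) container="tf-server-gpu" ;;
    *)   echo "usage: restart_tf cpu|gpu" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the command that would be executed
    echo "docker restart ${container}"
  else
    docker restart "${container}"
  fi
}

restart_tf cpu
restart_tf gpu
```

On the real server, run it with `DRY_RUN=0` so the `docker restart` actually executes, then restart Jupyter Notebook as described below.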
    

Restart Jupyter Notebook

To restart Jupyter Notebook, enter the following commands:

  • Restart directly:
    $ jup restart
  • Or stop and then start:
    $ jup stop
    $ jup start

2. Shared data analysis

You can immediately analyze the existing data in the shared NAS provided by the Data Box Frame administrator, or load it onto HDFS and use it there. The shared data is located in the following directory:

  • /mnt/nasr/pub*

Note

  • If the shared data does not exist, ask the Data Box Frame administrator to check the shared data's view status.
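Whether the shared NAS really is mounted read-only can be confirmed from the mount options. A sketch against a sample /proc/mounts-style line (the device address is made up):

```shell
# Sample /proc/mounts-style entry for the shared NAS; real values will differ
mounts_sample='nas.example:/pub1 /mnt/nasr/pub1 nfs ro,relatime 0 0'

# Report read-only vs read-write for mounts under /mnt/nasr:
# field 2 is the mount point, field 4 the comma-separated mount options
echo "$mounts_sample" | awk '$2 ~ /^\/mnt\/nasr/ {
  print $2, ($4 ~ /(^|,)ro(,|$)/ ? "read-only" : "read-write")
}'
```

On the edge node, running the same awk filter over the real `/proc/mounts` shows whether each shared-data mount carries the `ro` option.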

3. Save analysis results

You can export analysis results from inside the Box to the outside through separate storage. The separate storage is located in the following directory: