Available in VPC
This section describes how to request data supply, how to request additional data you overlooked when creating your data box, and how to subscribe to Insight Option and receive the latest data for the duration of your contract period.
Request data supply
To receive the data you applied for after your data box's access settings have been configured, submit a data supply request. Once the data supply request has been made, all external network connections will be blocked and the data you applied for will be supplied.
To request data supply, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Select the created data box and click the [Request data supply] button.

- When a data supply request window appears, enter the name of the data box and click the [OK] button.
- The data supply process takes 5 to 10 minutes. Once it is completed, the Data box status will change from Data supply requested to Data supply completed.
- The data list does not show the data provided by default. To see the detailed status of the supplied data, go to [View server details] > [Data] of the box.
- Once a data supply request has been made, all external network connections are blocked. This action is irreversible.
- Making a data supply request restarts the TensorFlow docker and the Jupyter Notebook. If you have a task in progress, complete it before submitting the request.
- If you submit a data supply request, subscribe to Insight Option, or request additional data after the data supply process has been completed, your connection to the SSL VPN and other servers will be lost. You will have to re-establish the connections after the data supply process is completed.
Add data
You can add the newest half-yearly data. While the data supply is in progress, you cannot submit a request for additional data. You can only do so after the data supply process has been completed.
To add the latest half-yearly data, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Click [View server details] on the data box you want to add data to.

- Go to the [Data] tab and click the [Add] button.

- Select the data to be added and click the [OK] button.
- If you add data when your data box's status says Data supply completed, the additional data supply process takes 5-10 minutes. Once it is completed, the Data supply status will be changed to Data view ready.
If you add data when your data box's status says Infrastructure creation completed, you must request data supply to receive the data.
Insight Option
Insight Option offers the latest data, covering the period from 2 years ago up to the previous month, for the 12-month duration of your contract. If you subscribe to Insight Option and return your data box within 12 months of the subscription date, a penalty will be incurred. To access Insight Option data, communication with external networks must be blocked, so a data supply request must be submitted beforehand.
Subscribe to Insight Option
To subscribe to Insight Option, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Check if the data supply request has been submitted properly.
  - The [Upgrade] button is enabled only after the data supply request has been submitted properly.
- Select the created data box.
- Click [Upgrade] > [Subscribe to Insight Option] in order.

- When the Insight Option subscription window appears, review the provided data standards and the penalty notice and click the [Subscribe to Insight Option] button.
- Read the TensorFlow docker and Jupyter Notebook restart prompt and click the [OK] button.
- The Insight Option data supply process takes 5 to 10 minutes.
- Once it is completed, the Data box status will change to Data supply completed.
- Check that the Insight Option data is mounted and available on the Ncloud TensorFlow Server and the Hadoop node. Then, access the Ncloud TensorFlow Server and restart its docker and Jupyter Notebook. You must restart both to view the data in each directory on Jupyter Notebook.
  - Restart TensorFlow CPU
    docker restart tf-server-mkl
  - Restart TensorFlow GPU
    docker restart tf-server-gpu
  - Restart Jupyter Notebook
    jup restart or run jup stop, and then jup start
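The restart commands above can be combined into one helper that restarts whichever TensorFlow container is present and then Jupyter Notebook. This is a minimal sketch, not part of Cloud Data Box: the function names pick_tf_container and restart_after_supply are hypothetical, and the container names tf-server-mkl and tf-server-gpu and the jup command are taken from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not provided by the service.

# Read container names from stdin (one per line) and print the first one that
# matches the CPU (tf-server-mkl) or GPU (tf-server-gpu) container name.
pick_tf_container() {
  awk '/^tf-server-(mkl|gpu)$/ && !found { print; found = 1 }'
}

# Restart the TensorFlow container found on this server, then Jupyter Notebook.
restart_after_supply() {
  tf_name="$(docker ps -a --format '{{.Names}}' | pick_tf_container)"
  if [ -z "$tf_name" ]; then
    echo "no tf-server-mkl or tf-server-gpu container found" >&2
    return 1
  fi
  docker restart "$tf_name" && jup restart
}
```

Run restart_after_supply on the Ncloud TensorFlow Server once the data box status shows Data supply completed.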
Insight Pro Option
Subscribers to Insight Option can subscribe to Insight Pro Option and receive additional search and shopping data broken down by user group. To subscribe, you need to obtain the necessary permissions through Sales inquiries. Insight Pro Option, just like Insight Option, offers the latest data, covering the period from 2 years ago up to the previous month. If you subscribe to Insight Pro Option and cancel it within 12 months of your Insight Option subscription date, a penalty will be incurred.
Subscribe to Insight Pro Option
To subscribe to Insight Pro Option, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Select the created data box.
- Click [Upgrade] > [Subscribe to Insight Pro Option] in order.

- Review the penalty notice and click the [Subscribe to Insight Pro Option] button.
- Only Insight Option subscribers can subscribe to Insight Pro Option.
- If you cancel your Insight Pro Option subscription within the duration of the Insight Option contract, a penalty will be incurred.
- If you do not have the necessary permissions for Insight Pro Option, click [Sales inquiries] to submit a ticket.
- If you have the necessary permissions for Insight Pro Option, the Insight Pro Option subscription window will appear. Select the Insight Pro Option you want and click the [Subscribe] button.
- Read the anonymized data protection pledge, select the checkbox to agree to the pledge, and click the [OK] button.
- The Insight Pro Option data supply process takes 5-10 minutes.
- Once it is completed, the Data box status will change to Data supply completed.
- Check that the Insight Option data is mounted and available on the Ncloud TensorFlow Server and the Hadoop node. Then, access the Ncloud TensorFlow Server and restart its docker and Jupyter Notebook. You must restart both to view the data in each directory on Jupyter Notebook.
  - Restart TensorFlow CPU
    docker restart tf-server-mkl
  - Restart TensorFlow GPU
    docker restart tf-server-gpu
  - Restart Jupyter Notebook
    jup restart or run jup stop, and then jup start
View Insight Pro Option penalty
To view the penalty for canceling Insight Pro Option subscription, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Select the data box whose Pro Option cancellation penalty you want to view.
- Click [Unsubscribe from Pro Option and view penalty] > [View penalty] in order and see the estimated penalty.
Unsubscribe from Insight Pro Option
To unsubscribe from Insight Pro Option, follow these steps:
- In the VPC environment of the NAVER Cloud Platform console, navigate to Services > Big Data & Analytics > Cloud Data Box > My Space in order.
- Select the data box for which you want to unsubscribe from Pro Option.
- Click [Unsubscribe from Insight Pro Option and view penalty] > [Unsubscribe from Insight Pro Option] in order.
- Read the Pro Option early cancellation penalty notice and review the estimated penalty. If you still want to proceed, indicate your agreement and click the [Proceed with Option unsubscription] button.
Upload the provided data to Hadoop cluster
To access the default data you applied for, Insight Option data, or Insight Pro Option data, you must upload it to the Hadoop cluster.
Make sure that there is enough space on Hadoop before you upload the data to the Hadoop cluster.
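As a rough pre-check for the free-space requirement, the sketch below compares the space a data set would need with the space HDFS reports as available. It rests on two assumptions that you should confirm on your own cluster: that the raw space needed is the data size multiplied by the HDFS replication factor, and that `hdfs dfs -df /` prints the available bytes in the fourth column of its second line. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not provided by the service.

# required_raw_bytes DATA_BYTES REPLICATION
# Assumption: HDFS stores REPLICATION copies of every block.
required_raw_bytes() {
  echo $(( $1 * $2 ))
}

# has_enough_space DATA_BYTES REPLICATION AVAILABLE_BYTES
# Succeeds when the required raw space fits into the available space.
has_enough_space() {
  [ "$(required_raw_bytes "$1" "$2")" -le "$3" ]
}

# Print the available bytes reported by HDFS.
# Assumption: the value is the 4th field on the 2nd line of `hdfs dfs -df /`.
available_hdfs_bytes() {
  hdfs dfs -df / | awk 'NR == 2 { print $4 }'
}
```

For example, for about 70 GB of shopping data and a replication factor of 3 (the HDFS default, which your cluster may override), you could run `has_enough_space $((70 * 1024 * 1024 * 1024)) 3 "$(available_hdfs_bytes)" && echo "enough space"` on the edge node.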
The following is a quickstart for access to "shopping25y1h."
- Access the Cloud Hadoop edge node using PuTTY and create a directory in the cluster. (shopping25y1h is an example.)
  $ hdfs dfs -mkdir -p /user/ncp/shopping25y1h/shopping # shopping directory in HDFS
- Upload the data to Hadoop using the put command.
  - The Hadoop cluster's name takes the form of hadoop-000-000. You can find it in the NCP console or in the name of the Hadoop node you are accessing.
  - Upload the data directory only. Uploading it together with the .snapshot directory located under the requested data volume (/mnt/shopping25y1h/ in this example) can cause an error.
  - The half-yearly shopping data is 60 to 70 GB in size and takes approximately 30 minutes to upload to Hadoop.
  - The half-yearly search data is larger, at 5 to 8 TB, and takes approximately 5 to 10 hours to upload to Hadoop. (Run time can vary depending on the Hadoop node's specifications.)
  $ hadoop dfs -put file:///mnt/shopping25y1h/shopping hdfs://hadoop-000-000/user/ncp/shopping25y1h
- Check the data uploaded to Hadoop.
  $ find /mnt/shopping25y1h/v3/shopping -type f | wc -l # Check the number of files in the local file path.
  $ hdfs dfs -ls -R /user/ncp/shopping25y1h/shopping | grep -v '^d' | wc -l # Check the number of files in the HDFS file path.
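The two file counts from the last step can be compared automatically. The sketch below is one possible way to do it, with hypothetical function names. It counts HDFS entries whose listing line starts with "-" (regular files) rather than filtering out lines that start with "d", which is a small variation on the command shown in the guide.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not provided by the service.

# Count regular files under a local directory.
count_local_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Count regular files in `hdfs dfs -ls -R` output read from stdin.
# File entries start with "-" in the permissions column, directories with "d".
count_hdfs_files() {
  grep -c '^-' || true
}

# verify_upload LOCAL_DIR HDFS_DIR
# Both paths are supplied by the caller; nothing is hard-coded here.
verify_upload() {
  local_count="$(count_local_files "$1")"
  hdfs_count="$(hdfs dfs -ls -R "$2" | count_hdfs_files)"
  if [ "$local_count" = "$hdfs_count" ]; then
    echo "OK: $local_count files in both locations"
  else
    echo "MISMATCH: local=$local_count hdfs=$hdfs_count" >&2
    return 1
  fi
}
```

Call it with the same local and HDFS paths you used for the upload, for example `verify_upload <local data directory> /user/ncp/shopping25y1h/shopping`.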