Creating Ncloud TensorFlow Server clusters

    Available in Classic

    Before use

    Q. What is Ncloud TensorFlow Cluster?

    • Ncloud TensorFlow Cluster is a service that helps you run TensorFlow code faster on clusters when you have massive training data or very large operations.
    • Using tcm commands from the command line, you can create, add, and delete cluster nodes, and easily share storage between clusters and expand it. You can also easily submit jobs to clusters with your own code.
    • Ncloud TensorFlow Cluster uses TensorFlow, an open source machine learning software library developed by Google Brain.

    Q. What kind of OS does Ncloud TensorFlow Cluster use?

    • The supported OS is “ubuntu-16.04-64-server.”

    Q. What packages are available and can I use only the provided packages?

    • The Ncloud TensorFlow Cluster master server comes with Anaconda and the tcm CLI package, which lets you run clusters. TensorFlow itself is not installed on the master server.
    • TensorFlow is installed on each cluster node that the Ncloud TensorFlow Cluster master server creates.

    Q. Can I use Java or other languages besides Python?

    • TensorFlow also provides APIs for other languages such as Java and Go, but their stability is not guaranteed. Therefore, we recommend using Python.

    Q. How do I create an Ncloud TensorFlow cluster?

    • Create a master server with the standard specifications you need, then create a cluster from the master server by running tcm commands in a terminal.
    • You can choose between the monthly and hourly price plans for your TensorFlow master server. After the server is created, set up your access environment to start using it. (Worker and parameter server nodes are created on the hourly plan.)

    Q. What server types are available?

    • The standard server types are available for the Ncloud TensorFlow Cluster master server provided by NAVER CLOUD PLATFORM. You can choose among the following specifications for server nodes that you can create with the tcm CLI commands through a terminal. (Note that all server nodes should be created with the same specifications.)

    • The server specifications available for Ncloud TensorFlow Cluster workers and parameter server nodes are listed below: (All server nodes are created with the same specifications, and GPU server types will be supported later.)

    Spec code | Specifications | Description
    mini | 4 vCPUs, 16 GB memory, 50 GB HDD | Appropriate for testing clusters or processing small workloads.
    basic | 8 vCPUs, 32 GB memory, 50 GB HDD | Appropriate for processing mid-sized workloads.
    high | 16 vCPUs, 32 GB memory, 50 GB HDD | Appropriate for processing large workloads.
    gpu1 | 1 GPU, 24 GB GPU memory, 4 vCPUs, 30 GB memory, 50 GB SSD | Uses as many single GPUs as there are cluster nodes.
    gpu2 | 2 GPUs, 48 GB GPU memory, 8 vCPUs, 60 GB memory, 50 GB SSD | Uses as many dual GPUs as there are cluster nodes, to process very large workloads.

    (Note that server nodes are created with the same specifications, and all nodes are recognized as worker server nodes. The number of parameter server nodes can be specified.)
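The specification table above can be expressed as a small lookup, which may help when deciding which spec code to use for a cluster. This is only an illustrative sketch: the values are copied from the table, while the `NODE_SPECS` name, the `pick_spec` helper, and its selection policy (first match in table order, GPU specs skipped unless GPUs are requested) are hypothetical and not part of the tcm CLI.

```python
# Node specifications copied from the table above.
# The dictionary name and layout are illustrative, not part of the service.
NODE_SPECS = {
    "mini":  {"vcpu": 4,  "memory_gb": 16, "gpu": 0},
    "basic": {"vcpu": 8,  "memory_gb": 32, "gpu": 0},
    "high":  {"vcpu": 16, "memory_gb": 32, "gpu": 0},
    "gpu1":  {"vcpu": 4,  "memory_gb": 30, "gpu": 1},
    "gpu2":  {"vcpu": 8,  "memory_gb": 60, "gpu": 2},
}


def pick_spec(min_vcpu, min_memory_gb, min_gpu=0):
    """Return the first spec code (in table order) that meets the
    per-node requirements, or None if no spec is large enough."""
    for code, spec in NODE_SPECS.items():
        # Assumed policy: do not suggest GPU specs unless GPUs are requested.
        if min_gpu == 0 and spec["gpu"] > 0:
            continue
        if (spec["vcpu"] >= min_vcpu
                and spec["memory_gb"] >= min_memory_gb
                and spec["gpu"] >= min_gpu):
            return code
    return None


print(pick_spec(8, 24))     # basic
print(pick_spec(4, 16, 1))  # gpu1
print(pick_spec(32, 64))    # None
```

Because all server nodes in a cluster share the same specification, the requirement passed in should describe the most demanding node.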

    Q. I have created an Ncloud TensorFlow Cluster master server. How can I create a worker server or parameter server?

    Q. How can I execute TensorFlow code in the cluster?

    About Ncloud TensorFlow Cluster

    An Ncloud TensorFlow cluster is a set of tasks that participate in the distributed execution of a TensorFlow graph. Each task is associated with a TensorFlow server node, which contains a master server that can be used to create sessions, a worker server that executes operations in the graph, and a parameter server that shares computed gradients.
    For the servers to do their tasks, you must pass a ClusterSpec that describes all of the tasks in the cluster to each server node. Ncloud TensorFlow Cluster makes this relatively easy.

    With Ncloud TensorFlow Cluster, you don’t need to build the ClusterSpec yourself and can concentrate on writing training code. You do still need to edit some code to pass the ClusterSpec to the sessions. For more information about editing the code, see “Ncloud TensorFlow Cluster MNIST Example.”

    This service is designed to provide distributed parallel processing, but it also supports basic data preprocessing and visualization because Anaconda is installed on the server.
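To make the idea of a ClusterSpec more concrete, the following minimal sketch builds the dictionary form that TensorFlow's `tf.train.ClusterSpec` accepts: a mapping from job name to a list of "host:port" addresses. It does not reproduce what Ncloud TensorFlow Cluster generates internally. The IP addresses are hypothetical, and assigning port 2222 to workers and port 3333 to parameter servers is an assumption made only for illustration.

```python
import json


def build_cluster_def(worker_hosts, ps_hosts, worker_port=2222, ps_port=3333):
    """Build a job-name -> ["host:port", ...] mapping, the dictionary
    form that tf.train.ClusterSpec accepts.

    The default ports are assumptions made for this sketch.
    """
    return {
        "ps": ["{}:{}".format(host, ps_port) for host in ps_hosts],
        "worker": ["{}:{}".format(host, worker_port) for host in worker_hosts],
    }


# Hypothetical private IP addresses, for illustration only.
cluster_def = build_cluster_def(
    worker_hosts=["10.0.0.11", "10.0.0.12"],
    ps_hosts=["10.0.0.21"],
)
print(json.dumps(cluster_def, indent=2))

# With TensorFlow 1.x installed, each node would then typically run
# something like the following (not executed here):
#   import tensorflow as tf
#   cluster = tf.train.ClusterSpec(cluster_def)
#   server = tf.train.Server(cluster, job_name="worker", task_index=0)
```

Every task in the cluster needs the same mapping and differs only in its own job name and task index, which is why managing it by hand becomes tedious as the node count grows.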

    Ncloud TensorFlow Cluster Configuration

    tensorflow-2-1_configuration_en.png

    Create Ncloud TensorFlow Cluster

    Step 1. Connect to Console

    Connect to the Console and select Server > Server.

    tensorflow-2-1-100_en.png

    ① To create a server, click [Create server].

    Step 2. Select server image

    Select a TensorFlow server image to create a server.
    tensorflow-2-1-105_en.png

    ① Select Application > Tensorflow.

    ② Select “tensorflow-cluster-ubuntu-16.04-64-master” from Image and click [Next].

    Step 3. Set server

    Select a storage type, server type, pricing plan, and zone, and enter a server name.

    tensorflow-2-1-110_en.png

    ① Select a zone. You can select between “KR-1” and “KR-2.”

    ② Select a server storage type.

    • Select SSD for services that require high-performance I/O and HDD for general services. Please note that you can use SSD as additional storage only if the boot storage is SSD.

    ③ Select the server type you want. Only the Standard server types are available for your master server.

    ④ You can select either the monthly or hourly pricing plan.

    ⑤ Enter the number of servers. (Select 1, because one master server makes up one cluster. Select more than 1 only if you want multiple clusters.)

    ⑥ Enter a server name and click [Next].

    Step 4. Set authentication key

    If you have an existing authentication key, select Use an Existing Authentication Key. Otherwise, create a new authentication key according to the following procedure.

    tensorflow-2-1-115_en.png

    ① Select Create a New Authentication Key.

    ② Enter an authentication key name.

    ③ Click [Create & Save Auth Key] to save the authentication key file to your local PC.

    • Issue a new authentication key.
    • After saving it, please keep the authentication key in a safe place on your PC.
    • The authentication key is used to obtain an initial administrator password.

    ④ Click [Next].

    Step 5. Set ACG

    In Ncloud TensorFlow Cluster, worker nodes communicate with each other to perform jobs. This communication requires specific ports to be open, which you can allow by adding an ACG in the NAVER CLOUD PLATFORM Console. There are two ways to add an ACG for Ncloud TensorFlow Cluster:

    ① Creating a custom ACG

    • Go to the ACG setting page.
    • Click Create ACG.
    • Enter an ACG name and click Create.
    • Select the ACG from the ACG list and click Set ACG.
    • Add the following 3 rules, as shown in the table below.
    Protocol | Access source | Allowed port
    TCP | 0.0.0.0 | 22
    TCP | 0.0.0.0 | 2222
    TCP | 0.0.0.0 | 3333
    • Click Apply.

    ② Using “ncloud-default-acg”

    “ncloud-default-acg” is an ACG that is registered by default. This ACG setting allows your nodes to access all ports.
    With this ACG setting, the worker nodes in Ncloud TensorFlow Cluster can communicate with each other.
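Once nodes are running, one way to confirm that the ports from the custom ACG table (22, 2222, and 3333) are reachable is a plain TCP connection test from another node. The sketch below uses only the Python standard library. The function names are hypothetical, and a failed connection can also mean that no process is listening on that port yet, not only that the ACG blocks it.

```python
import socket

# Ports from the custom ACG table above.
CLUSTER_PORTS = (22, 2222, 3333)


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host, ports=CLUSTER_PORTS, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}


# Example usage (10.0.0.11 is a hypothetical node address):
#   for port, ok in check_node("10.0.0.11").items():
#       print(port, "open" if ok else "closed or filtered")
```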

    Select the ACG you set up above (the custom ACG from ① or “ncloud-default-acg” from ②).

    tensorflow-2-1-120_en.png

    Step 6. Confirm

    Confirm the settings.

    tensorflow-2-1-125_en.png

    ① Make sure that the server image, server, authentication key, and ACG are set properly.

    ② After the final confirmation, click [Create server].

    • It may take several minutes or longer for the server to be created.

    Check in the server list

    Check if the created server is in the list.

    ① The server you have created will appear in the list.

    ② Wait until the server is created, the package is installed, and the server status becomes Running.

    Set port forwarding and connect

    Set port forwarding

    To connect to a server via a terminal program (such as Putty), you must set port forwarding.

    tensorflow-2-1-130_en.png

    ① Select the server from the server list and click [Set port forwarding].

    ② Set an external port number in the Set port forwarding window. The external port number must be between 1024 and 65,534. This port can be used only for connecting to the server, not for other service purposes. (The internal port number is set to 22.)

    ③ Click [Add] to add the setting to the list at the bottom. Use [Edit] or [Delete] to change or remove the setting.

    ④ Click [Apply]. You can then connect via SSH to the configured external port using a terminal program.
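The port range rule above is easy to check in a script before entering a value in the Console. The following sketch assumes that both ends of the range (1024 and 65,534) are allowed, which the text does not state explicitly. The function and constant names are hypothetical.

```python
# Range stated in the port forwarding step; treating both ends as
# inclusive is an assumption of this sketch.
MIN_EXTERNAL_PORT = 1024
MAX_EXTERNAL_PORT = 65534
INTERNAL_SSH_PORT = 22  # the internal port is fixed at 22


def validate_external_port(port):
    """Return the port unchanged if it is a valid external port number,
    otherwise raise TypeError or ValueError."""
    if isinstance(port, bool) or not isinstance(port, int):
        raise TypeError("port must be an integer")
    if not MIN_EXTERNAL_PORT <= port <= MAX_EXTERNAL_PORT:
        raise ValueError(
            "external port must be between {} and {}".format(
                MIN_EXTERNAL_PORT, MAX_EXTERNAL_PORT
            )
        )
    return port


print(validate_external_port(2022))  # 2022
```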

    Connect to server and use tcm CLI command

    Connect to your server with ssh root@[public IP address for access] -p [port forwarding port number] via a terminal program as shown in the figure below, and you can use tcm CLI commands.

    tensorflow-2-1-135_en.png

    ① Connect to the Ncloud TensorFlow Cluster master server.

    ② Execute “tcm” to display the list of CLI commands.

    ③ Execute “tcm info” to see the cluster nodes previously created. (If there is no cluster node, use “create” command to create a new one.)
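The connection steps above can also be scripted. The sketch below only builds the SSH command line, following the `ssh root@[public IP address] -p [port]` form described in this section, and does not execute it. The IP address and port are placeholder values, and whether `tcm info` can be run as a non-interactive remote command (rather than from a login shell) is an assumption.

```python
import shlex


def build_ssh_command(public_ip, forwarded_port, remote_command=None):
    """Build the argument list for connecting to the master server.

    If remote_command is given, it is passed to ssh as the command
    to run on the server instead of opening an interactive shell.
    """
    cmd = ["ssh", "root@{}".format(public_ip), "-p", str(forwarded_port)]
    if remote_command:
        cmd.append(remote_command)
    return cmd


# 203.0.113.10 and 2022 are placeholder values for illustration.
cmd = build_ssh_command("203.0.113.10", 2022, "tcm info")
print(" ".join(shlex.quote(part) for part in cmd))
# ssh root@203.0.113.10 -p 2022 'tcm info'
```

The resulting list can be passed to `subprocess.run(cmd)` if you want a script to run the command for you.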

