Categorizing MNIST handwriting images with TensorFlow

    Available in VPC

    This guide explains how to submit a single batch job with Data Forest, training a TensorFlow model that classifies MNIST handwriting images.

    Step 1. Create account

    Step 2. Check dataset

    This example uses the MNIST dataset, a collection of handwriting images of digits. It consists of 60,000 training images and 10,000 test images, and each image has a numeric label.

    File                          Description
    train-images-idx3-ubyte.gz    Training set images
    train-labels-idx1-ubyte.gz    Training set labels
    t10k-images-idx3-ubyte.gz     Test set images
    t10k-labels-idx1-ubyte.gz     Test set labels
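    The MNIST files in the table above use the IDX binary format: a big-endian header (a magic number followed by dimension sizes) and then the raw pixel or label bytes. Parsing this yourself is not required for the example, since "tf.keras.datasets.mnist" downloads and unpacks the data automatically, but as a rough sketch, the image-file header could be read like this (the function name is illustrative):

```python
import struct

def read_idx_image_header(raw: bytes):
    """Parse the 16-byte header of an idx3-ubyte image file.

    Layout: four big-endian unsigned 32-bit integers --
    magic number, image count, row count, column count.
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    assert magic == 2051, "idx3-ubyte image files start with magic number 2051"
    return count, rows, cols

# Synthetic header matching train-images-idx3-ubyte.gz after decompression:
# 60,000 images of 28 x 28 pixels.
header = struct.pack(">IIII", 2051, 60000, 28, 28)
print(read_idx_image_header(header))  # (60000, 28, 28)
```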

    Step 3. Create workspace

    The following describes how to create a workspace.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.

    2. Click AI Forest > Workspace > [Create workspace] > [Advanced workspace].

    3. Select a Data Forest account, set the workspace name, and then select "Singlebatch" for the workspace type.

    4. Select TensorFlow for the Docker image.

      Note

      TensorFlow is an open source machine learning library developed by Google. For more information about TensorFlow, refer to the TensorFlow website.

    5. Select the image version.
      Select r2.1 in this example.

    6. Select the GPU model name, number of GPU cores, and memory capacity.
      This example uses the default values.

    7. Enter the information in the data settings area, and then click the [Add] button.

    • Input
      • Input Path: HDFS path of the input data to be copied into the container; enter '/user/{username}/data_in'
      • Input Container Local Path: path inside the container where the input data is stored
    • Output
      • Output Path: HDFS path where the output data is stored; enter '/user/{username}/data_out'
      • Output Container Local Path: path inside the container where the output data is written
      • Overwrite: whether to overwrite existing files when storing output data in HDFS
    8. Click the [Next] button. The workspace creation is completed.

    Step 4. Download example code

    Download the example file according to the Docker image and version in the workspace.
    The following is the example code required for running the example.

    Version            File
    Tensorflow-r1.14   mnist_tf1.zip
    Tensorflow-r2.1    mnist_tf2.zip
    Tensorflow-r2.3.1  mnist_tf2_3_1.zip
    Note

    Use the example code that matches the TensorFlow version set when creating the workspace. This example downloads mnist_tf2.zip because Tensorflow-r2.1 was selected.

    tf-mnist.py defines a model with one hidden layer, trains it on the MNIST training set provided by "tf.keras.datasets.mnist", and evaluates it with the test set.

    ...
    # Import dataset
    mnist = tf.keras.datasets.mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Preprocess data: scale pixel values to [0, 1]
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Configure model
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])

    # Compile
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # Designate the checkpoint path and a callback for saving and restoring
    checkpoint_path = FLAGS.log_dir + "/cp-{epoch:04d}.ckpt"
    checkpoint_dir = os.path.dirname(checkpoint_path)

    cp_callback = tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        verbose=1,
        save_weights_only=True,
        period=5)

    # Save the initial weights
    model.save_weights(checkpoint_path.format(epoch=0))

    # Train model
    model.fit(x_train, y_train, epochs=FLAGS.max_steps, callbacks=[cp_callback])

    ...
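    In the listing above, "cp-{epoch:04d}.ckpt" is an ordinary Python format template, so the ModelCheckpoint callback writes a checkpoint with a zero-padded epoch number every 5 epochs (period=5), and epoch 0 is saved explicitly with model.save_weights. A minimal sketch of how the checkpoint names expand ("logs" is an illustrative stand-in for the actual FLAGS.log_dir value):

```python
# "logs" is an illustrative stand-in for FLAGS.log_dir.
checkpoint_template = "logs/cp-{epoch:04d}.ckpt"

# Epoch 0 is written by model.save_weights; the callback then
# writes every 5th epoch because period=5.
names = [checkpoint_template.format(epoch=e) for e in (0, 5, 10)]
print(names)  # ['logs/cp-0000.ckpt', 'logs/cp-0005.ckpt', 'logs/cp-0010.ckpt']
```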

    Step 5. Upload files to workspace browser

    The following describes how to decompress the downloaded example file and upload the files to the workspace browser.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
    2. Click AI Forest > Workspace.
    3. Select an account, and then click the Workspace link icon next to the workspace name.
    4. Select the workspace to upload the files, and then click the [Upload] icon.
    5. When the upload window appears, drag the files decompressed from the example archive to the upload window.
    6. Click the [Start upload] button.
    7. Click the [OK] button when the upload is completed.

    Step 6. Submit single batch jobs

    The following describes how to submit single batch jobs.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > AI Forest > Workspace browser menus, in that order.
    2. Select an account, and then a workspace.
    3. Select files to run, and then click the [Run] button.
    4. Enter the following information.
    5. Click the [OK] button. The DL app will run.

    Step 7. Check job log and result

    The following describes how to view the execution log after running a job.

    1. From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
    2. Select an account, and then click the app whose details you want to view.
    3. Access the URL under Quick links > AppMaster UI in the app's details.
    4. When the login window appears, enter the account name and password you entered when creating the Data Forest account.
    5. In the Applications menu, find the application ID matching the app name you entered when running the job, and then click it.
    6. Click Logs for the application ID to check the log of the executed job.
    7. After the job is completed, check the result in {output HDFS path entered when creating the workspace}/{value passed as the --log_dir argument}.
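    The result directory is simply the output HDFS path joined with the --log_dir value. With illustrative placeholder values (neither name is fixed by Data Forest), the path is composed like this:

```python
import posixpath

output_hdfs_path = "/user/example-user/data_out"  # illustrative output HDFS path
log_dir = "logs"                                  # illustrative --log_dir value

result_dir = posixpath.join(output_hdfs_path, log_dir)
print(result_dir)  # /user/example-user/data_out/logs
```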
