Categorizing MNIST handwriting images with TensorFlow
Available in VPC
This guide explains how to submit single batch jobs using Data Forest.
Step 1. Create account
- For how to create a Data Forest account, refer to Create and manage accounts.
- For how to create apps, refer to Create and manage apps.
Step 2. Check dataset
This example uses the MNIST dataset, a collection of handwritten digit images. It consists of 60,000 training images and 10,000 test images, and each image has a numeric label.
| File | Description |
|---|---|
| train-images-idx3-ubyte.gz | Training set images |
| train-labels-idx1-ubyte.gz | Training set labels |
| t10k-images-idx3-ubyte.gz | Test set images |
| t10k-labels-idx1-ubyte.gz | Test set labels |
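The .gz files above follow the IDX binary format: a big-endian header (magic number, item count, and, for images, row and column counts) followed by raw bytes. As a rough sketch, they could be parsed directly in Python like this (the function names are illustrative, not part of the example code):

```python
import gzip
import struct

def read_idx_images(path):
    """Parse a gzip-compressed IDX image file (magic number 2051)."""
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        pixels = f.read(count * rows * cols)  # one byte per pixel, row-major
    return count, rows, cols, pixels

def read_idx_labels(path):
    """Parse a gzip-compressed IDX label file (magic number 2049)."""
    with gzip.open(path, "rb") as f:
        magic, count = struct.unpack(">II", f.read(8))
        assert magic == 2049, "not an IDX label file"
        return list(f.read(count))  # one byte per label, values 0-9
```

In practice the example code does not parse these files by hand; `tf.keras.datasets.mnist` loads the same data as NumPy arrays.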
Step 3. Create workspace
The following describes how to create a workspace.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
- Click AI Forest > Workspace > [Create workspace] > [Advanced workspace].
- Select a Data Forest account, set the workspace name, and then select "Singlebatch" for the workspace type.
- Select TensorFlow for the Docker image.
Note: TensorFlow is an open source machine learning library developed by Google. For more information about TensorFlow, refer to the TensorFlow website.
- Select the image version. This example uses r2.1.
- Select the GPU model name, number of GPU cores, and memory capacity. This example uses the default values.
- Enter the information in the data settings area, and then click the [Add] button.
- Input
  - InputPath: HDFS path of the input data to be copied into the container. Enter '/user/{username}/data_in'.
  - Input Container Local Path: path inside the container where the input data is stored.
- Output
  - OutputPath: HDFS path where the output is stored. Enter '/user/{username}/data_out'.
  - Output Container Local Path: path inside the container where the output data resides.
  - Overwrite: whether to overwrite if a file already exists when storing output data in HDFS.
- Click the [Next] button. The workspace creation is completed.
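At runtime, the files under InputPath are copied from HDFS into the container at the Input Container Local Path, so the job reads them as ordinary local files. A minimal sketch of how a job script might enumerate its input (the helper name is an assumption for illustration):

```python
import os

def list_input_files(container_local_path):
    """List the files AI Forest copied into the container from InputPath.

    container_local_path is whatever was configured as the
    Input Container Local Path when creating the workspace.
    """
    return sorted(os.listdir(container_local_path))
```

Output written under the Output Container Local Path is copied back to the OutputPath in HDFS when the job finishes.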
Step 4. Download example code
Download the example file according to the Docker image and version in the workspace.
The following is the example code required for running the example.
| Version | File |
|---|---|
| Tensorflow-r1.14 | mnist_tf1.zip |
| Tensorflow-r2.1 | mnist_tf2.zip |
| Tensorflow-r2.3.1 | mnist_tf2_3_1.zip |
Proceed with the example code that matches the TensorFlow version set when creating the workspace. Because Tensorflow-r2.1 was selected in this example, mnist_tf2.zip was downloaded.
tf-mnist.py is the code for a model consisting of one hidden layer, trained on the MNIST training dataset provided by tf.keras.datasets.mnist and evaluated on the test set.
```python
...
# Import dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess data: scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Configure model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Designate checkpoints and callback function for saving and restoring
checkpoint_path = FLAGS.log_dir + "/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)

# Save initial weights
model.save_weights(checkpoint_path.format(epoch=0))

# Train model
model.fit(x_train, y_train, epochs=FLAGS.max_steps, callbacks=[cp_callback])
...
```
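The FLAGS.log_dir and FLAGS.max_steps values referenced above are command-line arguments supplied when the job is run. The actual tf-mnist.py may define them differently (for example with tf.compat.v1.flags), but as a minimal sketch they could be declared with argparse like this (the defaults here are assumptions, not values from the example file):

```python
import argparse

def parse_args(argv=None):
    """Define the flags the training script reads as FLAGS.* values."""
    parser = argparse.ArgumentParser(description="MNIST single batch job")
    parser.add_argument("--log_dir", type=str, default="./logs",
                        help="directory where per-epoch checkpoints are written")
    parser.add_argument("--max_steps", type=int, default=5,
                        help="number of training epochs")
    return parser.parse_args(argv)
```

Pointing --log_dir under the Output Container Local Path ensures the checkpoints are copied back to HDFS when the job completes.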
Step 5. Upload files to workspace browser
The following describes how to decompress the downloaded example file (mnist_tf2.zip in this example) and upload the extracted files to the workspace browser.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
- Click AI Forest > Workspace.
- Select an account, and then click the Workspace link icon next to the workspace name.
- Select the workspace to upload the files to, and then click the [Upload] icon.
- When the upload window appears, drag the decompressed files into the upload window.
- Click the [Start upload] button.
- Click the [OK] button when the upload is completed.
Step 6. Submit single batch jobs
The following describes how to submit single batch jobs.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > AI Forest > Workspace browser menus, in that order.
- Select an account, and then a workspace.
- Select files to run, and then click the [Run] button.
- Enter the following information.
- Click the [OK] button. The DL app will run.
Step 7. Check job log and result
The following describes how to view execution log after running a job.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
- Select an account, and then click the app whose details you want to view.
- Access the URL under Quick links > AppMaster UI in the app's details.
- When the login window appears, enter the account name and password you entered when creating the Data Forest account.
- In the Applications menu, find the application ID matching the app name entered when running the job, and then click it.
- Click Logs for the application ID to check the log of the executed job.
- After the job is completed, check the result from {path you entered in the output HDFS path when creating the workspace}/{value delivered as the --log_dir argument}.
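The rule above amounts to joining the workspace's output HDFS path with the --log_dir value. A small sketch (the username and log directory below are assumed values for illustration only):

```python
import posixpath

def result_path(output_hdfs_path, log_dir):
    """HDFS location where the checkpoints appear after the job completes."""
    return posixpath.join(output_hdfs_path, log_dir)

# With an assumed username 'example' and --log_dir 'logs':
print(result_path("/user/example/data_out", "logs"))  # /user/example/data_out/logs
```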