Categorizing MNIST handwriting images with TensorFlow
Available in VPC
This guide explains how to submit single batch jobs using Data Forest.
Step 1. Create account
- For how to create a Data Forest account, refer to Create and manage accounts.
- For how to create apps, refer to Create and manage apps.
Step 2. Check dataset
This example uses the MNIST dataset, a collection of handwritten digit images. It consists of 60,000 training images and 10,000 test images, and each image has a numeric label.
| File | Description |
|---|---|
| train-images-idx3-ubyte.gz | Training set images |
| train-labels-idx1-ubyte.gz | Training set labels |
| t10k-images-idx3-ubyte.gz | Test set images |
| t10k-labels-idx1-ubyte.gz | Test set labels |
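The .gz files above follow the IDX binary format: a big-endian header (magic number, item count, and, for images, row and column counts) followed by raw bytes. As a rough sketch, they could be parsed directly in Python like this (the function names are illustrative, not part of the example code):

```python
import gzip
import struct

def read_idx_images(path):
    """Parse a gzip-compressed IDX image file (magic number 2051)."""
    with gzip.open(path, "rb") as f:
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        pixels = f.read(count * rows * cols)  # one byte per pixel, row-major
    return count, rows, cols, pixels

def read_idx_labels(path):
    """Parse a gzip-compressed IDX label file (magic number 2049)."""
    with gzip.open(path, "rb") as f:
        magic, count = struct.unpack(">II", f.read(8))
        assert magic == 2049, "not an IDX label file"
        return list(f.read(count))  # one byte per label, values 0-9
```

In practice the example code does not parse these files by hand; `tf.keras.datasets.mnist` loads the same data as NumPy arrays.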
Step 3. Create workspace
The following describes how to create a workspace.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
- Click AI Forest > Workspace > [Create workspace] > [Advanced workspace].
- Select a Data Forest account, set the workspace name, and then select "Singlebatch" for the workspace type.
- Select TensorFlow for the Docker image.
Note: TensorFlow is an open source machine learning library developed by Google. For more information about TensorFlow, refer to the TensorFlow website.
- Select the image version. This example uses r2.1.
- Select the GPU model name, number of GPU cores, and memory capacity. This example uses the default values.
- Enter the information in the data settings area, and then click the [Add] button.
- Input
  - InputPath: HDFS path of the input data to be copied into the container. Enter '/user/{username}/data_in'.
  - Input Container Local Path: path inside the container where the input data is stored.
- Output
  - OutputPath: HDFS path where the output is stored. Enter '/user/{username}/data_out'.
  - Output Container Local Path: path inside the container where the output data resides.
  - Overwrite: whether to overwrite if a file already exists when storing output data in HDFS.
- Click the [Next] button. The workspace creation is completed.
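At runtime, the files under InputPath are copied from HDFS into the container at the Input Container Local Path, so the job reads them as ordinary local files. A minimal sketch of how a job script might enumerate its input (the helper name is an assumption for illustration):

```python
import os

def list_input_files(container_local_path):
    """List the files AI Forest copied into the container from InputPath.

    container_local_path is whatever was configured as the
    Input Container Local Path when creating the workspace.
    """
    return sorted(os.listdir(container_local_path))
```

Output written under the Output Container Local Path is copied back to the OutputPath in HDFS when the job finishes.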
Step 4. Download example code
Download the example file according to the Docker image and version in the workspace.
The following is the example code required for running the example.
| Version | File |
|---|---|
| Tensorflow-r1.14 | mnist_tf1.zip |
| Tensorflow-r2.1 | mnist_tf2.zip |
| Tensorflow-r2.3.1 | mnist_tf2_3_1.zip |
Proceed with the example code that matches the TensorFlow version set when creating the workspace. Because Tensorflow-r2.1 was selected in this example, mnist_tf2.zip was downloaded.
tf-mnist.py is the code for a model consisting of one hidden layer, trained on the MNIST training dataset provided by tf.keras.datasets.mnist and evaluated on the test set.
```python
...
# Import dataset
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess data: scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Configure model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Designate checkpoints and callback function for saving and restoring
checkpoint_path = FLAGS.log_dir + "/cp-{epoch:04d}.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)

# Save initial weights
model.save_weights(checkpoint_path.format(epoch=0))

# Train model
model.fit(x_train, y_train, epochs=FLAGS.max_steps, callbacks=[cp_callback])
...
```
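The FLAGS.log_dir and FLAGS.max_steps values referenced above are command-line arguments supplied when the job is run. The actual tf-mnist.py may define them differently (for example with tf.compat.v1.flags), but as a minimal sketch they could be declared with argparse like this (the defaults here are assumptions, not values from the example file):

```python
import argparse

def parse_args(argv=None):
    """Define the flags the training script reads as FLAGS.* values."""
    parser = argparse.ArgumentParser(description="MNIST single batch job")
    parser.add_argument("--log_dir", type=str, default="./logs",
                        help="directory where per-epoch checkpoints are written")
    parser.add_argument("--max_steps", type=int, default=5,
                        help="number of training epochs")
    return parser.parse_args(argv)
```

Pointing --log_dir under the Output Container Local Path ensures the checkpoints are copied back to HDFS when the job completes.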
Step 5. Upload files to workspace browser
The following describes how to decompress the downloaded example file (mnist_tf2.zip in this example) and upload the extracted files to the workspace browser.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest menus, in that order.
- Click AI Forest > Workspace.
- Select an account, and then click the Workspace link icon next to the workspace name.
- Select the workspace to upload the files to, and then click the [Upload] icon.
- When the upload window appears, drag the decompressed files into the upload window.
- Click the [Start upload] button.
- Click the [OK] button when the upload is completed.
Step 6. Submit single batch jobs
The following describes how to submit single batch jobs.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > AI Forest > Workspace browser menus, in that order.
- Select an account, and then a workspace.
- Select files to run, and then click the [Run] button.
- Enter the following information.
- Click the [OK] button. The DL app will run.
Step 7. Check job log and result
The following describes how to view execution log after running a job.
- From the NAVER Cloud Platform console, click the Services > Big Data & Analytics > Data Forest > App menus, in that order.
- Select an account, and then click the app whose details you want to view.
- Access the URL under Quick links > AppMaster UI in the app's details.
- When the login window appears, enter the account name and password you entered when creating the Data Forest account.
- In the Applications menu, find the application ID matching the app name entered when running the job, and then click it.
- Click Logs for the application ID to check the log of the executed job.
- After the job is completed, check the result from {path you entered in the output HDFS path when creating the workspace}/{value delivered as the --log_dir argument}.
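The rule above amounts to joining the workspace's output HDFS path with the --log_dir value. A small sketch (the username and log directory below are assumed values for illustration only):

```python
import posixpath

def result_path(output_hdfs_path, log_dir):
    """HDFS location where the checkpoints appear after the job completes."""
    return posixpath.join(output_hdfs_path, log_dir)

# With an assumed username 'example' and --log_dir 'logs':
print(result_path("/user/example/data_out", "logs"))  # /user/example/data_out/logs
```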