Keras image classification

Available in Classic and VPC

In this tutorial, we'll train a neural network model that classifies images of clothing, such as sneakers and shirts. It's OK if you don't understand every detail. This is a quick walk through a complete TensorFlow program, and the details are explained as we go.

Here, we'll use tf.keras, a high-level API for building and training models in TensorFlow.

# Import tensorflow and tf.keras.
import tensorflow as tf
from tensorflow import keras

# Import helper libraries.
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

Import fashion MNIST dataset

We'll use the Fashion MNIST dataset, which contains 70,000 grayscale images in 10 categories. Each image is a low-resolution (28 x 28 pixels) picture of a single item of clothing.

Fashion MNIST is often used in place of the classic MNIST dataset, which is regarded as the "Hello, World" of computer vision. The MNIST dataset consists of images of handwritten digits (0, 1, 2, and so on) in the same format as the clothing images used here.

Fashion MNIST is slightly more challenging than the classic MNIST, and it is used here for variety. Both datasets are relatively small, so they are often used to check that an algorithm works as expected. They make good starting points for testing and debugging code.

We'll use 60,000 images to train the network and 10,000 images to evaluate how accurately it classifies images. Fashion MNIST can be imported and loaded directly from TensorFlow:

fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

Calling the load_data() function returns four NumPy arrays:

The train_images and train_labels arrays are the training set, the data the model learns from.
The test_images and test_labels arrays are the test set, the data the model is tested against.

The images are 28 x 28 NumPy arrays with pixel values from 0 to 255. The labels are arrays of integers from 0 to 9. Each integer identifies the class of clothing shown in the image:

Label	Class
0	T-shirt/top
1	Trouser
2	Pullover
3	Dress
4	Coat
5	Sandal
6	Shirt
7	Sneaker
8	Bag
9	Ankle boot

Each image is mapped to a single label. The class names are not included in the dataset, so store them in a separate variable to use later when plotting the images:

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
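
Because a label is just an index into class_names, a lookup is plain indexing. The following is a minimal sketch of that idea; sample_labels is a small hand-written array used only for illustration, not data taken from the dataset.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical labels for illustration; real labels come from load_data().
sample_labels = np.array([9, 0, 3, 5], dtype=np.uint8)

# Single lookup: an integer label is an index into class_names.
first_name = class_names[int(sample_labels[0])]

# Vectorized lookup: index a NumPy array of names with the label array.
sample_names = np.array(class_names)[sample_labels].tolist()

print(first_name)
print(sample_names)
```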

Explore data

Let's look at the structure of the dataset before training the model. The following code shows that the training set has 60,000 images, each represented as 28 x 28 pixels:

train_images.shape

# (60000, 28, 28)

Similarly, the training set has 60,000 labels:

len(train_labels)

# 60000

Each label is an integer between 0 and 9:

train_labels

# array([9, 0, 0, ..., 3, 0, 5], dtype=uint8)

The test set has 10,000 images. Each of these is also represented as 28 x 28 pixels:

test_images.shape

# (10000, 28, 28)

The test set has labels for 10,000 images:

len(test_labels)

# 10000
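
The checks in this section can be gathered into one helper. The sketch below is an illustration only: summarize_split is a hypothetical helper name, and the data is a synthetic stand-in with the same dtype and image shape as Fashion MNIST, so it runs without downloading anything.

```python
import numpy as np

def summarize_split(images, labels, num_classes=10):
    """Return basic facts about one split; raise if images and labels disagree."""
    if images.shape[0] != labels.shape[0]:
        raise ValueError("images and labels have different lengths")
    return {
        "num_images": images.shape[0],
        "image_shape": images.shape[1:],
        "pixel_range": (int(images.min()), int(images.max())),
        "per_class": np.bincount(labels, minlength=num_classes).tolist(),
    }

# Synthetic stand-in data: 20 random 28 x 28 uint8 images, 2 per class.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(20, 28, 28), dtype=np.uint8)
fake_labels = np.repeat(np.arange(10, dtype=np.uint8), 2)

summary = summarize_split(fake_images, fake_labels)
print(summary)
```

With the real data, the same helper could be called as summarize_split(train_images, train_labels) and summarize_split(test_images, test_labels).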

Preprocess data

The data must be preprocessed before training the network. If you inspect the first image in the training set, you can see that the pixel values fall in the range 0 to 255:

plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)
plt.show()

Scale these values to the range 0 to 1 before feeding them to the neural network model. To do so, divide the values by 255. It's important to preprocess the training set and the test set in the same way:

train_images = train_images / 255.0

test_images = test_images / 255.0
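
To see what the division does, here is a small sketch on a hypothetical 2 x 2 "image". The pixel values are made up for illustration; only the uint8 dtype matches what load_data() returns.

```python
import numpy as np

# Hypothetical 2 x 2 "image" with uint8 pixel values.
raw = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# Dividing by a float promotes the array to floating point and rescales it.
scaled = raw / 255.0

print(raw.dtype, "->", scaled.dtype)
print(scaled)
print(scaled.min(), scaled.max())
```

Dividing by the float 255.0 rather than using integer division (//) matters here: integer division would collapse every pixel except 255 to 0.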

To verify that the data is in the correct format before building and training the network, display the first 25 images from the training set with the class name below each image.

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()

Configure model

Building a neural network model requires configuring the layers of the model and then compiling the model.

Set layers

The basic building block of a neural network is the layer. Layers extract representations from the data fed into them. Ideally, these representations are meaningful for the problem at hand.

Most of deep learning consists of chaining together simple layers. Layers such as tf.keras.layers.Dense have parameters (weights) that are learned during training.

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

The first layer in this network, tf.keras.layers.Flatten, transforms each image from a two-dimensional array (28 x 28 pixels) into a one-dimensional array of 28 * 28 = 784 pixels. Think of this layer as unstacking the rows of pixels in the image and lining them up. This layer has no weights to learn; it only reformats the data.
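
The reshaping can be reproduced in plain NumPy. This is a sketch of the idea rather than the Keras implementation, and it uses a hypothetical 3 x 4 "image" instead of 28 x 28 so the result is easy to read.

```python
import numpy as np

# Hypothetical 3 x 4 "image"; values 0..11 make the ordering visible.
image = np.arange(12).reshape(3, 4)

# Row-major flattening: row 0 first, then row 1, then row 2.
flat = image.reshape(-1)

# For a batch, keep the batch axis and flatten everything else.
batch = np.stack([image, image + 100])          # shape (2, 3, 4)
flat_batch = batch.reshape(batch.shape[0], -1)  # shape (2, 12)

print(flat)
print(flat_batch.shape)
```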

After the pixels are flattened, the network consists of two tf.keras.layers.Dense layers connected back to back. These are densely connected, or fully connected, layers. The first Dense layer has 128 nodes (or neurons). The second (last) layer is a softmax layer with 10 nodes. It returns an array of 10 probabilities that sum to 1. Each node outputs the probability that the current image belongs to one of the 10 classes.
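
To make the data flow through these three layers concrete, here is a NumPy sketch of one forward pass. It is an illustration under stated assumptions, not the Keras implementation: the weights are random stand-ins for trained ones, and the input is a random fake image.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(logits):
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Random weights standing in for trained ones (assumption for this sketch).
w1, b1 = rng.normal(0, 0.05, size=(784, 128)), np.zeros(128)
w2, b2 = rng.normal(0, 0.05, size=(128, 10)), np.zeros(10)

# One fake image batch of shape (1, 28, 28), already scaled to [0, 1].
images = rng.random((1, 28, 28))

x = images.reshape(images.shape[0], -1)   # Flatten: (1, 784)
hidden = relu(x @ w1 + b1)                # Dense(128, relu): (1, 128)
probs = softmax(hidden @ w2 + b2)         # Dense(10, softmax): (1, 10)

# Parameter count: weights plus biases for each dense layer.
n_params = w1.size + b1.size + w2.size + b2.size

print(probs.shape, probs.sum())
print(n_params)
```

The parameter count works out to 784 * 128 + 128 + 128 * 10 + 10 = 101,770, all of it in the two Dense layers.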

Compile model

Before the model is ready for training, it needs a few more settings. These are added in the model's compile step:

Loss function: Measures how far the model's output is from the correct labels during training. Training minimizes this function to steer the model in the right direction.
Optimizer: Determines how the model is updated based on the data it sees and its loss function.
Metrics: Used to monitor the training and testing steps. The following example uses accuracy, the fraction of images that are classified correctly.

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
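
To make the loss and the metric named in the compile call concrete, the following sketch computes both by hand in NumPy. It is a simplified illustration, not the Keras source: the clipping constant eps is an assumption of this sketch, and the probabilities and labels are made up.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class)."""
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())

def accuracy(labels, probs):
    """Fraction of rows whose most probable class equals the label."""
    return float((np.argmax(probs, axis=1) == labels).mean())

# Hypothetical predictions for three images over three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3]])
labels = np.array([0, 1, 2])  # integer labels, not one-hot vectors

loss = sparse_categorical_crossentropy(labels, probs)
acc = accuracy(labels, probs)

print(f"loss={loss:.4f} accuracy={acc:.4f}")
```

The "sparse" variant fits this tutorial because the labels are plain integers from 0 to 9 rather than one-hot vectors.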

Train model

Training the neural network model requires the following steps:

Feed the training data to the model. In this example, the training data is in the train_images and train_labels arrays.
The model learns to associate images with labels.
Ask the model to make predictions about a test set, in this example the test_images array, and check that the predictions match the labels in the test_labels array.

To start training, call the model.fit method, which fits the model to the training data:

model.fit(train_images, train_labels, epochs=5)

"""
Epoch 1/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.4985 - accuracy: 0.8238
Epoch 2/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3755 - accuracy: 0.8645
Epoch 3/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3355 - accuracy: 0.8769
Epoch 4/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.3130 - accuracy: 0.8852
Epoch 5/5
1875/1875 [==============================] - 3s 2ms/step - loss: 0.2945 - accuracy: 0.8907
<tensorflow.python.keras.callbacks.History at 0x7f5c9cc0f400>
"""

The loss and accuracy metrics are printed as the model trains. This model reaches an accuracy of about 0.89 (89%) on the training set.
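
The 1875 in the training log is the number of batches per epoch. model.fit uses a batch size of 32 when none is given, and 60,000 / 32 = 1875. The arithmetic can be sketched as follows; steps_per_epoch is a hypothetical helper name used only for this illustration.

```python
import math

def steps_per_epoch(num_samples, batch_size=32):
    # The last batch may be smaller than batch_size, so round up.
    return math.ceil(num_samples / batch_size)

train_steps = steps_per_epoch(60000)
test_steps = steps_per_epoch(10000)

print(train_steps, test_steps)
```

The same calculation for the 10,000 test images gives 313, which matches the step count in the evaluation log.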

Evaluate accuracy

Next, compare how the model performs on the test set:

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)

"""
313/313 - 0s - loss: 0.3619 - accuracy: 0.8754
Test accuracy: 0.8754000067710876
"""

The accuracy on the test set is slightly lower than the accuracy on the training set. This gap between training accuracy and test accuracy is a sign of overfitting. Overfitting is when a machine learning model performs worse on new, previously unseen data than on its training data.
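
As a quick check of how large the gap is, the numbers from the two logs above can be compared directly. The values below are copied from those logs; your own run will produce slightly different numbers.

```python
# Final-epoch training metrics and test metrics, copied from the logs above.
train_loss, train_acc = 0.2945, 0.8907
test_loss, test_acc = 0.3619, 0.8754

acc_gap = train_acc - test_acc
loss_gap = test_loss - train_loss

print(f"accuracy gap: {acc_gap * 100:.2f} percentage points")
print(f"loss gap: {loss_gap:.4f}")
```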

Create prediction

With the model trained, you can use it to make predictions about images.

predictions = model.predict(test_images)

Here, the model has predicted the label of each image in the test set. Let's look at the first prediction:

predictions[0]

"""
array([1.7927578e-04, 9.7309680e-07, 2.0041271e-05, 1.7340941e-06,
       5.4875236e-06, 7.3947711e-03, 2.7816868e-04, 1.0243144e-01,
       1.9015789e-04, 8.8949794e-01], dtype=float32)
"""

A prediction is an array of 10 numbers. They represent the model's confidence that the image corresponds to each of the 10 items of clothing. Let's find the label with the highest confidence:

np.argmax(predictions[0])

# 9

The model is most confident that this image is an ankle boot (class_names[9]). Check the test label to see whether this is correct:

test_labels[0]

# 9
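
The prediction array printed above can be inspected a little further. The sketch below copies those ten values, checks that they sum to approximately 1 as a softmax output should, and lists the two most likely classes.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Values copied from the predictions[0] output above.
p = np.array([1.7927578e-04, 9.7309680e-07, 2.0041271e-05, 1.7340941e-06,
              5.4875236e-06, 7.3947711e-03, 2.7816868e-04, 1.0243144e-01,
              1.9015789e-04, 8.8949794e-01], dtype=np.float32)

total = float(p.sum())          # softmax outputs sum to (about) 1
top2 = np.argsort(p)[::-1][:2]  # indices of the two largest probabilities

print(f"sum of probabilities: {total:.6f}")
for idx in top2:
    print(f"{class_names[idx]}: {p[idx] * 100:.1f}%")
```

In this example the runner-up is Sneaker at roughly 10%, another kind of footwear.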

Let's put all the predictions about 10 classes in a graph:

def plot_image(i, predictions_array, true_label, img):
    # Show image i with its predicted label, confidence, and true label.
    predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    # Blue marks a correct prediction, red an incorrect one.
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

def plot_value_array(i, predictions_array, true_label):
    # Plot the 10 class probabilities for image i as a bar chart.
    predictions_array, true_label = predictions_array[i], true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    # Color the predicted bar red, then the true bar blue.
    # If the prediction is correct, the single bar ends up blue.
    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

Let's look at the image, the prediction, and the array of confidence scores for image 0 and then for image 12.

i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions,  test_labels)
plt.show()

i = 12
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions,  test_labels)
plt.show()

Let's plot several images with their predictions. Correctly predicted labels are in blue, and incorrectly predicted labels are in red. The number is the confidence of the predicted label as a percentage (out of 100). Note that the model can be wrong even when its confidence is high.

# Plot the first num_images test images, their predicted labels, and the true labels.
# Correct predictions appear in blue, and incorrect predictions in red.
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions, test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions, test_labels)
plt.show()
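
Beyond inspecting the plots, the confidently wrong predictions can be located numerically. The sketch below is an illustration on made-up data: the predictions and labels arrays are tiny hypothetical stand-ins for the output of model.predict and for test_labels.

```python
import numpy as np

# Hypothetical stand-ins for model.predict output and test_labels.
predictions = np.array([[0.10, 0.90, 0.00],
                        [0.80, 0.10, 0.10],
                        [0.20, 0.30, 0.50],
                        [0.05, 0.95, 0.00]])
labels = np.array([1, 0, 1, 0])

predicted = np.argmax(predictions, axis=1)   # most probable class per row
confidence = predictions.max(axis=1)         # probability of that class
wrong = np.flatnonzero(predicted != labels)  # indices of the mistakes

# Order the mistakes from most to least confident.
most_confident_wrong = wrong[np.argsort(confidence[wrong])[::-1]]

accuracy = float((predicted == labels).mean())
print(accuracy)
print(most_confident_wrong, confidence[most_confident_wrong])
```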

Finally, use the trained model to make a prediction about a single image.

# Choose an image from the test set.
img = test_images[0]

print(img.shape)

# (28, 28)

tf.keras models are optimized to make predictions on a batch of samples at once. Even a single image must therefore be wrapped in a batch, which adds a leading axis of size 1:

# Add the image to a batch, even when a single image is used.
img = (np.expand_dims(img,0))

print(img.shape)

# (1, 28, 28)
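
There are several equivalent ways to add the batch axis in NumPy. The following sketch uses a blank placeholder image, not one from the dataset, to show three of them and how to undo the operation.

```python
import numpy as np

# Placeholder image with the same shape as one Fashion MNIST sample.
img = np.zeros((28, 28))

a = np.expand_dims(img, 0)   # the call used in this tutorial
b = img[np.newaxis, ...]     # indexing with np.newaxis
c = img.reshape(1, 28, 28)   # explicit reshape

# np.squeeze removes the size-1 batch axis again.
restored = np.squeeze(a, axis=0)

print(a.shape, b.shape, c.shape, restored.shape)
```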

Now, create the prediction for this image:

predictions_single = model.predict(img)

print(predictions_single)

"""
[[1.7927596e-04 9.7309771e-07 2.0041271e-05 1.7340958e-06 5.4875236e-06
 7.3947711e-03 2.7816897e-04 1.0243144e-01 1.9015789e-04 8.8949794e-01]]
"""

plot_value_array(0, predictions_single, test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)

Since model.predict returns a two-dimensional NumPy array, with one row of predictions per image in the batch, select the prediction for the only image in the batch:

np.argmax(predictions_single[0])

# 9

The model's prediction is label 9, the same as before.

# MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
