Sometimes, getting more data is not an option, but that need not spell doom for your Deep Learning project. There is a technique that helps you make the most of limited data. It's called data augmentation, and that's what I'll talk about today.

Hi there! Welcome back to my gentle introduction to Deep Learning. With hot dogs.

Last time we finally got our first actual taste of what Deep Learning, and specifically ConvNets, are. We got a decent classifier, but it’s still far from production-ready. Jian-Yang is proud, but the Periscope guys are not impressed. We need a better classifier so that we can sell the company to them and become really rich.

There are many techniques you can apply to an image classifier that does not perform as well as it needs to. The most powerful, and the most obvious, is to get more data. A simple model with a lot of data will usually outperform a more complex model with little data.

However, enlarging our data set is not always possible, and most of the time it is actually pretty expensive, either in terms of money or time. I’m not about to go out and spend a few weeks photographing hot dogs!

What if we could make the data up? Well, for a certain definition of “making up”, we can. It actually makes a lot of sense and can improve performance quite a bit. Think about it: each of our sample images represents just one possible view of a hot dog. Our classifier can get hung up on minor details that would not be so apparent if the hot dog were partially cropped or distorted. So if we show it modified images on top of the ones we already have, we get both more images and a classifier that generalizes better!

Data augmentation

The idea is simple: we don’t have that many images, especially of hot dogs, so let’s make the most of the few we have. We’ll generate new images by applying a number of transformations to the ones we have: we will zoom in and out, distort them a bit, translate them, rotate them…
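To make the idea concrete, here is a minimal NumPy sketch of two such transformations, a horizontal flip and a translation, applied at random. The helper names (`horizontal_flip`, `translate`, `random_augment`) and the random stand-in image are made up for illustration; this is not how Keras implements augmentation internally.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror a (height, width, channels) image left to right."""
    return image[:, ::-1, :]

def translate(image, dx, dy):
    """Shift the image by (dx, dy) pixels, filling uncovered pixels with zeros.

    Assumes |dx| < width and |dy| < height.
    """
    height, width = image.shape[:2]
    shifted = np.zeros_like(image)
    # Source and destination windows that overlap after the shift
    src_x = slice(max(0, -dx), min(width, width - dx))
    dst_x = slice(max(0, dx), min(width, width + dx))
    src_y = slice(max(0, -dy), min(height, height - dy))
    dst_y = slice(max(0, dy), min(height, height + dy))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted

def random_augment(image, rng, max_shift=10):
    """Flip with probability 0.5, then shift by a random small offset."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    return translate(image, int(dx), int(dy))

rng = np.random.default_rng(seed=0)
image = rng.random((120, 120, 3))  # stand-in for a real 120x120 RGB photo
variants = [random_augment(image, rng) for _ in range(5)]
print([v.shape for v in variants])
```

Every variant keeps the original shape and label, so one photo can stand in for several slightly different training examples.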

Luckily, we basically don’t have to code any of this: it’s already provided by the ImageDataGenerator class in Keras!

# Delete this line if you are not running the notebook in colab
%tensorflow_version 1.x 
# Silence some annoying deprecation warnings
import logging
logging.getLogger('tensorflow').disabled = True

import keras
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Conv2D, MaxPooling2D, InputLayer, Flatten, Dense
from keras.optimizers import Adam
import os

# Download the data
!wget -q "" -O
!rm -rf data/
!unzip -oq
!ls -lh data
base_dir = 'data/'

train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_datagen = ImageDataGenerator(rescale=1 / 255,

test_datagen = ImageDataGenerator(rescale=1 / 255)
It’s important to apply the augmentation only to the training generator: we don’t want to transform the validation set, since we want it to reflect the kinds of images we might find in the wild.
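As a rough sketch of what that split can look like, the two sets of keyword arguments below differ only in the augmentation options. The parameter names are real `ImageDataGenerator` arguments, but the values are illustrative guesses, not the ones used to train the model in this post.

```python
# Hypothetical augmentation settings: the values are assumptions for
# illustration, not the ones used in this post.
train_augmentation = dict(
    rescale=1 / 255,          # scale pixel values to [0, 1]
    rotation_range=20,        # rotate by up to 20 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
    shear_range=0.1,          # apply a small shear distortion
    zoom_range=0.2,           # zoom in or out by up to 20%
    horizontal_flip=True,     # mirror images left to right at random
    fill_mode='nearest',      # fill new pixels with the nearest existing ones
)

# The validation images are only rescaled, never transformed.
validation_preprocessing = dict(rescale=1 / 255)

# With Keras available, these would be passed on as keyword arguments:
#   train_datagen = ImageDataGenerator(**train_augmentation)
#   test_datagen = ImageDataGenerator(**validation_preprocessing)

augmentation_only = sorted(set(train_augmentation) - set(validation_preprocessing))
print(augmentation_only)
```

The only option both generators share is `rescale`, because the network has to see validation pixels on the same scale as training pixels.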

train_generator = train_datagen.flow_from_directory(train_dir, 

validation_generator = test_datagen.flow_from_directory(validation_dir,

validation_generator_noshuffle = test_datagen.flow_from_directory(validation_dir,
Found 4765 images belonging to 2 classes.
Found 888 images belonging to 2 classes.
Found 888 images belonging to 2 classes.
my_2nd_cnn = keras.Sequential()
my_2nd_cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(120, 120, 3)))
my_2nd_cnn.add(Conv2D(32, (3, 3), activation='relu'))
# Flatten the feature maps into a vector before the fully connected layers
my_2nd_cnn.add(Flatten())
my_2nd_cnn.add(Dense(64, activation='relu'))
my_2nd_cnn.add(Dense(1, activation='sigmoid'))

history = my_2nd_cnn.fit_generator(train_generator,
                                   class_weight = {0: 7, 1: 1},
from mateosio import plot_training_histories
from mateosio import plot_confusion_matrix

ax, precision, recall = plot_confusion_matrix(my_2nd_cnn, validation_generator_noshuffle)
print(precision, recall)
0.415384615385 0.947368421053
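The two numbers printed above are precision and recall. Assuming hot dog is the positive class, precision is the share of images flagged as hot dogs that really are hot dogs, and recall is the share of actual hot dogs the model found. The counts in this sketch are hypothetical: they were chosen because they reproduce the printed values, not read off the actual confusion matrix.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from confusion-matrix counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts that are consistent with the printed values
precision, recall = precision_recall(true_positives=108,
                                     false_positives=152,
                                     false_negatives=6)
print(precision, recall)
```

Read this way, the model finds almost every hot dog, but more than half of the images it flags as hot dogs are something else.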


Wow, now I am underfitting! I guess that means I can make my model a bit more complex. Let’s see.

One more layer

my_3rd_cnn = keras.Sequential()
my_3rd_cnn.add(Conv2D(32, (3, 3), activation='relu', input_shape=(120, 120, 3)))
my_3rd_cnn.add(Conv2D(32, (3, 3), activation='relu'))
# Flatten the feature maps into a vector before the fully connected layers
my_3rd_cnn.add(Flatten())
my_3rd_cnn.add(Dense(128, activation='relu'))
my_3rd_cnn.add(Dense(128, activation='relu'))
my_3rd_cnn.add(Dense(64, activation='relu'))
my_3rd_cnn.add(Dense(1, activation='sigmoid'))

history = my_3rd_cnn.fit_generator(train_generator,
                                   class_weight = {0: 7, 1: 1},
ax, precision, recall = plot_confusion_matrix(my_3rd_cnn, validation_generator_noshuffle)
print(precision, recall)
0.509202453988 0.728070175439


One of the best pieces of advice I got from Jeremy Howard’s Deep Learning for Coders is that you should first attempt to overfit, then deal with that through regularization. It makes a lot of sense: once you are overfitting, you know you’ve juiced your model to the max. If you aren’t, you don’t know whether there is still a lot of life left in it or whether it has reached the best performance it’s going to get. Let’s go for that overfitting.

Once a model stops improving with a particular learning rate, it’s often useful to reduce the learning rate and keep training.
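The rule of thumb above can be sketched as a small helper that looks at the validation loss history and shrinks the learning rate once progress stalls. The function name, the `patience`, `factor` and `min_delta` defaults, and the loss values are all made up for illustration. Keras ships a `ReduceLROnPlateau` callback built around the same idea.

```python
def reduce_on_plateau(val_losses, learning_rate, patience=3, factor=0.1,
                      min_delta=1e-4):
    """Return a smaller learning rate if the validation loss has stalled.

    The loss has stalled when none of the last `patience` epochs beat the
    best earlier loss by at least `min_delta`.
    """
    if len(val_losses) <= patience:
        return learning_rate  # not enough history to judge yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    if best_recent > best_before - min_delta:
        return learning_rate * factor
    return learning_rate

# Made-up validation losses, one value per epoch
stalled = [0.70, 0.55, 0.50, 0.50, 0.51, 0.50]
improving = [0.70, 0.55, 0.50, 0.45, 0.41, 0.38]

print(reduce_on_plateau(stalled, 1e-3))    # the rate is cut by `factor`
print(reduce_on_plateau(improving, 1e-3))  # the rate is left alone
```

In practice the smaller rate lets the optimizer settle into a minimum it was previously stepping over.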


history_pt2 = my_3rd_cnn.fit_generator(train_generator,
                                       class_weight = {0: 7, 1: 1},
plot_training_histories(history, history_pt2);


ax, precision, recall = plot_confusion_matrix(my_3rd_cnn, validation_generator_noshuffle)
print(precision, recall)
0.456221198157 0.868421052632


Great! We have improved quite a lot! Even better, judging by the training/validation loss plot, we still have quite a way to go in making the model better. However, in what direction should we take it? More fully connected layers? More convolutional ones? The possibility space is endless, so it’s hard to say. Additionally, the more complex we make the model, the more parameters we’ll have to train.
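To get a feel for how quickly parameters pile up, here is a back-of-the-envelope count for the first layers of the network above. It assumes unpadded 3x3 convolutions with stride 1 (the Keras defaults) and a flattening step between the convolutional and fully connected layers; the helper names are made up for illustration.

```python
def conv2d_params(kernel_height, kernel_width, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_height * kernel_width * in_channels + 1) * filters

def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units

conv1 = conv2d_params(3, 3, 3, 32)    # RGB input, 32 filters
conv2 = conv2d_params(3, 3, 32, 32)   # 32 feature maps in, 32 filters

# Two unpadded 3x3 convolutions shrink 120x120 to 118x118, then 116x116
flattened = 116 * 116 * 32
dense1 = dense_params(flattened, 128)

print(conv1, conv2, dense1)
```

Under these assumptions, the first fully connected layer alone holds thousands of times more parameters than both convolutional layers combined.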

Turns out, there’s a technique that sidesteps both potential problems. I’ll show it to you in the next episode!

Further Reading

Deep Learning with Python: A great introductory book by François Chollet, author of Keras. Explains the practice first, then goes down to theory.

…’s course, image classification part: Very good course for learning the concepts, even if they insist on using their own library.

Image preprocessing in Keras