Coke vs. Pepsi: A deep learning brand logo classifier built from scratch

Goal

The goal of this exercise is to train a (deep) neural net from scratch to classify brand logo images. Specifically, the network should detect whether an image contains the Coca-Cola or the Pepsi logo.

To train our model easily, we rely on keras, an open-source deep learning library for Python that is also available in R through the keras package.

# load keras
library(keras)

# for reproducibility
myseed <- 42
set.seed(myseed)
use_session_with_seed(myseed, disable_gpu = FALSE, disable_parallel_cpu = FALSE)

## Set session seed to 42

The data

The dataset is based on the Flickr Logos 27 dataset¹. After some preprocessing (see flickr27_data_preprocessing.R), 120 training and 40 validation images were sampled for each of Pepsi and Coke (a 75:25 split).

Let’s check out some of the images.

Preparation

The data is stored in separate folders for training, validation, and test, and each of these folders contains one subfolder per brand. Thus, we have to set the corresponding directories first.

# set data directories (i.e., location of images)
#
# beware of the folder structure!
# the labels (i.e., brand names) will be inferred from the folder names
train_directory <- "flickr27/training/"
val_directory <- "flickr27/validation/"
test_directory <- "flickr27/test/"

# how many images are in the training, validation, and test sets
# (counted across all class subfolders)
train_samples <- length(list.files(path = train_directory, recursive = TRUE))
validation_samples <- length(list.files(path = val_directory, recursive = TRUE))
test_samples <- length(list.files(path = test_directory, recursive = TRUE))
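Because the labels are inferred from the folder names, it can help to count the images per class subfolder rather than only per directory. The following is a minimal sketch in base R; `count_per_class` is a hypothetical helper (not part of keras), and the demo runs on a temporary folder that merely mimics the layout described above.

```r
# Count the image files in each class subfolder of a directory.
# `count_per_class` is a hypothetical helper, not a keras function.
count_per_class <- function(directory) {
  class_dirs <- list.dirs(directory, full.names = TRUE, recursive = FALSE)
  counts <- vapply(
    class_dirs,
    function(d) length(list.files(d, recursive = TRUE)),
    integer(1)
  )
  names(counts) <- basename(class_dirs)
  counts
}

# Self-contained demo: build a temporary folder with three brand subfolders
# and three empty placeholder files each.
demo_dir <- file.path(tempdir(), "demo_training")
for (brand in c("Cocacola", "Pepsi", "Heineken")) {
  dir.create(file.path(demo_dir, brand), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_dir, brand, paste0("img_", 1:3, ".jpg")))
}

demo_counts <- count_per_class(demo_dir)
print(demo_counts)
```

On the real data, `count_per_class(train_directory)` would show at a glance whether every brand folder holds the expected number of images.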

Next, we have to decide on the image size and the batch size (because we will sample subsets of images for training, see below).

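# set parameters: image width and height to use for the model
# (keras expects sizes as c(height, width); the order does not matter here
# because the images are square)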
img_width <- 100
img_height <- 100

# batch size: how many samples (i.e., images) should be used in one training iteration
batch_size <- 10

Because we have more than just a few images, it makes sense to sample images on the fly when they are needed instead of loading all images at once. To this end, we can define an image_data_generator that performs specific operations when loading the images. Here, we just scale the pixel values to lie between 0 and 1. Note that we could set many more parameters here if we wanted to do data augmentation.

datagen_train <- image_data_generator(rescale = 1/255)
datagen_val <- image_data_generator(rescale = 1/255)
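The additional parameters mentioned above could look like the following sketch. The specific ranges are illustrative assumptions, not values used in this post, and `datagen_train_aug` is a hypothetical name. Horizontal flipping is left out on purpose, since mirroring would reverse the lettering of a logo. The call is guarded so that the snippet also runs on machines without a working keras installation.

```r
# Augmentation settings (illustrative values, not tuned for this dataset)
augmentation_args <- list(
  rescale = 1 / 255,          # same scaling as for the plain generators
  rotation_range = 15,        # random rotations of up to 15 degrees
  width_shift_range = 0.1,    # horizontal shifts of up to 10% of the width
  height_shift_range = 0.1,   # vertical shifts of up to 10% of the height
  zoom_range = 0.1,           # random zoom between 90% and 110%
  fill_mode = "nearest"       # fill newly exposed pixels with nearest values
)

# Build the generator only if keras and its Python backend are available
if (requireNamespace("keras", quietly = TRUE) && keras::is_keras_available()) {
  datagen_train_aug <- do.call(keras::image_data_generator, augmentation_args)
}
```

Augmentation would only be applied to the training generator; the validation generator should keep the plain rescaling so that validation scores stay comparable.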

Note that there are three subfolders within the training, test, and validation folders (Cocacola, Pepsi, Heineken). We’ll need all three for a later example. However, for now we just want to use the images from Pepsi and Coke. Therefore, we define a list of classes we want to use in our model.

Furthermore, we use the flow_images_from_directory function, which samples images from a folder with the given parameters (i.e., data generator, image size, batch size, …). Because we are doing a binary classification task, we also have to specify binary as the class mode.

class_list <- c("Pepsi", "Cocacola")

train_generator <- flow_images_from_directory(train_directory, generator = datagen_train,
                                              target_size = c(img_width, img_height),
                                              class_mode = "binary", batch_size = batch_size,
                                              classes = class_list,
                                              seed = myseed)

validation_generator <- flow_images_from_directory(val_directory, generator = datagen_val,
                                                   target_size = c(img_width, img_height),
                                                   class_mode = "binary", batch_size = batch_size,
                                                   classes = class_list,
                                                   seed = myseed)

As a sanity check, we inspect how the labels are coded and test whether all 120 training images per class were detected.

# check label coding
train_generator$class_indices

## $Pepsi
## [1] 0
## 
## $Cocacola
## [1] 1

# note that coke is coded as 1 and pepsi as 0
table(train_generator$classes)

## 
##   0   1 
## 120 120
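The label coding shown above follows the order of class_list: the first entry gets index 0, the second index 1. A small base R sketch of that mapping, and of the reverse lookup that is handy when interpreting predictions later, might look as follows. The names `class_indices` and `index_to_class` are chosen for illustration only.

```r
class_list <- c("Pepsi", "Cocacola")

# Zero-based indices in the order in which the classes were listed
class_indices <- setNames(seq_along(class_list) - 1L, class_list)
print(class_indices)

# Reverse lookup: from a numeric label back to the brand name
index_to_class <- setNames(names(class_indices), class_indices)
index_to_class[["1"]]
```

Swapping the order of class_list would therefore also swap which brand the network's output probability refers to.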

Build the neural network

Building a deep learning model includes two steps. First, you have to specify your model’s network architecture (i.e., define the layers). Second, you have to compile the model.

Set up the model layers

Setting up the layers usually means chaining together simple layers (whose parameters are learned during training). For this example, we use two convolutional layers, each followed by max pooling, and one dense hidden layer before the output layer.

# define our neural network
model <- keras_model_sequential() %>%
  # first hidden (convolutional) layer
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu", input_shape = c(img_width, img_height, 3)) %>%
  # max pooling
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # second hidden (convolutional) layer
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  # max pooling
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  # flatten the max-pooled output into a (dense) feature vector
  layer_flatten() %>%
  layer_dense(units = 128, activation = "relu") %>%
  # the output of the dense layer is projected onto the output layer
  layer_dense(units = 1, activation = "sigmoid")

Note that the last layer has only one unit (or “neuron”) and a sigmoid activation function. This is because we have a binary classification task and therefore want to predict a single class probability score (in our example, the probability of the class Cocacola).
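To illustrate why a single sigmoid unit is enough, here is a minimal base R sketch of the sigmoid function. The input values `z` are made up; in the network they would be the weighted sum arriving at the output unit.

```r
# Sigmoid: squashes any real number into the interval (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Made-up pre-activation values of the output unit
z <- c(-4, 0, 4)

p_coke <- sigmoid(z)     # interpreted as P(class = Cocacola)
p_pepsi <- 1 - p_coke    # the complementary probability, P(class = Pepsi)

round(cbind(z, p_coke, p_pepsi), 3)
```

Since the two probabilities always sum to one, a second output unit would carry no additional information.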

We can also plot the model using Andrie de Vries’ deepviz package.

# devtools::install_github("andrie/deepviz")
deepviz::plot_model(model)
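As a cross-check on the architecture, the layer sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults that the model code relies on implicitly (no padding and stride 1 for the convolutions, pooling stride equal to the pool size); `conv_out` and `pool_out` are hypothetical helpers.

```r
# Spatial size after a convolution without padding and with stride 1
conv_out <- function(size, kernel = 3) size - kernel + 1
# Spatial size after non-overlapping max pooling
pool_out <- function(size, pool = 2) size %/% pool

s1 <- pool_out(conv_out(100))   # 100 -> 98 -> 49
s2 <- pool_out(conv_out(s1))    # 49 -> 47 -> 23
flat <- s2 * s2 * 64            # length of the flattened feature vector

# Trainable parameters per layer: (inputs per unit + 1 bias) * units
params <- c(
  conv1  = (3 * 3 * 3 + 1) * 32,
  conv2  = (3 * 3 * 32 + 1) * 64,
  dense  = (flat + 1) * 128,
  output = (128 + 1) * 1
)

print(c(s1 = s1, s2 = s2, flat = flat))
print(params)
print(sum(params))
```

Under these assumptions, almost all of the weights sit in the first dense layer; summary(model) should report the same total.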

Compile the model

Next, we compile the model with the appropriate settings. We need to define the loss function, an optimizer, and the metric with which we want to monitor the training steps (here accuracy). Note that because of the binary classification, we have to specify binary_crossentropy for the loss function.

# compile the model
# 
# note that we use "binary_crossentropy" as loss function because of our binary
# classification task
model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 1e-3, decay = 1e-6),
  metrics = "accuracy"
)
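To make the loss function more concrete, binary cross-entropy can be written in a few lines of base R. This is a sketch for illustration, not the code keras runs internally; the clipping constant `eps` and the example vectors are assumptions chosen for the demo.

```r
# Binary cross-entropy: mean negative log-likelihood of the true labels
binary_crossentropy <- function(y_true, p_pred, eps = 1e-7) {
  # clip probabilities away from 0 and 1 to avoid log(0)
  p <- pmin(pmax(p_pred, eps), 1 - eps)
  -mean(y_true * log(p) + (1 - y_true) * log(1 - p))
}

y_true <- c(1, 0, 1, 0)   # made-up labels (1 = Cocacola, 0 = Pepsi)

loss_good  <- binary_crossentropy(y_true, c(0.95, 0.05, 0.90, 0.10))
loss_bad   <- binary_crossentropy(y_true, c(0.05, 0.95, 0.10, 0.90))
loss_guess <- binary_crossentropy(y_true, rep(0.5, 4))   # log(2), about 0.693

print(c(good = loss_good, guess = loss_guess, bad = loss_bad))
```

Confident correct predictions yield a loss close to zero, while confident wrong predictions are penalized much more heavily than an uninformed guess of 0.5.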

Train the model

Finally, we train our neural network. Note that, depending on your hardware setup, this may take a while. If you’re using RStudio, the training progress is automatically visualized.

# train the model (this may take some time ...)
# 
# note that epochs sets how many times the model cycles through
# steps_per_epoch batches.
hist <- model %>% fit_generator(
  train_generator,
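  # use the generator's own image count: train_samples also counts the
  # Heineken folder, which is excluded via class_list
  steps_per_epoch = as.integer(train_generator$n / batch_size),
  epochs = 10,
  validation_data = validation_generator,
  validation_steps = as.integer(validation_generator$n / batch_size)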
)

# You can also plot the results of the training
plot(hist)
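The relationship between batch size, steps per epoch, and epochs is simple arithmetic. The following sketch uses the 240 training images reported in the table above (120 per class); the variable names are illustrative.

```r
n_train_images <- 240   # 120 Pepsi + 120 Coca-Cola images
batch_size <- 10
epochs <- 10

# Number of batches needed to see every training image once
steps_per_epoch <- as.integer(n_train_images / batch_size)

# Total number of weight updates over the whole training run
total_updates <- steps_per_epoch * epochs

print(c(steps_per_epoch = steps_per_epoch, total_updates = total_updates))

# as.integer() truncates, so leftover images are not covered by the last
# step of an epoch; ceiling() is an alternative that includes a final,
# smaller batch
steps_truncated <- as.integer(245 / batch_size)
steps_rounded_up <- ceiling(245 / batch_size)
```

With an image count that is a multiple of the batch size, as here, both variants give the same number of steps.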

Model evaluation

After training the model, its performance can be evaluated and predictions can be made.

Make predictions

First, we want to make predictions on two simple images downloaded from Google. To this end, we define an image prediction function. The function does some image preprocessing (so that the data fit the format with which the network was trained) and predicts the probability that the image contains a Coke logo (remember that Coke was coded as 1 and Pepsi as 0, see above).

pred_img <- function(path) {
  img <- image_load(path, target_size = c(img_width, img_height))
  x <- image_to_array(img)
  x <- x/255
  x <- array_reshape(x, c(1, dim(x)))
  return(paste0(round(model %>% predict(x)*100,3),"%"))
}

Let’s see the verdict.

# create a helper function to plot an image
img_plot <- function(path) {
  img <- image_load(path, target_size = c(img_width, img_height))
  x <- image_to_array(img)
  x <- x/255
  grid::grid.raster(x)
}

# reminder: Coca-Cola: 1, Pepsi: 0
# pred_img gives the probability of the image containing a Coca-Cola logo
img_plot("cocacola.jpg")

pred_img("cocacola.jpg")

## [1] "99.972%"

img_plot("pepsi.jpg")

pred_img("pepsi.jpg")

## [1] "0%"
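The percentages above are probabilities for the class coded as 1. To turn them into a brand label, one can apply a cutoff. The sketch below uses the conventional threshold of 0.5, which is an assumption rather than something fixed by the model; `prob_to_label` is a hypothetical helper.

```r
# Map the predicted probability of "Cocacola" (coded as 1) to a brand label
prob_to_label <- function(p_coke, threshold = 0.5) {
  ifelse(p_coke >= threshold, "Cocacola", "Pepsi")
}

# The two predictions from above, expressed as probabilities
predicted_probs <- c(cocacola.jpg = 0.99972, pepsi.jpg = 0)
predicted_labels <- prob_to_label(predicted_probs)
print(predicted_labels)
```

A different threshold could be chosen if one type of error were more costly than the other.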

Test set performance

We can also make predictions on the official test set of the Flickr Logos 27 dataset. To this end, we again have to create a generator that samples the images from the corresponding directory. Then, we can evaluate the model’s performance on the test dataset.

Here are some of the test images.

Let’s see the verdict!

# evaluate on 5 batches of 10 images each from the test set
test_generator <- flow_images_from_directory(
  test_directory, generator = datagen_val,
  target_size = c(img_width, img_height),
  class_mode = "binary", batch_size = batch_size,
  classes = class_list,
  seed = myseed)

test_performance <- model %>% evaluate_generator(test_generator, steps = 5)
print(paste0("Test accuracy: ", round(test_performance$acc*100,4), "%"))

## [1] "Test accuracy: 50%"
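An accuracy figure on its own says little about which kinds of errors the model makes. Note that with two equally frequent classes, always predicting the same class would already reach 50%. The following base R sketch shows how accuracy and a confusion table are computed from labels and predicted probabilities. All numbers in it are made up for illustration and are not results from the model above.

```r
# Made-up true labels (1 = Cocacola, 0 = Pepsi) and predicted probabilities
true_labels <- c(1, 1, 1, 0, 0, 0, 1, 0)
pred_probs  <- c(0.9, 0.8, 0.3, 0.2, 0.6, 0.1, 0.7, 0.4)

# Convert probabilities to hard labels with a 0.5 cutoff
pred_labels <- as.integer(pred_probs >= 0.5)

# Accuracy: share of images whose predicted label matches the true label
accuracy <- mean(pred_labels == true_labels)

# Confusion table: rows are the true classes, columns the predicted classes
confusion <- table(truth = true_labels, predicted = pred_labels)

print(accuracy)
print(confusion)
```

If the confusion table for the real test set showed all predictions in a single column, that would indicate a model that always predicts the same brand.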

¹ The example is inspired by Florian Teschner’s blog posts on deep learning (see https://flovv.github.io/Logo_detection_deep_learning/ and https://flovv.github.io/Logo_detection_deep_learning_part2/).