How to correctly use ImageDataGenerator in Keras?

I have been experimenting with data augmentation in Keras lately, using the basic ImageDataGenerator. I learned the hard way that it is actually a generator, not an iterator (because type(train_aug_ds) gives <class 'keras.preprocessing.image.DirectoryIterator'>, I thought it was an iterator). I also checked a few blog posts about using it, but they don't answer all my questions.

So, I loaded my data like this:

train_aug = ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
    height_shift_range=0.1,
    width_shift_range=0.1,
    brightness_range=(0.5,1.5),
    zoom_range = [1, 1.5],
)
train_aug_ds = train_aug.flow_from_directory(
    directory='./train',
    target_size=image_size,
    batch_size=batch_size,
)

And to train my model I did the following:

model.fit(
    train_aug_ds,
    epochs=150,
    validation_data=valid_aug_ds,
)

And it worked. I am a bit confused about how it works, because train_aug_ds is a generator, so it should yield an infinitely large dataset. And the documentation says:

When passing an infinitely repeating dataset, you must specify the steps_per_epoch argument.

I didn't do that, yet it works. Does it somehow infer the number of steps? Also, does it use only augmented data, or does it also include non-augmented images in a batch?

So basically, my question is how to use this generator correctly with fit so that training covers all the data in my training set, including the original, non-augmented images and the augmented images, and cycles through it several times per epoch (right now it seems to make only one pass per epoch).

keras python tensorflow
2021-11-23 11:26:56

I think the documentation can be quite confusing, and I imagine the behavior differs depending on your TensorFlow and Keras versions. For example, in this post, the user describes exactly the behavior you are expecting. Generally, the flow_from_directory() method lets you read images directly from a directory and augment them while your model is training and, as already stated here, it iterates over every sample in each folder every epoch. Using the following example, you can check that this is the case (on TF 2.7) by looking at the steps per epoch in the progress bar:

import tensorflow as tf

BATCH_SIZE = 64

flowers = tf.keras.utils.get_file(
    'flower_photos',
    'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz',
    untar=True)

img_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    horizontal_flip=True,
)

train_ds = img_gen.flow_from_directory(flowers, batch_size=BATCH_SIZE, shuffle=True, class_mode='sparse')
num_classes = 5

model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu', input_shape=(256, 256, 3)),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(num_classes)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

epochs=10
history = model.fit(
  train_ds,
  epochs=epochs
)

Found 3670 images belonging to 5 classes.
Epoch 1/10
 6/58 [==>...........................] - ETA: 3:02 - loss: 2.0608
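The 58 in the progress bar is not a guess: the object returned by flow_from_directory reports a length in batches, and fit can use that as the number of steps per epoch. The following is a minimal sketch of that idea in plain Python. BatchSequence and infer_steps are hypothetical names made up for illustration, not Keras classes or functions, and the sketch only mimics the length arithmetic, not the image loading.

```python
import math


class BatchSequence:
    """Hypothetical stand-in for a Sequence-style batch iterator (not a Keras class)."""

    def __init__(self, num_samples, batch_size):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches needed to cover every sample once;
        # the last batch may be smaller than batch_size.
        return math.ceil(self.num_samples / self.batch_size)

    def __getitem__(self, index):
        # Return the sample indices that make up batch `index`.
        start = index * self.batch_size
        stop = min(start + self.batch_size, self.num_samples)
        return list(range(start, stop))


def infer_steps(data):
    """Return the number of steps per epoch if `data` has a length, else None."""
    try:
        return len(data)
    except TypeError:
        return None


seq = BatchSequence(num_samples=3670, batch_size=64)
print(infer_steps(seq))                      # 58, i.e. ceil(3670 / 64)
print(infer_steps(b for b in [[0, 1]]))      # None: a plain generator has no length
```

With 3670 images and a batch size of 64, the last of the 58 batches holds only the remaining 22 images.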

If you wrap flow_from_directory with tf.data.Dataset.from_generator like this:

train_ds = tf.data.Dataset.from_generator(
    lambda: img_gen.flow_from_directory(flowers, batch_size=BATCH_SIZE, shuffle=True, class_mode='sparse'),
    output_types=(tf.float32, tf.float32))

The progress bar then looks like this, because the wrapped dataset has no known length and steps_per_epoch has not been set explicitly:

Epoch 1/10
Found 3670 images belonging to 5 classes.
     29/Unknown - 104s 4s/step - loss: 2.0364

If you pass steps_per_epoch, the progress bar shows the number of steps again:

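# Create the iterator once so that its length (the number of batches) is known.
from_directory = img_gen.flow_from_directory(flowers, batch_size=BATCH_SIZE, shuffle=True, class_mode='sparse')

history = model.fit(
  train_ds,
  steps_per_epoch=len(from_directory),  # ceil(3670 / 64) = 58
  epochs=epochs
)

Found 3670 images belonging to 5 classes.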
Epoch 1/10
 3/58 [>.............................] - ETA: 3:19 - loss: 4.1357
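To make the role of steps_per_epoch concrete, here is a small plain-Python sketch of cutting an endless stream of batches into epochs of a fixed number of steps. endless_batches and run_epoch are hypothetical helpers invented for this illustration; they are not part of Keras or tf.data, and the "batches" are just integer ids.

```python
import itertools


def endless_batches(num_batches):
    """Hypothetical endless source: cycles over batch ids 0..num_batches-1 forever."""
    while True:
        for batch_id in range(num_batches):
            yield batch_id


def run_epoch(batches, steps_per_epoch):
    """Consume exactly `steps_per_epoch` batches from the stream and return them."""
    return list(itertools.islice(batches, steps_per_epoch))


stream = endless_batches(num_batches=58)

# Each "epoch" takes the next 58 batches; the stream itself never ends.
epoch_1 = run_epoch(stream, steps_per_epoch=58)
epoch_2 = run_epoch(stream, steps_per_epoch=58)

print(len(epoch_1), len(epoch_2))  # 58 58
```

Without a step count there is nothing to tell the training loop where one epoch ends, which is why the progress bar can only show "Unknown" in that case.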

Finally, to your question:

How to use this generator correctly with function fit to have all data in my training set, including original, non-augmented images and augmented images, and to cycle through it several times/step?

Because the dataset wrapped with from_generator keeps yielding batches indefinitely, you can simply increase steps_per_epoch beyond ceil(number of samples / batch_size) by multiplying it by some factor:

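# from_directory is the DirectoryIterator returned by img_gen.flow_from_directory(...)
history = model.fit(
  train_ds,
  steps_per_epoch=len(from_directory) * 2,  # two passes over the data per epoch
  epochs=epochs
)

Found 3670 images belonging to 5 classes.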
Epoch 1/10
  1/116 [..............................] - ETA: 12:11 - loss: 1.5885

Now instead of 58 steps per epoch you have 116.
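As for original versus augmented images: as far as I understand, ImageDataGenerator draws its random transformations again every time an image is loaded, so the untouched original is not added as a separate sample; it only appears when the random draw happens to leave the image unchanged. The sketch below simulates this in plain Python under simplifying assumptions: augmented_stream is a hypothetical helper (not a Keras function), images are represented by integer ids, and a coin flip stands in for horizontal_flip=True.

```python
import random
from collections import Counter


def augmented_stream(num_samples, batch_size, rng):
    """Endlessly yield batches of (sample_id, flipped) pairs, reshuffling on each pass."""
    while True:
        order = list(range(num_samples))
        rng.shuffle(order)
        for start in range(0, num_samples, batch_size):
            batch_ids = order[start:start + batch_size]
            # A fresh random decision per sample per draw (stand-in for a random flip).
            yield [(i, rng.random() < 0.5) for i in batch_ids]


rng = random.Random(0)
stream = augmented_stream(num_samples=3670, batch_size=64, rng=rng)

steps_per_epoch = 58 * 2  # doubled, as in the fit call above
seen = Counter()      # how often each image was drawn in this epoch
flipped = Counter()   # how often each image was drawn in flipped form

for _ in range(steps_per_epoch):
    for sample_id, was_flipped in next(stream):
        seen[sample_id] += 1
        flipped[sample_id] += int(was_flipped)

print(len(seen), set(seen.values()))  # 3670 {2}: every image drawn exactly twice
```

In this simulation each image is drawn twice per 116-step epoch, and each draw makes its own flip decision, so the two copies of an image within an epoch can differ.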

2021-11-23 17:22:32
