CVML-MachineLearning/course/6 Convolutional Neural Networks.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convolutional Neural Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Machine learning on images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### MNIST"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.datasets import mnist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "(X_train, y_train), (X_test, y_test) = mnist.load_data('/tmp/mnist.npz')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(X_train[0], cmap='gray')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = X_train.reshape(-1, 28*28)\n",
    "X_test = X_test.reshape(-1, 28*28)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = X_train.astype('float32')\n",
    "X_test = X_test.astype('float32')\n",
    "X_train /= 255.0\n",
    "X_test /= 255.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.utils import to_categorical"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train_cat = to_categorical(y_train)\n",
    "y_test_cat = to_categorical(y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train_cat[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train_cat.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_test_cat.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fully connected on images"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense\n",
    "import tensorflow.keras.backend as K\n",
    "\n",
    "# K.clear_session()\n",
    "\n",
    "model = Sequential()\n",
    "model.add(Dense(512, input_dim=28*28, activation='relu'))\n",
    "model.add(Dense(256, activation='relu'))\n",
    "model.add(Dense(128, activation='relu'))\n",
    "model.add(Dense(32, activation='relu'))\n",
    "model.add(Dense(10, activation='softmax'))\n",
    "model.compile(loss='categorical_crossentropy',\n",
    "              optimizer='rmsprop',\n",
    "              metrics=['accuracy'])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "h = model.fit(X_train, y_train_cat, batch_size=128, epochs=10, verbose=1, validation_split=0.3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(h.history['accuracy'])\n",
    "plt.plot(h.history['val_accuracy'])\n",
    "plt.legend(['Training', 'Validation'])\n",
    "plt.title('Accuracy')\n",
    "plt.xlabel('Epochs');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_accuracy = model.evaluate(X_test, y_test_cat)[1]\n",
    "test_accuracy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tensor Math"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.randint(10, size=(2, 3, 4, 5))\n",
    "B = np.random.randint(10, size=(2, 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A[0, 1, 0, 3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### A random colored image"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = np.random.randint(255, size=(4, 4, 3), dtype='uint8')\n",
    "img"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(5, 5))\n",
    "plt.subplot(221)\n",
    "plt.imshow(img)\n",
    "plt.title(\"All Channels combined\")\n",
    "\n",
    "plt.subplot(222)\n",
    "plt.imshow(img[:, : , 0], cmap='Reds')\n",
    "plt.title(\"Red channel\")\n",
    "\n",
    "plt.subplot(223)\n",
    "plt.imshow(img[:, : , 1], cmap='Greens')\n",
    "plt.title(\"Green channel\")\n",
    "\n",
    "plt.subplot(224)\n",
    "plt.imshow(img[:, : , 2], cmap='Blues')\n",
    "plt.title(\"Blue channel\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tensor operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "2 * A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A + A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.tensordot(A, B, axes=([0, 1], [0, 1]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.tensordot(A, B, axes=([0], [0])).shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1D convolution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype='float32')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = np.array([-1, 1], dtype='float32')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = np.convolve(a, b)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.subplot(211)\n",
    "plt.plot(a, 'o-')\n",
    "\n",
    "plt.subplot(212)\n",
    "plt.plot(c, 'o-');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Image filters with convolutions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.ndimage.filters import convolve\n",
    "from scipy.signal import convolve2d\n",
    "from scipy import misc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img = misc.ascent()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(img, cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "h_kernel = np.array([[ 1,  2,  1],\n",
    "                     [ 0,  0,  0],\n",
    "                     [-1, -2, -1]])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "plt.imshow(h_kernel, cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "res = convolve2d(img, h_kernel)\n",
    "\n",
    "plt.imshow(res, cmap='gray');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convolutional neural networks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.layers import Conv2D"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(5, 5))\n",
    "plt.imshow(img, cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_tensor = img.reshape((1, 512, 512, 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "model.add(Conv2D(1, (3, 3), strides=(2,1), input_shape=(512, 512, 1)))\n",
    "model.compile('adam', 'mse')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred_tensor = model.predict(img_tensor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred_tensor.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred = img_pred_tensor[0, :, :, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(img_pred, cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weights = model.get_weights()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weights[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(weights[0][:, :, 0, 0], cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "weights[0] = np.ones(weights[0].shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.set_weights(weights)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred_tensor = model.predict(img_tensor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred = img_pred_tensor[0, :, :, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(img_pred, cmap='gray');"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "model.add(Conv2D(1, (3, 3), input_shape=(512, 512, 1), padding='same'))\n",
    "model.compile('adam', 'mse')\n",
    "\n",
    "img_pred_tensor = model.predict(img_tensor)\n",
    "\n",
    "\n",
    "img_pred_tensor.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pooling layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.layers import MaxPool2D, AvgPool2D"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "model.add(MaxPool2D((5, 5), input_shape=(512, 512, 1)))\n",
    "model.compile('adam', 'mse')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred = model.predict(img_tensor)[0, :, :, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.imshow(img_pred, cmap='gray')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = Sequential()\n",
    "model.add(AvgPool2D((5, 5), input_shape=(512, 512, 1)))\n",
    "model.compile('adam', 'mse')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "img_pred = model.predict(img_tensor)[0, :, :, 0]\n",
    "plt.imshow(img_pred, cmap='gray');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Final architecture"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train = X_train.reshape(-1, 28, 28, 1)\n",
    "X_test = X_test.reshape(-1, 28, 28, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.layers import Flatten, Activation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "K.clear_session()\n",
    "\n",
    "model = Sequential()\n",
    "\n",
    "model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))\n",
    "model.add(MaxPool2D(pool_size=(2, 2)))\n",
    "model.add(Activation('relu'))\n",
    "\n",
    "model.add(Flatten())\n",
    "\n",
    "model.add(Dense(128, activation='relu'))\n",
    "\n",
    "model.add(Dense(10, activation='softmax'))\n",
    "\n",
    "model.compile(loss='categorical_crossentropy',\n",
    "              optimizer='rmsprop',\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.fit(X_train, y_train_cat, batch_size=128,\n",
    "          epochs=2, verbose=1, validation_split=0.3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.evaluate(X_test, y_test_cat)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Exercise 1\n",
    "You've been hired by a shipping company to overhaul the way they route mail, parcels and packages. They want to build an image recognition system  capable of recognizing the digits in the zipcode on a package, so that it can be automatically routed to the correct location.\n",
    "You are tasked to build the digit recognition system. Luckily, you can rely on the MNIST dataset for the intial training of your model!\n",
    "\n",
    "Build a deep convolutional neural network with at least two convolutional and two pooling layers before the fully connected layer.\n",
    "\n",
    "- Start from the network we have just built\n",
    "- Insert a `Conv2D` layer after the first `MaxPool2D`, give it 64 filters.\n",
    "- Insert a `MaxPool2D` after that one\n",
    "- Insert an `Activation` layer\n",
    "- retrain the model\n",
    "- does performance improve?\n",
    "- how many parameters does this new model have? More or less than the previous model? Why?\n",
    "- how long did this second model take to train? Longer or shorter than the previous model? Why?\n",
    "- did it perform better or worse than the previous model?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2\n",
    "\n",
    "Pleased with your performance with the digits recognition task, your boss decides to challenge you with a harder task. Their online branch allows people to upload images to a website that generates and prints a postcard that is shipped to destination. Your boss would like to know what images people are loading on the site in order to provide targeted advertising on the same page, so he asks you to build an image recognition system capable of recognizing a few objects. Luckily for you, there's a dataset ready made with a collection of labeled images. This is the [Cifar 10 Dataset](http://www.cs.toronto.edu/~kriz/cifar.html), a very famous dataset that contains images for 10 different categories:\n",
    "\n",
    "- airplane \t\t\t\t\t\t\t\t\t\t\n",
    "- automobile \t\t\t\t\t\t\t\t\t\t\n",
    "- bird \t\t\t\t\t\t\t\t\t\t\n",
    "- cat \t\t\t\t\t\t\t\t\t\t\n",
    "- deer \t\t\t\t\t\t\t\t\t\t\n",
    "- dog \t\t\t\t\t\t\t\t\t\t\n",
    "- frog \t\t\t\t\t\t\t\t\t\t\n",
    "- horse \t\t\t\t\t\t\t\t\t\t\n",
    "- ship \t\t\t\t\t\t\t\t\t\t\n",
    "- truck\n",
    "\n",
    "In this exercise we will reach the limit of what you can achieve on your laptop and get ready for the next session on cloud GPUs.\n",
    "\n",
    "Here's what you have to do:\n",
    "- load the cifar10 dataset using `keras.datasets.cifar10.load_data()`\n",
    "- display a few images, see how hard/easy it is for you to recognize an object with such low resolution\n",
    "- check the shape of X_train, does it need reshape?\n",
    "- check the scale of X_train, does it need rescaling?\n",
    "- check the shape of y_train, does it need reshape?\n",
    "- build a model with the following architecture, and choose the parameters and activation functions for each of the layers:\n",
    "    - conv2d\n",
    "    - conv2d\n",
    "    - maxpool\n",
    "    - conv2d\n",
    "    - conv2d\n",
    "    - maxpool\n",
    "    - flatten\n",
    "    - dense\n",
    "    - output\n",
    "- compile the model and check the number of parameters\n",
    "- attempt to train the model with the optimizer of your choice. How fast does training proceed?\n",
    "- If training is too slow (as expected) stop the execution and move to the next session!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.datasets import cifar10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}