{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "You've just been hired at a wine company and they would like you to help them build a model that predicts the quality of their wine based on several measurements. They give you a dataset with wine\n", "\n", "- Load the ../data/wines.csv into Pandas\n", "- Use the column called \"Class\" as target\n", "- Check how many classes are there in target, and if necessary use dummy columns for a multi-class classification\n", "- Use all the other columns as features, check their range and distribution (using seaborn pairplot)\n", "- Rescale all the features using either MinMaxScaler or StandardScaler\n", "- Build a deep model with at least 1 hidden layer to classify the data\n", "- Choose the cost function, what will you use? Mean Squared Error? Binary Cross-Entropy? Categorical Cross-Entropy?\n", "- Choose an optimizer\n", "- Choose a value for the learning rate, you may want to try with several values\n", "- Choose a batch size\n", "- Train your model on all the data using a `validation_split=0.2`. Can you converge to 100% validation accuracy?\n", "- What's the minumum number of epochs to converge?\n", "- Repeat the training several times to verify how stable your results are" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('../data/wines.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = df['Class']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_cat = pd.get_dummies(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_cat.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = df.drop('Class', axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.pairplot(df, hue='Class')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sc = StandardScaler()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Xsc = sc.fit_transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense\n", "from tensorflow.keras.optimizers import SGD, Adam, Adadelta, RMSprop\n", "import tensorflow.keras.backend as K" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"K.clear_session()\n", "model = Sequential()\n", "model.add(Dense(5, input_shape=(13,),\n", " kernel_initializer='he_normal',\n", " activation='relu'))\n", "model.add(Dense(3, activation='softmax'))\n", "\n", "model.compile(RMSprop(learning_rate=0.1),\n", " 'categorical_crossentropy',\n", " metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values,\n", " batch_size=8,\n", " epochs=10,\n", " verbose=1,\n", " validation_split=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "Since this dataset has 13 features we can only visualize pairs of features like we did in the Paired plot. We could however exploit the fact that a neural network is a function to extract 2 high level features to represent our data.\n", "\n", "- Build a deep fully connected network with the following structure:\n", " - Layer 1: 8 nodes\n", " - Layer 2: 5 nodes\n", " - Layer 3: 2 nodes\n", " - Output : 3 nodes\n", "- Choose activation functions, inizializations, optimizer and learning rate so that it converges to 100% accuracy within 20 epochs (not easy)\n", "- Remember to train the model on the scaled data\n", "- Define a Feature Function like we did above between the input of the 1st layer and the output of the 3rd layer\n", "- Calculate the features and plot them on a 2-dimensional scatter plot\n", "- Can we distinguish the 3 classes well?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "model = Sequential()\n", "model.add(Dense(8, input_shape=(13,),\n", " kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(5, kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(2, kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(3, activation='softmax'))\n", "\n", "model.compile(RMSprop(learning_rate=0.05),\n", " 'categorical_crossentropy',\n", " metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values,\n", " batch_size=16,\n", " epochs=20,\n", " verbose=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inp = model.layers[0].input\n", "out = model.layers[2].output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function = K.function([inp], [out])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features = features_function([Xsc])[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(features[:, 0], features[:, 1], c=y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). Let's see how we can leverage it.\n", "\n", "- Define an input layer called `inputs`\n", "- Define two hidden layers as before, one with 8 nodes, one with 5 nodes\n", "- Define a `second_to_last` layer with 2 nodes\n", "- Define an output layer with 3 nodes\n", "- Create a model that connects input and output\n", "- Train it and make sure that it converges\n", "- Define a function between `inputs` and the `second_to_last` layer\n", "- Recalculate the features and plot them" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Input\n", "from tensorflow.keras.models import Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "inputs = Input(shape=(13,))\n", "x = Dense(8, kernel_initializer='he_normal', activation='tanh')(inputs)\n", "x = Dense(5, kernel_initializer='he_normal', activation='tanh')(x)\n", "second_to_last = Dense(2, kernel_initializer='he_normal',\n", "                       activation='tanh')(x)\n", "outputs = Dense(3, activation='softmax')(second_to_last)\n", "\n", "model = Model(inputs=inputs, outputs=outputs)\n", "\n", "model.compile(RMSprop(learning_rate=0.05),\n", "              'categorical_crossentropy',\n", "              metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values, batch_size=16, epochs=20, verbose=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function = K.function([inputs], [second_to_last])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features = features_function([Xsc])[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(features[:, 0], features[:, 1], c=y)" ] },
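{ "cell_type": "markdown", "metadata": {}, "source": [ "With the Functional API there is also an alternative to `K.function` for feature extraction: a second `Model` that reuses the trained layers. A minimal sketch (the `feature_model` and `features_alt` names are ours, not part of the original solution):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumption): build a second Model over the already-trained layers;\n", "# it shares weights with `model`, so no retraining is needed.\n", "feature_model = Model(inputs=inputs, outputs=second_to_last)\n", "features_alt = feature_model.predict(Xsc)\n", "features_alt.shape" ] },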
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4\n", "\n", "Keras offers the possibility to call a function at each epoch. These are Callbacks, and their [documentation is here](https://keras.io/callbacks/). Callbacks allow us to add some neat functionality. In this exercise we'll explore a few of them.\n", "\n", "- Split the data into train and test sets with `test_size=0.3` and `random_state=42`\n", "- Reset and recompile your model\n", "- Train the model on the train data using `validation_data=(X_test, y_test)`\n", "- Use the `EarlyStopping` callback to stop your training if the `val_loss` doesn't improve\n", "- Use the `ModelCheckpoint` callback to save the best model to disk whenever the validation loss improves\n", "- Use the `TensorBoard` callback to output your training information to a `/tmp/` subdirectory\n", "- Watch the next video for an overview of TensorBoard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "checkpointer = ModelCheckpoint(filepath=\"/tmp/udemy/weights.hdf5\",\n", "                               verbose=1, save_best_only=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "earlystopper = EarlyStopping(monitor='val_loss', min_delta=0,\n", "                             patience=1, verbose=1, mode='auto')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tensorboard = TensorBoard(log_dir='/tmp/udemy/tensorboard/')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(Xsc, y_cat.values,\n", "                                                    test_size=0.3,\n", "                                                    random_state=42)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "inputs = Input(shape=(13,))\n", "\n", "x = Dense(8, kernel_initializer='he_normal', activation='tanh')(inputs)\n", "x = Dense(5, kernel_initializer='he_normal', activation='tanh')(x)\n", "second_to_last = Dense(2, kernel_initializer='he_normal',\n", "                       activation='tanh')(x)\n", "outputs = Dense(3, activation='softmax')(second_to_last)\n", "\n", "model = Model(inputs=inputs, outputs=outputs)\n", "\n", "model.compile(RMSprop(learning_rate=0.05), 'categorical_crossentropy',\n", "              metrics=['accuracy'])\n", "\n", "model.fit(X_train, y_train, batch_size=32,\n", "          epochs=20, verbose=2,\n", "          validation_data=(X_test, y_test),\n", "          callbacks=[checkpointer, earlystopper, tensorboard])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run TensorBoard with the command:\n", "\n", "    tensorboard --logdir /tmp/udemy/tensorboard/\n", "\n", "and open your browser at http://localhost:6006" ] },
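{ "cell_type": "markdown", "metadata": {}, "source": [ "After training, you can reload the best checkpoint that `ModelCheckpoint` saved and confirm it scores the same on the test set. A minimal sketch (the `best_model` name is ours; it assumes the default `save_weights_only=False`, so the file contains the full model):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumption): restore the full model saved by ModelCheckpoint and\n", "# re-evaluate it on the held-out test set (returns [loss, accuracy]).\n", "from tensorflow.keras.models import load_model\n", "\n", "best_model = load_model('/tmp/udemy/weights.hdf5')\n", "best_model.evaluate(X_test, y_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }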