{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gradient Descent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Linear Algebra with Numpy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.array([1, 3, 2, 4])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.array([[3, 1, 2],\n",
    "              [2, 3, 4]])\n",
    "\n",
    "B = np.array([[0, 1],\n",
    "              [2, 3],\n",
    "              [4, 5]])\n",
    "\n",
    "C = np.array([[0, 1],\n",
    "              [2, 3],\n",
    "              [4, 5],\n",
    "              [0, 1],\n",
    "              [2, 3],\n",
    "              [4, 5]])\n",
    "\n",
    "print(\"A is a {} matrix\".format(A.shape))\n",
    "print(\"B is a {} matrix\".format(B.shape))\n",
    "print(\"C is a {} matrix\".format(C.shape))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C[2, 0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B[:, 0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Elementwise operations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "3 * A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A + A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A * A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A / A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A - A"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uncomment the code in the next cells. You will see that tensors of different shape cannot be added or multiplied:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A + B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A * B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dot product"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.dot(B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.dot(A, B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "B.dot(A)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C.dot(A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Uncomment the code in the next cell to visualize the error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A.dot(C)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gradient descent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../data/banknotes.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.read_csv('../data/banknotes.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['class'].value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.pairplot(df, hue=\"class\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Baseline model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split, cross_val_score\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.preprocessing import scale"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = scale(df.drop('class', axis=1).values)\n",
    "y = df['class'].values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = RandomForestClassifier()\n",
    "cross_val_score(model, X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Logistic Regression Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, X_test, y_train, y_test = train_test_split(X, y,\n",
    "                                                    test_size=0.3,\n",
    "                                                    random_state=42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow.keras.backend as K\n",
    "from tensorflow.keras.models import Sequential\n",
    "from tensorflow.keras.layers import Dense, Activation\n",
    "from tensorflow.keras.optimizers import SGD"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "K.clear_session()\n",
    "\n",
    "model = Sequential()\n",
    "model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n",
    "\n",
    "model.compile(loss='binary_crossentropy',\n",
    "              optimizer='sgd',\n",
    "              metrics=['accuracy'])\n",
    "\n",
    "history = model.fit(X_train, y_train, epochs=10)\n",
    "result = model.evaluate(X_test, y_test, verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf = pd.DataFrame(history.history, index=history.epoch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf.plot(ylim=(0,1))\n",
    "plt.title(\"Test accuracy: {:3.1f} %\".format(result[1]*100), fontsize=15);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Learning Rates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dflist = []\n",
    "\n",
    "learning_rates = [0.01, 0.05, 0.1, 0.5]\n",
    "\n",
    "for lr in learning_rates:\n",
    "\n",
    "    K.clear_session()\n",
    "\n",
    "    model = Sequential()\n",
    "    model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n",
    "    model.compile(loss='binary_crossentropy',\n",
    "                  optimizer=SGD(learning_rate=lr),\n",
    "                  metrics=['accuracy'])\n",
    "    h = model.fit(X_train, y_train, batch_size=16, epochs=10, verbose=0)\n",
    "    \n",
    "    dflist.append(pd.DataFrame(h.history, index=h.epoch))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf = pd.concat(dflist, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metrics_reported = dflist[0].columns\n",
    "idx = pd.MultiIndex.from_product([learning_rates, metrics_reported],\n",
    "                                 names=['learning_rate', 'metric'])\n",
    "\n",
    "historydf.columns = idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12,8))\n",
    "\n",
    "ax = plt.subplot(211)\n",
    "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Loss\")\n",
    "\n",
    "ax = plt.subplot(212)\n",
    "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Batch Sizes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dflist = []\n",
    "\n",
    "batch_sizes = [16, 32, 64, 128]\n",
    "\n",
    "for batch_size in batch_sizes:\n",
    "    K.clear_session()\n",
    "\n",
    "    model = Sequential()\n",
    "    model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n",
    "    model.compile(loss='binary_crossentropy',\n",
    "                  optimizer='sgd',\n",
    "                  metrics=['accuracy'])\n",
    "    h = model.fit(X_train, y_train, batch_size=batch_size, epochs=10, verbose=0)\n",
    "    \n",
    "    dflist.append(pd.DataFrame(h.history, index=h.epoch))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf = pd.concat(dflist, axis=1)\n",
    "metrics_reported = dflist[0].columns\n",
    "idx = pd.MultiIndex.from_product([batch_sizes, metrics_reported],\n",
    "                                 names=['batch_size', 'metric'])\n",
    "historydf.columns = idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12,8))\n",
    "\n",
    "ax = plt.subplot(211)\n",
    "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Loss\")\n",
    "\n",
    "ax = plt.subplot(212)\n",
    "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Optimizers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tensorflow.keras.optimizers import SGD, Adam, Adagrad, RMSprop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dflist = []\n",
    "\n",
    "optimizers = ['SGD(learning_rate=0.01)',\n",
    "              'SGD(learning_rate=0.01, momentum=0.3)',\n",
    "              'SGD(learning_rate=0.01, momentum=0.3, nesterov=True)',  \n",
    "              'Adam(learning_rate=0.01)',\n",
    "              'Adagrad(learning_rate=0.01)',\n",
    "              'RMSprop(learning_rate=0.01)']\n",
    "\n",
    "for opt_name in optimizers:\n",
    "\n",
    "    K.clear_session()\n",
    "    \n",
    "    model = Sequential()\n",
    "    model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n",
    "    model.compile(loss='binary_crossentropy',\n",
    "                  optimizer=eval(opt_name),\n",
    "                  metrics=['accuracy'])\n",
    "    h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)\n",
    "    \n",
    "    dflist.append(pd.DataFrame(h.history, index=h.epoch))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf = pd.concat(dflist, axis=1)\n",
    "metrics_reported = dflist[0].columns\n",
    "idx = pd.MultiIndex.from_product([optimizers, metrics_reported],\n",
    "                                 names=['optimizers', 'metric'])\n",
    "historydf.columns = idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12,8))\n",
    "\n",
    "ax = plt.subplot(211)\n",
    "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Loss\")\n",
    "\n",
    "ax = plt.subplot(212)\n",
    "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Initialization\n",
    "\n",
    "https://keras.io/initializers/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dflist = []\n",
    "\n",
    "initializers = ['zeros', 'uniform', 'normal',\n",
    "                'he_normal', 'lecun_uniform']\n",
    "\n",
    "for init in initializers:\n",
    "\n",
    "    K.clear_session()\n",
    "\n",
    "    model = Sequential()\n",
    "    model.add(Dense(1, input_shape=(4,),\n",
    "                    kernel_initializer=init,\n",
    "                    activation='sigmoid'))\n",
    "\n",
    "    model.compile(loss='binary_crossentropy',\n",
    "                  optimizer='rmsprop',\n",
    "                  metrics=['accuracy'])\n",
    "\n",
    "    h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)\n",
    "    \n",
    "    dflist.append(pd.DataFrame(h.history, index=h.epoch))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "historydf = pd.concat(dflist, axis=1)\n",
    "metrics_reported = dflist[0].columns\n",
    "idx = pd.MultiIndex.from_product([initializers, metrics_reported],\n",
    "                                 names=['initializers', 'metric'])\n",
    "\n",
    "historydf.columns = idx"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12,8))\n",
    "\n",
    "ax = plt.subplot(211)\n",
    "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Loss\")\n",
    "\n",
    "ax = plt.subplot(212)\n",
    "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n",
    "plt.title(\"Accuracy\")\n",
    "plt.xlabel(\"Epochs\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inner layer representation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "K.clear_session()\n",
    "\n",
    "model = Sequential()\n",
    "model.add(Dense(2, input_shape=(4,), activation='relu'))\n",
    "model.add(Dense(1, activation='sigmoid'))\n",
    "model.compile(loss='binary_crossentropy',\n",
    "              optimizer=RMSprop(learning_rate=0.01),\n",
    "              metrics=['accuracy'])\n",
    "\n",
    "h = model.fit(X_train, y_train, batch_size=16, epochs=20,\n",
    "              verbose=1, validation_split=0.3)\n",
    "result = model.evaluate(X_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.summary()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = model.layers[0].input\n",
    "out = model.layers[0].output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "out"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features_function = K.function([inp], [out])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features_function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features_function([X_test])[0].shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "features = features_function([X_test])[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(features[:, 0], features[:, 1], c=y_test, cmap='coolwarm')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "K.clear_session()\n",
    "\n",
    "model = Sequential()\n",
    "model.add(Dense(3, input_shape=(4,), activation='relu'))\n",
    "model.add(Dense(2, activation='relu'))\n",
    "model.add(Dense(1, activation='sigmoid'))\n",
    "model.compile(loss='binary_crossentropy',\n",
    "              optimizer=RMSprop(learning_rate=0.01),\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = model.layers[0].input\n",
    "out = model.layers[1].output\n",
    "features_function = K.function([inp], [out])\n",
    "\n",
    "plt.figure(figsize=(15,10))\n",
    "\n",
    "for i in range(1, 26):\n",
    "    plt.subplot(5, 5, i)\n",
    "    h = model.fit(X_train, y_train, batch_size=16, epochs=1, verbose=0)\n",
    "    test_accuracy = model.evaluate(X_test, y_test, verbose=0)[1]\n",
    "    features = features_function([X_test])[0]\n",
    "    plt.scatter(features[:, 0], features[:, 1], c=y_test, cmap='coolwarm')\n",
    "    plt.xlim(-0.5, 3.5)\n",
    "    plt.ylim(-0.5, 4.0)\n",
    "    plt.title('Epoch: {}, Test Acc: {:3.1f} %'.format(i, test_accuracy * 100.0))\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1\n",
    "\n",
    "You've just been hired at a wine company and they would like you to help them build a model that predicts the quality of their wine based on several measurements. They give you a dataset with wine\n",
    "\n",
    "- Load the ../data/wines.csv into Pandas\n",
    "- Use the column called \"Class\" as target\n",
    "- Check how many classes are there in target, and if necessary use dummy columns for a multi-class classification\n",
    "- Use all the other columns as features, check their range and distribution (using seaborn pairplot)\n",
    "- Rescale all the features using either MinMaxScaler or StandardScaler\n",
    "- Build a deep model with at least 1 hidden layer to classify the data\n",
    "- Choose the cost function, what will you use? Mean Squared Error? Binary Cross-Entropy? Categorical Cross-Entropy?\n",
    "- Choose an optimizer\n",
    "- Choose a value for the learning rate, you may want to try with several values\n",
    "- Choose a batch size\n",
    "- Train your model on all the data using a `validation_split=0.2`. Can you converge to 100% validation accuracy?\n",
    "- What's the minumum number of epochs to converge?\n",
    "- Repeat the training several times to verify how stable your results are"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2\n",
    "\n",
    "Since this dataset has 13 features we can only visualize pairs of features like we did in the Paired plot. We could however exploit the fact that a neural network is a function to extract 2 high level features to represent our data.\n",
    "\n",
    "- Build a deep fully connected network with the following structure:\n",
    "    - Layer 1: 8 nodes\n",
    "    - Layer 2: 5 nodes\n",
    "    - Layer 3: 2 nodes\n",
    "    - Output : 3 nodes\n",
    "- Choose activation functions, inizializations, optimizer and learning rate so that it converges to 100% accuracy within 20 epochs (not easy)\n",
    "- Remember to train the model on the scaled data\n",
    "- Define a Feature Funtion like we did above between the input of the 1st layer and the output of the 3rd layer\n",
    "- Calculate the features and plot them on a 2-dimensional scatter plot\n",
    "- Can we distinguish the 3 classes well?\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 3\n",
    "\n",
    "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). Let's see how we can leverage it.\n",
    "\n",
    "- define an input layer called `inputs`\n",
    "- define two hidden layers as before, one with 8 nodes, one with 5 nodes\n",
    "- define a `second_to_last` layer with 2 nodes\n",
    "- define an output layer with 3 nodes\n",
    "- create a model that connect input and output\n",
    "- train it and make sure that it converges\n",
    "- define a function between inputs and second_to_last layer\n",
    "- recalculate the features and plot them"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 4 \n",
    "\n",
    "Keras offers the possibility to call a function at each epoch. These are Callbacks, and their [documentation is here](https://keras.io/callbacks/). Callbacks allow us to add some neat functionality. In this exercise we'll explore a few of them.\n",
    "\n",
    "- Split the data into train and test sets with a test_size = 0.3 and random_state=42\n",
    "- Reset and recompile your model\n",
    "- train the model on the train data using `validation_data=(X_test, y_test)`\n",
    "- Use the `EarlyStopping` callback to stop your training if the `val_loss` doesn't improve\n",
    "- Use the `ModelCheckpoint` callback to save the trained model to disk once training is finished\n",
    "- Use the `TensorBoard` callback to output your training information to a `/tmp/` subdirectory\n",
    "- Watch the next video for an overview of tensorboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}