{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Algebra with Numpy" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = np.array([1, 3, 2, 4])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A = np.array([[3, 1, 2],\n", " [2, 3, 4]])\n", "\n", "B = np.array([[0, 1],\n", " [2, 3],\n", " [4, 5]])\n", "\n", "C = np.array([[0, 1],\n", " [2, 3],\n", " [4, 5],\n", " [0, 1],\n", " [2, 3],\n", " [4, 5]])\n", "\n", "print(\"A is a {} matrix\".format(A.shape))\n", "print(\"B is a {} matrix\".format(B.shape))\n", "print(\"C is a {} matrix\".format(C.shape))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "C[2, 0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "B[:, 0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Elementwise operations" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "3 * A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A + A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A * A" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A / A" ] }, 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A - A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uncomment the code in the next cells. You will see that tensors of different shape cannot be added or multiplied:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A + B" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A * B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dot product" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "B.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A.dot(B)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.dot(A, B)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "B.dot(A)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "C.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "A.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "C.dot(A)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Uncomment the code in the next cell to visualize the error:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A.dot(C)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](../data/banknotes.png)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('../data/banknotes.csv')" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df['class'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.pairplot(df, hue=\"class\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Baseline model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split, cross_val_score\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.preprocessing import scale" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = scale(df.drop('class', axis=1).values)\n", "y = df['class'].values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = RandomForestClassifier()\n", "cross_val_score(model, X, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Logistic Regression Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(X, y,\n", " test_size=0.3,\n", " random_state=42)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import tensorflow.keras.backend as K\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Activation\n", "from tensorflow.keras.optimizers import SGD" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "model = Sequential()\n", "model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n", "\n", "model.compile(loss='binary_crossentropy',\n", " 
optimizer='sgd',\n", " metrics=['accuracy'])\n", "\n", "history = model.fit(X_train, y_train, epochs=10)\n", "result = model.evaluate(X_test, y_test, verbose=0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf = pd.DataFrame(history.history, index=history.epoch)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf.plot(ylim=(0,1))\n", "plt.title(\"Test accuracy: {:3.1f} %\".format(result[1]*100), fontsize=15);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning Rates" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflist = []\n", "\n", "learning_rates = [0.01, 0.05, 0.1, 0.5]\n", "\n", "for lr in learning_rates:\n", "\n", " K.clear_session()\n", "\n", " model = Sequential()\n", " model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n", " model.compile(loss='binary_crossentropy',\n", " optimizer=SGD(learning_rate=lr),\n", " metrics=['accuracy'])\n", " h = model.fit(X_train, y_train, batch_size=16, epochs=10, verbose=0)\n", " \n", " dflist.append(pd.DataFrame(h.history, index=h.epoch))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf = pd.concat(dflist, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "metrics_reported = dflist[0].columns\n", "idx = pd.MultiIndex.from_product([learning_rates, metrics_reported],\n", " names=['learning_rate', 'metric'])\n", "\n", "historydf.columns = idx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,8))\n", "\n", "ax = plt.subplot(211)\n", 
"historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Loss\")\n", "\n", "ax = plt.subplot(212)\n", "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Accuracy\")\n", "plt.xlabel(\"Epochs\")\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Batch Sizes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflist = []\n", "\n", "batch_sizes = [16, 32, 64, 128]\n", "\n", "for batch_size in batch_sizes:\n", " K.clear_session()\n", "\n", " model = Sequential()\n", " model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n", " model.compile(loss='binary_crossentropy',\n", " optimizer='sgd',\n", " metrics=['accuracy'])\n", " h = model.fit(X_train, y_train, batch_size=batch_size, epochs=10, verbose=0)\n", " \n", " dflist.append(pd.DataFrame(h.history, index=h.epoch))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf = pd.concat(dflist, axis=1)\n", "metrics_reported = dflist[0].columns\n", "idx = pd.MultiIndex.from_product([batch_sizes, metrics_reported],\n", " names=['batch_size', 'metric'])\n", "historydf.columns = idx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,8))\n", "\n", "ax = plt.subplot(211)\n", "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Loss\")\n", "\n", "ax = plt.subplot(212)\n", "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Accuracy\")\n", "plt.xlabel(\"Epochs\")\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimizers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from 
tensorflow.keras.optimizers import SGD, Adam, Adagrad, RMSprop" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflist = []\n", "\n", "optimizers = ['SGD(learning_rate=0.01)',\n", " 'SGD(learning_rate=0.01, momentum=0.3)',\n", " 'SGD(learning_rate=0.01, momentum=0.3, nesterov=True)', \n", " 'Adam(learning_rate=0.01)',\n", " 'Adagrad(learning_rate=0.01)',\n", " 'RMSprop(learning_rate=0.01)']\n", "\n", "for opt_name in optimizers:\n", "\n", " K.clear_session()\n", " \n", " model = Sequential()\n", " model.add(Dense(1, input_shape=(4,), activation='sigmoid'))\n", " model.compile(loss='binary_crossentropy',\n", " optimizer=eval(opt_name),\n", " metrics=['accuracy'])\n", " h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)\n", " \n", " dflist.append(pd.DataFrame(h.history, index=h.epoch))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf = pd.concat(dflist, axis=1)\n", "metrics_reported = dflist[0].columns\n", "idx = pd.MultiIndex.from_product([optimizers, metrics_reported],\n", " names=['optimizers', 'metric'])\n", "historydf.columns = idx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,8))\n", "\n", "ax = plt.subplot(211)\n", "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Loss\")\n", "\n", "ax = plt.subplot(212)\n", "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Accuracy\")\n", "plt.xlabel(\"Epochs\")\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialization\n", "\n", "https://keras.io/initializers/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dflist = []\n", "\n", "initializers = ['zeros', 'uniform', 'normal',\n", " 'he_normal', 'lecun_uniform']\n", "\n", "for init in initializers:\n", 
"\n", " K.clear_session()\n", "\n", " model = Sequential()\n", " model.add(Dense(1, input_shape=(4,),\n", " kernel_initializer=init,\n", " activation='sigmoid'))\n", "\n", " model.compile(loss='binary_crossentropy',\n", " optimizer='rmsprop',\n", " metrics=['accuracy'])\n", "\n", " h = model.fit(X_train, y_train, batch_size=16, epochs=5, verbose=0)\n", " \n", " dflist.append(pd.DataFrame(h.history, index=h.epoch))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "historydf = pd.concat(dflist, axis=1)\n", "metrics_reported = dflist[0].columns\n", "idx = pd.MultiIndex.from_product([initializers, metrics_reported],\n", " names=['initializers', 'metric'])\n", "\n", "historydf.columns = idx" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.figure(figsize=(12,8))\n", "\n", "ax = plt.subplot(211)\n", "historydf.xs('loss', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Loss\")\n", "\n", "ax = plt.subplot(212)\n", "historydf.xs('accuracy', axis=1, level='metric').plot(ylim=(0,1), ax=ax)\n", "plt.title(\"Accuracy\")\n", "plt.xlabel(\"Epochs\")\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inner layer representation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "model = Sequential()\n", "model.add(Dense(2, input_shape=(4,), activation='relu'))\n", "model.add(Dense(1, activation='sigmoid'))\n", "model.compile(loss='binary_crossentropy',\n", " optimizer=RMSprop(learning_rate=0.01),\n", " metrics=['accuracy'])\n", "\n", "h = model.fit(X_train, y_train, batch_size=16, epochs=20,\n", " verbose=1, validation_split=0.3)\n", "result = model.evaluate(X_test, y_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.layers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inp = model.layers[0].input\n", "out = model.layers[0].output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inp" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "out" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function = K.function([inp], [out])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function([X_test])[0].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features = features_function([X_test])[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(features[:, 0], features[:, 1], c=y_test, cmap='coolwarm')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "model = Sequential()\n", "model.add(Dense(3, input_shape=(4,), activation='relu'))\n", "model.add(Dense(2, activation='relu'))\n", "model.add(Dense(1, activation='sigmoid'))\n", "model.compile(loss='binary_crossentropy',\n", " optimizer=RMSprop(learning_rate=0.01),\n", " metrics=['accuracy'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inp = model.layers[0].input\n", "out = model.layers[1].output\n", "features_function = K.function([inp], [out])\n", "\n", "plt.figure(figsize=(15,10))\n", "\n", "for i in range(1, 26):\n", " plt.subplot(5, 5, i)\n", " h = model.fit(X_train, y_train, 
batch_size=16, epochs=1, verbose=0)\n", " test_accuracy = model.evaluate(X_test, y_test, verbose=0)[1]\n", " features = features_function([X_test])[0]\n", " plt.scatter(features[:, 0], features[:, 1], c=y_test, cmap='coolwarm')\n", " plt.xlim(-0.5, 3.5)\n", " plt.ylim(-0.5, 4.0)\n", " plt.title('Epoch: {}, Test Acc: {:3.1f} %'.format(i, test_accuracy * 100.0))\n", "\n", "plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "You've just been hired at a wine company and they would like you to help them build a model that predicts the quality of their wine based on several measurements. They give you a dataset of wine measurements.\n", "\n", "- Load the ../data/wines.csv into Pandas\n", "- Use the column called \"Class\" as target\n", "- Check how many classes there are in the target, and if necessary use dummy columns for a multi-class classification\n", "- Use all the other columns as features, check their range and distribution (using seaborn pairplot)\n", "- Rescale all the features using either MinMaxScaler or StandardScaler\n", "- Build a deep model with at least 1 hidden layer to classify the data\n", "- Choose the cost function. What will you use? Mean Squared Error? Binary Cross-Entropy? Categorical Cross-Entropy?\n", "- Choose an optimizer\n", "- Choose a value for the learning rate; you may want to try several values\n", "- Choose a batch size\n", "- Train your model on all the data using a `validation_split=0.2`. Can you converge to 100% validation accuracy?\n", "- What's the minimum number of epochs to converge?\n", "- Repeat the training several times to verify how stable your results are" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "Since this dataset has 13 features we can only visualize pairs of features like we did in the pair plot. 
We could, however, exploit the fact that a neural network is a function to extract 2 high-level features to represent our data.\n", "\n", "- Build a deep fully connected network with the following structure:\n", " - Layer 1: 8 nodes\n", " - Layer 2: 5 nodes\n", " - Layer 3: 2 nodes\n", " - Output : 3 nodes\n", "- Choose activation functions, initializations, optimizer and learning rate so that it converges to 100% accuracy within 20 epochs (not easy)\n", "- Remember to train the model on the scaled data\n", "- Define a feature function like we did above between the input of the 1st layer and the output of the 3rd layer\n", "- Calculate the features and plot them on a 2-dimensional scatter plot\n", "- Can we distinguish the 3 classes well?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). Let's see how we can leverage it.\n", "\n", "- Define an input layer called `inputs`\n", "- Define two hidden layers as before, one with 8 nodes, one with 5 nodes\n", "- Define a `second_to_last` layer with 2 nodes\n", "- Define an output layer with 3 nodes\n", "- Create a model that connects input and output\n", "- Train it and make sure that it converges\n", "- Define a function between the `inputs` and `second_to_last` layers\n", "- Recalculate the features and plot them" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4 \n", "\n", "Keras offers the possibility to call a function at each epoch. These are Callbacks, and their [documentation is here](https://keras.io/callbacks/). 
Callbacks allow us to add some neat functionality. In this exercise we'll explore a few of them.\n", "\n", "- Split the data into train and test sets with `test_size=0.3` and `random_state=42`\n", "- Reset and recompile your model\n", "- Train the model on the train data using `validation_data=(X_test, y_test)`\n", "- Use the `EarlyStopping` callback to stop your training if the `val_loss` doesn't improve\n", "- Use the `ModelCheckpoint` callback to save the trained model to disk once training is finished\n", "- Use the `TensorBoard` callback to output your training information to a `/tmp/` subdirectory\n", "- Watch the next video for an overview of TensorBoard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }