{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "You've just been hired at a wine company and they would like you to help them build a model that predicts the quality of their wine based on several measurements. They give you a dataset with wine\n", "\n", "- Load the ../data/wines.csv into Pandas\n", "- Use the column called \"Class\" as target\n", "- Check how many classes are there in target, and if necessary use dummy columns for a multi-class classification\n", "- Use all the other columns as features, check their range and distribution (using seaborn pairplot)\n", "- Rescale all the features using either MinMaxScaler or StandardScaler\n", "- Build a deep model with at least 1 hidden layer to classify the data\n", "- Choose the cost function, what will you use? Mean Squared Error? Binary Cross-Entropy? Categorical Cross-Entropy?\n", "- Choose an optimizer\n", "- Choose a value for the learning rate, you may want to try with several values\n", "- Choose a batch size\n", "- Train your model on all the data using a `validation_split=0.2`. Can you converge to 100% validation accuracy?\n", "- What's the minumum number of epochs to converge?\n", "- Repeat the training several times to verify how stable your results are" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('../data/wines.csv')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y = df['Class']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y.value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_cat = pd.get_dummies(y)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_cat.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X = df.drop('Class', axis=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sns.pairplot(df, hue='Class')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sc = StandardScaler()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Xsc = sc.fit_transform(X)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense\n", "from tensorflow.keras.optimizers import SGD, Adam, Adadelta, RMSprop\n", "import tensorflow.keras.backend as K" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ 
"K.clear_session()\n", "model = Sequential()\n", "model.add(Dense(5, input_shape=(13,),\n", " kernel_initializer='he_normal',\n", " activation='relu'))\n", "model.add(Dense(3, activation='softmax'))\n", "\n", "model.compile(RMSprop(learning_rate=0.1),\n", " 'categorical_crossentropy',\n", " metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values,\n", " batch_size=8,\n", " epochs=10,\n", " verbose=1,\n", " validation_split=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "Since this dataset has 13 features we can only visualize pairs of features like we did in the Paired plot. We could however exploit the fact that a neural network is a function to extract 2 high level features to represent our data.\n", "\n", "- Build a deep fully connected network with the following structure:\n", " - Layer 1: 8 nodes\n", " - Layer 2: 5 nodes\n", " - Layer 3: 2 nodes\n", " - Output : 3 nodes\n", "- Choose activation functions, inizializations, optimizer and learning rate so that it converges to 100% accuracy within 20 epochs (not easy)\n", "- Remember to train the model on the scaled data\n", "- Define a Feature Function like we did above between the input of the 1st layer and the output of the 3rd layer\n", "- Calculate the features and plot them on a 2-dimensional scatter plot\n", "- Can we distinguish the 3 classes well?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "model = Sequential()\n", "model.add(Dense(8, input_shape=(13,),\n", " kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(5, kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(2, kernel_initializer='he_normal', activation='tanh'))\n", "model.add(Dense(3, activation='softmax'))\n", "\n", "model.compile(RMSprop(learning_rate=0.05),\n", " 'categorical_crossentropy',\n", " metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values,\n", " batch_size=16,\n", " epochs=20,\n", " verbose=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model.summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "inp = model.layers[0].input\n", "out = model.layers[2].output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function = K.function([inp], [out])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features = features_function([Xsc])[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(features[:, 0], features[:, 1], c=y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). 
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Keras functional API. So far we've always used the Sequential model API in Keras. However, Keras also offers a Functional API, which is much more powerful. You can find its [documentation here](https://keras.io/getting-started/functional-api-guide/). Let's see how we can leverage it.\n", "\n", "- Define an input layer called `inputs`\n", "- Define two hidden layers as before, one with 8 nodes, one with 5 nodes\n", "- Define a `second_to_last` layer with 2 nodes\n", "- Define an output layer with 3 nodes\n", "- Create a model that connects input and output\n", "- Train it and make sure that it converges\n", "- Define a function between `inputs` and the `second_to_last` layer\n", "- Recalculate the features and plot them" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.layers import Input\n", "from tensorflow.keras.models import Model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "inputs = Input(shape=(13,))\n", "x = Dense(8, kernel_initializer='he_normal', activation='tanh')(inputs)\n", "x = Dense(5, kernel_initializer='he_normal', activation='tanh')(x)\n", "second_to_last = Dense(2, kernel_initializer='he_normal',\n", "                       activation='tanh')(x)\n", "outputs = Dense(3, activation='softmax')(second_to_last)\n", "\n", "model = Model(inputs=inputs, outputs=outputs)\n", "\n", "model.compile(RMSprop(learning_rate=0.05),\n", "              'categorical_crossentropy',\n", "              metrics=['accuracy'])\n", "\n", "model.fit(Xsc, y_cat.values, batch_size=16, epochs=20, verbose=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features_function = K.function([inputs], [second_to_last])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "features = features_function([Xsc])[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.scatter(features[:, 0], features[:, 1], c=y)" ] },
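{ "cell_type": "markdown", "metadata": {}, "source": [ "With the Functional API there is also an alternative to `K.function` for feature extraction: a second `Model` that reuses the trained layers. A minimal sketch (the `feature_model` and `features_alt` names are ours, not part of the original solution):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumption): build a second Model over the already-trained layers;\n", "# it shares weights with `model`, so no retraining is needed.\n", "feature_model = Model(inputs=inputs, outputs=second_to_last)\n", "features_alt = feature_model.predict(Xsc)\n", "features_alt.shape" ] },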
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4\n", "\n", "Keras offers the possibility to call a function at each epoch. These are Callbacks, and their [documentation is here](https://keras.io/callbacks/). Callbacks allow us to add some neat functionality. In this exercise we'll explore a few of them.\n", "\n", "- Split the data into train and test sets with `test_size=0.3` and `random_state=42`\n", "- Reset and recompile your model\n", "- Train the model on the train data using `validation_data=(X_test, y_test)`\n", "- Use the `EarlyStopping` callback to stop your training if the `val_loss` doesn't improve\n", "- Use the `ModelCheckpoint` callback to save the best model to disk whenever the validation loss improves\n", "- Use the `TensorBoard` callback to output your training information to a `/tmp/` subdirectory\n", "- Watch the next video for an overview of TensorBoard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "checkpointer = ModelCheckpoint(filepath=\"/tmp/udemy/weights.hdf5\",\n", "                               verbose=1, save_best_only=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "earlystopper = EarlyStopping(monitor='val_loss', min_delta=0,\n", "                             patience=1, verbose=1, mode='auto')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tensorboard = TensorBoard(log_dir='/tmp/udemy/tensorboard/')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, X_test, y_train, y_test = train_test_split(Xsc, y_cat.values,\n", "                                                    test_size=0.3,\n", "                                                    random_state=42)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "K.clear_session()\n", "\n", "inputs = Input(shape=(13,))\n", "\n", "x = Dense(8, kernel_initializer='he_normal', activation='tanh')(inputs)\n", "x = Dense(5, kernel_initializer='he_normal', activation='tanh')(x)\n", "second_to_last = Dense(2, kernel_initializer='he_normal',\n", "                       activation='tanh')(x)\n", "outputs = Dense(3, activation='softmax')(second_to_last)\n", "\n", "model = Model(inputs=inputs, outputs=outputs)\n", "\n", "model.compile(RMSprop(learning_rate=0.05), 'categorical_crossentropy',\n", "              metrics=['accuracy'])\n", "\n", "model.fit(X_train, y_train, batch_size=32,\n", "          epochs=20, verbose=2,\n", "          validation_data=(X_test, y_test),\n", "          callbacks=[checkpointer, earlystopper, tensorboard])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run TensorBoard with the command:\n", "\n", "    tensorboard --logdir /tmp/udemy/tensorboard/\n", "\n", "and open your browser at http://localhost:6006" ] },
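{ "cell_type": "markdown", "metadata": {}, "source": [ "After training, you can reload the best checkpoint that `ModelCheckpoint` saved and confirm it scores the same on the test set. A minimal sketch (the `best_model` name is ours; it assumes the default `save_weights_only=False`, so the file contains the full model):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sketch (assumption): restore the full model saved by ModelCheckpoint and\n", "# re-evaluate it on the held-out test set (returns [loss, accuracy]).\n", "from tensorflow.keras.models import load_model\n", "\n", "best_model = load_model('/tmp/udemy/weights.hdf5')\n", "best_model.evaluate(X_test, y_test)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 2 }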