{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Model evaluation & cross-validation\n", "\n", "````{margin}\n", "```{warning}\n", "These pages are currently under construction and will be updated continuously.\n", "Please visit these pages again in the next few weeks for further information.\n", "```\n", "````\n", "\n", "---------------" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Aim(s) of this section 🎯\n", "\n", "As mentioned in the previous section, it is not sufficient to apply these methods to learn something about the nature of our data. It is always necessary to assess the quality of the implemented model. The goal of this section is to look at ways to estimate the generalization accuracy of a model on future (i.e., unseen, out-of-sample) data.\n", "\n", "In other words, at the end of this section you should:\n", "- 1) know different techniques to evaluate a given model\n", "- 2) understand the basic idea of cross-validation and its different variants\n", "- 3) have an idea of how to assess significance (e.g., via permutation tests)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Outline for this section 📝\n", "\n", "1. Model diagnostics\n", "\n", "2. Cross-validation" ] },
{ "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Prepare data for model\n", "\n", "Let's bring back our example data set (you know the song ...)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 155 samples and 2016 features\n" ] } ], "source": [ "import numpy as np\n", "import pandas as pd\n", "\n", "# get the data set\n", "data = np.load('MAIN2019_BASC064_subsamp_features.npz')['a']\n", "\n", "# get the labels\n", "info = pd.read_csv('participants.csv')\n", "\n", "\n", "print('There are %s samples and %s features' % (data.shape[0], data.shape[1]))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Now let's look at the labels" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
 | participant_id | Age | AgeGroup | Child_Adult | Gender | Handedness |\n", "
---|---|---|---|---|---|---|\n", "
0 | sub-pixar123 | 27.06 | Adult | adult | F | R |\n", "
1 | sub-pixar124 | 33.44 | Adult | adult | M | R |\n", "
2 | sub-pixar125 | 31.00 | Adult | adult | M | R |\n", "
3 | sub-pixar126 | 19.00 | Adult | adult | F | R |\n", "
4 | sub-pixar127 | 23.00 | Adult | adult | F | R |\n", "