| Machine Learning Models | |
| Modeling an epidemic | 00:08:00 |
| The machine learning recipe | 00:06:00 |
| The components of a machine learning model | 00:02:00 |
| Why model? | 00:03:00 |
| On assumptions: can we get rid of them? | 00:09:00 |
| The case of AlphaZero | 00:11:00 |
| Overfitting/underfitting/bias/variance | 00:11:00 |
| Why use machine learning? | 00:05:00 |
| Linear regression | |
| The InsureMe challenge | 00:06:00 |
| Supervised learning | 00:05:00 |
| Linear assumption | 00:03:00 |
| Linear regression template | 00:07:00 |
| Non-linear vs proportional vs linear | 00:05:00 |
| Linear regression template revisited | 00:04:00 |
| Loss function | 00:08:00 |
| Training algorithm | 00:08:00 |
| Code time | 00:15:00 |
| R squared | 00:06:00 |
| Why use a linear model? | 00:04:00 |
| Scaling and Pipelines | |
| Introduction to scaling | 00:06:00 |
| Min-max scaling | 00:03:00 |
| Code time (min-max scaling) | 00:09:00 |
| The problem with min-max scaling | 00:03:00 |
| What’s your IQ? | 00:11:00 |
| Standard scaling | 00:04:00 |
| Code time (standard scaling) | 00:02:00 |
| Model before and after scaling | 00:05:00 |
| Inference time | 00:07:00 |
| Pipelines | 00:03:00 |
| Code time (pipelines) | 00:05:00 |
| Regularization | |
| Spurious correlations | 00:04:00 |
| L2 regularization | 00:10:00 |
| Code time (L2 regularization) | 00:05:00 |
| L2 results | 00:02:00 |
| L1 regularization | 00:06:00 |
| Code time (L1 regularization) | 00:04:00 |
| L1 results | 00:02:00 |
| Why does L1 encourage zeros? | 00:09:00 |
| L1 vs L2: Which one is best? | 00:01:00 |
| Validation | |
| Introduction to validation | 00:02:00 |
| Why not evaluate the model on training data? | 00:06:00 |
| The validation set | 00:05:00 |
| Code time (validation set) | 00:08:00 |
| Error curves | 00:06:00 |
| Model selection | 00:06:00 |
| The problem with model selection | 00:06:00 |
| Tainted validation set | 00:05:00 |
| Monkeys with typewriters | 00:03:00 |
| My own validation epic fail | 00:07:00 |
| The test set | 00:06:00 |
| What if the model doesn’t pass the test? | 00:05:00 |
| How not to be fooled by randomness | 00:02:00 |
| Cross-validation | 00:04:00 |
| Code time (cross-validation) | 00:07:00 |
| Cross-validation results summary | 00:02:00 |
| AutoML | 00:05:00 |
| Is AutoML a good idea? | 00:05:00 |
| Red flags: Don’t do this! | 00:07:00 |
| Red flags summary and what to do instead | 00:05:00 |
| Your job as a data scientist | 00:03:00 |
| Common Mistakes | |
| Intro and recap | 00:02:00 |
| Mistake #1: Data leakage | 00:05:00 |
| The golden rule | 00:04:00 |
| Helpful trick (feature importance) | 00:02:00 |
| Real example of data leakage (part 1) | 00:05:00 |
| Real example of data leakage (part 2) | 00:05:00 |
| Another (funny) example of data leakage | 00:02:00 |
| Mistake #2: Random split of dependent data | 00:05:00 |
| Another example (insurance data) | 00:05:00 |
| Mistake #3: Look-ahead bias | 00:06:00 |
| Example solutions to look-ahead bias | 00:02:00 |
| Consequences of look-ahead bias | 00:02:00 |
| How to split data to avoid look-ahead bias | 00:03:00 |
| Cross-validation with temporally related data | 00:03:00 |
| Mistake #4: Building a model for one thing, using it for something else | 00:04:00 |
| Sketchy rationale | 00:06:00 |
| Why this matters for your career and job search | 00:04:00 |
| Classification - Part 1: Logistic Model | |
| Classifying images of handwritten digits | 00:07:00 |
| Why the usual regression doesn’t work | 00:04:00 |
| Machine learning recipe recap | 00:02:00 |
| Logistic model template (binary) | 00:13:00 |
| Decision function and boundary (binary) | 00:05:00 |
| Logistic model template (multiclass) | 00:14:00 |
| Decision function and boundary (multiclass) | 00:01:00 |
| Summary: binary vs multiclass | 00:01:00 |
| Code time! | 00:20:00 |
| Why the logistic model is often called logistic regression | 00:05:00 |
| One vs Rest, One vs One | 00:05:00 |
| Classification - Part 2: Maximum Likelihood Estimation | |
| Where we’re at | 00:02:00 |
| Brier score and why it doesn’t work | 00:06:00 |
| The likelihood function | 00:11:00 |
| Optimization task and numerical stability | 00:03:00 |
| Let’s improve the loss function | 00:09:00 |
| Loss value examples | 00:05:00 |
| Adding regularization | 00:02:00 |
| Binary cross-entropy loss | 00:03:00 |
| Classification - Part 3: Gradient Descent | |
| Recap | 00:03:00 |
| No closed-form solution | 00:02:00 |
| Naive algorithm | 00:04:00 |
| Fog analogy | 00:05:00 |
| Gradient descent overview | 00:03:00 |
| The gradient | 00:06:00 |
| Numerical calculation | 00:02:00 |
| Parameter update | 00:04:00 |
| Convergence | 00:02:00 |
| Analytical solution | 00:02:00 |
| [Optional] Interpreting the analytical solution | 00:05:00 |
| Gradient descent conditions | 00:03:00 |
| Beyond vanilla gradient descent | 00:03:00 |
| Code time | 00:07:00 |
| Reading the documentation | 00:11:00 |
| Classification Metrics and Class Imbalance | |
| Binary classification and class imbalance | 00:06:00 |
| Assessing performance | 00:04:00 |
| Accuracy | 00:07:00 |
| Accuracy with different class importance | 00:04:00 |
| Precision and recall | 00:07:00 |
| Sensitivity and specificity | 00:03:00 |
| F-measure and other combined metrics | 00:05:00 |
| ROC curve | 00:07:00 |
| Area under the ROC curve | 00:06:00 |
| Custom metric (important stuff!) | 00:06:00 |
| Other custom metrics | 00:03:00 |
| Bad data science process | 00:04:00 |
| Data rebalancing (avoid doing this!) | 00:06:00 |
| Stratified split | 00:03:00 |
| Neural Networks | |
| The inverted MNIST dataset | 00:04:00 |
| The problem with linear models | 00:05:00 |
| Neurons | 00:03:00 |
| Multi-layer perceptron (MLP) for binary classification | 00:05:00 |
| MLP for regression | 00:02:00 |
| MLP for multiclass classification | 00:01:00 |
| Hidden layers | 00:01:00 |
| Activation functions | 00:03:00 |
| Decision boundary | 00:02:00 |
| Loss function | 00:03:00 |
| Intro to neural network training | 00:03:00 |
| Parameter initialization | 00:03:00 |
| Saturation | 00:05:00 |
| Non-convexity | 00:04:00 |
| Stochastic gradient descent (SGD) | 00:05:00 |
| More on SGD | 00:07:00 |
| Code time! | 00:13:00 |
| Backpropagation | 00:11:00 |
| The problem with MLPs | 00:04:00 |
| Deep learning | 00:09:00 |
| Tree-Based Models | |
| Decision trees | 00:04:00 |
| Building decision trees | 00:09:00 |
| Stopping tree growth | 00:03:00 |
| Pros and cons of decision trees | 00:08:00 |
| Decision trees for classification | 00:07:00 |
| Decision boundary | 00:01:00 |
| Bagging | 00:04:00 |
| Random forests | 00:06:00 |
| Gradient-boosted trees for regression | 00:07:00 |
| Gradient-boosted trees for classification [optional] | 00:04:00 |
| How to use gradient-boosted trees | 00:03:00 |
| k-NN and SVM | |
| Nearest neighbor classification | 00:03:00 |
| K nearest neighbors (k-NN) | 00:03:00 |
| Disadvantages of k-NN | 00:04:00 |
| Recommendation systems (collaborative filtering) | 00:03:00 |
| Introduction to Support Vector Machines (SVMs) | 00:05:00 |
| Maximum margin | 00:02:00 |
| Soft margin | 00:02:00 |
| SVM vs logistic model (support vectors) | 00:03:00 |
| Alternative SVM formulation | 00:06:00 |
| Dot product | 00:02:00 |
| Non-linearly separable data | 00:03:00 |
| Kernel trick (polynomial) | 00:10:00 |
| RBF kernel | 00:02:00 |
| SVM remarks | 00:06:00 |
| Unsupervised Learning | |
| Intro to unsupervised learning | 00:01:00 |
| Clustering | 00:03:00 |
| K-means clustering | 00:10:00 |
| K-means application example | 00:03:00 |
| Elbow method | 00:02:00 |
| Clustering remarks | 00:07:00 |
| Intro to dimensionality reduction | 00:05:00 |
| PCA (principal component analysis) | 00:08:00 |
| PCA remarks | 00:03:00 |
| Code time (PCA) | 00:13:00 |
| Feature Engineering | |
| Missing data | 00:02:00 |
| Imputation | 00:04:00 |
| Imputer within a pipeline | 00:04:00 |
| One-hot encoding | 00:05:00 |
| Ordinal encoding | 00:03:00 |
| How to combine pipelines | 00:04:00 |
| Code sample | 00:08:00 |
| Feature engineering | 00:07:00 |
| Features for Natural Language Processing (NLP) | 00:11:00 |
| Anatomy of a Data Science Project | 00:01:00 |