Machine Learning Models

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Modeling an epidemic | 00:08:00 |
| The machine learning recipe | 00:06:00 |
| The components of a machine learning model | 00:02:00 |
| Why model? | 00:03:00 |
| On assumptions: can we get rid of them? | 00:09:00 |
| The case of AlphaZero | 00:11:00 |
| Overfitting/underfitting/bias/variance | 00:11:00 |
| Why use machine learning? | 00:05:00 |

Linear regression

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| The InsureMe challenge | 00:06:00 |
| Supervised learning | 00:05:00 |
| Linear assumption | 00:03:00 |
| Linear regression template | 00:07:00 |
| Non-linear vs proportional vs linear | 00:05:00 |
| Linear regression template revisited | 00:04:00 |
| Loss function | 00:08:00 |
| Training algorithm | 00:08:00 |
| Code time | 00:15:00 |
| R squared | 00:06:00 |
| Why use a linear model? | 00:04:00 |

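The section's "Code time" material isn't reproduced in this syllabus. As a rough companion sketch of the workflow the lessons outline (linear template, squared-error loss, training, R squared), assuming scikit-learn (which the later Pipelines lessons suggest the course uses) and purely synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data: y is roughly 3*x + 5 plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0.0, 1.0, size=200)

# Fit the linear template by minimizing the squared-error loss.
model = LinearRegression().fit(X, y)

# R squared: the fraction of the target's variance the model explains.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
print("R^2:", r2_score(y, model.predict(X)))
```
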
Scaling and Pipelines

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Introduction to scaling | 00:06:00 |
| Min-max scaling | 00:03:00 |
| Code time (min-max scaling) | 00:09:00 |
| The problem with min-max scaling | 00:03:00 |
| What’s your IQ? | 00:11:00 |
| Standard scaling | 00:04:00 |
| Code time (standard scaling) | 00:02:00 |
| Model before and after scaling | 00:05:00 |
| Inference time | 00:07:00 |
| Pipelines | 00:03:00 |
| Code time (pipelines) | 00:05:00 |

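None of the section's code appears in this syllabus; as an illustrative sketch of min-max scaling, standard scaling, and pipelines in scikit-learn (synthetic data and arbitrary parameters, not the course's own example):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Min-max scaling squeezes each feature into [0, 1]; standard scaling
# centers each feature at mean 0 with unit variance.
for scaler in (MinMaxScaler(), StandardScaler()):
    # The pipeline stores the scaler fitted on training data and reapplies
    # it at inference time, so new samples get the same transformation.
    pipe = Pipeline([("scale", scaler), ("model", LinearRegression())])
    pipe.fit(X, y)
    print(type(scaler).__name__, "R^2:", pipe.score(X, y))
```
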
Regularization

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Spurious correlations | 00:04:00 |
| L2 regularization | 00:10:00 |
| Code time (L2 regularization) | 00:05:00 |
| L2 results | 00:02:00 |
| L1 regularization | 00:06:00 |
| Code time (L1 regularization) | 00:04:00 |
| L1 results | 00:02:00 |
| Why does L1 encourage zeros? | 00:09:00 |
| L1 vs L2: Which one is best? | 00:01:00 |

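As a hedged sketch of the L2/L1 contrast the lessons describe, using scikit-learn's Ridge (L2-penalized) and Lasso (L1-penalized) linear models; the data and the alpha strength are arbitrary illustrations, not the course's settings:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Many features but few informative ones: the setting where spurious
# correlations tempt an unregularized model.
X, y = make_regression(n_samples=100, n_features=20, n_informative=3,
                       noise=10.0, random_state=0)

# L2 shrinks all weights smoothly; L1 tends to push weights to exactly zero.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)
print("Ridge coefficients that are exactly zero:", int((ridge.coef_ == 0).sum()))
print("Lasso coefficients that are exactly zero:", int((lasso.coef_ == 0).sum()))
```
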
Validation

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Introduction to validation | 00:02:00 |
| Why not evaluate the model on training data? | 00:06:00 |
| The validation set | 00:05:00 |
| Code time (validation set) | 00:08:00 |
| Error curves | 00:06:00 |
| Model selection | 00:06:00 |
| The problem with model selection | 00:06:00 |
| Tainted validation set | 00:05:00 |
| Monkeys with typewriters | 00:03:00 |
| My own validation epic fail | 00:07:00 |
| The test set | 00:06:00 |
| What if the model doesn’t pass the test? | 00:05:00 |
| How not to be fooled by randomness | 00:02:00 |
| Cross-validation | 00:04:00 |
| Code time (cross-validation) | 00:07:00 |
| Cross-validation results summary | 00:02:00 |
| AutoML | 00:05:00 |
| Is AutoML a good idea? | 00:05:00 |
| Red flags: Don’t do this! | 00:07:00 |
| Red flags summary and what to do instead | 00:05:00 |
| Your job as a data scientist | 00:03:00 |

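A compact sketch of the train/validation/test discipline the section teaches, assuming scikit-learn; the candidate alphas and the final choice below are placeholders, not the course's:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Hold out a test set that is touched exactly once, at the very end.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

# Model selection by cross-validation: each candidate is scored only
# on folds it was not trained on.
for alpha in (0.1, 1.0, 10.0):
    scores = cross_val_score(Ridge(alpha=alpha), X_dev, y_dev, cv=5)
    print(f"alpha={alpha}: mean CV R^2 = {scores.mean():.3f}")

# Only the selected model ever sees the untouched test set.
final = Ridge(alpha=1.0).fit(X_dev, y_dev)
print("test R^2:", final.score(X_test, y_test))
```
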
Common Mistakes

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Intro and recap | 00:02:00 |
| Mistake #1: Data leakage | 00:05:00 |
| The golden rule | 00:04:00 |
| Helpful trick (feature importance) | 00:02:00 |
| Real example of data leakage (part 1) | 00:05:00 |
| Real example of data leakage (part 2) | 00:05:00 |
| Another (funny) example of data leakage | 00:02:00 |
| Mistake #2: Random split of dependent data | 00:05:00 |
| Another example (insurance data) | 00:05:00 |
| Mistake #3: Look-Ahead Bias | 00:06:00 |
| Example solutions to Look-Ahead Bias | 00:02:00 |
| Consequences of Look-Ahead Bias | 00:02:00 |
| How to split data to avoid Look-Ahead Bias | 00:03:00 |
| Cross-validation with temporally related data | 00:03:00 |
| Mistake #4: Building a model for one thing, using it for something else | 00:04:00 |
| Sketchy rationale | 00:06:00 |
| Why this matters for your career and job search | 00:04:00 |

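One of the four mistakes, look-ahead bias, lends itself to a short sketch: scikit-learn's TimeSeriesSplit always trains on the past and validates on the future. The data below is synthetic and the setup is an assumption, not the course's worked example (the lessons may show other solutions too):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit

# Synthetic time-ordered observations (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=120)

# Each fold trains strictly on earlier rows and validates on later ones,
# so the model never "looks ahead" of its training window.
for train_idx, val_idx in TimeSeriesSplit(n_splits=4).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    score = model.score(X[val_idx], y[val_idx])
    print(f"train rows 0..{train_idx[-1]}, validate rows "
          f"{val_idx[0]}..{val_idx[-1]}: R^2 = {score:.3f}")
```
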
Classification - Part 1: Logistic Model

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Classifying images of handwritten digits | 00:07:00 |
| Why the usual regression doesn’t work | 00:04:00 |
| Machine learning recipe recap | 00:02:00 |
| Logistic model template (binary) | 00:13:00 |
| Decision function and boundary (binary) | 00:05:00 |
| Logistic model template (multiclass) | 00:14:00 |
| Decision function and boundary (multiclass) | 00:01:00 |
| Summary: binary vs multiclass | 00:01:00 |
| Code time! | 00:20:00 |
| Why the logistic model is often called logistic regression | 00:05:00 |
| One vs Rest, One vs One | 00:05:00 |

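A minimal sketch matching the section's running example of handwritten digits, assuming scikit-learn's bundled 8x8 digits dataset (the course may use full MNIST instead):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # 8x8 images flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# Multiclass logistic model; max_iter raised so the solver converges.
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))

# The decision function is backed by per-class probabilities.
print("probabilities for one image:", clf.predict_proba(X_test[:1]).round(3))
```
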
Classification - Part 2: Maximum Likelihood Estimation

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Where we’re at | 00:02:00 |
| Brier score and why it doesn’t work | 00:06:00 |
| The likelihood function | 00:11:00 |
| Optimization task and numerical stability | 00:03:00 |
| Let’s improve the loss function | 00:09:00 |
| Loss value examples | 00:05:00 |
| Adding regularization | 00:02:00 |
| Binary cross-entropy loss | 00:03:00 |

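Assuming the standard definition of binary cross-entropy (the negative mean log-likelihood, which the lesson titles point to), a tiny NumPy rendering, with clipping for the numerical-stability issue flagged above:

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Negative mean log-likelihood of binary labels under predicted probabilities."""
    p = np.clip(p_pred, eps, 1 - eps)  # keep log() away from 0
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 1])
print(binary_cross_entropy(y, np.array([0.9, 0.1, 0.8, 0.7])))  # confident and right: low loss
print(binary_cross_entropy(y, np.array([0.1, 0.9, 0.2, 0.3])))  # confident and wrong: high loss
```
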
Classification - Part 3: Gradient Descent

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Recap | 00:03:00 |
| No closed-form solution | 00:02:00 |
| Naive algorithm | 00:04:00 |
| Fog analogy | 00:05:00 |
| Gradient descent overview | 00:03:00 |
| The gradient | 00:06:00 |
| Numerical calculation | 00:02:00 |
| Parameter update | 00:04:00 |
| Convergence | 00:02:00 |
| Analytical solution | 00:02:00 |
| [Optional] Interpreting analytical solution | 00:05:00 |
| Gradient descent conditions | 00:03:00 |
| Beyond vanilla gradient descent | 00:03:00 |
| Code time | 00:07:00 |
| Reading the documentation | 00:11:00 |

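A bare-bones illustration of the loop the section builds up (gradient, parameter update, convergence check), on a toy one-parameter problem rather than anything taken from the course:

```python
# Minimize f(w) = (w - 3)^2 with vanilla gradient descent.
def grad(w):
    return 2.0 * (w - 3.0)  # analytical gradient of f

w, learning_rate = 0.0, 0.1
for step in range(1000):
    update = learning_rate * grad(w)
    w -= update                 # parameter update
    if abs(update) < 1e-8:      # convergence: updates have become tiny
        break
print(f"converged to w = {w:.6f} after {step + 1} steps")
```
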
Classification metrics and class imbalance

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Binary classification and class imbalance | 00:06:00 |
| Assessing performance | 00:04:00 |
| Accuracy | 00:07:00 |
| Accuracy with different class importance | 00:04:00 |
| Precision and Recall | 00:07:00 |
| Sensitivity and Specificity | 00:03:00 |
| F-measure and other combined metrics | 00:05:00 |
| ROC curve | 00:07:00 |
| Area under the ROC curve | 00:06:00 |
| Custom metric (important stuff!) | 00:06:00 |
| Other custom metrics | 00:03:00 |
| Bad data science process | 00:04:00 |
| Data rebalancing (avoid doing this!) | 00:06:00 |
| Stratified split | 00:03:00 |

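A sketch of the section's metrics on an imbalanced problem, assuming scikit-learn; the 90/10 class ratio, the model, and the stratified split are illustrative stand-ins:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Imbalanced binary problem: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)  # stratified split keeps class ratios

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

# Accuracy alone can look good on imbalanced data; the other metrics
# reveal how the minority class is actually handled.
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```
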
Neural Networks

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| The inverted MNIST dataset | 00:04:00 |
| The problem with linear models | 00:05:00 |
| Neurons | 00:03:00 |
| Multi-layer perceptron (MLP) for binary classification | 00:05:00 |
| MLP for regression | 00:02:00 |
| MLP for multi-class classification | 00:01:00 |
| Hidden layers | 00:01:00 |
| Activation functions | 00:03:00 |
| Decision boundary | 00:02:00 |
| Loss function | 00:03:00 |
| Intro to neural network training | 00:03:00 |
| Parameter initialization | 00:03:00 |
| Saturation | 00:05:00 |
| Non-convexity | 00:04:00 |
| Stochastic gradient descent (SGD) | 00:05:00 |
| More on SGD | 00:07:00 |
| Code time! | 00:13:00 |
| Backpropagation | 00:11:00 |
| The problem with MLPs | 00:04:00 |
| Deep learning | 00:09:00 |

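The course's notebooks aren't included here; as a rough stand-in for the section's topics, a one-hidden-layer MLP in scikit-learn (the course may well use a different library or architecture):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

# One hidden layer of 64 ReLU units, trained by Adam (a variant of SGD);
# scaling the inputs first helps avoid saturation during training.
mlp = Pipeline([
    ("scale", StandardScaler()),
    ("net", MLPClassifier(hidden_layer_sizes=(64,), activation="relu",
                          max_iter=500, random_state=0)),
])
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```
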
Tree-Based Models

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Decision trees | 00:04:00 |
| Building decision trees | 00:09:00 |
| Stopping tree growth | 00:03:00 |
| Pros and cons of decision trees | 00:08:00 |
| Decision trees for classification | 00:07:00 |
| Decision boundary | 00:01:00 |
| Bagging | 00:04:00 |
| Random forests | 00:06:00 |
| Gradient-boosted trees for regression | 00:07:00 |
| Gradient-boosted trees for classification [optional] | 00:04:00 |
| How to use gradient-boosted trees | 00:03:00 |

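A side-by-side sketch of the three model families the section covers, with scikit-learn defaults standing in for whatever settings the course actually recommends:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single depth-limited tree, a bagged ensemble of trees (random forest),
# and a boosted ensemble built tree-by-tree on the previous trees' errors.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "test accuracy:", model.score(X_test, y_test))
```
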
k-NN and SVM

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Nearest neighbor classification | 00:03:00 |
| K nearest neighbors | 00:03:00 |
| Disadvantages of k-NN | 00:04:00 |
| Recommendation systems (collaborative filtering) | 00:03:00 |
| Introduction to Support Vector Machines (SVMs) | 00:05:00 |
| Maximum margin | 00:02:00 |
| Soft margin | 00:02:00 |
| SVM vs Logistic Model (support vectors) | 00:03:00 |
| Alternative SVM formulation | 00:06:00 |
| Dot product | 00:02:00 |
| Non-linearly separable data | 00:03:00 |
| Kernel trick (polynomial) | 00:10:00 |
| RBF kernel | 00:02:00 |
| SVM remarks | 00:06:00 |

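A sketch contrasting k-NN with an RBF-kernel SVM on data that is not linearly separable; the dataset and hyperparameters (k=5, C=1.0) are arbitrary illustrations:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Two interleaving half-moons: a classic non-linearly-separable dataset.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-NN votes among the 5 nearest training points; the RBF-kernel SVM finds
# a maximum-margin boundary in an implicit high-dimensional feature space.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("k-NN test accuracy:", knn.score(X_test, y_test))
print("SVM  test accuracy:", svm.score(X_test, y_test))
print("number of support vectors:", svm.n_support_.sum())
```
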
Unsupervised Learning

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Intro to unsupervised learning | 00:01:00 |
| Clustering | 00:03:00 |
| K-means clustering | 00:10:00 |
| K-means application example | 00:03:00 |
| Elbow method | 00:02:00 |
| Clustering remarks | 00:07:00 |
| Intro to dimensionality reduction | 00:05:00 |
| PCA (principal component analysis) | 00:08:00 |
| PCA remarks | 00:03:00 |
| Code time (PCA) | 00:13:00 |

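A compact sketch combining the section's two main tools, PCA then k-means, on scikit-learn's digits data (the "Code time (PCA)" lesson may use different data):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # labels ignored: unsupervised setting

# PCA projects the 64-pixel images onto the 2 directions of largest
# variance (the principal components).
X_2d = PCA(n_components=2).fit_transform(X)

# K-means then groups the projected points into 10 clusters.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_2d)
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(10)])
print("inertia (the quantity the elbow method tracks):", kmeans.inertia_)
```
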
Feature Engineering

| Lesson | Duration (hh:mm:ss) |
| --- | --- |
| Missing data | 00:02:00 |
| Imputation | 00:04:00 |
| Imputer within pipeline | 00:04:00 |
| One-Hot encoding | 00:05:00 |
| Ordinal encoding | 00:03:00 |
| How to combine pipelines | 00:04:00 |
| Code sample | 00:08:00 |
| Feature Engineering | 00:07:00 |
| Features for Natural Language Processing (NLP) | 00:11:00 |
| Anatomy of a Data Science Project | 00:01:00 |

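A sketch tying together imputation, one-hot encoding, and pipeline combination, as the lesson titles suggest; the toy DataFrame and its column names are invented for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny toy frame with a missing value and a categorical column.
df = pd.DataFrame({"age": [25, None, 47, 35],
                   "city": ["Paris", "Lyon", "Paris", "Nice"],
                   "bought": [0, 1, 1, 0]})

# Numeric columns: impute missing values, then scale.
# Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# Combining preprocessing and model in one pipeline keeps the whole
# transformation reproducible at inference time.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(df[["age", "city"]], df["bought"])
print(model.predict(df[["age", "city"]]))
```
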