In this notebook we’ll manually implement regularized logistic regression to build intuition about the algorithm’s underlying math and to demonstrate how regularization can address overfitting. We’ll then implement logistic regression in a practical manner using the ubiquitous scikit-learn package. The post assumes the reader is familiar with the concepts of optimization, cross validation, and non-linearity.

Andrew Ng’s Machine Learning class on Coursera is not only an excellent primer on the theoretical underpinnings of ML, but it also introduces its students to practical implementations via coding. Unfortunately for data enthusiasts that are pickled in Python or R, the class exercises are in Matlab or Octave. Ng’s class introduces logistic regression with regularization early on, which is logical as its methods underpin more advanced concepts. If you’re interested, check out Caltech’s more theory-heavy Learning From Data course on edX, which covers similar ground at greater depth and really dives deep into the perceptron.

Using training set X and label set y (data from Ng’s class), we’ll use logistic regression to estimate the probability that an observation belongs to class 1 or class 0. Specifically, we’ll predict whether computer microchips will pass a QA test based on two measurements, X[0] and X[1].
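At its core, logistic regression passes a linear combination of the features through the sigmoid function to produce a probability in (0, 1). A minimal sketch of that hypothesis function (the function and variable names here are our own, not from Ng’s exercise files):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(theta, X):
    """Estimated probability that each row of X belongs to class 1:
    h(x) = sigmoid(theta^T x)."""
    return sigmoid(X @ theta)

# With all-zero parameters the model is maximally uncertain: p = 0.5.
theta = np.zeros(2)
X = np.array([[0.05, -0.7],
              [0.30,  0.4]])
print(predict_proba(theta, X))  # → [0.5 0.5]
```

A prediction of class 1 corresponds to `predict_proba(...) >= 0.5`, i.e. the sign of the linear combination decides the label.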

We’ll generate additional features from the original QA data (feature engineering) in order to create a richer feature space to learn from and achieve a better fit. One potential pitfall is that we may overfit the training data, which decreases the learned model’s ability to generalize, causing poor performance on new, unseen observations. Regularization aims to keep the influence of the new features in check.
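The scikit-learn version of that workflow can be sketched as follows. The tiny dataset here is a hypothetical stand-in for the microchip QA measurements, not Ng’s actual data; the degree-6 polynomial expansion mirrors the one used in his exercise:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the two QA measurements X[0], X[1] and labels y.
X = np.array([[0.05, -0.7],
              [0.30,  0.4],
              [-0.60, 0.2],
              [0.90, -0.1]])
y = np.array([1, 0, 0, 1])

# Feature engineering: map the two raw features to all polynomial
# terms up to degree 6 (x1, x2, x1^2, x1*x2, ..., x2^6).
poly = PolynomialFeatures(degree=6, include_bias=False)
X_poly = poly.fit_transform(X)

# C is the INVERSE of regularization strength: a smaller C applies a
# stronger L2 penalty, shrinking the influence of the engineered features.
clf = LogisticRegression(C=1.0, penalty="l2", solver="lbfgs", max_iter=1000)
clf.fit(X_poly, y)

probs = clf.predict_proba(X_poly)[:, 1]  # P(class 1) for each observation
```

In practice, C would be chosen via cross validation: sweeping it from large to small traces a path from an overfit decision boundary to an increasingly smooth (and eventually underfit) one.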