Machine Learning

Introduction

This is a huge topic

So how do people learn?

  • We gain information through observation or instruction
  • We apply this information to tasks
  • We generalize this information to related domains

"Machine Learning" is—obviously—far more simplistic.

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
—Tom Mitchell,
Professor of Machine Learning, CMU

Input Data

  • Features: individual measurable properties or characteristics of the phenomenon being observed
  • Numeric data
  • Labeled images, documents, or audio
  • Graphs/Relationships

Abstraction

  • Features are used to derive a model
  • A model is a summarized/generalized knowledge representation of the input data
  • Computational blocks (like if/else rules)
  • Math equations
  • Data structures like trees and graphs
  • Logical groupings of similar features

Abstraction, Cont'd

Different problems require different modeling approaches:

  • Characterize the relationship between features
  • Predict some value given other information
  • Classify a thing by category
  • Group similar things together

Generalization

  • Apply the model to new data
  • Evaluate the performance of the model
  • Make improvements
  • Try again
  • A model generalizes well when we can apply it to new data and it gives us the right answers

None of this is magic.

  • You need quality input data
  • You need to be able to formulate a problem in a way that ML can work with
  • You may need to spend time creating data to learn from
  • You will need to spend time evaluating the result

Remember this?

(Lecture 10: Exploratory Data Analysis)

  • Input data: height and weight measurements
  • Abstraction: we have a prediction problem
    • "Given their height, can we predict someone's weight?"
    • Output is a math equation: y = mx + b
    • Our model learns the values of m and b (see the sketch after this list)
  • Generalization: test with real data. How well does it predict?
  • We can formulate this problem (and a lot of ML problems) as trying to minimize a loss function
  • How much does our model miss the target?
  • Keep adjusting until our model is as close as it can be.
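A tiny sketch of what that learned model looks like in code (the values of m and b below are made up for illustration, not fitted to real data):

m = 0.9     # slope "learned" from training data (made-up value)
b = -90.0   # intercept "learned" from training data (made-up value)

def predict_weight(height_cm):
    # y = m*x + b: predict weight (kg) from height (cm)
    return m * height_cm + b

print(predict_weight(170))   # 63.0 with these made-up numbers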

Mean Squared Error (MSE)

\[\begin{aligned} MSE = \frac{1}{n}\sum_{i=1}^{n} ({Y_{i} - \hat{Y_{i}}})^2 \end{aligned} \]

Mean Squared Error (MSE), Cont'd


def mse(points, m, b):
    # Mean squared error of the line y = m*x + b over a list of (x, y) points
    total = 0.0
    for x, y in points:
        err = y - (m * x + b)   # residual: actual value minus predicted value
        total += err * err
    return total / len(points)

MSE is not the only loss function.

  • There are many
  • Some work better than others depending on:
    • Kind of data
    • Kind of ML task
    • How you want to penalize (or weight) certain kinds of error
  • Mean absolute error (MAE) - doesn't penalize large errors as heavily as MSE (compare the sketch after this list)
  • Likelihood Loss: good for comparing different models
  • Hinge Loss: good for classification
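A quick sketch of how MSE and MAE weight the same errors differently (the residuals are made-up numbers):

errors = [1, 1, 10]                               # made-up residuals; one large outlier

mae = sum(abs(e) for e in errors) / len(errors)   # (1 + 1 + 10) / 3 = 4.0
mse = sum(e * e for e in errors) / len(errors)    # (1 + 1 + 100) / 3 = 34.0

print(mae, mse)   # the outlier dominates MSE far more than it dominates MAE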

There is a solution that will find the slope (m) and y-intercept (b) of the line that minimizes MSE. It is called ordinary least squares (OLS) regression.

\[\begin{aligned} m &= \frac{\sum_{i=1}^{n}(x_{i} - \overline{x})(y_{i} - \overline{y})}{\sum_{i=1}^{n}(x_{i} - \overline{x})^2} \\ b &= \overline{y} - m\overline{x} \end{aligned} \]
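A minimal sketch of those two formulas in plain Python (assuming points is a list of (x, y) pairs, as in the MSE snippet above):

def ols_fit(points):
    # Ordinary least squares for one feature: returns the (m, b) that minimize MSE
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    m = (sum((x - x_mean) * (y - y_mean) for x, y in points)
         / sum((x - x_mean) ** 2 for x, _ in points))
    b = y_mean - m * x_mean
    return m, b
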
  • OLS is easy because it is a closed-form solution
  • You have a formula, you plug in the values, you get an answer
  • It is also very fast to compute (when you have a univariate problem)
  • It is only useful for linear regression
  • What if we have a multivariate problem? (e.g., age and height to predict weight)
  • What if the solution is non-linear?

Gradient Descent

Another (more general) method to minimize a loss function like MSE

  • Faster to compute with multivariate regression
  • Can minimize loss function for non-linear solutions too

When the slope (gradient) of the loss function is zero, we know we have minimized our loss function

As we get closer to the minimum, the gradient shrinks, so our steps get smaller and smaller
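A minimal gradient-descent sketch for the same line-fitting problem (the learning rate and step count are arbitrary choices, not values from the lecture):

def gradient_descent(points, lr=0.0001, steps=10000):
    # Fit y = m*x + b by repeatedly stepping opposite the MSE gradient
    m, b = 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        # Partial derivatives of MSE with respect to m and b
        grad_m = sum(-2 * x * (y - (m * x + b)) for x, y in points) / n
        grad_b = sum(-2 * (y - (m * x + b)) for x, y in points) / n
        m -= lr * grad_m   # bigger gradient, bigger step; near the minimum the steps shrink
        b -= lr * grad_b
    return m, b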

There are two main categories of machine learning algorithms

  1. Supervised Learning
  2. Unsupervised Learning

Supervised Learning

  • Algorithms that "learn" from past information
  • Needs training data
  • Linear regression is an example (see the sketch after this list)
    • We gave it training data (height/weight measurements)
    • It learned to predict weight based on height from this data
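A minimal supervised-learning sketch using scikit-learn (an assumption: the lecture doesn't prescribe a library, and the heights/weights below are made up):

from sklearn.linear_model import LinearRegression

heights = [[150], [160], [170], [180], [190]]   # training features (cm)
weights = [50, 57, 64, 72, 80]                  # training labels (kg)

model = LinearRegression()
model.fit(heights, weights)      # "learn" m and b from the training data
print(model.predict([[175]]))    # generalize: predict for someone not in the data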

Supervised Learning Tasks

  • Prediction (AKA regression)
    • What is our forecasted earnings growth?
    • What is the likelihood it will rain?
    • What is the most effective dosage for this patient?
    • We are predicting continuous values
  • Classification
    • Image classification
    • Handwriting recognition
    • Is this email spam or not? (binary)
    • We are predicting discrete values
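To make the classification task concrete, a minimal sketch in the same spirit (scikit-learn again as an assumption; the spam features and labels are invented for illustration):

from sklearn.linear_model import LogisticRegression

# Made-up features per email: [number of exclamation marks, number of links]
X = [[8, 5], [7, 3], [9, 6], [0, 1], [1, 0], [0, 0]]
y = [1, 1, 1, 0, 0, 0]           # 1 = spam, 0 = not spam

clf = LogisticRegression()
clf.fit(X, y)                    # learn from labelled examples
print(clf.predict([[6, 4]]))     # predict a discrete class for a new email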

Bad training data makes bad models

Unsupervised Learning

  • No labelled training data to learn from, no prediction to be made
  • Primarily about looking for patterns in data
  • Can we group things together?
  • Are there hidden relationships among things?

Clustering

  • Group or organize similar objects together
  • Can we draw boundaries around objects in N-dimensional space?
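A minimal clustering sketch with k-means (scikit-learn as an assumption; the lecture doesn't name an algorithm, and the 2-D points are made up):

from sklearn.cluster import KMeans

points = [[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],   # one group near (1, 1)
          [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]]   # another group near (8, 8)

kmeans = KMeans(n_clusters=2, n_init=10)
labels = kmeans.fit_predict(points)   # no labels given; it groups similar points
print(labels)                         # e.g. [0 0 0 1 1 1] (cluster assignments)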