Machine Learning: Classification
Announcements
- Assignment 7: Due Wednesday May 18th
- Grades are up for assignment #6
- Please take a few minutes to complete a course eval
- You should have received an email from StudentCourseEvaluations@umbc.edu with a link to the surveys for classes in which you are enrolled
- Deadline is May 17th
Model training
Predicting with the model
Let's talk about classifiers...
- Regression is about finding a function that models the relationship between the features and the target
- Classification is about finding a function that gives us a decision boundary
Naive Bayesian Classifier
Thomas Bayes
What makes the Naive Bayes Classifier Good?
- Simple
- Intuitive
- Works surprisingly well
How does it work?
- Let's say we have three categories: 1, 2, and 3
- And we have a bunch of features, X = [x1, x2, x3, ..., xn]
- Every x belongs to one of the three categories: 1, 2, or 3
- We want to calculate a probability: given a single x, what is the probability of its belonging to category y?
- Whichever value of y has the highest probability is the one we classify our x as (see the decision rule below)
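In symbols, that decision rule is a maximum a posteriori choice:
\[\begin{aligned}
\hat{y} = \arg\max_{y \in \{1, 2, 3\}} P(Y = y \mid X = x)
\end{aligned} \]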
- But how do we do that?
Enter Bayes' theorem
\[\begin{aligned}
P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
\end{aligned} \]
We can calculate P(Y) from counts in the training data (here, a classic play/no-play weather example)
\[\begin{aligned}
P(Y\equiv Play) = \frac{\#Play}{\#Play + \#NoPlay}
\end{aligned} \]
We can calculate P(X)
\[\begin{aligned}
P(X\equiv Sunny) = \frac{\#Sunny}{\#Sunny + \#Overcast + \#Rainy}
\end{aligned} \]
We can calculate P(X|Y)
\[\begin{aligned}
P(X \equiv Sunny|Y\equiv Play) = \frac{\#(Sunny \,\&\, Play)}{\#Play}
\end{aligned} \]
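These count-based probabilities are easy to compute directly. Below is a minimal sketch in Python, assuming a hypothetical toy play/no-play dataset (the data and variable names are invented for illustration):

```python
from collections import Counter

# Hypothetical toy dataset in the spirit of the classic play-tennis
# example: each row is (outlook, label).
data = [
    ("Sunny", "Play"), ("Sunny", "NoPlay"), ("Overcast", "Play"),
    ("Rainy", "Play"), ("Rainy", "NoPlay"), ("Sunny", "Play"),
    ("Overcast", "Play"), ("Rainy", "NoPlay"),
]

n = len(data)
label_counts = Counter(label for _, label in data)      # counts per class
outlook_counts = Counter(outlook for outlook, _ in data)  # counts per feature value
joint_counts = Counter(data)                            # counts per (outlook, label) pair

p_play = label_counts["Play"] / n                       # P(Y = Play)
p_sunny = outlook_counts["Sunny"] / n                   # P(X = Sunny)
p_sunny_given_play = joint_counts[("Sunny", "Play")] / label_counts["Play"]  # P(X = Sunny | Y = Play)

# Bayes' theorem: P(Play | Sunny) = P(Sunny | Play) * P(Play) / P(Sunny)
p_play_given_sunny = p_sunny_given_play * p_play / p_sunny
print(p_play, p_sunny, p_sunny_given_play, p_play_given_sunny)
```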
The problem is that this gets huge
- For a model with k binary features, there are:
\[\begin{aligned}
2^{k+1}
\end{aligned} \]
...parameters to calculate
- 8 parameters are needed for a 2-feature binary dataset
- To get around this we pretend all features are conditionally independent given the class
- This is the "naive" part of the algorithm (sketched below). Parameters to calculate now:
\[\begin{aligned}
2k
\end{aligned} \]
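To see how the independence assumption plays out in code, here is a minimal from-scratch sketch (the function names and tiny dataset are invented for illustration; a real implementation would add smoothing and work in log space to avoid underflow):

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """Estimate P(y) and one P(x_i | y) table per feature from counts.

    The naive assumption means we store k small per-feature tables
    instead of one table over every combination of feature values.
    """
    n = len(y)
    class_counts = Counter(y)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    cond_counts = defaultdict(int)  # (feature index, value, class) -> count
    for row, c in zip(X, y):
        for i, value in enumerate(row):
            cond_counts[(i, value, c)] += 1
    likelihoods = {key: cnt / class_counts[key[2]]
                   for key, cnt in cond_counts.items()}
    return priors, likelihoods

def classify(row, priors, likelihoods):
    """Return argmax over y of P(y) * prod_i P(x_i | y)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihoods.get((i, value, c), 0.0)  # no smoothing in this sketch
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Tiny usage example with two binary features:
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
y = ["A", "A", "B", "B", "A"]
priors, likelihoods = train_naive_bayes(X, y)
print(classify((1, 0), priors, likelihoods))
```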
k-Nearest Neighbors
A lazy learner: there is no real training step; all the work happens at prediction time, when a query point is compared against the stored examples
- How do we choose a value for k?
- This is called a hyperparameter
- If k is too low, noise has a larger influence on the result: overfitting
- A high value for k is expensive to compute
- Use the elbow method, as in the sketch below
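A minimal sketch of the elbow method, assuming scikit-learn is available and using its bundled iris data purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit a kNN model for each candidate k and record validation error.
errors = []
for k in range(1, 26):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    errors.append((k, 1 - model.score(X_val, y_val)))

# Plot (or eyeball) error vs. k and pick the "elbow": the point where
# increasing k stops giving a meaningful drop in validation error.
for k, err in errors:
    print(k, round(err, 3))
```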