ExcelR offers 160 hours of classroom training on Business Analytics / Data Science / Data Analytics. We are considered one of the best training institutes for Business Analytics in Pune. "Faculty and a vast course agenda are our differentiators." The training is conducted by alumni of premier institutions such as IIT and ISB who have extensive experience in the arena of analytics and are considered to be among the best trainers in the industry. The topics covered as part of this Data Scientist certification program are on par with most Master of Science in Analytics (MS in Business Analytics / MS in Data Analytics) programs across the top universities of the globe.
MapReduce and the art of “Thinking Parallel”
Machine Learning
k-Nearest Neighbor Classifiers
1-Nearest Neighbor Classifier
[Figure: training examples (instances), some for each class, and a test example whose class is to be assigned; panels show the 1-Nearest Neighbor, 2-Nearest Neighbor, 3-Nearest Neighbor, and 8-Nearest Neighbor decisions. Source: http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN_Talk.pdf]
Controlling COMPLEXITY in k-NN
Measuring similarity with distance
Locating the tomato's nearest neighbors requires a distance function, or a formula that measures the similarity between two instances.
There are many different ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance, which is the distance one would measure if it were possible to use a ruler to connect two points, illustrated in the previous figure by the dotted lines connecting the tomato to its neighbors.
Euclidean distance
Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:
dist(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + ... + (p_n - q_n)^2}
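A small sketch of this distance calculation in Python, using the sweetness and crunchiness features from the food example; the tomato's neighbor values here are hypothetical:

```python
import math

def euclidean_distance(p, q):
    # square root of the sum of squared feature differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

tomato = (6, 4)   # (sweetness, crunchiness), from the example below
orange = (7, 3)   # hypothetical neighbor
print(euclidean_distance(tomato, orange))   # sqrt((6-7)^2 + (4-3)^2) = 1.41...
```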
Application of KNN
Which class does the tomato belong to, given the feature values:
Tomato (sweetness = 6, crunchiness = 4)?
K = 3, 5, 7, 9
K = 11, 13, 15, 17
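A rough sketch of how this could be run with scikit-learn; the small fruit/vegetable training set and its labels are invented for illustration, so the predicted class only demonstrates the mechanics of varying k:

```python
from sklearn.neighbors import KNeighborsClassifier

# hypothetical (sweetness, crunchiness) training data with class labels
X_train = [[8, 5], [9, 1], [2, 8], [3, 7], [7, 2], [1, 9], [6, 1], [2, 2]]
y_train = ["fruit", "fruit", "vegetable", "vegetable",
           "fruit", "vegetable", "fruit", "vegetable"]

tomato = [[6, 4]]   # sweetness = 6, crunchiness = 4

for k in (3, 5, 7):
    knn = KNeighborsClassifier(n_neighbors=k)   # Euclidean distance by default
    knn.fit(X_train, y_train)
    print(k, knn.predict(tomato)[0])
```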
Bayesian Classifiers
Understanding probability
The probability of an event is estimated from the observed data by dividing the number of trials in which the event occurred by the total number of trials.
For instance, if it rained 3 out of 10 days with similar conditions as today, the probability of rain today can be estimated as 3 / 10 = 0.30 or 30 percent.
Similarly, if 10 out of 50 prior email messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20 or 20 percent.
Note: The probabilities of all the possible outcomes of a trial must always sum to 1.
For example, given the value P(spam) = 0.20, we can calculate P(ham) = 1 − 0.20 = 0.80.
Understanding probability cont..
Because an event cannot simultaneously happen and not happen, an event is always mutually exclusive and exhaustive with its complement.
The complement of event A is typically denoted A^c or A'.
Additionally, the shorthand notation P(¬A) can be used to denote the probability of event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to P(A^c).
Understanding joint probability
Often, we are interested in monitoring several non-mutually exclusive events for the same trial.
[Venn diagram: all emails split into Spam (20%) and Ham (80%), with Lottery (5%) overlapping both: Lottery appearing in Spam, Lottery appearing in Ham, and Lottery without appearing in Spam.]
Understanding joint probability
Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur.
Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, or how the probability of one event is related to the probability of the other.
If the two events are totally unrelated, they are called independent events.
If spam and Lottery were independent, we could easily calculate P(spam ∩ Lottery), the probability of both events happening at the same time.
Because 20 percent of all the messages are spam, and 5 percent of all the e-mails contain the word Lottery, we could assume that 1 percent of all messages are spam with the term Lottery.
More generally, for independent events A and B, the probability of both happening can be expressed as P(A ∩ B) = P(A) * P(B):
0.05 * 0.20 = 0.01
Bayes Rule
Bayes Rule: The most important Equation in ML!!
P(Class | Data) = P(Class) P(Data | Class) / P(Data)
Posterior Probability (probability of the class AFTER seeing the data) = Class Prior × Data Likelihood given Class / Data Prior (Marginal)
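As a rough illustration of the rule, the sketch below applies it to the spam/Lottery numbers used earlier; the likelihood P(Lottery | spam) = 0.20 is a hypothetical value chosen only for this example, not a figure from the slides.

```python
# Bayes rule: P(spam | Lottery) = P(spam) * P(Lottery | spam) / P(Lottery)
p_spam = 0.20                 # class prior (from the earlier slide)
p_lottery = 0.05              # data prior / marginal (from the earlier slide)
p_lottery_given_spam = 0.20   # likelihood: hypothetical value, assumed for illustration

p_spam_given_lottery = p_spam * p_lottery_given_spam / p_lottery
print(p_spam_given_lottery)   # 0.20 * 0.20 / 0.05 = 0.80
```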
Naïve Bayes Classifier
Conditional Independence
Simple independence between two variables:
P(X_1, X_2) = P(X_1) P(X_2)
Class conditional independence assumption:
P(X_1, X_2 | C) = P(X_1 | C) P(X_2 | C)
[Diagram: Viral Infection as the common cause of Fever and Body Ache]
Simple independence would mean: P(Fever, BodyAche) = P(Fever) P(BodyAche)
Class conditional independence: P(Fever, BodyAche | Viral) = P(Fever | Viral) P(BodyAche | Viral)
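To make the distinction concrete, the sketch below builds a small joint distribution over Viral, Fever, and BodyAche that satisfies the class conditional independence assumption; all numbers are made up for illustration. It then checks that Fever and BodyAche factorize given Viral, even though they are not independent marginally.

```python
import itertools

# Hypothetical probabilities, chosen only for illustration
p_viral = {True: 0.1, False: 0.9}
p_fever_given_viral = {True: 0.8, False: 0.1}   # P(Fever=1 | Viral)
p_ache_given_viral = {True: 0.7, False: 0.2}    # P(BodyAche=1 | Viral)

def bern(p, value):
    return p if value else 1 - p

# Joint built under the class conditional independence assumption:
# P(V, F, B) = P(V) * P(F | V) * P(B | V)
joint = {}
for v, f, b in itertools.product([True, False], repeat=3):
    joint[(v, f, b)] = (p_viral[v]
                        * bern(p_fever_given_viral[v], f)
                        * bern(p_ache_given_viral[v], b))

# Conditional independence given Viral=True:
p_fb_given_v = joint[(True, True, True)] / p_viral[True]
p_f_given_v = sum(joint[(True, True, b)] for b in [True, False]) / p_viral[True]
p_b_given_v = sum(joint[(True, f, True)] for f in [True, False]) / p_viral[True]
print(round(p_fb_given_v, 4), round(p_f_given_v * p_b_given_v, 4))  # equal: 0.56 0.56

# Marginally, Fever and BodyAche are NOT independent:
p_fb = sum(joint[(v, True, True)] for v in [True, False])
p_f = sum(joint[(v, True, b)] for v in [True, False] for b in [True, False])
p_b = sum(joint[(v, f, True)] for v in [True, False] for f in [True, False])
print(round(p_fb, 4), round(p_f * p_b, 4))  # differ: 0.074 vs 0.0425
```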
Naïve Bayes Classifier
Conditional independence among variables given classes!
A simplifying assumption.
A useful baseline model, especially when there is a large number of features.
P(C | X_1, X_2, ..., X_D) = P(C) P(X_1, X_2, ..., X_D | C) / \sum_{C'} P(C') P(X_1, X_2, ..., X_D | C')
With the conditional independence assumption:
P(X_1, X_2, ..., X_D | C) = \prod_{d=1}^{D} P(X_d | C)
Taking log and ignoring the denominator:
log P(C | X_1, X_2, ..., X_D) \propto log P(C) + \sum_{d=1}^{D} log P(X_d | C)
Naïve Bayes Classifier for Categorical Valued Variables
Let’s Naïve Bayes!
#EXAMPLES   COLOR   SHAPE      LIKE
20          Red     Square     Y
10          Red     Circle     Y
10          Red     Triangle   N
10          Green   Square     N
5           Green   Circle     Y
5           Green   Triangle   N
10          Blue    Square     N
10          Blue    Circle     N
20          Blue    Triangle   Y
log P(C | X_1, X_2, ..., X_D) \propto log P(C) + \sum_{d=1}^{D} log P(X_d | C)
Class Prior Parameters:
P(Like = Y) = ???
P(Like = N) = ???
Class Conditional Likelihoods:
P(Color = Red | Like = Y) = ????
P(Color = Red | Like = N) = ????
...
P(Shape = Triangle | Like = N) = ????
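As a rough sketch of how these parameters could be filled in from the table above (the example counts sum to 100), the code below estimates the priors and class conditional likelihoods by counting, then scores a hypothetical (Red, Square) object with the log-posterior formula; smoothing is omitted to keep the counting transparent.

```python
from collections import defaultdict
import math

# (count, color, shape, like) rows from the table above
rows = [
    (20, "Red", "Square", "Y"), (10, "Red", "Circle", "Y"), (10, "Red", "Triangle", "N"),
    (10, "Green", "Square", "N"), (5, "Green", "Circle", "Y"), (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"), (10, "Blue", "Circle", "N"), (20, "Blue", "Triangle", "Y"),
]

class_count = defaultdict(int)    # N(class)
feat_count = defaultdict(int)     # N(feature = value, class)
total = 0
for n, color, shape, like in rows:
    total += n
    class_count[like] += n
    feat_count[("Color", color, like)] += n
    feat_count[("Shape", shape, like)] += n

def prior(c):
    return class_count[c] / total                 # e.g. P(Like = Y) = 55/100

def likelihood(feature, value, c):
    return feat_count[(feature, value, c)] / class_count[c]

# Log-posterior (up to a constant) for a hypothetical (Red, Square) object
for c in ("Y", "N"):
    score = (math.log(prior(c))
             + math.log(likelihood("Color", "Red", c))
             + math.log(likelihood("Shape", "Square", c)))
    print(c, round(score, 3))
```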
Naïve Bayes Classifier for Text Classification
Text Classification Example
Doc1 = {buy two shirts get one shirt half off}
Doc2 = {get a free watch. send your contact details now}
Doc3 = {your flight to chennai is delayed by two hours}
Doc4 = {you have three tweets from @sachin}
Four Class Problem:
Spam,
Promotions,
Social,
Main
P(promo | doc1) = 0.84
P(spam | doc2) = 0.94
P(main | doc3) = 0.75
P(social | doc4) = 0.91
Bag-of-Words Representation
Structured (e.g. multivariate) data – fixed number of features.
Unstructured (e.g. text) data:
• arbitrary length documents,
• high dimensional feature space (many words in the vocabulary),
• sparse (a small fraction of vocabulary words present in a doc).
Bag-of-Words Representation:
• Ignore the sequential order of words
• Represent as a weighted set – the term frequency of each term
• RawDoc = {buy two shirts get one shirt half off}
• Stemming = {buy two shirt get one shirt half off}
• BoW’s = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}
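A minimal sketch of this bag-of-words step in Python, using a simple suffix rule as a stand-in for a real stemmer (a library such as NLTK would normally handle stemming):

```python
from collections import Counter

raw_doc = "buy two shirts get one shirt half off"

def crude_stem(word):
    # toy stand-in for a real stemmer: strip a plural "s" from longer words
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

tokens = [crude_stem(w) for w in raw_doc.split()]
bow = Counter(tokens)   # term frequency of each term, order discarded
print(bow)              # Counter({'shirt': 2, 'buy': 1, 'two': 1, 'get': 1, 'one': 1, 'half': 1, 'off': 1})
```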
Naïve Bayes Classifier with BoW
Make an “independence assumption” about words given the class.
BoW = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}
P(doc1 | promo) = P(buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1 | promo)
= P(buy:1 | promo) P(two:1 | promo) P(shirt:2 | promo) P(get:1 | promo) P(one:1 | promo) P(half:1 | promo) P(off:1 | promo)
= P(buy | promo)^1 P(two | promo)^1 P(shirt | promo)^2 P(get | promo)^1 P(one | promo)^1 P(half | promo)^1 P(off | promo)^1
Naïve Bayes Text Classifiers
Log likelihood of a document given a class.
Parameters in Naïve Bayes text classifiers:
doc = {tf(w_m)}_{m=1}^{M}, where tf(w_m) is the number of times word w_m occurs in the doc
P(doc | class) = P(w_1 | class)^{tf(w_1)} P(w_2 | class)^{tf(w_2)} ... P(w_M | class)^{tf(w_M)}
P(w_m | c): probability that word w_m occurs in documents of class c, e.g.
P(shirt | promo), P(free | spam), P(buy | spam), P(buy | promo), ...
Number of parameters = ??
[Diagram: a collection of documents doc1, doc2, ..., docm for one class and doc1, doc2, doc3, ..., docn for another class.]
Likelihood of a word given a class: one for each word and each class.
Estimating these parameters from data:
Naïve Bayes Parameters
P(w_m | c): probability that word w_m occurs in documents of class c
N(w_m, c): number of times word w_m occurs in documents of class c
N(free, promo) = \sum_{doc \in promo} tf(free, doc)
N(free, spam) = \sum_{doc \in spam} tf(free, doc)
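Putting the counting and the log-likelihood together, here is a minimal multinomial Naïve Bayes sketch on the four example documents. The class labels are taken from the posterior slide above (promo, spam, main, social), training on one document per class is only for illustration, and the add-one (Laplace) smoothing is an assumption not spelled out in the slides.

```python
from collections import Counter, defaultdict
import math

train = [
    ("promo",  "buy two shirts get one shirt half off"),
    ("spam",   "get a free watch send your contact details now"),
    ("main",   "your flight to chennai is delayed by two hours"),
    ("social", "you have three tweets from @sachin"),
]

class_docs = defaultdict(int)        # number of documents per class
word_counts = defaultdict(Counter)   # N(w, c): word counts per class
vocab = set()
for label, text in train:
    class_docs[label] += 1
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def log_posterior(text, label):
    # log P(c) + sum_m tf(w_m) * log P(w_m | c), with add-one smoothing (assumed)
    total_docs = sum(class_docs.values())
    score = math.log(class_docs[label] / total_docs)
    denom = sum(word_counts[label].values()) + len(vocab)
    for w, tf in Counter(text.split()).items():
        score += tf * math.log((word_counts[label][w] + 1) / denom)
    return score

doc = "get a free watch now"
print(max(class_docs, key=lambda c: log_posterior(doc, c)))  # likely "spam"
```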
Bayesian Classifier
Multi-variate real-valued data
Bayes Rule
P(Class | Data) = P(Class) P(Data | Class) / P(Data)
Posterior Probability (probability of the class AFTER seeing the data) = Class Prior × Data Likelihood given Class / Data Prior (Marginal)
P(c | x) = P(c) P(x | c) / P(x),   x ∈ R^D
Simple Bayesian Classifier
P(c | x) = P(c) P(x | c) / P(x)
Sum: \int_{x \in R^D} P(x | c) dx = 1
Mean: \int_{x \in R^D} x P(x | c) dx = m_c
Co-variance: \int_{x \in R^D} (x - m_c)(x - m_c)^T P(x | c) dx = \Sigma_c
P(x | c) = N(x; m_c, \Sigma_c) = (2\pi)^{-D/2} |\Sigma_c|^{-1/2} \exp(-\frac{1}{2} (x - m_c)^T \Sigma_c^{-1} (x - m_c))
Each class conditional probability is assumed to be a uni-modal (single cloud) NORMAL distribution.
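A compact sketch of this Gaussian Bayesian classifier with NumPy, estimating the class prior, mean, and covariance from data and scoring new points with the log of the formula above; the two-class synthetic data is made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 2-D data for two classes (illustrative only)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=1.5, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

classes = np.unique(y)
priors, means, covs = {}, {}, {}
for c in classes:
    Xc = X[y == c]
    priors[c] = len(Xc) / len(X)            # P(c)
    means[c] = Xc.mean(axis=0)              # m_c
    covs[c] = np.cov(Xc, rowvar=False)      # Sigma_c

def log_posterior(x, c):
    # log P(c) + log N(x; m_c, Sigma_c), dropping the shared P(x) term
    d = x - means[c]
    cov = covs[c]
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    D = len(x)
    return np.log(priors[c]) - 0.5 * (D * np.log(2 * np.pi) + logdet + quad)

x_new = np.array([2.5, 2.0])
pred = max(classes, key=lambda c: log_posterior(x_new, c))
print(pred)   # expected to be class 1 for this point
```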
Controlling COMPLEXITY