Data Science Institute In Hyderabad

Bhar8073

Uploaded on Oct 14, 2018
Category Education
ExcelR is one of the leading training institutes for Data Science in Hyderabad. Our trained professionals from IIMs, IITs and other renowned institutes guide students through various real-life problems and help them understand how to approach them.
                     
MapReduce and the art of “Thinking Parallel”



Machine Learning


k-Nearest Neighbor Classifiers


1-Nearest Neighbor Classifier

Training Examples (Instances): some for each CLASS
Test Examples: what class to assign this?


1-Nearest Neighbor

x

http://www.math.le.ac.uk/people/ag153/homepage/KNN/OliverKNN_Talk.pdf


2-Nearest Neighbor

?


3-Nearest Neighbor

X


8-Nearest Neighbor

X


Controlling COMPLEXITY in k-NN










Measuring similarity with distance

Locating the tomato's nearest neighbors requires a distance function, or a formula that measures the similarity between the two instances.

There are many different ways to calculate distance. Traditionally, the k-NN algorithm uses Euclidean distance, which is the distance one would measure if it were possible to use a ruler to connect two points, illustrated in the previous figure by the dotted lines connecting the tomato to its neighbors.


Euclidean distance

Euclidean distance is specified by the following formula, where p and q are the examples to be compared, each having n features. The term p1 refers to the value of the first feature of example p, while q1 refers to the value of the first feature of example q:

$$\mathrm{dist}(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}$$
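As a minimal sketch of the Euclidean distance calculation in Python (the function name `euclidean_distance` and the sample values are illustrative, not taken from the slides):

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two equal-length feature vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of features")
    # Sum the squared per-feature differences, then take the square root.
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Illustrative values only: two examples with n = 2 features each.
d = euclidean_distance((6, 4), (3, 8))  # sqrt(3**2 + 4**2)
```

Each feature contributes on its own scale, so features with large ranges dominate the distance unless they are rescaled first.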


Application of KNN

Which class does the tomato belong to, given the feature values:

Tomato (sweetness = 6, crunchiness = 4)


K = 3, 5, 7, 9


K = 11,13,15,17
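A minimal k-NN sketch for the tomato question. The slides do not list the training examples, so every entry in `TRAINING` below is hypothetical, as are the names `knn_predict` and `predictions`. Ties in the vote go to the label encountered first among the nearest neighbors.

```python
import math
from collections import Counter

# HYPOTHETICAL training set (not from the slides):
# (sweetness, crunchiness) -> food type.
TRAINING = [
    ((8, 5), "fruit"),
    ((7, 3), "fruit"),
    ((10, 1), "fruit"),
    ((3, 7), "vegetable"),
    ((2, 9), "vegetable"),
    ((1, 8), "vegetable"),
    ((3, 6), "protein"),
    ((1, 2), "protein"),
    ((1, 1), "protein"),
]

def knn_predict(query, training, k):
    """Classify `query` by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

tomato = (6, 4)  # sweetness = 6, crunchiness = 4, as given above
predictions = {k: knn_predict(tomato, TRAINING, k) for k in (1, 3, 5)}
```

Small values of k follow the closest examples tightly. Larger values smooth the decision but drift toward the majority class of the whole training set.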


Bayesian Classifiers


Understanding probability 
The probability of an event is estimated from the observed data by dividing the number of trials in which the event occurred by the total number of trials.

• For instance, if it rained 3 out of 10 days with similar conditions as today, the probability of rain today can be estimated as 3 / 10 = 0.30 or 30 percent.

• Similarly, if 10 out of 50 prior email messages were spam, then the probability of any incoming message being spam can be estimated as 10 / 50 = 0.20 or 20 percent.

Note: The probabilities of all the possible outcomes of a trial must always sum to 1.

For example, given the value P(spam) = 0.20, we can calculate 
P(ham) = 1 – 0.20 = 0.80
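The same estimates can be sketched in a few lines of Python. The function name `estimate_probability` is illustrative; the counts are the ones quoted above.

```python
def estimate_probability(event_count, trial_count):
    """Estimate P(event) as the fraction of trials in which the event occurred."""
    if trial_count <= 0:
        raise ValueError("trial_count must be positive")
    return event_count / trial_count

p_rain = estimate_probability(3, 10)    # rain on 3 of 10 similar days
p_spam = estimate_probability(10, 50)   # 10 of 50 prior messages were spam
p_ham = 1 - p_spam                      # complement: outcomes sum to 1
```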



Understanding probability (cont.)

Because an event cannot simultaneously happen and not happen, an event is always mutually exclusive and exhaustive with its complement. The complement of event A is typically denoted A^c or A'.

Additionally, the shorthand notation P(¬A) can be used to denote the probability of event A not occurring, as in P(¬spam) = 0.80. This notation is equivalent to P(A^c).


Understanding joint probability 

Often, we are interested in monitoring several non-mutually exclusive events for the same trial.

[Figure: Venn diagram of all emails, split into Spam (20%) and Ham (80%), with an overlapping Lottery region (5%). Labelled regions: Lottery appearing in Spam, Lottery appearing in Ham, Lottery without appearing in Spam.]

Understanding joint probability 

Estimate the probability that both spam and Lottery occur, which can be written as P(spam ∩ Lottery). The notation A ∩ B refers to the event in which both A and B occur.

Calculating P(spam ∩ Lottery) depends on the joint probability of the two events, or how the probability of one event is related to the probability of the other.
• If the two events are totally unrelated, they are called independent events.
• If spam and Lottery were independent, we could easily calculate P(spam ∩ Lottery), the probability of both events happening at the same time.

Because 20 percent of all the messages are spam, and 5 percent of all the e-mails contain the word Lottery, we could assume that 1 percent of all messages are spam with the term Lottery.

• More generally, for independent events A and B, the probability of both happening can be expressed as P(A ∩ B) = P(A) * P(B).

0.05 * 0.20 = 0.01
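A small sketch of the product rule for independent events. The function name `joint_if_independent` is illustrative, and the result is only meaningful when the independence assumption actually holds.

```python
def joint_if_independent(p_a, p_b):
    """P(A and B) = P(A) * P(B); valid only when A and B are independent."""
    return p_a * p_b

# 20% of messages are spam and 5% contain the word Lottery.
p_spam_and_lottery = joint_if_independent(0.20, 0.05)
```

If spam and Lottery are related, as is likely in practice, this product will misestimate the true joint probability.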


Bayes Rule

• Bayes Rule: The most important equation in ML!

$$P(\text{Class} \mid \text{Data}) = \frac{P(\text{Class}) \, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$$

• Posterior Probability, P(Class | Data): probability of class AFTER seeing the data
• Class Prior: P(Class)
• Data Likelihood given Class: P(Data | Class)
• Data Prior (Marginal): P(Data)
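A hedged sketch of Bayes rule applied to the spam and Lottery example. P(spam) = 0.20 and P(Lottery) = 0.05 come from the earlier slides. The likelihood P(Lottery | spam) is not given there, so the value below is an assumption chosen for illustration, as is the function name `posterior`.

```python
def posterior(prior, likelihood, evidence):
    """Bayes rule: P(class | data) = P(class) * P(data | class) / P(data)."""
    if evidence <= 0:
        raise ValueError("P(data) must be positive")
    return prior * likelihood / evidence

p_spam = 0.20                 # class prior, from the slides
p_lottery = 0.05              # data prior (marginal), from the slides
p_lottery_given_spam = 0.20   # ASSUMED likelihood, for illustration only

p_spam_given_lottery = posterior(p_spam, p_lottery_given_spam, p_lottery)
```

Under this assumed likelihood the posterior is about 0.80, four times the prior of 0.20.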


Naïve Bayes Classifier


Conditional Independence

• Simple Independence between two variables:

$$P(X_1, X_2) = P(X_1) \, P(X_2)$$

• Class Conditional Independence assumption:

$$P(X_1, X_2) \neq P(X_1) \, P(X_2)$$
$$P(X_1, X_2 \mid C) = P(X_1 \mid C) \, P(X_2 \mid C)$$

Example: Viral Infection (class) with symptoms Fever and Body Ache.

$$P(\text{Fever}, \text{BodyAche}) \neq P(\text{Fever}) \, P(\text{BodyAche})$$
$$P(\text{Fever}, \text{BodyAche} \mid \text{Viral}) = P(\text{Fever} \mid \text{Viral}) \, P(\text{BodyAche} \mid \text{Viral})$$


Naïve Bayes Classifier

Conditional independence among variables given classes!

• Simplifying assumption
• Baseline model, especially when there is a large number of features

$$P(C \mid X_1, \ldots, X_D) = \frac{P(C) \, P(X_1, \ldots, X_D \mid C)}{\sum_{C'} P(C') \, P(X_1, \ldots, X_D \mid C')} = \frac{P(C) \prod_{d=1}^{D} P(X_d \mid C)}{\sum_{C'} P(C') \prod_{d=1}^{D} P(X_d \mid C')}$$

• Taking log and ignoring the denominator, which is the same for every class:

$$\log P(C \mid X_1, \ldots, X_D) = \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C) + \text{const}$$


Naïve Bayes Classifier for Categorical Valued Variables


Let’s Naïve Bayes!

#EXMPLS  COLOR  SHAPE     LIKE
20       Red    Square    Y
10       Red    Circle    Y
10       Red    Triangle  N
10       Green  Square    N
5        Green  Circle    Y
5        Green  Triangle  N
10       Blue   Square    N
10       Blue   Circle    N
20       Blue   Triangle  Y

$$\log P(C \mid X_1, \ldots, X_D) = \log P(C) + \sum_{d=1}^{D} \log P(X_d \mid C) + \text{const}$$

Class Prior Parameters:
P(Like = Y) = ???
P(Like = N) = ???

Class Conditional Likelihoods:
P(Color = Red | Like = Y) = ????
P(Color = Red | Like = N) = ????
...
P(Shape = Triangle | Like = N) = ????
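One way to work out these parameters is to count directly from the table. The sketch below assumes plain relative-frequency estimates with no smoothing, which the slides do not specify. The names `fit_naive_bayes` and `log_score` are illustrative.

```python
import math
from collections import defaultdict
from fractions import Fraction

# (number of examples, color, shape, like) rows from the table above.
ROWS = [
    (20, "Red", "Square", "Y"), (10, "Red", "Circle", "Y"),
    (10, "Red", "Triangle", "N"), (10, "Green", "Square", "N"),
    (5, "Green", "Circle", "Y"), (5, "Green", "Triangle", "N"),
    (10, "Blue", "Square", "N"), (10, "Blue", "Circle", "N"),
    (20, "Blue", "Triangle", "Y"),
]

def fit_naive_bayes(rows):
    """Estimate class priors and per-feature likelihoods by relative frequency."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)  # keyed by (feature name, value, class)
    for n, color, shape, like in rows:
        class_counts[like] += n
        feature_counts[("Color", color, like)] += n
        feature_counts[("Shape", shape, like)] += n
    total = sum(class_counts.values())
    priors = {c: Fraction(n, total) for c, n in class_counts.items()}
    likelihoods = {key: Fraction(n, class_counts[key[2]])
                   for key, n in feature_counts.items()}
    return priors, likelihoods

def log_score(color, shape, label, priors, likelihoods):
    """log P(C) + sum of log P(X_d | C), i.e. the log posterior up to a constant."""
    return (math.log(priors[label])
            + math.log(likelihoods[("Color", color, label)])
            + math.log(likelihoods[("Shape", shape, label)]))

priors, likelihoods = fit_naive_bayes(ROWS)
# Classify a red square by comparing the log scores of the two classes.
prediction = max(priors, key=lambda c: log_score("Red", "Square", c, priors, likelihoods))
```

Exact fractions keep the estimates readable, for example `priors["Y"]` is 11/20. A (feature value, class) pair that never occurs would be missing from `likelihoods`, which is why practical implementations add smoothing.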


Naïve Bayes Classifier for Text Classification


Text Classification Example

• Doc1 = {buy two shirts get one shirt half off}
• Doc2 = {get a free watch. send your contact details now}
• Doc3 = {your flight to chennai is delayed by two hours}
• Doc4 = {you have three tweets from @sachin}

Four Class Problem:
• Spam
• Promotions
• Social
• Main

P(promo | doc1) = 0.84
P(spam | doc2) = 0.94
P(main | doc3) = 0.75
P(social | doc4) = 0.91


Bag-of-Words Representation

• Structured (e.g. Multivariate) data: fixed number of features
• Unstructured (e.g. Text) data:
  • arbitrary length documents,
  • high dimensional feature space (many words in vocabulary),
  • sparse (small fraction of vocabulary words present in a doc.)
• Bag-of-Words Representation:
  • Ignore sequential order of words
  • Represent as a Weighted-Set: Term Frequency of each term

• RawDoc = {buy two shirts get one shirt half off}
• Stemming = {buy two shirt get one shirt half off}
• BoW = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}
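The RawDoc to BoW steps can be sketched with `collections.Counter`. The stemmer here is a toy assumption that strips only a trailing "s". It stands in for a real stemming algorithm and is sufficient only for this example sentence.

```python
from collections import Counter

def naive_stem(word):
    """Toy stemmer (an assumption for this sketch): strip one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def bag_of_words(text):
    """Lowercase, split on whitespace, stem, and count term frequencies."""
    tokens = text.lower().split()
    return Counter(naive_stem(token) for token in tokens)

bow = bag_of_words("buy two shirts get one shirt half off")
```

Word order is discarded: any permutation of the same words produces the same counts.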


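Naïve Bayes Classifier with BoW

• Make an “independence assumption” about words | class

BoW = {buy:1, two:1, shirt:2, get:1, one:1, half:1, off:1}

$$
\begin{aligned}
P(\text{doc1} \mid \text{promo})
&= P(\text{buy}{:}1, \text{two}{:}1, \text{shirt}{:}2, \text{get}{:}1, \text{one}{:}1, \text{half}{:}1, \text{off}{:}1 \mid \text{promo}) \\
&= P(\text{buy}{:}1 \mid \text{promo}) \, P(\text{two}{:}1 \mid \text{promo}) \, P(\text{shirt}{:}2 \mid \text{promo}) \\
&\quad \times P(\text{get}{:}1 \mid \text{promo}) \, P(\text{one}{:}1 \mid \text{promo}) \, P(\text{half}{:}1 \mid \text{promo}) \, P(\text{off}{:}1 \mid \text{promo}) \\
&= P(\text{buy} \mid \text{promo})^{1} \, P(\text{two} \mid \text{promo})^{1} \, P(\text{shirt} \mid \text{promo})^{2} \\
&\quad \times P(\text{get} \mid \text{promo})^{1} \, P(\text{one} \mid \text{promo})^{1} \, P(\text{half} \mid \text{promo})^{1} \, P(\text{off} \mid \text{promo})^{1}
\end{aligned}
$$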


Naïve Bayes Text Classifiers

• Log Likelihood of document given class.

$$\text{doc} = \{ tf(w_m) \}_{m=1}^{M}$$

tf(w_m) = Number of times word w_m occurs in doc

$$P(\text{doc} \mid \text{class}) = P(w_1 \mid \text{class})^{tf(w_1)} \times P(w_2 \mid \text{class})^{tf(w_2)} \times \cdots \times P(w_M \mid \text{class})^{tf(w_M)}$$

$$\log P(\text{doc} \mid \text{class}) = \sum_{m=1}^{M} tf(w_m) \, \log P(w_m \mid \text{class})$$

• Parameters in Naïve Bayes Text classifiers:

P(w_m | c) = Probability that word w_m occurs in documents of class c
P(shirt | promo), P(free | spam), P(buy | spam), P(buy | promo), ...
Number of parameters = ??
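A sketch of the log-likelihood of a document given a class, computed from term frequencies. The word probabilities in `p_word_given_promo` are hypothetical values chosen only to make the example runnable; the slides do not give them. The function name `log_likelihood` is also illustrative.

```python
import math

def log_likelihood(bow, word_probs):
    """log P(doc | class) = sum over words of tf(w) * log P(w | class)."""
    return sum(tf * math.log(word_probs[word]) for word, tf in bow.items())

# Term frequencies of doc1 from the bag-of-words slide.
doc1_bow = {"buy": 1, "two": 1, "shirt": 2, "get": 1, "one": 1, "half": 1, "off": 1}

# HYPOTHETICAL values of P(word | promo), for illustration only.
p_word_given_promo = {
    "buy": 0.10, "two": 0.05, "shirt": 0.08, "get": 0.07,
    "one": 0.05, "half": 0.04, "off": 0.06,
}

ll_promo = log_likelihood(doc1_bow, p_word_given_promo)
```

Summing logs avoids the numerical underflow that multiplying many small probabilities would cause on long documents.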


[Figure: two document collections, doc1 ... docm and doc1 ... docn]

Naïve Bayes Parameters

• Likelihood of a word given class. For each word, each class.

P(w_m | c) = Probability that word w_m occurs in documents of class c

• Estimating these parameters from data:

N(w_m, c) = Number of times word w_m occurs in documents of class c

$$N(\text{free}, \text{promo}) = \sum_{\text{doc} \in \text{promo}} tf(\text{free} \mid \text{doc})$$

$$N(\text{free}, \text{spam}) = \sum_{\text{doc} \in \text{spam}} tf(\text{free} \mid \text{doc})$$
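The counts N(w, c) can be sketched as below. Two things here are assumptions rather than content from the slides: each example document is labelled with the class it scored highest for in the text classification example, and P(w | c) is taken as N(w, c) divided by the total word count of class c (the usual relative-frequency estimate, without smoothing). The function names are illustrative.

```python
from collections import Counter, defaultdict

# Example documents, labelled (as an assumption) with their most probable class.
LABELLED_DOCS = [
    ("promo", "buy two shirts get one shirt half off"),
    ("spam", "get a free watch. send your contact details now"),
    ("main", "your flight to chennai is delayed by two hours"),
    ("social", "you have three tweets from @sachin"),
]

def tokenize(text):
    """Lowercase, split on whitespace, and strip trailing punctuation."""
    stripped = (token.strip(".,!?") for token in text.lower().split())
    return [token for token in stripped if token]

def count_words_per_class(labelled_docs):
    """Return counts[c][w] = N(w, c), summing term frequencies over docs of class c."""
    counts = defaultdict(Counter)
    for label, text in labelled_docs:
        counts[label].update(tokenize(text))
    return counts

def word_likelihood(word, label, counts):
    """Estimate P(w | c) as N(w, c) / total words in class c (assumed, unsmoothed)."""
    total = sum(counts[label].values())
    return counts[label][word] / total if total else 0.0

counts = count_words_per_class(LABELLED_DOCS)
n_free_spam = counts["spam"]["free"]
n_free_promo = counts["promo"]["free"]
p_free_given_spam = word_likelihood("free", "spam", counts)
```

With one document per class, an unseen word gets probability zero, which zeroes out the whole product for that class. Smoothing is the usual remedy.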



Bayesian Classifier
Multi-variate real-valued data


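Bayes Rule

$$P(\text{Class} \mid \text{Data}) = \frac{P(\text{Class}) \, P(\text{Data} \mid \text{Class})}{P(\text{Data})}$$

• Posterior Probability, P(Class | Data): probability of class AFTER seeing the data
• Class Prior: P(Class)
• Data Likelihood given Class: P(Data | Class)
• Data Prior (Marginal): P(Data)

$$P(c \mid x) = \frac{P(c) \, P(x \mid c)}{P(x)}, \quad x \in \mathbb{R}^D$$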


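Simple Bayesian Classifier

$$P(c \mid x) = \frac{P(c) \, P(x \mid c)}{P(x)}$$

Sum: $$\int_{x \in \mathbb{R}^D} P(x \mid c) \, dx = 1$$

Mean: $$\int_{x \in \mathbb{R}^D} x \, P(x \mid c) \, dx = \mu_c$$

Co-Variance: $$\int_{x \in \mathbb{R}^D} (x - \mu_c)(x - \mu_c)^T \, P(x \mid c) \, dx = \Sigma_c$$

$$P(x \mid c) = \mathcal{N}(x \mid \mu_c, \Sigma_c) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_c|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c) \right)$$

Each class-conditional probability is assumed to be a uni-modal (single cloud) NORMAL distribution.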
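A pure-Python sketch of this classifier, restricted to D = 2 so the determinant and inverse of the covariance matrix can be written out by hand. The class models in `MODELS` (priors, means, covariances) are hypothetical, and the function names are illustrative.

```python
import math

def gaussian_density_2d(x, mean, cov):
    """Evaluate the normal density N(x | mean, cov) for D = 2."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx0, dx1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form (x - mean)^T cov^{-1} (x - mean), using the 2x2 inverse.
    quad = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    norm = (2 * math.pi) * math.sqrt(det)  # (2*pi)^(D/2) * |cov|^(1/2) with D = 2
    return math.exp(-0.5 * quad) / norm

def unnormalised_posterior(x, model):
    """P(c) * P(x | c); P(x) is omitted because it is the same for every class."""
    return model["prior"] * gaussian_density_2d(x, model["mean"], model["cov"])

def classify(x, models):
    """Return the class with the largest P(c) * P(x | c)."""
    return max(models, key=lambda label: unnormalised_posterior(x, models[label]))

# HYPOTHETICAL class models, for illustration only.
MODELS = {
    "A": {"prior": 0.5, "mean": (0.0, 0.0), "cov": [[1.0, 0.0], [0.0, 1.0]]},
    "B": {"prior": 0.5, "mean": (3.0, 3.0), "cov": [[1.0, 0.0], [0.0, 1.0]]},
}

label_near_a = classify((0.5, 0.2), MODELS)
label_near_b = classify((2.8, 3.1), MODELS)
```

In practice the means and covariances would be estimated from the training examples of each class. A linear algebra library would handle general D.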


Controlling COMPLEXITY