data science certification in pune


Excelrsubmissions

Uploaded on Aug 29, 2018

Category Education

Data Science is an extremely high-in-demand profession which requires a professional to possess sound knowledge of analysing data in all dimensions and uncover the unseen truth coupled with the logic and domain knowledge to impact the top-line (increase business) and bottom-line (increase revenue). ExcelR’s Data Science curriculum is meticulously designed and delivered matching the industry needs and considered to be the best in the industry

Category Education

Comments

                     

data science certification in pune

© 2013 ExcelR Solutions. All Rights Reserved Advanced Regression AGENDA Mul)nomia l Regression Zero Inflated Poisson Regression Nega)ve Binomial © 2013 ExcelR Solutions. All Rights Reserved Multinomial Regression •  Logis'c regression (Binomial distribu'on) is used when output has ‘2’ categories •  Mul'nomial regression (classifica'on model) is used when output has > ‘2’ categories •  Extension to logis'c regression •  No natural ordering of categories •  Response variable has > ‘2’ categories & hence we apply mul'logit •  Understand the impact of cost & 'me on the various modes of transport Mode of transport Car Carpool Bus Rail All modes Count 218 32 81 122 453 Probability 0.48 0.07 0.18 0.27 1 © 2013 ExcelR Solutions. All Rights Reserved Multinomial Regression •  Whether we have ‘Y’ (response) or ‘X’ (predictor), which is categorical with ‘s’ categories ü  Lowest in numerical / lexicographical value is chosen as baseline / reference ü  Missing level in output is baseline level ü  We can choose the baseline level of our choice based on ‘relevel’ func'on in R ü  Model formulates the rela'onship between transformed (logit) Y & numerical X linearly ü  Modeling quan'ta've variables linearly might not always be correct © 2013 ExcelR Solutions. All Rights Reserved Multinomial Regression - Output Itera'on History: •  Itera've procedure is used to compute maximum likelihood es'mates •  # itera'ons & convergence status is provided •  -2logL = 2 * nega've log likelihood •  -2logL has χ2 distribu'on, which is used for hypothesis tes'ng of goodness of fit # parameters = 27 © 2013 ExcelR Solutions. All Rights Reserved Multinomial Regression - Output Log(P(choice = carpool | x) / P(choice = car | x) = β20 + β21 * cost.car + β22 * cost.carpool + ……………. This equa'on compares the log of probabili'es of carpool to car •  ‘car’ has been chosen as baseline •  x = vector represen'ng the values of all inputs •  The regression coefficient 0.636 indicates that for a ‘1’ unit increases the ‘cost.car’, the log odds of ‘carpool’ to ‘car’ increases by 0.636 •  Intercept value does not mean anything in this context •  If we have a categorical X also, say Gender (female = 0, male = 1), then regression coefficient (say 0.22) indicates that rela've to females, males increase the log odds of ‘carpool’ to ‘car’ by 0.22 © 2013 ExcelR Solutions. All Rights Reserved Probability •  Let p = p(x | A) be the probability of any event (say airi'on) under condi'on A (say gender = female) •  Then p(x | A) ÷ (1 - p(x | A) is called the odds associated with the event Odds •  If there are two condi'ons A (gender = female) & B (gender = male) then the ra'o p(x | A) ÷ (1 - p(x | A) / p(x | B) ÷ (1 - p(x | B) is called as odds ra'o of A with respect to B Odds Ratio •  p(x | A) ÷ p(x | B) is called as rela've risk Relative Risk hips://en.wikipedia.org/wiki/Rela've_risk © 2013 ExcelR Solutions. All Rights Reserved •  Odds ra'o is computed from the coefficients in the linear model equa'on by simply exponen'a'ng •  Exponen'ated regression coefficients are odds ra'o for a unit change in a predictor variable •  The odds ra'o for a unit increase in cost.car is 1.88 for choosing carpool vs car Odds Ratio © 2013 ExcelR Solutions. All Rights Reserved Goodness of fit Linear GLM Analysis of Variance Analysis of Deviance Residual Deviance Residual Sum of Squares OLS Maximum Likelihood •  Residual Deviance is -2 log L •  Adding more parameters to the model will reduce Residual Deviance even if it is not going to be useful for predic'on •  In order to control this, penalty of “2 * number of parameters” is added to to Residual deviance •  This penalized value of -2 log L is called as AIC criterion •  AIC = -2 log L + 2 * number of parameters Note: “Mul'logit Model with Interac(on”