Uploaded on Aug 29, 2018
Data Science is a high-demand profession that requires sound knowledge of analysing data in all dimensions and uncovering hidden insights, coupled with the logic and domain knowledge needed to impact the top line (increase business) and the bottom line (increase revenue). ExcelR’s Data Science curriculum is designed and delivered to match industry needs and is considered to be the best in the industry.
data science certification in pune
© 2013 ExcelR Solutions. All Rights Reserved

Advanced Regression

AGENDA
• Multinomial Regression
• Zero Inflated Poisson Regression
• Negative Binomial

Multinomial Regression
• Logistic regression (Binomial distribution) is used when the output has ‘2’ categories
• Multinomial regression (classification model) is used when the output has > ‘2’ categories
• Extension to logistic regression
• No natural ordering of categories
• Response variable has > ‘2’ categories & hence we apply multilogit
• Example: understand the impact of cost & time on the various modes of transport

Mode of transport   Car    Carpool   Bus    Rail   All modes
Count               218    32        81     122    453
Probability         0.48   0.07      0.18   0.27   1

Multinomial Regression
• When ‘Y’ (response) or ‘X’ (predictor) is categorical with ‘s’ categories:
  ✓ The level lowest in numerical / lexicographical value is chosen as the baseline / reference
  ✓ The level missing from the output is the baseline level
  ✓ We can choose a baseline level of our choice using the ‘relevel’ function in R
  ✓ The model formulates the relationship between the transformed (logit) Y & numerical X linearly
  ✓ Modeling quantitative variables linearly might not always be correct

Multinomial Regression - Output
Iteration History:
• An iterative procedure is used to compute the maximum likelihood estimates
• The number of iterations & the convergence status are provided
• -2logL = 2 * negative log likelihood
• -2logL has a χ² distribution, which is used for hypothesis testing of goodness of fit
# parameters = 27

Multinomial Regression - Output
log( P(choice = carpool | x) / P(choice = car | x) ) = β20 + β21 * cost.car + β22 * cost.carpool + …
This equation compares the log of the ratio of the probabilities of carpool to car, that is, the log odds of ‘carpool’ relative to ‘car’.
• ‘car’ has been chosen as the baseline
• x = vector representing the values of all inputs
• The regression coefficient 0.636 indicates that for a ‘1’ unit increase in ‘cost.car’, the log odds of ‘carpool’ to ‘car’ increases by 0.636
• The intercept value does not mean anything in this context
• If we also have a categorical X, say Gender (female = 0, male = 1), then its regression coefficient (say 0.22) indicates that, relative to females, males increase the log odds of ‘carpool’ to ‘car’ by 0.22

Probability
• Let p = p(x | A) be the probability of any event (say attrition) under condition A (say gender = female)

Odds
• p(x | A) ÷ (1 - p(x | A)) is called the odds associated with the event

Odds Ratio
• If there are two conditions A (gender = female) & B (gender = male), then the ratio [p(x | A) ÷ (1 - p(x | A))] / [p(x | B) ÷ (1 - p(x | B))] is called the odds ratio of A with respect to B

Relative Risk
• p(x | A) ÷ p(x | B) is called the relative risk
https://en.wikipedia.org/wiki/Relative_risk

Odds Ratio
• The odds ratio is computed from the coefficients in the linear model equation by simply exponentiating them
• Exponentiated regression coefficients are odds ratios for a unit change in a predictor variable
• The odds ratio for a unit increase in cost.car is 1.88 for choosing carpool vs car

Goodness of fit

Linear                     GLM
Analysis of Variance       Analysis of Deviance
Residual Sum of Squares    Residual Deviance
OLS                        Maximum Likelihood

• Residual Deviance is -2 log L
• Adding more parameters to the model will reduce the Residual Deviance even if they are not going to be useful for prediction
• In order to control this, a penalty of “2 * number of parameters” is added to the Residual Deviance
• This penalized value of -2 log L is called the AIC criterion
• AIC = -2 log L + 2 * number of parameters

Note: “Multilogit Model with Interaction”
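The definitions on the slides (odds, odds ratio, relative risk, exponentiated coefficients and AIC) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original deck: the function names, the probabilities 0.2 and 0.1, and the -2 log L value of 700.0 are hypothetical. Only the coefficient 0.636 and the parameter count 27 come from the slides.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_a, p_b):
    """Odds ratio of condition A with respect to condition B."""
    return odds(p_a) / odds(p_b)


def relative_risk(p_a, p_b):
    """Relative risk of condition A with respect to condition B."""
    return p_a / p_b


def aic(neg2_log_l, n_params):
    """AIC = -2 log L + 2 * number of parameters."""
    return neg2_log_l + 2 * n_params


# Hypothetical event probabilities under conditions A and B.
p_a, p_b = 0.2, 0.1
print("odds ratio:", odds_ratio(p_a, p_b))
print("relative risk:", relative_risk(p_a, p_b))

# Coefficient of cost.car for carpool vs car, as quoted on the slide.
coef_cost_car = 0.636
or_cost_car = math.exp(coef_cost_car)
print("odds ratio for cost.car:", round(or_cost_car, 2))

# 27 parameters is from the slide; the -2 log L value is hypothetical.
print("AIC:", aic(700.0, 27))
```

Exponentiating the rounded coefficient 0.636 gives roughly 1.89, close to the 1.88 quoted on the slide. The small gap may come from the slide using an unrounded coefficient, but the deck does not say.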