flowchart LR
  A[Choose] --> B[Fit]
  B --> C[Assess]
  C --> D[Use]
  C --> A
Lucy D’Agostino McGowan
response | predictor(s) | model |
---|---|---|
quantitative | one quantitative | simple linear regression |
quantitative | two or more (of either kind) | multiple linear regression |
binary | one (of either kind) | simple logistic regression |
binary | two or more (of either kind) | multiple logistic regression |
variables | predictor | ordinary regression | logistic regression |
---|---|---|---|
one: \(x\) | \(\beta_0 + \beta_1 x\) | Response \(y\) | \(\textrm{logit}(\pi)=\log\left(\frac{\pi}{1-\pi}\right)\) |
several: \(x_1,x_2,\dots,x_k\) | \(\beta_0 + \beta_1x_1 + \dots+\beta_kx_k\) | Response \(y\) | \(\textrm{logit}(\pi)=\log\left(\frac{\pi}{1-\pi}\right)\) |
Form | Model |
---|---|
Logit form | \(\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k\) |
Probability form | \(\pi = \frac{e^{\beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k}}{1+e^{\beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k}}\) |
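The two forms are algebraically equivalent. A minimal R sketch of the conversion (the values below are arbitrary placeholders, not estimates from any fitted model):

```r
# Convert between the logit form and the probability form.
beta0 <- -1      # placeholder intercept
beta1 <- 0.5     # placeholder slope
x1    <- 2       # placeholder predictor value

log_odds <- beta0 + beta1 * x1                   # logit form
prob     <- exp(log_odds) / (1 + exp(log_odds))  # probability form
prob                                             # 0.5 here, since log_odds = 0
plogis(log_odds)                                 # same conversion via R's logistic CDF
qlogis(prob)                                     # and back to the log odds scale
```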
flowchart LR
  A[Choose] --> B[Fit]
  B --> C[Assess]
  C --> D[Use]
  C --> A
Call: glm(formula = Acceptance ~ MCAT + GPA, family = "binomial", data = MedGPA)
Coefficients:
(Intercept) MCAT GPA
-22.3727 0.1645 4.6765
Degrees of Freedom: 54 Total (i.e. Null); 52 Residual
Null Deviance: 75.79
Residual Deviance: 54.01 AIC: 60.01
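The brief output above comes from a call like the one shown in its `Call:` line; a sketch, assuming `MedGPA` comes from the Stat2Data package:

```r
# Fit the multiple logistic regression shown above
library(Stat2Data)   # assumed source of the MedGPA data
data(MedGPA)

mod <- glm(Acceptance ~ MCAT + GPA, family = "binomial", data = MedGPA)
mod           # brief printout: coefficients, deviances, AIC
summary(mod)  # full coefficient table, shown next
```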
Call:
glm(formula = Acceptance ~ MCAT + GPA, family = "binomial", data = MedGPA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7132 -0.8132 0.3136 0.7663 1.9933
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -22.3727 6.4538 -3.467 0.000527 ***
MCAT 0.1645 0.1032 1.595 0.110786
GPA 4.6765 1.6416 2.849 0.004389 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 75.791 on 54 degrees of freedom
Residual deviance: 54.014 on 52 degrees of freedom
AIC: 60.014
Number of Fisher Scoring iterations: 5
How do we get a confidence interval?
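One way, sketched in R with `mod` as the fitted model from above: profile-likelihood intervals from `confint()`, or Wald intervals built from the estimate and its standard error.

```r
# 95% confidence intervals on the log odds scale
confint(mod)           # profile-likelihood intervals
confint.default(mod)   # Wald intervals: estimate +/- 1.96 * SE

# Wald interval by hand for GPA, using the estimate and SE from the summary above
4.6765 + c(-1, 1) * qnorm(0.975) * 1.6416
```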
How do we convert this to an odds ratio from the log odds scale?
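Exponentiate. A minimal sketch in R, again with `mod` as the fitted model from above:

```r
# Move from the log odds scale to the odds ratio scale
exp(coef(mod))       # odds ratios for the intercept, MCAT, and GPA
exp(confint(mod))    # 95% CIs on the odds ratio scale
```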
What are the assumptions of multiple logistic regression?
How do you determine whether the conditions are met?
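One common informal check of the linearity condition is an empirical logit plot; a minimal sketch, assuming `MedGPA` from the Stat2Data package and binning GPA:

```r
# Adjusted empirical logit plot for GPA (the 0.5s avoid log(0) in all-0 or all-1 bins)
library(Stat2Data)
data(MedGPA)

MedGPA$gpa_bin <- cut(MedGPA$GPA, breaks = 5)
tab <- aggregate(Acceptance ~ gpa_bin, data = MedGPA,
                 FUN = function(y) c(yes = sum(y), n = length(y)))
yes <- tab$Acceptance[, "yes"]
n   <- tab$Acceptance[, "n"]
emp_logit <- log((yes + 0.5) / (n - yes + 0.5))

plot(seq_along(emp_logit), emp_logit, xaxt = "n",
     xlab = "GPA bin", ylab = "Empirical logit of acceptance")
axis(1, at = seq_along(emp_logit), labels = tab$gpa_bin)
# A roughly linear trend across the ordered bins is consistent with linearity.
```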
If I have two nested models, how do you think I can determine if the full model is significantly better than the reduced?
Call:
glm(formula = Acceptance ~ GPA, family = binomial, data = MedGPA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7805 -0.8522 0.4407 0.7819 2.0967
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -19.207 5.629 -3.412 0.000644 ***
GPA 5.454 1.579 3.454 0.000553 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 75.791 on 54 degrees of freedom
Residual deviance: 56.839 on 53 degrees of freedom
AIC: 60.839
Number of Fisher Scoring iterations: 4
Call:
glm(formula = Acceptance ~ GPA + MCAT, family = binomial, data = MedGPA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7132 -0.8132 0.3136 0.7663 1.9933
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -22.3727 6.4538 -3.467 0.000527 ***
GPA 4.6765 1.6416 2.849 0.004389 **
MCAT 0.1645 0.1032 1.595 0.110786
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 75.791 on 54 degrees of freedom
Residual deviance: 54.014 on 52 degrees of freedom
AIC: 60.014
Number of Fisher Scoring iterations: 5
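One option is a drop-in-deviance (nested likelihood ratio) test; a sketch, with `reduced` and `full` as assumed names for the two fits above:

```r
# Drop-in-deviance test comparing the nested models shown above
reduced <- glm(Acceptance ~ GPA,        family = binomial, data = MedGPA)
full    <- glm(Acceptance ~ GPA + MCAT, family = binomial, data = MedGPA)

anova(reduced, full, test = "Chisq")
# G = 56.839 - 54.014 = 2.825 on 1 df, compared to a chi-square distribution
```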
Call:
glm(formula = Acceptance ~ GPA + MCAT + Apps, family = binomial,
data = MedGPA)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.6949 -0.8309 0.2900 0.7926 1.8238
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -23.68942 7.02387 -3.373 0.000744 ***
GPA 4.86062 1.69441 2.869 0.004123 **
MCAT 0.17287 0.10537 1.641 0.100867
Apps 0.04379 0.07617 0.575 0.565412
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 75.791 on 54 degrees of freedom
Residual deviance: 53.682 on 51 degrees of freedom
AIC: 61.682
Number of Fisher Scoring iterations: 5
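The same comparison extends to the three-predictor model above; a sketch, reusing `full` and an assumed name `bigger`:

```r
# Does adding Apps improve on the GPA + MCAT model?
bigger <- glm(Acceptance ~ GPA + MCAT + Apps, family = binomial, data = MedGPA)
anova(full, bigger, test = "Chisq")
# G = 54.014 - 53.682 = 0.332 on 1 df, a very small drop in deviance
```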
How do you interpret these \(\beta\) coefficients?
The coefficient for \(x\) is \(\hat\beta\) (95% CI: \(LB_{\hat\beta}, UB_{\hat\beta}\)). A one-unit increase in \(x\) yields an expected change of \(\hat\beta\) in the log odds of \(y\), holding all other variables constant.
The odds ratio for \(x\) is \(e^{\hat\beta}\) (95% CI: \(e^{LB_{\hat\beta}}, e^{UB_{\hat\beta}}\)). A one-unit increase in \(x\) yields an \(e^{\hat\beta}\)-fold expected change in the odds of \(y\), holding all other variables constant.
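For example, a sketch of both statements for the GPA coefficient in the GPA + MCAT model above (`full` from before), holding MCAT constant:

```r
coef(full)["GPA"]             # about 4.68: expected change in the log odds per one-point increase in GPA
exp(coef(full)["GPA"])        # about 107: the multiplicative change in the odds per one-point increase in GPA
exp(confint(full))["GPA", ]   # 95% CI on the odds ratio scale
```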
| | Ordinary regression | Logistic regression |
|---|---|---|
| test or interval for \(\beta\) | \(t = \frac{\hat\beta}{SE_{\hat\beta}}\) | \(z = \frac{\hat\beta}{SE_{\hat\beta}}\) |
| | t-distribution | z-distribution |
| test for nested models | \(F = \frac{\Delta SSModel / p}{SSE_{full} / (n - k - 1)}\) | \(G = \Delta(-2\log\mathcal{L})\) |
| | F-distribution | \(\chi^2\)-distribution |
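A quick numeric check of the logistic regression column, using the GPA row of the summary output above:

```r
# Wald z statistic and two-sided p-value for GPA
z_gpa <- 4.6765 / 1.6416        # estimate / SE, about 2.849
p_gpa <- 2 * pnorm(-abs(z_gpa)) # about 0.0044, matching the summary output
c(z = z_gpa, p = p_gpa)
```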