Lucy D’Agostino McGowan
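library(dplyr)

# Summarise each id's measurements, flag suspect records, then swap
# length and width wherever they appear to have been recorded in reverse
full_magnolia_data <- full_magnolia_data %>%
  group_by(id) %>%
  summarise(max_length = max(leaf_length),
            min_length = min(leaf_length),
            mean_length = mean(leaf_length),
            mean_width = mean(leaf_width)) %>%
  # Note: the second `inches` definition overwrites the first,
  # so only the min_length < 2 rule takes effect
  mutate(inches = ifelse(max_length < 10, 1, 0),
         inches = ifelse(min_length < 2, 1, 0),
         flipped = ifelse(mean_length < mean_width, 1, 0)) %>%
  left_join(full_magnolia_data, by = "id") %>%
  select(-max_length, -mean_length, -mean_width) %>%
  # Swap through a temporary column so the original length is not lost
  mutate(leaf_length2 = ifelse(flipped, leaf_width, leaf_length),
         leaf_width = ifelse(flipped, leaf_length, leaf_width),
         leaf_length = leaf_length2)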
What if I want to know the relationship between leaf length and leaf width of the magnolias on the Mag Quad?
How can we quantify how much we’d expect the slope to differ from one random sample to another?
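One way to build intuition for this question is simulation. The sketch below invents a population (the intercept of 2, the slope of 1.3, and the noise level are assumptions for illustration, not estimates from the magnolia data), draws many random samples of 30 leaves, and records the fitted slope each time. The spread of those slopes is what the standard error is meant to estimate.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical population of leaves (all numbers here are made up)
n_pop <- 5000
population <- data.frame(leaf_width = runif(n_pop, min = 3, max = 8))
population$leaf_length <- 2 + 1.3 * population$leaf_width + rnorm(n_pop, sd = 3)

# Draw 1000 random samples of 30 leaves and refit the line each time
slopes <- replicate(1000, {
  sampled <- population[sample(n_pop, size = 30), ]
  coef(lm(leaf_length ~ leaf_width, data = sampled))[["leaf_width"]]
})

# Sample-to-sample variability of the slope
sd(slopes)
```

In practice we only have one sample, so the standard error reported by `summary()` stands in for this spread.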
Call:
lm(formula = leaf_length ~ leaf_width, data = magnolia_data)

Residuals:
    Min      1Q  Median      3Q     Max
-4.4424 -1.7942 -0.9585  1.0470  9.0647

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.8369     1.4507   1.266    0.216
leaf_width    1.2756     0.2645   4.822 4.51e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.243 on 28 degrees of freedom
Multiple R-squared:  0.4537, Adjusted R-squared:  0.4342
F-statistic: 23.26 on 1 and 28 DF,  p-value: 4.507e-05
We need a test statistic that incorporates \(\hat{\beta}_1\) and the standard error
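A minimal sketch of that statistic, using the estimate and standard error printed for `leaf_width` in the `summary()` output. The variable names are illustrative, and the null value of 0 is the usual default rather than something the slide states at this point.

```r
# Values copied from the leaf_width row of the summary() output
beta1_hat <- 1.2756  # estimated slope
se_beta1 <- 0.2645   # standard error of the slope

# t statistic: how many standard errors the estimate is from the null value 0
t_stat <- (beta1_hat - 0) / se_beta1
t_stat
```

The result agrees with the `t value` column up to rounding of the inputs.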
How do we interpret this?
How do we know what values of this statistic are worth paying attention to?
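One common answer is to compare the statistic with a t distribution on the residual degrees of freedom (28 here). The sketch below assumes a two-sided test at the 5% level, which is a conventional choice and not one stated on the slide.

```r
resid_df <- 28   # residual degrees of freedom from the output (n - 2)
t_stat <- 4.822  # observed t value for leaf_width

# Cutoff beyond which only 5% of t values fall when the null is true
# (2.5% in each tail; the 5% level is an assumption)
t_crit <- qt(0.975, df = resid_df)
t_crit

# Is the observed statistic more extreme than the cutoff?
abs(t_stat) > t_crit
```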
How do we get a confidence interval for \(\beta_1\)? What function can we use in R?
How do we interpret this value?
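One option in R is `confint()`, which works on a fitted `lm` object. Because `magnolia_data` is not included here, the sketch below uses the built-in `cars` data purely as a stand-in to show that `confint()` matches the by-hand formula, then applies the same formula to the values printed for `leaf_width`. The 95% level is an assumption.

```r
# Stand-in data: `cars` ships with R and is NOT the magnolia data
fit <- lm(dist ~ speed, data = cars)

# 95% interval for the slope, straight from the fitted model
by_function <- confint(fit, "speed", level = 0.95)

# The same interval by hand: estimate +/- t* x standard error
coefs <- coef(summary(fit))
est <- coefs["speed", "Estimate"]
se <- coefs["speed", "Std. Error"]
t_star <- qt(0.975, df.residual(fit))
by_hand <- est + c(-1, 1) * t_star * se

# Applying the by-hand formula to the values printed for leaf_width
magnolia_ci <- 1.2756 + c(-1, 1) * qt(0.975, 28) * 0.2645
magnolia_ci
```

For the magnolia output this gives an interval of roughly 0.73 to 1.82 for the slope.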
Application Exercise
appex-08.qmd
leaf_length and leaf_width in your data
05:00
Is \(\hat\beta_1\) different from 0?
Is \(\beta_1\) different from 0? (notice the lack of the hat!)
p-value: the probability of observing a statistic as extreme or more extreme than the observed test statistic, given that the null hypothesis is true
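This definition can be turned into a short calculation. The sketch below assumes a two-sided test and uses the t value and degrees of freedom printed in the `summary()` output; the variable names are illustrative.

```r
t_stat <- 4.822  # observed t value for leaf_width
resid_df <- 28   # residual degrees of freedom

# Area in both tails of the t distribution beyond the observed statistic
p_value <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)
p_value
```

This should land close to the `Pr(>|t|)` entry for `leaf_width`, with small differences due to rounding of the t value.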
What is the p-value? What is the interpretation?
Do we reject the null hypothesis?
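The decision comes down to comparing the p-value with a significance level chosen in advance. The sketch below assumes \(\alpha = 0.05\), which the slides do not state.

```r
alpha <- 0.05         # assumed significance level; not given in the slides
p_value <- 4.507e-05  # p-value for leaf_width from the summary() output

# Reject the null hypothesis when the p-value falls below alpha
reject_null <- p_value < alpha
reject_null
```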
Application Exercise
appex-08.qmd
leaf_length and leaf_width with your data
02:00