Lucy D’Agostino McGowan
How can I visualize a single continuous variable?
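library(ggplot2)

# Fix the seed so the jittered points land in the same place on every render
set.seed(1)

# Box plot of the full data, with the points in magnolia_data overlaid in blue
ggplot(full_magnolia_data,
       aes(x = leaf_length, y = 1)) +
  geom_boxplot() +
  geom_jitter() +
  geom_jitter(data = magnolia_data, color = "cornflower blue", size = 3) +
  labs(x = "Leaf length (cm)") +
  # The y position carries no information, so hide its axis
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())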
How can I calculate the average leaf length of the magnolias in my sample in R?
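One way to answer this, sketched with made-up numbers: the data frame `example_data` and its values below are hypothetical stand-ins, not the class magnolia data. The sketch computes the sample mean directly and then recovers the same number as the intercept of an intercept-only linear model, which is the form the later slides use.

```r
# Hypothetical leaf lengths in cm, made up for illustration only
example_data <- data.frame(
  leaf_length = c(12.5, 18.2, 15.0, 21.3, 16.8, 14.1, 19.7, 17.4)
)

# Sample mean computed directly
sample_mean <- mean(example_data$leaf_length)

# The same estimate from an intercept-only linear model
fit <- lm(leaf_length ~ 1, data = example_data)
model_mean <- unname(coef(fit)["(Intercept)"])

sample_mean
model_mean
```

With only an intercept in the formula, the least squares estimate is the sample mean, so the two values agree.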
What if I want to know the average leaf length of the magnolias on the Mag Quad?
How can we quantify how much we’d expect the mean to differ from one random sample to another?
Call:
lm(formula = leaf_length ~ 1, data = magnolia_data)
Residuals:
Min 1Q Median 3Q Max
-9.509 -3.809 -1.029 2.621 13.071
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.129 1.031 16.62 2.32e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.645 on 29 degrees of freedom
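As a rough check on where the Std. Error column comes from, the sketch below plugs in the rounded values printed in the output above. It assumes the usual formula for the standard error of a mean (standard deviation divided by the square root of the sample size) and that, for an intercept-only model, the residual standard error is the sample standard deviation.

```r
# Values copied (rounded) from the summary output above
residual_se <- 5.645   # residual standard error
df <- 29               # residual degrees of freedom

# An intercept-only model uses one degree of freedom, so n = df + 1
n <- df + 1

# Standard error of the mean: s / sqrt(n)
se_mean <- residual_se / sqrt(n)
round(se_mean, 3)
```

The result rounds to the 1.031 reported for the intercept.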
Application Exercise
Replace INSERT YOUR GOOGLE SPREADSHEET URL HERE with the Google Spreadsheet URL with your magnolia data.
Use the summary function on the linear model you fit.
If we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter (the average leaf length on the Mag Quad) to fall within the interval estimates 95% of the time.
\[\bar{x} \pm t^{*} \times SE_{\bar{x}}\]
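The formula can be worked through with the rounded estimate and standard error from the summary output. This is a sketch under the assumption of a 95% confidence level and 29 degrees of freedom; the variable names are only for illustration.

```r
# Rounded values from the summary output
x_bar <- 17.129   # estimated mean leaf length (cm)
se <- 1.031       # standard error of the mean
df <- 29          # residual degrees of freedom

# Critical value t* for a 95% interval: 2.5% in the upper tail
t_star <- qt(0.025, df = df, lower.tail = FALSE)

# x_bar plus or minus t* times the standard error
lower <- x_bar - t_star * se
upper <- x_bar + t_star * se
c(lower = lower, upper = upper)
```

With these inputs the interval runs from roughly 15.0 cm to 19.2 cm.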
Why 0.025?
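A short sketch of the arithmetic behind that number, assuming a 95% confidence level and 29 degrees of freedom: the 5% that falls outside the interval is split evenly between the two tails of the t distribution.

```r
conf_level <- 0.95

# Total area left outside the interval
alpha <- 1 - conf_level

# Half of it goes in each tail
tail_area <- alpha / 2
tail_area

# Check: the area between -t* and t* is the confidence level
t_star <- qt(tail_area, df = 29, lower.tail = FALSE)
middle_area <- pt(t_star, df = 29) - pt(-t_star, df = 29)
middle_area
```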
Why lower.tail = FALSE?
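To see what the argument changes, the sketch below compares three calls to `qt` with 29 degrees of freedom (an assumption taken from the model output). By default `qt` returns the cutoff with the given area below it; `lower.tail = FALSE` returns the cutoff with that area above it, which is the positive value the interval formula needs.

```r
# Cutoff with 2.5% of the distribution ABOVE it (positive)
upper_cut <- qt(0.025, df = 29, lower.tail = FALSE)

# Equivalent: cutoff with 97.5% of the distribution below it
same_cut <- qt(0.975, df = 29)

# Default lower.tail = TRUE: cutoff with 2.5% BELOW it (negative)
lower_cut <- qt(0.025, df = 29)

c(upper_cut = upper_cut, same_cut = same_cut, lower_cut = lower_cut)
```

Because the t distribution is symmetric around zero, the lower cutoff is the negative of the upper one.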
[1] 16.7
[1] 13
If we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter (the mean) to fall within the interval estimates 95% of the time.
Application Exercise
appex-08.qmd
confint function
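As a sketch of what `confint` does for an intercept-only model, the example below fits the model to made-up data (`example_data` and its values are hypothetical, not the class magnolia data) and compares the result with the interval built by hand from the estimate, its standard error, and a t critical value.

```r
# Hypothetical leaf lengths in cm, made up for illustration only
example_data <- data.frame(
  leaf_length = c(12.5, 18.2, 15.0, 21.3, 16.8, 14.1, 19.7, 17.4)
)

fit <- lm(leaf_length ~ 1, data = example_data)

# 95% confidence interval for the mean, straight from confint
ci_confint <- confint(fit, level = 0.95)

# The same interval assembled by hand
est <- coef(summary(fit))["(Intercept)", "Estimate"]
se <- coef(summary(fit))["(Intercept)", "Std. Error"]
t_star <- qt(0.025, df = df.residual(fit), lower.tail = FALSE)
ci_manual <- c(est - t_star * se, est + t_star * se)

ci_confint
ci_manual
```

Both approaches give the same lower and upper bounds; `confint` simply saves the manual steps.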