Lucy D’Agostino McGowan
Call:
lm(formula = mass ~ height, data = starwars_nojabba)
Coefficients:
(Intercept) height
-32.5408 0.6214
How did we decide on this line?
library(tidyverse)  # loads dplyr (mutate, %>%) and ggplot2

# Add each character's fitted value from the least-squares line
starwars_nojabba <- starwars_nojabba %>%
  mutate(fitted = fitted(lm(mass ~ height, data = starwars_nojabba)))

# Draw each residual as a vertical segment from the point to the line
ggplot(starwars_nojabba, aes(x = height, y = mass)) +
  geom_point(color = "#86a293") +
  geom_segment(
    aes(x = height, y = mass, xend = height, yend = fitted),
    color = "blue"
  ) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    formula = y ~ x,
    color = "#86a293"
  ) +
  labs(
    title = "The relationship between mass and height for Star Wars characters",
    caption = "Data from SWAPI (swapi.dev)"
  )
# Draw each squared residual as a square: the side length is the
# residual (mass - fitted), so the area is the residual squared
ggplot(starwars_nojabba, aes(x = height, y = mass)) +
  geom_rect(
    aes(
      xmin = height,
      xmax = height + mass - fitted,
      ymin = mass,
      ymax = fitted
    ),
    fill = "blue",
    color = "blue",
    alpha = 0.2
  ) +
  geom_smooth(
    method = "lm",
    se = FALSE,
    formula = y ~ x,
    color = "#86a293"
  ) +
  geom_point(color = "#86a293") +
  coord_fixed() +  # equal axis scales so the rectangles render as squares
  labs(
    title = "The relationship between mass and height for Star Wars characters",
    caption = "Data from SWAPI (swapi.dev)"
  )
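The line lm() chooses is the least-squares line: the intercept and slope that make the total area of those squares, the sum of squared residuals, as small as possible. With one predictor the minimizer has a closed form, slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks this on a hypothetical toy data set, assumed here to be just the first 10 (mass, height) pairs printed later in these slides rather than the full data, so its coefficients will differ from the -32.54 and 0.62 above. The helper sse_for_line() is a made-up name for illustration.

```r
# Hypothetical toy data: the first 10 (mass, height) pairs printed in these
# slides, NOT the full 58-row starwars_nojabba data set
toy <- data.frame(
  mass   = c(77, 75, 32, 136, 49, 120, 75, 32, 84, 77),
  height = c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)
)

# Hypothetical helper: sum of squared residuals for the line b0 + b1 * x
sse_for_line <- function(b0, b1, x, y) {
  sum((y - (b0 + b1 * x))^2)
}

# Closed-form least-squares estimates for simple linear regression
b1_hat <- cov(toy$height, toy$mass) / var(toy$height)
b0_hat <- mean(toy$mass) - b1_hat * mean(toy$height)

# lm() minimizes the same criterion, so it should agree
fit <- lm(mass ~ height, data = toy)
print(coef(fit))
print(c(b0_hat, b1_hat))

# Moving either coefficient off the least-squares solution raises the SSE
best <- sse_for_line(b0_hat, b1_hat, toy$height, toy$mass)
print(best)
print(sse_for_line(b0_hat + 1, b1_hat, toy$height, toy$mass))
print(sse_for_line(b0_hat, b1_hat + 0.01, toy$height, toy$mass))
```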
\[\Large \sum(y-\hat{y})^2\]
\[\Large \sum_{i=1}^n(y_i - \hat{y}_i)^2\]
\[\Large e_i = y_i - \hat{y}_i\]
\[\Large e_1 = y_1 - \hat{y}_1\]
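As a worked instance of the last formula, take the first character in the data printed below (height 172, mass 77) together with the estimates from the lm summary later in these slides:
\[\Large \hat{y}_1 = -32.54076 + 0.62136 \times 172 \approx 74.33\]
\[\Large e_1 = y_1 - \hat{y}_1 \approx 77 - 74.33 = 2.67\]
This matches the y_hat (74.3) and residual (2.67) in the first row of that table.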
Application Exercise
… x and y. Drag the blue points to change the line.
03:00
# A tibble: 58 × 4
mass height y_hat residual
<dbl> <int> <dbl> <dbl>
1 77 172 74.3 2.67
2 75 167 71.2 3.77
3 32 96 27.1 4.89
4 136 202 93.0 43.0
5 49 150 60.7 -11.7
6 120 178 78.1 41.9
7 75 165 70.0 5.02
8 32 97 27.7 4.27
9 84 183 81.2 2.83
10 77 182 80.5 -3.55
# … with 48 more rows
How could I add the residual squared to this data frame?
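One way to answer this with dplyr is a mutate() that squares the existing residual column. To stay self-contained, the sketch rebuilds a hypothetical toy version of the data from the 10 rows printed above and recomputes y_hat and residual from the lm() estimates (-32.54076 and 0.62136); the object name toy is an assumption, not the slides' own code.

```r
library(dplyr)

# Hypothetical toy data: the 10 rows printed above, not all 58 characters
toy <- data.frame(
  mass   = c(77, 75, 32, 136, 49, 120, 75, 32, 84, 77),
  height = c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)
) %>%
  mutate(
    y_hat    = -32.54076 + 0.62136 * height,  # fitted value from the lm() estimates
    residual = mass - y_hat                   # e_i = y_i - y_hat_i
  )

# Add the squared residual as a new column
toy <- toy %>%
  mutate(residual_2 = residual^2)

print(toy)
```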
# A tibble: 58 × 4
mass height y_hat residual_2
<dbl> <int> <dbl> <dbl>
1 77 172 74.3 7.11
2 75 167 71.2 14.2
3 32 96 27.1 23.9
4 136 202 93.0 1851.
5 49 150 60.7 136.
6 120 178 78.1 1759.
7 75 165 70.0 25.2
8 32 97 27.7 18.2
9 84 183 81.2 8.02
10 77 182 80.5 12.6
# … with 48 more rows
How can I summarize this dataset to calculate the sum of the squared residuals?
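summarise() collapses the data frame to a single row, and sum() adds up the squared residuals. This is again a sketch on the hypothetical 10-row toy data, with residuals computed from the full-data lm() estimates, so its total covers only part of the 20509 that all 58 characters produce.

```r
library(dplyr)

# Hypothetical toy data: the 10 rows printed earlier, with y_hat, residual,
# and residual_2 rebuilt from the lm() estimates shown in these slides
toy <- data.frame(
  mass   = c(77, 75, 32, 136, 49, 120, 75, 32, 84, 77),
  height = c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)
) %>%
  mutate(
    y_hat      = -32.54076 + 0.62136 * height,
    residual   = mass - y_hat,
    residual_2 = residual^2
  )

# Collapse to one row holding the sum of squared residuals (SSE)
sse_tbl <- toy %>%
  summarise(sse = sum(residual_2))

print(sse_tbl)
```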
How can I add the total sample size?
How can I add the degrees of freedom \((n-p)\)?
How can I add the residual standard error \(\hat{\sigma}_\varepsilon= \sqrt{\frac{\textrm{SSE}}{df}}\)?
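The sample size, degrees of freedom, and \(\hat{\sigma}_\varepsilon\) can all go in one summarise() call, because later arguments may refer to columns created earlier in the same call: n() counts the rows, the degrees of freedom subtract p = 2 estimated coefficients (intercept and slope), and sigma applies the formula above. On the hypothetical 10-row toy data this gives n = 10 and df = 8 rather than the 58 and 56 in the slides' output.

```r
library(dplyr)

# Hypothetical toy data: the 10 rows printed earlier in these slides, with
# residual_2 rebuilt from the lm() estimates (-32.54076, 0.62136)
toy <- data.frame(
  mass   = c(77, 75, 32, 136, 49, 120, 75, 32, 84, 77),
  height = c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)
) %>%
  mutate(
    y_hat      = -32.54076 + 0.62136 * height,
    residual_2 = (mass - y_hat)^2
  )

fit_stats <- toy %>%
  summarise(
    sse   = sum(residual_2),  # sum of squared residuals
    n     = n(),              # sample size
    df    = n - 2,            # n - p, with p = 2 (intercept and slope)
    sigma = sqrt(sse / df)    # residual standard error
  )

print(fit_stats)
```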
# A tibble: 1 × 4
sse n df sigma
<dbl> <int> <dbl> <dbl>
1 20509. 58 56 19.1
lm output
Call:
lm(formula = mass ~ height, data = starwars_nojabba)
Residuals:
Min 1Q Median 3Q Max
-39.382 -8.212 0.211 3.846 57.327
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -32.54076 12.56053 -2.591 0.0122 *
height 0.62136 0.07073 8.785 4.02e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19.14 on 56 degrees of freedom
Multiple R-squared: 0.5795, Adjusted R-squared: 0.572
F-statistic: 77.18 on 1 and 56 DF, p-value: 4.018e-12
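The "Residual standard error: 19.14 on 56 degrees of freedom" line is the same \(\hat{\sigma}_\varepsilon\) computed by hand in the summary table above, since \(\sqrt{20509 / 56} \approx 19.14\). The sketch below checks that correspondence with base R accessors (residuals(), df.residual(), and sigma() from the stats package) on a hypothetical 10-row toy data set rather than the full data.

```r
# Hypothetical toy data: 10 of the 58 characters, not the full data set
toy <- data.frame(
  mass   = c(77, 75, 32, 136, 49, 120, 75, 32, 84, 77),
  height = c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)
)

fit <- lm(mass ~ height, data = toy)

sse      <- sum(residuals(fit)^2)  # sum of squared residuals
resid_df <- df.residual(fit)       # n - p = 10 - 2
sigma_by_hand <- sqrt(sse / resid_df)

print(sigma_by_hand)
print(summary(fit)$sigma)  # value reported as "Residual standard error"
print(sigma(fit))          # stats accessor for the same quantity
```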
Application Exercise
Learn about the PorschePrice data by running ?PorschePrice in your Console
Fit a linear model predicting Price from Mileage
Add a variable y_hat to the PorschePrice dataset with the predicted y values
Add a variable residual to the PorschePrice dataset with the residuals
07:00