Lucy D’Agostino McGowan
Use the glimpse() function to see all of your variables and their types
Rows: 30
Columns: 3
$ Price <dbl> 69.4, 56.9, 49.9, 47.4, 42.9, 36.9, 83.0, 72.9, 69.9, 67.9, 66…
$ Age <int> 3, 3, 2, 4, 4, 6, 0, 0, 2, 0, 2, 2, 4, 3, 10, 11, 4, 4, 10, 3,…
$ Mileage <dbl> 21.50, 43.00, 19.90, 36.00, 44.00, 49.80, 1.30, 0.67, 13.40, 9…
fct: “factor”, a type of categorical variable
Use the glimpse() function to see all of your variables and their types
Rows: 87
Columns: 5
$ name <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Or…
$ height <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 182, 188, 180, 2…
$ mass <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, 32.0, 84.0, 77.…
$ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, grey", "brown", N…
$ skin_color <chr> "fair", "gold", "white, blue", "white", "light", "light", "…
chr: “character”, a type of categorical variable
An indicator variable uses two values, usually 0 and 1, to indicate whether a data case does (1) or does not (0) belong to a specific category.
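As a minimal sketch of this definition, the code below builds an indicator by hand in base R. The toy data frame `people` and the column name `is_blond` are invented for illustration and do not come from the slides.

```r
# Toy data, invented for illustration only
people <- data.frame(
  name = c("A", "B", "C", "D"),
  hair_color = c("blond", "brown", "blond", "none")
)

# 1 if the case belongs to the category "blond", 0 otherwise
people$is_blond <- as.integer(people$hair_color == "blond")

people$is_blond
```

The comparison produces TRUE/FALSE, and `as.integer()` converts those to 1/0. A missing `hair_color` would give a missing indicator rather than 0.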
What does this line of code do?
What if I wanted to model the relationship between TotalPrice and Color?
Why is ColorJ NA?
Call:
lm(formula = TotalPrice ~ ColorD + ColorE + ColorF + ColorG +
    ColorH + ColorI + ColorJ, data = Diamonds)

Coefficients:
(Intercept)       ColorD       ColorE       ColorF       ColorG       ColorH
       1936         3632         2423         7224         7623         6732
     ColorI       ColorJ
       5704           NA
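When a categorical variable has k categories, always include k-1 indicator variables. With all k indicators in the model, they add up to 1 for every observation, which duplicates the intercept, so the last coefficient (here ColorJ) cannot be estimated and is reported as NA.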
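The k-1 rule can be checked on a small example. This is only a sketch: the data frame `toy` and its prices are made up, and it uses three colors rather than the seven in Diamonds.

```r
# Toy data, invented for illustration: 3 colors, 2 diamonds each
toy <- data.frame(
  Color = c("D", "D", "E", "E", "J", "J"),
  TotalPrice = c(10, 12, 7, 9, 3, 5)
)

# One indicator per category (k = 3)
toy$ColorD <- as.integer(toy$Color == "D")
toy$ColorE <- as.integer(toy$Color == "E")
toy$ColorJ <- as.integer(toy$Color == "J")

# All k indicators: they sum to 1 in every row, duplicating the intercept
fit_all <- lm(TotalPrice ~ ColorD + ColorE + ColorJ, data = toy)
coef(fit_all)  # the ColorJ coefficient is NA

# k - 1 indicators: the omitted category (J) becomes the reference
fit_k1 <- lm(TotalPrice ~ ColorD + ColorE, data = toy)
coef(fit_k1)
```

In the second fit the intercept is the mean price for color J, and each remaining coefficient is the difference between that color's mean price and the mean for J.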
What is the reference category?
Call:
lm(formula = TotalPrice ~ ColorD + ColorE + ColorF + ColorG +
    ColorH + ColorI, data = Diamonds)

Coefficients:
(Intercept)       ColorD       ColorE       ColorF       ColorG       ColorH
       1936         3632         2423         7224         7623         6732
     ColorI
       5704
Color D, compared to color J, increases the expected total price by 3632.
Color E, compared to color J, increases the expected total price by 2423.
What about color F?
Call:
lm(formula = TotalPrice ~ Color, data = Diamonds)

Coefficients:
(Intercept)       ColorE       ColorF       ColorG       ColorH       ColorI
       5569        -1209         3592         3990         3100         2071
     ColorJ
      -3632
What is the reference category?
How do you interpret the coefficient for color E now?
Call:
lm(formula = TotalPrice ~ Color, data = Diamonds)

Coefficients:
(Intercept)       ColorD       ColorE       ColorF       ColorG       ColorH
       1936         3632         2423         7224         7623         6732
     ColorI
       5704
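The output above uses the same formula as before but a different reference category. The slides do not show how the reference was changed; one option in base R is `relevel()` (`forcats::fct_relevel()` is another). The sketch below uses a made-up three-color data frame `toy`, not the Diamonds data.

```r
# Toy data, invented for illustration
toy <- data.frame(
  Color = factor(c("D", "D", "E", "E", "J", "J")),
  TotalPrice = c(10, 12, 7, 9, 3, 5)
)

# By default the first level (alphabetically "D") is the reference
levels(toy$Color)
coef(lm(TotalPrice ~ Color, data = toy))

# Make "J" the reference category instead
toy$Color <- relevel(toy$Color, ref = "J")
levels(toy$Color)

fit_j <- lm(TotalPrice ~ Color, data = toy)
coef(fit_j)
```

After releveling, the intercept is the mean price for color J and the other coefficients are differences from J, so only the baseline of the comparison changes, not the fitted values.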
What is the reference category?
Call:
lm(formula = Pulse ~ Emergency, data = ICU)

Coefficients:
(Intercept)    Emergency
      91.11        10.63
Application Exercise
[…] Diamonds dataset?
[…] Clarity variable in the Diamonds data?
Fit a model with TotalPrice as the outcome and Clarity as the explanatory variable.
Change the reference category to SI1 and refit the model.
Add Depth to your model. How do you interpret the coefficient for this parameter?