Lucy D’Agostino McGowan
Application Exercise
1. Open the file `data-exploration.qmd`.
2. Read the data set into an object called `dat`.
3. Examine the data: how many observations are there? How many variables? Add this description to your file.
4. Examine the data dictionary by opening the file called `data-dictionary.csv`.
5. Find the outcome variable (the retention rate) and the variable of interest specific to your group. Create a plot to visualize the relationship between these variables. Add a description to your file.
6. Does the data look like it needs a transformation? If so, apply one and examine the plot again. Describe this in your file.
Time: 8 minutes
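A minimal base-R sketch of counting observations and variables, plotting the relationship, and trying a transformation. The course data file and its column names are not given here, so this uses the built-in `mtcars` data as a stand-in (an assumption), with `hp` playing the variable of interest and `mpg` the outcome. Substitute your own `dat` and column names. The log transformation is only one common choice for a right-skewed variable.

```r
# Stand-in data (assumption): the built-in mtcars set replaces the course file
dat <- mtcars

# Count observations (rows) and variables (columns)
n_obs <- nrow(dat)
n_vars <- ncol(dat)
cat("Observations:", n_obs, "| Variables:", n_vars, "\n")

# Outcome on the y-axis, variable of interest on the x-axis
plot(mpg ~ hp, data = dat,
     xlab = "hp (variable of interest)", ylab = "mpg (outcome)")

# Try a log transformation of the variable of interest and plot again
dat$log_hp <- log(dat$hp)
plot(mpg ~ log_hp, data = dat,
     xlab = "log(hp)", ylab = "mpg (outcome)")
```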
We can build a table to examine the distribution of our variables.
Characteristic | N = 32¹ |
---|---|
mpg | 19.2 (15.4, 22.8) |
cyl | |
4 | 11 (34%) |
6 | 7 (22%) |
8 | 14 (44%) |
disp | 196 (121, 326) |
hp | 123 (96, 180) |
drat | 3.70 (3.08, 3.92) |
wt | 3.33 (2.58, 3.61) |
qsec | 17.71 (16.89, 18.90) |
vs | 14 (44%) |
am | 13 (41%) |
gear | |
3 | 15 (47%) |
4 | 12 (38%) |
5 | 5 (16%) |
carb | |
1 | 7 (22%) |
2 | 10 (31%) |
3 | 3 (9.4%) |
4 | 10 (31%) |
6 | 1 (3.1%) |
8 | 1 (3.1%) |

¹ Median (IQR); n (%)
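The table looks like the default output of a summary-table function such as `gtsummary::tbl_summary()` (an assumption; the code that produced it is not shown). Its values match R's built-in `mtcars` data, so a base-R sketch can recompute two rows to show where the numbers come from. Packages use different percentile methods, so the last digit of an IQR can vary.

```r
dat <- mtcars

# Continuous variable: median (25th percentile, 75th percentile)
mpg_med <- median(dat$mpg)
mpg_q <- quantile(dat$mpg, probs = c(0.25, 0.75))
cat(sprintf("mpg: %.1f (%.1f, %.1f)\n", mpg_med, mpg_q[[1]], mpg_q[[2]]))

# Categorical variable: n (%) for each level
cyl_n <- table(dat$cyl)
cyl_pct <- round(100 * prop.table(cyl_n))
print(cbind(n = cyl_n, pct = cyl_pct))
```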
Application Exercise
In the `table-one` chunk, examine a table of your variables. Move your variable of interest to the top of the list so that it is the first one rendered in the table.
Time: 3 minutes
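How the `table-one` chunk lists its variables is not shown here. Assuming it uses a character vector of column names (hypothetically called `table_vars` below, with `mtcars` names as stand-ins), this is a minimal sketch of moving the variable of interest to the front while keeping the others in their original order.

```r
# Hypothetical list of table variables (mtcars names as a stand-in)
table_vars <- c("mpg", "cyl", "disp", "hp", "wt")

# Hypothetical variable of interest: put it first, keep the rest in order
var_of_interest <- "hp"
table_vars <- c(var_of_interest, setdiff(table_vars, var_of_interest))
print(table_vars)

# Subsetting the data in this order makes the variable of interest the first column
dat <- mtcars[, table_vars]
```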
The `ggdag` package can help us display the causal assumptions you drew last week. There are three steps:

1. Use the `dagify` function to specify the relationships between variables
2. Use the `ggdag` function to plot them
3. Use the `ggdag_adjustment_set` function to determine what you need to add to your final model

```r
library(ggdag)

# Each formula lists a node's direct causes: child ~ parent1 + parent2
dag <- dagify(
  exposure ~ variable1 + variable2 + variable3 + variable4,
  outcome ~ exposure + variable1,
  variable1 ~ variable3,
  variable2 ~ variable3,
  exposure = "exposure",  # the exposure (treatment) node
  outcome = "outcome",    # the outcome node
  latent = "variable3",   # unmeasured, so it cannot be adjusted for
  labels = c(variable1 = "Variable 1",
             variable2 = "Variable 2",
             variable3 = "Variable 3",
             variable4 = "Variable 4",
             exposure = "Exposure",
             outcome = "Outcome")
)
```
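A sketch of the plotting and adjustment-set steps, reusing the structure of the `dagify()` example (labels omitted so the block runs on its own). `ggdag()` and `ggdag_adjustment_set()` come from ggdag; `dagitty::adjustmentSets()` comes from the dagitty package, which ggdag depends on, and prints the same sets as text.

```r
library(ggdag)

# Same structure as the dagify() example, without the labels
dag <- dagify(
  exposure ~ variable1 + variable2 + variable3 + variable4,
  outcome ~ exposure + variable1,
  variable1 ~ variable3,
  variable2 ~ variable3,
  exposure = "exposure",
  outcome = "outcome",
  latent = "variable3"
)

# Plot the causal diagram
p_dag <- ggdag(dag)
print(p_dag)

# Highlight the adjustment set(s) on the diagram
p_adj <- ggdag_adjustment_set(dag)
print(p_adj)

# The minimal adjustment set(s) as text
sets <- dagitty::adjustmentSets(dag)
print(sets)
```

For this DAG the minimal adjustment set should be `{variable1}`: every backdoor path reaches the outcome through `variable1`, and `variable3` cannot be used because it is marked latent.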
🎉
Application Exercise
1. In the `data-dictionary.csv` file, map the available variables to the names in the equations you developed for homework 2.
2. In `causal-assumptions.qmd`, add your equations to the `dagify` code chunk after deleting the `# add your equations here` comment. Make sure to separate each equation with a comma.
3. Use the `ggdag` chunk to create the causal diagram.
4. Use the `adjustment_set` chunk to see what variables you need to adjust for.

Time: 20 minutes
Loose ends
Git: use the Git panel on the top right to pull in new data. Raise your hand if this doesn’t work.
`data-exploration.qmd`: Render the document. Do you see a figure with your variable on the x-axis and the outcome on the y-axis? If not, create that figure and re-render the document.
`data-exploration.qmd`: …? If not, fill it in.
`causal-assumptions.qmd`: Render the document. Do you see two figures, one with the causal diagram and one showing the adjustment set? If not, make sure that you have set `eval: true` in all of the chunks.
Note: if you are getting an error, raise your hand so I can come help out.
Time: 5 minutes
Application Exercise
1. In `data-dictionary.csv`, examine the available variables. Are there any that you didn’t include in your causal diagram that maybe should be included? Add them now.
2. Pairs: update the `causal-assumptions.qmd` file with these changes and observe your final adjustment set.

Time: 20 minutes
Application Exercise
`final-model.qmd`
Time: 20 minutes
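What `final-model.qmd` contains is not shown. A common approach, assumed here, is to regress the outcome on the exposure plus the variables in the adjustment set. The sketch below uses simulated data with the hypothetical names from the `dagify()` example, where `variable1` confounds the exposure-outcome relationship and the true exposure effect is set to 2.

```r
set.seed(42)
n <- 1000

# Simulated, hypothetical data: variable1 affects both exposure and outcome;
# the true effect of exposure on outcome is 2
variable1 <- rnorm(n)
exposure <- 0.8 * variable1 + rnorm(n)
outcome <- 2 * exposure + 1.5 * variable1 + rnorm(n)
sim_dat <- data.frame(outcome, exposure, variable1)

# Crude model: the backdoor path through variable1 is left open
fit_crude <- lm(outcome ~ exposure, data = sim_dat)

# Adjusted model: the adjustment set enters as covariates
fit_adj <- lm(outcome ~ exposure + variable1, data = sim_dat)

print(coef(fit_crude)["exposure"])
print(coef(fit_adj)["exposure"])
```

The crude estimate absorbs part of the effect of `variable1`; adding the adjustment set brings the exposure estimate close to the simulated value of 2.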
Application Exercise
1. The `sensitivity-data.qmd` file contains a sensitivity analysis for including athletic data and US News Ranking data.
2. `athletics_dat`: How many observations are there? Fill in the explanation with this number.
3. `bball_power_rating`: Does this change your result? Add an explanation of what you see.
4. `usnews_dat`: How many observations are there? Fill in the explanation with this number.
5. `usnews_ranking`: Does this change your result? Add an explanation of what you see.

Time: 10 minutes
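A sketch of the pattern behind these questions, under stated assumptions: the real structure of `athletics_dat` is not shown, so the main data, the `id` join key, and all values below are simulated and hypothetical. The athletic data are assumed to cover only some schools, so joining them reduces the number of observations. Both models are then fit on the same joined rows, so any change in the exposure estimate comes from adding `bball_power_rating` rather than from the smaller sample. The same pattern would apply to `usnews_dat` and `usnews_ranking`.

```r
set.seed(1)
n <- 500

# Simulated, hypothetical main data: one row per school with an id key
main_dat <- data.frame(id = 1:n, variable1 = rnorm(n))
main_dat$exposure <- 0.5 * main_dat$variable1 + rnorm(n)
main_dat$outcome <- main_dat$exposure + main_dat$variable1 + rnorm(n)

# Hypothetical athletics_dat covering only 200 of the schools
athletics_dat <- data.frame(id = sample(n, 200),
                            bball_power_rating = rnorm(200))

# Inner join: only schools present in both data sets remain
sens_dat <- merge(main_dat, athletics_dat, by = "id")
cat("Observations after joining:", nrow(sens_dat), "\n")

# Same rows, with and without the extra covariate
fit_base <- lm(outcome ~ exposure + variable1, data = sens_dat)
fit_sens <- lm(outcome ~ exposure + variable1 + bball_power_rating,
               data = sens_dat)

# Compare the exposure estimates side by side
est <- c(base = unname(coef(fit_base)["exposure"]),
         with_bball = unname(coef(fit_sens)["exposure"]))
print(est)
```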
Application Exercise
Let’s put this all together!
`index.qmd`
Time: 10 minutes