20  Event Studies

For this chapter, load the following libraries.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(fixest)
library(wooldridge)

In the examples of difference-in-differences that we saw before, we compared outcomes before and after a treatment for treatment and control groups, and all treated units started experiencing the treatment at the same time. Event study is the name for the approach used when treated units receive treatment at different times. Event studies estimate dynamic treatment effects by comparing outcomes relative to the timing of the event.

20.1 Notation

We define event time as the number of periods before or after treatment:

\[\begin{equation} Y_{it} = \sum_{k \neq -1} \beta_k \cdot \mathbb{1}\{ \text{event\_time}_{it} = k \} + \alpha_i + \gamma_t + \varepsilon_{it} \end{equation}\]

  • The reference period is \(k = -1\), the period just before treatment.
  • \(\alpha_i\): individual fixed effects
  • \(\gamma_t\): time fixed effects

Each \(\beta_k\) tells us the average difference in the outcome \(k\) periods before or after treatment, relative to the omitted period (\(k = -1\)).
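To make the notation concrete, here is a small sketch with made-up data: a single unit treated in period 5, so that event time is simply calendar time minus the treatment period.

```r
library(tidyverse)

# Hypothetical toy panel: one unit, treated in period 5
toy <- tibble(t = 1:8, treat_period = 5) %>%
  mutate(event_time = t - treat_period)
toy
# event_time runs from -4 to 3; the regression includes an
# indicator for each value except the reference period, -1
```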

20.1.1 Event Study in Practice: Union Entry and Wages

We’ll use the wagepan dataset from the wooldridge package. This dataset tracks 545 men from 1980 to 1987, recording their wages and union status each year.

data(wagepan)
head(wagepan)
  nr year agric black bus construc ent exper fin hisp poorhlth hours manuf
1 13 1980     0     0   1        0   0     1   0    0        0  2672     0
2 13 1981     0     0   0        0   0     2   0    0        0  2320     0
3 13 1982     0     0   1        0   0     3   0    0        0  2940     0
4 13 1983     0     0   1        0   0     4   0    0        0  2960     0
5 13 1984     0     0   0        0   0     5   0    0        0  3071     0
6 13 1985     0     0   1        0   0     6   0    0        0  2864     0
  married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 occ8 occ9 per
1       0   0       0        1    0    0    0    0    0    0    0    0    1   0
2       0   0       0        1    0    0    0    0    0    0    0    0    1   1
3       0   0       0        1    0    0    0    0    0    0    0    0    1   0
4       0   0       0        1    0    0    0    0    0    0    0    0    1   0
5       0   0       0        1    0    0    0    0    1    0    0    0    0   1
6       0   0       0        1    0    1    0    0    0    0    0    0    0   0
  pro pub rur south educ tra trad union    lwage d81 d82 d83 d84 d85 d86 d87
1   0   0   0     0   14   0    0     0 1.197540   0   0   0   0   0   0   0
2   0   0   0     0   14   0    0     1 1.853060   1   0   0   0   0   0   0
3   0   0   0     0   14   0    0     0 1.344462   0   1   0   0   0   0   0
4   0   0   0     0   14   0    0     0 1.433213   0   0   1   0   0   0   0
5   0   0   0     0   14   0    0     0 1.568125   0   0   0   1   0   0   0
6   0   0   0     0   14   0    0     0 1.699891   0   0   0   0   1   0   0
  expersq
1       1
2       4
3       9
4      16
5      25
6      36

We define the “event” as the first year a person joins a union.

# Identify first year of union membership
first_union <- wagepan %>%
  group_by(nr) %>%
  filter(union == 1) %>%
  summarise(first_union_year = min(year), .groups = "drop")

# Merge with main data and compute event time
wagepan_evt <- wagepan %>%
  left_join(first_union, by = "nr") %>%
  mutate(event_time = ifelse(!is.na(first_union_year), year - first_union_year, NA))

We’ll keep only individuals who ever joined a union to estimate dynamic effects.

wage_evt <- wagepan_evt %>%
  filter(!is.na(event_time))
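Before estimating, it can also help to tabulate how many observations fall at each event time; event times with few observations will produce noisy coefficients. (This sanity check is our addition, not part of the original walkthrough.)

```r
# Count observations at each event time
wage_evt %>%
  count(event_time) %>%
  arrange(event_time)
```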

20.1.2 Estimate the Event Study Model

We’ll use fixest::feols() to estimate the event study, including individual and year fixed effects.

event_model <- feols(lwage ~ i(event_time, ref = -1) | nr + year, data = wage_evt)
The variable 'event_time::7' has been removed because of collinearity (see $collin.var).
summary(event_model)
OLS estimation, Dep. Var.: lwage
Observations: 2,240
Fixed-effects: nr: 280,  year: 8
Standard-errors: Clustered (nr) 
                Estimate Std. Error   t value Pr(>|t|)    
event_time::-7  0.053626   0.181664  0.295191 0.768067    
event_time::-6  0.261184   0.119646  2.182970 0.029872 *  
event_time::-5  0.096592   0.099593  0.969867 0.332953    
event_time::-4 -0.100424   0.096855 -1.036849 0.300704    
event_time::-3  0.038853   0.066285  0.586150 0.558248    
event_time::-2 -0.070702   0.050560 -1.398389 0.163107    
event_time::0   0.099590   0.042999  2.316111 0.021277 *  
event_time::1   0.080911   0.040040  2.020729 0.044263 *  
event_time::2   0.054518   0.039461  1.381556 0.168214    
event_time::3   0.040313   0.039699  1.015461 0.310766    
event_time::4   0.032080   0.043211  0.742404 0.458467    
event_time::5   0.071639   0.036762  1.948740 0.052329 .  
event_time::6   0.071525   0.034204  2.091129 0.037421 *  
... 1 variable was removed because of collinearity (event_time::7)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.32362     Adj. R2: 0.538958
                Within R2: 0.020262

20.1.3 Plot the Event Study

iplot(event_model, ci = TRUE, ref.line = 0, main = "Event Study: Union Entry and Log Wages")

Figure: Event study plot showing estimates and 95 percent confidence intervals for the effect of union entry on log wages over event time, with a vertical dashed line at time zero.
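If you prefer ggplot2 to iplot(), a sketch like the following rebuilds the figure from fixest's coefficient table (coeftable() returns the estimates and clustered standard errors; the 1.96 multiplier assumes normal-approximation 95% intervals):

```r
library(ggplot2)

ct <- as.data.frame(coeftable(event_model))
ct$event_time <- as.numeric(sub("event_time::", "", rownames(ct)))

ggplot(ct, aes(x = event_time, y = Estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = Estimate - 1.96 * `Std. Error`,
                    ymax = Estimate + 1.96 * `Std. Error`),
                width = 0.2) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Event time", y = "Estimate (95% CI)",
       title = "Event Study: Union Entry and Log Wages") +
  theme_bw()
```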

20.1.4 Interpreting the Plot

  • Pre-trends: Are the coefficients before treatment close to zero? This supports the parallel trends assumption.
  • Post-treatment effects: Do wages rise or fall after joining a union?
  • Statistical significance: Look at the confidence intervals.

Event studies are a powerful tool for visualizing causal effects over time—but only if the identifying assumptions hold!

20.1.5 Resources

I closely followed Tilburg Science Hub for this exercise, which is based on the original paper.

20.2 Activity: Event Studies

  1. Read the first two pages of the paper “Punishment to promote prosocial behavior: a field experiment” posted to Canvas > Modules > Module 2 > VS2024.pdf. Summarize in a few sentences the idea behind the experiment in that paper.

  2. Go to Canvas. Download waste.dta from Modules > Module 2 > Datasets. Save the dataset to a convenient folder and open it in your script.

  3. Explore the data. Get a sense for the units of analysis, the variables, whether the data are wide or long, and how many units are in the treatment and control groups. We are interested in the outcome residual_weight, which measures the amount of garbage. For a given route, the variable treatment “turns on” (that is, goes from 0 to 1) when the households on the route receive the announcement of the inspections.

  4. We want to create a variable called weekstart that contains the value of calendar_week for which a particular route went from 0 to 1. Create this variable. This is tricky! Hint: Think about first constructing a variable that is like calendar_week but for all rows when treatment is 0, it takes a very large value. Then, create weekstart by taking the minimum of that variable.
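If you get stuck, here is one way to implement the hint, sketched with an assumed column name (route for the route identifier; match it to the actual columns in waste):

```r
# Sketch of the hint (the column name `route` is an assumption)
waste <- waste %>%
  group_by(route) %>%
  mutate(week_or_big = ifelse(treatment == 0, 9999, calendar_week),
         weekstart = min(week_or_big)) %>%
  ungroup() %>%
  select(-week_or_big)
```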

  5. Create a variable called eventtime that is centered (i.e., 0) around when treatment begins, by route. It should be 0 one period before treatment begins (the variable treatment goes from 0 to 1).

  6. Graph the variable eventtime. You can use the code below. Notice that most observations fall between -20 and 20, but some lie beyond those periods. This can pose an issue because we want enough data at each event time to estimate the regression well.

ggplot(data = waste) + 
  geom_bar(aes(x=eventtime)) + 
  labs(x = "Event Time", y = "Count") +
  theme_bw()

  7. Bin event time. You can use cut or ifelse. Create a bin for values less than or equal to -37 and a bin for values greater than or equal to 28. Everything else should remain as it is. That is, the values should be: -37, -36, -35, …, 26, 27, 28. Once you create the variable, turn it into a factor and use relevel to change the reference level to 0. We do this because we want to exclude 0 from the regression below; that is, we want all estimated values to be relative to the value at event time 0.

  8. You should estimate this equation:

\[\begin{equation*} Y_{it} = \beta_0 + \sum_{\tau = -37}^{-1} \alpha_\tau W_{i\tau} + \sum_{\tau=1}^{28} \alpha_\tau W_{i\tau} + \lambda_i + \mu_t + \varepsilon_{it}. \end{equation*}\]

  • \(Y_{it}\): this is the outcome variable and should be residual_weight
  • \(\tau\): event time, at \(\tau = 1\), treatment begins
  • \(W_{i\tau}\): indicator that is 1 if route \(i\) has event time \(\tau\)
  • \(\lambda_i\): unit fixed effects
  • \(\mu_t\): time fixed effects

To do this, use your usual lm(). The right-hand side should include eventtime_bin, unit fixed effects, and time fixed effects.
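A skeleton of that call might look like this; route and calendar_week are assumed names for the unit and time identifiers, so adjust them to the dataset:

```r
# Sketch only: identifier names are assumptions
model <- lm(residual_weight ~ eventtime_bin +
              factor(route) + factor(calendar_week),
            data = waste)
output <- summary(model)
```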

  9. Look at your estimates for eventtime_bin. What do you notice about the estimates as the event time goes from negative to positive?

  10. This is going to be way more striking as a plot. Use the code below to extract the coefficients from the regression into a manageable dataset. There are complex data operations here that we have not learned about in class; if you are interested, try to figure out what each line does. Note that you need the tibble and stringr packages. The object output is the summarized regression from question 8, and regout holds the extracted coefficients. Using the critical value 1.96, create two variables in this dataset that take the lower and upper values of the 95% confidence interval.

library(tibble)
library(stringr)

regout <- output$coefficients[str_detect(rownames(output$coefficients), "eventtime_bin"),] %>%
  as.data.frame() %>%
  rownames_to_column(var = "var") %>%
  as_tibble() %>%
  mutate(eventtime = str_extract(var, "-??[:digit:]{1,2}?$"),
         eventtime = as.numeric(eventtime))

names(regout) <- c("varname", "estimate", "stderror", "tvalue", "pvalue", "eventtime")
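For the confidence intervals requested above, one possible sketch (using the renamed columns) is:

```r
# 95% confidence intervals from the extracted estimates
regout <- regout %>%
  mutate(ci_lower = estimate - 1.96 * stderror,
         ci_upper = estimate + 1.96 * stderror)
```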

  11. Create a plot with event time on the horizontal axis and the estimated coefficient on the vertical axis. There should be a point for every estimate at each event time. Use geom_errorbar() to add the 95% confidence intervals. Note that in the aesthetics you will need to specify x as event time, ymin (the minimum y value), and ymax (the maximum y value). You can go to the help for that function and scroll to the bottom to see examples.

  12. What is your takeaway about the effectiveness of this type of intervention to induce people to separate their trash and recycling?