19  Fixed Effects and Difference-in-Differences

19.1 Panel Data

Before discussing difference-in-differences, it is important to understand the terminology for data that extends over time.

  • Cross-sectional data. In this type of data, there is one observation per unit. This is the type of data we have been considering in the class so far.
  • Panel data. In this type of data, there are multiple observations per unit corresponding to multiple time periods. This is the type of data we will consider for difference-in-differences. Another name for panel data is longitudinal data.

Like before, we use \(i\) to index the units of the dataset (individuals, households, firms, counties, countries, etc.). Unlike before, we also have to index time periods, \(t\). Time periods can be anything: years, seconds, decades, etc. It is convenient to consider there are \(N\) individuals, so that \(i = 1, 2, \ldots, N\), and \(T\) time periods, so that \(t = 1, 2, \ldots, T\).

The advantage of panel data is that it allows us to consider a “before and after” something changes. We can then see the effect of a policy intervention by comparing what happens after the intervention to what happens before the intervention.

We can load an example panel dataset from the AER package.

library(AER)
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival
library(dplyr)

Attaching package: 'dplyr'
The following object is masked from 'package:car':

    recode
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
library(magrittr)
library(tidyr)

Attaching package: 'tidyr'
The following object is masked from 'package:magrittr':

    extract
data(Fatalities)

# Explore the data (commented to be concise)
# summary(Fatalities)

This dataset contains the number of deaths from traffic accidents in the continental U.S. annually from 1982 to 1988. Before we dive in, answer the following questions about the data.

  1. What is the unit of analysis in the data?
  2. What is the unit of time period in the data?
  3. How many observations and variables are there?
  4. Is the dataset in long or wide format?

A policymaker might be interested in the relationship between traffic fatalities and taxes on alcohol. We can use this dataset to see the relationship. To begin, let us look at a naive correlation between the fatality rate and the tax on beer.

# Calculate the fatality rate per 10,000 people
df <- Fatalities %>%
  mutate(frate = fatal / pop * 10000)

summary(df$frate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.8212  1.6237  1.9560  2.0404  2.4179  4.2178 
# Correlation in 1982
lm(frate ~ beertax, 
   data = filter(df, year == 1982)) %>% 
  summary()

Call:
lm(formula = frate ~ beertax, data = filter(df, year == 1982))

Residuals:
    Min      1Q  Median      3Q     Max 
-0.9356 -0.4480 -0.1068  0.2295  2.1716 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.0104     0.1391  14.455   <2e-16 ***
beertax       0.1485     0.1884   0.788    0.435    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6705 on 46 degrees of freedom
Multiple R-squared:  0.01332,   Adjusted R-squared:  -0.008126 
F-statistic: 0.6212 on 1 and 46 DF,  p-value: 0.4347
# Correlation in 1988
lm(frate ~ beertax, 
   data = filter(df, year == 1988)) %>% 
  summary()

Call:
lm(formula = frate ~ beertax, data = filter(df, year == 1988))

Residuals:
     Min       1Q   Median       3Q      Max 
-0.72931 -0.36028 -0.07132  0.39938  1.35783 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.8591     0.1060  17.540   <2e-16 ***
beertax       0.4388     0.1645   2.668   0.0105 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4903 on 46 degrees of freedom
Multiple R-squared:  0.134, Adjusted R-squared:  0.1152 
F-statistic: 7.118 on 1 and 46 DF,  p-value: 0.0105

Our results are counter-intuitive. It seems that increasing taxes actually increases fatality rate. While this may be the case, we cannot conclude anything causal due to omitted variable biases. We have some covariates in this dataset that my help us control for economic conditions, demographic factors, etc., but we may always be concerned about unobservable factors that determine traffic fatalities, some of which may be related to alcohol taxes. Fixed effects can help us!

19.2 Fixed Effects

We can consider all the observable and unobservable factors of unit that do not vary over time using unit fixed effects. These are just indicator variables for each of the states.

If \(Y_{it}\) is the fatality rate for state \(i\) in year \(t\) and \(X_{it}\) is the tax for beer, then we can think of the following regressions for 1982 and 1988 with fixed effects:

\[\begin{align*} Y_{i1982} = \beta_0 + \beta_1X_{i1982} + \beta_2 Z_i + U_{i1982} \\ Y_{i1988} = \beta_0 + \beta_1X_{i1988} + \beta_2 Z_i + U_{i1988}. \end{align*}\]

Notice that because the fixed effects do not vary over time, if we take the difference of the two equations then the fixed effects will cancel out. This will help us capture the correlation between changes in taxes and changes in traffic fatalities:

\[\begin{equation*} Y_{i1988} - Y_{i1982} = \beta_1(X_{i1988} - X_{i1982}) + (U_{i1988} - U_{i1982}) \end{equation*}\]

We can implement this in the data. We allow for an intercept which is the mean change in fatality rate if there were no change in the tax on beer.

# Reshape data from long to wide
df_wide <- df %>%
  pivot_wider(id_cols = state,
              names_from = year,
              values_from = c(frate, beertax))

head(df_wide)
# A tibble: 6 × 15
  state frate_1982 frate_1983 frate_1984 frate_1985 frate_1986 frate_1987
  <fct>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
1 al          2.13       2.35       2.34       2.19       2.67       2.72
2 az          2.50       2.27       2.83       2.80       3.07       2.77
3 ar          2.38       2.40       2.24       2.26       2.54       2.68
4 ca          1.86       1.81       1.95       1.88       1.95       1.99
5 co          2.17       2.05       1.91       1.79       1.85       1.79
6 ct          1.65       1.39       1.49       1.41       1.41       1.40
# ℹ 8 more variables: frate_1988 <dbl>, beertax_1982 <dbl>, beertax_1983 <dbl>,
#   beertax_1984 <dbl>, beertax_1985 <dbl>, beertax_1986 <dbl>,
#   beertax_1987 <dbl>, beertax_1988 <dbl>
# Calculate differences
df_wide <- df_wide %>%
  mutate(frate_diff = frate_1988 - frate_1982,
         beertax_diff = beertax_1988 - beertax_1982)

# Regression
lm(frate_diff ~ beertax_diff, data = df_wide) %>%
  summary()

Call:
lm(formula = frate_diff ~ beertax_diff, data = df_wide)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.22715 -0.09619  0.09212  0.22290  0.67745 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)  
(Intercept)  -0.07204    0.06064  -1.188   0.2410  
beertax_diff -1.04097    0.41723  -2.495   0.0162 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.394 on 46 degrees of freedom
Multiple R-squared:  0.1192,    Adjusted R-squared:    0.1 
F-statistic: 6.225 on 1 and 46 DF,  p-value: 0.01625

Now we see that when we increase the beer tax, that is correlated with a change in fatality rates. Increasing the tax on a case of beer by one dollar results in 1.04 deaths per 10,000 people on average. We can evaluate the magnitude of this by looking at the summary statistics of the outcome variable and considering the data. This effect seems pretty large. Even with taking the differences to remove the time-invariant, state-level factors, there could be other omitted variables that make the change in beer tax endogenous.

ggplot(data = df_wide) +
  geom_point(aes(x = beertax_diff, y = frate_diff)) +
  geom_smooth(aes(x = beertax_diff, y = frate_diff), 
              method = "lm", se = FALSE) +
  labs(x = "Change in Beer Tax (1988 - 1982)",
       y = "Change in Fatality Rate (per 10,000, 1988 - 1982)") +
  theme_bw()
`geom_smooth()` using formula = 'y ~ x'

A scatter plot showing the relationship between the change in beer tax and the change in traffic fatality rate between 1982 and 1988. Each point represents a state. A downward-sloping blue linear regression line indicates that increases in beer tax are associated with decreases in fatality rates.

summary(df_wide[c("frate_1982", "frate_1988", "frate_diff")])
   frate_1982      frate_1988      frate_diff      
 Min.   :1.101   Min.   :1.231   Min.   :-1.30657  
 1st Qu.:1.618   1st Qu.:1.628   1st Qu.:-0.13319  
 Median :2.046   Median :1.998   Median : 0.04075  
 Mean   :2.089   Mean   :2.070   Mean   :-0.01951  
 3rd Qu.:2.317   3rd Qu.:2.468   3rd Qu.: 0.23626  
 Max.   :4.218   Max.   :3.236   Max.   : 0.71697  

We can also consider regressions with the fixed effects without taking the differences between time periods. We call these panel regression models:

\[\begin{equation*} Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 Z_i + U_{it}. \end{equation*}\]

Notice that we have the state fixed effects (\(Z_i\)). You can imagine this regression for each \(t\) that we have in the data. The basic idea here is that each state has its own intercept (\(\beta_2 Z_i\)) in addition to the common intercept (\(\beta_0\)). The notation above is a bit of a shorthand for including indicators for each of the units except (arbitrarily) the first one:

\[\begin{equation*} Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_2 D2_i + \alpha_3 D3_i + \ldots + \alpha_N DN_i + U_{it}. \end{equation*}\]

If we remove the common intercept, then we can also include the intercept for the first unit.

Let us estimate the model with traffic fatalities and the beer tax. Notice that we want to use the long data. It is useful to compare with and without intercept. We see the coefficients of the fixed effects, but we are only interested in the estimated coefficient on the beer tax: \(-0.657\).

# With Intercept
lm(frate ~ beertax + state, data = df) %>%
  summary()

Call:
lm(formula = frate ~ beertax + state, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58696 -0.08284 -0.00127  0.07955  0.89780 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.47763    0.31336  11.098  < 2e-16 ***
beertax     -0.65587    0.18785  -3.491 0.000556 ***
stateaz     -0.56773    0.26667  -2.129 0.034107 *  
statear     -0.65495    0.21902  -2.990 0.003028 ** 
stateca     -1.50947    0.30435  -4.960 1.21e-06 ***
stateco     -1.48428    0.28735  -5.165 4.50e-07 ***
statect     -1.86226    0.28053  -6.638 1.58e-10 ***
statede     -1.30760    0.29395  -4.448 1.24e-05 ***
statefl     -0.26813    0.13933  -1.924 0.055284 .  
statega      0.52460    0.18395   2.852 0.004661 ** 
stateid     -0.66902    0.25797  -2.593 0.009989 ** 
stateil     -1.96162    0.29150  -6.730 9.23e-11 ***
statein     -1.46154    0.27254  -5.363 1.69e-07 ***
stateia     -1.54393    0.25344  -6.092 3.58e-09 ***
stateks     -1.22322    0.24544  -4.984 1.08e-06 ***
stateky     -1.21752    0.28717  -4.240 3.02e-05 ***
statela     -0.84712    0.18867  -4.490 1.03e-05 ***
stateme     -1.10795    0.19112  -5.797 1.78e-08 ***
statemd     -1.70644    0.28322  -6.025 5.17e-09 ***
statema     -2.10975    0.27610  -7.641 3.24e-13 ***
statemi     -1.48453    0.23602  -6.290 1.18e-09 ***
statemn     -1.89721    0.26509  -7.157 6.92e-12 ***
statems     -0.02908    0.14845  -0.196 0.844839    
statemo     -1.29626    0.26669  -4.861 1.93e-06 ***
statemt     -0.36039    0.26396  -1.365 0.173225    
statene     -1.52218    0.24928  -6.106 3.30e-09 ***
statenv     -0.60077    0.28595  -2.101 0.036517 *  
statenh     -1.25445    0.20968  -5.983 6.53e-09 ***
statenj     -2.10575    0.30720  -6.855 4.37e-11 ***
statenm      0.42637    0.25432   1.677 0.094724 .  
stateny     -2.18667    0.29890  -7.316 2.57e-12 ***
statenc     -0.29047    0.11984  -2.424 0.015979 *  
statend     -1.62344    0.25381  -6.396 6.45e-10 ***
stateoh     -1.67442    0.25381  -6.597 2.02e-10 ***
stateok     -0.54506    0.16912  -3.223 0.001415 ** 
stateor     -1.16800    0.28572  -4.088 5.65e-05 ***
statepa     -1.76747    0.27610  -6.402 6.26e-10 ***
stateri     -2.26505    0.29376  -7.711 2.07e-13 ***
statesc      0.55717    0.11000   5.065 7.30e-07 ***
statesd     -1.00372    0.20962  -4.788 2.70e-06 ***
statetn     -0.87566    0.26802  -3.267 0.001218 ** 
statetx     -0.91747    0.24556  -3.736 0.000225 ***
stateut     -1.16395    0.19642  -5.926 8.88e-09 ***
statevt     -0.96604    0.21113  -4.576 7.08e-06 ***
stateva     -1.29018    0.20416  -6.320 9.99e-10 ***
statewa     -1.65952    0.28346  -5.854 1.31e-08 ***
statewv     -0.89675    0.24661  -3.636 0.000328 ***
statewi     -1.75927    0.29395  -5.985 6.44e-09 ***
statewy     -0.22850    0.31290  -0.730 0.465811    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1899 on 287 degrees of freedom
Multiple R-squared:  0.905, Adjusted R-squared:  0.8891 
F-statistic: 56.97 on 48 and 287 DF,  p-value: < 2.2e-16
# Without Intercept
lm(frate ~ beertax + state - 1, data = df) %>%
  summary()

Call:
lm(formula = frate ~ beertax + state - 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58696 -0.08284 -0.00127  0.07955  0.89780 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
beertax -0.65587    0.18785  -3.491 0.000556 ***
stateal  3.47763    0.31336  11.098  < 2e-16 ***
stateaz  2.90990    0.09254  31.445  < 2e-16 ***
statear  2.82268    0.13213  21.364  < 2e-16 ***
stateca  1.96816    0.07401  26.594  < 2e-16 ***
stateco  1.99335    0.08037  24.802  < 2e-16 ***
statect  1.61537    0.08391  19.251  < 2e-16 ***
statede  2.17003    0.07746  28.016  < 2e-16 ***
statefl  3.20950    0.22151  14.489  < 2e-16 ***
statega  4.00223    0.46403   8.625 4.43e-16 ***
stateid  2.80861    0.09877  28.437  < 2e-16 ***
stateil  1.51601    0.07848  19.318  < 2e-16 ***
statein  2.01609    0.08867  22.736  < 2e-16 ***
stateia  1.93370    0.10222  18.918  < 2e-16 ***
stateks  2.25441    0.10863  20.753  < 2e-16 ***
stateky  2.26011    0.08046  28.089  < 2e-16 ***
statela  2.63051    0.16266  16.171  < 2e-16 ***
stateme  2.36968    0.16006  14.805  < 2e-16 ***
statemd  1.77119    0.08246  21.480  < 2e-16 ***
statema  1.36788    0.08648  15.818  < 2e-16 ***
statemi  1.99310    0.11663  17.089  < 2e-16 ***
statemn  1.58042    0.09363  16.880  < 2e-16 ***
statems  3.44855    0.20936  16.472  < 2e-16 ***
statemo  2.18137    0.09252  23.576  < 2e-16 ***
statemt  3.11724    0.09441  33.017  < 2e-16 ***
statene  1.95545    0.10551  18.534  < 2e-16 ***
statenv  2.87686    0.08106  35.492  < 2e-16 ***
statenh  2.22318    0.14114  15.751  < 2e-16 ***
statenj  1.37188    0.07333  18.709  < 2e-16 ***
statenm  3.90401    0.10154  38.449  < 2e-16 ***
stateny  1.29096    0.07563  17.070  < 2e-16 ***
statenc  3.18717    0.25173  12.661  < 2e-16 ***
statend  1.85419    0.10193  18.191  < 2e-16 ***
stateoh  1.80321    0.10193  17.691  < 2e-16 ***
stateok  2.93257    0.18428  15.913  < 2e-16 ***
stateor  2.30963    0.08117  28.453  < 2e-16 ***
statepa  1.71016    0.08648  19.776  < 2e-16 ***
stateri  1.21258    0.07753  15.640  < 2e-16 ***
statesc  4.03480    0.35479  11.372  < 2e-16 ***
statesd  2.47391    0.14121  17.519  < 2e-16 ***
statetn  2.60197    0.09162  28.398  < 2e-16 ***
statetx  2.56016    0.10853  23.589  < 2e-16 ***
stateut  2.31368    0.15453  14.972  < 2e-16 ***
statevt  2.51159    0.13973  17.975  < 2e-16 ***
stateva  2.18745    0.14664  14.917  < 2e-16 ***
statewa  1.81811    0.08233  22.084  < 2e-16 ***
statewv  2.58088    0.10767  23.971  < 2e-16 ***
statewi  1.71836    0.07746  22.185  < 2e-16 ***
statewy  3.24913    0.07233  44.922  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1899 on 287 degrees of freedom
Multiple R-squared:  0.9931,    Adjusted R-squared:  0.992 
F-statistic: 847.8 on 49 and 287 DF,  p-value: < 2.2e-16

While we can always use lm(), we can also use a package called plm to more conveniently estimate fixed effect regressions. The model is called “within” because we want there to be unit fixed effects. We get the same result but it does not show all the estimated coefficients for the fixed effects.

library(plm)

Attaching package: 'plm'
The following objects are masked from 'package:dplyr':

    between, lag, lead
plm(frate ~ beertax, 
    data = df,
    index = c("state", "year"),
    model = "within") %>%
  summary()
Oneway (individual) effect Within Model

Call:
plm(formula = frate ~ beertax, data = df, model = "within", index = c("state", 
    "year"))

Balanced Panel: n = 48, T = 7, N = 336

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-0.5869619 -0.0828376 -0.0012701  0.0795454  0.8977960 

Coefficients:
        Estimate Std. Error t-value Pr(>|t|)    
beertax -0.65587    0.18785 -3.4915 0.000556 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    10.785
Residual Sum of Squares: 10.345
R-Squared:      0.040745
Adj. R-Squared: -0.11969
F-statistic: 12.1904 on 1 and 287 DF, p-value: 0.00055597

We still may not be satisfied with these estimates! What if there are factors that are changing over time, not just over states? We can include time fixed effects for this. I will use \(\delta_t\) to stand in for all the indicators:

\[\begin{equation*} Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_t + U_{it}. \end{equation*}\]

We can run the below regressions because both state and year are factor variables. If they are not factor variables, then we need to convert them before running the regression.

# State fixed effects
lm(frate ~ beertax + state - 1, data = df) %>% 
  summary()

Call:
lm(formula = frate ~ beertax + state - 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.58696 -0.08284 -0.00127  0.07955  0.89780 

Coefficients:
        Estimate Std. Error t value Pr(>|t|)    
beertax -0.65587    0.18785  -3.491 0.000556 ***
stateal  3.47763    0.31336  11.098  < 2e-16 ***
stateaz  2.90990    0.09254  31.445  < 2e-16 ***
statear  2.82268    0.13213  21.364  < 2e-16 ***
stateca  1.96816    0.07401  26.594  < 2e-16 ***
stateco  1.99335    0.08037  24.802  < 2e-16 ***
statect  1.61537    0.08391  19.251  < 2e-16 ***
statede  2.17003    0.07746  28.016  < 2e-16 ***
statefl  3.20950    0.22151  14.489  < 2e-16 ***
statega  4.00223    0.46403   8.625 4.43e-16 ***
stateid  2.80861    0.09877  28.437  < 2e-16 ***
stateil  1.51601    0.07848  19.318  < 2e-16 ***
statein  2.01609    0.08867  22.736  < 2e-16 ***
stateia  1.93370    0.10222  18.918  < 2e-16 ***
stateks  2.25441    0.10863  20.753  < 2e-16 ***
stateky  2.26011    0.08046  28.089  < 2e-16 ***
statela  2.63051    0.16266  16.171  < 2e-16 ***
stateme  2.36968    0.16006  14.805  < 2e-16 ***
statemd  1.77119    0.08246  21.480  < 2e-16 ***
statema  1.36788    0.08648  15.818  < 2e-16 ***
statemi  1.99310    0.11663  17.089  < 2e-16 ***
statemn  1.58042    0.09363  16.880  < 2e-16 ***
statems  3.44855    0.20936  16.472  < 2e-16 ***
statemo  2.18137    0.09252  23.576  < 2e-16 ***
statemt  3.11724    0.09441  33.017  < 2e-16 ***
statene  1.95545    0.10551  18.534  < 2e-16 ***
statenv  2.87686    0.08106  35.492  < 2e-16 ***
statenh  2.22318    0.14114  15.751  < 2e-16 ***
statenj  1.37188    0.07333  18.709  < 2e-16 ***
statenm  3.90401    0.10154  38.449  < 2e-16 ***
stateny  1.29096    0.07563  17.070  < 2e-16 ***
statenc  3.18717    0.25173  12.661  < 2e-16 ***
statend  1.85419    0.10193  18.191  < 2e-16 ***
stateoh  1.80321    0.10193  17.691  < 2e-16 ***
stateok  2.93257    0.18428  15.913  < 2e-16 ***
stateor  2.30963    0.08117  28.453  < 2e-16 ***
statepa  1.71016    0.08648  19.776  < 2e-16 ***
stateri  1.21258    0.07753  15.640  < 2e-16 ***
statesc  4.03480    0.35479  11.372  < 2e-16 ***
statesd  2.47391    0.14121  17.519  < 2e-16 ***
statetn  2.60197    0.09162  28.398  < 2e-16 ***
statetx  2.56016    0.10853  23.589  < 2e-16 ***
stateut  2.31368    0.15453  14.972  < 2e-16 ***
statevt  2.51159    0.13973  17.975  < 2e-16 ***
stateva  2.18745    0.14664  14.917  < 2e-16 ***
statewa  1.81811    0.08233  22.084  < 2e-16 ***
statewv  2.58088    0.10767  23.971  < 2e-16 ***
statewi  1.71836    0.07746  22.185  < 2e-16 ***
statewy  3.24913    0.07233  44.922  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1899 on 287 degrees of freedom
Multiple R-squared:  0.9931,    Adjusted R-squared:  0.992 
F-statistic: 847.8 on 49 and 287 DF,  p-value: < 2.2e-16
# Time fixed effects
lm(frate ~ beertax + year - 1, data = df) %>% 
  summary()

Call:
lm(formula = frate ~ beertax + year - 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.06068 -0.38427 -0.09853  0.26781  2.23447 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
beertax   0.36634    0.06260   5.852 1.18e-08 ***
year1982  1.89485    0.08566  22.121  < 2e-16 ***
year1983  1.81281    0.08571  21.151  < 2e-16 ***
year1984  1.82311    0.08564  21.288  < 2e-16 ***
year1985  1.78430    0.08534  20.909  < 2e-16 ***
year1986  1.87873    0.08514  22.065  < 2e-16 ***
year1987  1.87931    0.08483  22.154  < 2e-16 ***
year1988  1.89382    0.08448  22.416  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5471 on 328 degrees of freedom
Multiple R-squared:  0.9349,    Adjusted R-squared:  0.9333 
F-statistic: 588.7 on 8 and 328 DF,  p-value: < 2.2e-16
# State fixed effects and time fixed effects
lm(frate ~ beertax + state + year - 1, data = df) %>% 
  summary()

Call:
lm(formula = frate ~ beertax + state + year - 1, data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.59556 -0.08096  0.00143  0.08234  0.83883 

Coefficients:
         Estimate Std. Error t value Pr(>|t|)    
beertax  -0.63998    0.19738  -3.242  0.00133 ** 
stateal   3.51137    0.33250  10.560  < 2e-16 ***
stateaz   2.96451    0.09933  29.846  < 2e-16 ***
statear   2.87284    0.14162  20.286  < 2e-16 ***
stateca   2.02618    0.07857  25.787  < 2e-16 ***
stateco   2.04984    0.08594  23.851  < 2e-16 ***
statect   1.67125    0.08989  18.592  < 2e-16 ***
statede   2.22711    0.08264  26.951  < 2e-16 ***
statefl   3.25132    0.23590  13.782  < 2e-16 ***
statega   4.02300    0.49087   8.196 8.92e-15 ***
stateid   2.86242    0.10606  26.990  < 2e-16 ***
stateil   1.57287    0.08380  18.769  < 2e-16 ***
statein   2.07123    0.09512  21.775  < 2e-16 ***
stateia   1.98709    0.10976  18.103  < 2e-16 ***
stateks   2.30707    0.11663  19.781  < 2e-16 ***
stateky   2.31659    0.08604  26.923  < 2e-16 ***
statela   2.67772    0.17390  15.398  < 2e-16 ***
stateme   2.41713    0.17116  14.122  < 2e-16 ***
statemd   1.82731    0.08828  20.700  < 2e-16 ***
statema   1.42335    0.09272  15.352  < 2e-16 ***
statemi   2.04488    0.12516  16.338  < 2e-16 ***
statemn   1.63488    0.10051  16.266  < 2e-16 ***
statems   3.49146    0.22311  15.649  < 2e-16 ***
statemo   2.23598    0.09931  22.515  < 2e-16 ***
statemt   3.17160    0.10136  31.291  < 2e-16 ***
statene   2.00846    0.11329  17.729  < 2e-16 ***
statenv   2.93322    0.08671  33.827  < 2e-16 ***
statenh   2.27245    0.15116  15.033  < 2e-16 ***
statenj   1.43016    0.07773  18.399  < 2e-16 ***
statenm   3.95748    0.10903  36.296  < 2e-16 ***
stateny   1.34849    0.08051  16.748  < 2e-16 ***
statenc   3.22630    0.26770  12.052  < 2e-16 ***
statend   1.90762    0.10945  17.428  < 2e-16 ***
stateoh   1.85664    0.10945  16.963  < 2e-16 ***
stateok   2.97776    0.19670  15.139  < 2e-16 ***
stateor   2.36597    0.08684  27.244  < 2e-16 ***
statepa   1.76563    0.09272  19.044  < 2e-16 ***
stateri   1.26964    0.08272  15.348  < 2e-16 ***
statesc   4.06496    0.37606  10.809  < 2e-16 ***
statesd   2.52317    0.15123  16.684  < 2e-16 ***
statetn   2.65670    0.09833  27.017  < 2e-16 ***
statetx   2.61282    0.11653  22.423  < 2e-16 ***
stateut   2.36165    0.16532  14.286  < 2e-16 ***
statevt   2.56100    0.14966  17.112  < 2e-16 ***
stateva   2.23618    0.15698  14.245  < 2e-16 ***
statewa   1.87424    0.08813  21.266  < 2e-16 ***
statewv   2.63364    0.11560  22.782  < 2e-16 ***
statewi   1.77545    0.08264  21.485  < 2e-16 ***
statewy   3.30791    0.07641  43.291  < 2e-16 ***
year1983 -0.07990    0.03835  -2.083  0.03813 *  
year1984 -0.07242    0.03835  -1.888  0.06001 .  
year1985 -0.12398    0.03844  -3.225  0.00141 ** 
year1986 -0.03786    0.03859  -0.981  0.32731    
year1987 -0.05090    0.03897  -1.306  0.19260    
year1988 -0.05180    0.03962  -1.307  0.19215    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1879 on 281 degrees of freedom
Multiple R-squared:  0.9934,    Adjusted R-squared:  0.9921 
F-statistic: 771.5 on 55 and 281 DF,  p-value: < 2.2e-16

For these regressions, we need to be careful with the standard errors. Normally, we assume homoskedasticity, and for longitudinal data, we do not want there to be correlation between the unobserved variables. However, \(U_{it}\) will naturally be correlated with \(U_{it+1}\) because they are both for the same unit. Similarly, \(U_{it}\) will be correlated with \(U_{jt}\) because they are both for the same time period.

We can cluster the standard errors to allow for heteroskedasticity and autocorrelated errors within the cluster, but not across clusters. To do, we can use plm so that we have a plm object rather than an lm object. Then, we can use coeftest() with the sandwich package to calculate the clustered standard errors.

library(sandwich)

plm_out <- plm(frate ~ beertax,
               data = df,
               index = c("state", "year"),
               model = "within",
               effect = "twoways") # This does state and year fixed effects

coeftest(plm_out, vcov = vcovHC, type = "HC1")

t test of coefficients:

        Estimate Std. Error t value Pr(>|t|)  
beertax -0.63998    0.35015 -1.8277  0.06865 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

19.3 Difference-in-Differences

Now we are comfortable with panel data and fixed effects. In some cases, you may be comfortable assuming that the fixed effects account for all possible unobserved heterogeneity that could be correlated with the covariate of interest (e.g., beer taxes), and interpret the above estimates causally. This is hard to argue though, and there may be something better we can do: difference-in-differences.

Suppose there is a large change in the taxes in Texas but a small change in the taxes in Oklahoma. If we compare Texas to Oklahoma in 1988, we will see the difference between the two states after the policy, but what if this is all due to differences between the states? If we compare Texas in 1982 to Texas in 1988, we may see a difference but what if this is all due to differences between the years? We can take the “difference-in-differences” to account for both possibilities and just see the change due to the policy change. Here are the means for each of the groups before (pre) and after (post) the policy:

\[\begin{equation*} (Y_{\text{Treatment, Post}} - Y_{\text{Treatment, Pre}}) - (Y_{\text{Control, Post}} - Y_{\text{Control, Pre}}) \end{equation*}\]

Here is an illustration of the logic from Hanck et al. (2018).

# Source:
# https://bookdown.org/machar1991/ITER/13-4-quasi-experiments.html
plot(c(0, 1), c(6, 8), 
     type = "p",
     ylim = c(5, 12),
     xlim = c(-0.3, 1.3),
     main = "The Differences-in-Differences Estimator",
     xlab = "Period",
     ylab = "Y",
     col = "steelblue",
     pch = 20,
     xaxt = "n",
     yaxt = "n")

axis(1, at = c(0, 1), labels = c("before", "after"))
axis(2, at = c(0, 13))

# add treatment group
points(c(0, 1, 1), c(7, 9, 11), 
       col = "darkred",
       pch = 20)

# add line segments
lines(c(0, 1), c(7, 11), col = "darkred")
lines(c(0, 1), c(6, 8), col = "steelblue")
lines(c(0, 1), c(7, 9), col = "darkred", lty = 2)
lines(c(1, 1), c(9, 11), col = "black", lty = 2, lwd = 2)

# add annotations
text(1, 10, expression(hat(beta)[1]^{DID}), cex = 0.8, pos = 4)
text(0, 5.5, "s. mean control", cex = 0.8 , pos = 4)
text(0, 6.8, "s. mean treatment", cex = 0.8 , pos = 4)
text(1, 7.9, "s. mean control", cex = 0.8 , pos = 4)
text(1, 11.1, "s. mean treatment", cex = 0.8 , pos = 4)

A line plot illustrating the Differences-in-Differences estimator. It shows mean values for a treatment group and a control group before and after an intervention. The control group mean rises slightly from before to after. The treatment group mean rises more sharply. A dashed line shows the counterfactual trend for the treatment group if there had been no intervention, and a vertical dashed line indicates the DID estimate, the difference between the actual post-treatment mean and the counterfactual.

Let us simulate some data to see how we estimate difference-in-differences.

# Source:
# https://bookdown.org/machar1991/ITER/13-4-quasi-experiments.html

set.seed(470500)

# Sample size
N <- 200

# Treatment effect
TEffect <- 4

# Generate treatment dummy
TDummy <- c(rep(0, N/2), rep(1, N/2))

# Simulate pre- and post-treatment values of the dependent variable
y_pre <- 7 + rnorm(N)
y_pre[1:N/2] <- y_pre[1:N/2] - 1
y_post <- 7 + 2 + TEffect * TDummy + rnorm(N)
y_post[1:N/2] <- y_post[1:N/2] - 1 

We can compute the difference-in-difference estimator by hand.

mean(y_post[TDummy == 1]) - mean(y_pre[TDummy == 1]) - 
(mean(y_post[TDummy == 0]) - mean(y_pre[TDummy == 0]))
[1] 3.947658

We can use a linear model:

\[\begin{equation*} \Delta Y_i = \beta_0 + \beta_1 D_i + U_i. \end{equation*}\]

Notice that we do not need to have everything in a dataset to use lm(). We can also rely on standalone vectors.

lm(I(y_post - y_pre) ~ TDummy) %>%
  summary()

Call:
lm(formula = I(y_post - y_pre) ~ TDummy)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.1025 -0.9670 -0.0932  0.9622  4.4390 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.0414     0.1466   13.92   <2e-16 ***
TDummy        3.9477     0.2074   19.04   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.466 on 198 degrees of freedom
Multiple R-squared:  0.6467,    Adjusted R-squared:  0.6449 
F-statistic: 362.4 on 1 and 198 DF,  p-value: < 2.2e-16

We can also use this equation:

\[\begin{equation*} Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 Post_t + \beta (Post_t \times D_i) + U_{it} \end{equation*}\]

df <- tibble(Y = c(y_pre, y_post),
             D = rep(TDummy, 2), 
             Post = c(rep("1", N), rep("2", N)))

lm(Y ~ D * Post, data = df) %>%
  summary()

Call:
lm(formula = Y ~ D * Post, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8788 -0.6071 -0.0357  0.6399  3.8692 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   5.9209     0.1020  58.044   <2e-16 ***
D             1.2918     0.1443   8.955   <2e-16 ***
Post2         2.0414     0.1443  14.151   <2e-16 ***
D:Post2       3.9477     0.2040  19.350   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.02 on 396 degrees of freedom
Multiple R-squared:  0.8816,    Adjusted R-squared:  0.8807 
F-statistic: 982.9 on 3 and 396 DF,  p-value: < 2.2e-16

The key assumption is commonly called “parallel trends.” This boils down to: in the absence of treatment, the trends of the treatment and the control group would be the same. We can see this most easily in the graph. This means that there cannot be unobserved factors that vary by time and by unit that are correlated with treatment.

19.4 Further Reading

These notes are based on chapters 10 and 13 of Hanck et al. (2018). See that source for more information.

19.4.1 References

Hanck, Cristoph, Martin Arnold, Alexander Gerber, and Martin Schmelzer. 2018. Introduction to Econometrics with R. https://bookdown.org/machar1991/ITER/.