R: Coefficient estimates become NA when I add control variables

Nikeli

Guest
I can fit my model and get estimates for all of my coefficients. When I add control variables to the model, some of the estimates become NA.

As far as I know, adding control variables should change the estimates compared with the regression without controls, but it should not turn them into NA. Is there a problem with correlations among the regressors that I am overlooking?

lm(y ~ x1 + x2 + x3 + x4 + factor(fe1) + factor(fe2) + factor(fe3), data=data) works fine

lm(y ~ x1 + x2 + x3 + x4 + factor(fe1) + factor(fe2) + factor(fe3) + z1 + z2 + z3, data=data) gives me NA as estimates for x1-4
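A minimal sketch of how such NA estimates can arise and be inspected, on made-up toy data (the variables `toy`, `x` and `z` are hypothetical and not from the real data set). `lm()` returns NA for a coefficient when its column is an exact linear combination of columns that come earlier in the model, and `alias()` reports that combination:

```r
# Toy data: z carries exactly the same information as x (assumed for illustration).
set.seed(1)
toy <- data.frame(x = rnorm(50))
toy$z <- 2 * toy$x
toy$y <- 1 + toy$x + rnorm(50)

fit <- lm(y ~ x + z, data = toy)

# Coefficients that lm() could not estimate are returned as NA
na_coefs <- names(coef(fit))[is.na(coef(fit))]
print(na_coefs)

# alias() shows how each dropped term is built from the retained ones
print(alias(fit)$Complete)
```

Running `alias()` on the real fitted model should show in the same way which earlier terms reproduce each NA coefficient.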

Edit: More information about the data and the model:

The data I use is a panel data set with a few thousand individuals observed across several years. For each individual there are 346 variables, but many values are NA. All in all, the data contains 37,005 observations of 346 variables.
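Because `lm()` silently drops every row with an NA in any model variable, it can help to count how many rows survive for a given set of variables. A small sketch on a hypothetical toy data frame (column names mirror the question, values are invented):

```r
# Hypothetical toy panel; values are made up for illustration.
toy <- data.frame(
  year   = c(1979, 1979, 1995, 1995, 2003, 2003),
  y      = c(1, 0, 1, 1, 0, 1),
  educ   = c(NA, NA, 12, 9, 16, NA),
  income = c(NA, 3, 5, NA, 7, 4)
)

# Share of missing values per column
na_share <- colMeans(is.na(toy))
print(na_share)

# Rows that lm() would keep for a model using y, educ and income
keep <- complete.cases(toy[, c("y", "educ", "income")])
print(sum(keep))
```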

What I want to estimate is a staggered difference-in-differences (DID) model. The individuals live in 26 countries, which get treated at 4 different times. So first I created the DID interactions.

Treatment 1:

data$time1 = ifelse(data$year >= 1978, 1, 0)
data$treated1 = ifelse(data$country == "24" | data$country == "26" , 1, 0)
data$did1 = data$time1 * data$treated1


Treatment 2:

data$time2 = ifelse(data$year >= 1982, 1, 0)
data$treated2 = ifelse(data$country == "6" | data$country == "7" |
data$country == "8" | data$country == "9" |
data$country == "13" | data$country == "22" |
data$country == "25" , 1, 0)
data$did2 = data$time2 * data$treated2


Treatment 3:

data$time3 = ifelse(data$year >= 1990, 1, 0)
data$treated3 = ifelse(data$country == "1" | data$country == "2" |
data$country == "3" | data$country == "4" |
data$country == "10" | data$country == "11" |
data$country == "12" | data$country == "14" |
data$country == "15" | data$country == "18" |
data$country == "19" | data$country == "20" |
data$country == "21" | data$country == "23" , 1, 0)
data$did3 = data$time3 * data$treated3


Treatment 4:

data$time4 = ifelse(data$year >= 1994, 1, 0)
data$treated4 = ifelse(data$country == "16" | data$country == "17" , 1, 0)
data$did4 = data$time4 * data$treated4
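The four treatment blocks above can also be sketched more compactly with a lookup list and `%in%`. This is only an illustration on a hypothetical toy data frame; the country groups and start years are copied from the code above. Note one behavioural difference: `%in%` codes a missing country as 0, whereas the `ifelse()` version returns NA.

```r
# Hypothetical toy rows; `country` is stored as character, as in the question.
toy <- data.frame(
  country = c("24", "6", "1", "16", "5"),
  year    = c(1980, 1980, 1992, 1990, 2000),
  stringsAsFactors = FALSE
)

# Treated countries keyed by the year their treatment starts
groups <- list(
  "1978" = c("24", "26"),
  "1982" = c("6", "7", "8", "9", "13", "22", "25"),
  "1990" = c("1", "2", "3", "4", "10", "11", "12", "14", "15",
             "18", "19", "20", "21", "23"),
  "1994" = c("16", "17")
)

for (k in seq_along(groups)) {
  start <- as.numeric(names(groups)[k])
  toy[[paste0("time", k)]]    <- as.integer(toy$year >= start)
  toy[[paste0("treated", k)]] <- as.integer(toy$country %in% groups[[k]])
  toy[[paste0("did", k)]]     <- toy[[paste0("time", k)]] * toy[[paste0("treated", k)]]
}
print(toy)
```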


Finally I regress:

didreg <- lm(y ~ treated1 + time1 + did1
+ treated2 + time2 + did2
+ treated3 + time3 + did3
+ treated4 + time4 + did4
+ factor(year) + factor(country) + factor(age), data=data)


So the regression contains the staggered DID terms and three fixed effects. Here I get an output with no NA for the DID terms (only some year and country levels are NA):

Residuals:
Min 1Q Median 3Q Max
-1.08745 -0.20639 0.09794 0.23383 1.48028

Coefficients: (8 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.8880739 0.0275366 32.251 < 2e-16 ***
treated1 -0.5032276 0.0494587 -10.175 < 2e-16 ***
time1 -0.2013633 0.0139685 -14.416 < 2e-16 ***
did1 0.4519027 0.0437361 10.332 < 2e-16 ***
treated2 -0.1691864 0.0272819 -6.201 5.66e-10 ***
time2 -0.0415755 0.0159119 -2.613 0.008983 **
did2 0.1643995 0.0224585 7.320 2.53e-13 ***
treated3 -0.0468476 0.0209003 -2.241 0.025002 *
time3 -0.0458305 0.0214227 -2.139 0.032415 *
did3 0.0417400 0.0257318 1.622 0.104787
treated4 -0.0105064 0.0185380 -0.567 0.570889
time4 0.1039606 0.0177773 5.848 5.02e-09 ***
did4 0.0065036 0.0222232 0.293 0.769792
factor(year)1979 NA NA NA NA
factor(year)1987 NA NA NA NA
factor(year)1991 NA NA NA NA
factor(year)1995 0.0063466 0.0064798 0.979 0.327365
factor(year)1999 -0.0729413 0.0076529 -9.531 < 2e-16 ***
factor(year)2003 -0.0591001 0.0067198 -8.795 < 2e-16 ***
factor(year)2007 -0.0488750 0.0070385 -6.944 3.88e-12 ***
factor(year)2011 -0.0079068 0.0069850 -1.132 0.257660
factor(year)2015 NA NA NA NA
factor(country)2 -0.0261802 0.0075630 -3.462 0.000538 ***
factor(country)3 0.0108143 0.0174244 0.621 0.534841
factor(country)4 -0.0706746 0.0237196 -2.980 0.002888 **
factor(country)5 NA NA NA NA
factor(country)6 0.0248501 0.0194242 1.279 0.200787
factor(country)7 0.0007571 0.0214920 0.035 0.971899
factor(country)8 -0.0219132 0.0126151 -1.737 0.082385 .
factor(country)9 0.0336437 0.0165810 2.029 0.042461 *
factor(country)10 -0.0310213 0.0150191 -2.065 0.038887 *
factor(country)11 0.0148530 0.0138820 1.070 0.284650
factor(country)12 0.0455003 0.0140558 3.237 0.001209 **
factor(country)13 0.0132165 0.0141440 0.934 0.350090
factor(country)14 0.0955972 0.0099067 9.650 < 2e-16 ***
factor(country)15 0.0301855 0.0179561 1.681 0.092759 .
factor(country)16 -0.0215210 0.0211858 -1.016 0.309720
factor(country)17 NA NA NA NA
factor(country)18 -0.0943804 0.0158211 -5.965 2.46e-09 ***
factor(country)19 -0.0298620 0.0086660 -3.446 0.000570 ***
factor(country)20 0.0120562 0.0142208 0.848 0.396563
factor(country)21 -0.0124351 0.0073584 -1.690 0.091053 .
factor(country)22 -0.0271342 0.0088044 -3.082 0.002058 **
factor(country)23 -0.0305946 0.0112176 -2.727 0.006387 **
factor(country)24 0.0531145 0.0234191 2.268 0.023336 *
factor(country)25 NA NA NA NA
factor(country)26 NA NA NA NA
factor(age)19 -0.0428153 0.0241366 -1.774 0.076093 .
factor(age)20 0.0066700 0.0237840 0.280 0.779142
factor(age)21 -0.0279582 0.0235592 -1.187 0.235345
...(up to 97)
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3332 on 34031 degrees of freedom
(2856 observations deleted due to missingness)
Multiple R-squared: 0.05441, Adjusted R-squared: 0.05115
F-statistic: 16.74 on 117 and 34031 DF, p-value: < 2.2e-16
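The note "8 not defined because of singularities" is consistent with how the dummies are built: each time dummy depends only on year and each treated dummy only on country, so they overlap with factor(year) and factor(country). A toy sketch of the year part (made-up data, hypothetical names `toy` and `X`) compares the number of design-matrix columns with its rank:

```r
# Toy panel: 3 countries observed in 4 survey years (made-up values).
toy <- expand.grid(country = c("a", "b", "c"), year = c(1979, 1987, 1991, 1995))
toy$time <- as.integer(toy$year >= 1990)   # post-period dummy, a function of year only

X <- model.matrix(~ time + factor(year), data = toy)

# A rank below the number of columns means lm() has to drop some terms
print(ncol(X))
print(qr(X)$rank)
```

Since `lm()` keeps earlier terms and drops later ones, the order of terms in the formula decides which of the overlapping coefficients end up as NA.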


If I now add some control variables:

didreg <- lm(y ~ treated1 + time1 + did1
+ treated2 + time2 + did2
+ treated3 + time3 + did3
+ treated4 + time4 + did4
+ factor(year) + factor(country) + factor(age)
+ educ + income + sh1 + spm6 + sg12, data=data)


Then the estimates I am interested in become NA:

Residuals:
Min 1Q Median 3Q Max
-1.05529 -0.15496 0.07053 0.20014 0.51105

Coefficients: (12 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.283e-01 3.188e-02 16.573 < 2e-16 ***
treated1 -4.153e-02 2.576e-02 -1.612 0.106970
time1 NA NA NA NA
did1 NA NA NA NA
treated2 -1.407e-02 1.937e-02 -0.727 0.467498
time2 NA NA NA NA
did2 NA NA NA NA
treated3 -1.239e-02 1.907e-02 -0.650 0.515742
time3 NA NA NA NA
did3 NA NA NA NA
treated4 -2.578e-02 2.196e-02 -1.174 0.240386
time4 NA NA NA NA
did4 NA NA NA NA
...
educ 1.983e-02 1.034e-03 19.184 < 2e-16 ***
income 2.152e-02 1.700e-03 12.659 < 2e-16 ***
sh1 7.574e-03 1.770e-03 4.279 1.88e-05 ***
spm6 9.147e-02 6.342e-03 14.422 < 2e-16 ***
sg12 4.481e-02 6.722e-03 6.666 2.71e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2681 on 17373 degrees of freedom
(19519 observations deleted due to missingness)
Multiple R-squared: 0.1248, Adjusted R-squared: 0.1191
F-statistic: 22.11 on 112 and 17373 DF, p-value: < 2.2e-16


I can add the control variables education and income without getting NA for the DID estimates, but adding any of the others produces NA in the output.
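One thing that may be worth checking, given that the number of deleted observations rises from 2856 to 19519, is whether the time dummies still vary among the rows that remain once the extra controls are added. The sketch below uses invented toy data and simply assumes that a control (here called `sh1`, after the question) is recorded only in later years. Whether the real data follow that pattern is not known.

```r
# Toy panel in which the control sh1 is only recorded from 1995 on (assumed pattern).
toy <- expand.grid(id = 1:4, year = c(1979, 1987, 1995, 2003))
toy$time4 <- as.integer(toy$year >= 1994)
toy$sh1   <- ifelse(toy$year >= 1995, toy$id, NA)

# Rows that would survive lm()'s removal of incomplete cases
used <- toy[complete.cases(toy[, c("time4", "sh1")]), ]

# Which years survive, and does time4 still vary among them?
print(table(used$year))
print(length(unique(used$time4)))
```

If a time dummy is constant in the remaining rows, it is collinear with the intercept, and the matching DID term is identical to the treated dummy, so both would come out as NA. On the real model, `model.frame(didreg)` returns the rows that were actually used.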
