This lesson introduces the stargazer package for presenting regression results.
After completing this module, students should be able to:
The default output of summary(lm(y ~ x, data = data)) makes regression results hard to read, and it may not be the most appropriate format for sharing. Take a look at the output of a regression using the summary command.
data("mtcars")
mr1 <- lm(hp ~ mpg + cyl + wt + gear, data = mtcars)
summary(mr1)
Call:
lm(formula = hp ~ mpg + cyl + wt + gear, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-39.076 -17.816  -2.354  14.645  67.680

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -133.995    106.205  -1.262 0.217859
mpg           -3.032      2.199  -1.379 0.179215
cyl           28.332      5.927   4.780  5.5e-05 ***
wt             6.739     12.116   0.556 0.582625
gear          39.215      9.134   4.293 0.000203 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 30.16 on 27 degrees of freedom
Multiple R-squared:  0.8315, Adjusted R-squared:  0.8065
F-statistic: 33.31 on 4 and 27 DF,  p-value: 4.428e-10
There are a couple of things that are not ideal about the summary() output:
- Variable names are coded. Unless the person reading the results is familiar with the dataset or an expert, it will be hard for them to figure out what each variable represents.
- The Call and Residuals parts of the output are redundant in virtually all applications.
- summary only presents one regression output at a time. To save space and make our results easier to read, we'll often present the output of several regressions side by side in the same table, especially when the estimated parameters are common.
If you are not happy with the output of summary there are a couple of alternatives:
1. Write your own functions to format the output.
2. Use an existing package or function such as stargazer, jtools, huxreg, or xtable.
The process of doing (1) can be time consuming and, more often than not, only a few types of users will benefit from the extra work. For most of our needs, at least for the topics discussed in this course, the package stargazer will suffice.\(^1\)
Installing and loading stargazer
To use stargazer you need to:
1. Install the package with install.packages("stargazer").
2. Load it with library("stargazer").
# You will do this once
# in the console
#install.packages("stargazer", repos = "http://cran.us.r-project.org")
# You will need this in
# every file that requires
# stargazer. Only once per
# file.
library("stargazer")
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
To avoid getting those messages when loading packages you can add the option message=FALSE to the R chunk header, like so {r message=FALSE}.
\(^1\) Note: I know that if you are a perfectionist or obsessive about details, at some point you'll write your own functions or even your own package, but for this course that's not necessary.
Presenting regression outputs with the stargazer package:
Let's take a look at the previous regression output, now using the stargazer command.
stargazer(mr1,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "html",
intercept.bottom = FALSE,
header = FALSE)
| Dependent variable: |
| Gross Horsepower |
Intercept | -133.995 |
| (106.205) |
Miles per Gallon | -3.032 |
| (2.199) |
Cylinders | 28.332*** |
| (5.927) |
Weight (1000 lbs) | 6.739 |
| (12.116) |
Number of Forward Gear | 39.215*** |
| (9.134) |
Observations | 32 |
R2 | 0.832 |
Adjusted R2 | 0.807 |
Residual Std. Error | 30.157 (df = 27) |
F Statistic | 33.310*** (df = 4; 27) |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
This looks way better than the summary output and, as we'll see in a minute, it can be tweaked to your liking.
The stargazer function takes an R object (generally of the class lm or data.frame), extracts the attributes of the object, and processes them into an HTML or LaTeX table.
Let's inspect the attributes of the regression mr1 that we just estimated:
attributes(mr1)
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "xlevels" "call" "terms" "model"
##
## $class
## [1] "lm"
All of these can be accessed manually using the $ symbol, e.g. mr1$coefficients.
mr1$coefficients
## (Intercept) mpg cyl wt gear
## -133.994984 -3.031787 28.332058 6.739500 39.215107
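The table that stargazer prints also contains quantities, such as standard errors and R-squared, that are not stored directly in the lm object. As a minimal sketch in base R (not a description of how stargazer works internally), the same numbers can be pulled out by hand. The names coef_table and fit_stats are made up for this example:

```r
data("mtcars")
mr1 <- lm(hp ~ mpg + cyl + wt + gear, data = mtcars)

# Coefficient table: estimates, standard errors, t values and p values.
# summary() computes these; they are not stored in the lm object itself.
coef_table <- coef(summary(mr1))

# Fit statistics shown at the bottom of a regression table
fit_stats <- c(n      = nobs(mr1),
               r2     = summary(mr1)$r.squared,
               adj_r2 = summary(mr1)$adj.r.squared)

round(coef_table, 3)
round(fit_stats, 3)
```

Doing this for every model quickly becomes tedious, which is the work a table package saves.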
After stargazer extracts and processes the content of the object, it takes into account the options provided by the user and generates the output table. As we'll see, we can use stargazer to,
The second option is really useful if you are working in very large projects that use the same table in different documents.
The results = 'asis' option
In RMarkdown, you need to use the results='asis' option in the R chunk header, like so {r results='asis'}, to produce stargazer tables. Otherwise, RStudio won't know that the output has to be compiled as LaTeX or HTML code (depending on what you are knitting) and will treat it as plain R output.
See what happens if I don't include the results='asis' option:
stargazer(mr1,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "html",
intercept.bottom = FALSE,
header = FALSE)
##
## <table style="text-align:center"><caption><strong>Regression Results</strong></caption>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="1" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td>Gross Horsepower</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Intercept</td><td>-133.995</td></tr>
## <tr><td style="text-align:left"></td><td>(106.205)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Miles per Gallon</td><td>-3.032</td></tr>
## <tr><td style="text-align:left"></td><td>(2.199)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Cylinders</td><td>28.332<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(5.927)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Weight (1000 lbs)</td><td>6.739</td></tr>
## <tr><td style="text-align:left"></td><td>(12.116)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td style="text-align:left">Number of Forward Gear</td><td>39.215<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(9.134)</td></tr>
## <tr><td style="text-align:left"></td><td></td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>32</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.832</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.807</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>30.157 (df = 27)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>33.310<sup>***</sup> (df = 4; 27)</td></tr>
## <tr><td colspan="2" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>
As you can see, this is the HTML code of the previous table, which RStudio is not compiling because I didn't specify the results='asis' option.
Since these lectures are knitted to HTML, note that I always add the option type = "html" to the stargazer function. Your assignments, however, are knitted to .pdf via LaTeX, so you have to specify type = "latex" instead.
In either case, you always have to add the option {r results='asis'}.
Below is example code to generate the table in LaTeX (you won't see any table output, as this is an HTML file).
stargazer(mr1,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "latex",
intercept.bottom = FALSE,
header = FALSE)
Unfortunately, when using the {r results='asis'} option, the table code is not processed until we knit the document. Because of that, when working in the editor, the stargazer output will simply display the code for the table and not the table itself, which is a bit annoying if you like to see the regression results in the editor.
If you feel very strongly about inspecting the regression results in the editor, you can use type = "text" while working in the editor and then change it to type = "latex" when knitting the .pdf file.
stargazer(mr1,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "text",
intercept.bottom = FALSE)
##
## Regression Results
## ==================================================
##                            Dependent variable:
##                        ---------------------------
##                             Gross Horsepower
## --------------------------------------------------
## Intercept                       -133.995
##                                 (106.205)
##
## Miles per Gallon                 -3.032
##                                  (2.199)
##
## Cylinders                       28.332***
##                                  (5.927)
##
## Weight (1000 lbs)                 6.739
##                                 (12.116)
##
## Number of Forward Gear          39.215***
##                                  (9.134)
##
## --------------------------------------------------
## Observations                       32
## R2                                0.832
## Adjusted R2                       0.807
## Residual Std. Error         30.157 (df = 27)
## F Statistic              33.310*** (df = 4; 27)
## ==================================================
## Note:                  *p<0.1; **p<0.05; ***p<0.01
Not the nicest output, but it is customizable and can be viewed in the editor, and it is still better than summary.
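Instead of editing type by hand each time, one possible shortcut is to let knitr report the output format. The sketch below assumes the knitr package is installed and uses its is_latex_output() and is_html_output() helpers. The function name table_type is hypothetical and not part of stargazer:

```r
# Pick the stargazer table type from the current knitr output format.
# table_type() is a made-up helper for this example.
table_type <- function() {
  if (knitr::is_latex_output()) {
    "latex"   # knitting to .pdf
  } else if (knitr::is_html_output()) {
    "html"    # knitting to .html
  } else {
    "text"    # running the chunk in the editor or console
  }
}

table_type()

# Possible use inside a chunk with results='asis':
# stargazer(mr1, type = table_type(), header = FALSE)
```

With this approach the same chunk shows a readable text table in the editor and switches format when the document is knitted.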
stargazer has many options to tweak the format of a regression table to your liking. These are the most basic ones:
Variable Labels
dep.var.labels allows you to provide a custom name for the dependent variable (\(y\) variable).
covariate.labels does the same for the independent variables (\(x\) variables).
The input to both options is a character vector, so you have to specify it with c().
Space
The no.space = TRUE option removes the empty lines between rows. This is useful if you want to reduce the vertical size of the regression table.
The header = FALSE option removes some comments appended to the table.
The omit.stat option determines which statistics are left out of the bottom of the table. See the documentation for the code of each statistic that can be omitted.
Finally, the align = TRUE option lines the numbers up at the decimal point (this option only works in LaTeX, not in HTML). For this option to work, you need to add the following to the YAML header of the .Rmd file:
header-includes:
- \usepackage{dcolumn}
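As a short illustration of the spacing options above, the sketch below combines no.space and omit.stat in one call. It assumes stargazer is installed and uses type = "text" so the result can be read in the editor. The codes "f" and "ser" are stargazer's codes for the F statistic and the residual standard error, and the object names fit and compact are made up for this example:

```r
library(stargazer)

data("mtcars")
fit <- lm(hp ~ mpg + cyl + wt + gear, data = mtcars)

# Compact table: no blank lines between coefficients, and the
# F statistic ("f") and residual standard error ("ser") left out.
# stargazer prints the table and invisibly returns its lines.
compact <- stargazer(fit,
                     type = "text",
                     no.space = TRUE,
                     omit.stat = c("f", "ser"),
                     header = FALSE)
```

The align = TRUE option is left out here because it only affects LaTeX output.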
To present several regressions side by side, pass a list with each regression as the input of stargazer. For example,
# Running four regressions
mr1 <- lm(hp ~ mpg, data = mtcars)
mr2 <- lm(hp ~ mpg + cyl, data = mtcars)
mr3 <- lm(hp ~ mpg + cyl + wt, data = mtcars)
mr4 <- lm(hp ~ mpg + cyl + wt + gear, data = mtcars)
# Collecting regressions in a list
regs <- list(mr1, mr2, mr3, mr4)
# Displaying regressions side-by-side
stargazer(regs,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "html",
intercept.bottom = FALSE,
df = FALSE) # omits d.o.f of F-test
| Dependent variable: |
| Gross Horsepower |
| (1) | (2) | (3) | (4) |
Intercept | 324.082*** | 54.067 | 115.662 | -133.995 |
| (27.433) | (86.093) | (113.207) | (106.205) |
Miles per Gallon | -8.830*** | -2.775 | -4.220 | -3.032 |
| (1.310) | (2.177) | (2.778) | (2.199) |
Cylinders | | 23.979*** | 25.025*** | 28.332*** |
| | (7.346) | (7.486) | (5.927) |
Weight (1000 lbs) | | | -12.135 | 6.739 |
| | | (14.382) | (12.116) |
Number of Forward Gear | | | | 39.215*** |
| | | | (9.134) |
Observations | 32 | 32 | 32 | 32 |
R2 | 0.602 | 0.709 | 0.716 | 0.832 |
Adjusted R2 | 0.589 | 0.689 | 0.686 | 0.807 |
Residual Std. Error | 43.945 | 38.223 | 38.414 | 30.157 |
F Statistic | 45.460*** | 35.373*** | 23.585*** | 33.310*** |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
With ci = TRUE, stargazer reports confidence intervals (95% by default) instead of standard errors:
stargazer(regs,
title = "Regression Results",
dep.var.labels = c("Gross Horsepower"),
covariate.labels = c("Intercept",
"Miles per Gallon",
"Cylinders",
"Weight (1000 lbs)",
"Number of Forward Gear"),
type = "html",
intercept.bottom = FALSE,
df = FALSE,
ci = TRUE,
digits = 2)
| Dependent variable: |
| Gross Horsepower |
| (1) | (2) | (3) | (4) |
Intercept | 324.08*** | 54.07 | 115.66 | -133.99 |
| (270.31, 377.85) | (-114.67, 222.81) | (-106.22, 337.54) | (-342.15, 74.16) |
Miles per Gallon | -8.83*** | -2.77 | -4.22 | -3.03 |
| (-11.40, -6.26) | (-7.04, 1.49) | (-9.67, 1.23) | (-7.34, 1.28) |
Cylinders | | 23.98*** | 25.03*** | 28.33*** |
| | (9.58, 38.38) | (10.35, 39.70) | (16.71, 39.95) |
Weight (1000 lbs) | | | -12.13 | 6.74 |
| | | (-40.32, 16.05) | (-17.01, 30.49) |
Number of Forward Gear | | | | 39.22*** |
| | | | (21.31, 57.12) |
Observations | 32 | 32 | 32 | 32 |
R2 | 0.60 | 0.71 | 0.72 | 0.83 |
Adjusted R2 | 0.59 | 0.69 | 0.69 | 0.81 |
Residual Std. Error | 43.95 | 38.22 | 38.41 | 30.16 |
F Statistic | 45.46*** | 35.37*** | 23.58*** | 33.31*** |
Note: | *p<0.1; **p<0.05; ***p<0.01 |
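The intervals in column (1) can be reproduced by hand, which helps when reading them. The sketch below is a base R check, under the assumption (consistent with the numbers in the table) that each interval is the estimate plus or minus the normal critical value times the standard error. The names est, se, z and ci_normal are made up for this example:

```r
data("mtcars")
mr1 <- lm(hp ~ mpg, data = mtcars)   # model (1) in the table

est <- coef(summary(mr1))[, "Estimate"]
se  <- coef(summary(mr1))[, "Std. Error"]

# 95% interval based on the normal distribution
z <- qnorm(0.975)
ci_normal <- cbind(lower = est - z * se, upper = est + z * se)
round(ci_normal, 2)

# For comparison: confint() uses the t distribution, so its
# intervals are slightly wider in a sample of 32 observations
round(confint(mr1), 2)
```

The difference between the two is small here, but it is worth knowing which interval a table reports before quoting it.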
stargazer can also process objects of the class data.frame, which can be used simply to present the data in table format:
stargazer(head(mtcars), type = "html", summary = FALSE)
| mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
Mazda RX4 | 21 | 6 | 160 | 110 | 3.900 | 2.620 | 16.460 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21 | 6 | 160 | 110 | 3.900 | 2.875 | 17.020 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.800 | 4 | 108 | 93 | 3.850 | 2.320 | 18.610 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.400 | 6 | 258 | 110 | 3.080 | 3.215 | 19.440 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.700 | 8 | 360 | 175 | 3.150 | 3.440 | 17.020 | 0 | 0 | 3 | 2 |
Valiant | 18.100 | 6 | 225 | 105 | 2.760 | 3.460 | 20.220 | 1 | 0 | 3 | 1 |
It can also present summary statistics for each variable, which is the default for a data.frame:
stargazer(mtcars, type = "html")
Statistic | N | Mean | St. Dev. | Min | Pctl(25) | Pctl(75) | Max |
mpg | 32 | 20.091 | 6.027 | 10 | 15.4 | 22.8 | 34 |
cyl | 32 | 6.188 | 1.786 | 4 | 4 | 8 | 8 |
disp | 32 | 230.722 | 123.939 | 71 | 120.8 | 326 | 472 |
hp | 32 | 146.688 | 68.563 | 52 | 96.5 | 180 | 335 |
drat | 32 | 3.597 | 0.535 | 2.760 | 3.080 | 3.920 | 4.930 |
wt | 32 | 3.217 | 0.978 | 1.513 | 2.581 | 3.610 | 5.424 |
qsec | 32 | 17.849 | 1.787 | 14.500 | 16.892 | 18.900 | 22.900 |
vs | 32 | 0.438 | 0.504 | 0 | 0 | 1 | 1 |
am | 32 | 0.406 | 0.499 | 0 | 0 | 1 | 1 |
gear | 32 | 3.688 | 0.738 | 3 | 3 | 4 | 5 |
carb | 32 | 2.812 | 1.615 | 1 | 2 | 4 | 8 |
These tables, just like the regression tables, can be edited: you can specify custom variable names, which statistics to compute, etc.
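As a hedged example of that kind of editing, the sketch below restricts the summary table to three variables, gives them readable labels, and keeps only a few statistics through the summary.stat argument. It assumes stargazer is installed, and the object name stats_tbl is made up for this example:

```r
library(stargazer)

data("mtcars")

# Summary statistics for three variables, with custom labels and
# only N, mean, standard deviation, minimum and maximum reported.
stats_tbl <- stargazer(mtcars[, c("hp", "mpg", "wt")],
                       type = "text",
                       title = "Summary Statistics",
                       summary.stat = c("n", "mean", "sd", "min", "max"),
                       covariate.labels = c("Gross Horsepower",
                                            "Miles per Gallon",
                                            "Weight (1000 lbs)"),
                       digits = 2)
```

The labels in covariate.labels have to follow the column order of the data frame passed to stargazer.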
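stargazer can also display a matrix, for example the correlations of every variable with the dependent variable (hp) and the main independent variable (mpg):
# Computing correlations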
corData <- cor(mtcars, use = "pairwise.complete.obs")
# Extracting columns for mpg and hp
corData <- corData[, colnames(corData) %in% c("hp", "mpg")]
# Naming columns and rows
colnames(corData) <- c("Miles per Gallon","Gross Horsepower")
row.names(corData) <- c("Miles per Gallon",
"Number of Cylinders",
"Displacement (cu.in.)",
"Gross Horsepower",
"Rear Axle Ratio",
"Weight (1000 lbs)",
"1/4 Mile Time",
"Engine (0 = V-shaped, 1 = Straight)",
"Transmission (0 = Automatic, 1 = Manual)",
"Number of Forward Gears",
"Number of Carburetors")
stargazer(corData, summary = FALSE,
type = "html",
title = "Correlation Table",
digits = 2)
| Miles per Gallon | Gross Horsepower |
Miles per Gallon | 1 | -0.78 |
Number of Cylinders | -0.85 | 0.83 |
Displacement (cu.in.) | -0.85 | 0.79 |
Gross Horsepower | -0.78 | 1 |
Rear Axle Ratio | 0.68 | -0.45 |
Weight (1000 lbs) | -0.87 | 0.66 |
1/4 Mile Time | 0.42 | -0.71 |
Engine (0 = V-shaped, 1 = Straight) | 0.66 | -0.72 |
Transmission (0 = Automatic, 1 = Manual) | 0.60 | -0.24 |
Number of Forward Gears | 0.48 | -0.13 |
Number of Carburetors | -0.55 | 0.75 |