Stata Cheat Sheet
Last Updated 2019-05-16
This review lists common data analysis procedures used in quantitative methods research and gives the Stata code for each. The document will be updated over time. The commands here are based on Stata 14. I would not recommend starting with this document if you are new to Stata. There are often multiple ways to conduct the same analysis; I present only one for each item.
This cheat sheet uses placeholders: when you see text in [square brackets], replace the brackets and their contents with the specified item in your own code. For example, if the open dataset contains a variable named “var3” that you want to manipulate, you might see the following in the code.
[name of variable to be manipulated]
In your code, you would replace the brackets as follows.
var3
Data Cleaning
Counting missing data
ssc install mdesc
mdesc [list of variables, or leave blank to show all variables]
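As a minimal illustration, assuming the auto example dataset that ships with Stata and that mdesc has already been installed from SSC as shown above:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Report the number and percent of missing values for two variables
mdesc rep78 price

* Leave the variable list blank to report on every variable
mdesc
```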
Edit variable values
replace [name of variable to edit] = [new value]
Create a variable
generate [new variable name] = [new value]
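A short sketch of generate and replace together, again assuming the auto example dataset; the variable name price_k and the cutoff of 15 are made up for illustration:

```stata
sysuse auto, clear

* Create a new variable: price in thousands of dollars
generate price_k = price / 1000

* Edit values: top-code price_k at 15.
* Stata treats missing values as larger than any number, so the
* !missing() guard keeps missing values from being overwritten.
replace price_k = 15 if price_k > 15 & !missing(price_k)
```

Adding an if condition to replace, as here, limits the edit to the observations that meet the condition; without it, every observation receives the new value.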
Create dummy variables from categorical variables
tabulate [categorical variable], g([variable name stem])
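For instance, assuming the auto example dataset and a hypothetical stem rep_, the command below would create one 0/1 indicator for each category of rep78:

```stata
sysuse auto, clear

* rep78 takes the values 1 to 5, so this creates rep_1 through rep_5.
* Observations with missing rep78 are missing on every indicator.
tabulate rep78, generate(rep_)

* Inspect the new indicator variables
describe rep_*
```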
Delete a variable
drop [list of variables to be deleted, separated by spaces]
Drop observations based on some condition
drop if [condition identifying which observations to drop]
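The two forms of drop can be sketched as follows, assuming the auto example dataset; the choice of variable and condition is only illustrative:

```stata
sysuse auto, clear

* Delete a variable (a column)
drop make

* Drop observations (rows) that have a missing repair record
drop if missing(rep78)

* Confirm how many observations remain
count
```

Neither command can be undone within the session, so it may be worth saving the cleaned data under a new file name rather than overwriting the original.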
Merging datasets
One-to-one merge:
merge 1:1 [unique identifier variable shared across datasets] using "[name of file to merge with current dataset]"
Appending datasets
append using "[name of file to append to current dataset]"
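The difference between merging (adding variables) and appending (adding observations) can be sketched with two small made-up datasets. The variable names id, score, and age are hypothetical, and temporary files stand in for files saved on disk. Because the example relies on local macros, it should be run as a single do-file.

```stata
* Build a small hypothetical file of scores, keyed by id
clear
input id score
1 80
2 90
3 70
end
tempfile scores
save `scores'

* Build a second hypothetical dataset with the same ids
clear
input id age
1 25
2 31
3 40
end

* One-to-one merge: adds score to the data in memory, matched on id
merge 1:1 id using `scores'

* _merge records whether each observation matched
tabulate _merge
drop _merge
tempfile first
save `first'

* Append: stack the saved observations below new data in memory
clear
input id score age
4 85 22
end
append using `first'
list
```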
Reshaping datasets
From long to wide format:
reshape wide [list of variables to be made wide], i([unique identifier variable]) j([variable with labels for different wide columns])
From wide to long format (if you have only one variable that needs to be changed):
reshape long [stem for variables that are currently wide], i([unique identifier variable]) j([name for new variable indicating labels for data])
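A round trip between the two formats, using a tiny made-up dataset; the names id, inc, and year are hypothetical:

```stata
* Hypothetical wide data: one row per id, one income column per year
clear
input id inc2019 inc2020
1 100 110
2 200 190
end

* Wide to long: inc2019 and inc2020 become a single variable inc,
* and the new variable year holds the numeric suffix
reshape long inc, i(id) j(year)
list

* Long back to wide: one inc column per value of year
reshape wide inc, i(id) j(year)
list
```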
Descriptive Statistics
Summarize continuous variable (mean, standard deviation, minimum, maximum)
summarize [list of continuous variables, separated by spaces]
Frequency table for categorical variable
tabulate [single categorical variable]
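A brief illustration of both commands, assuming the auto example dataset. The detail and missing options are not covered above but are standard options of these commands.

```stata
sysuse auto, clear

* Mean, standard deviation, minimum, and maximum for several variables
summarize price mpg weight

* The detail option adds percentiles, skewness, and kurtosis
summarize price, detail

* Frequency table for a categorical variable
tabulate foreign

* The missing option adds a row for missing values
tabulate rep78, missing
```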
Bivariate Hypothesis Testing
One-Sample T Test
ttest [continuous variable] == [population mean]
Two-Sample Independent T Test
ttest [continuous variable], by([binary variable])
Two-Sample Dependent T Test
ttest [variable 1] == [variable 2]
Correlation
correlate [list of variables, separated by spaces]
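The tests above might look as follows in practice. The first part assumes the auto example dataset, and the hypothesized mean of 20 is arbitrary. The paired test uses made-up pre and post scores, since it needs two measurements on the same units.

```stata
sysuse auto, clear

* One-sample t test: is mean mpg equal to 20?
ttest mpg == 20

* Two-sample independent t test: mpg for domestic versus foreign cars.
* Equal variances are assumed by default; add the unequal option otherwise.
ttest mpg, by(foreign)

* Correlation matrix; correlate uses only observations that are
* complete on every listed variable
correlate price mpg weight

* Two-sample dependent (paired) t test on hypothetical data
clear
input pre post
10 12
14 15
9 13
11 11
13 16
end
ttest pre == post
```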
Regression Methods
Ordinary least squares regression
regress [dependent variable] [independent variables, separated by spaces]
Binary logistic regression
logit [dependent variable] [independent variables, separated by spaces]
Ordinal logistic regression
ologit [dependent variable] [independent variables, separated by spaces]
Multinomial logistic regression
mlogit [dependent variable] [independent variables, separated by spaces]
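As a rough sketch of the four commands, assuming the auto example dataset. The models are chosen only to show the syntax, not as sensible specifications. The three-category variable rep3 is a hypothetical recode, created so that each outcome category in the multinomial model has a reasonable number of observations.

```stata
sysuse auto, clear

* Ordinary least squares: continuous dependent variable
regress price mpg weight

* Binary logistic regression: 0/1 dependent variable.
* Add the or option to report odds ratios instead of coefficients.
logit foreign mpg weight

* Ordinal logistic regression: ordered categorical dependent variable
ologit rep78 mpg weight

* Collapse rep78 into three categories (hypothetical variable rep3)
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Multinomial logistic regression: unordered categorical dependent variable
mlogit rep3 mpg weight
```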
Miscellaneous Analysis Tools
Conduct analysis for subset of observations
For most common analysis commands, you can restrict the analysis to a subset of observations by adding an “if” qualifier after the variable list. For example, for the linear regression command:
regress [dependent variable] [independent variables, separated by spaces] if [condition]
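For example, assuming the auto example dataset, the same qualifier works across different commands:

```stata
sysuse auto, clear

* Regression restricted to foreign cars
regress price mpg weight if foreign == 1

* Conditions can be combined with & (and) and | (or).
* Missing values count as larger than any number, so guard against
* them when using > or >=.
summarize price if mpg > 20 & !missing(mpg)
```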