Stata Cheat Sheet

Last Updated 2019-05-16

The purpose of this review is to list common data analysis procedures used in quantitative methods research and outline the Stata code for each. This document will be updated over time. The commands here are based on Stata 14. I would not recommend starting with this document if you are just beginning with Stata. There are often multiple ways to conduct the same analysis; I present only one here for each item.

This cheat sheet uses placeholders: when you see something in [square brackets], replace the brackets and their contents with the item described. For example, if your open dataset has a variable named “var3” that you want to manipulate, you might see the following in the code.

[name of variable to be manipulated]

In your code, you would replace the brackets as follows.

var3

Data Cleaning

Counting missing data

ssc install mdesc
mdesc [list of variables, or leave blank to show all variables]
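
As a quick illustration, the following sketch runs mdesc on the auto dataset that ships with Stata. The dataset and variables are assumptions chosen only for demonstration. The ssc install line needs to be run only once per Stata installation.

sysuse auto, clear
* Report the count and percent of missing values for two variables
mdesc rep78 price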

Edit variable values

replace [name of variable to edit] = [new value]

Create a variable

generate [new variable name] = [new value]
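
Here is a small sketch of how generate and replace can work together, again using the auto dataset that ships with Stata. The variable name price_k and the cutoff of 10 are made up for illustration.

sysuse auto, clear
* Create a new variable holding price in thousands of dollars
generate price_k = price / 1000
* Overwrite values only for the observations that meet a condition
replace price_k = 10 if price_k > 10 & !missing(price_k)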

Create dummy variables from categorical variables

tabulate [categorical variable], g([variable name stem])
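
For example, assuming the auto dataset that ships with Stata and a hypothetical stem rep_, the command below should create one 0/1 variable for each observed category of rep78. Observations with a missing rep78 get missing values in the new variables.

sysuse auto, clear
* Creates rep_1, rep_2, ... (one indicator per category of rep78)
tabulate rep78, generate(rep_)
* Inspect the newly created indicator variables
describe rep_*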

Delete a variable

drop [list of variables to be deleted, separated by spaces]

Drop observations based on some condition

drop if [condition identifying which observations to drop]
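
A minimal sketch, assuming the auto dataset that ships with Stata: drop the observations that have no repair record, then check how many remain. Dropped observations are gone from the dataset in memory, so run this on a copy or before saving under a new name.

sysuse auto, clear
* Drop observations with a missing value on rep78
drop if missing(rep78)
* Confirm how many observations remain
count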

Merging datasets

One-to-one merge:

merge 1:1 [unique identifier variable shared across datasets] using "[name of file to merge with current dataset]"
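
The sketch below builds two tiny made-up datasets and merges them one-to-one. All variable names and values are hypothetical. It is meant to be run as a single do-file, because the temporary file is removed when the do-file ends. After a merge, Stata creates a variable named _merge that records whether each observation was matched.

* First dataset: test scores, saved to a temporary file
clear
input id score
1 80
2 90
3 70
end
tempfile scores
save `scores'

* Second dataset: ages, kept in memory as the master data
clear
input id age
1 25
2 31
4 40
end

* Match observations on id, then check which ones matched
merge 1:1 id using `scores'
tabulate _merge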

Appending datasets

append using "[name of file to append to current dataset]"
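
As an illustration, this sketch stacks two small made-up datasets that share the same variables. The names and values are hypothetical. Note that the file name follows the keyword using. Run it as a single do-file so the temporary file remains available.

* First batch of observations, saved to a temporary file
clear
input id score
1 80
2 90
end
tempfile batch1
save `batch1'

* Second batch, kept in memory
clear
input id score
3 70
4 85
end

* Add the observations from the first batch below the current ones
append using `batch1'
list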

Reshaping datasets

From long to wide format:

reshape wide [list of variables to be made wide], i([unique identifier variable]) j([variable with labels for different wide columns])

From wide to long format (if you have only one variable that needs to be changed):

reshape long [stem for variables that are currently wide], i([unique identifier variable]) j([name for new variable indicating labels for data])
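
To make the two directions concrete, here is a sketch with a made-up wide dataset in which score1 and score2 hold the same measure at two time points. The names id, score, and wave are hypothetical.

* Wide data: one row per person, one column per time point
clear
input id score1 score2
1 80 85
2 90 88
end

* Wide to long: one row per person per wave; wave takes the values 1 and 2
reshape long score, i(id) j(wave)
list

* Long back to wide: recreates score1 and score2
reshape wide score, i(id) j(wave)
list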

Descriptive Statistics

Summarize continuous variable (mean, standard deviation, minimum, maximum)

summarize [list of continuous variables, separated by spaces]

Frequency table for categorical variable

tabulate [single categorical variable]
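
A brief sketch using the auto dataset that ships with Stata. The detail and missing options are additions not covered above: detail adds percentiles, skewness, and kurtosis, and missing shows missing values as their own row.

sysuse auto, clear
* Mean, standard deviation, minimum, and maximum
summarize price mpg
* Extended summary for a single variable
summarize price, detail
* Frequency table, including a row for missing values
tabulate rep78, missing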

Bivariate Hypothesis Testing

One-Sample T Test

ttest [continuous variable] == [population mean]

Two-Sample Independent T Test

ttest [continuous variable], by([binary variable])

Two-Sample Dependent T Test

ttest [variable 1] == [variable 2]
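
The three t tests above might look as follows with the example datasets that ship with Stata. I assume auto for the first two and bpwide (blood pressure measured before and after, in the variables bp_before and bp_after) for the dependent test. The hypothesized mean of 20 is arbitrary.

sysuse auto, clear
* One-sample: is mean mpg different from 20?
ttest mpg == 20
* Two-sample independent: does mean mpg differ between domestic and foreign cars?
ttest mpg, by(foreign)

sysuse bpwide, clear
* Two-sample dependent (paired): before versus after on the same patients
ttest bp_before == bp_after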

Correlation

correlate [list of variables, separated by spaces]
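
For instance, with the auto dataset that ships with Stata. The second command, pwcorr, is not covered above; it is shown here as one way to obtain p-values, since correlate does not report them.

sysuse auto, clear
* Correlation matrix, using only observations complete on all listed variables
correlate price mpg weight
* Pairwise correlations with significance levels
pwcorr price mpg weight, sig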

Regression Methods

Ordinary least squares regression

regress [dependent variable] [independent variables, separated by spaces]

Binary logistic regression

logit [dependent variable] [independent variables, separated by spaces]

Ordinal logistic regression

ologit [dependent variable] [independent variables, separated by spaces]

Multinomial logistic regression

mlogit [dependent variable] [independent variables, separated by spaces]
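
As a rough sketch, the four models could be run on the auto dataset that ships with Stata as shown below. The choice of outcomes and predictors is only illustrative, not a recommended specification. Because rep78 has very few cars in its lowest categories, the ordinal and multinomial estimates may be unstable.

sysuse auto, clear
* Ordinary least squares: continuous outcome
regress price mpg weight
* Binary logistic: 0/1 outcome; add the or option to report odds ratios
logit foreign mpg weight
logit foreign mpg weight, or
* Ordinal logistic: ordered categorical outcome
ologit rep78 mpg weight
* Multinomial logistic: unordered categorical outcome
mlogit rep78 mpg weight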

Miscellaneous Analysis Tools

Conduct analysis for subset of observations

For most common analysis commands, you can restrict the analysis to a subset of observations by adding an “if” qualifier to the end of the command. For example, for the linear regression command:

regress [dependent variable] [independent variables, separated by spaces] if [condition]
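
A short sketch with the auto dataset that ships with Stata. One thing to watch for: Stata treats missing values as larger than any number, so a condition such as rep78 > 3 also selects observations where rep78 is missing unless you exclude them explicitly.

sysuse auto, clear
* Regression restricted to domestic cars
regress price mpg weight if foreign == 0
* Regression restricted to cars with a repair record above 3, excluding missing values
regress price mpg weight if rep78 > 3 & !missing(rep78)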