Undergraduate Econometrics
Homework 1
Eco 231 – Undergraduate Econometrics
Spring 2021
1. Assume that you are hired to investigate the causal effect of being raised in high-poverty
neighborhoods in the US on outcomes in adulthood (such as health, well-being, social
networks and economic self-sufficiency). Use the sources in the Datasets file on Blackboard
and succinctly answer the following questions:
(a) Mention a suitable dataset that can help you answer the question above. Provide its name
and the website where it can be downloaded.
(b) What is the sample size in this dataset? Is this a reasonable number for your research?
(c) Briefly describe the data you found in part (a). Using the codebook, discuss which variables
are crucial for answering the research question posed above (no more than 10
lines).
2. Suppose you are a researcher interested in studying the relationship between household
characteristics and future educational outcomes of children. You have been advised that one dataset which
satisfies your requirements is the Early Childhood Longitudinal Study, Birth Cohort. Try to find
the data through the sources discussed in the Stata lecture. To answer the following
questions, you will also need to locate the codebooks of this database. (Note: You do not
need the data; the codebooks and webpage PDFs contain all the information you require.)
(a) Briefly describe the objectives of this study and the different rounds of the survey. Mention
the methods employed for data collection. At what ages are the interviews conducted? (Your
answer should not exceed 10 lines).
(b) Describe the restrictions on the use of this database.
(c) How many children are classified as low birth weight in the first round of the survey?
(d) Describe the groups of variables available in the first round. Classify them into child
characteristics, mother characteristics and household characteristics.
(e) Choose two variables you could employ as baseline characteristics of the household. Describe
how these variables would be relevant for studying future outcomes of children.
(f) Calculate the nonresponse rate for each of the two rounds that follow the first, relative to
the number of individuals initially interviewed.
(g) Suppose you are interested in studying how socio-emotional skills develop before the
age of two. Describe which assessments included in this study could be used for this
purpose. Does the study have similar assessments for older children?
(h) Describe which measurements can be used to analyze the cognitive skills of children in
kindergarten.
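For part (f), one simple reading of the nonresponse rate is the unweighted share of first-round respondents who were not interviewed in a later round. The study documentation may define it differently (for example, using sampling weights), so treat this as a sketch of the arithmetic only. The counts below are hypothetical placeholders, not figures from the codebooks.

```python
def nonresponse_rate(initial_count, later_count):
    """Unweighted share of the initial respondents missing from a later round."""
    if initial_count <= 0:
        raise ValueError("initial_count must be positive")
    return 1 - later_count / initial_count


# Hypothetical counts, for illustration only (not taken from the codebooks).
first_round = 1000
later_rounds = {"round 2": 900, "round 3": 850}

for label, count in later_rounds.items():
    print(label, f"{nonresponse_rate(first_round, count):.1%}")
```

Replacing the placeholder counts with the numbers reported in the documentation gives one rate per follow-up round.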
3. This problem asks you to work directly with Stata. Suppose you are a researcher interested in
studying the labor market outcomes of recent college graduates. One suitable public-use dataset
for this purpose is the National Survey of College Graduates (NSCG). To answer the
following questions, you will need to use the attached documentation to identify the variables of
interest.
(a) Explore the survey using the interview questionnaire. Based on this, write down one scientific
question (related to the topic mentioned above) which could be answered using the NSCG.
(b) Use the interview questionnaire provided with the database to identify the variables related
to hours worked per week, weeks worked per year and annual earnings. Notice that information
about weeks worked can be derived using two variables. Also note that in the NSCG15 a value
of 98 for hours worked per week denotes a logical skip.
(c) After handling invalid values properly, create a table showing the mean and standard
deviation of the three variables described in part (b) for men and women separately.
(d) To see the distribution of hours worked per week, create a histogram of this variable
for men and women. Plot the density on the y-axis and use a bin width of 10 on the x-axis.
(e) Create a new variable lnhourwage defined as the (natural) logarithm of annual earnings
divided by total hours worked during the year. Produce a table showing the mean, standard
deviation and 10th and 90th percentiles of this variable for men and women separately. Drop
observations which yield a negative value of this variable.
(f) Use the interview questionnaire to identify the variable which indicates whether a respondent
changed employer and/or job between 2013 and 2015, as well as the variables describing the
reason for the change in case the employer is different between these two years. What is the
proportion of respondents who stayed with the same employer and job during this period?
(g) As a researcher, you are also interested in studying how the gender wage gap varies across
major fields. Using the variable related to the first bachelor's degree (nbamemg) and your
variable lnhourwage, create a table showing the mean (log) hourly wage for women and men
across different majors. Which major shows the largest wage gap?
(h) Run a regression of hourly wages on education separately for men and women. How does the
coefficient on education differ by gender?
(i) Create a variable of potential experience ptlexper, defined as age - education - 6. Run a
regression of hourly wages on education, potential experience and potential experience squared
separately for men and women. Interpret your results.
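The variable definitions in parts (b), (e) and (i) can be illustrated with a small sketch. The assignment itself is to be carried out in Stata; the Python below only traces the arithmetic on made-up records. The field names (hours_week, weeks_year, earnings, educ_years) and all values are hypothetical and are not NSCG variable names. Only the logical-skip code 98 comes from part (b).

```python
import math
import statistics

LOGICAL_SKIP = 98  # code for hours worked per week given in part (b)

# Made-up records; field names are hypothetical, not NSCG variable names.
records = [
    {"female": 0, "hours_week": 40, "weeks_year": 52, "earnings": 52000.0, "age": 30, "educ_years": 16},
    {"female": 0, "hours_week": 50, "weeks_year": 50, "earnings": 90000.0, "age": 41, "educ_years": 18},
    {"female": 1, "hours_week": 98, "weeks_year": 52, "earnings": 30000.0, "age": 28, "educ_years": 16},
    {"female": 1, "hours_week": 35, "weeks_year": 48, "earnings": 42000.0, "age": 35, "educ_years": 18},
    {"female": 1, "hours_week": 40, "weeks_year": 50, "earnings": 72000.0, "age": 45, "educ_years": 16},
]


def ln_hour_wage(rec):
    """Return ln(earnings / (hours per week * weeks per year)), or None if undefined."""
    hours = rec["hours_week"]
    if hours == LOGICAL_SKIP:  # treat the logical skip as missing, not as 98 hours
        return None
    total_hours = hours * rec["weeks_year"]
    if total_hours <= 0 or rec["earnings"] <= 0:
        return None  # the logarithm is undefined here
    return math.log(rec["earnings"] / total_hours)


def potential_experience(rec):
    """Potential experience as defined in part (i): age - education - 6."""
    return rec["age"] - rec["educ_years"] - 6


def summary_by_gender(rows):
    """Mean and sample standard deviation of the log hourly wage for each gender."""
    result = {}
    for gender in (0, 1):
        values = [ln_hour_wage(r) for r in rows if r["female"] == gender]
        # Drop missing values and, as part (e) asks, negative values.
        values = [v for v in values if v is not None and v >= 0]
        sd = statistics.stdev(values) if len(values) > 1 else None
        result[gender] = (statistics.mean(values), sd)
    return result


for gender, (mean, sd) in summary_by_gender(records).items():
    print(gender, round(mean, 3), None if sd is None else round(sd, 3))
```

The point of the sketch is the order of operations: recode the logical skip to missing before any statistic is computed, then build total annual hours from the weekly and yearly components, and only then take the logarithm.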