2 Homework Assignments
2.1 How to submit homework assignments
You can use a snipping tool to cut and paste the relevant output and figures from RStudio into a Word document, type your answers to the questions, and submit the document through Blackboard. I encourage you to learn more about the stargazer package, which transforms your analysis into publishable formats.
2.2 Assignment 1
Deadline: Oct 24, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 2.1
Data Description: The data set includes the joint probability distribution of age and average hourly earnings ahe for 25- to 34-year-old full-time workers in 2015 with an education level that exceeds a high school diploma, i.e., \(P(age=x,ahe=y)\).
Questions
a. Compute the marginal distribution of age, i.e., \(P(age=x)\), in a new data set.
b. Compute the mean of ahe for each value of age, i.e., compute \(E[ahe|age=25]\), \(E[ahe|age=26]\), etc., and plot these conditional expected values of ahe against age. Are they related? Briefly comment.
c. Compute the variance of ahe.
d. Compute the covariance between age and ahe.
e. Compute the correlation between age and ahe.
f. Relate your answers in (d) and (e) to the plot that you constructed in (b).
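As a rough illustration of the calculations in (a) through (e), here is a minimal sketch on a made-up joint distribution. The data frame `toy` and its probabilities are invented for illustration and are not the assignment data; only the column names `age`, `ahe`, and `jointp` follow the layout of df1.

```r
# Toy joint distribution in the same long layout as df1 (values are made up)
toy <- data.frame(
  age    = c(25, 25, 26, 26),
  ahe    = c(10, 20, 10, 20),
  jointp = c(0.3, 0.2, 0.1, 0.4)
)

# Marginal distribution of age: sum the joint probabilities over ahe
marg_age <- aggregate(jointp ~ age, data = toy, FUN = sum)
names(marg_age)[2] <- "p_age"

# Conditional mean E[ahe | age] = sum_y y * P(age, ahe = y) / P(age)
toy$w <- toy$ahe * toy$jointp
cond_mean <- aggregate(w ~ age, data = toy, FUN = sum)
cond_mean$cond_mean <- cond_mean$w / marg_age$p_age

# Unconditional moments: weight each cell by its joint probability
mean_age <- sum(toy$age * toy$jointp)
mean_ahe <- sum(toy$ahe * toy$jointp)
var_age  <- sum((toy$age - mean_age)^2 * toy$jointp)
var_ahe  <- sum((toy$ahe - mean_ahe)^2 * toy$jointp)
cov_xy   <- sum((toy$age - mean_age) * (toy$ahe - mean_ahe) * toy$jointp)
cor_xy   <- cov_xy / sqrt(var_age * var_ahe)
```

Because the data are probabilities rather than a sample, the built-in `var()` and `cov()` functions do not apply directly; the moments are probability-weighted sums.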
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment1 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 1 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/aieodljjthbdhks/Age_HourlyEarnings.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf, col_names = FALSE, skip=1)[2:30,1:11] %>%
  rename(ahe = 1) %>%
  mutate(ahe = as.numeric(ahe)) %>%
  gather(key = "age", value="jointp", -c("ahe")) %>%
  mutate(age = as.numeric(gsub(".*?([0-9]+).*", "\\1", age)) + 23)
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.3 Assignment 2
Deadline: Oct 31, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 3.1
Data description: You can find the data description here.
Questions
a. In 2015, the value of the Consumer Price Index (CPI) was 237.0. In 1996, the value of the CPI was 156.9. Create a new variable in your data frame that expresses all earnings in real 2015 dollars. Use this variable to answer the next questions.
b. Construct a 95% confidence interval for the mean of ahe for high school graduates in 1996.
c. Construct a 95% confidence interval for the mean of ahe for high school graduates in 2015.
d. Construct a 95% confidence interval for the mean of ahe for college graduates in 1996.
e. Construct a 95% confidence interval for the mean of ahe for college graduates in 2015.
f. Did the inflation-adjusted wages of high school graduates increase from 1996 to 2015? Use statistical inference to answer.
g. Did the inflation-adjusted wages of college graduates increase from 1996 to 2015? Use statistical inference to answer.
h. Did the gap between earnings of college and high school graduates increase? Use statistical inference to answer.
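The mechanics behind these questions can be sketched on simulated data. Everything in `toy` below is made up, including the seed and the helper `ci_mean`, which is a hypothetical name; only the two CPI values come from question (a). The sketch uses the large-sample normal critical value 1.96.

```r
set.seed(1)  # arbitrary seed so the toy numbers are reproducible
toy <- data.frame(
  year = rep(c(1996, 2015), each = 50),
  ahe  = c(rnorm(50, mean = 12, sd = 4), rnorm(50, mean = 20, sd = 6))
)

# Express 1996 earnings in 2015 dollars using the CPI values from question (a)
cpi_1996 <- 156.9
cpi_2015 <- 237.0
toy$ahe_real <- ifelse(toy$year == 1996, toy$ahe * cpi_2015 / cpi_1996, toy$ahe)

# Large-sample 95% confidence interval for a mean: mean +/- 1.96 * SE
ci_mean <- function(x, z = 1.96) {
  se <- sd(x) / sqrt(length(x))
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}
x1 <- toy$ahe_real[toy$year == 1996]
x2 <- toy$ahe_real[toy$year == 2015]
ci_1996 <- ci_mean(x1)
ci_2015 <- ci_mean(x2)

# Difference in means, with SE = sqrt(s1^2 / n1 + s2^2 / n2)
diff_hat <- mean(x2) - mean(x1)
diff_se  <- sqrt(var(x1) / length(x1) + var(x2) / length(x2))
t_stat   <- diff_hat / diff_se
```

The same t-statistic is reported by `t.test(x2, x1)`, which uses the unequal-variance (Welch) standard error by default.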
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment2 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 2 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/hbi82scuz9q4k11/CPS96_15.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(ahe = ahe + rnorm(length(ahe)))
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.4 Assignment 3
Deadline: Nov 7, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 4.1
Data description: You can find the data description here.
Questions
a. Construct a scatterplot of growth and tradeshare with a fitted regression line on top.
b. Look at the data set and find Malta on your graph. Why is Malta an outlier?
c. Using all the observations, run a regression of growth on tradeshare. Interpret the intercept and the slope. Predict the growth rate for a country with a trade share of 0.5 and another with a trade share equal to 1.
d. Estimate the regression without Malta and interpret the coefficients. Should Malta be excluded from the regression? Briefly comment.
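A minimal sketch of fitting, predicting, and refitting without an outlier is shown below on simulated data. The data frame `toy`, the seed, the coefficients used to simulate it, and the cutoff `tradeshare < 1.5` used to drop the artificial outlier are all assumptions for illustration, not features of the Growth data.

```r
set.seed(2)  # arbitrary seed for the simulated example
toy <- data.frame(tradeshare = runif(30, min = 0.1, max = 1))
toy$growth <- 1 + 2 * toy$tradeshare + rnorm(30, sd = 0.5)

# Append one artificial observation far to the right of the others
toy <- rbind(toy, data.frame(tradeshare = 2, growth = 6.5))

# Regression using all observations
fit_all <- lm(growth ~ tradeshare, data = toy)

# Predicted growth at trade shares of 0.5 and 1
pred <- predict(fit_all, newdata = data.frame(tradeshare = c(0.5, 1)))

# Regression excluding the artificial outlier
fit_sub <- lm(growth ~ tradeshare, data = subset(toy, tradeshare < 1.5))

# Compare the two sets of coefficients side by side
rbind(all = coef(fit_all), without_outlier = coef(fit_sub))
```

The two predictions differ by exactly half the estimated slope, since the two trade shares are 0.5 apart.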
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment3 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 3 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/lbk73b0amzfj8px/Growth.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(growth = growth + rnorm(length(growth))/5)
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.5 Assignment 4
Deadline: Nov 14, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 5.3
Data description: You can find the data description here.
Questions
a. Run a regression of birthweight on age. Interpret the coefficient on age. Is the coefficient statistically significant?
b. Estimate the mean and the standard error of birth weight for (i) mothers who smoked during the pregnancy and (ii) mothers who did not smoke during the pregnancy.
c. Estimate the difference between (i) and (ii). Construct a 95% confidence interval for the difference in the average birthweight for smoking and nonsmoking mothers.
d. Run a regression of birthweight on the binary variable smoker. Explain how the estimated intercept and slope relate to your previous answers. How about the standard error of \(\hat{\beta}_1\)?
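The link between group means and a regression on a binary variable can be checked on simulated data. The sketch below uses a made-up data frame `toy` with invented parameters; it relies on lm()'s default homoskedasticity-only standard error, which corresponds to the pooled-variance formula for a difference in means.

```r
set.seed(3)  # arbitrary seed for the simulated example
toy <- data.frame(smoker = rep(c(0, 1), times = c(60, 40)))
toy$birthweight <- 3400 - 250 * toy$smoker + rnorm(100, sd = 500)

# Group means computed directly
y0 <- toy$birthweight[toy$smoker == 0]
y1 <- toy$birthweight[toy$smoker == 1]
m0 <- mean(y0)
m1 <- mean(y1)

# Regression on the binary variable:
# the intercept is the mean for smoker == 0, the slope is the difference in means
fit <- lm(birthweight ~ smoker, data = toy)

# Pooled-variance standard error of the difference in means
n0 <- length(y0)
n1 <- length(y1)
s2_pooled <- ((n0 - 1) * var(y0) + (n1 - 1) * var(y1)) / (n0 + n1 - 2)
se_diff   <- sqrt(s2_pooled * (1 / n0 + 1 / n1))

# Default standard error of the slope reported by lm()
se_lm <- coef(summary(fit))["smoker", "Std. Error"]
```

With an unequal-variance standard error for the difference in means, the match with `se_lm` is no longer exact, which is one thing to look for when comparing answers.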
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment4 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 4 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.6 Assignment 5
Deadline: Nov 21, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 6.1
Data description: You can find the data description here.
Questions
a. Regress (i) birthweight on smoker and (ii) birthweight on smoker, alcohol, and nprevist. Compare the estimated coefficient on smoker in (i) and (ii). Does the regression suffer from omitted variable bias?
b. Predict the birthweight for a child whose mother smoked during the pregnancy, did not drink alcohol, and had 8 prenatal care visits.
c. Compare the \(R^2\) and adjusted-\(R^2\) from (ii). Why are they so similar?
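The prediction in (b) and the comparison in (c) can be sketched on simulated data. The data frame `toy`, the seed, and every parameter used to simulate it are made up; the adjusted \(R^2\) is computed by hand as \(1 - (1-R^2)\frac{n-1}{n-k-1}\) with \(k\) regressors and compared with the value reported by `summary()`.

```r
set.seed(4)  # arbitrary seed for the simulated example
n <- 200
toy <- data.frame(
  smoker   = rbinom(n, 1, 0.3),
  alcohol  = rbinom(n, 1, 0.1),
  nprevist = rpois(n, 10)
)
toy$birthweight <- 3000 - 200 * toy$smoker - 50 * toy$alcohol +
  30 * toy$nprevist + rnorm(n, sd = 500)

# (i) short regression and (ii) regression with the additional controls
fit_i  <- lm(birthweight ~ smoker, data = toy)
fit_ii <- lm(birthweight ~ smoker + alcohol + nprevist, data = toy)

# Predicted birthweight: smoker, no alcohol, 8 prenatal visits
pred <- predict(fit_ii,
                newdata = data.frame(smoker = 1, alcohol = 0, nprevist = 8))

# Adjusted R^2 by hand: the penalty factor (n - 1) / (n - k - 1) is close to 1
# when n is large relative to the number of regressors k
s <- summary(fit_ii)
k <- 3
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - k - 1)
c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, adj_manual = adj_manual)
```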
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment5 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 5 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.7 Assignment 6
Deadline: Nov 28, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 7.1
Data description: You can find the data description here.
Questions
a. Regress birthweight on smoker, alcohol, nprevist, and unmarried. Interpret the coefficient on unmarried.
b. Construct a 95% confidence interval for the coefficient. Is it statistically significant? Is the magnitude of the coefficient large?
c. Looking at this regression, a family advocacy group claims that higher rates of marriage will lead to healthier babies, so one obvious public policy is to encourage marriage. Do you agree?
d. Consider the data set that you have and briefly discuss what variables can be added to the regression to help answer question (c).
e. Run the regression with these additional controls. How did the coefficient on unmarried change with these additional controls?
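One way to build the interval in (b) is sketched below on simulated data. The data frame `toy`, the seed, and the simulation parameters are invented for illustration. The sketch uses lm()'s default homoskedasticity-only standard errors; a heteroskedasticity-robust standard error would change the interval.

```r
set.seed(5)  # arbitrary seed for the simulated example
n <- 300
toy <- data.frame(
  smoker    = rbinom(n, 1, 0.2),
  unmarried = rbinom(n, 1, 0.25)
)
toy$birthweight <- 3400 - 200 * toy$smoker - 180 * toy$unmarried +
  rnorm(n, sd = 550)

fit <- lm(birthweight ~ smoker + unmarried, data = toy)

# Estimate and standard error for the coefficient on unmarried
est <- coef(summary(fit))["unmarried", c("Estimate", "Std. Error")]
b   <- unname(est["Estimate"])
se  <- unname(est["Std. Error"])

# Large-sample interval: estimate +/- 1.96 * SE
ci_normal <- b + c(-1, 1) * 1.96 * se

# confint() uses the t distribution with the residual degrees of freedom,
# so its interval is slightly wider than the normal-based one
ci_t <- confint(fit, "unmarried", level = 0.95)
```

The coefficient is statistically significant at the 5% level exactly when the interval excludes zero.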
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment6 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 6 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)
head(df1)
#CONDUCT THE ANALYSIS BELOW
2.8 Assignment 7
Deadline: Dec 5, 2021, Midnight
Source: Stock and Watson, \(4^{th}\) Edition, Exercise 8.1
Data description: You can find the data description here.
Questions
a. Using a regression, show the average infant mortality rate infrate for cities with lead pipes and for cities with nonlead pipes. Is there a statistically significant difference in the averages?
b. The amount of lead leached from lead pipes depends on the chemistry of the water running through the pipes. The lower the ph, the more acidic the water is and the more lead is leached. Run a regression of infrate on lead, ph, and the interaction term lead \(\times\) ph.
c. Does lead have a statistically significant effect on infant mortality? Explain.
d. Does the effect of lead on infant mortality depend on ph? Is this dependence statistically significant?
e. Construct a 95% confidence interval for the effect of lead on infant mortality when ph is 6.5.
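In a model with an interaction, the effect of lead at a given ph is \(\beta_{lead} + ph \cdot \beta_{lead \times ph}\), and its variance combines the variances and the covariance of the two estimates. The sketch below illustrates this on simulated data; the data frame `toy`, the seed, and the simulation parameters are made up, and the standard errors are lm()'s defaults rather than robust ones.

```r
set.seed(6)  # arbitrary seed for the simulated example
n <- 150
toy <- data.frame(lead = rbinom(n, 1, 0.6), ph = runif(n, 5.5, 9))
toy$infrate <- 0.9 + 0.45 * toy$lead - 0.07 * toy$ph -
  0.055 * toy$lead * toy$ph + rnorm(n, sd = 0.1)

# lead * ph expands to lead + ph + lead:ph
fit <- lm(infrate ~ lead * ph, data = toy)
b <- coef(fit)
V <- vcov(fit)

# Effect of lead at ph = 6.5 and its standard error
ph0 <- 6.5
effect <- unname(b["lead"] + ph0 * b["lead:ph"])
se <- sqrt(V["lead", "lead"] + ph0^2 * V["lead:ph", "lead:ph"] +
             2 * ph0 * V["lead", "lead:ph"])
ci <- effect + c(-1, 1) * 1.96 * se

# Equivalent shortcut: recenter ph at 6.5, so that the coefficient on lead
# is the effect of lead at ph = 6.5 and its reported SE applies directly
fit_c <- lm(infrate ~ lead * I(ph - 6.5), data = toy)
```

The recentered regression is often the more convenient route, because the estimate and its standard error can be read straight from the coefficient table.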
Header for the R script
Start a new R script, copy/paste the header below, and save it to Dropbox\EC282\Assignment7 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.
###############################################################################
# list the packages we need and load them; install them automatically if we don't have them
# add any package that you need to the list
need <- c('glue', 'dplyr', 'readxl', 'ggplot2', 'tidyr', 'AER', 'scales', 'mvtnorm',
          'stargazer', 'httr', 'repmis')
have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only = TRUE))
# Save the R script to the assignment 7 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory
#this clears the workspace
rm(list = ls())
#this turns off scientific notation in printed output
options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/5nszqnejl7uu9f5/lead_mortality.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data and check if it worked
df1 <- read_excel(tdf) %>%
  mutate(infrate1 = infrate + rnorm(length(infrate))/50)
head(df1)
#CONDUCT THE ANALYSIS BELOW