# 2 Homework Assignments

## 2.1 How to submit homework assignments

You can use a snipping tool to cut and paste the relevant output and figures from RStudio into a Word document, type your answers to the questions, and submit the document through Blackboard. I encourage you to learn more about the stargazer package, which transforms your analysis into publishable formats.
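
As a minimal sketch of what stargazer does, the snippet below fits a toy regression on R's built-in `mtcars` data (not course data) and prints a formatted table. The model and title are illustrative assumptions; the call is guarded so the script still runs if the package is not installed.

```r
# Toy regression on the built-in mtcars data (illustration only)
fit <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("stargazer", quietly = TRUE)) {
  # type = "text" prints a plain-text table to the console
  stargazer::stargazer(fit, type = "text", title = "Toy regression")
} else {
  # Fallback when stargazer is not installed
  print(summary(fit))
}
```

Setting `type = "html"` together with the `out` argument (a file name of your choice) writes the table to a file that can be opened in a browser and pasted into Word.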

## 2.2 Assignment 1

Deadline: Oct 24, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 2.1

Data description: The data set includes the joint probability distribution of age and average hourly earnings ahe for 25- to 34-year-old full-time workers in 2015 with an education level that exceeds a high school diploma, i.e. $$P(age=x,ahe=y)$$.

Questions

a. Compute the marginal distribution of age, i.e. $$P(age=x)$$, in a new data set.
b. Compute the mean of ahe for each value of age, i.e. compute $$E[ahe|age=25]$$, $$E[ahe|age=26]$$, etc., and plot these conditional expected values of ahe against age. Are they related? Briefly comment.
c. Compute the variance of ahe.
d. Compute the covariance between age and ahe.
e. Compute the correlation between age and ahe.
f. Relate your answers in (d) and (e) to the plot that you constructed in (b).
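
The mechanics behind these questions can be sketched on a small made-up joint distribution. The table `toy` below mimics the long layout of df1 (columns ahe, age, jointp) but its numbers are invented for illustration, and all object names are hypothetical.

```r
# Toy joint distribution in the same long layout as df1 (ahe, age, jointp).
# The probabilities are made up; they are NOT the assignment data.
toy <- data.frame(
  age    = c(25, 25, 26, 26),
  ahe    = c(10, 20, 10, 20),
  jointp = c(0.30, 0.20, 0.10, 0.40)
)

# Marginal distribution of age: sum the joint probabilities over ahe
marg_age <- aggregate(jointp ~ age, data = toy, FUN = sum)
names(marg_age)[2] <- "p_age"

# Conditional mean E[ahe | age] = sum_y y * P(age, ahe = y) / P(age)
toy$w <- toy$ahe * toy$jointp
cond_mean <- aggregate(w ~ age, data = toy, FUN = sum)
cond_mean$e_ahe <- cond_mean$w / marg_age$p_age

# Unconditional means, variances, covariance, and correlation
mu_age  <- sum(toy$age * toy$jointp)
mu_ahe  <- sum(toy$ahe * toy$jointp)
var_age <- sum((toy$age - mu_age)^2 * toy$jointp)
var_ahe <- sum((toy$ahe - mu_ahe)^2 * toy$jointp)
cov_age_ahe <- sum((toy$age - mu_age) * (toy$ahe - mu_ahe) * toy$jointp)
cor_age_ahe <- cov_age_ahe / sqrt(var_age * var_ahe)
```

Note that every moment is a probability-weighted sum over the cells of the table; functions such as `var()` or `cor()` would be wrong here because they treat each row as one equally weighted observation.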

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment1 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl', 'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 1 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/aieodljjthbdhks/Age_HourlyEarnings.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf, col_names = FALSE, skip=1)[2:30,1:11] %>%
rename(ahe = 1) %>%
mutate(ahe = as.numeric(ahe)) %>%
gather(key = "age", value="jointp", -c("ahe")) %>%
mutate(age =  as.numeric(gsub(".*?([0-9]+).*", "\\1", age)) + 23)

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.3 Assignment 2

Deadline: Oct 31, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 3.1

Data description: You can find the data description here.

Questions

a. In 2015, the value of the Consumer Price Index (CPI) was 237.0. In 1996, the value of the CPI was 156.9. Create a new variable in your data frame that expresses all earnings in real 2015 dollars. Use this variable to answer the next questions.

b. Construct a 95% confidence interval for the mean of ahe for high school graduates in 1996.

c. Construct a 95% confidence interval for the mean of ahe for high school graduates in 2015.

d. Construct a 95% confidence interval for the mean of ahe for college graduates in 1996.

e. Construct a 95% confidence interval for the mean of ahe for college graduates in 2015.

f. Did the inflation-adjusted wages of high school graduates increase from 1996 to 2015? Use statistical inference to answer.

g. Did the inflation-adjusted wages of college graduates increase from 1996 to 2015? Use statistical inference to answer.

h. Did the gap between earnings of college and high school graduates increase? Use statistical inference to answer.
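
The building blocks for these questions can be sketched on simulated earnings. Everything below except the two CPI values is an assumption for illustration: the data are random draws, not the CPS sample, and `ci_mean` and `diff_means` are hypothetical helper names.

```r
set.seed(1)  # arbitrary seed so the simulated numbers are reproducible

# Simulated earnings, NOT the CPS data; sample sizes and means are made up
ahe_1996 <- rnorm(200, mean = 12, sd = 5)
ahe_2015 <- rnorm(250, mean = 21, sd = 9)

# Express 1996 earnings in 2015 dollars using the CPI values from question (a)
ahe_1996_real <- ahe_1996 * 237.0 / 156.9

# Large-sample confidence interval for a mean: mean +/- z * SE
ci_mean <- function(x, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  se <- sd(x) / sqrt(length(x))
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Difference in means of two independent samples, mean(y) - mean(x),
# with SE = sqrt(SE_x^2 + SE_y^2)
diff_means <- function(x, y, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)
  d  <- mean(y) - mean(x)
  se <- sqrt(var(x) / length(x) + var(y) / length(y))
  c(diff = d, se = se, t = d / se, lower = d - z * se, upper = d + z * se)
}

ci_mean(ahe_1996_real)
diff_means(ahe_1996_real, ahe_2015)
```

A change in the gap between two groups is a difference of two differences; with four independent samples, its standard error is the square root of the sum of the four squared standard errors.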

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment2 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl',  'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 2 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/hbi82scuz9q4k11/CPS96_15.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf) %>%
mutate(ahe = ahe + rnorm(length(ahe)))

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.4 Assignment 3

Deadline: Nov 7, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 4.1

Data description: You can find the data description here.

Questions

a. Construct a scatterplot of growth and tradeshare with a fitted regression line overlaid.

b. Look at the data set and find Malta on your graph. Why is Malta an outlier?

c. Using all the observations, run a regression of growth on tradeshare. Interpret the intercept and the slope. Predict the growth rate for a country with a trade share of 0.5 and for another with a trade share of 1.

d. Estimate the regression without Malta and interpret the coefficients. Should Malta be excluded from the regression? Briefly comment.
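
A minimal sketch of the workflow on simulated data follows. The variable names growth and tradeshare come from the questions; the data, the identifier column `country`, and the single extreme point labelled "Outlier" are assumptions made up for illustration.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Simulated data, NOT the Growth data set
n <- 60
toy <- data.frame(tradeshare = runif(n, 0.1, 1))
toy$growth <- 1 + 2 * toy$tradeshare + rnorm(n)

# Append one observation with an unusually large trade share
toy <- rbind(toy, data.frame(tradeshare = 2, growth = 6.5))
toy$country <- c(paste0("C", 1:n), "Outlier")  # hypothetical identifier column

# Regression using all observations
fit_all <- lm(growth ~ tradeshare, data = toy)

# Predicted growth at trade shares of 0.5 and 1
pred <- predict(fit_all, newdata = data.frame(tradeshare = c(0.5, 1)))

# Same regression without the extreme observation
fit_sub <- lm(growth ~ tradeshare, data = subset(toy, country != "Outlier"))

rbind(all = coef(fit_all), without_outlier = coef(fit_sub))
```

Comparing the two rows of coefficients shows how much a single high-leverage observation moves the fitted line.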

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment3 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl', 'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 3 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/lbk73b0amzfj8px/Growth.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked

df1 <- read_excel(tdf) %>%
mutate(growth = growth + rnorm(length(growth))/5)

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.5 Assignment 4

Deadline: Nov 14, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 5.3

Data description: You can find the data description here.

Questions

a. Run a regression of birthweight on age. Interpret the coefficient on age. Is the coefficient statistically significant?

b. Estimate the mean and the standard error of birthweight for (i) mothers who smoked during the pregnancy and (ii) mothers who did not smoke during the pregnancy.

c. Estimate the difference between (i) and (ii). Construct a 95% confidence interval for the difference in the average birthweight for smoking and nonsmoking mothers.

d. Run a regression of birthweight on the binary variable smoker. Explain how the estimated intercept and slope relate to your previous answers. How about the standard error of $$\hat{\beta}_1$$?
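
The link between group means and a regression on a binary variable can be sketched on simulated data. The variable names follow the questions, but the numbers below are invented and are not the birthweight data.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Simulated data, NOT the birthweight_smoking data set
toy <- data.frame(smoker = rbinom(300, 1, 0.2))
toy$birthweight <- 3400 - 250 * toy$smoker + rnorm(300, sd = 500)

# Group means and their standard errors
y0 <- toy$birthweight[toy$smoker == 0]
y1 <- toy$birthweight[toy$smoker == 1]
m0 <- mean(y0); se0 <- sd(y0) / sqrt(length(y0))
m1 <- mean(y1); se1 <- sd(y1) / sqrt(length(y1))

# Difference in means with a large-sample 95% confidence interval
d    <- m1 - m0
se_d <- sqrt(se0^2 + se1^2)
ci_d <- d + c(-1, 1) * qnorm(0.975) * se_d

# Regression on the dummy: intercept = mean of nonsmokers,
# slope = mean of smokers minus mean of nonsmokers
fit <- lm(birthweight ~ smoker, data = toy)
coef(fit)
```

The standard error that `summary(fit)` reports for the slope assumes homoskedasticity, so it need not equal `se_d`, which allows the two groups to have different variances.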

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment4 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl',  'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 4 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf) %>%
mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.6 Assignment 5

Deadline: Nov 21, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 6.1

Data description: You can find the data description here.

Questions

a. Regress (i) birthweight on smoker and (ii) birthweight on smoker, alcohol, and nprevist. Compare the estimated coefficient on smoker in (i) and (ii). Does regression (i) suffer from omitted variable bias?

b. Predict the birthweight for a child whose mother smoked during the pregnancy, did not drink alcohol, and had 8 prenatal care visits.

c. Compare the $$R^2$$ and adjusted-$$R^2$$ from (ii). Why are they so similar?
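
The three steps can be sketched on simulated data. Variable names follow the questions; the data-generating numbers are assumptions chosen so that nprevist is correlated with smoker, which is what creates omitted variable bias in the short regression.

```r
set.seed(4)  # arbitrary seed for reproducibility

# Simulated data, NOT the birthweight_smoking data set
n <- 500
toy <- data.frame(smoker  = rbinom(n, 1, 0.2),
                  alcohol = rbinom(n, 1, 0.05))
# Smokers have fewer prenatal visits on average in this toy example
toy$nprevist <- rpois(n, lambda = 11 - 2 * toy$smoker)
toy$birthweight <- 3000 - 200 * toy$smoker - 50 * toy$alcohol +
  30 * toy$nprevist + rnorm(n, sd = 500)

fit_i  <- lm(birthweight ~ smoker, data = toy)
fit_ii <- lm(birthweight ~ smoker + alcohol + nprevist, data = toy)

# Coefficient on smoker with and without the extra regressors
c(short = unname(coef(fit_i)["smoker"]), long = unname(coef(fit_ii)["smoker"]))

# Prediction for a smoker who did not drink and had 8 prenatal visits
pred <- predict(fit_ii,
                newdata = data.frame(smoker = 1, alcohol = 0, nprevist = 8))

# Adjusted R^2 = 1 - (n - 1) / (n - k - 1) * (1 - R^2), with k = 3 regressors
s <- summary(fit_ii)
k <- 3
adj_manual <- 1 - (n - 1) / (n - k - 1) * (1 - s$r.squared)
c(r2 = s$r.squared, adj_r2 = s$adj.r.squared, adj_manual = adj_manual)
```

The adjustment factor $$(n-1)/(n-k-1)$$ is close to 1 whenever n is large relative to k, which is worth keeping in mind for question (c).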

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment5 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl', 'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 5 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf) %>%
mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.7 Assignment 6

Deadline: Nov 28, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 7.1

Data description: You can find the data description here.

Questions

a. Regress birthweight on smoker, alcohol, nprevist, and unmarried. Interpret the coefficient on unmarried.

b. Construct a 95% confidence interval for the coefficient on unmarried. Is it statistically significant? Is the magnitude of the coefficient large?

c. Looking at this regression, a family advocacy group claims that higher rates of marriage will lead to healthier babies, and thus one obvious public policy is to encourage marriage. Do you agree?

d. Consider the data set that you have and briefly discuss what variables can be added to the regression to help to solve question (c).

e. Run the regression with these additional controls. How did the coefficient on unmarried change with these additional controls?
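
A sketch of the confidence interval and of the effect of adding a control, on simulated data. The control `educ` is a hypothetical variable name used only for illustration, and the data are invented so that it is correlated with unmarried.

```r
set.seed(5)  # arbitrary seed for reproducibility

# Simulated data, NOT the birthweight_smoking data set
n <- 400
toy <- data.frame(unmarried = rbinom(n, 1, 0.3))
# Hypothetical control that is correlated with marital status
toy$educ <- round(13 - 1.5 * toy$unmarried + rnorm(n, sd = 2))
toy$birthweight <- 3200 - 100 * toy$unmarried + 20 * toy$educ +
  rnorm(n, sd = 500)

fit_short <- lm(birthweight ~ unmarried, data = toy)
fit_long  <- lm(birthweight ~ unmarried + educ, data = toy)

# 95% confidence interval for the coefficient on unmarried
ci <- confint(fit_short, "unmarried", level = 0.95)

# The same interval by hand: estimate +/- t critical value * standard error
est <- coef(summary(fit_short))["unmarried", "Estimate"]
se  <- coef(summary(fit_short))["unmarried", "Std. Error"]
ci_manual <- est + c(-1, 1) * qt(0.975, df.residual(fit_short)) * se

# How the coefficient moves once the control is added
c(short = unname(coef(fit_short)["unmarried"]),
  long  = unname(coef(fit_long)["unmarried"]))
```

`confint()` on an `lm` object uses the t distribution and the default homoskedasticity-only standard errors; in large samples the t critical value is close to 1.96.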

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment6 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl', 'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 6 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/z8r6hc0r4ytt4f8/birthweight_smoking.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf) %>%
mutate(birthweight = birthweight + rnorm(length(birthweight)) * 50)

head(df1)

#CONDUCT THE ANALYSIS BELOW

## 2.8 Assignment 7

Deadline: Dec 5, 2021, Midnight

Source: Stock and Watson, $$4^{th}$$ Edition, Exercise 8.1

Data description: You can find the data description here.

Questions

a. Using a regression, show the average infant mortality rate infrate for cities with lead pipes and for cities with nonlead pipes. Is there a statistically significant difference in the averages?

b. The amount of lead leached from lead pipes depends on the chemistry of the water running through the pipes. The lower the pH, the more acidic the water is and the more lead is leached. Run a regression of infrate on lead, ph, and the interaction term lead $$\times$$ ph.

c. Does lead have a statistically significant effect on infant mortality? Explain.

d. Does the effect of lead on infant mortality depend on ph? Is this dependence statistically significant?

e. Construct a 95% confidence interval for the effect of lead on infant mortality when ph is 6.5.
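
In a model with an interaction, the effect of lead at a given ph is $$\beta_{lead} + \beta_{lead \times ph} \cdot ph$$, so its standard error involves the covariance of the two estimates. The sketch below shows one way to compute it on simulated data; the numbers are made up and are not the lead mortality data.

```r
set.seed(6)  # arbitrary seed for reproducibility

# Simulated data, NOT the lead_mortality data set
n <- 170
toy <- data.frame(lead = rbinom(n, 1, 0.6), ph = runif(n, 5.5, 9))
toy$infrate <- 0.9 + 0.5 * toy$lead - 0.07 * toy$ph -
  0.06 * toy$lead * toy$ph + rnorm(n, sd = 0.15)

# lead * ph expands to lead + ph + lead:ph
fit <- lm(infrate ~ lead * ph, data = toy)

# Effect of lead at ph = 6.5 is b_lead + 6.5 * b_lead:ph.
# Weights follow the coefficient order: (Intercept), lead, ph, lead:ph
w      <- c(0, 1, 0, 6.5)
effect <- sum(w * coef(fit))
se     <- sqrt(drop(t(w) %*% vcov(fit) %*% w))
ci     <- effect + c(-1, 1) * qt(0.975, df.residual(fit)) * se

# Equivalent shortcut: recentre ph at 6.5, so that the coefficient on lead
# is the effect of lead at ph = 6.5 and confint() applies directly
fit_c <- lm(infrate ~ lead * I(ph - 6.5), data = toy)
confint(fit_c, "lead")
```

Both routes use the default homoskedasticity-only covariance matrix from `vcov()`; a heteroskedasticity-robust covariance matrix could be substituted for `vcov(fit)` in the same formula.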

Header for the R script

Start a new R script, copy/paste the header below and save it to Dropbox\EC282\Assignment7 or a similar path that you created for this homework assignment. Run the R script and make sure that you have the data df1 in your environment. Conduct the analysis below the header.

###############################################################################
# list the packages we need, install any that are missing, and load them
# add any package that you need to the list
need <- c('glue', 'dplyr','readxl', 'ggplot2','tidyr','AER','scales','mvtnorm',
'stargazer','httr', 'repmis')

have <- need %in% rownames(installed.packages())
if(any(!have)) install.packages(need[!have])
invisible(lapply(need, library, character.only=T))

# Save the R script to the assignment 7 folder before this
# To set up the working directory
getwd()
setwd(getwd()) #change getwd() here if you need to set a different working directory

#this clears the workspace
rm(list = ls())
#this sets the random number generator seed to my birthday for replication

options(scipen=999)
###############################################################################
#get the data url
df1.url <- 'https://www.dropbox.com/s/5nszqnejl7uu9f5/lead_mortality.xlsx?dl=1'
#download the data
GET(df1.url, write_disk(tdf <- tempfile(fileext = ".xlsx")))
#read the data into df1; head(df1) below shows whether it worked
df1 <- read_excel(tdf) %>%
mutate(infrate1 = infrate + rnorm(length(infrate))/50)

head(df1)

#CONDUCT THE ANALYSIS BELOW