Interaction between predictors

In lecture 12, we used the advertising dataset and retained the following linear model:

\[\widehat{Sales} = \hat{\beta_0} + \hat{\beta_1} TV + \hat{\beta_2} Radio + \hat{\beta_3} (TV:Radio)\]

Load the data and perform the linear regression

Load the data set again (you can download it from here)
Perform the multiple linear regression shown above (TV, Radio and their interaction) and store it in an object called adv_lm.
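As a reminder of how interactions are written in an R formula, here is a minimal sketch on the built-in mtcars data (a stand-in chosen for illustration, not the advertising data). The object names fit_star and fit_colon are hypothetical.

```r
# Toy illustration on the built-in mtcars data (NOT the advertising data).
# `wt * hp` expands to both main effects plus their interaction.
fit_star  <- lm(mpg ~ wt * hp, data = mtcars)

# The same model written out term by term, with `:` for the interaction.
fit_colon <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Both formulas describe the same model, so the estimates are identical.
coef(fit_star)
all.equal(coef(fit_star), coef(fit_colon))
```

The same formula syntax should carry over to the advertising predictors, assuming the columns are named as in the equation above.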

Writing the equation associated with the linear regression

Now, using the formula given above, we would like to explicitly write out this linear model using the estimated coefficients. For example, let’s consider the following hypothetical linear model \(y = a \times x + b\) with the estimates \(a = 2\) and \(b = 1\). We would write it as \(y = 2 \times x + 1\).

Extract the coefficients and round the values to 3 digits after the decimal point.
Write down the equation associated with the linear model using the estimated coefficients.
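One possible way to extract and round coefficients, sketched on the built-in mtcars data rather than on the advertising data (the names fit and est are hypothetical):

```r
# Toy model on mtcars, used only to illustrate the mechanics.
fit <- lm(mpg ~ wt * hp, data = mtcars)

# coef() returns a named numeric vector; round() keeps the names.
est <- round(coef(fit), 3)
est

# The names tell you which estimate belongs to which term.
names(est)
```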

Tip

Bonus question: Use glue::glue() to generate the equation in R
# Let's create some example objects
a <- 2
b <- 1

# The following line will not modify our character sequence:
glue::glue("y = a * x + b")
## y = a * x + b
# character sequences surrounded by curly braces are detected
# as R objects and replaced by their values
glue::glue("y = {a} * x + {b}")
## y = 2 * x + 1
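The curly braces can also hold an R expression, which may be convenient when the values live in a named vector. A small sketch, assuming the glue package is installed; the vector est and its values are made up for illustration:

```r
library(glue)

# Hypothetical estimates stored in a named vector (made-up values).
est <- c(intercept = 1.234, slope = 5.678)

# Any R expression inside the braces is evaluated and inserted.
eq <- glue("y = {est[['slope']]} * x + {est[['intercept']]}")
eq
```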
Can you reject the null hypothesis for each coefficient?
Rewrite the equation (by hand) and factorise \(TV\), i.e. write it in the form \(TV\times(\hat{\beta_1} + x \times Radio)\), where \(x\) is the coefficient to identify.
We would like to invest 2 units (2000$) in advertising.

Using the factorised equation, quickly determine whether it is better:

  • to invest all the money in TV
  • to invest all the money in Radio
  • to invest in both media
Using R and the fitted model, predict the increase in sales for an investment of 1000$ in TV and 1000$ in Radio.
Similarly, predict the increase in sales for an investment of 2000$ in TV without any investment in Radio.

Calculate the ratio of the sales when investing in both media to TV alone.

Tip

Use the function predict() and supply, as the newdata argument, a data.frame/tibble whose column names are the predictor names and whose values are the investments to test.
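A minimal sketch of this tip on the built-in mtcars data (a stand-in, not the advertising data). The names fit, scenarios and pred are hypothetical:

```r
# Toy model with an interaction, fitted on mtcars for illustration only.
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Column names of `newdata` must match the predictor names in the formula.
# Each row is one scenario to evaluate.
scenarios <- data.frame(wt = c(2.5, 3.5), hp = c(100, 100))

# One predicted value per row of `scenarios`.
pred <- predict(fit, newdata = scenarios)
pred

# Ratio of the first predicted value to the second.
pred[1] / pred[2]
```

For the exercise, the rows would presumably hold the TV and Radio investments to compare, with column names matching those in the advertising data.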

Fitting the model without interaction

Create a new model in R explaining the \(sales\) by the investment in \(TV\) and \(Radio\) but without an interaction.
Extract the coefficients
Write the equation associated with the model using the estimated coefficients
Quickly compare this equation to the factorised one in the previous question.

Do you think that the efficiency of investing 2000$ will be similar in both models?

Using the model without interaction and R, predict the increase in sales for an investment of 1000$ in TV and 1000$ in Radio.
Similarly (using the model without interaction), predict the increase in sales for an investment of 2000$ in TV only.

Calculate the ratio of the sales when investing in both media to TV alone.

Conclude

Simulated data

We are going to work with data in which the relationship between the variables is simulated. This example comes from James et al. 2013.

Generate the data

Use the following code to generate the data:

library(tibble)

set.seed(1)
simul1 <- tibble(x1 = runif(100),
                 x2 = 0.5 * x1 + rnorm(100) / 10,
                 y  = 2 + 2 * x1 + 0.3 * x2 + rnorm(100))
The variable y is generated by a linear model:

  • write the equation (as text in your markdown document)
  • write the regression coefficients (as text in your markdown document)
Plot the relationship between x1 and x2. You might want to display all relationships using ggpairs

Tip

ggpairs comes from the package GGally
What is the Pearson’s correlation coefficient between x1 and x2?
Will the correlation between predictors be a problem for the linear regression you just wrote down?

Name this effect.
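Pearson's correlation coefficient between two numeric vectors can be obtained with base R's cor(). A short sketch on two mtcars columns, used here only as stand-ins for x1 and x2:

```r
# Pearson is the default method of cor(); it is spelled out here for clarity.
r <- cor(mtcars$wt, mtcars$hp, method = "pearson")
r

# cor.test() additionally reports a confidence interval and a p-value.
cor.test(mtcars$wt, mtcars$hp)
```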

Fit a linear model explaining y by x1 + x2 and describe the summary results.

Can you reject \(H_0: \beta_1 = 0\) and \(H_0: \beta_2 = 0\)?
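The p-values needed to answer this kind of question are stored in the coefficient table of summary(). A sketch on a toy mtcars model (not the simulated data); the 0.05 threshold is an assumption, so use whichever significance level the course prescribes:

```r
# Toy model with two predictors, for illustration only.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Matrix with one row per term: estimate, std. error, t value, p-value.
coef_table <- summary(fit)$coefficients
coef_table

# Extract the p-value column by name and compare it to the chosen level.
p_values <- coef_table[, "Pr(>|t|)"]
p_values < 0.05
```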

Fit a linear model explaining y by x1 alone.

Can you reject \(H_0: \beta_1 = 0\)?
What happened?

Fit a linear model explaining y by x2 alone.

Can you reject \(H_0: \beta_1 = 0\)?
What happened?

Unfortunately, one observation was left out of the data: x1 = 0.1, x2 = 0.8, y = 6.

Add it to the existing simul1 and store the result as simul2.

Tip

Have a look at the function add_row() defined in the package tibble
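A minimal sketch of add_row() on a small made-up tibble, assuming the tibble package is installed. The objects toy and toy_extended and all their values are hypothetical:

```r
library(tibble)

# Small made-up tibble with the same column names as the exercise data.
toy <- tibble(x1 = c(0.2, 0.4), x2 = c(0.1, 0.3), y = c(2.5, 3.1))

# add_row() returns a new tibble; the original object is left unchanged.
toy_extended <- add_row(toy, x1 = 0.9, x2 = 0.5, y = 4.0)

nrow(toy)
nrow(toy_extended)
```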
Plot the relationship between x1 and x2 again using the updated dataset simul2.
Fit the previous linear model again using simul2