Salary can be influenced by many variables. Among these, years of professional experience and total courses completed in college are critical. We test this hypothesis with a simulated dataset that includes an outcome variable, salary, and two predictors, years of experience and courses completed. Here are a few questions based on what was covered in the lectures and the lab. Have fun!
Source DataSet :- https://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.04.txt
Glimpse of the Dataset :-
To read in the Data set :-
PE <- read.table("Stats1.13.HW.04.txt", header = TRUE)
1. What is the correlation between salary and years of professional experience?
R- code
> round(cor(PE$salary ,PE$years),2)
[1] 0.74
2. What is the correlation between salary and courses completed?
R- code
> round(cor(PE$salary , PE$courses),2)
[1] 0.54
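Both correlations can also be read off a single correlation matrix. The sketch below assumes a small synthetic data frame with the same column names as the homework file; the numbers are generated here and are not the homework data, so the output will not match the values above.

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Pairwise Pearson correlations among all three variables, rounded to 2 decimals
r <- round(cor(PE[, c("salary", "years", "courses")]), 2)
print(r)
```

The salary row (or column) of the matrix holds the two correlations asked for in Questions 1 and 2.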
3. What is the percentage of variance explained in a regression model with salary as the outcome variable and professional experience as the predictor variable?
Ans :- 55
R- code
model1 <- lm(PE$salary ~ PE$years)
summary(model1)
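The percentage can be pulled out of the fitted model directly instead of being read from the printed summary. With a single predictor, R-squared equals the squared Pearson correlation, which is consistent with the values above (0.74^2 is about 0.55). A minimal sketch on synthetic data (made-up values, not the homework file):

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Simple regression: salary predicted from years of experience
fit_years <- lm(salary ~ years, data = PE)

# R-squared as a number, and as a rounded percentage
r2 <- summary(fit_years)$r.squared
print(round(100 * r2))

# With one predictor, R-squared is the squared correlation
r <- cor(PE$salary, PE$years)
print(all.equal(r2, r^2))
```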
4. Compared to the model from Question 3, would a regression model predicting salary from the number of courses be considered a better fit to the data?
We need to compare model1 and model4 here.
R- code
model1 <- lm(PE$salary ~ PE$years)
summary(model1)
model4 <- lm(PE$salary ~ PE$courses)
summary(model4)
Ans :- No. Fit is judged by the proportion of variance explained (R-squared), not by the size of the regression coefficient. model1, with salary as the outcome and professional experience as the predictor, explains about 55% of the variance. model4, which predicts salary from the number of courses, explains only about 29% (0.54^2 is roughly 0.29), so it is not a better fit to the data.
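The comparison above can be made explicit by collecting the R-squared of each single-predictor model side by side. This is a sketch on synthetic data (made-up values, not the homework file); the names fit_years and fit_courses are illustrative.

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# One simple regression per predictor
fit_years   <- lm(salary ~ years, data = PE)
fit_courses <- lm(salary ~ courses, data = PE)

# Proportion of variance explained by each model
r2 <- c(years   = summary(fit_years)$r.squared,
        courses = summary(fit_courses)$r.squared)
print(round(100 * r2))

# Name of the model with the larger R-squared
best <- names(which.max(r2))
print(best)
```

Because both models have one predictor and the same outcome, comparing R-squared directly is fair here; models with different numbers of predictors call for adjusted R-squared or a formal test.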
5. Now let's include both predictors (years of professional experience and courses completed) in a regression model with salary as the outcome. Now what is the percentage of variance explained?
Ans :- 65
R- code
model2 <- lm(PE$salary ~ PE$years + PE$courses)
summary(model2)
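Adding a predictor can never lower R-squared, so it helps to look at adjusted R-squared and a nested-model F test alongside the raw percentage. The sketch below uses synthetic data (made-up values, not the homework file), and the names fit1 and fit2 are illustrative.

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Nested models: one predictor, then both predictors
fit1 <- lm(salary ~ years, data = PE)
fit2 <- lm(salary ~ years + courses, data = PE)

s1 <- summary(fit1)
s2 <- summary(fit2)

# Percentage of variance explained by the two-predictor model
print(round(100 * s2$r.squared))

# Adjusted R-squared penalizes the extra predictor
print(s2$adj.r.squared)

# F test of whether adding courses improves on the years-only model
print(anova(fit1, fit2))
```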
6. What is the standardized regression coefficient for years of professional experience, predicting salary?
Ans :- 0.74
R- code
model6 <- lm(scale(PE$salary) ~ scale(PE$years))
summary(model6)
7. What is the standardized regression coefficient for courses completed, predicting salary?
Ans :- 0.54
R- code
model7 <- lm(scale(PE$salary) ~ scale(PE$courses))
summary(model7)
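In a regression with a single predictor, the standardized coefficient equals the Pearson correlation, which is why the answers to Questions 6 and 7 match those of Questions 1 and 2. The sketch below checks this on synthetic data (made-up values, not the homework file); the helper z() is a hypothetical convenience wrapper around scale(). For comparison it also fits the standardized two-predictor model, where the coefficients generally differ from the raw correlations.

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Hypothetical helper: z-score a vector (mean 0, sd 1)
z <- function(x) as.numeric(scale(x))

# Standardized simple regression: the slope is the standardized coefficient
fit_std <- lm(z(salary) ~ z(years), data = PE)
beta <- unname(coef(fit_std)[2])

# With one predictor, the standardized slope equals the correlation
print(all.equal(beta, cor(PE$salary, PE$years)))

# Standardized coefficients when both predictors are in the model
fit_std2 <- lm(z(salary) ~ z(years) + z(courses), data = PE)
print(round(coef(fit_std2), 2))
```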
8. What is the mean of the salary distribution predicted by the model including both years of professional experience and courses completed as predictors? (with 0 decimal places)
Ans :- 75426
R- code
model2 <- lm(PE$salary ~ PE$years + PE$courses)
summary(model2)
> PE$predicted <- fitted(model2)
> mean(PE$predicted)
[1] 75426.44
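For a least-squares model that includes an intercept, the mean of the fitted values always equals the mean of the observed outcome, so the answer can be cross-checked against the mean of salary. A minimal sketch on synthetic data (made-up values, not the homework file):

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Two-predictor model and its fitted (predicted) salaries
fit2 <- lm(salary ~ years + courses, data = PE)
predicted <- fitted(fit2)

# Mean of the predictions versus mean of the observed outcome
print(round(mean(predicted)))
print(round(mean(PE$salary)))
print(all.equal(mean(predicted), mean(PE$salary)))
```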
9. What is the mean of the residual distribution for the model predicting salary from both years of professional experience and courses completed? (with 0 decimal places)
Ans :- 0
R- code
model2 <- lm(PE$salary ~ PE$years + PE$courses)
summary(model2)
> PE$residual <- resid(model2)
> mean(PE$residual)
[1] -1.893208e-14
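A value such as -1.893208e-14 is floating-point noise around zero: least-squares residuals from a model with an intercept sum to zero by construction. The sketch below shows two ways to confirm this on synthetic data (made-up values, not the homework file).

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Residuals from the two-predictor model
fit2 <- lm(salary ~ years + courses, data = PE)
res <- resid(fit2)

# Rounded to 0 decimal places, as the question asks
res_mean <- round(mean(res))
print(res_mean)

# Tolerance-based comparison instead of exact equality with 0
print(isTRUE(all.equal(0, mean(res))))
```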
10. Are the residuals from the regression model with both predictors normally distributed?
Ans :- Yes
R- code
model2 <- lm(PE$salary ~ PE$years + PE$courses)
summary(model2)
PE$residual <- resid(model2)
hist(PE$residual)
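A histogram gives a first impression of the residual distribution; a normal Q-Q plot and a Shapiro-Wilk test are common complements. These are additions beyond what the homework asks for, sketched here on synthetic data (made-up values, not the homework file).

```r
# Synthetic stand-in for the homework data: values are made up for illustration
set.seed(1)
n <- 200
years   <- round(runif(n, 0, 20))
courses <- round(runif(n, 10, 40))
salary  <- 40000 + 2500 * years + 500 * courses + rnorm(n, sd = 8000)
PE <- data.frame(salary, years, courses)

# Residuals from the two-predictor model
fit2 <- lm(salary ~ years + courses, data = PE)
res <- resid(fit2)

# Visual checks: histogram and normal Q-Q plot with a reference line
hist(res, main = "Residuals", xlab = "Residual")
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw$p.value)
```

Points lying close to the reference line in the Q-Q plot, together with a p-value that is not small, give no evidence against normality; neither check proves that the residuals are normal.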