
Sunday, October 20, 2013

Stats with R - 4

Salary can be influenced by many variables. Among these, years of professional experience and total courses completed in college are hypothesized to be critical. We test this hypothesis with a simulated dataset that includes one outcome variable, salary, and two predictors, years of experience and courses completed. Here are a few questions based on what was covered in the lectures and the lab. Have fun!

Source dataset: https://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.04.txt

[Figure: glimpse of the dataset]


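To read in the dataset (the file is expected in the current working directory):
PE <- read.table("Stats1.13.HW.04.txt", header = TRUE)
head(PE)   # first six rows: a quick glimpse of the data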

1. What is the correlation between salary and years of professional experience?

R code:
> round(cor(PE$salary, PE$years), 2)
[1] 0.74

2. What is the correlation between salary and courses completed?

R code:
> round(cor(PE$salary, PE$courses), 2)
[1] 0.54
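Both answers above are Pearson correlations. As a minimal sketch of what cor() computes, the snippet below builds a small hypothetical simulated data frame, PE_sim (random numbers, not the homework file, so its values will differ from the answers above), and checks that the covariance divided by the product of the two standard deviations matches cor().

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

# Pearson r = cov(x, y) / (sd(x) * sd(y)); cov() and sd() both use n - 1
r_manual <- cov(PE_sim$salary, PE_sim$years) /
  (sd(PE_sim$salary) * sd(PE_sim$years))
r_builtin <- cor(PE_sim$salary, PE_sim$years)
print(round(c(manual = r_manual, builtin = r_builtin), 2))

# All pairwise correlations in one call
print(round(cor(PE_sim[, c("salary", "years", "courses")]), 2))
```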

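3. What is the percentage of variance explained in a regression model with salary as the outcome variable and professional experience as the predictor variable?
Ans: 55 (R² ≈ 0.55, consistent with 0.74² ≈ 0.55 from Question 1)

R code:
model1 <- lm(salary ~ years, data = PE)
summary(model1)                          # see "Multiple R-squared"
round(100 * summary(model1)$r.squared)   # percentage of variance explained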
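The percentage of variance explained is 100 × R². As a sketch on hypothetical simulated data (PE_sim, not the homework file), the code below computes R² three ways: from summary(), as the squared correlation (this equality only holds with a single predictor, and is why 0.74² lines up with the answer above), and as 1 - SS_residual / SS_total.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_years <- lm(salary ~ years, data = PE_sim)

# 1) R-squared as reported by summary()
r2 <- summary(fit_years)$r.squared
# 2) With one predictor, R-squared equals the squared correlation
r2_from_cor <- cor(PE_sim$salary, PE_sim$years)^2
# 3) From sums of squares: 1 - SS_residual / SS_total
ss_res <- sum(resid(fit_years)^2)
ss_tot <- sum((PE_sim$salary - mean(PE_sim$salary))^2)
r2_from_ss <- 1 - ss_res / ss_tot

# Percentage of variance explained, rounded like the answer above
pct_explained <- round(100 * r2)
print(c(r2 = r2, r2_from_cor = r2_from_cor, r2_from_ss = r2_from_ss))
print(pct_explained)
```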

4. Compared to the model from Question 3, would a regression model predicting salary from the number of courses be considered a better fit to the data?
Ans: NO
We need to compare model1 (years as the predictor) with a new model, model4 (courses as the predictor).

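R code:
model1 <- lm(salary ~ years, data = PE)
summary(model1)

model4 <- lm(salary ~ courses, data = PE)
summary(model4)

# Compare the proportion of variance explained by each model
summary(model1)$r.squared
summary(model4)$r.squared

The right basis for comparison is the variance explained (R²), not the size of the regression coefficients: the two slopes are in different units (salary per year of experience versus salary per course), so their magnitudes are not comparable. model1 explains about 55% of the variance in salary, whereas model4 explains only about 29% (0.54² ≈ 0.29). The model with professional experience as the predictor therefore fits better, and the courses-only model would not be considered a better fit to the data.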
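A sketch of the comparison in Question 4, again on hypothetical simulated data (PE_sim) rather than the homework file: fit both single-predictor models and compare their R² values.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_years   <- lm(salary ~ years, data = PE_sim)
fit_courses <- lm(salary ~ courses, data = PE_sim)

# Variance explained by each single-predictor model
r2 <- c(years   = summary(fit_years)$r.squared,
        courses = summary(fit_courses)$r.squared)
print(round(100 * r2, 1))

better <- names(which.max(r2))
cat("Better-fitting single-predictor model:", better, "\n")

# Same outcome, same n, same number of parameters: lower AIC <=> higher R-squared
aic <- c(years = AIC(fit_years), courses = AIC(fit_courses))
print(aic)
```

Because both models share the same outcome, the same observations and the same number of parameters, AIC() ranks them in the same order as R²; with different numbers of predictors the two criteria can disagree.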

5. Now let's include both predictors (years of professional experience and courses completed) in a regression model with salary as the outcome. What is the percentage of variance explained now?
Ans: 65

R code:
model2 <- lm(salary ~ years + courses, data = PE)
summary(model2)
round(100 * summary(model2)$r.squared)   # percentage of variance explained
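One way to read the combined model's R²: it equals the squared correlation between the observed and the fitted salaries. The sketch below, on hypothetical simulated data (PE_sim, not the homework file), checks this and shows that adding a second predictor cannot lower R² on the same data; adjusted R² is printed alongside because it penalizes the extra predictor.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_years <- lm(salary ~ years, data = PE_sim)
fit_both  <- lm(salary ~ years + courses, data = PE_sim)

r2_years <- summary(fit_years)$r.squared
r2_both  <- summary(fit_both)$r.squared
adj_both <- summary(fit_both)$adj.r.squared

# Multiple R-squared = squared correlation between observed and fitted values
r2_check <- cor(PE_sim$salary, fitted(fit_both))^2

print(round(100 * c(years_only = r2_years, both = r2_both,
                    both_adjusted = adj_both), 1))
```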




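6. What is the standardized regression coefficient for years of professional experience, predicting salary?
Ans: .74
With a single predictor, the standardized regression coefficient equals the correlation between the two variables, which is why this matches the answer to Question 1.
R code:
model6 <- lm(scale(salary) ~ scale(years), data = PE)
summary(model6)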
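A sketch of why the standardized coefficient matches the correlation, using hypothetical simulated data (PE_sim, not the homework file): the slope from the regression on scaled variables is compared with cor() and with the raw slope rescaled by sd(x) / sd(y).

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

# Slope from the regression on z-scores (scale() centers and divides by the SD)
fit_std  <- lm(scale(salary) ~ scale(years), data = PE_sim)
beta_std <- unname(coef(fit_std)[2])

# Correlation between the raw variables
r <- cor(PE_sim$salary, PE_sim$years)

# Raw (unstandardized) slope rescaled into SD units: b * sd(x) / sd(y)
b_raw <- unname(coef(lm(salary ~ years, data = PE_sim))[2])
beta_rescaled <- b_raw * sd(PE_sim$years) / sd(PE_sim$salary)

print(round(c(standardized = beta_std, correlation = r,
              rescaled = beta_rescaled), 3))
```

The same reasoning applies to Question 7 with courses in place of years; once both predictors are in one model, the standardized coefficients generally no longer equal the simple correlations.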


7. What is the standardized regression coefficient for courses completed, predicting salary?
Ans: .54
As in Question 6, with a single predictor this equals the correlation from Question 2.
R code:
model7 <- lm(scale(salary) ~ scale(courses), data = PE)
summary(model7)


8. What is the mean of the salary distribution predicted by the model including both years of professional experience and courses completed as predictors? (with 0 decimal places)
Ans: 75426
Because the model includes an intercept, the mean of the predicted values equals the mean of the observed salaries.
R code:
> model2 <- lm(salary ~ years + courses, data = PE)
> PE$predicted <- fitted(model2)
> mean(PE$predicted)
[1] 75426.44
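A quick check of the property behind Question 8, on hypothetical simulated data (PE_sim, not the homework file): with an intercept in the model, the mean of fitted() equals the mean of the observed outcome.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_both <- lm(salary ~ years + courses, data = PE_sim)

# Residuals of an OLS fit with an intercept sum to zero,
# so the fitted values average to the observed mean
mean_fitted   <- mean(fitted(fit_both))
mean_observed <- mean(PE_sim$salary)
print(round(c(fitted = mean_fitted, observed = mean_observed)))
```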

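9. What is the mean of the residual distribution for the model predicting salary from both years of professional experience and courses completed? (with 0 decimal places)
Ans: 0
In an ordinary least-squares model with an intercept, the residuals sum, and therefore average, to exactly zero; the tiny nonzero value printed below is floating-point rounding error.
R code:
> model2 <- lm(salary ~ years + courses, data = PE)
> PE$residual <- resid(model2)
> mean(PE$residual)
[1] -1.893208e-14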
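Because floating-point arithmetic rarely produces an exact zero, it is safer to compare the residual mean against a tolerance than with ==. A minimal sketch on hypothetical simulated data (PE_sim, not the homework file); the tolerance of 1e-8 times the salary standard deviation is an arbitrary choice for illustration.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_both <- lm(salary ~ years + courses, data = PE_sim)
res <- resid(fit_both)

mean_res <- mean(res)
print(mean_res)          # tiny, but usually not exactly 0
print(mean_res == 0)     # exact comparison is unreliable

# Scale-aware tolerance: "zero" relative to the spread of the outcome
tol <- 1e-8 * sd(PE_sim$salary)
is_zero <- abs(mean_res) < tol
print(is_zero)
```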

10. Are the residuals from the regression model with both predictors normally distributed?
Ans: YES (based on a visual check of the histogram of the residuals)
R code:
model2 <- lm(salary ~ years + courses, data = PE)
PE$residual <- resid(model2)
hist(PE$residual, main = "Residuals of model2", xlab = "Residual")
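A histogram is only a rough guide. As a complementary sketch on hypothetical simulated data (PE_sim, not the homework file), the Shapiro-Wilk test from base R's stats package gives a formal check; in an interactive session, qqnorm() and qqline() add a Q-Q plot for a visual check.

```r
# Hypothetical simulated data (NOT Stats1.13.HW.04.txt)
set.seed(42)
n <- 200
PE_sim <- data.frame(years = runif(n, 0, 20),
                     courses = round(runif(n, 10, 40)))
PE_sim$salary <- 40000 + 2000 * PE_sim$years + 500 * PE_sim$courses +
  rnorm(n, sd = 8000)

fit_both <- lm(salary ~ years + courses, data = PE_sim)
res <- resid(fit_both)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(res)
print(sw)

# Visual check (uncomment in an interactive session):
# qqnorm(res); qqline(res)
```

A large p-value means only that the test found no evidence against normality; with large samples, even small departures become significant, so the plots remain worth looking at.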
