
Sunday, December 1, 2013

Stats with R - 8 (Logistic Regression)

This week we are going to work on an example at the intersection of decision-making and global warming. The simulated dataset includes a dependent variable, change, for a list of 27 countries. change indicates whether a country is willing to take action now against global warming or would rather wait and see (1 = act now, 0 = wait and see). The predictors are median age (age), education index (educ), gross domestic product (gdp), and CO2 emissions (co2).


To read in the dataset:
BL <- read.table("stats1-datafiles-Stats1.13.HW.10.txt", header = TRUE)

Glimpse of the dataset:

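A quick way to reproduce that glimpse in the console (a minimal sketch, assuming the data were read into BL as above):

head(BL)   # first few rows of the data frame
str(BL)    # variable types and number of observations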
1. What is the median population age for the countries which voted to take action against global warming? (round to 2 decimal places)
Ans: 35.78

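A minimal sketch of how this value can be computed, assuming the BL data frame read in above:

round(median(BL$age[BL$change == 1]), 2)   # median age among countries that voted to act now (change = 1)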
2. Run a logistic regression including all predictor variables. Which predictors are significant in this model?
Ans: The predictors educ and age are significant, each with a p-value below .05.

lrfit = glm(BL$change ~ BL$educ + BL$age + BL$gdp + BL$co2, family = binomial)

summary(lrfit)
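A minimal sketch of how the significant predictors can be read from the coefficient table, assuming the lrfit object above:

round(summary(lrfit)$coefficients, 3)   # educ and age have Pr(>|z|) below .05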

3. What does the negative value for the estimate of educ mean?
Ans: Holding the other predictors constant, a higher education index lowers the log-odds of voting to act now (change = 1), so the predicted probability of acting now decreases as educ increases.

4. What is the confidence interval for educ, using profiled log-likelihood? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans: -31.17 -3.03
R output:
> confint(lrfit)
                    2.5 %      97.5 %
(Intercept)  -7.120294915  7.62212837
BL$educ     -31.171217249 -3.03349629
BL$age        0.151757331  0.73814438
BL$gdp       -1.677996402  0.67880052
BL$co2       -0.001889047  0.00185111
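For a glm fit, confint() profiles the log-likelihood (the method comes from the MASS package), which is why these bounds differ from the standard-error intervals in the next question. A small sketch of rounding the educ row to two decimals, assuming the lrfit object above:

round(confint(lrfit)["BL$educ", ], 2)   # -31.17 -3.03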

5. What is the confidence interval for age, using standard errors? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans: 0.09 0.65

R output:
> confint.default(lrfit)
                    2.5 %       97.5 %
(Intercept)  -7.016319882  6.981360625
BL$educ     -27.332984665 -0.119112070
BL$age        0.092877885  0.651260815
BL$gdp       -0.986127869  0.835048069
BL$co2       -0.001910404  0.001148391
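These are Wald intervals, i.e. estimate ± 1.96 standard errors. A hand computation for age (a sketch, assuming the lrfit object above) reproduces the answer:

est <- coef(lrfit)["BL$age"]                   # coefficient estimate for age
se  <- sqrt(diag(vcov(lrfit)))["BL$age"]       # its standard error
round(est + c(-1, 1) * qnorm(0.975) * se, 2)   # 0.09 0.65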

6. Compare the present model with a null model. What is the difference in deviance for the two models? (round to 2 decimal places)
Ans: 16.30
R output:
> with(lrfit, null.deviance - deviance)
[1] 16.30328

7. How many degrees of freedom are there for the difference between the two models?
Ans: 4
R output:
> with(lrfit, df.null - df.residual)
[1] 4

8. Is the p-value for the difference between the two models significant?
Ans: Yes
R output:
> with(lrfit, pchisq(null.deviance-deviance, df.null-df.residual, lower.tail = FALSE))
[1] 0.002638074
This p-value is below .05, so the difference between the two models is significant.
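The three quantities in questions 6 through 8 can also be read off a single likelihood-ratio comparison; a minimal sketch, assuming an intercept-only null model fitted on the same response:

nullfit <- glm(BL$change ~ 1, family = binomial)   # null (intercept-only) model
anova(nullfit, lrfit, test = "Chisq")              # deviance difference, df, and Pr(>Chi)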

9. Do chi-squared values differ significantly if you drop educ as a predictor in the model?
Ans: Yes
R output:
> library(aod)   # wald.test() comes from the aod package
> wald.test(b = coef(lrfit), Sigma = vcov(lrfit), Terms = 2)   # Terms = 2 is the educ coefficient
Wald test:
----------
Chi-squared test:
X2 = 3.9, df = 1, P(> X2) = 0.048
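An equivalent check is to refit the model without educ and compare the two fits with a likelihood-ratio test; a sketch using an illustrative name, lrfit_noeduc, for the reduced model:

lrfit_noeduc <- glm(BL$change ~ BL$age + BL$gdp + BL$co2, family = binomial)   # model without educ
anova(lrfit_noeduc, lrfit, test = "Chisq")   # change in deviance attributable to educ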

10. What is the percentage of cases that can be classified correctly based on our model?
Ans: 81
R output:
> library(QuantPsyc)   # ClassLog() comes from the QuantPsyc package
> ClassLog(lrfit, BL$change)
$rawtab
       resp
         0  1
  FALSE  6  2
  TRUE   3 16

$classtab
       resp
                0         1
  FALSE 0.6666667 0.1111111
  TRUE  0.3333333 0.8888889

$overall
[1] 0.8148148
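As a cross-check, the overall rate can be computed by hand from the fitted probabilities, assuming the 0.5 cutoff that ClassLog uses by default:

pred <- as.numeric(fitted(lrfit) > 0.5)   # predicted class for each country
mean(pred == BL$change)                   # proportion classified correctly, about 0.81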