
Sunday, December 1, 2013

Stats with R - 8 (Logistic Regression)

This week we are going to work on an example at the intersection of decision-making and global warming. The simulated dataset includes a dependent variable, change, for a list of 27 countries. change indicates whether a country is willing to take action now against global warming or would rather wait and see (1 = act now, 0 = wait and see). The predictors are median age (age), education index (educ), gross domestic product (gdp), and CO2 emissions (co2).


To read in the dataset:
BL <- read.table("stats1-datafiles-Stats1.13.HW.10.txt", header = TRUE)

Glimpse of the dataset:

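A quick way to reproduce that glimpse in the console (a minimal sketch, assuming the data were read into BL as above):

head(BL)   # first few rows of the data frame
str(BL)    # variable types and number of observations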
1. What is the median population age for the countries which voted to take action against global warming? (round to 2 decimal places)
Ans: 35.78

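A minimal sketch of how this value can be computed, assuming the BL data frame read in above:

round(median(BL$age[BL$change == 1]), 2)   # median age among countries that voted to act now (change = 1)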
2. Run a logistic regression including all predictor variables. Which predictors are significant in this model?
Ans: The predictors educ and age are significant, each with a p-value below .05.

lrfit = glm(BL$change ~ BL$educ + BL$age + BL$gdp + BL$co2, family = binomial)

summary(lrfit)
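A minimal sketch of how the significant predictors can be read from the coefficient table, assuming the lrfit object above:

round(summary(lrfit)$coefficients, 3)   # educ and age have Pr(>|z|) below .05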

3. What does the negative value for the estimate of educ mean?
Ans: Holding the other predictors constant, a higher education index lowers the log-odds of voting to act now (change = 1), so the predicted probability of acting now decreases as educ increases.

4. What is the confidence interval for educ, using profiled log-likelihood? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans: -31.17 -3.03
R output:
> confint(lrfit)
                    2.5 %      97.5 %
(Intercept)  -7.120294915  7.62212837
BL$educ     -31.171217249 -3.03349629
BL$age        0.151757331  0.73814438
BL$gdp       -1.677996402  0.67880052
BL$co2       -0.001889047  0.00185111
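For a glm fit, confint() profiles the log-likelihood (the method comes from the MASS package), which is why these bounds differ from the standard-error intervals in the next question. A small sketch of rounding the educ row to two decimals, assuming the lrfit object above:

round(confint(lrfit)["BL$educ", ], 2)   # -31.17 -3.03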

5. What is the confidence interval for age, using standard errors? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans: 0.09 0.65

R output:
> confint.default(lrfit)
                    2.5 %       97.5 %
(Intercept)  -7.016319882  6.981360625
BL$educ     -27.332984665 -0.119112070
BL$age        0.092877885  0.651260815
BL$gdp       -0.986127869  0.835048069
BL$co2       -0.001910404  0.001148391
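These are Wald intervals, i.e. estimate ± 1.96 standard errors. A hand computation for age (a sketch, assuming the lrfit object above) reproduces the answer:

est <- coef(lrfit)["BL$age"]                   # coefficient estimate for age
se  <- sqrt(diag(vcov(lrfit)))["BL$age"]       # its standard error
round(est + c(-1, 1) * qnorm(0.975) * se, 2)   # 0.09 0.65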

6. Compare the present model with a null model. What is the difference in deviance for the two models? (round to 2 decimal places)
Ans: 16.30
R output:
> with(lrfit, null.deviance - deviance)
[1] 16.30328

7. How many degrees of freedom are there for the difference between the two models?
Ans: 4
R output:
> with(lrfit, df.null - df.residual)
[1] 4

8. Is the p-value for the difference between the two models significant?
Ans: Yes
R output:
> with(lrfit, pchisq(null.deviance-deviance, df.null-df.residual, lower.tail = FALSE))
[1] 0.002638074
This p-value is below .05, so the difference between the two models is significant.
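The three quantities in questions 6 through 8 can also be read off a single likelihood-ratio comparison; a minimal sketch, assuming an intercept-only null model fitted on the same response:

nullfit <- glm(BL$change ~ 1, family = binomial)   # null (intercept-only) model
anova(nullfit, lrfit, test = "Chisq")              # deviance difference, df, and Pr(>Chi)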

9. Do chi-squared values differ significantly if you drop educ as a predictor in the model?
Ans: Yes
R output:
> library(aod)   # wald.test() comes from the aod package
> wald.test(b = coef(lrfit), Sigma = vcov(lrfit), Terms = 2)   # Terms = 2 is the educ coefficient
Wald test:
----------
Chi-squared test:
X2 = 3.9, df = 1, P(> X2) = 0.048
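An equivalent check is to refit the model without educ and compare the two fits with a likelihood-ratio test; a sketch using an illustrative name, lrfit_noeduc, for the reduced model:

lrfit_noeduc <- glm(BL$change ~ BL$age + BL$gdp + BL$co2, family = binomial)   # model without educ
anova(lrfit_noeduc, lrfit, test = "Chisq")   # change in deviance attributable to educ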

10. What is the percentage of cases that can be classified correctly based on our model?
Ans: 81
R output:
> library(QuantPsyc)   # ClassLog() comes from the QuantPsyc package
> ClassLog(lrfit, BL$change)
$rawtab
       resp
         0  1
  FALSE  6  2
  TRUE   3 16

$classtab
       resp
                0         1
  FALSE 0.6666667 0.1111111
  TRUE  0.3333333 0.8888889

$overall
[1] 0.8148148
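As a cross-check, the overall rate can be computed by hand from the fitted probabilities, assuming the 0.5 cutoff that ClassLog uses by default:

pred <- as.numeric(fitted(lrfit) > 0.5)   # predicted class for each country
mean(pred == BL$change)                   # proportion classified correctly, about 0.81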