This site contains code snippets that I develop while learning and experimenting with SAS, R and Linux.
Friday, December 6, 2013
Sunday, December 1, 2013
Stats with R - 8 (Logistic Regression)
This week we are going to work on an example at the intersection of decision-making and global warming. The simulated dataset includes a dependent variable, change, for a list of 27 countries. Change indicates whether these countries are willing to take action now against global warming, or if they would rather wait and see (1 = act now, 0 = wait and see). Predictors include: median age (age), education index (educ), gross domestic product (gdp), and CO2 emissions (co2).
BL <- read.table("stats1-datafiles-Stats1.13.HW.10.txt", header = TRUE)
Glimpse of the Dataset :-
1. What is the median population age for the countries which voted to take action against global warming? (round to 2 decimal places)
Ans:- 35.78
R- Output
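A minimal sketch of how such a group median can be computed. The data frame `BL_demo` below is a hypothetical stand-in with made-up values, not the homework file, so the number it prints is not the answer above.

```r
# Hypothetical stand-in for BL; the values are invented for illustration only.
BL_demo <- data.frame(
  change = c(1, 0, 1, 1, 0, 1),
  age    = c(38.2, 24.1, 41.0, 33.5, 27.9, 36.4)
)

# Median age among the countries that voted to act now (change == 1)
med_act <- median(BL_demo$age[BL_demo$change == 1])
round(med_act, 2)

# Medians for both groups at once
tapply(BL_demo$age, BL_demo$change, median)
```

On the real data the same expression would read `median(BL$age[BL$change == 1])`.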
2. Run a logistic regression including all predictor variables. Which predictors are significant in this model?
Ans :- The predictors educ and age are significant, with p-values lower than .05
lrfit = glm(BL$change ~ BL$educ + BL$age + BL$gdp + BL$co2, family = binomial)
summary(lrfit)
R- Output
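The call above references the columns as `BL$...`, which is why the coefficient names in the output carry the `BL$` prefix. An equivalent and arguably tidier way to write the model is to pass the data frame through the `data` argument. A sketch on simulated data (`BL_demo` and all of its values are assumptions, not the homework data):

```r
set.seed(1)  # simulated stand-in data, NOT the homework dataset
n <- 100
BL_demo <- data.frame(
  educ = runif(n, 0.4, 0.95),
  age  = runif(n, 18, 45),
  gdp  = rnorm(n, 10, 1),
  co2  = runif(n, 1, 20)
)
# Simulated binary response that depends on age and educ only
BL_demo$change <- rbinom(n, 1, plogis(-2 + 0.1 * BL_demo$age - 3 * BL_demo$educ))

# Same model form as above, with columns looked up in `data`
fit_demo <- glm(change ~ educ + age + gdp + co2,
                data = BL_demo, family = binomial)
names(coef(fit_demo))  # names appear without a data-frame prefix
```

Writing the formula this way also makes `predict()` with new data straightforward, because the model then knows the predictors by their column names.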
3. What does the negative value for the estimate of educ mean?
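In a logistic regression, a negative estimate means that higher values of the predictor go with lower log-odds of the outcome coded 1, here acting now. A minimal sketch of that reading, taking the educ estimate as the midpoint of the standard-error interval reported under question 5 (such an interval is symmetric around the estimate) and assuming educ is measured on a 0 to 1 scale, so that a step of 0.1 is a sensible unit:

```r
# Bounds of the standard-error (Wald) interval for educ, copied from the
# confint.default() output in this post
educ_lower <- -27.332984665
educ_upper <- -0.119112070

# A Wald interval is estimate +/- z * SE, so its midpoint is the estimate
b_educ <- (educ_lower + educ_upper) / 2

# Multiplicative change in the odds of change == 1 for a 0.1 increase in educ
# (the 0.1 step is an assumption about the scale of educ)
or_per_tenth <- exp(b_educ * 0.1)

b_educ
or_per_tenth  # below 1: the odds of "act now" shrink as educ rises
```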
4. What is the confidence interval for educ, using profiled log-likelihood? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans :- -31.17 -3.03
confint(lrfit)
R- Output
> confint(lrfit)
                    2.5 %      97.5 %
(Intercept)  -7.120294915  7.62212837
BL$educ     -31.171217249 -3.03349629
BL$age        0.151757331  0.73814438
BL$gdp       -1.677996402  0.67880052
BL$co2       -0.001889047  0.00185111
5. What is the confidence interval for age, using standard errors? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans :- 0.09 0.65
R- Output
> confint.default(lrfit)
                    2.5 %       97.5 %
(Intercept)  -7.016319882  6.981360625
BL$educ     -27.332984665 -0.119112070
BL$age        0.092877885  0.651260815
BL$gdp       -0.986127869  0.835048069
BL$co2       -0.001910404  0.001148391
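The standard-error interval is simply estimate ± z × SE with z = qnorm(0.975), whereas `confint()` profiles the likelihood, which is why the two sets of bounds differ. A sketch that rebuilds the standard-error interval by hand on simulated data (`fit_demo` and its data are assumptions, not the homework model):

```r
set.seed(42)  # simulated data, NOT the homework dataset
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))
fit_demo <- glm(y ~ x, family = binomial)

est <- coef(fit_demo)                 # point estimates
se  <- sqrt(diag(vcov(fit_demo)))     # standard errors
z   <- qnorm(0.975)                   # about 1.96 for a 95% interval

manual_ci <- cbind(lower = est - z * se, upper = est + z * se)
manual_ci
confint.default(fit_demo)             # same numbers, different column labels
```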
6. Compare the present model with a null model. What is the difference in deviance for the two models? (round to 2 decimal places)
Ans :- 16.30
R- Output
> with(lrfit, null.deviance - deviance)
[1] 16.30328
7. How many degrees of freedom are there for the difference between the two models?
Ans:- 4
R- Output
> with(lrfit, df.null - df.residual)
[1] 4
8. Is the p-value for the difference between the two models significant?
Ans:- Yes
R- Output
> with(lrfit, pchisq(null.deviance-deviance, df.null-df.residual, lower.tail = FALSE))
[1] 0.002638074
The p-value is below .05, so the difference between the two models is significant.
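Questions 6 to 8 together form a likelihood-ratio test: the drop in deviance from the null model is compared with a chi-squared distribution whose degrees of freedom equal the number of added predictors. A sketch on simulated data (all object names and values are assumptions) showing that the manual calculation agrees with `anova()`:

```r
set.seed(7)  # simulated data, NOT the homework dataset
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 1.0 * x1 - 0.5 * x2))

full_fit <- glm(y ~ x1 + x2, family = binomial)
null_fit <- glm(y ~ 1, family = binomial)   # intercept-only model

# Manual likelihood-ratio test
dev_diff <- deviance(null_fit) - deviance(full_fit)
df_diff  <- df.residual(null_fit) - df.residual(full_fit)
p_manual <- pchisq(dev_diff, df_diff, lower.tail = FALSE)

# The same test via anova()
lrt <- anova(null_fit, full_fit, test = "Chisq")
lrt
```

Fitting the null model explicitly is optional, since a `glm` object already stores `null.deviance` and `df.null`, which is what the `with(lrfit, ...)` calls above rely on.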
9. Do chi-squared values differ significantly if you drop educ as a predictor in the model?
Ans :- Yes
R- Output (wald.test is from the aod package)
> wald.test(b = coef(lrfit), Sigma = vcov(lrfit), Terms = 2)
Wald test:
----------
Chi-squared test:
X2 = 3.9, df = 1, P(> X2) = 0.048
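For a single coefficient, the Wald chi-squared statistic is the squared z value, (estimate / SE)^2, on 1 degree of freedom, so it can be checked in base R without an extra package. A sketch on simulated data (`fit_demo` and its data are assumptions, not the homework model):

```r
set.seed(3)  # simulated data, NOT the homework dataset
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.2 + 0.4 * x))
fit_demo <- glm(y ~ x, family = binomial)

ctab <- summary(fit_demo)$coefficients  # Estimate, Std. Error, z value, Pr(>|z|)
z_x  <- ctab["x", "z value"]

chi2   <- z_x^2                                    # Wald chi-squared, df = 1
p_wald <- pchisq(chi2, df = 1, lower.tail = FALSE)

c(chi2 = chi2, p = p_wald)
ctab["x", "Pr(>|z|)"]   # the two-sided z test gives the same p-value
```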
10. What is the percentage of cases that can be classified correctly based on our model?
Ans:- 81
R- Output (ClassLog is from the QuantPsyc package)
> ClassLog(lrfit, BL$change)
$rawtab
       resp
         0  1
  FALSE  6  2
  TRUE   3 16
$classtab
       resp
                0         1
  FALSE 0.6666667 0.1111111
  TRUE  0.3333333 0.8888889
$overall
[1] 0.8148148
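The overall figure is the share of cases on the diagonal of the raw table, that is, predicted FALSE with response 0 plus predicted TRUE with response 1. A short sketch that rebuilds it from the counts printed above (the object names are mine; the counts are copied from `$rawtab`):

```r
# Counts copied from $rawtab: rows = predicted class, columns = observed response
rawtab <- matrix(c(6, 3, 2, 16), nrow = 2,
                 dimnames = list(predicted = c("FALSE", "TRUE"),
                                 resp      = c("0", "1")))

# Correctly classified cases sit on the diagonal
overall <- sum(diag(rawtab)) / sum(rawtab)   # (6 + 16) / 27
overall
round(100 * overall)                         # percentage, as asked in question 10

# Column proportions, matching the layout of $classtab
prop.table(rawtab, margin = 2)
```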
 


