Sunday, December 1, 2013
Stats with R - 8 (Logistic Regression)
This week we are going to work on an example at the intersection of decision-making and global warming. The simulated dataset includes a dependent variable, change, for a list of 27 countries. The variable change indicates whether a country is willing to take action now against global warming or would rather wait and see (1 = act now, 0 = wait and see). The predictors are median age (age), education index (educ), gross domestic product (gdp), and CO2 emissions (co2).
BL <- read.table("stats1-datafiles-Stats1.13.HW.10.txt", header = TRUE)
Glimpse of the Dataset :-
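A minimal sketch for getting that glimpse in the console, assuming the file loaded into BL as above:
head(BL)  # first few rows of the dataset
str(BL)   # variable types and a preview of each column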
1. What is the median population age for the countries that voted to take action against global warming? (round to 2 decimal places)
Ans:- 35.78
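A sketch of how this value can be computed, assuming change == 1 marks the countries that voted to act now:
# median age among the countries willing to act now (change == 1)
round(median(BL$age[BL$change == 1]), 2)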
2. Run a logistic regression including all predictor variables. Which predictors are significant in this model?
Ans :- The predictors educ and age are significant, with p-values below .05.
lrfit <- glm(BL$change ~ BL$educ + BL$age + BL$gdp + BL$co2, family = binomial)
summary(lrfit)
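The p-values sit in the coefficient table that summary() prints; a sketch for pulling them out directly from the fitted model:
# p-values for each coefficient; educ and age come out below .05 here
coef(summary(lrfit))[, "Pr(>|z|)"]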
3. What does the negative value for the estimate of educ mean?
Ans :- Holding the other predictors constant, a higher education index is associated with lower log-odds (and so a lower predicted probability) of voting to act now (change = 1).
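One way to see this on the odds scale, as a sketch using the lrfit model fitted above:
# exponentiating a log-odds coefficient gives an odds ratio;
# a negative estimate corresponds to an odds ratio below 1
exp(coef(lrfit)["BL$educ"])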
4. What is the confidence interval for educ, using profiled log-likelihood? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans :- -31.17 -3.03
confint(lrfit)
R- Output
> confint(lrfit)
2.5 % 97.5 %
(Intercept) -7.120294915 7.62212837
BL$educ -31.171217249 -3.03349629
BL$age 0.151757331 0.73814438
BL$gdp -1.677996402 0.67880052
BL$co2 -0.001889047 0.00185111
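To pull out just the educ row and round it to two decimals, a sketch (confint() on a glm object profiles the likelihood via the MASS package):
# profile-likelihood interval for educ only, rounded to 2 decimal places
round(confint(lrfit, parm = "BL$educ"), 2)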
5. What is the confidence interval for age, using standard errors? (round to 2 decimal places, and give the lower bound first and the upper bound second, separated by a space)
Ans :- 0.09 0.65
R- Output
confint.default(lrfit)
2.5 % 97.5 %
(Intercept) -7.016319882 6.981360625
BL$educ -27.332984665 -0.119112070
BL$age 0.092877885 0.651260815
BL$gdp -0.986127869 0.835048069
BL$co2 -0.001910404 0.001148391
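confint.default() builds Wald intervals from the standard errors; the same computation done by hand for age, as a sketch:
# Wald 95% interval: estimate +/- z(0.975) * standard error
est <- coef(summary(lrfit))["BL$age", "Estimate"]
se  <- coef(summary(lrfit))["BL$age", "Std. Error"]
round(est + c(-1, 1) * qnorm(0.975) * se, 2)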
6. Compare the present model with a null model. What is the difference in deviance for the two models? (round to 2 decimal places)
Ans :- 16.30
R- Output
> with(lrfit, null.deviance - deviance)
[1] 16.30328
7. How many degrees of freedom are there for the difference between the two models?
Ans:- 4
R- Output
> with(lrfit, df.null - df.residual)
[1] 4
8. Is the p-value for the difference between the two models significant?
Ans:- Yes
R- Output
> with(lrfit, pchisq(null.deviance-deviance, df.null-df.residual, lower.tail = FALSE))
[1] 0.002638074
This p-value is well below .05, so the difference between the two models is significant.
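Questions 6 to 8 can also be answered in one step by fitting the null (intercept-only) model explicitly and comparing it with the full model; a minimal sketch:
# intercept-only (null) model
nullfit <- glm(BL$change ~ 1, family = binomial)

# likelihood-ratio comparison: reports the deviance difference, its df and the p-value
anova(nullfit, lrfit, test = "Chisq")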
9. Do chi-squared values differ significantly if you drop educ as a predictor from the model?
Ans :- Yes
R- Output
> wald.test(b = coef(lrfit), Sigma = vcov(lrfit), Terms = 2)
Wald test:
----------
Chi-squared test:
X2 = 3.9, df = 1, P(> X2) = 0.048
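wald.test() comes from the aod package, and Terms = 2 picks out the second coefficient of the model, which here is educ. The same question can be checked with a likelihood-ratio test, as a sketch:
library(aod)  # provides wald.test()

# refit without educ and compare deviances with the full model
lrfit_noeduc <- glm(BL$change ~ BL$age + BL$gdp + BL$co2, family = binomial)
anova(lrfit_noeduc, lrfit, test = "Chisq")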
10. What is the percentage of cases that can be classified correctly based on our model?
Ans:- 81
R- Output
> ClassLog(lrfit, BL$change)
$rawtab
       resp
         0  1
  FALSE  6  2
  TRUE   3 16

$classtab
       resp
                0         1
  FALSE 0.6666667 0.1111111
  TRUE  0.3333333 0.8888889

$overall
[1] 0.8148148
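ClassLog() comes from the QuantPsyc package. The overall accuracy can also be checked by hand by classifying each country at a .5 cutoff on the fitted probabilities, as a sketch:
library(QuantPsyc)  # provides ClassLog()

# predicted class at a .5 cutoff on the fitted probabilities
pred <- as.numeric(predict(lrfit, type = "response") > 0.5)

# proportion of countries classified correctly
mean(pred == BL$change)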