SAS, R, Analytics, Big Data and Me: Stats with R

Case study :- Cognitive training is a rapidly growing market with potential to expand further in the future. Several computerized software programs promoting cognitive improvements have been developed in recent years, with controversial results and implications. In a distinct literature, aerobic exercise has been shown to broadly enhance cognitive functions, in humans and animals. My research group is attempting to bring together these two trends of research, leading to an emerging third approach: designed sport training. Specifically designed sports are an optimal way to combine the benefits of traditional cognitive training and aerobic exercise into a single activity.

So, suppose we conducted a training experiment in which subjects were randomly assigned to one of two conditions: designed sport training (des) and aerobic training (aer). Also, assume that we measured both verbal and spatial reasoning before and after training, using four separate measures: S1, S2, V1 and V2. Simulated data are available at the source link below. Save the file to your computer and read it into R to complete the assignment and answer the following questions.

Source DataSet :- https://spark-public.s3.amazonaws.com/stats1/datafiles/Stats1.13.HW.03.txt

The data set looks somewhat like this :-

Reading in the dataset in R
data <- read.table("Stats1.13.HW.03.txt", header = TRUE)
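Before answering the questions it helps to check that the header row was picked up and that the columns have the expected types. The sketch below uses a tiny made-up table (the values and the exact column layout are assumptions for illustration, not taken from the course file) to show the kind of checks that can be run on `data` after reading it in.

```r
# Toy stand-in for the course file: values are invented for illustration only.
toy_text <- "cond S1.pre S2.pre V1.pre V2.pre
aer 10 12 20 21
des 11 13 22 24
aer 9 10 19 18"

# header = TRUE uses the first line as column names
toy <- read.table(text = toy_text, header = TRUE)

str(toy)          # column names and types
head(toy)         # first rows
table(toy$cond)   # number of subjects per condition
```

The same three calls (`str(data)`, `head(data)`, `table(data$cond)`) should work on the real data frame.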

1. What is the correlation between S1 and S2 pre-training?
Ans:- 0.49 (rounded to two decimal places)
R- code
> cor(data$S1.pre, data$S2.pre)
[1] 0.4920231

2. What is the correlation between V1 and V2 pre-training?
Ans:- 0.90 (rounded to two decimal places)
R- code
> cor(data$V1.pre, data$V2.pre)
[1] 0.9038863

3. With respect to the measurement of two distinct constructs, spatial reasoning and verbal reasoning, the pattern of correlations pre-training reveals:
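Ans :- The pattern of correlations pre-training reveals BOTH convergent validity and divergent validity. The two spatial measures correlate with each other (0.49) and so do the two verbal measures (0.90), which is convergent validity, while the spatial and verbal composites correlate only weakly with each other (about 0.12), which is divergent validity.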

R- code
> data$V.pre = (data$V1.pre + data$V2.pre)/ 2
> data$S.pre = (data$S1.pre + data$S2.pre)/ 2
> cor(data$S.pre, data$V.pre)
[1] 0.1186354
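The same pattern can be read off a single correlation matrix of the four pre-test measures instead of separate `cor()` calls. The sketch below is only an illustration: it simulates toy data from two independent latent constructs (an assumption made for the example, not the course data) and checks that within-construct correlations are high while cross-construct correlations are low.

```r
# Toy data: two independent latent constructs, each measured twice with noise.
# All values are simulated for illustration only.
set.seed(1)
n <- 200
spatial <- rnorm(n)
verbal  <- rnorm(n)

toy <- data.frame(
  S1.pre = spatial + rnorm(n, sd = 0.5),
  S2.pre = spatial + rnorm(n, sd = 0.5),
  V1.pre = verbal  + rnorm(n, sd = 0.5),
  V2.pre = verbal  + rnorm(n, sd = 0.5)
)

# Full correlation matrix, rounded for readability
r <- round(cor(toy), 2)
print(r)

# Convergent validity: same-construct pairs correlate strongly
print(r["S1.pre", "S2.pre"])
print(r["V1.pre", "V2.pre"])

# Divergent validity: cross-construct pairs correlate weakly
print(r[c("S1.pre", "S2.pre"), c("V1.pre", "V2.pre")])
```

On the real data the equivalent call would be `cor(data[, c("S1.pre", "S2.pre", "V1.pre", "V2.pre")])`.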

4. Correlations from the control group could be used to estimate test/retest reliability. If so, which test is most reliable?
Ans :- V2

R- code
> data.aer = subset(data, cond == "aer")  # aerobic training is treated as the control group
> cor(data.aer$S1.pre, data.aer$S1.post)
[1] 0.6277946
> cor(data.aer$S2.pre, data.aer$S2.post)
[1] 0.633611
> cor(data.aer$V1.pre, data.aer$V1.post)
[1] 0.744725
> cor(data.aer$V2.pre, data.aer$V2.post)  # highest pre/post correlation, so V2 is the most reliable test
[1] 0.9075993
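The four pre/post correlations can also be computed in one pass. The sketch below assumes columns named `<measure>.pre` and `<measure>.post`, as in the code above; `retest_reliability` is a hypothetical helper written for this example, and the toy data are simulated rather than taken from the course file.

```r
# Toy data: invented values with the same column naming pattern as the course file.
set.seed(42)
n <- 40
toy <- data.frame(cond = rep(c("aer", "des"), each = n / 2))
for (m in c("S1", "S2", "V1", "V2")) {
  toy[[paste0(m, ".pre")]]  <- rnorm(n)
  toy[[paste0(m, ".post")]] <- toy[[paste0(m, ".pre")]] + rnorm(n, sd = 0.5)
}

# Hypothetical helper: pre/post correlation for each named measure
retest_reliability <- function(df, measures) {
  sapply(measures, function(m) {
    cor(df[[paste0(m, ".pre")]], df[[paste0(m, ".post")]])
  })
}

# Restrict to the control group before estimating reliability
control <- subset(toy, cond == "aer")
rel <- retest_reliability(control, c("S1", "S2", "V1", "V2"))

print(round(rel, 2))
print(names(which.max(rel)))  # measure with the highest test/retest correlation
```

Called as `retest_reliability(data.aer, c("S1", "S2", "V1", "V2"))`, it should give the same four values as the individual `cor()` calls.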

5. Does there appear to be a correlation between spatial reasoning before training and the amount of improvement in spatial reasoning?
Ans :- No
(This is because the correlation between spatial reasoning before training (data$S.pre) and the amount of improvement in spatial reasoning (data$Sgain) is close to zero, about -0.09.)
R- code
> data$S.pre = (data$S1.pre + data$S2.pre) / 2
> data$S.post = (data$S1.post + data$S2.post) / 2
> data$Sgain = data$S.post - data$S.pre
> cor(data$S.pre, data$Sgain)
[1] -0.09280867

6. Does there appear to be a correlation between verbal reasoning before training and the amount of improvement in verbal reasoning?
Ans :- No
(This is because the correlation between verbal reasoning before training (data$V.pre) and the improvement in verbal reasoning (data$Vgain) is close to zero, about -0.06.)
R- code
> data$V.pre = (data$V1.pre + data$V2.pre)/ 2
> data$V.post = (data$V1.post + data$V2.post) / 2
> data$Vgain = data$V.post - data$V.pre
> cor(data$V.pre, data$Vgain)
[1] -0.05822132

7. Which group exhibited more improvement in spatial reasoning?
Ans :- des

R- code
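One way to compare the groups is to average the spatial gain score (post minus pre, as defined in question 5) within each condition. The sketch below is a minimal illustration on invented toy values, built so that the `des` group gains more by construction; it does not reproduce the numbers from the course file.

```r
# Toy data: invented values for illustration only.
toy <- data.frame(
  cond    = c("aer", "aer", "aer", "des", "des", "des"),
  S1.pre  = c(10, 12, 11, 10, 12, 11),
  S2.pre  = c(20, 22, 21, 20, 22, 21),
  S1.post = c(11, 13, 12, 14, 16, 15),
  S2.post = c(21, 23, 22, 24, 26, 25)
)

# Composite spatial scores and gain, as in question 5
toy$S.pre  <- (toy$S1.pre + toy$S2.pre) / 2
toy$S.post <- (toy$S1.post + toy$S2.post) / 2
toy$Sgain  <- toy$S.post - toy$S.pre

# Mean gain per training condition
gain_by_group <- tapply(toy$Sgain, toy$cond, mean)
print(gain_by_group)
```

On the real data, `tapply(data$Sgain, data$cond, mean)` gives the mean spatial gain for `aer` and `des`.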

8. Create a color scatterplot matrix for all 4 measures at pre-test. Do the scatterplots suggest two reliable and valid constructs?
Ans :- YES
R- code
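library(gclus)  # provides dmat.color(), order.single() and cpairs()
base <- data[, c("S1.pre", "S2.pre", "V1.pre", "V2.pre")]  # the four pre-test measures
base.r <- abs(cor(base))             # absolute correlations between measures
base.color <- dmat.color(base.r)     # panel colors by correlation strength
base.order <- order.single(base.r)   # place highly correlated variables next to each other
cpairs(base, base.order, panel.colors = base.color, gap = .5, main = "Variables ordered and colored by correlation")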

9. Create a color scatterplot matrix for all 4 measures at post-test. Do the scatterplots suggest two reliable and valid constructs?
Ans :- YES
R- code
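library(gclus)  # provides dmat.color(), order.single() and cpairs()
base <- data[, c("S1.post", "S2.post", "V1.post", "V2.post")]  # the four post-test measures
base.r <- abs(cor(base))             # absolute correlations between measures
base.color <- dmat.color(base.r)     # panel colors by correlation strength
base.order <- order.single(base.r)   # place highly correlated variables next to each other
cpairs(base, base.order, panel.colors = base.color, gap = .5, main = "Variables ordered and colored by correlation")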

10. What is the major change from pre-test to post-test visible on the color matrix?
Ans :- Variance
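A change in spread can also be checked numerically by putting the pre-test and post-test variances of each measure side by side. The sketch below uses simulated toy data in which the post-test spread is larger by construction; that direction is an assumption of the example and says nothing about the course file.

```r
# Toy data: simulated for illustration only; post-test sd is set larger on purpose.
set.seed(7)
n <- 100
toy <- data.frame(
  S1.pre = rnorm(n, sd = 1), S1.post = rnorm(n, sd = 2),
  V1.pre = rnorm(n, sd = 1), V1.post = rnorm(n, sd = 2)
)

measures <- c("S1", "V1")

# Variance of each measure before and after training
spread <- data.frame(
  measure  = measures,
  var.pre  = sapply(measures, function(m) var(toy[[paste0(m, ".pre")]])),
  var.post = sapply(measures, function(m) var(toy[[paste0(m, ".post")]]))
)
print(spread)
```

With the real data, `measures` would be `c("S1", "S2", "V1", "V2")` and `toy` would be replaced by `data`.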

Sunday, October 13, 2013

Stats with R - 3