Wednesday, May 15, 2013

concepts on DLM, DSD

The contents of the SAS data set PERM.JAN_SALES are listed below:

VARIABLE NAME    TYPE
idnum            character variable
sales_date       numeric date value
A comma-delimited raw data file needs to be created from the PERM.JAN_SALES data set. The SALES_DATE values need to be written
in the MMDDYY10. format.
Which one of the following SAS DATA steps correctly creates this raw data file?

A. libname perm 'SAS-data-library';
data _null_;
set perm.jan_sales;
file 'file-specification' dsd = ',';
put idnum sales_date : mmddyy10.;
run;

B. libname perm 'SAS-data-library';
data _null_;
set perm.jan_sales;
file 'file-specification' dlm = ',';
put idnum sales_date : mmddyy10.;
run;

C. libname perm 'SAS-data-library';
data _null_;
set perm.jan_sales;
file 'file-specification';
put idnum sales_date : mmddyy10. dlm = ',';
run;

D. libname perm 'SAS-data-library';
data _null_;
set perm.jan_sales;
file 'file-specification';
put idnum sales_date : mmddyy10. dsd = ',';
run;
The correct answer is: B
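Concepts:
DSD = ',' is invalid because DSD is a keyword option that takes no value; it already uses a comma as its default delimiter. DSD used alone would work.

First, the $ sign is not needed for the character variable idnum in the PUT statement, because the variable already exists in the data set as a character variable.
Answer A is not correct because DSD is written as dsd = ','. It would be correct if the option were written as just DSD.
Answers C and D are not correct because they place the DLM and DSD options in the PUT statement. These are FILE statement options (and INFILE statement options when reading data).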
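
As a minimal sketch of the point about answer A, assuming the same placeholder library path and file specification used in the question, DSD can be written as a bare keyword on the FILE statement:

```sas
/* Placeholders from the question: replace 'SAS-data-library' and
   'file-specification' with real paths before running. */
libname perm 'SAS-data-library';

data _null_;
   set perm.jan_sales;
   /* DSD takes no value; it implies a comma as the delimiter. */
   file 'file-specification' dsd;
   /* The colon modifier applies MMDDYY10. to sales_date in list output. */
   put idnum sales_date : mmddyy10.;
run;
```

Unlike a plain dlm = ',', DSD also encloses a value in quotation marks when the value itself contains the delimiter, which matters if idnum could ever contain a comma.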

Tuesday, May 7, 2013

SAS interview questions

The following are the most frequent and favourite questions of SAS interviewers:

1. What SAS statements would you code to read an external raw data file into a DATA step?
2. How do you read in the variables that you need?
3. Are you familiar with special input delimiters? How are they used?
4. If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn’t have a value?
5. What is the difference between an informat and a format? Name three informats or formats.
6. Name and describe three SAS functions that you have used, if any.
7. How would you code the criteria to restrict the output to be produced?
8. What is the purpose of the trailing @? The @@? How would you use them?
9. Under what circumstances would you code a SELECT construct instead of IF statements?
10. What statement do you code to tell SAS that it is to write to an external file? What statement do you code to write the record to the file?
11. If reading an external file to produce an external file, what is the shortcut to write that record without coding every single variable on the record?
12. If you do not want any SAS output from a DATA step, how would you code the DATA statement to prevent SAS from producing a data set?
13. What is the one statement to set the criteria of data that can be coded in any step?
14. Have you ever linked SAS code? If so, describe the link and any required statements used to either process the code or the step itself.
15. How would you include common or reusable code to be processed along with your statements?
16. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
17. If you have a data set that contains 100 variables, but you need only five of those, what is the code to force SAS to use only those variables?
18. Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.
19. How would you delete duplicate observations?
20. How would you delete observations with duplicate keys?
21. How would you code a merge that will keep only the observations that have matches from both sets?
22. How would you code a merge that will write the matches of both to one data set, the non-matches from the left-most data set to a second data set, and the non-matches of the right-most data set to a third data set?
23. What is the Program Data Vector (PDV)? What are its functions?
24. Does SAS ‘Translate’ (compile) or does it ‘Interpret’? Explain.
25. At compile time when a SAS data set is read, what items are created?
26. Name statements that are recognized at compile time only.
27. Identify statements whose placement in the DATA step is critical.

Sunday, May 5, 2013

Winners


The reason most people never reach their goals is that they don't define them, learn about them, or even seriously consider them as believable or achievable. Winners can tell you where they are going, what they plan to do along the way, and who will be sharing the adventure with them.

Big Data History


E-commerce, in particular, has exploded data management challenges along three
dimensions: volumes, velocity and variety.

On Volume: 
The lower cost of e-channels enables an enterprise to offer its goods or services to more
individuals or trading partners, and up to 10x the quantity of data about an individual
transaction may be collected—thereby increasing the overall volume of data to be
managed.

On Velocity:
E-commerce has also increased point-of-interaction (POI) speed, and consequently the
pace of data used to support interactions and generated by interactions.

On Variety: 
Through 2003/04, no greater barrier to effective data management will exist than the
variety of incompatible data formats, non-aligned data structures, and inconsistent data
semantics.

Introduction to Big Data



Big Data 

“Big Data is any data that is expensive to manage 
and hard to extract value from.” 


Big Data Now 

“…the necessity of grappling with Big Data, and 
the desirability of unlocking the information hidden 
within it, is now a key theme in all the sciences – 
arguably the key scientific theme of our times.”




Big Data: Three challenges

Volume
– the size of the data

Velocity 
– the latency of data processing relative to
the growing demand for interactivity

Variety 
– the diversity of sources, formats, quality,
structures