SAS & Statistics: November 2009

Sunday, November 29, 2009

SAS efficiency

This is a great tutorial on SAS efficiency.

Tuesday, November 24, 2009

PROC CDISC and SDTM domain classes

** 'Special' class domain: DM **;
PROC CDISC MODEL=SDTM;
   SDTM       SDTMVERSION="3.1";
   DOMAINDATA DATA=WORK.DM
              DOMAIN=DM
              CATEGORY=SPECIAL;
RUN;

** 'Events' class domain: AE **;
PROC CDISC MODEL=SDTM;
   SDTM       SDTMVERSION="3.1";
   DOMAINDATA DATA=WORK.AE
              DOMAIN=AE
              CATEGORY=EVENTS;
RUN;

** 'Interventions' class domain: CM **;
PROC CDISC MODEL=SDTM;
   SDTM       SDTMVERSION="3.1";
   DOMAINDATA DATA=WORK.CM
              DOMAIN=CM
              CATEGORY=INTERVENTIONS;
RUN;

** 'Findings' class domain: IE **;
PROC CDISC MODEL=SDTM;
   SDTM       SDTMVERSION="3.1";
   DOMAINDATA DATA=WORK.IE
              DOMAIN=IE
              CATEGORY=FINDINGS;
RUN;
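The four steps above differ only in the data set, the domain, and the class, so the repetition could be wrapped in a small macro. This is a minimal sketch, not part of the PROC CDISC documentation: the macro name %sdtm_check is hypothetical, and it assumes each domain data set sits in WORK and is named after its SDTM domain, as in the examples above.

```sas
/* Hypothetical helper: run PROC CDISC on one SDTM domain.          */
/* Assumes WORK.<domain> exists and is named after its domain code. */
%macro sdtm_check(domain=, category=);
  proc cdisc model=sdtm;
    sdtm sdtmversion="3.1";
    domaindata data=work.&domain
               domain=&domain
               category=&category;
  run;
%mend sdtm_check;

/* One call per domain, with the class that the domain belongs to. */
%sdtm_check(domain=DM, category=SPECIAL)
%sdtm_check(domain=AE, category=EVENTS)
%sdtm_check(domain=CM, category=INTERVENTIONS)
%sdtm_check(domain=IE, category=FINDINGS)
```

Keeping the domain-to-class mapping in the macro calls makes it easy to add a new domain without copying the whole step.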

Link: http://www.lexjansen.com/pharmasug/2008/rs/rs07.pdf
Link: http://support.sas.com/rnd/base/xmlengine/proccdisc/index.html

Monday, November 23, 2009

Startpage in ODS

STARTPAGE=NEVER | NO | NOW | YES
The STARTPAGE= option of the ODS printer-family statements (ODS PDF, ODS PS, ODS PRINTER) controls page breaks.

NEVER      specifies not to insert page breaks, even before graphics procedures.

CAUTION:
Each graph normally requires an entire page. The default behavior forces a new page after a graphics procedure. STARTPAGE=NEVER turns off that behavior, so specifying STARTPAGE= NEVER might cause graphics to overprint.

NO      specifies that no new pages be inserted at the beginning of each procedure, or within certain procedures, even if new pages are requested by the procedure code. A new page will begin only when a page is filled or when you specify STARTPAGE=NOW.

CAUTION:
Each graph normally requires an entire page. The default behavior forces a new page after a graphics procedure, even if you use STARTPAGE=NO. STARTPAGE=NEVER turns off that behavior, so specifying STARTPAGE= NEVER might cause graphics to overprint.
Alias: OFF
Tip: When you specify STARTPAGE=NO, system titles and footnotes are still produced only at the top and bottom of each physical page, regardless of the setting of this option. Thus, some system titles and footnotes that you specify might not appear when this option is specified.

NOW      forces the immediate insertion of a new page.
Tip: This option is useful primarily when the current value of the STARTPAGE= option is NO. Otherwise, each new procedure forces a new page automatically.

YES      inserts a new page at the beginning of each procedure, and within certain procedures, as requested by the procedure code.
Alias: ON

Default: YES
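A minimal sketch of how NO and NOW are typically combined, assuming ODS PDF output; the file name startpage_demo.pdf is a hypothetical choice. The first two procedures share pages while space allows, and STARTPAGE=NOW forces the FREQ output onto a new page.

```sas
/* Hypothetical output file; adjust the path for your system. */
ods listing close;
ods pdf file="startpage_demo.pdf" startpage=no;

title "STARTPAGE=NO: procedures share a page while space allows";
proc print data=sashelp.class(obs=5);
run;

proc means data=sashelp.class n mean maxdec=1;
  var height weight;
run;

/* Insert a page break right now, before the next procedure. */
ods pdf startpage=now;

proc freq data=sashelp.class;
  tables sex;
run;

ods pdf close;
ods listing;
title;
```

Because of the Tip above, the title is only guaranteed at the top of each physical page, so with STARTPAGE=NO it will not repeat before every procedure.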

Link: http://analytics.ncsu.edu/sesug/2007/DP05.pdf
Link: http://www.nesug.org/Proceedings/nesug06/io/io13.pdf

Formdlim option

FORMDLIM='delimiting-character'

'delimiting-character'
specifies, in quotation marks, a character that SAS writes to delimit pages. Normally the delimiting character is null, which means SAS separates pages with a page eject, as in this statement:

options formdlim='';

Specifying a blank makes SAS separate pages with a line of blanks instead of a page eject, which effectively eliminates the page breaks in the listing:

options formdlim=' ';
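A visible separator works the same way. A minimal sketch: with FORMDLIM set to a dash, SAS writes a line of dashes where each new page would otherwise start in the listing, and setting it back to null restores ordinary page breaks.

```sas
/* Separate listing pages with a line of dashes instead of a page eject. */
options formdlim='-';

proc print data=sashelp.class;
run;

/* Restore the default: a null delimiter, i.e. a real page eject. */
options formdlim='';
```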

Data Step Views

Definition of a DATA Step View

A DATA step view is a native view that has the broadest scope of any SAS view. It contains a stored DATA step program that can read data from a variety of sources, including:

raw data files
SAS data files
PROC SQL views
SAS/ACCESS views
DB2, ORACLE, or other DBMS data

Creating DATA Step Views

To create a DATA step view, specify the VIEW= option after the final data set name in the DATA statement. The VIEW= option tells SAS to compile, but not to execute, the source program and to store the compiled code in the DATA step view that is named in the option.

For example, the following statements create a DATA step view named DEPT.A:

libname dept 'SAS-library';

data dept.a / view=dept.a;
... more SAS statements ...
run;

Note that if the SAS view exists in a SAS library, and if you use the same member name to create a new view definition, then the old SAS view is overwritten.

Beginning with Version 8, DATA step views retain source statements. You can retrieve these statements with the DESCRIBE statement. The following example uses the DESCRIBE statement with the DATA step view INVENTORY to write a copy of its source code to the SAS log:

data view=inventory;
describe;
run;

For more information on how to create SAS views and use the DESCRIBE statement, see the DATA statement in SAS Language Reference: Dictionary.

What Can You Do with a DATA Step View?

Using a DATA step view, you can do the following:

directly process any file that can be read with an INPUT statement
read other SAS data sets
generate data without using any external data sources and without creating an intermediate SAS data file.

Because DATA step views are generated by the DATA step, they can manipulate and manage input data from a variety of sources including data from external files and data from existing SAS data sets. The scope of what you can do with a DATA step view, therefore, is much broader than that of other types of SAS views.
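As an illustration of the third item in the list above, a view does not need any input source at all. A minimal sketch; WORK.SQUARES is a hypothetical name. The view stores only compiled code, and its rows are produced by the DO loop each time another step reads it.

```sas
/* Only the compiled DO loop is stored; no observations are written. */
data work.squares / view=work.squares;
  do i = 1 to 10;
    square = i**2;
    output;
  end;
run;

/* Reading the view executes the loop and passes the rows to PROC PRINT. */
proc print data=work.squares;
run;
```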

Differences between DATA Step Views and Stored Compiled DATA Step Programs

DATA step views and stored compiled DATA step programs differ in the following ways:

a DATA step view is implicitly executed when it is referenced as an input data set by another DATA or PROC step. Its main purpose is to provide data, one record at a time, to the invoking procedure or DATA step.
a stored compiled DATA step program is explicitly executed when it is specified by the PGM= option on a DATA statement. Its purpose is usually a more specific task, such as creating SAS data files, or originating a report.

For more information on stored compiled DATA step programs, see Stored Compiled DATA Step Programs.
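To make the contrast concrete, here is one filter written both ways. This is a sketch under stated assumptions: the member names PRICEY, PRICEY_PGM, and PRICEY_V are hypothetical, and SASHELP.CARS (shipped with SAS 9) is used as input. The stored program runs only when DATA PGM= calls it; the view runs whenever another step reads it.

```sas
/* Stored compiled program: compiled and stored now, not executed. */
data work.pricey / pgm=work.pricey_pgm;
  set sashelp.cars;
  where invoice > 50000;
run;

/* Explicit execution through PGM=; this writes the data file WORK.PRICEY. */
data pgm=work.pricey_pgm;
  execute;
run;

/* DATA step view: also compiled now, but executed implicitly */
/* each time another DATA or PROC step reads it.              */
data work.pricey_v / view=work.pricey_v;
  set sashelp.cars;
  where invoice > 50000;
run;

proc means data=work.pricey_v n mean;
  var invoice;
run;
```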

Restrictions and Requirements

Global statements do not apply to a DATA step view. Global statements such as the FILENAME, FOOTNOTE, LIBNAME, OPTIONS, and TITLE statements, even if included in the DATA step that created the SAS view, have no effect on the SAS view. If you do include global statements in your source program statements, SAS stores the DATA step view but not the global statements. When the view is referenced, actual execution can differ from the intended execution.

When a view is created, the labels for the variables that it returns are also created. If a DATA step view reads a data set that contains variable labels and a label is changed after the view is created, any procedure output will show the original labels. The view must be recompiled in order for the procedure output to reflect the new variable labels.

If a view uses filerefs or librefs, the fileref or libref that is used is the one that is defined at the time that the view is compiled. This means that if you change the file that is referenced in a fileref that the view uses, the new file is ignored by the view and the file that is referred to by the fileref at the time the view was compiled continues to be used.

Performance Considerations

DATA step code executes each time that you use a DATA step view, which might add considerable system overhead. In addition, you run the risk of having your data change between steps. However, this also means that you get the most recent data available--that is, data when the view is executed as compared to data when the view was compiled.
Processing overhead also depends on how many passes over the data are required:
- When one sequential pass is requested, no data set is created. Compared to traditional methods of processing, making one pass improves performance by decreasing the number of input/output operations and elapsed time.
- When random access or multiple passes are requested, the SAS view must build a spill file that contains all generated observations so that subsequent passes can read the same data that was read by previous passes. In some instances, the SPILL= data set option can reduce the size of a spill file.

Example 1: Merging Data to Produce Reports

If you want to merge data from multiple files but you do not need to create a file that contains the combined data, you can create a DATA step view of the combination for use in subsequent applications.

For example, the following statements define DATA step view MYV9LIB.QTR1, which merges the sales figures in the data file V9LR.CLOTHES with the sales figures in the data file V9LR.EQUIP. The data files are merged by date, and the value of the variable Total is computed for each date.

libname myv9lib 'SAS-library';
libname v9lr 'SAS-library';

data myv9lib.qtr1 / view=myv9lib.qtr1;
merge v9lr.clothes v9lr.equip;
by date;
total = cl_v9lr + eq_v9lr;
run;

The following PRINT procedure executes the view:

proc print data=myv9lib.qtr1;
run;

Example 2: Producing Additional Output Files

In this example, the DATA step reads an external file named STUDENT, which contains student data, and then writes observations that contain known problems to data set MYV9LIB.PROBLEMS. The DATA step also defines the DATA step view MYV9LIB.CLASS. The DATA step does not create a SAS data file named MYV9LIB.CLASS.

The FILENAME and the LIBNAME statements are both global statements and must exist outside of the code that defines the SAS view, because SAS views cannot contain global statements.

Here are the contents of the external file STUDENT:

dutterono  MAT   3
lyndenall  MAT   
frisbee    MAT  94
           SCI  95
zymeco     ART  96
dimette         94
mesipho    SCI  55
merlbeest  ART  97
scafernia       91    
gilhoolie  ART 303
misqualle  ART  44
xylotone   SCI  96

Here is the DATA step that produces the output files:

libname myv9lib 'SAS-library';
filename student 'external-file-specification';            /* (1) */

data myv9lib.class(keep=name major credits)
     myv9lib.problems(keep=code date) / view=myv9lib.class; /* (2) */
   infile student;
   input name $ 1-10 major $ 12-14 credits 16-18;           /* (3) */
   select;
      /* Missing name, major, or credits: error code 01 */
      when (name=' ' or major=' ' or credits=.)
         do code=01;
            date=datetime();
            output myv9lib.problems;
         end;                                               /* (4) */
      /* Credits between 0 and 90 (exclusive): error code 02 */
      when (0<credits<90)
         do code=02;
            date=datetime();
            output myv9lib.problems;
         end;                                               /* (5) */
      otherwise
         output myv9lib.class;
   end;
run;                                                        /* (6) */

The following example shows how to print the files created previously. The MYV9LIB.CLASS contains the observations from STUDENT that were processed without errors. The data file MYV9LIB.PROBLEMS contains the observations that contain errors.

If the data frequently changes in the source data file STUDENT, there would be different effects on the returned values in the SAS view and the SAS data file:

New records, if error free, that are added to the source data file STUDENT between the time you run the DATA step in the previous example and the time you execute PROC PRINT in the following example, will appear in the SAS view MYV9LIB.CLASS.
On the other hand, if any new records that fail the error tests were added to STUDENT, the new records would not show up in the SAS data file MYV9LIB.PROBLEMS until you run the DATA step again.

A SAS view dynamically updates from its source files each time it is used. A SAS data file, each time it is used, remains the same, unless new data is written directly to the file.

filename student 'external-file-specification';
libname myv9lib 'SAS-library';                 /* (7) */

proc print data=myv9lib.class;
run;                                           /* (8) */

proc print data=myv9lib.problems;
   format date datetime18.;
run;                                           /* (9) */

(1) Reference a library called MYV9LIB, and tell SAS where the file that is associated with the fileref STUDENT is stored.
(2) Create a data file called PROBLEMS and a SAS view called CLASS, and specify the variables to keep in each.
(3) Read each record from the file referenced by the fileref STUDENT, taking the values from the specified column positions, and assign the variable names NAME, MAJOR, and CREDITS.
(4) When NAME, MAJOR, or CREDITS is blank or missing, assign the observation a code of 01, record the current SAS datetime value in DATE, and write the observation to PROBLEMS.
(5) When CREDITS is greater than zero but less than ninety, assign the observation a code of 02, record the current SAS datetime value in DATE, and write the observation to PROBLEMS.
(6) Write all other observations, which have none of the specified errors, to the SAS view MYV9LIB.CLASS.
(7) The FILENAME statement assigns the fileref STUDENT to an external file. The LIBNAME statement assigns the libref MYV9LIB to a SAS library.
(8) The first PROC PRINT calls the SAS view MYV9LIB.CLASS. The SAS view extracts data on the fly from the file referenced as STUDENT.
(9) This PROC PRINT prints the contents of the data file MYV9LIB.PROBLEMS.

Link: http://www2.sas.com/proceedings/sugi29/067-29.pdf

Saturday, November 21, 2009

Efficiency with SAS

Wednesday, November 11, 2009

“??” FORMAT MODIFIER

DETERMINE IF A CHARACTER STRING CONTAINS ONLY NUMBERS USING THE INPUT FUNCTION AND THE SPECIAL "??" FORMAT MODIFIER

The following excerpt is from SAS OnlineDoc documentation: ? or ??

The optional question mark (?) and double question mark (??) format modifiers suppress the printing of both the error messages and the input lines when invalid data values are read. The ? modifier suppresses the invalid data message. The ?? modifier also suppresses the invalid data message and, in addition, prevents the automatic variable _ERROR_ from being set to 1 when invalid data are read. Below is an example of using ?? to determine whether a variable contains non-numeric values or not:

data _null_;
   x = "12345678";
   /* ?? suppresses the invalid-data note; invalid input returns missing */
   if (input(x, ?? 8.) eq .) then
      put 'non-numeric';
   else put 'numeric';
run;

Running this step writes "numeric" to the log. With x="123a5678", it would write "non-numeric". Note that the informat in the above example is 8., so only the first 8 bytes of the character string are checked. Thus x="123456789a" would still be reported as "numeric", because only its first 8 bytes are read.
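The same test can classify a whole column at once. A minimal sketch; the data set CHECK and variable STATUS are hypothetical names. Note what the technique really checks: "readable by the 8. informat", not "digits only", so values such as -12.5 and 1E3 also come back as numeric, while 12,345 does not (the 8. informat does not accept commas).

```sas
/* Classify several strings with the INPUT function and the ?? modifier. */
data check;
  length x $ 12 status $ 11;
  input x $;
  /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0; */
  /* an unreadable value simply comes back as missing.           */
  if missing(input(x, ?? 8.)) then status = 'non-numeric';
  else status = 'numeric';
  datalines;
12345678
123a5678
-12.5
1E3
12,345
;
run;

proc print data=check;
run;
```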

Link: http://www.nesug.org/Proceedings/nesug01/at/at1013.pdf

Sunday, November 8, 2009

Change Position of Variables Using RETAIN

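data a;
   input x y z;
   cards;
1    2    3
;
run;

data b;
   retain z y x;
   set a;
run;

Because the RETAIN statement comes before the SET statement, the compiler adds Z, Y, and X to the program data vector in that order, so WORK.B stores the columns as Z Y X. The values are unchanged: variables read with SET are retained automatically, so naming them in RETAIN has no other effect.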
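RETAIN does not have to list every variable. A short sketch (WORK.C is a hypothetical name): only Z is named, so it moves to the front, and X and Y follow in the order in which SET adds them to the program data vector, giving the order Z X Y.

```sas
data a;
  input x y z;
  datalines;
1 2 3
;
run;

/* Z is defined first by RETAIN; X and Y are added later by SET. */
data c;
  retain z;
  set a;
run;

/* VARNUM lists the variables in their stored (column) order. */
proc contents data=c varnum;
run;
```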

COUNTW Function (SAS 9.2)

Count Words in a String

proc sort data=sashelp.zipcode(keep=statename)
          out=allnames nodupkey;
  by statename;
run;

data words;
  set allnames;
  Words=countw(statename," ");
run;

proc freq data=words;
  tables Words;
  title1 "Number of Words in U.S. State and Territory Names";
run;

SAS Log

3999   proc sort data=sashelp.zipcode(keep=statename)
4000             out=allnames nodupkey;
4001     by statename;
4002   run;

NOTE: There were 41763 observations read from the data set SASHELP.ZIPCODE.
NOTE: 41707 observations with duplicate key values were deleted.
NOTE: The data set WORK.ALLNAMES has 56 observations and 1 variables.

4003   data words;
4004     set allnames;
4005     Words=countw(statename," ");
4006   run;

NOTE: There were 56 observations read from the data set WORK.ALLNAMES.
NOTE: The data set WORK.WORDS has 56 observations and 2 variables.

4007   proc freq data=words;
4008     tables Words;
4009     title1 "Number of Words in U.S. State and Territory Names";
4010   run;

NOTE: There were 56 observations read from the data set WORK.WORDS.

SAS Listing Output

Number of Words in U.S. State and Territory Names

The FREQ Procedure

                                  Cumulative   Cumulative
Words    Frequency     Percent     Frequency      Percent
----------------------------------------------------------
    1           42       75.00            42        75.00
    2           12       21.43            54        96.43
    3            2        3.57            56       100.00
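COUNTW is often paired with SCAN to split a string into one row per word. A minimal sketch; the data sets NAMES and STATE_WORDS are hypothetical, and a few names are typed in with DATALINES instead of reading SASHELP.ZIPCODE so the example stands alone.

```sas
/* A few sample names; the example above reads SASHELP.ZIPCODE instead. */
data names;
  infile datalines truncover;
  input statename $40.;
  datalines;
Texas
North Carolina
District of Columbia
;
run;

/* One output row per word: COUNTW gives the loop bound, */
/* SCAN extracts the i-th blank-delimited word.          */
data state_words;
  set names;
  length word $ 40;
  do i = 1 to countw(statename, ' ');
    word = scan(statename, i, ' ');
    output;
  end;
  drop i;
run;

proc print data=state_words;
run;
```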

Thursday, November 5, 2009

Unix Commands

1. basic:

vi temp.sas:

open temp.sas for editing in vi;

more temp.sas:

display temp.sas one screen at a time;

view temp.sas:

open temp.sas in vi in read-only mode;

rm temp.sas: delete temp.sas [rm ~\$zzz.doc (delete temporary file)];

mkdir test:

create directory called test;

rmdir test:

remove directory test; test must be empty first;

rm -r test: 

remove directory test, test doesn't need to be emptied first;

cd ..:

go to the directory one level up;

cd tmpdir:   

go to directory called tmpdir;

cp temp.sas temp1.sas:

copy temp.sas to temp1.sas;

2. save+quit:

ZZ  (in command mode, no colon);  or,

:wq;  or,

:x;

3. quit w/o saving:

:e!  (discard changes and reload the file; you stay in the editor)

:q!  (quit the editor without saving)

4. 
1).                  change     delete/cut     copy

__________________________________________

1 letter             r          x              yl
5 letters            5s         5x             5yl
1 word               cw         dw             yw
5 words              5cw        5dw            5yw
1 line               cc         dd             yy
5 lines              5cc        5dd            5yy
to line beginning    c0         d0             y0
to line end          c$         d$             y$
3 words back         3cb        3db            3yb

2).

__________________________________________

move to line 6:                     :6 or 6G

change aaa to bbb on line 2:        :2s/aaa/bbb/

to the beginning of the line:       0

to the end of the line:             $

toggle case of 10 letters:          10~

join 5 lines:                       5J

insert before/after the cursor:     i or a

repeat the last change:             .

paste:                              p

top line of screen:                 H

last line of screen:                L

middle line of screen:              M

find aaassss, then the next match:  /aaassss    n

3). combinations:

delete     copy       from cursor to

__________________________________________

dH         yH         top of screen
dL         yL         bottom of screen
dG         yG         end of file
d+         y+         next line
d13G       y13G       line 13

4). recover deleted text (numbered buffers 1 to 9):

"1p:              recover the last deletion

"5p:              recover the fifth-most-recent deletion

5). named buffers (a to z):

"add:             delete the current line and save it in buffer a;

"a7dd:            delete 7 lines starting at the current line and save them in buffer a;

"a7yy:            copy 7 lines starting at the current line and save them in buffer a;

"ap:              paste the contents of buffer a after the cursor;

6). ex editor:

:3,18d              delete lines 3 through 18;

:3,18m23            move lines 3 through 18 after line 23;

:3,18co23           copy lines 3 through 18 and paste after line 23;

:=                  show the total number of lines;

:.,$d               delete from the current line ( . ) to the end of file ( $ );

:20,.m$             move line 20 through the current line ( . ) to the end ( $ );

:%d                 delete all lines ( % );

:%co$               copy all lines ( % ) and paste at the end ( $ );

:-,+co0             copy the previous, current, and next lines to the top (0);

:230,$w temp.sas    write line 230 through the end of file to temp.sas;

:.,600w temp.sas    write the current line through line 600 to temp.sas;

:r /temp/data.sas   read file data.sas from directory /temp and insert it below the cursor line;

:e#                 switch back and forth between two open files;

:e!                 discard changes and revert to the last saved version;

7). global replacement:

:%s/run/jump/g                 replace every "run" with "jump";

:2,10s/jump/run/g              replace every "jump" on lines 2 to 10 with "run";

:2,10s/jump/run/gc             same, but ask for confirmation of each replacement;

:%s/./\U&/g                    change all text to UPPERCASE;

:%s/./\L&/g                    change all text to lowercase;

:.,+5s/$/?/                    add "?" at the end of 6 lines, starting from the current line;

:g/^/mo0                       reverse the order of all lines;

:v/Paid in Full/s/$/Overdue/   append "Overdue" to the end of every line that does not contain "Paid in Full";

:%s/^/ >   /                   add " >   " at the beginning of each line;

:%s/^/          /              add a run of spaces (indent) at the beginning of each line;

:%s/$/ <   /                   add " <   " at the end of each line;

:%s/^  *\(.*\)/\1/             delete all leading spaces on each line;

:%s/  *$//                     delete all trailing spaces on each line;

:g/^$/d                        delete all empty lines;

8). advanced editing:

:set ic             make searches case-insensitive;

:set noic           make searches case-sensitive;

:set window=50      show 50 lines per screen;

:!date              run the date command and show the date and time;

:r !date            insert the output of date into the file;

:r !sort aa.sas     insert the sorted contents of aa.sas below the cursor line;

9). print: (c-tasc specific)

hp45 -z1 temp.lst

hp45 -z1 -p17 temp.lst

hp45 temp.lst

10). search the .lst files in home/xxx/programs/subdir for the string "neopt", one screen at a time:

grep neopt *.lst |more

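11). copy and compress files:

cp /home/xxx/programs/subdir/*.*  .     copy every file whose name contains a dot from that directory to the current directory ( . );

cp ../xxx/*.*  .                        same, from the sibling directory xxx;

compress *                              compress all files in the current directory (adds the .Z suffix);

uncompress *                            restore the compressed files;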

12). 

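chmod 751 temp.sas: owner can read, write, and execute; group can read and execute; others can only execute;

chmod 770 temp.sas: owner and group members get full access (read, write, execute); others get none;

chmod g+s pgmsdir: set the setgid bit on directory pgmsdir, so new files created in it inherit the directory's group;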

13). request more memory and a different WORK location when running SAS:

sas -memsize 256m -work /lv10/tmp temp.sas
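To confirm from inside the session that the invocation options took effect, the GETOPTION function can report them. A minimal check; the values written to the log depend on how SAS was started.

```sas
/* Write the current MEMSIZE and WORK settings to the SAS log. */
%put NOTE: MEMSIZE=%sysfunc(getoption(memsize));
%put NOTE: WORK=%sysfunc(getoption(work));
```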

14). 

cal           -->   show calendar;

spell a.sas   -->   show misspelled words;

df            -->   show free disk space;

who           -->   show who is logged on;

date          -->   show current date & time;

15). set up a timer to run a job automatically:
crontab -e
00 05 06 11 * /lv06/sas91/sas /home/xxxxx/programs/temp.sas
(close the editor using :wq)

The entry above submits the SAS program temp.sas at 5:00 AM on November 6. Each crontab entry has five time fields followed by the command: minute, hour, day-of-month, month, and day-of-week (Sunday is 0); * matches any value.

00 15 * * 1-5 /lv03/sas82/sas /home/dlxxx/somejob.sas

runs somejob.sas at 3:00 PM every Monday through Friday.

45 11 28 9 * /usr/local/sas/SASFoundation/9.3/sas /sasdata/ltrc/reports/temp.sas

runs temp.sas at 11:45 AM on September 28.

16). Search for file with a specific name in a set of files (-name)  

find . -name "*conf" -print

This command searches the current directory and all subdirectories for files whose names end with "conf".

Note: The -print option prints the path of any file that is found with that name. In general, -print prints the path of every file that meets the find criteria.

17). Use ls to list directory contents into a file

ls . > temp.sas

This command creates the file temp.sas, which contains the names of all files and subdirectories in the current directory. Replace . with another directory name to list that directory instead.
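If the goal is to get a directory listing into SAS, the output of ls can also be read directly through a pipe, without an intermediate file. A sketch assuming SAS runs on Unix with shell commands allowed (the NOXCMD option blocks FILENAME PIPE); the fileref DIRLIST and the data set FILES are hypothetical names.

```sas
/* Run "ls" and read its output, one file name per observation. */
filename dirlist pipe 'ls .';

data files;
  infile dirlist truncover;
  input fname $256.;
run;

filename dirlist clear;

proc print data=files;
run;
```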

18). Use diff to compare two sas files

diff program1.sas program2.sas
