Sunday, January 31, 2010

Utilize SAS Audit Files to Keep Track of SAS Data Set Updates

You can direct SAS to keep an audit trail of all of the updates made to a SAS data set. This is particularly handy if you need to determine who updated a SAS data set at any given time for a security review or a data integrity audit. The audit trail file allows you to track patterns of when particular data were added, modified, or deleted. You may even be able to fashion your own update roll-back program from the audit file.

Creating an audit trail file for a SAS data set is easily done via the DATASETS Procedure.

proc datasets library=ctemp;
        audit orsales;
        initiate;
run;
quit;

Executing the code results in a SAS audit file named orsales.sas7baud being built in the CTEMP SAS data library. When updates are made to the orsales SAS data set, entries are made to the attendant orsales SAS audit file.

proc contents data=ctemp.orsales(type=audit);
run;

proc print data=ctemp.orsales(type=audit);
run;

You can determine the variables available in your audit file or print the audit file observations using PROC CONTENTS and PROC PRINT, respectively, using the TYPE=AUDIT data set option.

Note that the Member Type in the Attributes section of the CONTENTS output will state "AUDIT", and that you will get a few additional audit-related variables. You can use other SAS procedures, such as PROC REPORT, on the SAS audit file, too.

Sunday, January 24, 2010

Jump start in PROC REPORT

PROC REPORT coding can get verbose, especially if there are a lot of variables requiring DEFINE statements. Here is a little trick which may speed things up a bit.

First, run PROC REPORT against the data set of interest, including the LIST option to list the code in the log and the NOEXEC option to prevent SAS from actually running the code. For example:

proc report data = sashelp.class list noexec ;
run ;

This should appear in the log:

PROC REPORT DATA=SASHELP.CLASS LS=96 PS=55 SPLIT="/" CENTER ;
    COLUMN Name Sex Age Height Weight;
    DEFINE Name / DISPLAY FORMAT= $8. WIDTH=8 SPACING=2 LEFT "Name" ;
    DEFINE Sex / DISPLAY FORMAT= $1. WIDTH=1 SPACING=2 LEFT "Sex" ;
    DEFINE Age / SUM FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Age" ;
    DEFINE Height / SUM FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Height" ;
    DEFINE Weight / SUM FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Weight" ;
RUN;

Copy the code to the program editor, and make any needed changes. Suppose we want average heights and weights by sex and age. Just insert the NOWD option in the PROC statement (assuming that interactive windowing is not needed), comment out the reference to NAME in the COLUMN statement and the DEFINE statement for NAME, and change the first option in each of the remaining DEFINE statements:

PROC REPORT DATA=SASHELP.CLASS LS=96 PS=55 SPLIT="/" CENTER nowd ;
    COLUMN /* Name */ Sex Age Height Weight;
*   DEFINE Name / DISPLAY FORMAT= $8. WIDTH=8 SPACING=2 LEFT "Name" ;
    DEFINE Sex / group FORMAT= $1. WIDTH=1 SPACING=2 LEFT "Sex" ;
    DEFINE Age / group FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Age" ;
    DEFINE Height / mean FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Height" ;
    DEFINE Weight / mean FORMAT= BEST9. WIDTH=9 SPACING=2 RIGHT "Weight" ;
RUN;

The output:

S
e
x Age Height Weight
F 11 51.3 50.5
    12 58.05 80.75
    13 60.9 91
    14 63.55 96.25
    15 64.5 112.25
M 11 57.5 85
    12 60.366667 103.5
    13 62.5 84
    14 66.25 107.5
    15 66.75 122.5
    16 72 150

Cosmetic refinements can be made by changing the WIDTH option for SEX and the FORMAT options for HEIGHT and WEIGHT.

Create candle stick plots

Link: http://support.sas.com/kb/24/914.html

Thursday, January 21, 2010

Use EXIST to check dataset

Use a simple DATA step function named 'EXIST' to check for the existence of a certain SAS data file.
%MACRO D_EXIST(INPUT);
%SYSFUNC(EXIST(&INPUT))
%MEND D_EXIST;
The macro %D_EXIST could be called from within an expression.
%IF %D_EXIST(SASHELP.CLASS) %THEN %DO;
The function EXIST can also be used to find other type of data sets in the directory by changing the types of members that it should look for.

Wednesday, January 20, 2010

Monday, January 18, 2010

Determining the number of observations in a SAS data set efficiently

SAS data sets consist of a descriptor portion and a data portion that contains the data values.The descriptor portion of a SAS data set contains detailed information about the data set. This information includes:
  • The name of the data set and its member type
  • The date and time the data set was created
  • The number of observations
  • The number of variables
  • The engine type
Using the descriptor portion is one of the most efficient and quickest ways of determining the number of observations in a SAS data set.
Here there are some examples:

/* Before SAS V9*/
data _NULL_;
if 0 then set sashelp.class nobs=n;
call symput('nrows',trim(left(put(n,8.))));
stop;
run;
%put nobs=&nrows;
 
/* SAS V9.x and higher */
data _NULL_;
if 0 then set sashelp.class nobs=n;
call symputx('nrows',n);
stop;
run;
%put nobs=&nrows;
  • The "if 0" is a a conditional statement designed to never be executed at run-time and works since the "set sashelp.class" gives the header information of the data set class to the compiler which sets up the pdv accordingly, but is skipped in the execution time because the if condition is false.
  • Nobs is a SAS automatic variable which contains the number of records in the dataset named in the set statement. The code, nobs = n, places the value of nobs (the number of records in the sashlep.class dataset) in the variable n. [NOBS option limitations: 1."The number of observations includes those observations that are marked for deletion but are not yet deleted."2."For certain SAS views, SAS cannot determine the number of observations. In these cases, SAS sets the value of the NOBS= variable to the largest positive integer value that is available in your operating environment.]
  • The STOP statement is used to prevent an endless loop because EOF is never reached.
Many times users don't need to know how many obs there are, just whether there are any. In such cases, a data step like the following might be just what is needed. This technique also has the advantage that it works for views and other data sets where the number of observations is not available in the data set header.

data _null_;
call symput('AnyObs','0'); /* set to 0 for No */
set &data(obs=1);
call symput('AnyObs','1'); /* if got here there are obs so set to 1 for Yes */
run;

The meta data can also be accessed by the macro language using DATA step functions. The ATTRN function is used with the NLOBS argument in the %OBSCNT macro shown below to retrieve the number of non-deleted observations from the meta data.
%macro obscnt(dsn);
%local nobs dsnid;
%let nobs=.;
 
%* Open the data set of interest;
%let dsnid = %sysfunc(open(&dsn));
 
%* If the open was successful get the;
%* number of observations and CLOSE &dsn;
%if &dsnid %then %do;
%let nobs=%sysfunc(attrn(&dsnid,nlobs));
%let rc  =%sysfunc(close(&dsnid));
%end;
%else %do;
%put Unable to open &dsn - %sysfunc(sysmsg());
%end;
 
%* Return the number of observations;
&nobs
%mend obscnt;

Monday, January 11, 2010

Forcing the Concatenation of Data Sets

When one or more variables in the input data set, represented by the DATA= option, are not present in the BASE= data set, or the variables in the input data set are not of the same type as the variables in the BASE= data set, an optional FORCE option can be specified on the PROC APPEND statement to force the concatenation of the data sets. Using the FORCE option also prevents an error from being generated in these situations. The next example illustrates the FORCE option being specified in the concatenation of the TXN data set to the MASTER data set.

PROC APPEND
BASE=master
DATA=txn
FORCE;
RUN;

Friday, January 1, 2010

Systask Statement

You can launch SAS programs from within a SAS program and suspend execution of the launching SAS program until the launched SAS programs complete their execution. Doing this, you can control the execution sequence of your SAS programs if/when you do not want to depend entirely upon an OS's scheduling software (e.g. CRON on UNIX and Linux, Windows Scheduler on Windows, etc.).


For example, in your BIGPROGRAM.SAS, you could have the following code:

systask command "sas your_clever_program1.sas" taskname=program1;
systask command "sas your_clever_program2.sas" taskname=program2;

waitfor _all_ program1 program2;

/* Other SAS code that processes the data sets created by the two afore-mentioned programs */

Note that the _all_ option states that your program is to wait for both program1 _AND_ program2 to complete before resuming execution. As an alternative, you could have coded _any_, which would have your program resume the moment either program1 _OR_ program2 completed execution.

Link: http://support.sas.com/documentation/cdl/en/hostwin/61924/HTML/default/win-stmt-systask.htm
Link: http://www2.sas.com/proceedings/sugi30/021-30.pdf