Monday, January 18, 2010

Determining the number of observations in a SAS data set efficiently

SAS data sets consist of a descriptor portion and a data portion that contains the data values.The descriptor portion of a SAS data set contains detailed information about the data set. This information includes:
  • The name of the data set and its member type
  • The date and time the data set was created
  • The number of observations
  • The number of variables
  • The engine type
Using the descriptor portion is one of the most efficient and quickest ways of determining the number of observations in a SAS data set.
Here there are some examples:

/* Before SAS V9*/
data _NULL_;
if 0 then set sashelp.class nobs=n;
call symput('nrows',trim(left(put(n,8.))));
stop;
run;
%put nobs=&nrows;
 
/* SAS V9.x and higher */
data _NULL_;
if 0 then set sashelp.class nobs=n;
call symputx('nrows',n);
stop;
run;
%put nobs=&nrows;
  • The "if 0" is a a conditional statement designed to never be executed at run-time and works since the "set sashelp.class" gives the header information of the data set class to the compiler which sets up the pdv accordingly, but is skipped in the execution time because the if condition is false.
  • Nobs is a SAS automatic variable which contains the number of records in the dataset named in the set statement. The code, nobs = n, places the value of nobs (the number of records in the sashlep.class dataset) in the variable n. [NOBS option limitations: 1."The number of observations includes those observations that are marked for deletion but are not yet deleted."2."For certain SAS views, SAS cannot determine the number of observations. In these cases, SAS sets the value of the NOBS= variable to the largest positive integer value that is available in your operating environment.]
  • The STOP statement is used to prevent an endless loop because EOF is never reached.
Many times users don't need to know how many obs there are, just whether there are any. In such cases, a data step like the following might be just what is needed. This technique also has the advantage that it works for views and other data sets where the number of observations is not available in the data set header.

data _null_;
call symput('AnyObs','0'); /* set to 0 for No */
set &data(obs=1);
call symput('AnyObs','1'); /* if got here there are obs so set to 1 for Yes */
run;

The meta data can also be accessed by the macro language using DATA step functions. The ATTRN function is used with the NLOBS argument in the %OBSCNT macro shown below to retrieve the number of non-deleted observations from the meta data.
%macro obscnt(dsn);
%local nobs dsnid;
%let nobs=.;
 
%* Open the data set of interest;
%let dsnid = %sysfunc(open(&dsn));
 
%* If the open was successful get the;
%* number of observations and CLOSE &dsn;
%if &dsnid %then %do;
%let nobs=%sysfunc(attrn(&dsnid,nlobs));
%let rc  =%sysfunc(close(&dsnid));
%end;
%else %do;
%put Unable to open &dsn - %sysfunc(sysmsg());
%end;
 
%* Return the number of observations;
&nobs
%mend obscnt;

No comments: