SAS & Statistics: January 2011

Thursday, January 20, 2011

Create a SAS dataset from FORMAT library

libname fmts "/temp/formats";

proc format library=fmts cntlout=fmts.fmtout;
run;
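
The control data set written by CNTLOUT= can be inspected like any other data set, and passed back to PROC FORMAT through CNTLIN= to rebuild the formats. A minimal sketch, assuming the fmts libref above; writing the rebuilt formats to the WORK catalog is only an illustration:

```sas
/* List the key columns of the control data set written by CNTLOUT= */
proc print data=fmts.fmtout noobs;
   var fmtname type start end label;
run;

/* Rebuild the same formats in the WORK catalog from the control data set */
proc format library=work cntlin=fmts.fmtout;
run;
```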

Friday, January 14, 2011

Use the BYTE function to create special characters, e.g. the plus/minus sign

data _null_;
do i=0 to 255;
x=byte(i);
put i= x=;
end;
y=rank('a');
put y=;
run;

BYTE function results:
i=65 -- 90 x=A -- Z

i=97 -- 122 x=a -- z

i=33 x=!
i=34 x="
i=35 x=#
i=36 x=$
i=37 x=%
i=38 x=&
i=39 x='
i=40 x=(
i=41 x=)
i=42 x=*
i=43 x=+
i=44 x=,
i=45 x=-
i=46 x=.
i=47 x=/

i=162 x=¢
i=163 x=£
i=164 x=¤
i=165 x=¥
i=166 x=¦
i=167 x=§
i=168 x=¨
i=169 x=©
i=170 x=ª
i=171 x=«
i=172 x=¬
i=173 x= (soft hyphen, normally invisible)
i=174 x=®
i=175 x=¯
i=176 x=°
i=177 x=±
i=178 x=²
i=179 x=³
i=180 x=´
i=181 x=µ
i=182 x=¶
i=183 x=•
i=184 x=¸
i=185 x=¹
i=186 x=º
i=187 x=»
i=188 x=¼
i=189 x=½
i=190 x=¾

RANK function results:
y=97
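
Once the code is known, BYTE can put the character straight into a string. A sketch, assuming a single-byte Latin-1 or Windows-1252 session encoding, which is what the listing above suggests; in a UTF-8 session BYTE(177) does not yield a valid plus/minus character. The values of mean and sd are made up for illustration.

```sas
data _null_;
   length txt $ 20;
   mean = 12.3;              /* hypothetical values */
   sd   = 4.5;
   pm   = byte(177);         /* plus/minus sign in Latin-1 / Windows-1252 */
   /* CATX strips blanks and joins the pieces with a single space */
   txt  = catx(' ', put(mean, 5.1), pm, put(sd, 4.1));
   put txt=;
run;
```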

Tuesday, January 4, 2011

LAG function

LAG Function (SAS 9.2 Doc)

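*** Use a third variable to assign the value from the previous record;
*** Each LAG call keeps its own queue, so call it once per variable on every record, not inside the loop;

array one a b c d;
array two e f g h;
array temp t1-t4;
t1 = lag(a); t2 = lag(b); t3 = lag(c); t4 = lag(d);
do over one;
if first.id = 0 and last.id=1 and two = . then do;
two = temp;
end;
end;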
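
LAG returns the value from the previous execution of that particular LAG call, which is the previous record only if the call runs exactly once per record. A self-contained sketch of the usual pattern, with a made-up input data set (have, value, and prev_value are hypothetical names):

```sas
/* Hypothetical input, already sorted by id */
data have;
   input id visit value;
   datalines;
1 1 10
1 2 .
2 1 30
2 2 35
;
run;

data want;
   set have;
   by id;
   prev_value = lag(value);          /* call LAG on every record, unconditionally */
   if first.id then prev_value = .;  /* do not carry a value across ids */
   if value = . then value = prev_value;
run;
```

Calling LAG before any IF condition keeps its queue in step with the records; putting the call inside the condition would return the value from the last record that met the condition.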

Calculate new variables using the sum of other columns in PROC REPORT

%macro vol(type,desc);

proc report data=fmxx center headline headskip split='/';
column site_id havevis &type.2 &type.3 &type.p1 &type.p2;

define site_id / group id order=internal width=16 left 'Clinical Center' style(header)={cellwidth= 1.2in just=left} style(column)={cellwidth= 1.2in cellheight=0.17in just=left};
define havevis / sum style(header)={cellwidth=.9in} 'All/Visits' center ;
define &type.2 / sum style(header)={cellwidth=.9in} width=15 "Any/&desc" center ;
define &type.3 / sum style(header)={cellwidth=.9in} width=15 "3+ Volumes/for &desc/n" center ;
define &type.p1 / computed width=10 format=5.1 "3+ Volumes/for &desc/% (1)" center;
define &type.p2 / computed width=10 format=5.1 "3+ Volumes/for &desc/% (2)" center;

compute &type.p1;
&type.p1=(&type.3.sum*100)/&type.2.sum;
endcomp;
compute &type.p2;
&type.p2=(&type.3.sum*100)/havevis.sum;
endcomp;
title1 " ";
run;

%mend;

%vol(head, Head);

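If the data is already summarized, refer to the report columns by position in the compute blocks (_c2_ = havevis, _c3_ = &type.2, _c4_ = &type.3, _c5_ and _c6_ = the two computed percentages):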

compute &type.p1;
_c5_=(_c4_*100)/_c3_;
endcomp;
compute &type.p2;
_c6_=(_c4_*100)/_c2_;
endcomp;
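
Both versions divide by a column total that can be zero or missing, for example for a site with no visits. A small self-contained sketch of the same computed-column pattern with a guard on the denominator, using SASHELP.CLASS instead of the fmxx data; the column name ratio is made up for the illustration:

```sas
proc report data=sashelp.class nowd;
   column sex height weight ratio;
   define sex    / group;
   define height / sum;
   define weight / sum;
   define ratio  / computed format=6.2;
   compute ratio;
      /* Only divide when the denominator is positive; otherwise ratio stays missing */
      if height.sum > 0 then ratio = weight.sum / height.sum;
   endcomp;
run;
```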

PROC LOGISTIC odds ratio estimate

I recently ran into problems interpreting the parameter estimates of a logistic model. When I contacted SAS, this was their response. If you want the odds ratios to equal EXP(estimate), specify the PARAM=GLM option in the CLASS statement to get reference cell coding, because the default is EFFECT. The syntax for PARAM= is included below; see the online manual for a better view.

==================== From SAS =======================================

The calculation of the Odds Ratios depends upon the parameterization used for the categorical independent variable. By default, Proc LOGISTIC uses effects coding so the odds ratios are not calculated as EXP(estimate). You can change the parameterization to reference cell coding by using the PARAM=GLM option on the CLASS statement. Using this coding does lead to odds ratios being calculated as EXP(estimate). Note that for continuous variables the odds ratios are always calculated as EXP(estimate). Also note that no matter what parameterization is used, the values of the odds ratios are always the same.
For additional details on this topic, as well as setting the reference levels for the CLASS statement, please reference the Syntax section (for the CLASS statement) and the "Odds Ratio Estimation" portion of the Details section of the LOGISTIC chapter in the SAS/STAT User's Guide for V8.

==================== Syntax from SAS online manual =========================

PARAM=keyword
specifies the parameterization method for the classification variable or variables. Design matrix columns are created from CLASS variables according to the following coding schemes. The default is PARAM=EFFECT. If PARAM=ORTHPOLY or PARAM=POLY, and the CLASS levels are numeric, then the ORDER= option in the CLASS statement is ignored, and the internal, unformatted values are used.
EFFECT
specifies effect coding
GLM
specifies less than full rank, reference cell coding; this option can only be used as a global option
ORTHPOLY
specifies orthogonal polynomial coding
POLYNOMIAL | POLY
specifies polynomial coding
REFERENCE | REF
specifies reference cell coding

The EFFECT, POLYNOMIAL, REFERENCE, and ORTHPOLY parameterizations are full rank. For the EFFECT and REFERENCE parameterizations, the REF= option in the CLASS statement determines the reference level.

Consider a model with one CLASS variable A with four levels, 1, 2, 5, and 7. Details of the possible choices for the PARAM= option follow.

EFFECT
Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of -1. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Effect Coding
A Design Matrix
1 1 0 0
2 0 1 0
5 0 0 1
7 -1 -1 -1

Parameter estimates of CLASS main effects using the effect coding scheme estimate the difference in the effect of each nonreference level compared to the average effect over all 4 levels.
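
This is why EXP(estimate) is not the odds ratio under effect coding. A short derivation for level 1 versus the reference level 7, reading the coefficients off the design matrix above; the symbols \alpha (intercept) and \beta_1, \beta_2, \beta_3 (the three effect-coded parameters) are notation introduced here, not taken from the manual:

```latex
\begin{align*}
\operatorname{logit}(p \mid A=1) &= \alpha + \beta_1 \\
\operatorname{logit}(p \mid A=7) &= \alpha - \beta_1 - \beta_2 - \beta_3 \\
\log \mathrm{OR}_{1\text{ vs }7} &= (\alpha + \beta_1) - (\alpha - \beta_1 - \beta_2 - \beta_3)
                                  = 2\beta_1 + \beta_2 + \beta_3 \\
\mathrm{OR}_{1\text{ vs }7} &= \exp(2\beta_1 + \beta_2 + \beta_3)
\end{align*}
```

Under reference coding the row for level 7 is all zeros, so the same subtraction leaves only \beta_1 and the odds ratio is \exp(\beta_1).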

GLM
As in PROC GLM, four columns are created to indicate group membership. The design matrix columns for A are as follows.

GLM Coding
A Design Matrix
1 1 0 0 0
2 0 1 0 0
5 0 0 1 0
7 0 0 0 1

Parameter estimates of CLASS main effects using the GLM coding scheme estimate the difference in the effects of each level compared to the last level.

ORTHPOLY
The columns are obtained by applying the Gram-Schmidt orthogonalization to the columns for PARAM=POLY. The design matrix columns for A are as follows.

Orthogonal Polynomial Coding
A Design Matrix
1 -1.153 0.907 -0.921
2 -0.734 -0.540 1.473
5 0.524 -1.370 -0.921
7 1.363 1.004 0.368

POLYNOMIAL | POLY
Three columns are created. The first represents the linear term (x), the second represents the quadratic term (x^2), and the third represents the cubic term (x^3), where x is the level value. If the CLASS levels are not numeric, they are translated into 1, 2, 3, ... according to their sorting order. The design matrix columns for A are as follows.

Polynomial Coding
A Design Matrix
1 1 1 1
2 2 4 8
5 5 25 125
7 7 49 343

REFERENCE | REF
Three columns are created to indicate group membership of the nonreference levels. For the reference level, all three dummy variables have a value of 0. For instance, if the reference level is 7 (REF=7), the design matrix columns for A are as follows.

Reference Coding
A Design Matrix
1 1 0 0
2 0 1 0
5 0 0 1
7 0 0 0

Parameter estimates of CLASS main effects using the reference coding scheme estimate the difference in the effect of each nonreference level compared to the effect of the reference level.
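
Putting the syntax together, a minimal sketch of a call that requests reference cell coding with level 7 as the reference, so that EXP(estimate) for each level of A is its odds ratio against level 7. The data set mydata and the variables y, a, and x are hypothetical:

```sas
proc logistic data=mydata;              /* mydata, y, a, x are hypothetical names */
   class a (ref='7') / param=ref;       /* reference cell coding, level 7 as reference */
   model y(event='1') = a x;            /* model the probability that y = 1 */
run;
```

PARAM=GLM would be written in the same place after the slash, since it can only be given as a global option.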

PROC GPLOT options: AUTOHREF and AUTOVREF

Use the AUTOHREF and AUTOVREF options on the PLOT statement of PROC GPLOT to draw reference lines at all major tick marks. Create an annotate data set to draw thicker reference lines at the desired tick marks.

goptions reset=all;

/* Create sample data */
data a;
input x y;
cards;
1 30
2 15
3 40
4 80
5 35
6 40
7 85
8 75
9 55
10 30
;
run;

/* Create an annotate data set to draw darker reference lines */
/* at X values of 3, 6, and 9; Y values of 20, 50, and 80. */
/* The SIZE variable sets the line thickness. */
data anno;
length function color $ 8;
/* Vertical reference lines */
do x=3 to 9 by 3;
function='move'; xsys='2'; ysys='1';
y=0; output;

function='draw'; xsys='2'; ysys='1';
size=3; line=1; color='black'; y=100; output;
end;

/* Horizontal reference lines */
do y=20 to 80 by 30;
function='move'; xsys='1'; ysys='2';
x=0; output;

function='draw'; xsys='1'; ysys='2';
size=3; line=1; color='black'; x=100; output;
end;
run;

/* Create the graph. */
proc gplot data=a;
plot y*x / haxis=axis1 vaxis=axis2 autohref autovref noframe annotate=anno;

symbol1 i=none v=dot c=blue;
axis1 order=(0 to 12 by 1);
axis2 order=(0 to 100 by 10);
title1 h=1.5 f=swiss 'Reference Lines of Varying Thickness';
run;
quit;

PROC LOGISTIC options: selection=, hierarchy=

An additional option to be aware of when using SELECTION= with a model that includes an interaction as a candidate effect is the HIERARCHY= option. It specifies whether and how the model hierarchy requirement is applied and whether a single effect or multiple effects are allowed to enter or leave the model in one step. You can specify that only CLASS effects, or both CLASS and interval effects, be subject to the hierarchy requirement. The HIERARCHY= option is ignored unless you also specify one of the following options: SELECTION=FORWARD, SELECTION=BACKWARD, or SELECTION=STEPWISE.

Model hierarchy refers to the requirement that, for any term to be in the model, all effects contained in the term must be present in the model. For example, in order for the interaction A*B to enter the model, the main effects A and B must be in the model. Likewise, neither effect A nor B can leave the model while the interaction A*B is in the model.

The keywords you can specify in the HIERARCHY= option are described as follows:

NONE
Model hierarchy is not maintained. Any single effect can enter or leave the model at any given step of the selection process.

SINGLE
Only one effect can enter or leave the model at one time, subject to the model hierarchy requirement. For example, suppose that you specify the main effects A and B and the interaction of A*B in the model. In the first step of the selection process, either A or B can enter the model. In the second step, the other main effect can enter the model. The interaction effect can enter the model only when both main effects have already been entered. Also, before A or B can be removed from the model, the A*B interaction must first be removed. All effects (CLASS and interval) are subject to the hierarchy requirement.

SINGLECLASS
This is the same as HIERARCHY=SINGLE except that only CLASS effects are subject to the hierarchy requirement.

MULTIPLE
More than one effect can enter or leave the model at one time, subject to the model hierarchy requirement. In a forward selection step, a single main effect can enter the model, or an interaction can enter the model together with all the effects that are contained in the interaction. In a backward elimination step, an interaction itself, or the interaction together with all the effects that the interaction contains, can be removed. All effects (CLASS and interval) are subject to the hierarchy requirement.

MULTIPLECLASS
This is the same as HIERARCHY=MULTIPLE except that only CLASS effects are subject to the hierarchy requirement.

The default value is HIERARCHY=SINGLE, which means that model hierarchy is to be maintained for all effects (that is, both CLASS and interval effects) and that only a single effect can enter or leave the model at each step.
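
A minimal sketch of where the option goes, assuming a hypothetical data set mydata with a binary response y, CLASS variables a and b, and a continuous covariate x; the significance levels are arbitrary:

```sas
proc logistic data=mydata;                  /* hypothetical data set and variables */
   class a b / param=ref;
   model y(event='1') = a b a*b x
         / selection=stepwise
           slentry=0.05 slstay=0.05         /* entry and stay significance levels */
           hierarchy=single;                /* a*b may enter only after a and b */
run;
```

Replacing HIERARCHY=SINGLE with HIERARCHY=MULTIPLE would let a*b enter together with a and b in one step.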

Create Oracle Tables

libname oradb Oracle User=orauser Password=xxxxxxx Path="@orapth";

proc sql;
create table oradb.dsn as
select * from temp;
quit;

proc append base=oradb.dsn data=temp1 force;
run;
