Showing posts with label Data_step. Show all posts
Showing posts with label Data_step. Show all posts

Friday, June 1, 2012

To get obs number of a dataset


There are some approaches to get observation number of a dataset.

I usually use the observation numbers to decide how to generate a TLG. For example, if there is no data, I would generate a table with text as “No observation found…”, instead of no output or a blank page.

Mostly, I would use following codes:
%LET dsid =%sysfunc(OPEN(dataset_name));
%LET nobs =%sysfunc(ATTRN(&dsid, NLOBS));
%LET rc =%sysfunc(CLOSE(&dsid));
It should be good enough for me, because I know the dataset will be there. This means, above codes will not work for SAS views or if a dataset does not exist. If I want to put above codes in more general programs, I may check &dsid if the dataset exists.

I can also use SASHELP.VTABLE to fetch dataset information I want. It usually consumes a little more brain energy to get there, for example, I have to deal more details on libname and memname, and maybe more keystrokes.

If I want to know the observation numbers of subset of a dataset, which means a dataset with WHERE statement, it seems that I have to count the observations one by one. Usually, I use following codes:
%LET nobs=0;
DATA _null_;
    SET dataset_name (WHERE=(where_statement)) end=eof;
    IF eof THEN CALL SYMPUTX("nobs",_n_);
RUN;

It is important to have "%let nobs=0;" to initiate macro variable nobs.

Thursday, May 24, 2012

To flag Baseline records

As an example, the Baseline value for one assessment here is defined as the last non-missing measurement taken on or before the day of the first dosing of study drug.

Different programmers use different approaches to flag Baseline records. I used two steps (using LB domain in SDTM as an example):
1) To sort the assessment results in dataset LB througth PROC SORT by subject assessment date_time...;

PROC SORT DATA=lb;
  BY usubjid lbcat lbscat lbtestcd DESCENDING lbdtc;
RUN;

2) To flag the Baseline value with ABLFL='Y'.

 DATA lb(drop=flag); ** drop the temporary variable **;
  SET lb;
  BY usubjid lbcat lbscat lbtestcd DESCENDING lbdtc;
   ** variable flag is used to identify if one test has ABLFL marked **;
  RETAIN flag;
   ** flag=0 indicating there is no ABLFL marked yet. **; 
  IF first.lbtestcd THEN flag=0;
  ** you can add more conditions in IF logic expressions. **;
  ** TRTSDT below has the first dosing dates. **;
  IF flag=0 and LBDT <= TRTSDT and LBSTRESN ne .THEN DO;
    ABLFL='Y'; ** ABLFL='Y' indicates the baseline record. **;
    flag=1; ** flag=1 indicating the ABLFL is marked. **;
  END;
RUN; 
 

Friday, May 11, 2012

pros and cons in IFN/IFC function

After one of my colleagues told me about IFN/IFC functions, I addicted to use them everywhere until I found some issues. Now, I still use them a lot but more carefully.

IFN and IFC functions have same syntax, only that IFN will return a numeric value and IFC will return a character value.

For an example, syntax of IFC function in SAS Help:
IFC(logical-expression, value-returned-when-true, value-returned-when-false <,value-returned-when-missing>) 

IFC evaluates the first argument, logical-expression. If logical-expression is true (that is, not zero and not missing), then IFC returns the value in the second argument. If logical-expression is a missing value, and you have a fourth argument, then IFC returns the value in the fourth argument. Otherwise, if logical-expression is false, IFC returns the value in the third argument.

It is similar to IF function in MS Excel, or conditional operator in other language. Actually, it is even better, because it could handle when logical expression is a missing value.

IFN/IFC functions are very useful for encoding, decoding, and flagging values.Like most of DATA step functions, they can be used in WHERE statement to simplify logic expressions, or in SQL statements and make PROC SQL more powerful. They will also save a lot of typing and SAS codes usually will be more neat and clean.

Two issues so far I found:
1) It seems that logical expression does not function like IF-THEN-ELSE statement.

For example, we have following IF-THEN-ELSE statement which will work fine.
a=0; b=10;
IF a ne 0 and b/a>0.1 THEN   c="b/a>0.1";

You will get c=" " without any complaints. I think, when SAS system finds a=0 and the logical expression is
"a ne 0 and ...", it will stop checking the rest of logical expression. So it won't reach b/a which is 10/0.


If you use IFC:
c=IFC(a ne 0 and b/a>0.1, "b/a>0.1", " ");
You will get error message:
NOTE: Division by zero detected at line 17 column 29.
a=0 b=10 c=  _ERROR_=1 _N_=1


2) It seems that all arguments are evaluated before it applies the function algorithm.
For example,
data _null_;
   a=0;
   b=ifn(a ne 0, 10/a, .);
   put a b;
run;

I expected that it would evaluate the logical expression and give me missing value directly. It actually gives me error message.
NOTE: Division by zero detected at line 3 column 17.
a=0 b= _ERROR_=1 _N_=1

If you use IF-THEN-ELSE statement, you will not get such message.

Another trick in IFC (I think it is a good) is that if length of the variable is not pre-defined, IFC will return a character value with length=200.
For example, a_var=IFC(1 = 2, 'TRUE', 'FALSE');
We may think, 'TRUE' has 4 characters, will 'FALSE" be truncated, and a_var="FALS"?
Actually, we will get a_var with length=200 and value="FALSE".