Friday, August 10, 2012

To remove Carriage Return and/or Linefeed characters

Couple days ago, one of my programs, which used to compare two sets of SAS datasets, gave me warnings with discrepancies found. I checked the data manually, and figured out that the issues were because of one set of SAS datasets containing CR/LF characters.

I did use function STRIP to remove extra blank chars. I thought that SAS functions, like STRIP, TRIM, CATS, etc., could remove blank characters, like space or CR/LF. Actually, they are not.

Since CR is '0D'x, and LF is '0A'x, there are at least two functions that will do the job.
1) compress(source_text, '0D0A'x);
2) prxchange('s/(\n|\r)//', -1, source_text);

Thursday, June 14, 2012

To calculate SAS program run time

Sometimes, I wanted to compare the efficiency between two approaches. I used following codes to get the program execution run-time:
 
%let _sdtm=%sysfunc(datetime());
*** SAS codes ***;
%let _edtm=%sysfunc(datetime());
%let _runtm=%sysfunc(putn(&_edtm - &_sdtm, 12.4));
%put It took &_runtm second to run the program;

It is also very easy to print the date and time:
%put %sysfunc(putn(&_sdtm, datetime20.));

Friday, June 8, 2012

Importing/exporting to XML file

I used to save metadata in SAS datasets so that I could check received data, or do something else. Usually, I save the metadata with data so that I don't create any new libname. But, other users may not want to see the metadata mixed with real data in SAS datasets.

I don't like to export metadata as a MS Excel sheet. MS Excel may change the original data, and some data structure information will be lost during PROC IMPORT/EXPORT. When the Excel sheet is imported back to SAS dataset, it could be significantly changed comparing to the original dataset.

I may also use plain text file, like CSV, ASCII files. It usually needs more customized codes.

One of solutions is using XML file. Below are the sample codes:

LIBNAME myxml XML "path\file_name.xml" XMLMETA=SCHEMADATA XMLTYPE=MSACCESS;

In the codes, XMLMETA=SCHEMADATA makes the XML file contain both metadata and data; XMLTYPE=MSACCESS is best XMLTYPE that will preserve the original data.

Using "PROC COMPARE" to compare the original dataset and dataset imported from XML file, the variables' attributes are almost identical, except format and informat. For data, I only found that datetime values are coverted to INT when datasets are exported to XML file by now. In most of cases, this works fine for me.

Wednesday, June 6, 2012

To select desired outputs in one PDF file


Recently, I was asked to generate a series of correlation coefficient plots (Scatter Plots). I put them in one big PDF file. Basically, the codes are like:

ods pdf file="***.pdf";
... codes for generating plots ...
ods pdf close;

Today, I was asked to add p-Values in those plots. To get p-Values, I have to use “ODS OUTPUT” (true?) to catch it by “PROC CORR”. The codes are like below:

ods output PearsonCorr=out1;
proc corr data=dataset_name PEARSON;
var x y;
run;
ods output off;

Basically, I just want the p-Value in dataset out1 from "ods output". After I add the codes, the outputs of “PROC CORR” are added to the PDF file. ouch!

Maybe, I used wrong terms. I could not find any answer from google to solve the issue. I turned back to SAS help. After researches and tests, I found a solution: adding “ods pdf exclude/select;”  to turn off/on the outputs in the PDF file. 

So the codes are like below (highlighted in light-green):

ods pdf file="***.pdf";
... codes for generating plots ...

ods pdf exclude all;
ods output PearsonCorr=out1;
proc corr data=dataset_name PEARSON;
var x y;
run;
*ods output off;
ods pdf select all;
... codes for generating plots
ods pdf close;

"PROC CORR" outputs are not shown in the pdf file. It works perfectly well.

Friday, June 1, 2012

To get obs number of a dataset


There are some approaches to get observation number of a dataset.

I usually use the observation numbers to decide how to generate a TLG. For example, if there is no data, I would generate a table with text as “No observation found…”, instead of no output or a blank page.

Mostly, I would use following codes:
%LET dsid =%sysfunc(OPEN(dataset_name));
%LET nobs =%sysfunc(ATTRN(&dsid, NLOBS));
%LET rc =%sysfunc(CLOSE(&dsid));
It should be good enough for me, because I know the dataset will be there. This means, above codes will not work for SAS views or if a dataset does not exist. If I want to put above codes in more general programs, I may check &dsid if the dataset exists.

I can also use SASHELP.VTABLE to fetch dataset information I want. It usually consumes a little more brain energy to get there, for example, I have to deal more details on libname and memname, and maybe more keystrokes.

If I want to know the observation numbers of subset of a dataset, which means a dataset with WHERE statement, it seems that I have to count the observations one by one. Usually, I use following codes:
%LET nobs=0;
DATA _null_;
    SET dataset_name (WHERE=(where_statement)) end=eof;
    IF eof THEN CALL SYMPUTX("nobs",_n_);
RUN;

It is important to have "%let nobs=0;" to initiate macro variable nobs.

Thursday, May 24, 2012

To flag Baseline records

As an example, the Baseline value for one assessment here is defined as the last non-missing measurement taken on or before the day of the first dosing of study drug.

Different programmers use different approaches to flag Baseline records. I used two steps (using LB domain in SDTM as an example):
1) To sort the assessment results in dataset LB througth PROC SORT by subject assessment date_time...;

PROC SORT DATA=lb;
  BY usubjid lbcat lbscat lbtestcd DESCENDING lbdtc;
RUN;

2) To flag the Baseline value with ABLFL='Y'.

 DATA lb(drop=flag); ** drop the temporary variable **;
  SET lb;
  BY usubjid lbcat lbscat lbtestcd DESCENDING lbdtc;
   ** variable flag is used to identify if one test has ABLFL marked **;
  RETAIN flag;
   ** flag=0 indicating there is no ABLFL marked yet. **; 
  IF first.lbtestcd THEN flag=0;
  ** you can add more conditions in IF logic expressions. **;
  ** TRTSDT below has the first dosing dates. **;
  IF flag=0 and LBDT <= TRTSDT and LBSTRESN ne .THEN DO;
    ABLFL='Y'; ** ABLFL='Y' indicates the baseline record. **;
    flag=1; ** flag=1 indicating the ABLFL is marked. **;
  END;
RUN; 
 

Friday, May 11, 2012

pros and cons in IFN/IFC function

After one of my colleagues told me about IFN/IFC functions, I addicted to use them everywhere until I found some issues. Now, I still use them a lot but more carefully.

IFN and IFC functions have same syntax, only that IFN will return a numeric value and IFC will return a character value.

For an example, syntax of IFC function in SAS Help:
IFC(logical-expression, value-returned-when-true, value-returned-when-false <,value-returned-when-missing>) 

IFC evaluates the first argument, logical-expression. If logical-expression is true (that is, not zero and not missing), then IFC returns the value in the second argument. If logical-expression is a missing value, and you have a fourth argument, then IFC returns the value in the fourth argument. Otherwise, if logical-expression is false, IFC returns the value in the third argument.

It is similar to IF function in MS Excel, or conditional operator in other language. Actually, it is even better, because it could handle when logical expression is a missing value.

IFN/IFC functions are very useful for encoding, decoding, and flagging values.Like most of DATA step functions, they can be used in WHERE statement to simplify logic expressions, or in SQL statements and make PROC SQL more powerful. They will also save a lot of typing and SAS codes usually will be more neat and clean.

Two issues so far I found:
1) It seems that logical expression does not function like IF-THEN-ELSE statement.

For example, we have following IF-THEN-ELSE statement which will work fine.
a=0; b=10;
IF a ne 0 and b/a>0.1 THEN   c="b/a>0.1";

You will get c=" " without any complaints. I think, when SAS system finds a=0 and the logical expression is
"a ne 0 and ...", it will stop checking the rest of logical expression. So it won't reach b/a which is 10/0.


If you use IFC:
c=IFC(a ne 0 and b/a>0.1, "b/a>0.1", " ");
You will get error message:
NOTE: Division by zero detected at line 17 column 29.
a=0 b=10 c=  _ERROR_=1 _N_=1


2) It seems that all arguments are evaluated before it applies the function algorithm.
For example,
data _null_;
   a=0;
   b=ifn(a ne 0, 10/a, .);
   put a b;
run;

I expected that it would evaluate the logical expression and give me missing value directly. It actually gives me error message.
NOTE: Division by zero detected at line 3 column 17.
a=0 b= _ERROR_=1 _N_=1

If you use IF-THEN-ELSE statement, you will not get such message.

Another trick in IFC (I think it is a good) is that if length of the variable is not pre-defined, IFC will return a character value with length=200.
For example, a_var=IFC(1 = 2, 'TRUE', 'FALSE');
We may think, 'TRUE' has 4 characters, will 'FALSE" be truncated, and a_var="FALS"?
Actually, we will get a_var with length=200 and value="FALSE".