|
Machine readable means the project plan contains PROC
SQL blocks that are designed to be included into downstream SAS programs
in a way that signals success or failure of each written criterion.
Database constraints may be embedded directly into project
plan documents - like CDISC spreadsheets for SDTM and ADAM datasets.
Machine readable specifications remove (or minimize)
the potential for “Errors and Omissions” that are caused by interpretation(s)
and translation(s) of traditional specifications written in plain English.
THE PROJECT PLAN
Traditional SAS
Programming Specification: Documents like Word and Excel contain instructions
for writing SAS programs.
The problem is that any written description (specification) that is
authored by one person, analyzed by a second, and translated into a
SAS program by a third is likely to be interpreted three different ways.
Machine Readable
Specification (MRS): Ready-to-execute functions are embedded in any
formal project planning documents.
- Project Managers write constraint SQL directly into the project
plan.
- SAS programmers include each MRS straight from the project plant,
as if it were a SAS system function (with a little help from utility
code).
The project plan becomes an active control mechanism.
- Add, modify, or remove any constraint to change “how the system”
works, without changing or viewing SAS source code.
- Constraints provide positive and negative result data for each criterion,
at the same time.
- Every possible data condition can identified and tested before SAS
programmers begin coding.
THE NEED
Traditional program flowcharts, SDLC documentation, and
source code comments are a complete waste of time and effort. Ask any
IT manager how often he/she read through source code, hardcopy documentation,
and hardcopy flowcharts and subsequently used that information when formulating
a pay raise, promotion, or bonus.
Since 1950 there has been no reliable way to programmatically determine when a program is (A) working as expected
or (B) is royally screwing up the entire system with bad data.
PLAIN ENGLIGH
IS EASY TO UNDERSTAND\
PLAIN ENGLISH IS EASY
TO GET WRONG
The traditional approach is to hold team meetings until
everyone has the same interpretation of the specification. Programmers
are then free to write and debug SAS programs until the program looks
like it works right.
- Traditional software
specifications eliminate the potential to programmatically determine
when the written specification is correct.
Exclusive use of Plain English specifications is the
wrong choice because…
- Specs are easy to
interpret the wrong way.
- Specs are rarely
complete (or comprehensive) during the planning phase.
- People of different
cultures read the “intent” very differently.
THE IDEAL SOLUTION
The ideal solution would be a SAS procedure that can
determine when data content meets requirements, AND at the same time,
can determine when there is a problem child that can only found when a
programmer traces through thousands of line of SAS code looking for unexpected
data values.
The ideal solution does not exist, yet. But
if it did, this is what it would look like. The SAS programmer would implement
each data validation as a procedure (e.g. text description for rule 2.1
above) where the programmed rule itself is separate and can be included
into any SAS program.
proc magic; create work.positiveTest create work.negativeTest apply to work.nitrate criteria EXCEL[C:\rules\nitrate\2.1]; quit;
Proc magic parameters (for the ideal solution)
- SAS
dataset on which to apply the specification.
- Data
that describes positive results.
- Data
that describes errors and omissions (negative results).
DATABASE CONSTAINTS
NOT PERFECT, BUT GOOD ENOUGH FOR RIGHT NOW
Constraints can be stored in any number for formats because
the code is nothing more than PROC SQL with some minor logic. The best
implementation is a SAS library where all constraints are stored as a
dataset with zero observations.
SAS programmers test each specification by inserting
data into the constraint with another PROC SQL.
- Every observation
that makes passes each constraint test is good.
- Every observation
that fails any constraint is rejected.
Some of the advanced
analytics that take IT projects from whiteboard to server-side production
can be documented before final
budget approval.
EXAMPLE DATA

EXAMPLE CONSTRAINT
Constraints can be programmed to detect every possible
data problem, including complex combinations that affect multiple columns.
Use a series of constraints for better granularity of the error messages
and for physical separation of rule logic.
- Constraint statements have the same
effect as a WHERE clause.
- Any
observation that fails a constraint is rejected.
- The
system applies an implicit AND operator when multiple constraints are
present.
This constraint demonstrates the most commonly used validation
categories.
proc sql; create table validation1 (label='Validation1') ( ID num, mode varchar(1), age num, measurement num, survey1 num, survey2 num, survey3 num, survey4 num, date1 num, date2 num,
constraint x1 unique(ID) message='Duplicate value',
constraint x2 check(mode in ('M' 'F')) message='Valid values',
constraint x3 check(age GE 18 and
age LE 70) message='Numeric Range',
constraint x4 not null(survey1) message='Missing value',
constraint x5 check(date1 LE date2) message='Column comparison',
constraint x6 check(date1 GT "31JAN2009"d) message='Date comparison'
);
quit;
WRITING YOUR FIRST CONSTRAINT
Variable Names
and Type Cast
Names within the constraint should match actual data names used in
the SAS program whenever possible. When this is not possible, just use
the most descriptive name possible. Programmers have to use PROC SQL
INSERT to use the constraint so name matching will be performed regardless.
Names can also be adjusted during development.
Numeric data length is always 8. Lengths for character data must be
specified. Length can be adjusted during development.
Series of Constraints
When multiple RDBMS tables contribute to the dataset, write a constraint
for each inbound data source, then write a constraint for the final
analysis dataset. This encourages a good programming practice - validating
results for each logical step.
Negative Result Constraints
Negative constraints identify data that should NOT be in the analysis
dataset. Additional programming is necessary for this tactic because
ERROR suppression is required because the good data is rejected. 100%
of the data should be rejected which indicates the SAS programs are
performing as expected.
Production Programs
Constraint invocations may be left in the final production program.
A good project management technique is to specify that all “analysis/debug
constraints” are to be included and invoked ONLY when a specific macro
variable is set.
Triggered debug constraints turn a monster SAS application with 30,000
lines of code into a simple series of logical steps. Project managers
are often able to isolate the problem BEFORE calling a programmer.
USING CONSTRAINTS WITH SAS/AF (GUI)
Constraints allow project managers to store criteria for each field
on the GUI separate from the GUI code.
Advantage
Every single criterion in the system can be modified without SAS programming
skills. This approach REMOVES the need to search source code to answer
the single most important question, “How does the system work?”
Advantage
Each criterion can be made to appear as if stored in a Word or Excel
document, while the actual implementation is a SAS dataset constraint.
Here is a code view of a SAS combobox widget used on a SAS/AF screen.
When the Seller Code field changes the system automatically fires and
event name “Seller Code Changed” that is received by this combobox (line
#12). Method runDependency is invoked through the system eventhandler,
which in turn, executes two method blocks.
Dependency1: Data is inserted into the constraint.
Dependency2: Data that passes through the external constraint filter
is loaded duirectly into the combobox.

JUST ANOTHER PAIN IN THE NECK?
Coding and debugging SQL INSERT procedures may require too much technical
expertise for the average SAS user. Call a system developer
to create a utility that generates SQL INSERT code on-the-fly. If good
programming practices are followed, column names in the constraint can
be matched to column names in the analysis dataset for instant comparisons.
Criteria can be applied as if calling a SAS macro.
Woohoo!!
%validateSDTM(data=work.visit, constraint=constrain01);
%validateADAM(data=work.visit, constraint=constrain02);
ADVANTAGE: PROJECT MANAGER
- Most people learn everything there is to know about
SAS dataset constraints in one hour.
- Constraints highlight complex data problems during
the planning stage.
- Constraints catch “problem-child data” that normally
gets through without causing an ERROR, WARNING, or NOTE.
|