Montura Consulting   Research & Development
Constraints are Machine Readable Specfications

Machine readable means the project plan contains PROC SQL blocks that are designed to be included into downstream SAS programs in a way that signals success or failure of each written criterion.

Database constraints may be embedded directly into project plan documents - like CDISC spreadsheets for SDTM and ADAM datasets.

Machine readable specifications remove (or minimize) the potential for “Errors and Omissions” that are caused by interpretation(s) and translation(s) of traditional specifications written in plain English.

THE PROJECT PLAN

Traditional SAS Programming Specification: Documents like Word and Excel contain instructions for writing SAS programs.

The problem is that any written description (specification) that is authored by one person, analyzed by a second, and translated into a SAS program by a third is likely to be interpreted three different ways.

Machine Readable Specification (MRS): Ready-to-execute functions are embedded in any formal project planning documents.

    • Project Managers write constraint SQL directly into the project plan.  
    • SAS programmers include each MRS straight from the project plant, as if it were a SAS system function (with a little help from utility code).

The project plan becomes an active control mechanism.

  • Add, modify, or remove any constraint to change “how the system” works, without changing or viewing SAS source code.
  • Constraints provide positive and negative result data for each criterion, at the same time.
  • Every possible data condition can identified and tested before SAS programmers begin coding.

THE NEED

Traditional program flowcharts, SDLC documentation, and source code comments are a complete waste of time and effort. Ask any IT manager how often he/she read through source code, hardcopy documentation, and hardcopy flowcharts and subsequently used that information when formulating a pay raise, promotion, or bonus.

Since 1950 there has been no reliable way to programmatically determine when a program is (A) working as expected or (B) is royally screwing up the entire system with bad data.

PLAIN ENGLIGH IS EASY TO UNDERSTAND\
PLAIN ENGLISH IS EASY TO GET WRONG

The traditional approach is to hold team meetings until everyone has the same interpretation of the specification. Programmers are then free to write and debug SAS programs until the program looks like it works right.

  • Traditional software specifications eliminate the potential to programmatically determine when the written specification is correct.

Exclusive use of Plain English specifications is the wrong choice because…

  • Specs are easy to interpret the wrong way.
  • Specs are rarely complete (or comprehensive) during the planning phase.
  • People of different cultures read the “intent” very differently.

THE IDEAL SOLUTION

The ideal solution would be a SAS procedure that can determine when data content meets requirements, AND at the same time, can determine when there is a problem child that can only found when a programmer traces through thousands of line of SAS code looking for unexpected data values.

The ideal solution does not exist, yet. But if it did, this is what it would look like. The SAS programmer would implement each data validation as a procedure (e.g. text description for rule 2.1 above) where the programmed rule itself is separate and can be included into any SAS program.

proc magic;
    create work.positiveTest
    create work.negativeTest
    apply to work.nitrate
    criteria
        EXCEL[C:\rules\nitrate\2.1];
quit;

Proc magic parameters (for the ideal solution)

  • SAS dataset on which to apply the specification.
  • Data that describes positive results.
  • Data that describes errors and omissions (negative results).

DATABASE CONSTAINTS
NOT PERFECT,  BUT GOOD ENOUGH FOR RIGHT NOW

Constraints can be stored in any number for formats because the code is nothing more than PROC SQL with some minor logic. The best implementation is a SAS library where all constraints are stored as a dataset with zero observations.

SAS programmers test each specification by inserting data into the constraint with another PROC SQL.

  • Every observation that makes passes each constraint test is good.
  • Every observation that fails any constraint is rejected. 

Some of  the advanced analytics that take IT projects from whiteboard to server-side production can be documented before final budget approval.

EXAMPLE DATA

EXAMPLE CONSTRAINT

Constraints can be programmed to detect every possible data problem, including complex combinations that affect multiple columns. Use a series of constraints for better granularity of the error messages and for physical separation of rule logic.

  • Constraint statements have the same effect as a WHERE clause.
  • Any observation that fails a constraint is rejected.  
  • The system applies an implicit AND operator when multiple constraints are present.  

This constraint demonstrates the most commonly used validation categories.

proc sql;
create table validation1 (label='Validation1') (
ID  num,
mode  varchar(1),
age  num,
measurement num,
survey1  num,
survey2  num,
survey3  num,
survey4  num, 
date1  num,
date2  num,
    constraint x1 unique(ID)                   message='Duplicate value', 
    constraint x2 check(mode in ('M' 'F'))     message='Valid values',
    constraint x3 check(age GE 18 and
                        age LE 70)             message='Numeric Range',
    constraint x4 not null(survey1)            message='Missing value',
    constraint x5 check(date1 LE date2)        message='Column comparison',
    constraint x6 check(date1 GT "31JAN2009"d) message='Date comparison'
);
quit;


WRITING YOUR FIRST CONSTRAINT

Variable Names and Type Cast

Names within the constraint should match actual data names used in the SAS program whenever possible. When this is not possible, just use the most descriptive name possible. Programmers have to use PROC SQL INSERT to use the constraint so name matching will be performed regardless. Names can also be adjusted during development.

Numeric data length is always 8. Lengths for character data must be specified. Length can be adjusted during development.

Series of Constraints

When multiple RDBMS tables contribute to the dataset, write a constraint for each inbound data source, then write a constraint for the final analysis dataset. This encourages a good programming practice - validating results for each logical step.

Negative Result Constraints

Negative constraints identify data that should NOT be in the analysis dataset. Additional programming is necessary for this tactic because ERROR suppression is required because the good data is rejected. 100% of the data should be rejected which indicates the SAS programs are performing as expected.

Production Programs

Constraint invocations may be left in the final production program. A good project management technique is to specify that all “analysis/debug constraints” are to be included and invoked ONLY when a specific macro variable is set.

Triggered debug constraints turn a monster SAS application with 30,000 lines of code into a simple series of logical steps. Project managers are often able to isolate the problem BEFORE calling a programmer.

USING CONSTRAINTS WITH SAS/AF (GUI)

Constraints allow project managers to store criteria for each field on the GUI separate from the GUI code.

Advantage

Every single criterion in the system can be modified without SAS programming skills. This approach REMOVES the need to search source code to answer the single most important question, “How does the system work?”

Advantage

Each criterion can be made to appear as if stored in a Word or Excel document, while the actual implementation is a SAS dataset constraint.

Here is a code view of a SAS combobox widget used on a SAS/AF screen. When the Seller Code field changes the system automatically fires and event name “Seller Code Changed” that is received by this combobox (line #12). Method runDependency is invoked through the system eventhandler, which in turn, executes two method blocks.

Dependency1: Data is inserted into the constraint.

Dependency2: Data that passes through the external constraint filter is loaded duirectly into the combobox.

 

JUST ANOTHER PAIN IN THE NECK?

Coding and debugging SQL INSERT procedures may require too much technical expertise for the average SAS user. Call a system developer to create a utility that generates SQL INSERT code on-the-fly. If good programming practices are followed, column names in the constraint can be matched to column names in the analysis dataset for instant comparisons.

Criteria can be applied as if calling a SAS macro.
Woohoo!!

%validateSDTM(data=work.visit, constraint=constrain01);
%validateADAM(data=work.visit, constraint=constrain02);

ADVANTAGE: PROJECT MANAGER

  • Most people learn everything there is to know about SAS dataset constraints in one hour.
  • Constraints highlight complex data problems during the planning stage.
  • Constraints catch “problem-child data” that normally gets through without causing an ERROR, WARNING, or NOTE.