Montura Consulting   Research & Development
Data Quality

The SAS System provides "The Power to Know"; however, that power is 100% dependent on the ability to control data quality. Repository Relationship Programming provides Base/SAS programmers with "The Power to Control". For SAS programmers, the major difference between Base/SAS and Object SAS programming is how each section of code is invoked.

The Power to Know - Design Failure

  • Conflicts are common when a macro variable is used as a parameter and as a global variable at the same time.
  • Macro variables are easy to create, hard to trace.
  • Include statements and compiled macros turn SAS program debugging into a problematic, time-consuming, frustrating issue.
  • Major code modification without the original programmer is a real problem, yet data content changes frequently.

The Power to Control - It's Easy

  • Each object (individual source code) has protected internal variables.
  • Global variable conflicts are easy to resolve.
  • Debugging is simple.
  • EPW (external program worklist) is a key information source that identifies every SAS source code and execution sequence.

QA Work List

  • Test each object (individual source code) or series of dependent objects as a single validation.
  • Create data and global parameters that will trigger each pass/fail/ignore result.
  • Compare actual and expected results (data and/or parameters) and provide one pass/fail/ignore return code.

SAS Programmer Work List

  • Identify every abnormal condition that will cause immediate application shutdown.
  • Every source code in the application must provide QA with a unique return code. SAS provides an automatic macro variable (SYSCC), but it does not carry the level of detail that QA needs. Return codes may be categorized as follows:
    • Success
    • Fail
    • Ignore
    • Abnormal Error (syntax, connectivity, external database reject, etc.)
  • Explicitly identify why each ignore condition was triggered, including global variable names and values.
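
The return-code categories above can be modeled outside SAS to make the idea concrete. The following is a minimal Python sketch, not SCL and not part of the application; the names ReturnCategory, ReturnCode, and describe are hypothetical and exist only for illustration. It assumes each code is unique per source program and that an IGNORE carries the global names and values that explain it.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReturnCategory(Enum):
    """The four categories listed above (hypothetical Python model)."""
    SUCCESS = "SUCCESS"
    FAIL = "FAIL"
    IGNORE = "IGNORE"
    ABNORMAL = "ABNORMAL"  # syntax, connectivity, external database reject, etc.


@dataclass(frozen=True)
class ReturnCode:
    code: str                  # unique code, e.g. "IGNORE0001"
    category: ReturnCategory
    source: str                # the source program that issued the code
    context: dict = field(default_factory=dict)  # global names/values behind the result


def describe(rc: ReturnCode) -> str:
    """Render one line for QA: source, category, code, and any context."""
    parts = [f"{rc.source}: {rc.category.value} {rc.code}"]
    if rc.context:
        detail = ", ".join(f"{k}={v!r}" for k, v in sorted(rc.context.items()))
        parts.append(f"({detail})")
    return " ".join(parts)


rc = ReturnCode("IGNORE0001", ReturnCategory.IGNORE, "validation1.scl", {"gTrans": []})
print(describe(rc))
```

Carrying the context alongside the code is what lets QA see why an ignore condition fired without rerunning the program.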

SAS Architect Work List

  • Provide a routine that loads each required global variable from a SAS dataset (or some other resource) prior to each QA test execution.
  • Provide a routine that captures every global variable following each QA execution, inserting name/value pairs into a SAS dataset so QA can compare expected values with actual values.
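
The two routines above amount to a load step, a capture step, and a comparison. The following is a minimal Python sketch of that cycle under stated assumptions: a list of (name, value) rows stands in for the SAS dataset, a dict stands in for the global variable table, and the function names load_globals, capture_globals, and diff_globals are hypothetical.

```python
ABSENT = "<absent>"  # marker for a name present on only one side


def load_globals(rows):
    """Build a fresh global table from name/value rows before a QA test."""
    return dict(rows)


def capture_globals(globals_table):
    """Snapshot every global as sorted name/value rows after a QA test."""
    return sorted(globals_table.items(), key=lambda row: row[0])


def diff_globals(expected_rows, actual_rows):
    """List (name, expected, actual) for every global that differs."""
    expected = dict(expected_rows)
    actual = dict(actual_rows)
    names = sorted(set(expected) | set(actual))
    return [
        (n, expected.get(n, ABSENT), actual.get(n, ABSENT))
        for n in names
        if expected.get(n, ABSENT) != actual.get(n, ABSENT)
    ]


# Usage: load, let the program under test change some globals, then compare.
rows = [("gMode", "TEST"), ("gLimit", 5)]
table = load_globals(rows)
table["gLimit"] = 6      # simulated side effect of the program under test
table["gNew"] = 1        # simulated new global created during the run
print(diff_globals(rows, capture_globals(table)))
```

An empty difference list means the run left every global at its expected value; any other result tells QA exactly which names to investigate.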

REQUIREMENTS


CURRENT SITUATION

Modify the following SAS application, consisting of five SAS object programs. Add a new validation program following validation1.scl and do NOT rename any existing source code.

  • data.scl
  • validation1.scl
  • < add new code here >
  • validation2.scl
  • validation3.scl
  • qa.scl

OBJECTIVE

Three ENVLISTs contain the variables used within the application. Each ENVLIST is used like a macro data vector.

  • gTrans
  • gProd
  • gMessage

Data.scl loads SAS dates into ENVLIST properties named gTrans and gProd. Variables inside each list can be accessed through class properties of the same name in each SAS object. SAS date values in the inbound data source do not always conform to SLA requirements, and the application crashes with fatal errors. The SAS programmer will add a new program that validates each SAS date, identifies each variable that fails to meet specifications, and supplies the reason it failed. gTrans and gProd may each contain any number of SAS dates, including zero. gMessage is empty by default; the SAS programmer may insert any number of new result messages.

SAS programmer will identify three types of result.

  • Positive - passed every data quality test.
  • Negative - failed any data quality test.
  • Ignore - test not required.

SPECS

Data in gTrans ENVLIST is new transactional data.

Data in gProd ENVLIST is current production data.

Issue FAIL on the following conditions:

  • A variable name in gTrans has no matching variable of the same name in gProd. (gProd may contain variables NOT found in gTrans; that is not a failure.)
  • Transaction variable is NOT numeric.
  • Transaction date is null or missing.
  • Transaction date is more than 5 years in the past.
  • Transaction date is more than 5 years in the future.

General Requirements

  • IGNORE every test when gTrans is empty.
  • Stop validation test execution as soon as any FAIL is detected.
  • Issue a FAIL, SUCCESS, or IGNORE result into gResult.
  • Do not add, delete, or modify data in gTrans or gProd.
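
The SPECS and General Requirements above can be read as one decision procedure. The following is a minimal Python sketch of that procedure, not SCL and not the application's code. It assumes dicts stand in for the gTrans and gProd ENVLISTs, datetime.date stands in for a numeric SAS date, and None stands in for a missing value. The code SUCCESS0001 and the names validate and shift_years are hypothetical; the other codes mirror those used in the SCL listings on this page.

```python
from datetime import date


def shift_years(d, years):
    """Same calendar day N years away; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def validate(g_trans, g_prod, today):
    """Return (result, code). Stops at the first FAIL and never changes its inputs."""
    if not g_trans:
        return ("IGNORE", "IGNORE0001")          # nothing to test
    min_date = shift_years(today, -5)
    max_date = shift_years(today, 5)
    for name, value in g_trans.items():
        if name not in g_prod:
            return ("FAIL", "ABSENT001")         # no matching name in gProd
        if value is None:
            return ("FAIL", "DIRT003")           # null or missing
        if type(value) is not date:
            return ("FAIL", "TYPECAST001")       # stand-in for "not numeric"
        if value < min_date:
            return ("FAIL", "DIRT001")           # more than 5 years in the past
        if value > max_date:
            return ("FAIL", "DIRT002")           # more than 5 years in the future
    return ("SUCCESS", "SUCCESS0001")            # hypothetical success code


today = date(2024, 6, 15)
prod = {"d1": date(2024, 1, 1), "d2": date(2023, 1, 1)}
print(validate({"d1": date(2024, 6, 1)}, prod, today))
print(validate({}, prod, today))
```

Returning at the first failure is what implements the stop-on-FAIL requirement here; the SCL listings achieve the same effect by inserting a STOP item into a control list.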

Messages

  • Insert a unique FAIL, SUCCESS, or IGNORE code into gResult, and name the variable with the value of DESCRIPTION (an automatic SAS variable).

Metadata

  • Add a unique code and message text for each FAIL, SUCCESS, IGNORE condition into the message dataset.

SAS OBJECT SOURCE CODE


IGNORE

The following code tests gTrans for zero data elements.

  • Insert an IGNORE message into the global gResult list. The actual text description is kept in a SAS dataset.
  • Insert a STOP message into the local iControl list. This terminates further execution of the current object (program).
  • Any subsequent validations will execute normally.

interface1: method;
  /* An empty gTrans means no test is required. */
  if listlen(gTrans)=0 then do;
    insertc(gResult, 'IGNORE0001', -1, description);
    insertc(iControl, description||' '||_method_, -1, 'STOP');
  end;
endmethod;

FAIL

The following code tests the typecast of each variable in gTrans.

  • Insert a FAIL message if any variable is non-numeric.
  • Insert a STOP message into the global gControl list.
  • The application terminates immediately due to "dirty data" conditions.

interface2: method;
  dcl num xItem;
  /* Every item in gTrans must be numeric (type 'N'). */
  do xItem=1 to listlen(gTrans);
    if itemtype(gTrans, xItem) NE 'N' then do;
      insertc(gResult, 'TYPECAST001', -1, 'FAIL');
      insertc(gControl, description||' '||_method_, -1, 'STOP');
    end;
  end; /* end of item loop */
endmethod;

FAIL

The following code tests each date value.

  • Insert a FAIL message if any variable is out of range or is missing.
  • Insert a STOP message into the global gControl list.
  • The application terminates immediately due to "dirty data" conditions.

interface3: method;
  dcl num xItem transDate minDate maxDate;
  /* 'same' keeps the calendar day; by default INTNX aligns to January 1. */
  minDate=intnx('year', today(), -5, 'same');
  maxDate=intnx('year', today(), +5, 'same');
  do xItem=1 to listlen(gTrans);
    transDate=getitemn(gTrans, xItem);
    /* Test for missing first: a missing value sorts below every date. */
    if transDate=. then do;
      insertc(gResult, 'DIRT003', -1, 'FAIL');
      insertc(gControl, description||' '||_method_, -1, 'STOP');
    end;
    else if transDate LT minDate then do;
      insertc(gResult, 'DIRT001', -1, 'FAIL');
      insertc(gControl, description||' '||_method_, -1, 'STOP');
    end;
    else if transDate GT maxDate then do;
      insertc(gResult, 'DIRT002', -1, 'FAIL');
      insertc(gControl, description||' '||_method_, -1, 'STOP');
    end;
  end;
endmethod;

FAIL

The following code tests if each variable name in gTrans is also present in gProd.

  • Insert a FAIL message if any variable is absent.
  • Insert a STOP message into the global gControl list.
  • The application terminates immediately due to "dirty data" conditions.

interface4: method;
  dcl num xItem;
  dcl char transName;
  do xItem=1 to listlen(gTrans);
    /* NAMEITEM returns the name of the item at position xItem. */
    transName=nameitem(gTrans, xItem);
    /* NAMEDITEM returns 0 when gProd has no item with that name. */
    if nameditem(gProd, transName)=0 then do;
      insertc(gResult, 'ABSENT001', -1, 'FAIL');
      insertc(gControl, description||' '||_method_, -1, 'STOP');
    end;
  end;
endmethod;

SUCCESS

You code it.


COPYRIGHT © 1989 - 2011 Montura, Inc.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed.
All material on this website is drawn directly from US Patent 7,984,422, Repository Relationship Programming.
Call 510-798-8367 to obtain your license for use today.
Violators will be prosecuted and perhaps persecuted with undesirable press release news as well.
