Montura Consulting   Research & Development
Data Relationship

Terminology

Object: created when SAS compiles SCL source code.

Instance: a non-executable object becomes an executable instance when loaded into memory.

Data Relationship Theory

Physical Independence between Programs

A relationship is a dynamic connection created between instances that are in computer memory. Think of it as a temporary data channel that exists only while the program is executing. The connection can be made only in memory because it relies on unique instance identifiers (object program identifiers) that the system creates as each object is loaded into memory.

Physical independence means two things for SAS programmers.

  • There is no need for macro variables to pass data between programs.
  • There is no need for named or positional macro parameters.
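As a minimal sketch of what this independence looks like, assume two hypothetical SCL entries (PRODUCER and CONSUMER) and a hypothetical global container named PARAMETER holding a hypothetical item named TRADEDATE. The producer stores a value in ENVLIST where it might otherwise have assigned a macro variable, and the consumer, run later in the same SAS session, reads it back with no macro variable and no macro parameter. Both PUT statements should print the same list identifier, because both entries refer to one list in memory.

```scl
/* PRODUCER (hypothetical entry): create the container once, then   */
/* store a value in it instead of assigning a macro variable.       */
init:
   dcl list gParameter;
   if nameditem(envlist('G'), 'parameter')=0 then
      insertl(envlist('G'), makelist(0, 'G'), -1, 'parameter');
   gParameter=getniteml(envlist('G'), 'parameter');
   /* SETNITEMN appends a new item when no item of that name exists */
   setnitemn(gParameter, '02JAN2009'd, 'tradedate');
   put gParameter=;
return;

/* CONSUMER (hypothetical entry, run later in the same SAS session): */
/* the value arrives through ENVLIST, not through a macro variable.  */
init:
   dcl list gParameter;
   dcl num tradeDate;
   gParameter=getniteml(envlist('G'), 'parameter');
   tradeDate=getnitemn(gParameter, 'tradedate');
   put gParameter= tradeDate=;
return;
```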

Automation

Data relationships are based on automatic connections between local and global properties that can be created only after the software has been loaded into memory. Each program in the application can be written to check in advance whether the data it needs is present. This means each program can signal the application to stop processing as soon as it detects a missing element, instead of letting that same condition crash the entire application. The first step is to verify that the expected data containers are present. The second step is to verify that they hold data from the correct SAS dataset.

STEP #1: Verify global elements are present.

    • Positive/Negative message for the global data channel.
    • Positive/Negative message for data elements within the global channel.
public list gDatasetRow    / (sendEvent='N', autocreate='N');
public list message        / (sendEvent='N');
public list requiredColumn / (initialValue={'CUSTOMERID', 'DATE', 'AMOUNT'});

step1: method / (description='Verify global data channels are present');
dcl num dataFound;

if nameditem(envlist('G'), 'datasetRow')
then dataFound=1;
else dataFound=0;

if dataFound then gDatasetRow=getniteml(envlist('G'), 'datasetRow');

if dataFound then insertc(message, 'CONFIRMED: global data vector DATASETROW is present', -1);
else insertc(message, 'ERROR: this program requires a global data vector named DATASETROW', -1);
endmethod;

runIteration: method / (description='Verify expected data elements are present');
dcl num xColumn;
dcl char columnName;

/* gDatasetRow is assigned only when STEP1 found the global channel */
if nameditem(envlist('G'), 'datasetRow')=0 then return;

do xColumn=1 to listlen(requiredColumn);
columnName=getitemc(requiredColumn, xColumn);

if nameditem(gDatasetRow, columnName)
then insertc(message, 'CONFIRMED: '||trim(columnName)||' is present', -1);
else insertc(message, 'ERROR: this program requires a data element named '||trim(columnName), -1);
end;
endmethod;
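The STEP #1 methods only record messages; the application still has to act on them. A minimal sketch, assuming a hypothetical method named hasErrors added to the same class, that scans the MESSAGE list for entries beginning with 'ERROR:' so that a caller can stop before STEP #2 runs:

```scl
/* Hypothetical helper (not in the original class): returns 1 when  */
/* STEP1 or RUNITERATION recorded at least one ERROR message.       */
hasErrors: method return=num / (description='Report whether a required element is missing');
   dcl num xMessage;
   do xMessage=1 to listlen(message);
      /* INDEX returns 1 when the message text starts with 'ERROR:' */
      if index(getitemc(message, xMessage), 'ERROR:')=1 then return(1);
   end;
   return(0);
endmethod;
```

A caller would run step1 and runIteration first, then skip interface1 whenever hasErrors returns 1.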

STEP #2: Use a local property (gDatasetRow) to access one variable (tradedate) in the global channel (datasetRow).

interface1: method;
    dcl num tradeDate;
    tradeDate=getnitemn(gDatasetRow, 'tradedate');

    /* SUBMIT replaces &tradeDate with the value of the SCL variable tradeDate */
    submit continue sql;
    create table work.ponzi as
        select *
        from in.bloomberg
        where traderName='BERNARD MADOFF' and
              tradeDate=&tradeDate;
    quit;
    endsubmit;

    if symgetn('sqlobs')=0 then do;
        submit continue sql;
        insert into ponziDetector set
            traderName='BERNARD MADOFF',
            tradeDate =&tradeDate,
            tradeCount=0;
        quit;
        endsubmit;
    end;
endmethod;

Limitations and Exclusions

  1. Any number of instances may participate in a relationship.
  2. Instances may join, and subsequently leave, a relationship at any time.
  3. A relationship may be implemented across any number of physically separate software applications.
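A minimal sketch of how an instance might leave a relationship, assuming a hypothetical method named leaveRelationship in the class that declares gDatasetRow. The instance drops only its local reference, so the list in ENVLIST and every other instance's connection to it are unaffected. Rejoining is the same GETNITEML assignment used in STEP #1. Deleting the list with DELLIST is a different operation: it targets the shared container itself, not this instance's connection to it.

```scl
/* Hypothetical method: disconnect this instance from the global */
/* channel DATASETROW without deleting the shared list.          */
leaveRelationship: method / (description='Leave the DATASETROW relationship');
   gDatasetRow=0;
endmethod;
```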

 

ENVLIST - Fast and Easy

ENVLIST is an automatic session-level SAS resource.

The global environment list ENVLIST is an automatic resource that holds data all SAS applications can share during the same SAS session. The data remains in the global environment list until it is explicitly removed or the SAS session terminates. This means one application can insert data into the global environment list and exit normally, and the next application can then use that data. To create a relationship, assign the identifier of any list stored in ENVLIST to a list property of an instance.
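Because data stays in ENVLIST until it is explicitly removed, the last program that needs a container is responsible for deleting it. A minimal sketch, assuming a hypothetical method named removeGlobal: the named item is detached from ENVLIST first, and then the list itself (with any sublists) is deleted.

```scl
/* Hypothetical cleanup method: remove one named container from ENVLIST. */
removeGlobal: method containerName:char / (description='Remove a global data container');
   dcl list gContainer;
   /* NAMEDITEM returns 0 when the container does not exist */
   if nameditem(envlist('G'), containerName) then do;
      gContainer=getniteml(envlist('G'), containerName);
      delnitem(envlist('G'), containerName);   /* detach from ENVLIST          */
      dellist(gContainer, 'Y');                /* delete list and its sublists */
   end;
endmethod;
```

After a call such as removeGlobal('datasetRow'), any instance still holding the old identifier has an invalid reference, so this kind of cleanup belongs at the very end of the application.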

Example - Create global data containers.

public list gName / (initialValue={
    'parameter',
    'global',
    'dataset1',
    'dataset2'
});

interface1: method / (description='Create global data vectors');
dcl num xVector;
do xVector=1 to listlen(gName);
insertl(envlist('G'), makelist(0, 'G'), -1, getitemc(gName, xVector));
end;
endmethod;
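SCL lists accept duplicate item names by default, so running interface1 twice in the same session would add a second, empty container under each name, while GETNITEML would keep returning the first one. A minimal sketch of a hypothetical variant, createIfAbsent, that creates each container only when it is not already present in ENVLIST:

```scl
/* Hypothetical variant of interface1: skips names that already exist. */
createIfAbsent: method / (description='Create missing global data vectors');
   dcl num xVector;
   dcl char vectorName;
   do xVector=1 to listlen(gName);
      vectorName=getitemc(gName, xVector);
      /* NAMEDITEM returns 0 when no item with this name exists yet */
      if nameditem(envlist('G'), vectorName)=0 then
         insertl(envlist('G'), makelist(0, 'G'), -1, vectorName);
   end;
endmethod;
```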

Example - Connect local property to global container.

This example shows how to automatically link four local properties to four global properties during application startup. When this program is loaded into memory the _INIT method executes automatically and this program will have four data relationships.

public list gParameter / (sendEvent='N', autocreate='N');
public list gGlobal    / (sendEvent='N', autocreate='N');
public list gDataset1  / (sendEvent='N', autocreate='N');
public list gDataset2  / (sendEvent='N', autocreate='N');

_init: method / (state='o');
_super();

gParameter=getniteml(envlist('G'), 'parameter');
gGlobal   =getniteml(envlist('G'), 'global');
gDataset1 =getniteml(envlist('G'), 'dataset1');
gDataset2 =getniteml(envlist('G'), 'dataset2');
endmethod;
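Once _init has run, the relationship is live: a value that one instance stores through gDataset1 is visible to every other instance that assigned the same container in its own _init. A minimal sketch, assuming two hypothetical methods named publish and consume and a hypothetical item named TRADEDATE:

```scl
/* Hypothetical writer: stores a value in the shared DATASET1 container. */
publish: method tradeDate:num / (description='Write to the DATASET1 channel');
   setnitemn(gDataset1, tradeDate, 'tradedate');
endmethod;

/* Hypothetical reader: any instance connected to DATASET1 sees the value */
/* once publish has run, because both properties hold the same list      */
/* identifier. The default argument (.) returns missing if no value yet.  */
consume: method return=num / (description='Read from the DATASET1 channel');
   return(getnitemn(gDataset1, 'tradedate', 1, 1, .));
endmethod;
```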

 

HIGH PERFORMANCE COMPUTING

Use this method under the following conditions:

  • Multiple programs need access to data stored in arrays with two or more dimensions.
  • The entire application, or a portion thereof, is replicated several times for parallel processing.
  • Processing speed is a concern.

The idea is to replicate ENVLIST with your own custom-coded object. The implementation is the same: each list identifier in WORKSPACE must be assigned to a matching property in each target instance. As before, identifiers become available only after the software has been loaded into memory, so the assignment is still a post-compiler operation.

The graphic below illustrates how the LIST identifier from WORKSPACE functions as the original data container. The identifier is transferred to all other programs, where it is assigned to a property of the same name. Identifier 4434 then functions as a pointer back to the original data container in the program named WORKSPACE.
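A minimal sketch of the pattern, assuming a hypothetical WORKSPACE class that owns a two-dimensional container named gMatrix (a list of row sublists) and a hypothetical connect method that each client class implements. WORKSPACE builds the container once and sends its identifier to each client, which assigns it to a property of the same name; afterwards every client reads and writes the same cells in memory.

```scl
/* In the hypothetical WORKSPACE class: the original data container. */
public list gMatrix / (sendEvent='N', autocreate='Y');

buildMatrix: method rows:num cols:num / (description='Create a rows x cols matrix');
   dcl num r c;
   dcl list row;
   do r=1 to rows;
      row=makelist(cols);            /* COLS items, initially missing */
      do c=1 to cols;
         setitemn(row, 0, c);        /* initialise each cell to 0     */
      end;
      insertl(gMatrix, row, -1);     /* append the row as a sublist   */
   end;
endmethod;

distribute: method client:object / (description='Send the matrix identifier to a client');
   call send(client, 'connect', gMatrix);
endmethod;

/* In each hypothetical client class: a property of the same name. */
public list gMatrix / (sendEvent='N', autocreate='N');

connect: method matrixId:list / (description='Join the WORKSPACE relationship');
   gMatrix=matrixId;
endmethod;

/* Reading cell (r, c) works the same way on either side:          */
/*    value=getitemn(getiteml(gMatrix, r), c);                     */
```

Storing each row as a sublist keeps the structure inside SCL list functions, so any number of dimensions can be built by nesting one more level.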