Document toolboxDocument toolbox

Accessing Columns and Derivations using Compliance Rules

Introduction

There will be compliance rules which need to access extended design attributes such as:

  • Columns

  • Constraints

  • Transformer Stage Variables

  • Transformer Loop Conditions

  • Transformer Loop Variables

As of Compliance version 2.1, all these design attributes are now available as properties within the graph. A new Gremlin Pipe has also been introduced that enables deep inspection of derivations.

Accessing extended design attributes

Columns, Constraints and Transformer Variables are all accessible as properties within the Graph Model. The following diagram shows where these properties are available within Graph models representing Jobs and Shared Containers:

Property Name

Type

Description

Availability

Property Name

Type

Description

Availability

filterConstraint

Constraint

Represents the derivation(s) used to apply output constraints on stages like Transformers and Lookups

All output Pins which have a constraint defined

columns

Array of Column

List of columns on links

All links, list will be empty when no column metadata is defined

stageVars

Array of StageVariable

List of all stage variables defined on Transformer Stages

Transformer Stages, list will be empty when no stage variables are defined

loopCondition

StageVariable

The condition used to define loops within transformer Stages

Transformer Stage, is null if no Loop Condition is defined

loopVars

Array of StageVariable

List of all loop variables defined on Transformer Stages

Transformer Stage, is null if no Loop Condition is defined

The properties available for each type are as follows:

Constraint

Property

Type

Description

Property

Type

Description

name

String

Its possible for link constraints defined within DataStage to have a name, however it is almost always null

derivation

String

The constraint condition as entered by the user

parsedDerivation

String

The same constraint condition after the DataStage designer has parsed it and applied Transforms

Column

Property

Type

Description

Property

Type

Description

name

String

Name of column

key

Boolean

Is the column a key? A true value means the user has selected box in the column definition screens of DataStage

sqlType

String

The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format.

NOTE: due to the way DataStage represents column data internally, this field can be null.

typeCode

String

Internal Type of the column. For parallel, this is the native OSH types

length

Integer

Maximum Length/Precision of the column, a zero value represents undefined/unbounded

scale

Integer

Fractional Digits of the column, a -1 value represents undefined

nullable

Boolean

Is Nullable?

description

String

Description text provided by the user in the columns definition screens of DataStage

derivation

String

Derivation for the column. This is usually null for source stages but can also be inaccurate for some Stored Procedure stages

parsedDerivation

String

The derivation after the DataStage designer has parsed it and applied Transforms

StageVariable

Property

Type

Description

Property

Type

Description

name

String

Name of the stage variable

initialValue

String

Initial value of the stage variable

sqlType

String

The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format.

precision

Integer

Maximum Length/Precision of the column, a zero value represents undefined/unbounded

scale

Integer

Fractional Digits of the column, a -1 value represents undefined

derivation

String

Derivation for the column.

parsedDerivation

String

The derivation after the DataStage designer has parsed it and applied Transforms

Example

Implementing a compliance rule which verifies all column names start with “col” would like the following:

item.graph // start with the graph model .E // select edges .link.as("link") // retain only link edges and apply a label for later access .columns // access the columns properties, this is an array .scatter // use scatter to "unroll" the array and process each column in the array .sideEffect{ column, pipes -> // process each column, declaring the optional second parameter to get access to previously labeled pipes if (!it.name.startsWith("col")) { // find any columns which don't start with "col" // access the edge that we labeled "link" (line 3) using "pipes.link" compliance.failure("${pipes.link.linkName}.${column.name} does not start with 'col'") } }

Inspection of Derivations

Derivations used by Constraints, Columns and StageVariables are available as string values which match the values DataStage developers see while designing jobs and shared containers. However, beyond simple pattern matching, strings representing derivations are difficult to inspect for compliance purposes. The derivations pipe can be used to parse derivations and construct an Abstract Syntax Tree (AST) which can be inspected as a sub tree of the job graph model.

For example, consider the following derivation:

"Hello " : myLink.FullName[" ", 1, 1] : " there are " : (24 * 7) : " hours in a week"

When processed by the derivations pipe, the following AST would be produced:

When the derivations pipe processes a graph Vertex, an AST will be generated for derivations based on the following table

Vertex Type

Properties with derivations which will be parsed

Example

Vertex Type

Properties with derivations which will be parsed

Example

Stage

stageVars

loopCondition

loopVars

item.graph // job/shared container graph model .V.stage // vertices of type stage .derivations // generate an ASTs for all stageVars, loopCondition and loopVars .has('type', 'constant') // process the AST just like the graph model .comply {"Derivation has constant ${it.value}"}

Pin

filterConstraint

columns of output link

The derivations pipe also accepts an optional closure argument which is called for each property with derivations to be parsed. An AST tree will only be constructed when the closure returns true, this allows properties to be filtered before an AST is constructed.

For example, the following could be used to process only Column derivations:

Abstract Syntax Tree Vertex Types

Expression

There is always one “expression” type vertex which is the root of the AST tree and will have one immediate child if the expression is non-empty.

Property

Type

Value

Property

Type

Value

type

String

“expression”

derivation

String

Derivation string that the AST was constructed from

Inherited from the source of the derivation. For example, as expression created from a Constraint will have:

  • name

  • derivation

  • parsedDerivation

Example query for selecting this node:

Constant

Represents a constant String or Number within the derivation and has no children.

Property

Type

Value

Property

Type

Value

type

String

“constant”

value

String

String representing the constant. For numeric values, these will be the number while string value include the quotes as defined within the derivation.

Note: Numeric values will always be positive as the negative sign is parsed the negate operator (unary type)

Example query for selecting this node:

Identifier

Represents all “Identifiers” (such as Parameters, Columns, StageVariables, etc) within the derivation. Identifiers have no children.

Property

Type

Value

Property

Type

Value

type

String

“identifier”

category

String

Describes the type of identifier. Possible values are:

  • column - Column identifier

  • constant - DataStage constant such as “DSJ.ME”

  • parameter - Job/Shared Container parameter

  • macro - DataStage macro such as “DSJobName”

  • linkVariable - Transformer Link Variable such as “outputLink.REJECTED”

  • stageVariable - Transformer Stage/Loop Variable

  • systemVariable - DataStage system variable such as “@TRUE”

  • invalid - A function/routine name was specified but without parenthesis to call it. Caused by syntax errors

  • unknown - An unknown identifier was detected. Caused by syntax errors

name

String

Name of the identifier. Column names a fully qualified with “Link.Column” and System Variables do not include the “@” prefix.

Example query for selecting all references to Job/Shared Container references:

Call

Represents all function/routine calls. There is one child per argument used for the call, each connecting Edge is labeled as argument and has an index property which indicates which argument child represents.

Property

Type

Value

Property

Type

Value

type

String

“call”

category

String

Distinguishes between the different type of calls. Possible values:

  • function - built-in transformer functions

  • routine - built-in server routine

  • custom - An unknown call is assumed to be a custom server/parallel routine

  • invalid - Identifier is called as a function. This is a result of a syntax error.

Example query for selecting all constant conversion strings passed to “Oconv” calls:

Unary

Unary Vertex types represents operations with only one operand. Derivations support unary + and - operations but + is considered a no-op and is not represented in the AST. An example unary operation would be -Link.Column which negates the value represented by Link.Column. Unary Vertex types have a single child representing the expression the operand will be applied to.

Property

Type

Value

Property

Type

Value

type

String

“unary”

operator

String

Only - operators will appear in the AST and represent a “negate” operation.

Example query to select all negated constant expressions:

Infix

Infix Vertex types represent operations with two operands separated by an operator. Addition (5 + 3) and Concatenation ("Hello" : " world!") are two examples of infix operations. Infix operations a grouped into multiple subtypes called a category, all of which have to children connected by edges: a left edge and a right edge.

Arithmetic

Property

Type

Value

Property

Type

Value

type

String

“infix”

category

String

“arithmetic”

operator

String

Arithmetic operators:

  • + - Addition

  • - - Subtraction

  • * - Multiplication

  • / - Division

  • ^ - Exponent

String

Property

Type

Value

Property

Type

Value

type

String

“infix”

category

String

“string”

operator

String

String operators:

  • : - concatenation

  • Matches - Server specific pattern matching operator

Relational

Property

Type

Value

Property

Type

Value

type

String

“infix”

category

String

“relational”

operator

String

Relational operators:

  • = - equal

  • <> - not equal

  • > - greater than

  • >= - greater than equal

  • < - less than

  • <= - less than equal

DataStage derivation support multiple symbols for the same operator but they are normalized to the operators show above. For example, using 4 # 2 is normalized to 4 <> 2 within the AST

Logical

Property

Type

Value

Property

Type

Value

type

String

“infix”

category

String

“logical”

operator

String

Logical operators:

  • and

  • or

Example query to select all constraints where a column is tested to be < another value. eg. Link.Column < 5:

IfThenElse

The IfThenElse Vertex type represents am If <condition> Then <then expression> Else <else expression> statement within the parsed derivation. It has no properties other than the type but the children represent the condition, then expression and else condition.

Property

Type

Value

Property

Type

Value

type

String

“IfThenElse”

Example to select all else expressions used in Column derivation:

Substring

Substring Vertex types represent the Substring operator [] which accepts 1 or 2 arguments: Link.Column[3] or Link.Column[2, 4]. This vertex type has no property other than type and will have 2 or 3 children depending on the number of arguments specified:

  • string - string expression which the substring is being performed against

  • length - length of the substring

  • start - start of the substring operation, will only exist when the two arguments are used with the [] operator

Field

Field Vertex types represent the Substring operator [] which accepts 3 arguments: Link.Column[“:”, 1, 2]. This syntax is specific to extracting substrings based on a delimiter, the occurrence to extract and the number to include. This vertex type has no property other than type but has 4 children:

  • String - the string expression which the substring is being performed against

  • delimiter - delimiter that splits the string

  • occur - which occurrence to extract

  • count - how many occurrences to extract.

Property

Type

Value

Property

Type

Value

type

String

“field”

Example query to select all delimiters used to extract fields in Column derivations:

 

© 2015-2024 Data Migrators Pty Ltd.