Accessing Columns and Derivations using Compliance Rules
- 1 Introduction
- 2 Accessing extended design attributes
- 2.1 Constraint
- 2.2 Column
- 2.3 StageVariable
- 2.4 Example
- 3 Inspection of Derivations
- 3.1 Abstract Syntax Tree Vertex Types
- 3.1.1 Expression
- 3.1.2 Constant
- 3.1.3 Identifier
- 3.1.4 Call
- 3.1.5 Unary
- 3.1.6 Infix
- 3.1.7 IfThenElse
- 3.1.8 Substring
- 3.1.9 Field
- 3.1 Abstract Syntax Tree Vertex Types
Introduction
There will be compliance rules which need to access extended design attributes such as:
Columns
Constraints
Transformer Stage Variables
Transformer Loop Conditions
Transformer Loop Variables
As of Compliance version 2.1, all these design attributes are now available as properties within the graph. A new Gremlin Pipe has also been introduced that enables deep inspection of derivations.
Accessing extended design attributes
Columns, Constraints and Transformer Variables are all accessible as properties within the Graph Model. The following diagram shows where these properties are available within Graph models representing Jobs and Shared Containers:
Property Name | Type | Description | Availability |
---|---|---|---|
filterConstraint |
| Represents the derivation(s) used to apply output constraints on stages like Transformers and Lookups | All output Pins which have a constraint defined |
columns | Array of | List of columns on links | All links, list will be empty when no column metadata is defined |
stageVars | Array of | List of all stage variables defined on Transformer Stages | Transformer Stages, list will be empty when no stage variables are defined |
loopCondition |
| The condition used to define loops within transformer Stages | Transformer Stage, is null if no Loop Condition is defined |
loopVars | Array of | List of all loop variables defined on Transformer Stages | Transformer Stage, is null if no Loop Condition is defined |
The properties available for each type are as follows:
Constraint
Property | Type | Description |
---|---|---|
name | String | Its possible for link constraints defined within DataStage to have a name, however it is almost always null |
derivation | String | The constraint condition as entered by the user |
parsedDerivation | String | The same constraint condition after the DataStage designer has parsed it and applied Transforms |
Column
Property | Type | Description |
---|---|---|
name | String | Name of column |
key | Boolean | Is the column a key? A true value means the user has selected box in the column definition screens of DataStage |
sqlType | String | The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format. NOTE: due to the way DataStage represents column data internally, this field can be null. |
typeCode | String | Internal Type of the column. For parallel, this is the native OSH types |
length | Integer | Maximum Length/Precision of the column, a zero value represents undefined/unbounded |
scale | Integer | Fractional Digits of the column, a |
nullable | Boolean | Is Nullable? |
description | String | Description text provided by the user in the columns definition screens of DataStage |
derivation | String | Derivation for the column. This is usually null for source stages but can also be inaccurate for some Stored Procedure stages |
parsedDerivation | String | The derivation after the DataStage designer has parsed it and applied Transforms |
StageVariable
Property | Type | Description |
---|---|---|
name | String | Name of the stage variable |
initialValue | String | Initial value of the stage variable |
sqlType | String | The Type as defined within the column definition screens of DataStage. All type names are defined in UPPER CASE format. |
precision | Integer | Maximum Length/Precision of the column, a zero value represents undefined/unbounded |
scale | Integer | Fractional Digits of the column, a |
derivation | String | Derivation for the column. |
parsedDerivation | String | The derivation after the DataStage designer has parsed it and applied Transforms |
Example
Implementing a compliance rule which verifies all column names start with “col” would like the following:
item.graph // start with the graph model
.E // select edges
.link.as("link") // retain only link edges and apply a label for later access
.columns // access the columns properties, this is an array
.scatter // use scatter to "unroll" the array and process each column in the array
.sideEffect{ column, pipes -> // process each column, declaring the optional second parameter to get access to previously labeled pipes
if (!it.name.startsWith("col")) { // find any columns which don't start with "col"
// access the edge that we labeled "link" (line 3) using "pipes.link"
compliance.failure("${pipes.link.linkName}.${column.name} does not start with 'col'")
}
}
Inspection of Derivations
Derivations used by Constraints, Columns and StageVariables are available as string values which match the values DataStage developers see while designing jobs and shared containers. However, beyond simple pattern matching, strings representing derivations are difficult to inspect for compliance purposes. The derivations
pipe can be used to parse derivations and construct an Abstract Syntax Tree (AST) which can be inspected as a sub tree of the job graph model.
For example, consider the following derivation:
"Hello " : myLink.FullName[" ", 1, 1] : " there are " : (24 * 7) : " hours in a week"
When processed by the derivations
pipe, the following AST would be produced:
When the derivations
pipe processes a graph Vertex, an AST will be generated for derivations based on the following table
Vertex Type | Properties with derivations which will be parsed | Example |
---|---|---|
Stage |
| item.graph // job/shared container graph model
.V.stage // vertices of type stage
.derivations // generate an ASTs for all stageVars, loopCondition and loopVars
.has('type', 'constant') // process the AST just like the graph model
.comply {"Derivation has constant ${it.value}"}
|
Pin |
|
The derivations
pipe also accepts an optional closure argument which is called for each property with derivations to be parsed. An AST tree will only be constructed when the closure returns true
, this allows properties to be filtered before an AST is constructed.
For example, the following could be used to process only Column
derivations:
Abstract Syntax Tree Vertex Types
Expression
There is always one “expression” type vertex which is the root of the AST tree and will have one immediate child if the expression is non-empty.
Property | Type | Value |
---|---|---|
type | String | “expression” |
derivation | String | Derivation string that the AST was constructed from |
… | … | Inherited from the source of the derivation. For example, as expression created from a
|
Example query for selecting this node:
Constant
Represents a constant String or Number within the derivation and has no children.
Property | Type | Value |
---|---|---|
type | String | “constant” |
value | String | String representing the constant. For numeric values, these will be the number while string value include the quotes as defined within the derivation. Note: Numeric values will always be positive as the negative sign is parsed the negate operator (unary type) |
Example query for selecting this node:
Identifier
Represents all “Identifiers” (such as Parameters, Columns, StageVariables, etc) within the derivation. Identifiers have no children.
Property | Type | Value |
---|---|---|
type | String | “identifier” |
category | String | Describes the type of identifier. Possible values are:
|
name | String | Name of the identifier. Column names a fully qualified with “Link.Column” and System Variables do not include the “@” prefix. |
Example query for selecting all references to Job/Shared Container references:
Call
Represents all function/routine calls. There is one child per argument used for the call, each connecting Edge is labeled as argument
and has an index
property which indicates which argument child represents.
Property | Type | Value |
---|---|---|
type | String | “call” |
category | String | Distinguishes between the different type of calls. Possible values:
|
Example query for selecting all constant conversion strings passed to “Oconv” calls:
Unary
Unary Vertex types represents operations with only one operand. Derivations support unary +
and -
operations but +
is considered a no-op and is not represented in the AST. An example unary operation would be -Link.Column
which negates the value represented by Link.Column
. Unary Vertex types have a single child representing the expression the operand will be applied to.
Property | Type | Value |
---|---|---|
type | String | “unary” |
operator | String | Only |
Example query to select all negated constant expressions:
Infix
Infix Vertex types represent operations with two operands separated by an operator. Addition (5 + 3
) and Concatenation ("Hello" : " world!"
) are two examples of infix operations. Infix operations a grouped into multiple subtypes called a category
, all of which have to children connected by edges: a left
edge and a right
edge.
Arithmetic
Property | Type | Value |
---|---|---|
type | String | “infix” |
category | String | “arithmetic” |
operator | String | Arithmetic operators:
|
String
Property | Type | Value |
---|---|---|
type | String | “infix” |
category | String | “string” |
operator | String | String operators:
|
Relational
Property | Type | Value |
---|---|---|
type | String | “infix” |
category | String | “relational” |
operator | String | Relational operators:
DataStage derivation support multiple symbols for the same operator but they are normalized to the operators show above. For example, using |
Logical
Property | Type | Value |
---|---|---|
type | String | “infix” |
category | String | “logical” |
operator | String | Logical operators:
|
Example query to select all constraints where a column is tested to be <
another value. eg. Link.Column < 5
:
IfThenElse
The IfThenElse Vertex type represents am If <condition> Then <then expression> Else <else expression>
statement within the parsed derivation. It has no properties other than the type
but the children represent the condition
, then
expression and else
condition.
Property | Type | Value |
---|---|---|
type | String | “IfThenElse” |
Example to select all else expressions used in Column derivation:
Substring
Substring Vertex types represent the Substring operator []
which accepts 1 or 2 arguments: Link.Column[3]
or Link.Column[2, 4]
. This vertex type has no property other than type
and will have 2 or 3 children depending on the number of arguments specified:
string
- string expression which the substring is being performed againstlength
- length of the substringstart
- start of the substring operation, will only exist when the two arguments are used with the[]
operator
Field
Field Vertex types represent the Substring operator []
which accepts 3 arguments: Link.Column[“:”, 1, 2]
. This syntax is specific to extracting substrings based on a delimiter, the occurrence to extract and the number to include. This vertex type has no property other than type
but has 4 children:
String
- the string expression which the substring is being performed againstdelimiter
- delimiter that splits the stringoccur
- which occurrence to extractcount
- how many occurrences to extract.
Property | Type | Value |
---|---|---|
type | String | “field” |
Example query to select all delimiters used to extract fields in Column derivations:
© 2015-2024 Data Migrators Pty Ltd.