Dynamic Arrays
Introduction
Dynamic Arrays are a DS Basic concept available within Server Job Transformers, and they have no Parallel equivalent. Unlike arrays in most programming languages, Dynamic Arrays in DataStage are not a special type with their own data representation. Instead, they are ordinary strings containing special characters, called System Marks or System Delimiters, which can be interpreted to break the string into a multi-dimensional array of strings.
The table below shows the System Delimiters used by DataStage:
System Variable | Description | ASCII Code Point | Unicode Code Point |
---|---|---|---|
@IM | Item Mark | 0xFF | 0xF8FF |
@FM | Field Mark | 0xFE | 0xF8FE |
@VM | Value Mark | 0xFD | 0xF8FD |
@SM | Sub-value Mark | 0xFC | 0xF8FC |
@TM | Text Mark | 0xFB | 0xF8FB |
@NULL.STR | Null String | 0x80 | 0xF8F7 |
These values are hierarchical, meaning a string can represent a set of Items, each of which can contain Fields, containing Values, containing Sub-values, etc.
The examples below are reproduced from IBM’s documentation but have been reformatted to correctly display Field Marks {FM}, Value Marks {VM} and Sub-value Marks {SM}.
The following character string is a dynamic array with two fields:
TOM{SM}DICK{SM}HARRY{VM}BETTY{SM}SUE{SM}MARY{FM}JONES{VM}SMITH
The two fields are:
TOM{SM}DICK{SM}HARRY{VM}BETTY{SM}SUE{SM}MARY
and:
JONES{VM}SMITH
Conceptually, this dynamic array has an infinite number of fields, all of which are empty except the first two. References made to the third or fourth field, for example, return an empty string.
The first field has two values:
TOM{SM}DICK{SM}HARRY
and:
BETTY{SM}SUE{SM}MARY
The first value has three sub-values: TOM, DICK, and HARRY. The second value also has three sub-values: BETTY, SUE, and MARY.
The second field has two values: JONES and SMITH. Each value has one sub-value: JONES and SMITH.
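The hierarchy described above can be illustrated with a short Python sketch. This is not DataStage code: the function name `parse_dynamic_array` is hypothetical, and the single-byte code points from the table of System Delimiters are assumed.

```python
# System Delimiters as single-byte code points (taken from the table above).
FM, VM, SM = "\xfe", "\xfd", "\xfc"

def parse_dynamic_array(text):
    """Split a dynamic array string into fields -> values -> sub-values."""
    return [
        [value.split(SM) for value in field.split(VM)]
        for field in text.split(FM)
    ]

# Rebuild the IBM example by swapping the display placeholders for real marks.
raw = "TOM{SM}DICK{SM}HARRY{VM}BETTY{SM}SUE{SM}MARY{FM}JONES{VM}SMITH"
raw = raw.replace("{FM}", FM).replace("{VM}", VM).replace("{SM}", SM)

parsed = parse_dynamic_array(raw)
# parsed[0] is the first field: two values, each with three sub-values.
# parsed[1] is the second field: two values, each with one sub-value.
```

The nested list is only a view of the data; the underlying representation remains the single string `raw`.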
Strings and Dynamic Arrays in DataStage Server are exactly the same; the only difference is how Transformer Functions interpret the data. For example, a Dynamic Array containing two fields, Hello and World, can be interpreted as a string containing the Field Mark character:
Hello{FM}World
Just as a string containing no System Delimiters:
Hello World
can be interpreted as a Dynamic Array containing one field, Hello World.
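As a rough illustration of this duality, the Python sketch below reads the same string either as plain text or as a Dynamic Array. The `extract` helper is a hypothetical stand-in written for this example, not the Server `Extract` implementation. It assumes 1-based positions and returns an empty string for any element that does not exist, matching the "infinite number of empty fields" behaviour described earlier.

```python
FM, VM, SM = "\xfe", "\xfd", "\xfc"

def _nth(parts, n):
    """Return the n-th (1-based) element, or "" when it does not exist."""
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def extract(text, field, value=0, subvalue=0):
    """Hypothetical helper: a position of 0 means "take the whole level"."""
    result = _nth(text.split(FM), field)
    if value:
        result = _nth(result.split(VM), value)
        if subvalue:
            result = _nth(result.split(SM), subvalue)
    return result

plain = "Hello World"           # a string with no System Delimiters
array = "Hello" + FM + "World"  # the same type, containing one Field Mark

first_of_plain = extract(plain, 1)   # the whole string is field 1
second_of_array = extract(array, 2)  # "World"
missing_field = extract(array, 3)    # no third field, so ""
```

Both `plain` and `array` are ordinary strings; only the function applied to them decides whether the Field Mark is treated as structure.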
Conversion Challenges
In theory, if all Server Transformer Functions were available in the Parallel Engine and references to System Variables such as @FM were converted to Char(254), Server Transformer logic could be converted directly to Parallel without any loss of functionality, because Dynamic Arrays are just Strings. However, the Code Points used by System Delimiters were presumably chosen to limit the likelihood of them clashing with legitimate characters within any NLS Code Page. As a result, System Delimiters usually aren't valid characters, so all Server NLS character encoding routines have built-in support for them. Any attempt to use these characters within DataStage Parallel Jobs has a high probability of triggering Invalid Character warnings, which result in data loss. This is especially true when working with Unicode.
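The Python sketch below shows why these code points are awkward outside Server. It does not reproduce Parallel Engine behaviour. It only checks two properties of the values in the table of System Delimiters: the single-byte marks are not valid UTF-8 on their own, and the Unicode marks fall in the Private Use Area. The helper name `is_valid_utf8` is an assumption made for this example.

```python
import unicodedata

def is_valid_utf8(data):
    """Return True when the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Code points copied from the table of System Delimiters.
SINGLE_BYTE = {"@IM": 0xFF, "@FM": 0xFE, "@VM": 0xFD, "@SM": 0xFC, "@TM": 0xFB}
UNICODE = {"@IM": 0xF8FF, "@FM": 0xF8FE, "@VM": 0xF8FD, "@SM": 0xF8FC, "@TM": 0xF8FB}

# Each single-byte mark, taken alone, is rejected by a strict UTF-8 decoder.
utf8_validity = {name: is_valid_utf8(bytes([cp])) for name, cp in SINGLE_BYTE.items()}

# Each Unicode mark has general category "Co" (private use), so it has no
# standard meaning and may not survive conversion to another code page.
categories = {name: unicodedata.category(chr(cp)) for name, cp in UNICODE.items()}
```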
The @NULL.STR System Delimiter is the exception and usually maps to the Euro Symbol €. This is most likely because Code Point 0x80 was not assigned to the Euro Symbol when DataStage Server was originally released. Most NLS code pages only added this symbol when the Euro was launched in 1999.
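The clash can be seen with Python's standard codecs. Windows-1252 is used here as one example of a code page that assigns 0x80 to the Euro Symbol, and ISO-8859-1 as one that does not. Which code page a given DataStage installation uses is an assumption outside this sketch.

```python
import unicodedata

null_str_byte = b"\x80"  # single-byte code point of @NULL.STR

# Windows-1252 maps 0x80 to the Euro Symbol (U+20AC).
as_cp1252 = null_str_byte.decode("cp1252")

# ISO-8859-1 maps 0x80 to U+0080, an unprintable C1 control character.
as_latin1 = null_str_byte.decode("latin-1")
latin1_category = unicodedata.category(as_latin1)
```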
Since the only way to distinguish between a Dynamic Array and a String within a DataStage Server job is how the String data is interpreted, it is not possible to directly identify how often Dynamic Arrays are used within our research repository of real world Server jobs. The best and only option for understanding the use of System Delimiters is to identify calls to Server Functions which interpret String inputs as Dynamic Arrays.
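A minimal sketch of that kind of analysis is shown below. It assumes Transformer derivations are already available as plain strings. The function list is limited to the three functions named in the conversion table later in this document, and the name `count_dynamic_array_calls` is hypothetical. It is not the tooling used to build the research repository.

```python
import re
from collections import Counter

# Illustrative list only: the three functions named in the conversion table.
DYNAMIC_ARRAY_FUNCTIONS = ("Substrings", "Exchange", "Extract")
CALL_PATTERN = re.compile(r"\b(" + "|".join(DYNAMIC_ARRAY_FUNCTIONS) + r")\s*\(")

def count_dynamic_array_calls(derivations):
    """Count calls to functions that interpret their input as a Dynamic Array."""
    counts = Counter()
    for expression in derivations:
        for match in CALL_PATTERN.finditer(expression):
            counts[match.group(1)] += 1
    return counts

# Made-up derivations, purely to exercise the function.
sample = [
    "Substrings(In.Name, 1, 3)",
    "Extract(In.Payload, 2, 1) : Exchange(In.Code, 'FF', '2F')",
    "Trim(In.Name)",
]
usage = count_dynamic_array_calls(sample)
```

A count like this only shows that a function is called. It cannot tell whether the input actually contained System Delimiters.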
Conversion Approach
Given the (relatively) low use of Server Functions which support Dynamic Arrays, I don’t think it’s likely that many DataStage Server users utilize Dynamic Array logic in their jobs. S2PX is also founded on the assumption that the vast majority of calls to the Substrings function use Dynamic Arrays comprising a single field (i.e. a string without a System Delimiter), which is the equivalent of DataStage's substring ([]) operator.
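The assumption can be made concrete with a Python sketch that compares a plain substring with an element-wise substring. Two things here are assumptions for illustration rather than confirmed Server behaviour: that Substrings applies the substring to every element separated by Field, Value and Sub-value Marks, and that it keeps those delimiters in place.

```python
import re

FM, VM, SM = "\xfe", "\xfd", "\xfc"
SPLIT_PATTERN = re.compile("([" + FM + VM + SM + "])")

def substring(text, start, length):
    """Plain 1-based substring, in the spirit of string[start, length]."""
    start = max(start, 1)
    return text[start - 1:start - 1 + max(length, 0)]

def substrings(dynamic_array, start, length):
    """Apply substring() to each element, keeping the delimiters in place."""
    parts = SPLIT_PATTERN.split(dynamic_array)
    # Even indexes hold elements; odd indexes hold the captured delimiters.
    return "".join(
        part if index % 2 else substring(part, start, length)
        for index, part in enumerate(parts)
    )

single_field = "Hello World"
two_fields = "Hello" + FM + "World"

same_result = substrings(single_field, 1, 5) == substring(single_field, 1, 5)
element_wise = substrings(two_fields, 1, 3)  # substring of each field
naive = substring(two_fields, 1, 3)          # treats the input as one string
```

For a single-field input the two functions agree, which is the case the assumption relies on. They diverge as soon as a System Delimiter is present.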
Unfortunately, this analysis does have some limitations:
- Use of Dynamic Arrays might be under-represented, since we can’t directly identify where they are used.
- Lack of awareness of Dynamic Arrays within the DataStage development community could mean heavy use at a small number of DataStage sites that are not represented in our research set.
These limitations, combined with the very high difficulty of manually converting jobs which use Dynamic Arrays, suggest that it may be unwise to dismiss Dynamic Arrays as an unused Server feature and place them out of scope for Server to Parallel conversion.
Converting superficial use of System Delimiters and Dynamic Arrays
This approach ensures jobs which use System Delimiters such as @IM and @FM will compile and run without triggering data loss due to Invalid Character warnings. The idea is to change the code points used by System Delimiters to rarely-used control characters within the standard 0x00 - 0x7F range. Doing so will prevent data loss and ensure the code point remains the same regardless of whether Unicode is used. Below are the replacement code points for System Delimiters:
System Variable | Code Point (Hex) | Code Point (Dec) | ASCII Character Name |
---|---|---|---|
@IM | 0x1F | 31 | Unit Separator |
@FM | 0x1E | 30 | Record Separator |
@VM | 0x1D | 29 | Group Separator |
@SM | 0x1C | 28 | File Separator |
@TM | 0x1A | 26 | Substitute |
@NULL.STR | 0x19 | 25 | End of Medium |
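The effect of the remapping can be sketched in Python. The mapping pairs the original single-byte code points with the replacement values from the table above. The function name `remap_delimiters` is hypothetical, and S2PX's actual conversion mechanism is not shown here.

```python
# Original single-byte code point -> replacement control character.
REMAP = {
    0xFF: 0x1F,  # @IM       -> Unit Separator
    0xFE: 0x1E,  # @FM       -> Record Separator
    0xFD: 0x1D,  # @VM       -> Group Separator
    0xFC: 0x1C,  # @SM       -> File Separator
    0xFB: 0x1A,  # @TM       -> Substitute
    0x80: 0x19,  # @NULL.STR -> End of Medium
}

def remap_delimiters(text):
    """Replace the original System Delimiters with the control characters."""
    return text.translate(REMAP)

old = "Hello\xfeWorld\xfdAgain"
new = remap_delimiters(old)

# Every replacement is below 0x80, so the bytes are identical whether the
# string is encoded as ASCII or as UTF-8.
same_bytes = new.encode("ascii") == new.encode("utf-8")
```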
Built-in Server functions which support Dynamic Arrays will be converted by S2PX to equivalent Parallel functions. However, edge cases related to System Delimiters will not produce the same results as Server:
Function | Conversion | Comment |
---|---|---|
Substrings(string, start, length) | | The disproportionately high use of The naïve translation will handle simple strings but will not produce the expected results when used with a Dynamic Array |
Exchange(string, xx, yy) | | Will operate correctly unless These are edge cases that will result in incorrect output in parallel when used. |
Extract(dynamic.array, field#) Extract(dynamic.array, field#, value#) Extract(dynamic.array, field#, value#, subvalue#) | | Will correctly extract Dynamic array fields containing new System Delimiters. Fails on an edge case where the extracted field contains just the |
All other functions will be left unchanged and will result in compile errors.
© 2015-2024 Data Migrators Pty Ltd.