
Dynamic Arrays

Introduction

Dynamic Arrays are a DS Basic concept available within Server Job Transformers, and they have no Parallel equivalent. Unlike arrays in most programming languages, Dynamic Arrays in DataStage are not a special type with their own data representation. Instead, they are just strings containing special characters, called System Marks or System Delimiters, which can be interpreted to break the string into a multi-dimensional array of strings.

The table below shows the System Delimiters used by DataStage:

System Variable   Description      ASCII Code Point   Unicode Code Point
@IM               Item Mark        0xFF               0xF8FF
@FM               Field Mark       0xFE               0xF8FE
@VM               Value Mark       0xFD               0xF8FD
@SM               Sub-value Mark   0xFC               0xF8FC
@TM               Text Mark        0xFB               0xF8FB
@NULL.STR         Null String      0x80               0xF8F7

These values are hierarchical, meaning a string can represent a set of Items, each of which can contain Fields, containing Values, containing Sub-values, etc.

The examples below are reproduced from IBM’s documentation but have been reformatted to correctly display Field Marks {FM}, Value Marks {VM} and Subvalue marks {SM}.

The following character string is a dynamic array with two fields:

TOM{SM}DICK{SM}HARRY{VM}BETTY{SM}SUE{SM}MARY{FM}JONES{VM}SMITH

The two fields are:

TOM{SM}DICK{SM}HARRY{VM}BETTY{SM}SUE{SM}MARY

and:

JONES{VM}SMITH

Conceptually, this dynamic array has an infinite number of fields, all of which are empty except the first two. References made to the third or fourth field, for example, return an empty string.

The first field has two values:

TOM{SM}DICK{SM}HARRY

and:

BETTY{SM}SUE{SM}MARY

The first value has three sub-values: TOM, DICK, and HARRY. The second value also has three sub-values: BETTY, SUE, and MARY.

The second field has two values: JONES and SMITH. Each value has one sub-value: JONES and SMITH.
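The breakdown above can be sketched in Python. This is only an illustration of how the delimiters partition a string, not DataStage code: the helper names `parse_dynamic_array` and `get_field` are hypothetical, and the delimiters are assumed to be the single-byte Server code points from the table above.

```python
# Server code points for the system delimiters (single-byte values from the table).
FM, VM, SM = "\xfe", "\xfd", "\xfc"


def parse_dynamic_array(text):
    """Split a string into fields -> values -> sub-values (nested lists)."""
    return [
        [value.split(SM) for value in field.split(VM)]
        for field in text.split(FM)
    ]


def get_field(text, n):
    """Return the n-th field (1-based), or '' when that field does not exist."""
    fields = text.split(FM)
    return fields[n - 1] if 1 <= n <= len(fields) else ""


# The example string from the text, built with real delimiter characters.
example = (
    SM.join(["TOM", "DICK", "HARRY"]) + VM + SM.join(["BETTY", "SUE", "MARY"])
    + FM + "JONES" + VM + "SMITH"
)

parsed = parse_dynamic_array(example)
print(parsed[0][1][2])               # third sub-value of the second value of field 1
print(repr(get_field(example, 3)))   # fields beyond the second are empty strings
```

Note that the same `example` value is still an ordinary Python string; only the helper functions give it an array interpretation, which mirrors how Server treats the data.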

Strings and Dynamic Arrays in DataStage Server are exactly the same; the only difference is how Transformer Functions interpret the data. For example, a Dynamic Array containing two fields, Hello and World, can be interpreted as a string containing the Field Mark character:

Hello{FM}World

Just as a string containing no System Delimiters:

Hello World

can be interpreted as a Dynamic Array containing one field, Hello World.

Conversion Challenges

In theory, assuming all Server Transformer Functions were available in the Parallel Engine and references to System Variables such as @FM were converted to Char(254), Server Transformer logic could be converted directly to Parallel without any loss of functionality, because Dynamic Arrays are just Strings. However, the Code Points used by System Delimiters were presumably chosen to limit the likelihood of them clashing with legitimate characters within any NLS Code Page. As a result, System Delimiters aren't usually valid characters, so all Server NLS character encoding routines have built-in support for them. Any attempt to use these characters within DataStage Parallel Jobs has a high probability of triggering Invalid Character warnings, which result in data loss. This is especially true when working with Unicode.
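The encoding problem can be illustrated outside DataStage. The sketch below uses Python's UTF-8 codec and the `unicodedata` module as a stand-in; it does not reproduce the Parallel Engine's own validation, and the helper name `decodes_as_utf8` is hypothetical. It shows that the Server byte for @FM can never occur in valid UTF-8, and that the corresponding Unicode code point sits in the Private Use Area.

```python
import unicodedata


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


server_bytes = b"Hello\xfeWorld"   # 0xFE is the Server code point for @FM
control_bytes = b"Hello\x1eWorld"  # 0x1E is an ASCII control character

print(decodes_as_utf8(server_bytes))    # False: 0xFE is never valid in UTF-8
print(decodes_as_utf8(control_bytes))   # True: plain ASCII is valid UTF-8
print(unicodedata.category("\uf8fe"))   # Co: a private-use code point
```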

The @Null.Str System Delimiter is the exception and usually maps to the Euro Symbol (€). This is most likely because Code Point 0x80 was not originally assigned to the Euro Symbol when DataStage Server was released. Most NLS code pages only added this symbol when the Euro was launched in 1999.
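As a small illustration, the byte 0x80 decodes differently depending on the code page. Windows-1252 and ISO-8859-1 are used here purely as examples; which code page a given DataStage installation uses is an assumption outside this sketch.

```python
import unicodedata

raw = b"\x80"  # Server code point for @NULL.STR

as_cp1252 = raw.decode("cp1252")    # Windows-1252 assigns 0x80 to the Euro sign
as_latin1 = raw.decode("latin-1")   # ISO-8859-1 leaves 0x80 as a C1 control code

print(as_cp1252 == "\u20ac")            # True
print(unicodedata.category(as_latin1))  # Cc
```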

Since the only way to distinguish between a Dynamic Array and a String within a DataStage Server job is how the String data is interpreted, it is not possible to directly identify how often Dynamic Arrays are used within our research repository of real world Server jobs. The best and only option for understanding the use of System Delimiters is to identify calls to Server Functions which interpret String inputs as Dynamic Arrays.
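A minimal sketch of that kind of analysis is shown below. It assumes the Transformer expressions have already been extracted as plain strings (the sample expressions and column names are made up for illustration) and counts calls to three functions discussed in this document. It is not the tooling used for the actual research.

```python
import re
from collections import Counter

# Functions that interpret their string input as a Dynamic Array.
DYNAMIC_ARRAY_FUNCTIONS = ("Extract", "Substrings", "Exchange")

# Match a function name followed by an opening parenthesis.
CALL_PATTERN = re.compile(r"\b(" + "|".join(DYNAMIC_ARRAY_FUNCTIONS) + r")\s*\(")


def count_calls(expressions):
    """Count calls to Dynamic Array functions across a list of expressions."""
    counts = Counter()
    for expression in expressions:
        for match in CALL_PATTERN.finditer(expression):
            counts[match.group(1)] += 1
    return counts


# Hypothetical derivations, for illustration only.
sample_expressions = [
    "Extract(In.Rec, 2, 1)",
    "Substrings(In.Name, 1, 3) : Extract(In.Rec, 1)",
    "Trim(In.Code)",
]

usage = count_calls(sample_expressions)
print(dict(usage))
```

A count like this only shows where such functions are called; it cannot tell whether the argument actually contains System Delimiters at run time.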

Conversion Approach

Given the (relatively) low use of Server Functions which support Dynamic Arrays, I don’t think it’s likely that many DataStage Server users utilize Dynamic Array logic in their jobs. S2PX is also founded on the assumption that the vast majority of calls to the Substrings function use Dynamic Arrays comprising a single field (i.e. a string without a System Delimiter), which is the equivalent of DataStage's substring ([]) operator.

Unfortunately, this analysis does have some limitations:

  • Use of Dynamic Arrays might be underrepresented, since we can’t directly identify them

  • Lack of awareness of Dynamic Arrays within the DataStage development community could mean they are heavily used at a small number of DataStage sites that are not represented in our research set

These limitations, combined with the very high difficulty of manually converting jobs which use Dynamic Arrays, suggest that it may be unwise to dismiss Dynamic Arrays as an unused Server feature and place them out of scope for Server to Parallel conversion.

Converting superficial use of System Delimiters and Dynamic Arrays

This approach ensures jobs which use System Delimiters such as @IM and @FM will compile and run without triggering data loss due to Invalid Character warnings. The idea is to change the code points used by System Delimiters to rarely-used control characters within the standard 0x00 - 0x7F range. Doing so will prevent data loss and ensure the code point remains the same regardless of whether Unicode is used. Below are the replacement code points for System Delimiters:

System Variable   Code Point (Hex)   Code Point (Dec)   ASCII Character Name
@IM               0x1F               31                 Unit Separator
@FM               0x1E               30                 Record Separator
@VM               0x1D               29                 Group Separator
@SM               0x1C               28                 File Separator
@TM               0x1A               26                 Substitute
@NULL.STR         0x19               25                 End of Medium
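The remapping in the table can be expressed as a simple translation table. The sketch below is illustrative only and is not how S2PX itself performs the conversion; it assumes the Server delimiters appear as characters with the single-byte code points listed earlier, and the helper name `remap_delimiters` is hypothetical.

```python
# Server code point -> replacement control character, taken from the two tables.
SERVER_TO_PARALLEL = {
    0xFF: 0x1F,  # @IM       -> Unit Separator
    0xFE: 0x1E,  # @FM       -> Record Separator
    0xFD: 0x1D,  # @VM       -> Group Separator
    0xFC: 0x1C,  # @SM       -> File Separator
    0xFB: 0x1A,  # @TM       -> Substitute
    0x80: 0x19,  # @NULL.STR -> End of Medium
}


def remap_delimiters(text: str) -> str:
    """Replace Server System Delimiters with their 7-bit control characters."""
    return text.translate(SERVER_TO_PARALLEL)


server_value = "Hello\xfeWorld\xfdAgain"
parallel_value = remap_delimiters(server_value)

print(repr(parallel_value))       # 'Hello\x1eWorld\x1dAgain'
print(parallel_value.isascii())   # True: safe in any ASCII-compatible encoding
```

Because every replacement lies in the 0x00 - 0x7F range, the remapped string encodes to the same bytes in ASCII, UTF-8, and most single-byte code pages.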

Built-in Server functions which support Dynamic Arrays will be converted by S2PX to equivalent Parallel functions. However, edge cases related to System Delimiters will not produce the same results as Server:

Function:   Substrings(string, start, length)
Conversion: string[start, length]
Comment:    The disproportionately high use of Substrings suggests that most calls to this function are not aware of its Dynamic Array capabilities and instead use it with simple strings. The naïve translation will handle simple strings but will not produce the expected results when used with a Dynamic Array.

Function:   Exchange(string, xx, yy)
Conversion: Convert(string, Iconv(xx, "MX0C"), Iconv(yy, "MX0C"))
Comment:    Will operate correctly unless the xx or yy hex values represent the Server’s definitions of System Delimiters. There is also a special case where setting yy to "FF" (Server’s hex value for @IM) will remove xx from the string. These are edge cases that will result in incorrect output in Parallel when used.

Function:   Extract(dynamic.array, field#)
Conversion: Field(dynamic.array, Char(30), field#)

Function:   Extract(dynamic.array, field#, value#)
Conversion: Field(Field(dynamic.array, Char(30), field#), Char(29), value#)

Function:   Extract(dynamic.array, field#, value#, subvalue#)
Conversion: Field(Field(Field(dynamic.array, Char(30), field#), Char(29), value#), Char(28), subvalue#)

Comment (all Extract forms): Will correctly extract Dynamic Array fields containing the new System Delimiters. Fails on an edge case where the extracted field contains just the @Null.Str System Delimiter. The Server function will convert it to @Null, but this conversion will continue to treat the field as a simple string.
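To make the nested Field conversion for Extract concrete, the following sketch models it in Python. The `field` helper is a simplified stand-in for the Field function (it only handles a single-character delimiter and a 1-based occurrence, returning an empty string when the occurrence does not exist), and `extract` is a hypothetical wrapper, so treat this as an approximation rather than a faithful emulation of either engine.

```python
# Replacement delimiters used after conversion: Char(30), Char(29), Char(28).
FM, VM, SM = chr(30), chr(29), chr(28)


def field(text, delimiter, occurrence):
    """Simplified Field(): the n-th delimited substring, or '' if missing."""
    parts = text.split(delimiter)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""


def extract(dynamic_array, field_no, value_no=None, subvalue_no=None):
    """Model Extract() as nested field() calls, one per level requested."""
    result = field(dynamic_array, FM, field_no)
    if value_no is not None:
        result = field(result, VM, value_no)
    if subvalue_no is not None:
        result = field(result, SM, subvalue_no)
    return result


# The earlier example, rebuilt with the replacement delimiters.
names = (
    SM.join(["TOM", "DICK", "HARRY"]) + VM + SM.join(["BETTY", "SUE", "MARY"])
    + FM + "JONES" + VM + "SMITH"
)

print(extract(names, 1, 2, 3))    # MARY
print(extract(names, 2, 1))       # JONES
print(repr(extract(names, 3)))    # '' because the third field does not exist
```

The @Null.Str edge case is not modelled here: a field containing only Char(25) would simply be returned as a one-character string.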

All other functions will be left unchanged and will result in compile errors.

© 2015-2024 Data Migrators Pty Ltd.