Document toolboxDocument toolbox

Transformer Lookups

In Server Jobs, lookups are implemented using Transformer stages with a reference input link. Here’s a simple example of a Server Job using a Transformer to perform a lookup:

A Server Job with a Lookup Transformer
The Server Job’s Transformer definition

In this example:

  • The data from Sequential File inputFile and Hashed File referenceFile are joined on the key field key1.

  • The join uses a Key Expression which joins rows with referenceLink.key1 values equal to inLink.key1 * 5

  • The Hashed File referenceFile used as the reference source can be configured to pre-load the file to memory (for in-memory lookups). Otherwise, it is used as a sparse lookup.

Also note that S2PX converts all lookups to sparse lookups by default, however the conversion process is more sophisticated for lookups using Hashed File-based references, which are described later.

Notable differences between this Server implementation and a Parallel implementation are:

  • Parallel Transformer stages do not accept reference links so Parallel lookups are implemented using a dedicated Lookup Stage. S2PX will therefore convert Server Transformer stages used for lookup operations into Parallel Lookup stages.

  • The Server Transformer’s Key Expressions are not available in the Parallel Lookup Stage - the values being used for the join need to be pre-supplied, either directly from a data source or calculated using an up-stream transformer. To handle this S2PX will identify the use of Transformer Key Expressions and generate Parallel jobs containing a pre-lookup Transformer stage which pre-calculates keys for comparison in the Lookup stage.

  • Note that you cannot insert a Transformer stage between a Lookup Stage and the stage being used as that lookup’s reference.

The above example job will be converted into a job which looks like this:

Because S2PX replicates Server Job designs using containers you would expect a S2PX-generated Job to look like this:

Unfortunately, DataStage does not support the use of sparse lookup operations inside Job Containers where the lookup sources is located outside the Job Container. S2PX addresses this by bringing the reference source inside the Container, resulting in a Job design that looks like this:

As mentioned above, the Parallel Lookup stage does not accept user-specified expressions for key lookups. S2PX therefore injects a pre-lookup Prepare Transformer stage which pre-calculates any required Key Expressions and supplies them as one or more ephemeral column(s). These columns are only used to support the Lookup functionality and aren't propagated downstream once they have been used. S2PX also generates a post-lookup Logic Transformer stage which calculates other Transformer Stage Variables and output link derivations as required. These Transformers also perform pre- and post-processing required to support the S2PX translation of Hashed Files - See Hashed File Lookups for more details.

Another difference between Server and Parallel Jobs is that Server Jobs support multiple sparse lookups per Transformer stage, whereas the Parallel Lookup stage doesn’t. When a Server lookup has multiple reference sources then S2PX translates these into individual Parallel Lookup stages, each of which supports a single sparse lookup source. This also provides the option for users to easily toggle each generated Lookup between in-memory and sparse lookup modes as part of any post-conversion tuning activity.

Hashed File Lookups

Note that…

An important issue this introduces is that the specially formatted data retrieved from the Hashed File-derived database tables will need to be pre-converted before they can be used in a Parallel Lookup (which does not permit Key Expressions). The requirement to introduce leading ! characters (used as a prefix for non-null fields) and @ characters (indicating Nulls) in order to ensure lookup keys match one another is another responsibility of the Prepare Transformer, introduced above. These temporary changes are then removed in the Logic Transformer, restoring the input data to its original format. See Using Hashed-File database tables in lookups for more information.

Sparse vs. in-memory

If a Server Job features a Hashed File which is created in its entirety within the Job (i.e. one part of the Job writes to the Hashed File with the Create file or Clear file before writing options set) and that Hashed File is subsequently used in a lookup then S2PX assumes the lookup is an in-memory lookup and translates the Parallel stage(s) accordingly.

 

© 2015-2024 Data Migrators Pty Ltd.