Transformer Lookups
In Server Jobs, lookups are implemented using Transformer stages with a reference input link. Here’s a simple example of a Server Job using a Transformer to perform a lookup:
In this example:
The data from Sequential File
inputFile
and Hashed FilereferenceFile
are joined on the key fieldkey1
.The join uses a Key Expression which joins rows with
referenceLink.key1
values equal toinLink.key1 * 5
The Hashed File referenceFile used as the reference source can be configured to pre-load the file to memory (for in-memory lookups). Otherwise, it is used as a sparse lookup.
Also note that S2PX converts all lookups to sparse lookups by default, however the conversion process is more sophisticated for lookups using Hashed File-based references, which are described later.
Notable differences between this Server implementation and a Parallel implementation are:
Parallel Transformer stages do not accept reference links so Parallel lookups are implemented using a dedicated Lookup Stage. S2PX will therefore convert Server Transformer stages used for lookup operations into Parallel Lookup stages.
The Server Transformer’s Key Expressions are not available in the Parallel Lookup Stage - the values being used for the join need to be pre-supplied, either directly from a data source or calculated using an up-stream transformer. To handle this S2PX will identify the use of Transformer Key Expressions and generate Parallel jobs containing a pre-lookup Transformer stage which pre-calculates keys for comparison in the Lookup stage.
Note that you cannot insert a Transformer stage between a Lookup Stage and the stage being used as that lookup’s reference.
The above example job will be converted into a job which looks like this:
Because S2PX replicates Server Job designs using containers you’d expect a S2PX-generated Job to look like this:
DataStage does not support the use of sparse lookup sources outside of Job Containers so S2PX addresses this by bringing the reference source inside the Container, resulting in a Job design looking like this:
As mentioned above, the Parallel Lookup stage does not accept user-specified expressions for key lookups. S2PX therefore injects a pre-lookup Prepare Transformer stage which pre-calculates any required Key Expressions and supplies them as one or more ephemeral column(s). These columns are only used to support the Lookup functionality and aren't propagated downstream once they have been used. S2PX also generates a post-lookup Logic Transformer stage which calculates other Transformer Stage Variables and output link derivations as required. These Transformers also perform pre- and post-processing required to support the S2PX translation of Hashed Files - See Hashed File Lookups for more details.
Another difference between Server and Parallel Jobs is that Server Jobs support multiple sparse lookups per Transformer stage, whereas the Parallel Lookup stage doesn’t. When a Server lookup has multiple reference sources then S2PX translates these into individual Parallel Lookup stages, each of which supports a single sparse lookup source. This also provides the option for users to easily toggle each generated Lookup between in-memory and sparse lookup modes as part of any post-conversion tuning activity.
Hashed File Lookups
Note that…
Server Hashed Files are converted to Parallel DRS Stages
The Hashed File data are loaded into an underlying relational database which is subsequently accessed by Parallel Jobs using the DRS stage
Those data are stored using a special S2PX-specific format which is necessary to align with Server’s somewhat flexible approach to Hashed File metadata!
An important issue this introduces is that the specially-formatted data retrieved from the Hashed File-derived database tables will need to be pre-converted before they can be used in a Parallel Lookup (which does not permit Key Expressions). The requirement to introduce leading !
characters (used as a prefix for non-null fields) and @
characters (indicating Nulls) in order to ensure lookup keys match one another is another responsibility of the Prepare Transformer, introduced above. These temporary changes are then removed in the Logic Transformer, restoring the input data to its original format. See Using Hashed-File database tables in lookups for more information.
Sparse vs. in-memory
If a Server Job features a Hashed File which is created in its entirety within the Job (i.e. one part of the Job writes to the Hashed File with the Create file
or Clear file before writing
options set) and that Hashed File is subsequently used in a lookup then S2PX assumes the lookup is an in-memory lookup and translates the Parallel stage(s) accordingly.
Â
© 2015-2024 Data Migrators Pty Ltd.