Document toolboxDocument toolbox

(v6) Transformer Reject Links

Summary

A ‘reject row’ is an incoming row which meets one or more conditions which are identified by the Transformer and cause it to be propagated to one or more dedicated reject links. The configuration of reject links in the Transformer Stage differs significantly between DataStage’s Server and Parallel Jobs and depends greatly upon the reason for the row being rejected. Rejects are identified by expressions in the constraint field of each of a Transformer’s output links. Each type of reject, and how they’re identified, is summarised below. Note that in these cases the ‘reject link’ must be identified in the Transformer by selecting the checkbox in the output link’s Reject Row column.

#

Reason for row rejection

Transformer link constraint

#

Reason for row rejection

Transformer link constraint

1

The row failing to match a business rule

An expression defining your business rule, e.g. CustomerType <> "Active"

2

The output row on {OutputLinkName} caused a write failure on a downstream stage directly-connected to an output link

{OutputLinkName}.REJECTEDCODE

3

The output row on {OutputLinkName} did not meet its output constraint and was consequently not output

{OutputLinkName}.REJECTEDCODE = DSE.NOERROR AND {OutputLinkName}.REJECTED

4

The output row on {OutputLinkName} was rejected either because it failed to match a business rule specified in its constraint or because a downstream write operation failed.

{OutputLinkName}.REJECTED

5

As a ‘catch all’ condition the Constraint field can be left blank in which case the reject link will catch all rows which have not been successfully written to any of the output links processed up to this point. For this reason the reject link should be the last link in the defined processing order. This condition is analogous to the use of an Otherwise constraint in Parallel Transformers.

None

Note that the difference between #5 and #3 in the table above is that #3 detects when a row on {OutputLinkName} fails to meet its output constraint whereas #5 detects a row on any output link which has failed to meet its respective output constraint.

The manner by which these conditions are detected and handled in Server Jobs is described in detail on this IBM documentation page. The Parallel engine, however, provides alternative mechanisms for handling each of these these rejects scenarios, some of which require a degree of manual remediation. Server jobs which use reject links, and will consequently require some manual effort, are identified for you by the Transformer Has Reject Links asset query.

Example Manual Remediation

The following Server Job (called ‘Transformer_with_Rejects’) demonstrates two types of reject link behaviour in a server job. In Server Job, downstream stages which have the capacity to reject data - the Oracle Connectors in the example below - will notify their upstream Transformer of the rejection. This Transformer Transformer_1 then provides the rejection handling mechanism which, in this case, propagates those rows to the Rejects link. Note that in this example the reject link Rejects propagates not only rows flagged as rejected by the downstream Connectors but also rows rejected for non-technical reasons identified within the Transformer, perhaps because the row fails some business rule.

The Parallel engine does not support the handling of reject links in the same manner as the Server engine so Jobs that contain this logic need to be identified and manually updated after conversion to a Parallel Job . In this example we’ll start by identifying those Jobs which utilise a reject-handling design pattern which requires restructuring for compatibility with the Parallel Engine. These jobs can be identified using the S2PX Analysis process:

$> mettleci s2px analysis \ -isx MyProjectExport.isx \ -report MyProjectReport.xlsx MettleCI Command Line (build 161) (C) 2018-2022 Data Migrators Pty Ltd Analysing conversion coverage... queries configuration discovered new query discovered - 'S2PX Functionality Gap by Stage Type' (SERVER_JOB) new query discovered - 'S2PX Functionality Gap by Stage Type' (SERVER_SHARED_CONTAINER) * Transformer_with_Rejects (SERVER_JOB) <SNIP>

Reviewing the analysis output spreadsheet tells us that the reject link rejects in our example job is configured to output data rows which have been rejected by two of the Transformer’s other output links:

file

asset

assetType

query

documentation

stageName

linkName

file

asset

assetType

query

documentation

stageName

linkName

MyProjectExport.isx

Transformer_with_Rejects

SERVER_JOB

Transformer Has Reject Links

(link)

Transformer_1

dataOut1

MyProjectExport.isx

Transformer_with_Rejects

SERVER_JOB

Transformer Has Reject Links

(link)

Transformer_1

dataOut2

Examining Transformer Transformer_1 we see the constraints against each output link:

We can see that dataOut1 and dataOut2 each have business rule constraints determining their output, and a constraint determining which rows are propagated to the rejects link:

dataOut1.REJECTED - any row rejected due to…

  • failing the business-rule constraint or

  • rejected due to a write error in the downstream Connector

dataOut2.REJECTEDCODE - any row rejected due to…

  • rejected due to a write error in the downstream Connector

There are two types of rejects being propagated, each necessitating different methods of remediation in the generated Parallel Job:

Type of Reject

Server Reject Link Constraint

Methods of Remediation in Parallel

Type of Reject

Server Reject Link Constraint

Methods of Remediation in Parallel

Error writing in downstream stage

{OutputLinkName}.REJECTEDCODE

  • Create a Reject Link against the target stage.

  • Database Connectors: reject links allow DB error codes to be returned, for more appropriate triage of reject rows.

  • File Stages: set Reject Mode to “output”. Reject links output properties cannot be configured.

Failed business-rule constraint

{OutputLinkName}.REJECTEDCODE = DSE.NOERROR AND {OutputLinkName}.REJECTED

  • Create the complement of the constraint on linkname against another output link

Catch-All (Failed constraints or write errors)

Individual Links (may appear multiple times referencing different output links):

{OutputLinkName}.REJECTED

  • Combination of the above methods

 

A ‘catch all’ constraint:

Link Type set as reject, no constraint entered.

This is the equivalent of testing all other output links' .REJECTED code.

  • Convert the Server Reject Link to a Parallel Reject Link. To do so…

    • exit the Transformer editor,

    • right-click the link, and

    • select Convert to Reject.

For the example Job shown above S2PX would deliver a Parallel Job with an identical appearance to the original Server Job:

Note that the following logic transformations have been performed:

  • String comparisons in the output link constraints have been updated to ensure they replicate the behaviour of the original Server Job

  • The constraint on the Reject link has been automatically converted by S2PX in an effort to render an expression compatible with the DataStage Parallel engine.

Despite the constraint on the Reject link being a valid Parallel expression the Job fails to compile because the .REJECTED and .REJECTEDCODE return codes are not recognised by the Parallel engine. These were deliberately retained by S2PX as a reminder that manual remediation needs to be performed.

To update the Parallel Job design to handle the reject rows we’ll modify the Job so that rather than configure the Transformer to identify and generate all rejects we’ll allow the Connectors which interface to the database to handle the rejects relevant to them.

Remediation Steps

Start by Creating new reject links from each of the output connectors which are detecting and generating reject rows:

Next, select the database error codes you wish to track (selecting all error codes is recommended), and append ERRORCODE and ERRORMSG to the fields in the link to improve future error handling diagnosis.

Update the original Rejects link with a constraint complementary to dataOut1, i.e.: StrCmp(dataIn.a, "Apple") <> 0. This now becomes a standard link that specifically outputs the rows that dataOut1 filters out.

With these changes applied the job will now behave as designed.


See also

 

© 2015-2024 Data Migrators Pty Ltd.