Transformer Reject Links
Summary
A ‘reject row’ is an incoming row which meets one or more conditions which are identified by the Transformer and cause it to be propagated to one or more dedicated reject links. The configuration of reject links in the Transformer Stage differs significantly between DataStage’s Server and Parallel Jobs and depends greatly upon the reason for the row being rejected. Rejects are identified by expressions in the constraint field of each of a Transformer’s output links. Each type of reject, and how they’re identified, is summarised below. Note that in these cases the ‘reject link’ must be identified in the Transformer by selecting the checkbox in the output link’s Reject Row column.
# | Reason for row rejection | Transformer link constraint |
---|---|---|
1 | The row failing to match a business rule | An expression defining your business rule, e.g. |
2 | The output row on |
|
3 | The output row on | |
4 | The output row on |
|
5 | As a ‘catch all’ condition the Constraint field can be left blank in which case the reject link will catch all rows which have not been successfully written to any of the output links processed up to this point. For this reason the reject link should be the last link in the defined processing order. This condition is analogous to the use of an Otherwise constraint in Parallel Transformers. | None |
Note that the difference between #5 and #3 in the table above is that #3 detects when a row on {OutputLinkName}
fails to meet its output constraint whereas #5 detects a row on any output link which has failed to meet its respective output constraint.
The manner by which these conditions are detected and handled in Server Jobs is described in detail on this IBM documentation page. The Parallel engine, however, provides alternative mechanisms for handling each of these these rejects scenarios, some of which require a degree of manual remediation. Server jobs which use reject links, and will consequently require some manual effort, are identified for you by the Transformer Has Reject Links asset query.
Example Manual Remediation
The following Server Job (called ‘Transformer_with_Rejects’) demonstrates two types of reject link behaviour in a server job. In Server Job, downstream stages which have the capacity to reject data - the Oracle Connectors in the example below - will notify their upstream Transformer of the rejection. This Transformer Transformer_1 then provides the rejection handling mechanism which, in this case, propagates those rows to the Rejects link. Note that in this example the reject link Rejects propagates not only rows flagged as rejected by the downstream Connectors but also rows rejected for non-technical reasons identified within the Transformer, perhaps because the row fails some business rule.
The Parallel engine does not support the handling of reject links in the same manner as the Server engine so Jobs that contain this logic need to be identified and manually updated after conversion to a Parallel Job . In this example we’ll start by identifying those Jobs which utilise a reject-handling design pattern which requires restructuring for compatibility with the Parallel Engine. These jobs can be identified using the S2PX Analysis process:
$> mettleci s2px analysis \
-isx MyProjectExport.isx \
-report MyProjectReport.xlsx
MettleCI Command Line (build 161)
(C) 2018-2022 Data Migrators Pty Ltd
Analysing conversion coverage...
queries configuration discovered
new query discovered - 'S2PX Functionality Gap by Stage Type' (SERVER_JOB)
new query discovered - 'S2PX Functionality Gap by Stage Type' (SERVER_SHARED_CONTAINER)
* Transformer_with_Rejects (SERVER_JOB)
<SNIP>
Reviewing the analysis output spreadsheet tells us that the reject link rejects in our example job is configured to output data rows which have been rejected by two of the Transformer’s other output links:
Examining Transformer Transformer_1 we see the constraints against each output link:
We can see that dataOut1 and dataOut2 each have business rule constraints determining their output, and a constraint determining which rows are propagated to the rejects link:
dataOut1.REJECTED
- any row rejected due to…
failing the business-rule constraint or
rejected due to a write error in the downstream Connector
dataOut2.REJECTEDCODE
- any row rejected due to…
rejected due to a write error in the downstream Connector
There are two types of rejects being propagated, each necessitating different methods of remediation in the generated Parallel Job:
Type of Reject | Server Reject Link Constraint | Methods of Remediation in Parallel |
---|---|---|
Error writing in downstream stage |
|
|
Failed business-rule constraint |
|
|
Catch-All (Failed constraints or write errors) | Individual Links (may appear multiple times referencing different output links):
|
|
A ‘catch all’ constraint: Link Type set as reject, no constraint entered. This is the equivalent of testing all other output links' |
|
For the example Job shown above S2PX would deliver a Parallel Job with an identical appearance to the original Server Job:
Note that the following logic transformations have been performed:
String comparisons in the output link constraints have been updated to ensure they replicate the behaviour of the original Server Job
The constraint on the Reject link has been automatically converted by S2PX in an effort to render an expression compatible with the DataStage Parallel engine.
Despite the constraint on the Reject link being a valid Parallel expression the Job fails to compile because the .REJECTED
and .REJECTEDCODE
return codes are not recognised by the Parallel engine. These were deliberately retained by S2PX as a reminder that manual remediation needs to be performed.
To update the Parallel Job design to handle the reject rows we’ll modify the Job so that rather than configure the Transformer to identify and generate all rejects we’ll allow the Connectors which interface to the database to handle the rejects relevant to them.
Remediation Steps
Start by Creating new reject links from each of the output connectors which are detecting and generating reject rows:
Next, select the database error codes you wish to track (selecting all error codes is recommended), and append ERRORCODE
and ERRORMSG
to the fields in the link to improve future error handling diagnosis.
Update the original Rejects link with a constraint complementary to dataOut1, i.e.: StrCmp(dataIn.a, "Apple") <> 0
. This now becomes a standard link that specifically outputs the rows that dataOut1 filters out.
With these changes applied the job will now behave as designed.
See also
Transformer Has Reject Links asset query
Defining Constraints and Handling Rejects - IBM Documentation
Connector-specific documentation
© 2015-2024 Data Migrators Pty Ltd.