While most DataStage jobs can be tested with MettleCI Unit Testing simply by replacing their input and output stages with Unit Test data, some job designs require a more advanced Unit Testing configuration. This page outlines Unit Testing patterns that can be applied to test these job designs.
Input stage with rejects
The Input stage can be Unit Tested by including both read and reject links in the given clause of the Unit Test Spec. MettleCI Workbench will automatically detect this Unit Testing pattern during creation of new Unit Tests.
The CSV data specified for the rejects link should contain records used for Unit Testing the flow of records through the reject path(s) of the job.
```yaml
given:
  - stage: Input
    link: Read
    path: Input-Read.csv
  - stage: Input
    link: Rejects
    path: Input-Rejects.csv
when: ...
```
Output stage with rejects
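The Output stage can be Unit Tested by including the write link in the then clause of the Unit Test Spec and the reject link in the given clause of the Unit Test Spec. Because rejected records flow out of the Output stage, the reject link is supplied with test data, while the records arriving on the write link are captured and compared. MettleCI Workbench will automatically detect this Unit Testing pattern during creation of new Unit Tests.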
The CSV data specified for the rejects link should contain records used for Unit Testing the flow of records through the reject path(s) of the job.
```yaml
given:
  - stage: Output
    link: Rejects
    path: Output-Rejects.csv
when: ...
then:
  - stage: Output
    link: Write
    path: Output-Write.csv
```
Stored Procedure stage
A Stored Procedure stage not only connects to an external database for processing but also produces output records that are not deterministic. To Unit Test this job design, the Stored Procedure stage needs to be removed from the job and replaced with Unit Test data. To do this, add the stage's input link to the then clause of the Unit Test Spec and its output link to the given clause of the Unit Test Spec.
The CSV data specified by the given clause contains the data that will test the flow of records from the Stored Procedure stage. The data could simulate what the real stored procedure would produce if it had processed the Unit Test input records; however, this is not a requirement.
```yaml
given:
  - stage: StoredProcedure
    link: Output
    path: StoredProcedure-Output.csv
when: ...
then:
  - stage: StoredProcedure
    link: Input
    path: StoredProcedure-Input.csv
```
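When the given clause data should mimic the real stored procedure, one option is to derive it from the input CSV used in the then clause. The following Python sketch is not part of MettleCI; it assumes the procedure's output link carries every input column plus the procedure's result columns, and the column names ORDER_ID, AMOUNT and STATUS and the stand-in function fake_procedure are hypothetical.

```python
import csv
import io


def simulate_stage_output(input_csv: str, derive, extra_columns: list[str]) -> str:
    """Build simulated stage output by applying a stand-in function to each input row.

    `derive` receives one input row (a dict) and returns a dict holding the
    extra output columns that the real stage would append.
    """
    reader = csv.DictReader(io.StringIO(input_csv))
    fieldnames = list(reader.fieldnames or []) + list(extra_columns)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # DictWriter raises ValueError if derive() returns an undeclared column.
        writer.writerow({**row, **derive(row)})
    return out.getvalue()


# Hypothetical stand-in for the stored procedure: classifies orders by amount.
def fake_procedure(row: dict) -> dict:
    return {"STATUS": "LARGE" if int(row["AMOUNT"]) >= 1000 else "NORMAL"}


input_csv = "ORDER_ID,AMOUNT\n1,250\n2,1500\n"
print(simulate_stage_output(input_csv, fake_procedure, ["STATUS"]))
```

The returned text could then be saved as StoredProcedure-Output.csv. Keeping the stand-in logic trivial is deliberate: the goal is to exercise the downstream job logic, not to reimplement the procedure.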
Classic Surrogate Key Generator stage
The classic Surrogate Key Generator stage generates sequential keys from a given start value, which is typically set via a Job Parameter. To ensure that the surrogate key values generated during Unit Test execution always match the expected values, set a fixed value for the start value Job Parameter in the when clause of the Unit Test Spec.
```yaml
given: ...
when:
  job: KeyGeneratorExample
  parameters:
    START_KEY: 100
then: ...
```
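Fixing the start value makes the expected key column a simple function of the row count. The Python sketch below illustrates this under loudly stated assumptions: the job runs on a single partition, the start value is the first key issued, and each later key increments by 1 (multi-partition runs may interleave keys differently). The helper names are hypothetical.

```python
def expected_keys(start_key: int, row_count: int) -> list[int]:
    """Keys the stage is expected to emit for `row_count` rows.

    ASSUMPTION: single partition, the start value is the first key
    issued, and each later key increments by 1.
    """
    return list(range(start_key, start_key + row_count))


def key_mismatches(actual: list[int], start_key: int) -> list[str]:
    """Describe every row whose generated key differs from the expected one."""
    expected = expected_keys(start_key, len(actual))
    return [
        f"row {i}: expected {e}, got {a}"
        for i, (e, a) in enumerate(zip(expected, actual))
        if e != a
    ]


START_KEY = 100  # the fixed value set in the when clause
print(expected_keys(START_KEY, 3))  # [100, 101, 102]
print(key_mismatches([100, 101, 105], START_KEY))  # ['row 2: expected 102, got 105']
```

Without a fixed START_KEY, the expected keys would depend on whatever value the environment supplies, so the same test could pass in one run and fail in the next.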
Database or Flat File backed Surrogate Key Generator stage
Surrogate Key Generators backed by a Database or a Flat File produce output records that are not deterministic. A Database-backed Surrogate Key Generator also requires a live connection to an external Database, which is not ideal for Unit Testing. To Unit Test job designs containing this type of Surrogate Key Generator, the Surrogate Key Generator stage needs to be removed from the job and replaced with Unit Test data. To do this, add the stage's input link to the then clause of the Unit Test Spec and its output link to the given clause of the Unit Test Spec.
The CSV data specified by the given clause contains the data that will test the flow of records from the Surrogate Key Generator stage. The data could simulate what the real Surrogate Key Generator would produce if it had processed the Unit Test input records; however, this is not a requirement. The easiest way to simulate the Surrogate Key Generator output records using MettleCI Workbench is to copy the CSV specified in the then clause, add a new column to represent the generated key, and set appropriate key values.
```yaml
given:
  - stage: KeyGenerator
    link: Output
    path: KeyGenerator-Output.csv
when: ...
then:
  - stage: KeyGenerator
    link: Input
    path: KeyGenerator-Input.csv
```
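The copy-and-add-a-column approach described above can also be scripted outside MettleCI Workbench. This Python sketch assumes the first CSV row is a header and that simulated keys should be sequential integers; the key column name CUSTOMER_SK and the sample data are hypothetical.

```python
import csv
import io


def add_key_column(input_csv: str, key_column: str, first_key: int = 1) -> str:
    """Copy the then-clause input CSV and append a sequential surrogate key column."""
    reader = csv.reader(io.StringIO(input_csv))
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return ""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + [key_column])
    for offset, row in enumerate(reader):
        writer.writerow(row + [first_key + offset])
    return out.getvalue()


sample = "NAME,CITY\nAlice,Leeds\nBob,York\n"
print(add_key_column(sample, "CUSTOMER_SK", first_key=1))
```

In practice the input text would be read from KeyGenerator-Input.csv and the result written to KeyGenerator-Output.csv.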
Sparse Lookup stage
When building DataStage jobs using the Lookup stage, switching between a Sparse and a Normal lookup is as simple as changing the lookup type of the reference Database stage. However, when a DataStage job is compiled to OSH and executed, the Lookup stage is not used to perform a sparse lookup. Instead, the Lookup is replaced with the Database operator, which is responsible for reading input rows, looking up values from the database and producing output records. This is why all Database log messages in the DataStage Director are attributed to the Lookup stage, and why the Database stage never appears in the Monitor of the DataStage Director.
The Unit Test Harness cannot change the lookup from Sparse to Normal without fundamentally transforming the run time job design; doing so would invalidate any Unit Test results. To Unit Test job designs using sparse lookups, add the Lookup stage's input link to the then clause of the Unit Test Spec and its output link to the given clause of the Unit Test Spec.
The CSV data specified by the given clause contains the data that will test the flow of records from the Sparse Lookup stage. The data could simulate what the real Sparse Lookup stage would produce if it had processed the Unit Test input records; however, this is not a requirement.
```yaml
given:
  - stage: SparseLookup
    link: Output
    path: SparseLookup-Output.csv
when: ...
then:
  - stage: SparseLookup
    link: Input
    path: SparseLookup-Input.csv
```
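One way to build simulated Sparse Lookup output is to join the input records from the then clause against a small, hand-written reference mapping. In this Python sketch the dictionary stands in for the database table, and an unmatched key either produces blank lookup columns or drops the row (loosely modelled on the Lookup stage's Continue and Drop lookup-failure options; real Continue behaviour depends on column nullability). All column names and values are hypothetical.

```python
import csv
import io


def simulate_sparse_lookup(input_csv: str, key_column: str, reference: dict,
                           ref_columns: list[str], drop_missing: bool = False) -> str:
    """Append reference columns to each input row, mimicking a lookup's output link."""
    reader = csv.DictReader(io.StringIO(input_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames or []) + ref_columns,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        match = reference.get(row[key_column])
        if match is None:
            if drop_missing:
                continue  # behave like a "Drop" lookup failure
            match = {col: "" for col in ref_columns}  # blank lookup columns
        writer.writerow({**row, **match})
    return out.getvalue()


reference = {"C1": {"SEGMENT": "Retail"}, "C2": {"SEGMENT": "Corporate"}}
input_csv = "ORDER_ID,CUSTOMER_ID\n10,C1\n11,C9\n"
print(simulate_sparse_lookup(input_csv, "CUSTOMER_ID", reference, ["SEGMENT"]))
```

Including at least one unmatched key, as C9 does here, makes sure the job's handling of lookup failures is exercised.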
For jobs where the vast majority of job logic is implemented using Sparse Lookup stages, replacing all lookups with Unit Test data will result in little to no DataStage logic being tested.
For this type of job design, an alternative testing approach is to leave the sparse lookup in place and replace only the input and output stages with Unit Test data. A live Database connection will be required during testing, but the when clause can be used to set the job parameters that dictate database connection settings. Technically, this is an Integration Test rather than a Unit Test. The Unit Test Harness does not provide any functionality for populating database reference tables with Unit Test data prior to test execution; this is left as an exercise for the user.
```yaml
given:
  - stage: Source
    link: Input
    path: Source-Output.csv
when:
  parameters:
    DbName: MyUnitTestDb
    DbUser: MyUnitTestUser
    DbPass: "{iisenc}dKANDae233DJIQeiidm=="
then:
  - stage: Target
    link: Output
    path: Target-Output.csv
```
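Because populating reference tables is left to the user, a small pre-test script can load them from CSV before the job runs. This Python sketch uses the standard library's sqlite3 module purely as a stand-in; a real test database would need its own driver and connection settings, and the table CUSTOMER_REF and its columns are hypothetical.

```python
import csv
import io
import sqlite3


def seed_reference_table(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Replace the contents of `table` with the rows in `csv_text`.

    Table and column names are interpolated into SQL, so they must come
    from trusted test fixtures, never from untrusted input.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    with conn:  # commit on success, roll back on error
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_REF (CUSTOMER_ID TEXT PRIMARY KEY, SEGMENT TEXT)")
seed_reference_table(conn, "CUSTOMER_REF", "CUSTOMER_ID,SEGMENT\nC1,Retail\nC2,Corporate\n")
print(conn.execute("SELECT * FROM CUSTOMER_REF ORDER BY CUSTOMER_ID").fetchall())
```

Deleting existing rows before inserting keeps repeated test runs starting from the same known state, which is what makes the Integration Test results repeatable.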