Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Sparse Lookup Stage

When building DataStage jobs using the Lookup stage, performing a Sparse or Normal lookup is as simple as changing the lookup type of the reference Database stage. However, when a DataStage job is compiled to OSH and executed, the Lookup stage is not used to perform the sparse lookup. Instead, the Lookup is replaced with the Database operator which is responsible for reading input rows, looking up values from the database and producing output records. It is for this reason that all Database log messages in the DataStage Director are attributed to the lookup stage and why the Database stage never appears in the Monitor of the DataStage Director.

It is not possible for the MettleCI Unit Test feature to change the lookup from Sparse to Normal without fundamentally transforming the run-time job design. Doing so would invalidate any Unit Test results, defeating the purpose of this MettleCI function. To Unit Test job designs using Sparse lookups, add the input link in the then clause of the Unit Test Spec and the output link in the given clause of the Unit Test Spec.

The CSV input specified by the given clause contains the data that will become the flow of records from the Sparse Lookup stage. The data could simulate what would be produced by the real Sparse Lookup Stage if it had processed the Unit Test input records, however they don't have to.

Code Block
given:
  - stage: SparseLookup
    link: Output
    path: SparseLookup-Output.csv
when:
...
then:
  - stage: SparseLookup
    link: Input
    path: SparseLookup-Input.csv

An Alternative Approach for Multiple Sparse Lookups

For jobs where the vast majority of job logic is implemented using Sparse Lookup stages, replacing all lookups with Unit Test data would result in little-to-no DataStage logic being tested (as illustrated below).

For this type of Job design, an alternative testing approach is to leave the Sparse Lookup in place and replace the input and output stages with Unit Test data. A live Database connection will be required during testing but the when clause can be used to set job parameters that dictate database connection settings.

Note

Technically this is an Integration Test, not a Unit Test: The Unit Test Harness does not provide any functionality for populating database reference tables with Unit Test data prior to test execution, users are responsible for managing Integration Test setup and tear down through governance and/or CI/CD pipeline customisation.

Code Block
languageyaml
given:
  - stage: Source
    link: Input
    path: Source-Output.csv
when:
  parameters:
    DbName: MyUnitTestDb
    DbUser: MyUnitTestUser
    DbPass: {iisenc}dKANDae233DJIQeiidm==
then:
  - stage: Taret
    link: Output
    path: Target-Output.csv