
At present the MettleCI automated unit test specification generator does not handle shared or local containers. The yaml specification it generates will model the job's inputs and outputs correctly, but it will omit any inputs or outputs that are inside containers. The unit test harness itself can handle containers; we just need to edit the yaml by hand to include the inputs and outputs within containers. They go in the usual place (given: for inputs, then: for outputs) but with names that disambiguate the links and stages referenced.


A unit test specification’s input and output path names (csv file names) are, by default, formed from the stage and link names of the input/output being intercepted (stage-link.csv). These names can be anything, though, as long as they are consistent within a given test case. What matters is the stage and link names, as the harness needs these to identify which inputs and outputs to replace with csv files. For jobs that do not have containers, using the stage and link names as-is is sufficient, but containers add potential ambiguity.

This is resolved by dot-prefixing the container invocation name to the stage name, giving a name of the form containerInvocation.stage. For nested containers, the outer invocation name is prepended in front of the inner one, to as many levels as necessary to model the nesting accurately.

Note: Container inputs and outputs themselves are not modeled or intercepted since they are really just connectors with no physically manifested input or output.
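
To illustrate the nesting rule, here is a hypothetical sketch (the names outerC1, innerC2, ds_lookup and ln_ref are invented for illustration, not taken from the job below): if a container invocation outerC1 itself contains an invocation innerC2 of another container, a stage ds_lookup inside the inner container would be referenced like this:

```yaml
given:
# outer invocation name, then inner, then the stage inside it
- stage: "outerC1.innerC2.ds_lookup"
  link: "ln_ref"
  # the csv file name is free-form; this one just mirrors the nesting
  path: "outerC1-innerC2-ds_lookup-ln_ref.csv"
```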

For example, consider this container, which has container input/output stages (which do not need to be modeled in the yaml) as well as actual physical inputs/outputs, which we do need to intercept and test.

Here is a job using it:

When we generate a unit test spec from this job, the resulting yaml looks like this:

---
given:
- stage: "sf_orders"
  link: "ln_filter"
  path: "sf_orders-ln_filter.csv"
when:
  job: "processOrders"
  controller: null
  parameters: {}
then:
- stage: "sf_samples"
  link: "ln_samples"
  path: "sf_samples-ln_samples.csv"
  cluster: null
  ignore: null
- stage: "sf_addressedOrders"
  link: "ln_addressed"
  path: "sf_addressedOrders-ln_addressed.csv"
  cluster: null
  ignore: null
- stage: "sf_summary"
  link: "ln_summary"
  path: "sf_summary-ln_summary.csv"
  cluster: null
  ignore: null

As can be seen, the physical inputs within the container are not present in the given: section and the physical outputs are not present in the then: section. We must add them. Following the rule above, we construct the stage name for these inputs/outputs by prepending the container invocation name. In our case this is OrderAddressC1 (the invocation name, not the container name itself, as we need to be able to handle multiple uses of the same container within a job).

To understand this better, consider this “monolithic” job in which the container has been “undone” and all stages are present.

If we generate a unit test spec for the above, it looks like this. As you can see, all the stages are present, as expected. Note particularly stage ds_cust in the given: section and stage ds_flaggedCust in the then: section; these are the stages inside the container that we need to manually add to the containerized job's test spec.

---
given:
- stage: "sf_orders"
  link: "ln_filter"
  path: "sf_orders-ln_filter.csv"
- stage: "ds_cust"
  link: "ln_cust"
  path: "ds_cust-ln_cust.csv"
when:
  job: "monolithic_v1"
  controller: null
  parameters: {}
then:
- stage: "ds_flaggedCust"
  link: "ln_flagged"
  path: "ds_flaggedCust-ln_flagged.csv"
  cluster: null
  ignore: null
- stage: "sf_samples"
  link: "ln_samples"
  path: "sf_samples-ln_samples.csv"
  cluster: null
  ignore: null
- stage: "sf_addressedOrders"
  link: "ln_addressed"
  path: "sf_addressedOrders-ln_addressed.csv"
  cluster: null
  ignore: null
- stage: "sf_summary"
  link: "ln_summary"
  path: "sf_summary-ln_summary.csv"
  cluster: null
  ignore: null

Note: We do not always need to “undo” containers like this, but doing so can make it clearer how to add the missing stages your first few times.

The yaml we need can be created by taking the originally generated yaml and adding the appropriate stage/link/path entries.

Here is the yaml for the processOrders job after we modify it.

---
given:
- stage: "sf_orders"
  link: "ln_filter"
  path: "sf_orders-ln_filter.csv"
- stage: "OrderAddressC1.ds_cust"
  link: "ln_cust"
  path: "OrderAddressC1-ds_cust-ln_cust.csv"
when:
  job: "processOrders"
  controller: null
  parameters: {}
then:
- stage: "OrderAddressC1.ds_flaggedCust"
  link: "ln_flagged"
  path: "OrderAddressC1-ds_flaggedCust-ln_flagged.csv"
  cluster: null
  ignore: null
- stage: "sf_samples"
  link: "ln_samples"
  path: "sf_samples-ln_samples.csv"
  cluster: null
  ignore: null
- stage: "sf_addressedOrders"
  link: "ln_addressed"
  path: "sf_addressedOrders-ln_addressed.csv"
  cluster: null
  ignore: null
- stage: "sf_summary"
  link: "ln_summary"
  path: "sf_summary-ln_summary.csv"
  cluster: null
  ignore: null

Lines 6-8 and 14-18 were added to the generated yaml. In each case the stage name has been prefixed with OrderAddressC1. (including the dot) because that is the name of the container invocation; this makes the stage names globally unique across all container invocations in the job. The link names do not need this disambiguation, as they are already scoped to their source/target stages. The path (the name of the csv file) can be anything you like; we chose a name that shows the connection (container-stage-link.csv), but we did not have to.
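
As a hypothetical illustration of why the invocation name (rather than the container name) matters: if this job used the same container a second time, under an invocation name such as OrderAddressC2 (invented here for illustration), each invocation's internal stages would get their own distinct entries:

```yaml
given:
# first invocation's internal input
- stage: "OrderAddressC1.ds_cust"
  link: "ln_cust"
  path: "OrderAddressC1-ds_cust-ln_cust.csv"
# second invocation of the same container; same internal stage and
# link names, disambiguated only by the invocation prefix
- stage: "OrderAddressC2.ds_cust"
  link: "ln_cust"
  path: "OrderAddressC2-ds_cust-ln_cust.csv"
```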

Here is a successful test run
