Question

There's something about the CLI that I still don't quite understand: the -project-cache option. When given as a parameter (to several different CLI invocation types), it enables incremental operations. Is this cache supposed to live on the DS Engine or on the MettleCI host machine? I've seen it done both ways, and it confuses me a bit.

Answer

The directory supplied to the -project-cache option is the location where the CLI will read/write state information used for performing incremental operations. This directory exists wherever the MettleCI CLI executes, which is normally the MettleCI host rather than the DS Engine tier.

There are a few things which need to be considered when deciding the location of project cache directories:

The directory must be unique to a project.
It must be directly accessible from the CLI (i.e. you can't specify an engine path if the CLI is running on a client machine).
If multiple instances of the CLI are used for incremental operations, the -project-cache needs to be kept in sync across all instances. Shared storage is usually the easiest way to do this.
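These requirements can be checked in the pipeline before the CLI is invoked. The following is a minimal sketch that assumes a hypothetical shared root (for example a mounted network share, or a UNC path on Windows agents) and a hypothetical `<root>/<engine>/<project>` layout. `project_cache_dir` and `project_cache_args` are illustrative helpers; only the `-project-cache` option itself comes from the MettleCI CLI.

```python
from __future__ import annotations

import os
from pathlib import Path


def project_cache_dir(shared_root: str, engine: str, project: str) -> Path:
    """Return (and create) a per-project cache directory under shared_root.

    Hypothetical layout: <shared_root>/<engine>/<project>, so two projects
    (even with the same name on different engines) never share a cache.
    """
    root = Path(shared_root)
    # The path must be reachable from the host running the CLI, not the engine.
    if not root.is_dir() or not os.access(root, os.W_OK):
        raise OSError(f"cache root is not a writable directory on this host: {root}")
    cache = root / engine / project
    cache.mkdir(parents=True, exist_ok=True)
    return cache


def project_cache_args(shared_root: str, engine: str, project: str) -> list[str]:
    """Build only the -project-cache portion of a CLI command line."""
    return ["-project-cache", str(project_cache_dir(shared_root, engine, project))]
```

Every agent in a pool that passes the same shared root resolves the same directory for a given project, which keeps their incremental state in sync; a single MettleCI host could pass a local directory instead.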

The last point becomes important if you are running a pool of agents in the CI/CD pipeline. If they all have independent copies of the -project-cache files, incremental changes won't be calculated correctly, resulting in a lot of unnecessary operations being performed.

So even on Windows, is it best for the agents to share a directory (perhaps on a network drive) for this? Or should we restrict the work to a single agent using labels or similar (which is not a good idea for scalability)?

I think none of our agents (on the various demoX clients, the test1 client, etc.) currently use shared storage, right?

The OS doesn't really matter. It's more about the number of CLI "instances". Most of our installs include a single "MettleCI Host" which has the CLI. In this case, there is only one CLI instance and a local -project-cache can be used.

If a second server with the same version of the DataStage Client is set up to run the CLI, you'd need to move the existing -project-cache directory to shared storage.

As an example of what could happen if you deploy to the same DataStage project with two different -project-cache directories:

Initial Deploy: CLI #1 imports and compiles all ISX files.

Job A is checked in: CLI #2 expects the project to be empty, so it deletes the content of the project and imports and compiles all ISX files.

Job B is checked in: CLI #1 thinks everything has changed since the last time CLI #1 performed a deployment, so it deletes the content of the project and imports and compiles all ISX files.

Cycle continues....
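The cycle can be reproduced with a toy model. This is a hypothetical simplification, not MettleCI's actual algorithm: each `ProjectCache` records the last deployment it saw plus a hash of every ISX file, and `deploy` falls back to a full clean-and-redeploy whenever its cache does not describe the project's most recent deployment. All class and function names are illustrative.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class ProjectCache:
    """Hypothetical stand-in for the contents of a -project-cache directory."""
    last_marker: int | None = None                          # last deployment this cache saw
    hashes: dict[str, str] = field(default_factory=dict)   # ISX file -> content hash


@dataclass
class DataStageProject:
    """Hypothetical target project; only tracks its latest deployment marker."""
    last_marker: int | None = None


def digest(content: str) -> str:
    return hashlib.sha256(content.encode()).hexdigest()


def deploy(repo: dict[str, str], cache: ProjectCache, project: DataStageProject) -> int:
    """Deploy repo to project and return how many ISX files were imported/compiled.

    Assumed rule: if the cache does not describe the project's latest deployment,
    redeploy everything; otherwise import only files whose hash changed.
    """
    current = {name: digest(body) for name, body in repo.items()}
    if cache.last_marker is None or cache.last_marker != project.last_marker:
        changed = list(current)  # full clean-and-redeploy
    else:
        changed = [n for n, h in current.items() if cache.hashes.get(n) != h]
    marker = (project.last_marker or 0) + 1
    project.last_marker = marker
    cache.last_marker = marker
    cache.hashes = current
    return len(changed)


def simulate(shared: bool) -> list[int]:
    """Replay the scenario: CLI #1, then CLI #2, then CLI #1 deploy to one project."""
    project = DataStageProject()
    cli_1_cache = ProjectCache()
    # Shared storage means both CLI instances read and write the same cache.
    cli_2_cache = cli_1_cache if shared else ProjectCache()
    repo = {f"job_{i:02d}.isx": "v1" for i in range(100)}

    counts = [deploy(repo, cli_1_cache, project)]      # initial deploy: CLI #1
    repo["job_00.isx"] = "v2"                           # Job A is checked in
    counts.append(deploy(repo, cli_2_cache, project))  # handled by CLI #2
    repo["job_01.isx"] = "v2"                           # Job B is checked in
    counts.append(deploy(repo, cli_1_cache, project))  # handled by CLI #1
    return counts


print("independent caches:", simulate(shared=False))  # [100, 100, 100]
print("shared cache:      ", simulate(shared=True))   # [100, 1, 1]
```

With two independent caches every deployment imports all 100 files; with one shared cache only the initial deployment does, and each check-in then imports just the changed job.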

As can be seen, the instances interfere with each other, and the result will more than likely be slower than a full deployment because of the clean-up process. This is obviously the worst-case scenario, since the two instances run alternately, but the behaviour will converge toward the worst case as the number of CLI instances increases.
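The trend toward the worst case can be roughly quantified under two simplifying assumptions that do not come from MettleCI itself: each deployment is routed to one of N agents chosen uniformly at random, and an agent with its own local cache can only deploy incrementally if it also performed the previous deployment into that project. The probability that a given deployment (after the first) falls back to a full redeploy is then:

```latex
P(\text{full redeploy}) = 1 - \frac{1}{N} = \frac{N - 1}{N}
```

For example, with N = 4 agents roughly 75% of deployments would be full redeploys, approaching 100% as N grows. With a single shared -project-cache, N no longer affects the result.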

MettleCI CLI and the 'project-cache' directory
