Question

There's something about the CLI that I still don't quite understand: the -project-cache parameter to some MettleCI CLI commands. When given as a parameter (to several different CLI invocation types) it enables incremental operations. What is stored in this cache and where is it supposed to live? On the DataStage Engine tier, or on the MettleCI Agent Host? I've seen it done both ways and it confuses me a bit.

Answer

The directory supplied to the -project-cache option is the location where the CLI will read/write state information (which we sometimes refer to as asset fingerprints) used for performing incremental operations. This directory should exist wherever the MettleCI CLI executes a command which relies on incremental behaviour, which normally occurs on the MettleCI Agent host under the instruction of your build agent.

Project Cache Location

In the sample pipelines shipped with MettleCI, the incremental MettleCI CLI commands assume the use of a locally-stored project cache and refer to the following project cache location:

Code Block
%AGENTMETTLEHOME%\cache\%IISENGINENAME%\%DATASTAGE_PROJECT%

… which will normally translate to something similar to…

Code Block
C:\MettleCI\CLI\cache\MY.ENGINE.HOSTNAME\MyDataStageProject\
Info

Note that the project cache will always be a Windows-style filesystem reference (using backslashes), as many of the MettleCI CLI commands required in a CI/CD pipeline must run on Windows.
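
As a minimal sketch of how the template above expands, the following Python snippet joins the three values into a Windows-style path. The function name `project_cache_path` is hypothetical and is not part of MettleCI; the sample values are the ones shown above. `ntpath` is used so that backslashes are produced regardless of the operating system running the snippet.

```python
import ntpath  # Windows path rules, importable on any operating system


def project_cache_path(agent_mettle_home, engine_name, project):
    """Build <home>\\cache\\<engine>\\<project> using Windows separators.

    Hypothetical helper for illustration only; the argument names mirror the
    AGENTMETTLEHOME, IISENGINENAME and DATASTAGE_PROJECT variables.
    """
    return ntpath.join(agent_mettle_home, "cache", engine_name, project)


example = project_cache_path(
    r"C:\MettleCI\CLI", "MY.ENGINE.HOSTNAME", "MyDataStageProject"
)
print(example)
```

Because the engine name and the project name are both part of the path, each DataStage project on each engine gets its own directory.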

There are a few things which need to be considered when deciding the location of project cache directories:

  • The directory must be unique to a DataStage project

  • It must be directly accessible from the CLI (i.e. you can't specify an engine path if the CLI is running on a client)

  • ADVANCED: If multiple instances of the CLI are to be used for incremental operations (and hence multiple independent CLI instances need to share a common view of the incremental environment's status) then the -project-cache needs to be available and synchronised across all of those CLI instances. This is normally achieved using shared storage.
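
The first consideration (one directory per DataStage project) is easy to check mechanically. The sketch below is an illustration only: `find_shared_cache_dirs` and the project names are hypothetical, and it assumes you keep your own mapping of projects to cache directories. It compares paths using Windows rules, so differences in letter case or slash direction do not hide a clash.

```python
import ntpath
from collections import defaultdict


def find_shared_cache_dirs(cache_dirs):
    """Return {normalised_dir: [projects]} for any directory that is
    configured for more than one project (hypothetical helper)."""
    by_dir = defaultdict(list)
    for project, path in cache_dirs.items():
        # normpath tidies separators; normcase lowercases, as Windows
        # treats paths case-insensitively.
        key = ntpath.normcase(ntpath.normpath(path))
        by_dir[key].append(project)
    return {d: sorted(p) for d, p in by_dir.items() if len(p) > 1}


# Hypothetical configuration: ProjC accidentally reuses ProjA's directory.
configured = {
    "ProjA": r"C:\MettleCI\CLI\cache\ENG\ProjA",
    "ProjB": r"C:\MettleCI\CLI\cache\ENG\ProjB",
    "ProjC": r"c:/mettleci/cli/cache/eng/proja",
}
clashes = find_shared_cache_dirs(configured)
print(clashes)
```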

Using multiple CLI environments

The last point becomes important if you are running a 'pool' of agents on the CI/CD pipeline. If they were all to maintain their own independent copies of the -project-cache files, incrementals won't be calculated correctly, resulting in a lot of unnecessary operations being performed.

So even on Windows, it is best for pooled agents to use a shared directory (on a network drive, for example). The other option is to restrict the pipeline to just one agent using labels, which is not a good idea for scalability.


The operating system doesn't really matter. It's more about the number of CLI instances. Most of our installs include a single MettleCI host which has the CLI. In this case there is only one CLI instance and a local -project-cache can be used.

If a new server with the same version of the DataStage Client is set up, then you'd need to move the existing -project-cache directory to shared storage. Otherwise incremental fingerprints will differ between CLI instances, resulting in a significant drop in the performance benefits delivered by the incremental approach.

As an example of what could happen if a mettleci datastage deploy command deploys to the same DataStage project with two different -project-cache directories:

  1. Initial Deployment using MyCache

    1. Command mettleci datastage deploy ... -project-cache MyCache is invoked.

    2. This imports and compiles all ISX files as MyCache does not currently contain any state information.

    3. This initially expensive process will always be required to establish the initial contents of the project cache.

  2. First Deployed Change using OtherCache

    1. Job MyFirstJob is checked in.

    2. Command mettleci datastage deploy ... -project-cache OtherCache is invoked.

    3. This imports and compiles all ISX files as OtherCache does not currently contain any state information.

  3. Second Deployed Change using MyCache

    1. Command mettleci datastage deploy ... -project-cache MyCache is run.

    2. This compares the Git codebase to the target DataStage environment and sees that all fingerprints are misaligned. The command consequently re-imports and re-compiles all ISX files as MyCache contains state information which is not aligned with the target. This could have been avoided using the proper project cache.

The cycle continues for as long as deployments alternate between the two caches.
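
The scenario above can be reproduced with a toy model. This is not how MettleCI computes or stores fingerprints; it is a sketch that assumes a cache entry records a hash of the source asset plus a stamp that the target receives each time the asset is imported. All names (`deploy`, `MySecondJob`, `MyThirdJob`, the stamp counter) are hypothetical.

```python
import hashlib
import itertools

_import_stamp = itertools.count(1)  # stands in for "state of the target"


def deploy(source, target, cache):
    """Toy incremental deploy; returns the names of re-imported assets."""
    redeployed = []
    for name, content in source.items():
        source_hash = hashlib.sha256(content.encode()).hexdigest()
        # An asset is skipped only if the cache agrees with BOTH the
        # current source and the current state of the target.
        if cache.get(name) == (source_hash, target.get(name)):
            continue
        target[name] = next(_import_stamp)      # import changes the target
        cache[name] = (source_hash, target[name])
        redeployed.append(name)
    return redeployed


# Two caches used alternately against one target project.
source = {"MyFirstJob": "v1", "MySecondJob": "v1", "MyThirdJob": "v1"}
target, my_cache, other_cache = {}, {}, {}
first = deploy(source, target, my_cache)       # MyCache empty: everything
source["MyFirstJob"] = "v2"                    # MyFirstJob is checked in
second = deploy(source, target, other_cache)   # OtherCache empty: everything
third = deploy(source, target, my_cache)       # MyCache stale: everything

# One cache shared by every deployment.
source2 = {"MyFirstJob": "v1", "MySecondJob": "v1", "MyThirdJob": "v1"}
target2, shared_cache = {}, {}
deploy(source2, target2, shared_cache)
source2["MyFirstJob"] = "v2"
shared_second = deploy(source2, target2, shared_cache)  # only the change

print(len(first), len(second), len(third), shared_second)
```

In this model the alternating caches re-import all three jobs on every run, while the shared cache re-imports only MyFirstJob after the change.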

Alternatively you can use your agent labelling strategy to ensure that only a single CLI instance is used by your pipeline, which eliminates the need to coordinate the project cache amongst multiple environments. In these cases a locally-stored project cache can be used. If you subsequently decide to horizontally scale your agent capability then you will need to…

  1. Move your project cache to shared storage, and

  2. Modify your build pipelines to refer to that new shared location.
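
The first of these steps might be scripted along the following lines. This is a sketch under assumptions: `relocate_cache` is a hypothetical helper, both paths are supplied by you, and it assumes that no pipeline is using the cache while it is copied. It copies rather than moves, so the local directory stays in place until the pipelines have been updated to point at the shared location.

```python
import shutil
from pathlib import Path


def relocate_cache(local_dir, shared_dir):
    """Copy an existing project cache directory to shared storage.

    Hypothetical helper: returns the shared path to use for -project-cache
    once the build pipelines have been updated.
    """
    local_dir, shared_dir = Path(local_dir), Path(shared_dir)
    if not local_dir.is_dir():
        raise FileNotFoundError(f"No project cache found at {local_dir}")
    # Create the parent folders on the share, then copy the whole tree.
    shared_dir.parent.mkdir(parents=True, exist_ok=True)
    shutil.copytree(local_dir, shared_dir, dirs_exist_ok=True)  # Python 3.8+
    return shared_dir
```

Keep the engine and project names in the shared path so that the directory remains unique to one DataStage project.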