
Organising your Git repositories

Common (and best) practice for organising your Git repositories is to establish the following remote repositories on an on-premises or cloud-based Git hosting service:

  • A Compliance repository for use as your organisation’s default set of DataStage coding standards across projects.

  • One DataStage solution repository per DataStage project.

Each of these, and options for differing structures, are described below.

The Compliance Repository

Most organisations will use a single Compliance repository as their default set of DataStage coding standards across all projects. This repository is built from the one that ships with MettleCI, but its rules will normally be edited and curated to meet your organisation’s requirements. The rules related to Job and Stage naming standards, for example, are ones that most organisations will want to customise.

Using different Compliance rules for different DataStage Projects/Teams

There are a number of ways of organising Compliance rules across DataStage teams, ranging from a single shared rule set to team-specific rule sets:

  1. Use a single, dedicated Compliance repository with all teams sharing a common set of Compliance Rules

  2. Use a single, dedicated Compliance repository with each team’s Compliance Rules residing in different folders of that repository

  3. Use a single, dedicated Compliance repository with each team’s Compliance Rules residing in different branches of that repository

  4. Host each DataStage Team’s project-specific Compliance Rules within its own dedicated repository

  5. Host each DataStage Team’s project-specific Compliance Rules within a sub-folder of their Data Solution repository

Other approaches could be employed, but of these options we find that the first one (shared Compliance Rules across the organisation) is the easiest to set up and maintain. It also delivers the benefits of enforcing consistent coding standards across teams.
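The five layouts listed above differ only in which repository, branch, and folder hold a given team's rules. The following minimal sketch makes that difference explicit. It is illustrative only: the repository names, the branch name "main", the "compliance" sub-folder, and the rules_location function are all hypothetical and are not part of MettleCI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RulesLocation:
    repo: str    # Git repository holding the Compliance rules
    branch: str  # branch to check out
    folder: str  # folder within the checkout that contains the rules


# Hypothetical names, chosen for illustration only.
SHARED_REPO = "compliance"
DEFAULT_BRANCH = "main"


def rules_location(option: int, team: str) -> RulesLocation:
    """Return where a team's Compliance rules live under options 1 to 5."""
    if option == 1:  # one shared rule set for every team
        return RulesLocation(SHARED_REPO, DEFAULT_BRANCH, ".")
    if option == 2:  # one folder per team in the shared repository
        return RulesLocation(SHARED_REPO, DEFAULT_BRANCH, team)
    if option == 3:  # one branch per team in the shared repository
        return RulesLocation(SHARED_REPO, team, ".")
    if option == 4:  # one dedicated Compliance repository per team
        return RulesLocation(f"compliance-{team}", DEFAULT_BRANCH, ".")
    if option == 5:  # a sub-folder of the team's solution repository
        return RulesLocation(f"{team}-solution", DEFAULT_BRANCH, "compliance")
    raise ValueError(f"unknown option: {option}")
```

Under option 1 the team name has no effect on the result, which is one way of seeing why that layout is the simplest to maintain.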

DataStage Solution Repositories

You will normally establish one DataStage solution repository per DataStage Project. Each repository is best created from one of the templates supplied with MettleCI. Choose the template that best matches the build platform you plan to use to orchestrate your Continuous Integration and Deployment pipeline. Templates are currently supplied for Jenkins, GitLab, Bamboo, and Azure DevOps.

Many Projects, One Repository

You can map multiple DataStage projects to a single repository, but this is an unusual requirement for which there is rarely a rational justification. To avoid naming collisions you should ensure that each project is mapped to a distinct Path when configuring it in Workbench.

A simple solution is to map each DataStage Project to a Path with the same name as the Project. Give due consideration to the many negative implications of having multiple DataStage projects share a single Git repository before adopting this approach. We won't catalogue all those implications here as we’d prefer that you just don’t do it.
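If you do adopt this layout, the distinct-Path requirement can be checked before configuring anything. The sketch below is a hypothetical helper, not a MettleCI feature: it takes a mapping of DataStage Project names to repository Paths and reports any pairs that collide. Treating a Path nested inside another Path as a collision is an assumption made here for safety; the requirement stated above is only that Paths be distinct.

```python
from pathlib import PurePosixPath


def find_path_collisions(project_paths: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of projects whose repository Paths are equal or nested."""
    # Normalise leading and trailing slashes so "shared" and "shared/" compare equal.
    normalised = {
        name: PurePosixPath(path.strip("/"))
        for name, path in project_paths.items()
    }
    names = sorted(normalised)
    collisions = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = normalised[first], normalised[second]
            # Assumption: a Path inside another project's Path also counts as a collision.
            if a == b or a in b.parents or b in a.parents:
                collisions.append((first, second))
    return collisions


# Example project names are hypothetical.
# Naming each Path after its Project yields no collisions.
print(find_path_collisions({"SALES_DEV": "SALES_DEV", "HR_DEV": "HR_DEV"}))
```

An empty list means every project has its own, non-overlapping Path in the repository.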

One Project, Many Repositories

MettleCI does not permit you to map a single DataStage Project to multiple Git repositories. Some customers have multiple DataStage solutions co-resident in a single DataStage Project. In these cases customers should split their work into separate DataStage Projects representing their distinct solutions before establishing an association with the Git repository for each.

© 2015-2024 Data Migrators Pty Ltd.