Question

Why was the ISX format chosen for storage in Git rather than a non-binary format (DSX, XML, pjb/bin)?

Since a custom script already runs against an asset, could that script unzip the ISX and extract the XML/DSX content so that it could be committed to Git instead?

Why ISX?

ISX files are compressed, and Git further compresses and delta-encodes the objects it stores, so the disk footprint is minimised.
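The question's idea can be shown as a minimal Python sketch that extracts the XML entries from an ISX. It assumes that an ISX is a standard ZIP archive whose XML entries are UTF-8 encoded. IBM does not guarantee either assumption here. The function name `read_xml_entries` and the sample entry names are hypothetical.

```python
import io
import zipfile
from typing import Dict


def read_xml_entries(isx_bytes: bytes) -> Dict[str, str]:
    """Return {entry name: XML text} for every .xml entry in an ISX.

    Assumption: the ISX is a standard ZIP archive with UTF-8 XML entries.
    """
    entries = {}
    with zipfile.ZipFile(io.BytesIO(isx_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".xml"):
                entries[name] = archive.read(name).decode("utf-8")
    return entries


if __name__ == "__main__":
    # Build a small stand-in archive in memory (hypothetical entry names);
    # a real ISX would be read from disk with open(path, "rb").read().
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("Jobs/ExampleJob.xml", "<Job name='ExampleJob'/>")
        archive.writestr("notes.txt", "not XML")
    for name, xml in read_xml_entries(buffer.getvalue()).items():
        print(name, xml)
```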

Information Server File Formats

Unlike programs written in traditional programming languages, the source code of DataStage assets is expressed using a set of proprietary export formats. DataStage assets can currently be exported in three formats:

  1. DSX – The older Information Server export format, which supports only a subset of current Information Server asset types. NOTE: This format is also available in a ‘7-bit encoding’ variant in which any character above DEL (127) is encoded as \nnn, where nnn is the relevant character’s numeric code. This is useful when transmitting the resulting DSX via communications channels that cannot carry 8-bit characters.

  2. XML – referred to in the DataStage documentation as ‘Legacy XML’.

  3. ISX – A compressed, XML-based (binary) format which supports all Information Server asset types. To add further complexity, the ISX format exists in multiple variations:

    a. The ‘default’ format, usable by all IBM tools to represent individual DataStage jobs,

    b. An Information Server Manager-specific ISX format, and

    c. An istool-specific version, used when creating ‘releases’, which supports multiple assets.
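The DSX ‘7-bit encoding’ described in item 1 can be illustrated with a short Python sketch. The sketch assumes that nnn is the three-digit decimal value of the character's byte in a single-byte code page (Latin-1 here). The real DSX encoding may use a different base or code page. The helper names `encode_7bit` and `decode_7bit` are hypothetical.

```python
import re

# Matches a backslash followed by a decimal byte value in the range 128-255.
_ESCAPE = re.compile(r"\\(12[89]|1[3-9]\d|2[0-4]\d|25[0-5])")


def encode_7bit(text: str, codepage: str = "latin-1") -> str:
    """Replace every byte above DEL (127) with a backslash and 3 decimal digits.

    Assumption: nnn is the decimal byte value in a single-byte code page.
    Characters outside the code page raise UnicodeEncodeError.
    """
    parts = []
    for byte in text.encode(codepage):
        parts.append(f"\\{byte:03d}" if byte > 127 else chr(byte))
    return "".join(parts)


def decode_7bit(encoded: str, codepage: str = "latin-1") -> str:
    """Reverse encode_7bit.

    This sketch does not escape literal backslashes, so original text that
    already contained a sequence such as \\200 would be misread.
    """
    return _ESCAPE.sub(
        lambda m: bytes([int(m.group(1))]).decode(codepage), encoded
    )


if __name__ == "__main__":
    encoded = encode_7bit("café naïve")
    print(encoded)
    print(decode_7bit(encoded))
```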

MettleCI’s requirements follow from the mechanisms it provides for…

  • committing DataStage artefacts to a Git repository,

  • testing them against Compliance Rules,

  • testing them against Unit Tests,

  • deploying them to downstream environments, and optionally

  • swapping Jobs' parameter values to those appropriate to each target environment.
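The last capability in the list is swapping parameter values for each target environment. That amounts to overlaying environment-specific values onto a Job's parameter defaults. The sketch below shows the idea using plain Python dictionaries. It is not MettleCI's implementation. The function name, parameter names and environment names are all hypothetical.

```python
from typing import Dict, List, Tuple


def apply_environment_overrides(
    job_params: Dict[str, str],
    overrides: Dict[str, Dict[str, str]],
    environment: str,
) -> Tuple[Dict[str, str], List[str]]:
    """Overlay one environment's values onto a Job's parameter defaults.

    Returns the updated parameters and the (sorted) override names that the
    Job does not define, so a caller can flag likely typos.
    """
    if environment not in overrides:
        # Fail loudly rather than deploying with unchanged defaults.
        raise KeyError(f"no overrides defined for environment {environment!r}")
    updated = dict(job_params)  # never mutate the caller's mapping
    unknown = []
    for name, value in overrides[environment].items():
        if name in updated:
            updated[name] = value
        else:
            unknown.append(name)
    return updated, sorted(unknown)


if __name__ == "__main__":
    # Hypothetical Job parameters and per-environment values.
    defaults = {"DB_HOST": "dev-db", "BATCH_SIZE": "100"}
    per_env = {"PROD": {"DB_HOST": "prod-db", "TIMEOUT": "30"}}
    params, unknown = apply_environment_overrides(defaults, per_env, "PROD")
    print(params, unknown)
```

Reporting unknown names instead of silently adding them keeps a misspelt parameter in the override file from going unnoticed.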

Many of these operations rely on the unique capabilities of the ISX format, which is the principal reason MettleCI uses the 'default' ISX format (3a., above), with each file containing a single asset.

For DataStage Jobs, both the DSX and ISX formats let you choose whether your export includes Job design information only, encoded-binary executable information, or both. An export incorporating only Job design information must be recompiled in any target project into which it is subsequently imported.

When submitting assets to Git, MettleCI commits and pushes files containing only design information. Executable information is deliberately excluded, but can be reconstituted further down the delivery pipeline for organisations whose policies prevent code from being compiled in Production environments. See Deploying DataStage Binaries for more details.

Why does MettleCI use ISX files?

There are a number of good reasons to select ISX as the management format for Information Server artefacts:

  • ISX is the only format which supports all Information Server asset types, not just those applicable to DataStage. ISX itself has multiple variations:

    1. Vanilla: usable by all tools as a flexible, single-job version

    2. Information Server Manager-specific ISX format

    3. istool releases (multiple job versions)

    MettleCI uses the 'vanilla' ISX format (1), which contains a single version of the job and can be exported and imported without tooling restrictions.

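  • Git will attempt to merge non-binary (ASCII text) files such as DSX files, and the merge can corrupt the asset (i.e. the merged file may no longer be re-importable). Because ISX files are binary, Git will not attempt a textual merge of them. Conflicting changes are instead reported as a conflict, which is resolved by choosing one version.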

  • The available formats occupy varying amounts of disk space, and the design-only ISX format used by MettleCI occupies the least. By way of example, here’s a comparison of the disk footprints of the same job exported using various DataStage export formats:

    [Image: disk footprints of the same job exported using various DataStage export formats]
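The binary-versus-text and disk-footprint points above can be illustrated with Python's standard library. The sketch reproduces the heuristic Git uses, in the absence of `.gitattributes` overrides, to decide whether content is binary: whether a NUL byte appears in its first 8000 bytes. It also compares the size of a repetitive XML-like text with a ZIP-compressed copy. The sample text is a hypothetical stand-in for a real export. The ZIP archive is only an approximation of an ISX.

```python
import io
import zipfile

# Git treats content as binary if a NUL byte appears within its first
# 8000 bytes (unless .gitattributes says otherwise); this mirrors that check.
FIRST_FEW_BYTES = 8000


def looks_binary_to_git(data: bytes) -> bool:
    return b"\x00" in data[:FIRST_FEW_BYTES]


def zip_single_entry(name: str, text: str) -> bytes:
    """Return a ZIP archive (a stand-in for an ISX) holding one text entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, text)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical, highly repetitive export-like text.
    text = "<Stage><Property name='Mode'>Append</Property></Stage>\n" * 200
    raw = text.encode("utf-8")
    zipped = zip_single_entry("ExampleJob.xml", text)
    print("text bytes:", len(raw), "binary to Git:", looks_binary_to_git(raw))
    print("zip bytes: ", len(zipped), "binary to Git:", looks_binary_to_git(zipped))
```

The ZIP local file header written here contains zero bytes (for example, in its version field). The archive is therefore classified as binary, and Git will not attempt a line-by-line merge of it.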

File Formats are Opaque

Customers do not question the proprietary formats in which Microsoft Word or Excel, for example, store their documents. Those tools provide all the file-management facilities necessary to achieve the desired business outcome. Customers have no need to resort to external tools to inspect or manipulate those files, so the files can be considered an opaque storage layer and their formats a private implementation detail.

Similarly, the format in which DataStage artefacts are exported and stored in Git only matters to customers when they perceive a need to do something with those artefacts beyond the capabilities of DataStage or MettleCI. Many users have a historical attachment to DSX files because they believe DSX files are…

  • Transparent: The ‘transparency’ of the DSX’s ASCII-based format is sometimes seen as a way of satisfying stringent corporate governance requirements. However, governance is only as strong as the processes used to enforce it, and it is these processes which MettleCI’s capabilities uniquely support.

  • Inspectable: The DSX file’s plain ASCII format is often perceived as easily 'inspectable', meaning humans can read it to verify its content. While DSX files may be human-readable, they are certainly not human-parsable. There is virtually no value in inspecting a DataStage Job's source code outside of a DataStage user interface.

  • Searchable: Some customers take the view that the ASCII nature of DSX files means they can be searched to identify, for example, coding standards violations, anti-patterns, or security vulnerabilities. MettleCI provides a library of Compliance rules to scan for coding anti-patterns, standards non-compliance, security violations, and code maintainability issues. These rules are easily modified or extended as necessary, and can be applied at design time or automatically during a Continuous Integration build.

  • Modifiable: Customers have sometimes created homegrown scripts to modify DSX files. Typically these scripts split DSX files into separate files (often one file per asset) or modify parameter values to make them specific to a particular deployment target. MettleCI now handles these responsibilities automatically.

  • Comparable: The ASCII format of a DSX file leads some to believe it can be subjected to traditional text-based diffing techniques to identify the changes between code versions. However, both ISX and DSX formats include a lot more information than the Job design itself. This includes a lot of non-functional information such as link label positions, viewport position, zoom levels, and snap-to-grid settings. DataStage Job exports also describe Job Stages in a non-deterministic order. Two successive exports of an unaltered job may therefore exhibit significant differences when compared using a traditional diff tool. An external diff tool cannot cope with these challenges, so it cannot usefully identify relevant differences between two DataStage Jobs. The only way to read and understand a Job, or compare differences between two Jobs, is to let DataStage load them into memory and present a logical representation in a human-parseable form. This process remains the responsibility of the DataStage user interface.

Question

Do you normalise the XML in the ISX file, so that two exports of the same job compare as equal? Equality checking is easily done with DSX exports (just leave out the header), whereas XML apparently needs normalising first.

Both ISX and DSX formats include a lot more information than the changing update, export and compile dates. They also contain the viewport position, zoom levels and snap-to-grid settings, as well as a lot of "non-functional" information such as link label positions. Normalising the export might cut down some noise, but it does not provide a robust way of determining whether a given job has changed between check-ins.

Earlier iterations of MettleCI did not provide the quick and easy check-in process that we have already demonstrated. Checking in jobs manually was tedious and time-consuming, so check-ins were usually performed in bulk. Because so much time passed between a change being made and a developer performing the check-in, there was a high number of "false" check-ins (jobs checked in without having changed). We did use normalisation to reduce some of the noise, but it wasn't robust enough to identify all unchanged jobs.

Since the introduction of the MettleCI check-in process, developers very rarely check in a job they haven't changed. On the odd occasion that it does occur, there is little real impact. The job will be retested in the CI build and a little more space (kilobytes' worth) will be used up by Git storage.

  • Mergeable: The DSX format might be text-based like traditional source code, but both DSX and ISX formats represent each DataStage job as an acyclic graph with complex relationships and extensive metadata properties. Because of this graph-based structure, even where Git is able to merge text-based DSX files, the result is likely to contain numerous textual and semantic conflicts.

  • Future-proof: DataStage file formats can change over time, so building a software delivery toolchain around the details of a specific (current) file format is strongly discouraged. DSX files were superseded by ISX files many years ago, and ISX files will themselves be superseded in the near future by the introduction of DataStage Next Gen. When this happens, MettleCI will be fully conversant with the new file format.
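The question above discusses normalisation. To illustrate why it only goes part of the way, here is a minimal Python sketch that strips a set of noise attributes and ignores element order before comparing two exports. The attribute names in `NOISE_ATTRIBUTES` are hypothetical placeholders, not the names used in real DataStage exports. The comparison will be defeated by any noise the list misses and by any element whose order is significant.

```python
import xml.etree.ElementTree as ET

# Hypothetical names standing in for layout settings and timestamps;
# real DataStage exports use their own attribute and element names.
NOISE_ATTRIBUTES = {"x", "y", "zoom", "snapToGrid", "exportDate"}


def canonical(element: ET.Element) -> tuple:
    """Build a comparable form of an element, ignoring noise and child order."""
    attrs = tuple(sorted(
        (key, value)
        for key, value in element.attrib.items()
        if key not in NOISE_ATTRIBUTES
    ))
    text = (element.text or "").strip()
    # Sorting children hides non-deterministic export order, but would also
    # hide genuine reordering wherever order is meaningful.
    children = tuple(sorted(canonical(child) for child in element))
    return (element.tag, attrs, text, children)


def probably_unchanged(xml_a: str, xml_b: str) -> bool:
    return canonical(ET.fromstring(xml_a)) == canonical(ET.fromstring(xml_b))


if __name__ == "__main__":
    # Hypothetical exports of the same job: different dates, positions, order.
    first = "<Job exportDate='d1'><Stage name='A' x='1'/><Stage name='B' x='5'/></Job>"
    second = "<Job exportDate='d2'><Stage name='B' x='9'/><Stage name='A' x='2'/></Job>"
    print(probably_unchanged(first, second))
```

The comparison is only as good as the hand-maintained list of noise attributes. That is the robustness problem the answer above describes.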