...

  1. DSX – The older Information Server export format that supports only a subset of all current Information Server asset types. NOTE: This format is also available in a ‘7-bit encoding’ variant, in which any character above DEL (127) is encoded as \nnn, where nnn is the relevant character’s code. This is useful when transmitting the resulting DSX via constrained communications channels.

  2. XML (referred to in the DataStage documentation as ‘Legacy XML’)

  3. ISX – A compressed (binary) XML-based format that supports all Information Server asset types. To complicate matters further, the ISX format exists in several variants:

    1. The ‘default’ format, usable by all IBM tools to represent individual DataStage jobs,

    2. An Information Server Manager-specific ISX format, and

    3. An istool-specific version, used when creating ‘releases’, which supports multiple assets.
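The DSX ‘7-bit encoding’ described in item 1 can be illustrated with a short Python sketch. It makes two assumptions that the source does not confirm. First, each byte above 127 is written as a backslash followed by its three-digit *decimal* code. Second, literal backslashes already in the text are not escaped. The real DSX exporter may differ on both points, and the function names `encode_7bit` and `decode_7bit` are hypothetical.

```python
def encode_7bit(data: bytes) -> str:
    """Hypothetical sketch: escape every byte above DEL (127) as \\nnn (decimal)."""
    parts = []
    for b in data:
        if b > 127:
            parts.append(f"\\{b:03d}")  # e.g. byte 0xE9 becomes \233
        else:
            parts.append(chr(b))  # 7-bit bytes pass through unchanged
    return "".join(parts)


def decode_7bit(text: str) -> bytes:
    """Reverse encode_7bit. Simplification: any backslash followed by three
    digits in the range 128-255 is treated as an escape sequence."""
    out = bytearray()
    i = 0
    while i < len(text):
        code = text[i + 1:i + 4]
        if text[i] == "\\" and len(code) == 3 and code.isdigit() and 128 <= int(code) <= 255:
            out.append(int(code))
            i += 4
        else:
            out.extend(text[i].encode("ascii"))  # 7-bit text only
            i += 1
    return bytes(out)


original = "Café".encode("latin-1")  # b'Caf\xe9'
encoded = encode_7bit(original)
print(encoded)  # Caf\233
assert decode_7bit(encoded) == original
```

Because the encoded output contains only 7-bit characters, it can pass through channels that strip or mangle the high bit. The cost is a longer file wherever non-ASCII characters appear.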

Many of these operations rely on the unique capabilities provided by the ISX format, which is the principal reason MettleCI uses the 'default' ISX format (3a. in the list above) with each file containing a single asset. For DataStage Jobs, the DSX and ISX formats allow you to choose whether your export includes Job design information only, encoded-binary executable information, or both. If you create an export containing only Job design information, the Job will need to be recompiled in any target project into which it is subsequently imported.

...

Each format occupies varying amounts of disk space, with the design-only ISX format occupying the least. By way of example, here is a set of exports (of a simple 3-stage Parallel Job) using various formats, ordered by increasing file size:

...

Why does MettleCI use ISX files?

MettleCI provides the mechanisms for…

  • committing DataStage artefacts to a Git repository,

  • testing them against Compliance Rules,

...


  • testing them against Unit Tests,

...

  • deploying them to downstream environments

...

...

There are a number of good reasons to select ISX as the management format for Information Server artefacts:

  • ISX is the only format that supports all Information Server asset types (not just DataStage assets), so a single format can be used to manage every asset.

  • Git will attempt a merge on non-binary (ASCII text) files such as DSXs, which would almost certainly corrupt the asset (i.e. it would no longer be importable). Because ISX is a binary format, Git will not attempt to merge it automatically, so it cannot be corrupted in this way.

  • Each available format occupies varying amounts of disk space, with the design-only ISX format used by MettleCI occupying the least. By way of example here’s a comparison of the disk footprints of the same job exported using various DataStage export formats:

    (Image: disk footprint of the same job exported in each DataStage export format)

When submitting assets to Git, MettleCI commits and pushes files that contain only design information and use the ‘default’ flavour of the ISX format. Executable information is deliberately excluded, but it can be reconstituted further down the delivery pipeline for organisations with a mandate that prevents the compilation of code in Production environments. See Deploying DataStage Binaries for more details.

The format in which DataStage artefacts are stored in Git only matters to a customer when they want to do something with those artefacts beyond what can be achieved using MettleCI’s capabilities, most commonly accessed using its Command Line Interface. Many users have an historical attachment to DSX files for a variety of reasons:

...

Benefit

...

Reason

...

MettleCI Mitigation

...

ASCII is easily 'inspectable'

...

The ASCII format means DSX files can be subjected to traditional text-based diffing techniques to identify the changes between code versions

...

While DSXs may be human-readable, they are certainly not human-parsable.

...

MettleCI


...

Can be modified by scripts (DSX Split?)

...

Can be modified by scripts (Parameter updates)

...

Can be parsed for anti-patterns or security violations

...

MettleCI provides a library of Compliance Rules to scan for coding anti-patterns, standards non-compliance, security violations, and code maintainability issues. These rules are easily modified or extended as necessary.

The perception that ISX files are opaque binary files to which existing build and deployment processes and can include the compiled version of the job.

File Contents

Treat the export as opaque

While the DSX format might be text-based like traditional source code, both DSX and ISX formats represent each DataStage job as an acyclic graph with complex relationships and extensive metadata properties. Because of this graph-based structure, even if Git manages to merge text-based DSX files, the resulting output will contain a large number of both textual and semantic conflicts. To make matters worse, a developer could open the exported files in a text editor, but they are not intended to be human-readable. It is therefore virtually impossible to fully understand the DataStage asset they represent, which makes conflict resolution near impossible. For all practical purposes, DataStage exports should be treated as binary files within the context of a version control system.

Git will still be able to perform a merge, but if two different versions of the same DataStage export are detected, Git will report a conflict and the versions will need to be merged manually. DataStage does not include any tools for merging two DataStage exports, so the only way to resolve a conflict is for a developer to ‘eye-ball’ each version in the DataStage user interface and manually construct the merged version. This process is time-consuming, tedious, and error-prone.

Conclusions

With MettleCI, the file format used to store your jobs in Git version control doesn’t matter. This is because:

Your build and deployment process won’t be relying on custom text-processing scripts to modify the controlled assets.

You won’t be relying on traditional text-based diffing techniques to identify the changes between code versions.

MettleCI uses the 'default' ISX format (3a. in the list above), in which each file contains a single job that can be exported and imported without imposing restrictions on the tooling that can be used.

The DSX format (regardless of encoding) is a proprietary ASCII-based format.

The ISX format

wrapped XML are ASCII readable text, they are

Common objections to ISX files

The ISX format is effectively a compressed XML file that is not easily human-readable, and there is little to no value in inspecting a DataStage Job's source code outside the DataStage Designer itself.

...

DataStage Jobs cannot be meaningfully compared in an external diff tool. DataStage job exports also contain stages in a non-deterministic order, so two successive exports of an unaltered job may appear to contain significant changes when compared using a traditional diff tool. The only way to read and understand a job, or to compare two jobs, is to let DataStage load them into memory and present a logical, human-readable representation. This process remains the same regardless of whether the export is encoded as binary or ASCII.
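The effect of non-deterministic stage ordering can be shown with a small Python sketch. The two "exports" below are hypothetical, heavily simplified stand-ins, not real DSX or ISX syntax. They describe the same three stages in a different order. A line-based diff from the standard `difflib` module reports changes, while an order-insensitive comparison of the same records finds none.

```python
import difflib

# Two hypothetical, simplified exports of the same unaltered job.
# Real DSX/ISX content is far richer; only the stage order differs here.
export_a = """\
STAGE name=Src_Customers type=SequentialFile
STAGE name=Xfm_Clean type=Transformer
STAGE name=Tgt_Customers type=SequentialFile
"""
export_b = """\
STAGE name=Xfm_Clean type=Transformer
STAGE name=Tgt_Customers type=SequentialFile
STAGE name=Src_Customers type=SequentialFile
"""


def text_diff(a: str, b: str) -> list[str]:
    """Line-based unified diff, as a traditional diff tool would produce."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


def same_stages(a: str, b: str) -> bool:
    """Order-insensitive comparison of the stage records."""
    return set(a.splitlines()) == set(b.splitlines())


print(len(text_diff(export_a, export_b)) > 0)  # True: the diff tool sees "changes"
print(same_stages(export_a, export_b))         # True: the stages are identical
```

Ignoring order only works for this toy example. Real exports also carry links, metadata and layout properties, so a set comparison like this is not a reliable way to detect functional change.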

There are a number of good reasons to select ISX as the management format for Information Server artefacts:

...

ISXs are compressed, and Git compresses the objects it stores, so the disk footprint is minimised

...

Git will attempt a merge on non-binary files, which would likely result in the corruption of the asset

...


Both ISX and DSX formats include far more information than the changing update, export and compile dates. They also contain viewport position, zoom level and snap-to-grid settings, as well as a lot of "non-functional" information such as link label positions. Normalising the export might cut down some noise, but it does not provide a robust way of determining whether a given job has changed between check-ins.
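The normalisation idea can be sketched in Python using hypothetical `key=value` property lines. The property names `DateModified`, `ZoomLevel` and `Stage` are illustrative assumptions, not real DSX or ISX syntax. Stripping the date properties removes some noise. However, two exports that differ only in a non-functional property such as zoom level still compare as different, so the check cannot separate cosmetic changes from functional ones.

```python
import re

# Hypothetical property names standing in for export metadata; real
# DSX/ISX exports use different (and many more) properties.
VOLATILE = re.compile(r"^(DateModified|DateExported|DateCompiled)=")


def normalise(export: str) -> list[str]:
    """Drop date properties; keep every other line unchanged."""
    return [line for line in export.splitlines() if not VOLATILE.match(line)]


before = "DateModified=2024-01-01\nZoomLevel=100\nStage=Xfm_Clean\n"
after = "DateModified=2024-02-01\nZoomLevel=125\nStage=Xfm_Clean\n"

# The dates are gone, but a zoom change alone still looks like a "change",
# even though the job's logic (Stage=Xfm_Clean) is identical.
print(normalise(before) == normalise(after))  # False
```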

...

If, for some reason, developers are regularly unsure of what has changed, then the team should question why there is such a long gap between modifying a job and checking it in.

JMcC:

ISX file size wouldn’t be a valid reason to object. The differences between the design-time ISX and DSX export formats are negligible with respect to per-repository storage limits for Git server platforms like GitHub. If they start using MettleCI’s Unit Test feature, the per-test data files (input and expected output) will likely eclipse the storage footprint of their Jobs before long anyway.


File Formats are Opaque

The proprietary formats in which Microsoft Word or Excel, for example, store their documents are not questioned by customers because those tools provide all the facilities necessary to manipulate those files to achieve a business outcome. Customers have no need to resort to external tools to inspect or manipulate those files, so the files can be considered an opaque storage layer and their respective formats an implementation detail.

Similarly, the format in which DataStage artefacts are stored in Git only matters to a customer when they perceive a need to do something with those artefacts beyond what can be achieved using DataStage’s or MettleCI’s capabilities. Many users have an historical attachment to DSX files for various reasons:

  • Inspectable: The DSX file’s plain ASCII format is often perceived as easily 'inspectable', and able to be read by humans to verify its content.

  • Transparent: The ‘transparency’ of an ASCII-based format satisfies stringent governance requirements.

  • Searchable: DSX files can be parsed for anti-patterns or security violations.

  • Modifiable: DSX files can be modified by scripts (DSX Split?, Parameter updates).

  • Testable: The ASCII format means DSX files can be subjected to traditional text-based diffing techniques to identify the changes between code versions.

Where we’ve encountered initial anxiety about adopting ISXs in the past, it usually boils down to either:

...

A misunderstanding about ISX files being intrinsically “binary”, as opposed to containing a less-human-readable but more detailed representation of a Job design than DSX. As you know, both formats offer the optional inclusion of a compiled binary payload, but MettleCI doesn’t need or use that in any of its functions.

In summary, no sound software delivery process requires the direct interrogation or modification of code in the source code repository. Indeed, this is not the role of a source code repository, and doing so undermines good governance.

File formats can (and will) change in the near future, so building a software delivery toolchain based on the details of a specific current file format is strongly discouraged.

Equally, no sound software delivery process will be founded on the need to manually inspect a non-human-parsable proprietary file format such as DataStage DSX files.


There is no identifiable requirement to modify DSXs that is not already handled for you by the standard MettleCI capabilities.
