**Info**

> Why was the ISX format chosen to be kept in Git rather than a non-binary format (DSX, XML, pjb/bin)? Since there is already a custom script running against each asset, couldn't that script unzip the ISX and extract the XML/DSX content to be committed to Git?
Why ISX?
- ISX files are compressed, and Git stores them efficiently as deltas in its pack files, so the disk footprint is minimised.
- Git will attempt a merge on non-binary files, which would likely corrupt the asset.
- ISX is the only format which supports all Information Server asset types (not just those in DataStage), so it is more useful and future-proof.
There are multiple ISX format variations:

1. Vanilla — usable as a flexible, single-job-version format by all tools
2. Information Server Manager-specific ISX format
3. ISTool releases (multiple job versions)

We use the 'vanilla' ISX format (1), which contains a single version of the job and can be exported/imported without tooling restrictions.
Although DSX and the XML wrapped within an ISX are ASCII-readable text, they are not easily 'human readable', and in our view there is no value in inspecting a job's source code outside of the DataStage Designer. DataStage jobs cannot be loaded into an external diff tool to identify the changes between them, and DataStage job exports list stages in a non-deterministic order, meaning that two successive exports of an unaltered job may appear to contain significant changes when compared using a traditional diff tool. The only way to read and understand a job, or to compare the differences between two jobs, is to let DataStage load them into memory and present a logical, human-readable representation. This process is the same regardless of whether the export is encoded as binary or ASCII.
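
To illustrate the premise of the question above (unzipping an ISX to reach its wrapped XML), here is a minimal Python sketch. It assumes an ISX export is a standard zip archive containing XML payloads; the file and entry names are illustrative only. Even with the XML extracted this way, the points above still apply: the content is ASCII-readable but not meaningfully human-readable or diff-able.

```python
# A minimal sketch, assuming an ISX export is a standard zip archive whose
# entries include XML payloads. File and entry names here are illustrative.
import zipfile


def list_isx_xml_entries(isx_path: str) -> list[str]:
    """Return the names of XML entries wrapped inside an ISX archive."""
    with zipfile.ZipFile(isx_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".xml")]


def read_isx_xml_entry(isx_path: str, entry_name: str) -> str:
    """Extract a single wrapped XML document as text."""
    with zipfile.ZipFile(isx_path) as archive:
        return archive.read(entry_name).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name in list_isx_xml_entries("MyJob.isx"):  # hypothetical export file
        print(name)
```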
**Info**

> Do you normalise the XML in the ISX file, so that two exports of the same job look identical? Equality checking is easily done with DSX exports (just leave out the header); XML apparently needs normalisation first.
Both ISX and DSX formats include a lot more volatile information than just update, export, and compile dates: they also record viewport position, zoom level, snap-to-grid settings, and a lot of 'non-functional' data such as link label positions. Normalising the export might cut down some of this noise, but it does not provide a robust way of determining whether a given job has changed between check-ins.
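
As a rough illustration of the kind of normalisation discussed here, the sketch below strips volatile values from an exported XML document before comparing two exports. The attribute names it removes (e.g. lastModified, viewportX, zoomLevel) are hypothetical placeholders rather than the real ISX/DSX schema, and, as noted above, filtering known noise still isn't a robust test of whether a job has genuinely changed.

```python
# A minimal normalisation sketch. The volatile attribute names below are
# hypothetical placeholders, not the actual ISX/DSX schema.
import xml.etree.ElementTree as ET

VOLATILE_ATTRIBUTES = {
    "lastModified", "exportDate", "compileDate",            # changing dates
    "viewportX", "viewportY", "zoomLevel", "snapToGrid",    # designer settings
}


def normalise(xml_text: str) -> str:
    """Remove known-volatile attributes from every element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        for attribute in VOLATILE_ATTRIBUTES:
            element.attrib.pop(attribute, None)
    return ET.tostring(root, encoding="unicode")


def looks_unchanged(export_a: str, export_b: str) -> bool:
    # Only removes the noise we know about; non-deterministic element ordering
    # and other non-functional data can still make unchanged jobs differ.
    return normalise(export_a) == normalise(export_b)
```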
Earlier iterations of MettleCI did not provide the quick and easy check-in process we have already demonstrated. Checking in jobs manually was tedious and time-consuming, so check-ins were usually performed in bulk. Because so much time passed between a change being made and a developer performing the check-in, there was a high number of "false" check-ins. We did use normalisation to reduce some of the noise, but it wasn't robust enough to identify all unchanged jobs.
Since introducing the MettleCI check-in process, it is very rare for developers to check in a job when they haven't made any changes. On the odd occasion that it does occur, there isn't really any impact: the job will be retested in the CI build and a little more space (kilobytes' worth) will be consumed by Git storage.
If, for some reason, developers are regularly unsure of what has changed, the team should question why there is such a long gap between modifying a job and checking it in.