
What is S2PX?

Data Migrators have created a tool called S2PX that assists in the migration of legacy DataStage Server jobs to modern DataStage jobs capable of executing on the DataStage Parallel (PX) engine.

S2PX Design Principles

It may surprise some users to discover that S2PX often does not generate the same optimal Parallel Job design that an experienced Developer would create given the original Server Job as a model.

S2PX is a transpiler which takes the designed behaviours of your Server Jobs and makes them available in a Parallel execution environment while simultaneously replicating the complexity and (often unexpected) foibles, quirks and inconsistencies of the Server Job operating environment. It’s this need to replicate the often unintuitive behaviour of the Server environment that can lead those with limited Server Job design experience to question some of S2PX’s conversion decisions. Rest assured that all of those seemingly unusual decisions are founded on sound reasoning and a substantial body of evidence of how Server Jobs are used in the real world.

The core principles driving S2PX’s design, in no particular order, are…

  • S2PX prioritises the generation of functionally accurate Parallel Jobs above all else.

  • S2PX aims to support the Parallel implementation of as much Server Job functionality as possible.

  • S2PX aims to generate Parallel Job designs with a visual appearance that, as far as possible, matches that of their Server equivalents.

  • S2PX design decisions have been informed by a detailed analysis of >150K real-world Server jobs submitted by a broad range of IBM’s DataStage customers from around the world. This evidence alone is used to form our view of the DataStage Server Job landscape.

  • S2PX does not automatically convert DataStage BASIC routines into Parallel engine compatible routines, although it does provide facilities to make that process easier.

  • S2PX uses your solution’s design-time information to inform its conversion. Runtime logs are not used to inform conversion decisions.

  • S2PX generates Parallel Jobs with every stage set to run in sequential mode (for good reasons)

  • S2PX converts Hashed files to DRS stages (for good reasons)

  • S2PX runs in situ, meaning it exports Server jobs from your existing DataStage environment and delivers automatically-generated Parallel jobs into that same environment. Migrating those newly-generated Parallel jobs to an upgraded environment (e.g. DataStage v11.5 to v11.7, or even to Cloud Pak for Data) is achieved using other tools.

Converting a Job manually

If you were to convert a Server Job to a Parallel Job without any automation, the list of steps you would need to follow would be daunting:

  1. Generate a new Parallel Project with the necessary settings

  2. For each Server Job…

    1. Decompose the Server Job into ‘sections’ by identifying synchronisation stages where the Server Job must be split

    2. Create a new Parallel Job for each identified Server Job section.

    3. Create a Job Sequence to orchestrate the execution of the new Parallel Jobs, respecting appropriate ordering and trigger conditions

  3. For each new Parallel Job…

    1. Ensure Parameters and Parameter Set Values have been propagated from the source Server Job to the Parallel Job

    2. Identify Server-only design patterns in the source Job and transcribe them into their Parallel equivalents, e.g.

      1. Server Transformer Lookups to Parallel Lookups Stages,

      2. Server Transformer Rejects to Parallel reject from downstream stage(s),

      3. etc.

    3. Identify whether the job’s design patterns will need to be implemented as multiple Parallel jobs, and if so…

      1. Identify the interfacing stage(s) between each of those Parallel jobs

      2. Ensure each Parallel job is defined with the necessary job parameters specified in the original Server job

      3. Create an orchestrating sequence to invoke the Parallel jobs in the appropriate order

      4. Ensure the sequence has the same job parameters as the original Server job, and that it propagates those parameters to the sequence Job Activity stages appropriately

    4. For each Stage

      1. Add Parallel Stages equivalent to the Server versions

      2. Add appropriate Stage Links, retaining Link ordering where appropriate

      3. Remap Server data types to appropriate Parallel equivalents

      4. Manually configure stage properties to Parallel equivalents, where possible

    5. For Transformer Stages

      1. Convert Server Transformer derivations into Parallel-compatible equivalents

      2. Identify any referenced Custom Routines and replace with a reference to a new Parallel Routine definition

      3. If not already present, create a new Parallel Routine definition for each identified Custom Routine, referencing an object file containing a Parallel Routine implementation.

    6. Review and update unit tests

All of these operations are automated by Data Migrators' S2PX.
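
To give a feel for the decomposition step alone, the following is a minimal Python sketch of splitting a job into 'sections' at synchronisation stages. It is a toy model and not how S2PX is implemented: the representation of a job as a list of (source stage, target stage) links, the function name split_into_sections, and the stage names in the usage example are all assumptions made for illustration. Each synchronisation stage is treated as two nodes, a write side that closes the upstream section and a read side that opens the downstream one, and the remaining connected groups of links become the sections.

```python
from collections import defaultdict


def split_into_sections(links, sync_stages):
    """Group links into sections, cutting the job at each synchronisation stage.

    links: list of (source_stage, target_stage) pairs (hypothetical job model).
    sync_stages: set of stage names where the job must be split.
    Returns a list of sections, each a list of links, in first-seen order.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    def node_for(stage, side):
        # A synchronisation stage is split into separate write and read nodes,
        # so links into it and links out of it land in different sections.
        return (stage, side) if stage in sync_stages else (stage, None)

    for source, target in links:
        union(node_for(source, "read"), node_for(target, "write"))

    sections = defaultdict(list)
    for source, target in links:
        sections[find(node_for(source, "read"))].append((source, target))
    return list(sections.values())


# Hypothetical job: a Hashed file written by one Transformer and read by another.
example_links = [
    ("SrcFile", "Xfm1"),
    ("Xfm1", "Hash1"),
    ("Hash1", "Xfm2"),
    ("Xfm2", "Target"),
]
example_sections = split_into_sections(example_links, {"Hash1"})
# example_sections holds two sections: the links up to Hash1, and the links from Hash1 onwards.
```

In this model each returned section would correspond to one new Parallel Job, and the synchronisation stage shared by two sections is the interface that the orchestrating Job Sequence has to respect when ordering them.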

S2PX Scope

Within technical limits, S2PX will…

  • enable, via commands in the MettleCI CLI, the individual or bulk conversion of Server Jobs into functionally equivalent Parallel jobs

  • generate Parallel Jobs that are inspectable and editable in the source platform’s DataStage Designer client

  • replicate the parameters, naming, layout and functional behaviour of your legacy Server jobs as far as technically possible

  • provide a report describing which components of an existing Server solution will require manual configuration/rework before they can compile and/or execute in their target Parallel representation

  • convert MettleCI Server Unit Tests from pre-conversion Server Jobs into tests executable against post-conversion Parallel Jobs, enabling users to easily and automatically detect regression errors

  • enable users to monitor S2PX conversion progress and outcomes on a per-Job basis in an easily understood format

  • generate utility DataStage jobs required to import legacy Server Hashed Files into the Parallel equivalent storage technology (DRS Stage-compatible DB2, Oracle, or ODBC-compliant data stores)

  • provide utilities to identify which features of your Server jobs cannot be supported by the DataStage Parallel engine.

  • provide utilities to identify which features of your Server jobs will not be supported by the S2PX conversion tool.

S2PX does not …

  • convert DataStage BASIC into another Parallel-compatible language.

  • generate the same optimised Parallel job that an experienced DataStage developer would generate to fulfil the business requirements addressed by the original Server job.

  • generate jobs which run in Parallel mode by default. All generated Parallel jobs have their stages set to run in Sequential mode by default, although this is easily changed.

  • provide any guarantees of the performance of the migrated jobs.

  • support the conversion of deprecated stages. These will be pre-processed using IBM’s Common Connector Migration Tool (CCMT) prior to conversion by S2PX.

  • support the conversion of the following Server Enterprise Applications stages:

    • JD Edwards Enterprise One Stage

    • Oracle Applications Direct Access Stage

    • Oracle Applications Hierarchy Stage

    • Siebel Business Component Access Stage

    • Siebel Direct Access Stage

    • Siebel Stage

    • ABAP Extract Stage

    • BAPI Stage
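
As a rough illustration of screening a solution for the Enterprise Applications stages listed above, the Python sketch below flags jobs that use any of them. It is not one of the S2PX utilities: the inventory format (a mapping of job name to a list of stage type names), the function name find_unsupported, and the job names in the usage example are assumptions, and real DataStage exports may identify stage types differently from the display names used here.

```python
# Stage names copied from the list of unsupported Enterprise Applications stages.
UNSUPPORTED_STAGES = {
    "JD Edwards Enterprise One Stage",
    "Oracle Applications Direct Access Stage",
    "Oracle Applications Hierarchy Stage",
    "Siebel Business Component Access Stage",
    "Siebel Direct Access Stage",
    "Siebel Stage",
    "ABAP Extract Stage",
    "BAPI Stage",
}


def find_unsupported(job_stages):
    """Return {job_name: sorted unsupported stage types} for jobs that use any.

    job_stages: mapping of job name to the stage type names it contains
    (a hypothetical inventory format, not an S2PX or DataStage export).
    """
    flagged = {}
    for job, stage_types in job_stages.items():
        hits = sorted(set(stage_types) & UNSUPPORTED_STAGES)
        if hits:
            flagged[job] = hits
    return flagged


# Hypothetical inventory of two Server jobs.
inventory = {
    "LoadCustomers": ["Transformer Stage", "Siebel Stage"],
    "LoadOrders": ["Sequential File Stage", "Transformer Stage"],
}
flagged_jobs = find_unsupported(inventory)
# flagged_jobs == {"LoadCustomers": ["Siebel Stage"]}
```

Jobs flagged this way would need manual rework, since S2PX does not convert these stages.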



© 2015-2024 Data Migrators Pty Ltd.