WDL2CWL converter: Converting WDL to CWL

"You can never understand one language until you understand at least two."

Quick Notes:

A language is a system of communicating ideas and passing information.

A scientific workflow is the description of a process for accomplishing a scientific objective, usually expressed in terms of tasks and their dependencies.

A workflow language is a standard for describing computational data-analysis workflows.

There are many motivations for creating scientific workflows. These include:

  • providing an easy-to-use environment in which individual application scientists can create their own workflows.
  • providing interactive tools that let scientists execute their workflows and view the results in real time.
  • simplifying the process of sharing and reusing workflows between scientists.
  • enabling scientists to track the provenance of the workflow execution results and the workflow creation steps.

According to Wikipedia, more than 280 scientific workflow systems have been identified, but not all of them are the same.

WDL

The Workflow Description Language (WDL) is a way to specify data processing workflows with a human-readable and write-able syntax. WDL makes it straightforward to define analysis tasks, chain them together in workflows, and parallelize their execution.

WDL was originally developed for genome analysis pipelines by the Broad Institute. The OpenWDL community was formed to steward the WDL language specification and advocate its adoption in the spirit of Open Source for all.

CWL

The Common Workflow Language (CWL) is an open standard for describing analysis workflows and tools in a way that makes them portable and scalable across a variety of software and hardware environments, from workstations to cluster, cloud, and high performance computing (HPC) environments.

CWL is designed to meet the needs of data-intensive science, such as Bioinformatics, Medical Imaging, Astronomy, High Energy Physics, and Machine Learning. It produces free and open standards for describing command-line tool based workflows.

Standardizing Computational Reuse and Portability with the Common Workflow Language

The CWL Command Line Tool Description Standard describes how a particular command line tool works: what its inputs and parameters are, and their types; how to add the correct flags and switches to the command line invocation; and where to find the output files.
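To make this concrete, here is a minimal sketch of such a description, written as a Python dict rather than the usual YAML file. The tool (grep) and the parameter names (ignore_case, pattern, infile) are illustrative assumptions of mine, and build_command is a hypothetical helper that only mimics, in a very simplified way, how a CWL runner uses inputBinding to place flags and values on the command line.

```python
# A CWL CommandLineTool description expressed as a Python dict.
# The tool and its parameter names are illustrative assumptions.
tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["grep"],
    # Inputs: their types, and how each one appears on the command line.
    "inputs": {
        "ignore_case": {"type": "boolean", "inputBinding": {"position": 1, "prefix": "-i"}},
        "pattern": {"type": "string", "inputBinding": {"position": 2, "prefix": "-e"}},
        "infile": {"type": "File", "inputBinding": {"position": 3}},
    },
    # Outputs: where to find the files the tool produced.
    "stdout": "matches.txt",
    "outputs": {
        "matches": {"type": "File", "outputBinding": {"glob": "matches.txt"}},
    },
}


def build_command(tool, job):
    """Hypothetical, simplified stand-in for what a CWL runner does."""
    bound = []
    for name, spec in tool["inputs"].items():
        binding = spec.get("inputBinding")
        value = job.get(name)
        # Skip unbound inputs, missing values, and boolean flags set to False.
        if binding is None or value is None or value is False:
            continue
        args = []
        if binding.get("prefix"):
            args.append(binding["prefix"])
        # A True boolean contributes only its prefix; other values are appended.
        if value is not True:
            args.append(str(value))
        bound.append((binding.get("position", 0), args))
    bound.sort(key=lambda item: item[0])
    command = list(tool["baseCommand"])
    for _, args in bound:
        command.extend(args)
    return command


job = {"ignore_case": True, "pattern": "chr1", "infile": "variants.vcf"}
command = build_command(tool, job)
print(command)
```

A real runner handles far more (file staging, arrays, expressions, separators), so treat this only as a picture of the idea: the description is data, and the command line is derived from it.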

The Divide

While both WDL and CWL allow for portable specification of scientific workflows for processing data, CWL excels at:

  • Providing a separation of concerns between workflow authors and workflow platforms.
  • Supporting critical workflow concepts like automation, scalability, abstraction, provenance, portability, and re-usability.
  • Using declarative syntax, which facilitates polylingual workflow tasks.
  • Providing expressions in an already familiar format (JavaScript syntax).

The Converter

The wdl2cwl converter is a work-in-progress translator that converts OpenWDL v1.1 to CWL v1.2. The goal of this project is to develop a translator that takes a WDL workflow and produces an equivalent workflow in CWL. When executed with the same input, the translated workflow should produce results equivalent to those of the original workflow. This was originally

A more Object Oriented Converter using miniWDL

miniWDL (pronounced mini-wi-dle) is a Python package for running WDL locally. It converts the input WDL file into a miniWDL object that is made up of trees and nodes. This object provides a level of abstraction that can be used to extract the workflows, tasks, inputs, outputs, commands and other components of the input WDL file.

The project and I

As part of extending the functionality of the wdl2cwl converter, my mentor, Michael R. Crusoe, insisted that I use the miniWDL object to implement an object-oriented version of the converter. This allows for a more robust adaptation of WDL functionality.

With this project, the converter should be able to traverse the tree generated by the miniWDL object and translate any WDL component into its equivalent CWL inputs, outputs and requirements.
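A rough sketch of that idea, under loud assumptions: the Decl and Task classes below are hypothetical stand-ins I made up for the nodes of a parsed WDL tree (the real miniWDL classes are richer), and convert_type and convert_task are illustrative names, not the actual wdl2cwl code. The type table is also deliberately tiny and simplified.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical stand-ins for nodes of a parsed WDL tree.
# These are NOT the real miniWDL classes.
@dataclass
class Decl:
    name: str
    wdl_type: str  # e.g. "String", "Int?", "File"


@dataclass
class Task:
    name: str
    inputs: List[Decl] = field(default_factory=list)
    outputs: List[Decl] = field(default_factory=list)
    runtime: Dict[str, str] = field(default_factory=dict)


# Simplified mapping of a few primitive WDL types to CWL types.
WDL_TO_CWL = {
    "String": "string",
    "Int": "int",
    "Float": "float",
    "Boolean": "boolean",
    "File": "File",
}


def convert_type(wdl_type):
    """Translate a primitive WDL type; a trailing '?' marks it optional in both languages."""
    optional = wdl_type.endswith("?")
    base = wdl_type[:-1] if optional else wdl_type
    if base not in WDL_TO_CWL:
        raise NotImplementedError(f"unsupported WDL type: {wdl_type}")
    return WDL_TO_CWL[base] + ("?" if optional else "")


def convert_task(task):
    """Walk one task node and emit its CWL inputs, outputs and requirements."""
    requirements = []
    if "docker" in task.runtime:
        # A WDL runtime docker image becomes a CWL DockerRequirement.
        requirements.append(
            {"class": "DockerRequirement", "dockerPull": task.runtime["docker"]}
        )
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "id": task.name,
        "inputs": [{"id": d.name, "type": convert_type(d.wdl_type)} for d in task.inputs],
        "outputs": [{"id": d.name, "type": convert_type(d.wdl_type)} for d in task.outputs],
        "requirements": requirements,
    }


example = Task(
    name="count_lines",
    inputs=[Decl("infile", "File"), Decl("label", "String?")],
    outputs=[Decl("count", "Int")],
    runtime={"docker": "ubuntu:22.04"},
)
cwl_tool = convert_task(example)
print(cwl_tool)
```

The appeal of the object-oriented approach is visible even in this toy: each kind of node gets one small translation function, and unsupported constructs fail loudly instead of silently producing wrong CWL.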

More to come

Currently, the new implementation is still at the earliest stage. The idea is to approach it one file at a time. With help from my mentors, Michael R. Crusoe and Bruno P. Kinoshita, the next few days will be spent converting more files to reveal the true form of the Converter class.