pyFlow is a tool to manage tasks in the context of a task dependency graph. It has some similarities to make. pyFlow is not a program – it is a python module, and workflows are defined using pyFlow by writing regular python code with the pyFlow API.
pyFlow has been optimized to be lightweight and simple to use for prototype/R&D workflows.
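To make the make-like idea of a task dependency graph concrete, here is a minimal sketch in plain python. It does not use the pyFlow API: the function run_in_dependency_order and the task labels are hypothetical names chosen for illustration, and pyFlow's own scheduling is more capable than this.

```python
from collections import deque


def run_in_dependency_order(tasks, dependencies):
    """Run each task after all of its dependencies; return the run order.

    tasks: dict mapping a task label to a zero-argument callable.
    dependencies: dict mapping a task label to the labels it waits on.
    (Hypothetical helper for illustration, not part of pyFlow.)
    """
    # Count unfinished dependencies per task and index each task's dependents.
    pending = {label: len(dependencies.get(label, ())) for label in tasks}
    dependents = {label: [] for label in tasks}
    for label, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(label)

    # Tasks with no unfinished dependencies are eligible to run.
    ready = deque(sorted(label for label, n in pending.items() if n == 0))
    order = []
    while ready:
        label = ready.popleft()
        tasks[label]()
        order.append(label)
        # Finishing a task may make its dependents eligible.
        for child in dependents[label]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


log = []
tasks = {
    "convert": lambda: log.append("convert"),
    "align": lambda: log.append("align"),
    "merge": lambda: log.append("merge"),
}
dependencies = {"align": ["convert"], "merge": ["align"]}
order = run_in_dependency_order(tasks, dependencies)
```

Each task here is an ordinary python callable, whereas a real workflow would typically launch commands. The sketch only shows the ordering constraint that a dependency graph imposes.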
- Define workflows as python code
- Run workflows on localhost or sge
- Continue workflows which have partially completed
- Task resource management: Specify number of threads and memory required for each task
- Recursive workflow specification: take any existing pyFlow object and use it as a task in another pyFlow.
- Dynamic workflow specification: the task specification itself can wait on upstream tasks, so that later tasks can be defined based on the results of those upstream tasks (note: recursive workflows are an even better way to do this)
- Detects and reports all failed tasks with consistent workflow-level logging.
- Task-level logging: All task stderr is logged and decorated, e.g. [time][host][workflow_run][taskid]
- Task timing: Task wrapper function provides wall time for every task
- Task priority: Tasks which are simultaneously eligible to run can be assigned relative priorities to be run or queued first.
- Change environment variables or working directory for each task.
- Email notification on job completion/error/exception
- Provide ongoing task summary report at specified intervals
- Specify additional external scheduler arguments (e.g. specify queue name to SGE)
- Output task graph in dot format
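The resource management and priority features listed above can be pictured with a small sketch. This is an assumption-laden illustration, not pyFlow's scheduler: the function pick_tasks_to_launch, the tuple layout, and the greedy policy are all hypothetical. It selects, from the tasks that are eligible to run, those that fit in the remaining cores and memory, considering higher priorities first.

```python
def pick_tasks_to_launch(ready, free_cores, free_mem_mb):
    """Choose eligible tasks to start now, highest priority first.

    ready: list of (label, priority, n_cores, mem_mb) tuples.
    Returns the labels selected, in launch order.
    (Hypothetical greedy policy for illustration, not pyFlow's own.)
    """
    launched = []
    # Higher priority first; ties are broken by label for determinism.
    by_priority = sorted(ready, key=lambda task: (-task[1], task[0]))
    for label, _priority, n_cores, mem_mb in by_priority:
        # Launch only if the task fits in what is still available.
        if n_cores <= free_cores and mem_mb <= free_mem_mb:
            launched.append(label)
            free_cores -= n_cores
            free_mem_mb -= mem_mb
    return launched


ready = [
    ("align_s1", 0, 4, 8000),
    ("align_s2", 0, 4, 8000),
    ("index_ref", 5, 2, 2000),
]
launched = pick_tasks_to_launch(ready, free_cores=8, free_mem_mb=12000)
```

With 8 cores and 12000 MB available, the high-priority task and one alignment task fit. The second alignment task stays queued until resources are released.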
All release tarballs are distributed on the pyflow releases page.
pyflow's only requirement is python. pyflow is supported on python 2, versions 2.4 and later. However, python 2.7.2 should not be used because of a critical multithreading bug in that interpreter release which affects many pyflow runs.
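A script that launches workflows could guard against the interpreter release mentioned above. The following is only a sketch: the helper name is_known_bad_interpreter is hypothetical and is not something pyflow provides.

```python
import sys


def is_known_bad_interpreter(version_info=None):
    """Return True for python 2.7.2, the release the text warns against.

    (Hypothetical helper for illustration, not part of pyflow.)
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro).
    return tuple(version_info[:3]) == (2, 7, 2)


if is_known_bad_interpreter():
    sys.stderr.write("warning: python 2.7.2 should not be used with pyflow\n")
```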
To use an existing pyflow workflow or develop a new one, you may need to download or generate the latest pyflow installation tarball (see the top-level README.txt in the git repository).
To develop a new pyflow workflow:
Start by downloading the latest pyflow tarball (from version history section below).
Look at the demo programs. If you are new to pyflow, the recommended order is:
- helloWorld – simplest workflow
- simpleDemo – a basic feature sandbox
- subWorkflow – shows how recursive workflow invocation works
- runOptionsDemo – shows an example of how workflow run options can be acquired from command-line arguments.
- cwdDemo – a simple demonstration of how the 'cwd' option is used on task calls.
- memoryDemo – a simple demonstration illustrating the effect of task memory requirement settings.
This demo shows a workflow which takes a bcl basecalls directory, converts to fastq and aligns with BWA, consolidating each aligned sample into a single sample BAM.
Example Workflow Graph:
Example workflow_task graph (rendered to pdf)
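For readers unfamiliar with the dot format named in the feature list, the sketch below writes a small task graph as dot text. It is independent of pyflow: the function task_graph_to_dot and the task labels are hypothetical, and the layout pyflow itself emits may differ.

```python
def task_graph_to_dot(dependencies, graph_name="workflow"):
    """Render {task: [upstream tasks]} as dot text, one edge per dependency.

    (Hypothetical helper for illustration, not pyflow's own dot writer.)
    """
    lines = ["digraph %s {" % graph_name]
    for task in sorted(dependencies):
        # Declare the node, then draw an edge from each upstream task to it.
        lines.append('    "%s";' % task)
        for upstream in sorted(dependencies[task]):
            lines.append('    "%s" -> "%s";' % (upstream, task))
    lines.append("}")
    return "\n".join(lines)


dot_text = task_graph_to_dot({
    "convert": [],
    "align": ["convert"],
    "merge": ["align"],
})
```

Text in this format can be rendered to pdf with the Graphviz dot tool, for example `dot -Tpdf graph.dot -o graph.pdf`, where the file names are placeholders.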