src.pyflow.WorkflowRunner

run(self, mode='local', dataDirRoot='.', isContinue=False, isForceContinue=False, nCores=None, memMb=None, isDryRun=False, retryMax=2, retryWait=90, retryWindow=360, retryMode='nonlocal', mailTo=None, updateInterval=60, schedulerArgList=None, isQuiet=False, warningLogFile=None, errorLogFile=None, successMsg=None, startFromTasks=None, ignoreTasksAfter=None, resetTasks=None)

Call this method to execute the workflow() method overridden in a child class and specify the resources available for the workflow to run.
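A minimal usage sketch follows. It assumes the module is importable as pyflow; HelloWorkflow, build_hello_workflow and run_locally are hypothetical names chosen for illustration, and passing dependencies as a list of labels is an assumption about the accepted form.

```python
def build_hello_workflow():
    """Create a two-task workflow; pyflow is imported lazily here."""
    # Assumption: the module is importable as `pyflow`.
    from pyflow import WorkflowRunner

    class HelloWorkflow(WorkflowRunner):  # hypothetical example class
        def workflow(self):
            # Each task is a labelled shell command.
            self.addTask("hello", "echo 'Hello World!'")
            # 'bye' is declared to depend on 'hello'.
            self.addTask("bye", "echo 'Goodbye!'", dependencies=["hello"])

    return HelloWorkflow()


def run_locally(wflow, data_dir=".", n_cores=2):
    """Run wflow in local mode and return the 0/1 status from run()."""
    return wflow.run(
        mode="local",
        dataDirRoot=data_dir,
        nCores=n_cores,
        memMb=2048 * n_cores,   # mirrors the documented local-mode default
        isContinue="Auto",      # continue only if a previous dataDir exists
    )


# Typical use in a script (needs `import sys`):
#     sys.exit(run_locally(build_hello_workflow()))
```

Passing the return value of run() to sys.exit() makes the process exit status reflect whether all tasks completed.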

Task retry behavior: Retry attempts will be made per the arguments below for distributed workflow runs (e.g. sge run mode). Note this means that retries will be attempted for tasks with an 'isForceLocal' setting during distributed runs.
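One way to read the retry rules, using the retryMax, retryWindow and retryMode arguments of run(), is sketched below. This is an illustration of the documented behavior, not pyflow's implementation; retry_allowed and its parameter names are hypothetical, and the order in which the checks are applied is an assumption.

```python
def retry_allowed(run_mode, retry_mode, attempts_so_far, retry_max,
                  seconds_since_first_submit, retry_window,
                  is_make_job=False):
    """Return True if a failed task may be resubmitted (illustrative only)."""
    # 'nonlocal' retry mode: no retries when the workflow runs in local mode.
    if retry_mode == "nonlocal" and run_mode == "local":
        return False
    # Never exceed the maximum number of retries.
    if attempts_so_far >= retry_max:
        return False
    # Assumption: make jobs skip the time-window check entirely.
    if is_make_job:
        return True
    # A window of zero or less means no time limit on retries.
    if retry_window <= 0:
        return True
    return seconds_since_first_submit <= retry_window
```

With the run() defaults (retryMax=2, retryWindow=360, retryMode='nonlocal'), a task in an sge run that fails 100 seconds after its first submission would be eligible for a retry under this reading, while the same failure in a local run would not.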

Task error behavior: When a task error occurs the task manager stops submitting new tasks and allows all currently running tasks to complete. Note that in this case 'task error' means that the task could not be completed after exhausting attempted retries.

Workflow exception behavior: Any exception thrown from the python code of classes derived from WorkflowRunner will be logged and will trigger notification (e.g. email). The exception will not propagate to the client's stack. In sub-workflows the exception is handled exactly like a task error (i.e. task submission is shut down and remaining tasks are allowed to complete). An exception in the master workflow will lead to workflow termination without waiting for currently running tasks to finish.

Parameters:

mode - Workflow run mode. Current options are (local|sge)
dataDirRoot - All workflow data is written to {dataDirRoot}/pyflow.data/. This includes workflow/task logs, persistent task state data, and summary run info. Two workflows cannot simultaneously use the same dataDir.
isContinue - If True, continue workflow from a previous incomplete run based on the workflow data files. You must use the same dataDirRoot as a previous run for this to work. Set to 'Auto' to have the run continue only if the previous dataDir exists. (default: False)
isForceContinue - Only used if isContinue is not False. Normally when a run is continued, the commands of completed tasks are checked to ensure they match. When isForceContinue is True, failing this check is reduced from an error to a warning.
nCores - Total number of cores available, or 'unlimited'. sge is currently configured for a maximum job count of 128; any higher value in sge mode will be reduced to this maximum. (default: 1 for local mode, 128 for sge mode)
memMb - Total memory available (in megabytes), or 'unlimited'. Note that this value will be ignored in non-local modes (such as sge), because in this case the total memory available is expected to be known by the scheduler for each node in its cluster. (default: 2048*nCores for local mode, 'unlimited' for sge mode)
isDryRun - List the commands to be executed without running them. Note that recursive and dynamic workflows will potentially have to account for the fact that expected files will be missing -- here 'recursive workflow' refers to any workflow which uses the addWorkflowTask() method, and 'dynamic workflow' refers to any workflow which uses the waitForTasks() method. These types of workflows can query this status with the isDryRun() method to make accommodations. (default: False)
retryMax - Maximum number of task retries
retryWait - Delay (in seconds) before resubmitting task
retryWindow - Maximum time (in seconds) after the first task submission in which retries are allowed. A value of zero or less puts no limit on the time when retries will be attempted. Retries are always allowed (up to retryMax times) for failed make jobs.
retryMode - Modes are 'nonlocal' and 'all'. For 'nonlocal', retries are not attempted in local run mode. For 'all', retries are attempted for any run mode. The default mode is 'nonlocal'.
mailTo - An email address or container of email addresses. Notification will be sent to each email address when either (1) the run successfully completes (2) the first task error occurs or (3) an unhandled exception is raised. The intention is to send one status message per run() indicating either success or the reason for failure. This should occur for all cases except a host hardware/power failure. Note that mail comes from 'pyflow-bot@csaunders-ubuntu64' (configurable), which may be classified as junk-mail by your system.
updateInterval - How often (in minutes) pyflow should log a status update message summarizing the run status. Set this to zero or less to turn the update off.
schedulerArgList - A list of arguments can be specified to be passed on to an external scheduler when non-local modes are used (e.g. in sge mode you could pass schedulerArgList=['-q','work.q'] to put the whole pyflow job into the sge work.q queue)
isQuiet - Don't write any logging output to stderr (but still write log to pyflow_log.txt)
warningLogFile - Replicate all warning messages to the specified file. Warning messages will still appear in the standard logs; this file will contain the subset of log messages pertaining to warnings only.
errorLogFile - Replicate all error messages to the specified file. Error messages will still appear in the standard logs; this file will contain the subset of log messages pertaining to errors only. It should be empty for a successful run.
successMsg - Provide a string containing a custom message which will be prepended to pyflow's standard success notification. This message will appear in the log and any configured notifications (e.g. email). The message may contain linebreaks.
startFromTasks (A single string, or set, tuple or list of strings) - A task label or container of task labels. Any tasks which are not in this set or descendants of this set will be marked as completed.
ignoreTasksAfter (A single string, or set, tuple or list of strings) - A task label or container of task labels. All descendants of these task labels will be ignored.
resetTasks (A single string, or set, tuple or list of strings) - A task label or container of task labels. These tasks and all of their descendants will be reset to the "waiting" state to be re-run. Note this option will only affect a workflow which has been continued from a previous run. This will not override any nodes altered by the startFromTasks setting in the case that both options are used together.
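The effect of startFromTasks and ignoreTasksAfter on a dependency graph can be illustrated with a small standalone sketch. It does not use pyflow; the functions, the example task labels and the state names other than "waiting" are hypothetical, and letting "ignored" take precedence over "complete" when both options touch the same task is an assumption.

```python
def as_label_set(labels):
    """Accept a single label or a container of labels, as the options do."""
    if labels is None:
        return set()
    if isinstance(labels, str):
        return {labels}
    return set(labels)


def descendants(children, roots):
    """Return every task reachable from roots (the roots themselves excluded)."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def plan_tasks(children, all_tasks, start_from=None, ignore_after=None):
    """Map each task label to 'waiting', 'complete' or 'ignored'."""
    start = as_label_set(start_from)
    ignore = as_label_set(ignore_after)
    state = {task: "waiting" for task in all_tasks}
    if start:
        # Tasks outside the start set and its descendants count as completed.
        keep = start | descendants(children, start)
        for task in all_tasks:
            if task not in keep:
                state[task] = "complete"
    # All descendants of the ignore set are dropped from the run.
    for task in descendants(children, ignore):
        state[task] = "ignored"
    return state


# Hypothetical graph: align -> sort -> call -> report, and align -> qc.
children = {"align": ["sort", "qc"], "sort": ["call"], "call": ["report"]}
all_tasks = ["align", "sort", "qc", "call", "report"]
plan = plan_tasks(children, all_tasks, start_from="sort", ignore_after="call")
```

In this example 'align' and 'qc' are treated as completed because neither is 'sort' nor one of its descendants, 'sort' and 'call' remain to be run, and 'report' is ignored as a descendant of 'call'.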

Returns:

0 if all tasks completed successfully and 1 otherwise

Class WorkflowRunner

addTask(self, label, command=None, cwd=None, env=None, nCores=1, memMb=2048, dependencies=None, priority=0, isForceLocal=False, isCommandMakePath=False, isTaskStable=True, mutex=None, retryMax=None, retryWait=None, retryWindow=None, retryMode=None)

addWorkflowTask(self, label, workflowRunnerInstance, dependencies=None)

waitForTasks(self, labels=None)

isTaskComplete(self, taskLabel)

isTaskDone(self, taskLabel)

cancelTaskTree(self, taskLabel)

getRunMode(self)

getNCores(self)

limitNCores(self, nCores)

getMemMb(self)

limitMemMb(self, memMb)

isDryRun(self)

runModeDefaultCores(mode) Static Method

flowLog(self, msg, logState=1)

workflow(self)

_isTaskCompleteCore(self, namespace, taskLabel)

_setRunning(self, *args, **kw)

_getRunning(self, *args, **kw)

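The isDryRun parameter of run() notes that dynamic workflows (those using waitForTasks()) may find expected files missing during a dry run. A hedged sketch of one accommodation is shown below; read_chunk_labels, the placeholder label and the file name are hypothetical, and the helper only relies on the isDryRun() method listed above.

```python
def read_chunk_labels(wflow, list_path):
    """Return labels listed in list_path, or a placeholder during a dry run."""
    if wflow.isDryRun():
        # The upstream task never ran, so list_path does not exist yet.
        return ["dryrun_chunk"]
    with open(list_path) as handle:
        # One label per line; blank lines are skipped.
        return [line.strip() for line in handle if line.strip()]


# Inside a workflow() method this might be used as follows
# (the commands and labels are made up for illustration):
#     self.addTask("split", "split_input.sh > chunks.txt")
#     self.waitForTasks(["split"])
#     for name in read_chunk_labels(self, "chunks.txt"):
#         self.addTask("process_" + name, "process.sh " + name)
```

The placeholder keeps the dry run listing at least one representative downstream command instead of failing on the missing file.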