TaskList Handlers

TaskList Handlers are responsible for parsing the contents of a tasklist into a set of commands to execute (tasks). This reduces the amount of clutter in each task definition and makes common tasks (e.g. athena job options) reusable across tasklists.

The desired tasklist handler is selected using the --tasklist option to the PyTaskFarmer program.

The ListTaskList handler is always available under the name default. See Provided TaskList Handlers for the list of tasklist handlers shipped with PyTaskFarmer.

Defining TaskList Handlers

TaskList Handler definitions are loaded from pytaskfarmer/tasklists.d and the current working directory. All files ending in .ini are loaded and are expected to be in INI format.

The following schema is expected:

[tasklisthandlername]
TaskList = tasklist.python.class
Arg0 = value0
Arg1 = value1

The extra arguments are passed to the TaskList constructor as keyword arguments.
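As a rough illustration of how such a definition could be turned into a handler instance, the sketch below reads a section with Python's configparser, resolves the TaskList value to a class with importlib, and passes the remaining options as keyword arguments. It is not PyTaskFarmer's actual loader: the function name load_handler is hypothetical, and the example points TaskList at collections.OrderedDict purely so that it runs without PyTaskFarmer installed. configparser lowercases option names unless optionxform is overridden; the sketch preserves case on the assumption that keys such as Arg0 must reach the constructor unchanged.

```python
import configparser
import importlib

EXAMPLE_INI = """
[example]
TaskList = collections.OrderedDict
Arg0 = value0
Arg1 = value1
"""


def load_handler(ini_text, name, *positional):
    """Instantiate the class named by the TaskList option of section `name`."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive (assumption)
    parser.read_string(ini_text)

    options = dict(parser[name])
    module_name, _, class_name = options.pop("TaskList").rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)

    # The documented handlers receive (path, workdir) positionally; the
    # remaining options are forwarded as keyword arguments.
    return cls(*positional, **options)


handler = load_handler(EXAMPLE_INI, "example")
print(handler)
```

With a real handler, the call would also supply the path and workdir values that PyTaskFarmer sets automatically.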

Provided TaskList Handlers

All TaskList Handler constructors take path and workdir as their first two positional arguments. They are set automatically by the PyTaskFarmer program and should not be specified by the user.

Generic Handlers

class taskfarmer.task.ListTaskList(path, workdir)

A list of tasks is defined using a file containing a task per line, with supporting status files defined using a suffix. The task ID is defined as the line number (starting at 0) inside the main task file.

Subclasses can implement the __getitem__ function to further modify the task definitions. The original line/task content is stored in the tasks member variable. By default, tasks[taskid] is returned unmodified.
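A subclass override might look like the sketch below. To keep it runnable without PyTaskFarmer, it uses a hypothetical stand-in base class that mimics only the documented behaviour (a tasks member and a __getitem__ that returns tasks[taskid]); with the real package, the subclass would derive from taskfarmer.task.ListTaskList instead. The EchoTaskList name and the echo prefix are purely illustrative.

```python
class StandInTaskList:
    """Hypothetical stand-in for ListTaskList: stores one task per line."""

    def __init__(self, lines):
        self.tasks = [line.rstrip("\n") for line in lines]

    def __getitem__(self, taskid):
        # Documented default: return the original line unmodified.
        return self.tasks[taskid]


class EchoTaskList(StandInTaskList):
    """Turn each task line into a shell command that echoes it."""

    def __getitem__(self, taskid):
        return "echo " + self.tasks[taskid]


tasklist = EchoTaskList(["first.root\n", "second.root\n"])
print(tasklist[1])  # task IDs are line numbers starting at 0
```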

All supporting status files are stored inside the workdir. The used files are:

  • toprocess: List of tasks that still need to be processed. The format is taskID task.

  • finished: List of tasks that successfully finished (return code 0). The format is taskID task.

  • failed: List of tasks that finished unsuccessfully (return code not 0). The format is taskID task.
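The taskID task format can be read back with a few lines of Python. The sketch below assumes that the ID and the task are separated by a single space and that the task itself may contain further spaces; the separator and the helper name parse_status_lines are assumptions, not taken from the PyTaskFarmer source.

```python
def parse_status_lines(lines):
    """Map task ID -> task text for lines of the form 'taskID task'."""
    entries = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        taskid, _, task = line.partition(" ")
        entries[int(taskid)] = task
    return entries


finished = parse_status_lines(["0 run.sh input0.root\n", "2 run.sh input2.root\n"])
print(finished)
```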

The list and corresponding operations are defined in a process-safe manner using the supporting files to synchronize the state. This means that multiple ListTaskLists can be created for a single tasklist (even on multiple machines with a shared filesystem).

__init__(path, workdir)
Parameters
path : str

Path to tasklist.

workdir : str

Path to work directory.

ATLAS Handlers

class taskfarmer.atlas.TransformTaskList(path, workdir, transform, input, output, maxEventsPerJob=None, **kwargs)

Run an ATLAS transform on input ROOT files.

See the __init__ function for details on how to configure this tasklist handler. A simple example for running no-pileup digitization is below.

[digi]
TaskList = taskfarmer.atlas.TransformTaskList
transform = Reco_tf.py
input = HITS
output = RDO
autoConfiguration = everything
digiSteeringConf = StandardInTimeOnlyTruth
conditionsTag = default:OFLCOND-MC16-SDR-RUN2-06
geometryVersion = default:ATLAS-R2-2016-01-00-01
postInclude = default:PyJobTransforms/UseFrontier.py
preInclude = HITtoRDO:Campaigns/MC16NoPileUp.py
preExec = all:from ParticleBuilderOptions.AODFlags import AODFlags; AODFlags.ThinGeantTruth.set_Value_and_Lock(False);' 'HITtoRDO:from Digitization.DigitizationFlags import digitizationFlags; digitizationFlags.OldBeamSpotZSize = 42

The TransformTaskList supports splitting each input file into multiple tasks, based on a maximum number of events. However, when practical, it is recommended to use AthenaMP for parallelizing event processing, since it has a smaller memory footprint. AthenaMP can be enabled by including the following in your tasklist handler definition.

athenaopt = all:--nprocs=64

or by setting the ATHENA_PROC_NUMBER environment variable.
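For the splitting alternative, the arithmetic amounts to cutting each file's event count into ranges of at most maxEventsPerJob events. The sketch below illustrates that arithmetic only. How TransformTaskList obtains the event count and which transform options it uses to select a range are not described here, so the function name split_events and the (skip, count) representation are assumptions.

```python
def split_events(total_events, max_events_per_job):
    """Return (skip, count) pairs covering total_events in chunks."""
    if max_events_per_job <= 0:
        raise ValueError("max_events_per_job must be positive")
    return [
        (skip, min(max_events_per_job, total_events - skip))
        for skip in range(0, total_events, max_events_per_job)
    ]


print(split_events(2500, 1000))
```

Each pair would correspond to one task: the number of events to skip and the number of events to process.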

The transform output is stored in the current working directory. It is then copied to the workdir using rsync. This two-stage process is required due to how AthenaMP determines its temporary outputs. The implication is that the runner needs to run the command using bash.

__init__(path, workdir, transform, input, output, maxEventsPerJob=None, **kwargs)

The kwargs are interpreted as arguments to the transform command. For example, a kwarg of kwargs['postInclude']="HITtoRDO:Campaigns/MC16NoPileUp.py" translates into a transform argument of --postInclude='HITtoRDO:Campaigns/MC16NoPileUp.py'. Note the wrapping of the value string inside single quotes; these are added automatically by this tasklist handler.
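The translation described above can be sketched as follows. This illustrates the documented --key='value' rule and is not the handler's actual code; the helper name format_transform_args is hypothetical. The shlex.split call shows how a POSIX shell would then tokenize the result.

```python
import shlex


def format_transform_args(**kwargs):
    """Render keyword arguments as --key='value' command-line options."""
    return " ".join(f"--{key}='{value}'" for key, value in kwargs.items())


args = format_transform_args(
    postInclude="HITtoRDO:Campaigns/MC16NoPileUp.py",
    conditionsTag="default:OFLCOND-MC16-SDR-RUN2-06",
)
print(args)
print(shlex.split(args))  # the quotes are consumed by the shell
```

Under this rule, a value that itself contains ' ' (as in the preExec line of the digi example) would close and reopen the quoting, so the shell would see two separate values. This reading is inferred from the quoting rule rather than stated in the text.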

Parameters
path : str

Path to tasklist.

workdir : str

Path to work directory.

transform : str

Name of transform (e.g. Sim_tf.py).

input : str

Type of input file (e.g. EVNT).

output : str

Type of output file (e.g. HITS).

maxEventsPerJob : str, optional

Maximum number of events per task.

kwargs

Arguments passed to the transform as --key='value'.

class taskfarmer.atlas.AthenaTaskList(path, workdir, jobOptions, output, maxEventsPerJob=None, **kwargs)

Run an athena job on input ROOT files.

See the __init__ function for details on how to configure this tasklist handler.

The job options need to use the built-in athena support for input files (i.e. --filesInput).

The AthenaTaskList supports splitting each input file into multiple tasks, based on a maximum number of events. However, when practical, it is recommended to use AthenaMP for parallelizing event processing, since it has a smaller memory footprint. AthenaMP can be enabled by including the following in your tasklist handler definition.

nprocs = 64

or by setting the ATHENA_PROC_NUMBER environment variable.

The output file name is given by the output setting. The handler looks for it in the current working directory and then copies it to the workdir using rsync. This two-stage process is required due to how AthenaMP determines its temporary outputs. The implication is that the runner needs to run the command using bash.

__init__(path, workdir, jobOptions, output, maxEventsPerJob=None, **kwargs)

The kwargs are interpreted as arguments to the athena command. For example, a kwarg of kwargs['postInclude']="HITtoRDO:Campaigns/MC16NoPileUp.py" translates into an athena argument of --postInclude='HITtoRDO:Campaigns/MC16NoPileUp.py'. Note the wrapping of the value string inside single quotes; these are added automatically by this tasklist handler.

Parameters
path : str

Path to tasklist.

workdir : str

Path to work directory.

jobOptions : str

Name of jobOptions file to execute.

output : str

Expected name of output file.

maxEventsPerJob : str, optional

Maximum number of events per task.

kwargs

Arguments passed to athena as --key='value'.