TaskList Handlers
TaskList Handlers are responsible for parsing the contents of a tasklist into a set of commands to execute (tasks). This reduces the amount of clutter in each task definition and makes common tasks (e.g. athena job options) reusable across tasklists.
The desired tasklist handler is selected using the --tasklist option to
the PyTaskFarmer program.
The ListTaskList handler is always available under the name
default. See Provided TaskList Handlers for the list of tasklist
handlers shipped with PyTaskFarmer.
Defining TaskList Handlers
TaskList Handler definitions are loaded from pytaskfarmer/tasklists.d and the current working directory. All files ending in .ini are loaded and are expected to be in the INI format.
The following scheme is expected:
[tasklisthandlername]
TaskList = tasklist.python.class
Arg0 = value0
Arg1 = value1
The extra arguments are passed to the TaskList constructor as keyword arguments.
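As an illustration of the scheme above, here is a minimal sketch of how such a definition could be split into a class path and constructor keyword arguments. It is not PyTaskFarmer's actual loading code: the helper names `parse_handler_definition` and `resolve_class` are hypothetical, and keeping option names case-sensitive is an assumption made so that keys such as Arg0 reach the constructor unchanged.

```python
import configparser
import importlib


def parse_handler_definition(ini_text, name):
    """Hypothetical helper: return (class_path, kwargs) for section `name`."""
    parser = configparser.ConfigParser(interpolation=None)
    # Assumption: preserve the case of option names
    # (configparser lowercases them by default).
    parser.optionxform = str
    parser.read_string(ini_text)

    options = dict(parser[name])
    # The TaskList option holds the dotted path of the Python class;
    # everything else is treated as a constructor keyword argument.
    class_path = options.pop("TaskList")
    return class_path, options


def resolve_class(class_path):
    """Hypothetical helper: import 'package.module.Class' and return the class."""
    module_name, _, class_name = class_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)
```

Under these assumptions, a handler would then be created along the lines of `resolve_class(class_path)(path, workdir, **kwargs)`, with path and workdir supplied by the program.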
Provided TaskList Handlers
All TaskList Handler constructors take path and workdir as the
first two positional arguments. They are set automatically by the PyTaskFarmer program
and should not be specified by the user.
Generic Handlers
- class taskfarmer.task.ListTaskList(path, workdir)
A list of tasks is defined using a file containing a task per line, with supporting status files defined using a suffix. The task ID is defined as the line number (starting at 0) inside the main task file.
Subclasses can implement the __getitem__ function to further modify the task definitions. The original line/task content is stored in the tasks member variable. By default, tasks[taskid] is returned unmodified.
All supporting status files are stored inside the workdir. The files used are:
- toprocess: List of tasks that still need to be processed. The format is taskID task.
- finished: List of tasks that finished successfully (return code 0). The format is taskID task.
- failed: List of tasks that finished unsuccessfully (return code not 0). The format is taskID task.
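The taskID task line format shared by the status files can be read with a few lines of Python. This is only a sketch: the function name `parse_status_lines` is hypothetical, and splitting on the first space is an assumption about how the ID is separated from the task text.

```python
def parse_status_lines(lines):
    """Hypothetical helper: map task ID -> task text from 'taskID task' lines."""
    tasks = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Assumption: the ID is everything before the first space.
        task_id, _, task = line.partition(" ")
        tasks[int(task_id)] = task
    return tasks
```

The same function could be applied to the lines of toprocess, finished, or failed, since all three are described as using this format.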
The list and corresponding operations are defined in a process-safe manner using the supporting files to synchronize the state. This means that multiple ListTaskLists can be created for a single tasklist (even on multiple machines with a shared filesystem).
- __init__(path, workdir)
- Parameters
- path : str
Path to tasklist.
- workdir : str
Path to work directory.
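The __getitem__ hook described for ListTaskList can be sketched as follows. To keep the example self-contained, it uses a stand-in base class that mimics only the documented behaviour (a tasks member and an unmodified default lookup); `StandInTaskList` and `EchoTaskList` are hypothetical names and not part of taskfarmer.

```python
class StandInTaskList:
    """Stand-in for ListTaskList; NOT the real taskfarmer class."""

    def __init__(self, tasks):
        # The real handler reads these lines from the tasklist file.
        self.tasks = list(tasks)

    def __getitem__(self, taskid):
        # Documented default: return the original line unmodified.
        return self.tasks[taskid]


class EchoTaskList(StandInTaskList):
    """Hypothetical subclass that turns each line into an echo command."""

    def __getitem__(self, taskid):
        # Build the command from the original line stored in self.tasks.
        return "echo " + self.tasks[taskid]
```

A real subclass would derive from taskfarmer.task.ListTaskList instead and keep the (path, workdir) constructor signature.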
ATLAS Handlers
- class taskfarmer.atlas.TransformTaskList(path, workdir, transform, input, output, maxEventsPerJob=None, **kwargs)
Run an ATLAS transform on input ROOT files.
See the __init__ function for details on how to configure this tasklist handler. A simple example for running no pileup digitization is below.
[digi]
TaskList = taskfarmer.atlas.TransformTaskList
transform = Reco_tf.py
input = HITS
output = RDO
autoConfiguration = everything
digiSteeringConf = StandardInTimeOnlyTruth
conditionsTag = default:OFLCOND-MC16-SDR-RUN2-06
geometryVersion = default:ATLAS-R2-2016-01-00-01
postInclude = default:PyJobTransforms/UseFrontier.py
preInclude = HITtoRDO:Campaigns/MC16NoPileUp.py
preExec = all:from ParticleBuilderOptions.AODFlags import AODFlags; AODFlags.ThinGeantTruth.set_Value_and_Lock(False);' 'HITtoRDO:from Digitization.DigitizationFlags import digitizationFlags; digitizationFlags.OldBeamSpotZSize = 42
The TransformTaskList supports splitting each input file into multiple tasks, based on a maximum number of events. However, when practical, it is recommended to use AthenaMP for parallelizing event processing, as this has a reduced memory footprint. AthenaMP can be enabled by including the following in your tasklist handler definition
athenaopt = all:--nprocs=64
or by setting the ATHENA_PROC_NUMBER environment variable.
The transform output is stored in the current working directory. It is then copied to the workdir using rsync. This two-stage process is required due to how AthenaMP determines its temporary outputs. The implication is that the runner needs to run the command using bash.
- __init__(path, workdir, transform, input, output, maxEventsPerJob=None, **kwargs)
The kwargs are interpreted as arguments to the transform command. For example, a kwarg of kwargs['postInclude']="HITtoRDO:Campaigns/MC16NoPileUp.py" translates into a transform argument of --postInclude='HITtoRDO:Campaigns/MC16NoPileUp.py'. Note the wrapping of the value string inside single quotes. These are added automatically by this tasklist handler.
- Parameters
- path : str
Path to tasklist.
- workdir : str
Path to work directory.
- transform : str
Name of transform (e.g. Sim_tf.py).
- input : str
Type of input file (e.g. EVNT).
- output : str
Type of output file (e.g. HITS).
- maxEventsPerJob : str, optional
Maximum number of events per task.
- kwargs
Arguments passed to athena as
--key='value'.
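The kwargs translation described for __init__ can be pictured with a short sketch. The function name `format_transform_args` is hypothetical and this is not the handler's own code; it only reproduces the documented --key='value' form, inserting each value verbatim between single quotes.

```python
def format_transform_args(kwargs):
    """Hypothetical helper: render kwargs as --key='value' strings."""
    # Each value is wrapped in single quotes without any escaping,
    # mirroring the behaviour described in the text.
    return ["--{}='{}'".format(key, value) for key, value in kwargs.items()]


args = format_transform_args(
    {"postInclude": "HITtoRDO:Campaigns/MC16NoPileUp.py"}
)
```

Because values are inserted verbatim in this sketch, a value that itself contains single quotes would close and reopen the quoting in the resulting shell command.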
- class taskfarmer.atlas.AthenaTaskList(path, workdir, jobOptions, output, maxEventsPerJob=None, **kwargs)
Run an athena job on input ROOT files.
See the __init__ function for details on how to configure this tasklist handler. A simple example for running no pileup digitization is below.
The job options need to use the built-in athena support for input files (e.g. --filesInput).
The AthenaTaskList supports splitting each input file into multiple tasks, based on a maximum number of events. However, when practical, it is recommended to use AthenaMP for parallelizing event processing, as this has a reduced memory footprint. AthenaMP can be enabled by including the following in your tasklist handler definition
nprocs = 64
or by setting the ATHENA_PROC_NUMBER environment variable.
The output file name is set as the output setting. The handler looks for it in the current working directory and then copies it to the workdir using rsync. This two-stage process is required due to how AthenaMP determines its temporary outputs. The implication is that the runner needs to run the command using bash.
- __init__(path, workdir, jobOptions, output, maxEventsPerJob=None, **kwargs)
The kwargs are interpreted as arguments to the athena command. For example, a kwarg of kwargs['postInclude']="HITtoRDO:Campaigns/MC16NoPileUp.py" translates into an athena argument of --postInclude='HITtoRDO:Campaigns/MC16NoPileUp.py'. Note the wrapping of the value string inside single quotes. These are added automatically by this tasklist handler.
- Parameters
- path : str
Path to tasklist.
- workdir : str
Path to work directory.
- jobOptions : str
Name of jobOptions file to execute.
- output : str
Expected name of output file.
- maxEventsPerJob : str, optional
Maximum number of events per task.
- kwargs
Arguments passed to athena as
--key='value'.
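The effect of maxEventsPerJob on a single input file can be illustrated with a small sketch. How the handlers actually determine the number of events in a file and pass the ranges to athena is not described here, so the function name `split_events` and the (skip, count) representation are assumptions made purely for illustration.

```python
def split_events(n_events, max_events_per_job):
    """Hypothetical helper: return (skip_events, n_to_process) per task."""
    chunks = []
    for skip in range(0, n_events, max_events_per_job):
        # The last chunk may hold fewer events than the maximum.
        chunks.append((skip, min(max_events_per_job, n_events - skip)))
    return chunks


chunks = split_events(250, 100)
```

With 250 events and a maximum of 100 per task, this yields three tasks, the last one covering the remaining 50 events.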