AiiDA schedulers¶

Supported schedulers¶

The list below describes the supported schedulers, i.e. the batch job schedulers that manage the job queues and execution on any given computer.

PBSPro¶

The PBSPro scheduler is supported and has been tested with version 12.1.

All the main features are supported with this scheduler.

The JobResource class to be used when setting the job resources is the NodeNumberJobResource (PBS-like).

SLURM¶

Note

The Slurm plugin referenced below is available in the EPFL version.

The SLURM scheduler is supported and has been tested with version 2.5.4.

All the main features are supported with this scheduler.

The JobResource class to be used when setting the job resources is the NodeNumberJobResource (PBS-like).

SGE¶

Note

The SGE plugin referenced below is available in the EPFL version.

The SGE scheduler (Sun Grid Engine, now called Oracle Grid Engine) is supported (and it has been tested with version GE 6.2u3), together with some of the main variants/forks.

All the main features are supported with this scheduler.

The JobResource class to be used when setting the job resources is the ParEnvJobResource (SGE-like).

PBS/Torque & Loadleveler¶

PBS/Torque and Loadleveler are not fully supported yet, although supporting them is one of our top priorities. For the moment, you can try the PBSPro plugin with PBS/Torque: it may work, although there will probably be some small issues.
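As a quick summary of the subsections above, the pairing between each supported scheduler and its job resource class can be written as a small lookup table. This is only an illustrative sketch: the dictionary and the helper function are hypothetical names introduced here and are not part of AiiDA.

```python
# Illustrative lookup table built from the scheduler descriptions on this page.
# SCHEDULER_TO_JOB_RESOURCE and job_resource_name are hypothetical names,
# not AiiDA objects.
SCHEDULER_TO_JOB_RESOURCE = {
    "PBSPro": "NodeNumberJobResource",
    "SLURM": "NodeNumberJobResource",
    "SGE": "ParEnvJobResource",
}


def job_resource_name(scheduler):
    """Return the name of the job resource class documented for a scheduler."""
    try:
        return SCHEDULER_TO_JOB_RESOURCE[scheduler]
    except KeyError:
        # PBS/Torque and Loadleveler are deliberately absent: this page
        # describes them as not fully supported yet.
        raise ValueError("No job resource class documented for %r" % scheduler)
```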

Job resources¶

When asking a scheduler to allocate some nodes/machines for a given job, we have to specify some job resources (which typically include information such as the number of required nodes or the number of MPI processes per node).

Unfortunately, the way of specifying this information differs from cluster to cluster. Instead of having a single abstract class, we chose to adopt different subclasses, so that the specification of the resources stays as similar as possible to what the user would write in a scheduler script. Note that, for a given scheduler, only one subclass can be used.

The base class, from which all job resource subclasses inherit, is aiida.scheduler.datastructures.JobResource. All classes define at least one method, get_tot_num_mpiprocs(), that returns the total number of MPI processes requested.
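The contract described above can be pictured with a simplified stand-in. The classes below are hypothetical sketches written for this page, not the actual AiiDA implementations; they only show how two differently shaped resource descriptions can expose the same get_tot_num_mpiprocs() method.

```python
class JobResourceSketch(object):
    """Stand-in for the base class: every subclass reports the total MPI processes."""

    def get_tot_num_mpiprocs(self):
        raise NotImplementedError("subclasses must implement this method")


class NodeNumberSketch(JobResourceSketch):
    """PBS-like description: machines times MPI processes per machine."""

    def __init__(self, num_machines, num_mpiprocs_per_machine):
        self.num_machines = num_machines
        self.num_mpiprocs_per_machine = num_mpiprocs_per_machine

    def get_tot_num_mpiprocs(self):
        return self.num_machines * self.num_mpiprocs_per_machine


class ParEnvSketch(JobResourceSketch):
    """SGE-like description: a parallel environment and a total count."""

    def __init__(self, parallel_env, tot_num_mpiprocs):
        self.parallel_env = parallel_env
        self.tot_num_mpiprocs = tot_num_mpiprocs

    def get_tot_num_mpiprocs(self):
        return self.tot_num_mpiprocs
```

Code that only needs the total number of MPI processes can then call get_tot_num_mpiprocs() without knowing which scheduler family is in use.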

Note

To load a specific job resource subclass manually, import the class directly, e.g.:

from aiida.scheduler.datastructures import NodeNumberJobResource

However, in general, you will pass the fields to set directly to the set_resources() method of a JobCalculation object. For instance:

# Select here a computer that is already configured in AiiDA
calc = JobCalculation(computer=...)

# This assumes that the computer is configured to use a scheduler with
# job resources of type NodeNumberJobResource
calc.set_resources({"num_machines": 4, "num_mpiprocs_per_machine": 16})

NodeNumberJobResource (PBS-like)¶

This is the way of specifying the job resources in PBS and SLURM. The class is aiida.scheduler.datastructures.NodeNumberJobResource.

Once an instance of the class is obtained, you have the following fields that you can set:

res.num_machines: specify the number of machines (also called nodes) on which the code should run
res.num_mpiprocs_per_machine: number of MPI processes to use on each machine
res.tot_num_mpiprocs: the total number of MPI processes that this job is requesting

Note that you need to specify only two among the three fields above, for instance:

res = NodeNumberJobResource()
res.num_machines = 4
res.num_mpiprocs_per_machine = 16

asks the scheduler to allocate 4 machines, with 16 MPI processes on each machine. This automatically requests a total of 4*16=64 MPI processes.

The same can be achieved by passing the fields directly to the constructor:

res = NodeNumberJobResource(num_machines=4, num_mpiprocs_per_machine=16)

or, even better, by directly calling the set_resources() method of the JobCalculation class (assuming here that calc is your calculation object):

calc.set_resources({"num_machines": 4, "num_mpiprocs_per_machine": 16})
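To relate these fields to a hand-written submission script, the sketch below renders the same request (4 machines, 16 MPI processes each) as directives in forms commonly used with PBSPro and SLURM. The helper functions are hypothetical, and the exact lines written by the AiiDA scheduler plugins are not documented on this page and may differ.

```python
def pbspro_directives(num_machines, num_mpiprocs_per_machine):
    """One common PBSPro form: a 'select' statement with per-chunk mpiprocs."""
    return [
        "#PBS -l select=%d:mpiprocs=%d" % (num_machines, num_mpiprocs_per_machine),
    ]


def slurm_directives(num_machines, num_mpiprocs_per_machine):
    """One common SLURM form: node count plus tasks per node."""
    return [
        "#SBATCH --nodes=%d" % num_machines,
        "#SBATCH --ntasks-per-node=%d" % num_mpiprocs_per_machine,
    ]


# Same request as in the examples above: 4 machines, 16 MPI processes each.
for line in pbspro_directives(4, 16) + slurm_directives(4, 16):
    print(line)
```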

Note

If you specify all three fields (not recommended), make sure that they satisfy:

res.num_machines * res.num_mpiprocs_per_machine = res.tot_num_mpiprocs

Moreover, if you specify res.tot_num_mpiprocs, make sure that this is a multiple of res.num_machines and/or res.num_mpiprocs_per_machine.
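The constraints above can be made concrete with a small standalone function that derives the missing field and rejects inconsistent input. This is a minimal sketch under the rules stated on this page; resolve_node_resources is a hypothetical helper, not the validation code used inside NodeNumberJobResource.

```python
def resolve_node_resources(num_machines=None, num_mpiprocs_per_machine=None,
                           tot_num_mpiprocs=None):
    """Return (num_machines, num_mpiprocs_per_machine, tot_num_mpiprocs).

    Hypothetical helper: at least two fields must be given, and the three
    values must satisfy machines * per_machine == total.
    """
    given = [v for v in (num_machines, num_mpiprocs_per_machine, tot_num_mpiprocs)
             if v is not None]
    if len(given) < 2:
        raise ValueError("specify at least two of the three fields")
    if any(v < 1 for v in given):
        raise ValueError("all fields must be positive integers")

    if tot_num_mpiprocs is None:
        # Both factors are known: the total is their product.
        tot_num_mpiprocs = num_machines * num_mpiprocs_per_machine
    elif num_mpiprocs_per_machine is None:
        if tot_num_mpiprocs % num_machines != 0:
            raise ValueError("tot_num_mpiprocs must be a multiple of num_machines")
        num_mpiprocs_per_machine = tot_num_mpiprocs // num_machines
    elif num_machines is None:
        if tot_num_mpiprocs % num_mpiprocs_per_machine != 0:
            raise ValueError(
                "tot_num_mpiprocs must be a multiple of num_mpiprocs_per_machine")
        num_machines = tot_num_mpiprocs // num_mpiprocs_per_machine
    elif num_machines * num_mpiprocs_per_machine != tot_num_mpiprocs:
        # All three fields were given: they must agree.
        raise ValueError("the three fields are inconsistent")

    return num_machines, num_mpiprocs_per_machine, tot_num_mpiprocs
```

For example, passing num_machines=4 and tot_num_mpiprocs=64 yields 16 MPI processes per machine, while passing 4, 16 and 60 together raises an error.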

Note

When creating a new computer, you will be asked for a default_mpiprocs_per_machine. If you specify it, you can omit num_mpiprocs_per_machine when creating the resources for that computer, and the default value will be used.

Of course, all the requirements between num_machines, num_mpiprocs_per_machine and tot_num_mpiprocs still apply.

Moreover, you can explicitly specify num_mpiprocs_per_machine if you want to use a value different from the default one.
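The precedence between the explicit value and the computer default can be sketched as follows. The function below is a hypothetical illustration of the behaviour described in this note, not the code AiiDA runs; in particular, raising an error when neither value is available is an assumption of the sketch.

```python
def pick_mpiprocs_per_machine(resources, default_mpiprocs_per_machine=None):
    """Choose the per-machine MPI process count, preferring the explicit value.

    'resources' is a dictionary like the one passed to set_resources();
    'default_mpiprocs_per_machine' stands for the value stored with the computer.
    """
    if "num_mpiprocs_per_machine" in resources:
        # An explicit value always overrides the computer default.
        return resources["num_mpiprocs_per_machine"]
    if default_mpiprocs_per_machine is not None:
        return default_mpiprocs_per_machine
    # Assumption of this sketch: with no explicit value and no default,
    # the request is treated as incomplete.
    raise ValueError("num_mpiprocs_per_machine is not set and no default is available")
```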

ParEnvJobResource (SGE-like)¶

In SGE and similar schedulers, one has to specify a parallel environment and the total number of CPUs requested. The class is aiida.scheduler.datastructures.ParEnvJobResource.

Once an instance of the class is obtained, you have the following fields that you can set:

res.parallel_env: specify the parallel environment in which you want to run your job (a string)
res.tot_num_mpiprocs: the total number of MPI processes that this job is requesting

Remember to always specify both fields. No checks are done on the consistency between the specified parallel environment and the total number of MPI processes requested (for instance, some parallel environments may have been configured by your cluster administrator to run on a single machine). It is your responsibility to make sure that the information is valid; otherwise, the submission will fail.

Some examples:

setting the fields one by one:

res = ParEnvJobResource()
res.parallel_env = 'mpi'
res.tot_num_mpiprocs = 64

setting the fields directly in the class constructor:

res = ParEnvJobResource(parallel_env='mpi', tot_num_mpiprocs=64)

even better, directly calling the set_resources() method of the JobCalculation class (assuming here that calc is your calculation object):
```
calc.set_resources({"parallel_env": "mpi", "tot_num_mpiprocs": 64})
```
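Since both fields must always be given, a simple pre-submission check on the resources dictionary can catch omissions early. The function below is a hypothetical sketch, not part of AiiDA. In line with the text above, it does not (and cannot) verify that the parallel environment exists on the cluster or that it accepts the requested number of processes.

```python
def check_par_env_resources(resources):
    """Check that an SGE-like resources dictionary has both required fields."""
    missing = [key for key in ("parallel_env", "tot_num_mpiprocs")
               if key not in resources]
    if missing:
        raise ValueError("missing required field(s): %s" % ", ".join(missing))

    parallel_env = resources["parallel_env"]
    if not isinstance(parallel_env, str) or not parallel_env:
        raise ValueError("parallel_env must be a non-empty string")

    tot_num_mpiprocs = resources["tot_num_mpiprocs"]
    if not isinstance(tot_num_mpiprocs, int) or tot_num_mpiprocs < 1:
        raise ValueError("tot_num_mpiprocs must be a positive integer")

    # Only the shape of the request is checked; validity on the cluster is not.
    return resources
```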