Module for classes and utilities to interact with cluster schedulers.
aiida.schedulers.
JobState
Bases: enum.Enum
Enumeration of possible scheduler states of a CalcJob.
There is no FAILED state as every completed job is put in DONE, regardless of success.
DONE
QUEUED
QUEUED_HELD
RUNNING
SUSPENDED
UNDETERMINED
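As an illustration, the enumeration can be sketched and queried like this (the member values here are illustrative; the real definition lives in aiida.schedulers.datastructures):

```python
from enum import Enum

# Illustrative sketch of the JobState enum described above; the actual
# member values are defined in aiida.schedulers.datastructures.
class JobState(Enum):
    UNDETERMINED = 'undetermined'
    QUEUED = 'queued'
    QUEUED_HELD = 'queued held'
    RUNNING = 'running'
    SUSPENDED = 'suspended'
    DONE = 'done'

# Since there is no FAILED state, DONE is the only terminal state.
def is_terminal(state: JobState) -> bool:
    return state is JobState.DONE
```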
JobResource
Bases: aiida.common.extendeddicts.DefaultFieldsAttributeDict
Data structure to store job resources.
Each Scheduler implementation must define the _job_resource_class attribute to be a subclass of this class. It should at least define the get_tot_num_mpiprocs method, plus a constructor to accept its set of variables.
Typical attributes are:
num_machines
num_mpiprocs_per_machine
or (e.g. for SGE)
tot_num_mpiprocs
parallel_env
The constructor should take care of checking the values. It should raise only ValueError or TypeError on invalid parameters.
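A minimal sketch of this contract, using a hypothetical stand-alone class rather than the real DefaultFieldsAttributeDict base:

```python
# Hedged sketch of the JobResource contract, without the real AiiDA base
# class: a constructor that validates its inputs, raising only
# ValueError/TypeError, and a get_tot_num_mpiprocs method.
class SimpleNodeResource:
    def __init__(self, num_machines, num_mpiprocs_per_machine):
        if not isinstance(num_machines, int) or not isinstance(num_mpiprocs_per_machine, int):
            raise TypeError('resources must be integers')
        if num_machines <= 0 or num_mpiprocs_per_machine <= 0:
            raise ValueError('resources must be positive')
        self.num_machines = num_machines
        self.num_mpiprocs_per_machine = num_mpiprocs_per_machine

    def get_tot_num_mpiprocs(self):
        # Total number of MPI processes across all requested machines.
        return self.num_machines * self.num_mpiprocs_per_machine
```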
_default_fields
accepts_default_mpiprocs_per_machine
Return True if this subclass accepts a default_mpiprocs_per_machine key, False otherwise.
get_tot_num_mpiprocs
Return the total number of CPUs of this job resource.
get_valid_keys
Return a list of valid keys to be passed to the constructor.
validate_resources
Validate the resources against the job resource class of this scheduler.
kwargs – dictionary of values to define the job resources
ValueError – if the resources are invalid or incomplete
optional tuple of parsed resource settings
JobTemplate
A template for submitting jobs to a scheduler.
This contains all required information to create the job header.
The required fields are: working_directory, job_name, num_machines, num_mpiprocs_per_machine, argv.
Fields:
shebang line: the first line of the submission script
submit_as_hold: if set, the job will be in a ‘hold’ status right after submission
rerunnable: whether the job is rerunnable (boolean)
job_environment: a dictionary with environment variables to set before the execution of the code.
working_directory: the working directory for this job. During submission, the transport will first do a ‘chdir’ to this directory, and then possibly set a scheduler parameter, if this is supported by the scheduler.
email: an email address for sending emails on job events.
email_on_started: if True, ask the scheduler to send an email when the job starts.
email_on_terminated: if True, ask the scheduler to send an email when the job ends. This should also send emails on job failure, when possible.
job_name: the name of this job. The actual name of the job can differ from the one specified here, e.g. if there are unsupported characters, or the name is too long.
sched_output_path: a (relative) file name for the stdout of this job
sched_error_path: a (relative) file name for the stderr of this job
sched_join_files: if True, write both stdout and stderr to the same file (the one specified for stdout)
queue_name: the name of the scheduler queue (sometimes also called partition) on which the job will be submitted.
account: the name of the scheduler account (sometimes also called projectid) on which the job will be submitted.
qos: the quality of service of the scheduler account on which the job will be submitted.
job_resource: a suitable JobResource subclass with information on how many nodes and cpus it should use. It must be an instance of the aiida.schedulers.Scheduler.job_resource_class class. Use the Scheduler.create_job_resource method to create it.
num_machines: how many machines (or nodes) should be used
num_mpiprocs_per_machine: how many MPI procs should be used on each machine (or node)
priority: a priority for this job. Should be in the format accepted by the specific scheduler.
max_memory_kb: the maximum amount of memory the job is allowed to allocate ON EACH NODE, in kilobytes
max_wallclock_seconds: the maximum wall clock time that all processes of a job are allowed to exist, in seconds
custom_scheduler_commands: a string that will be inserted right after the last scheduler command, and before any other non-scheduler command; useful if some specific flag needs to be added and is not supported by the plugin
prepend_text: a (possibly multi-line) string to be inserted in the scheduler script before the main execution line
append_text: a (possibly multi-line) string to be inserted in the scheduler script after the main execution line
import_sys_environment: import the system environment variables
codes_info: a list of aiida.common.datastructures.CodeInfo objects. Each contains the information needed to run a single code. At the moment, it can contain:
cmdline_params: a list of strings with the command line arguments of the program to run. The first one is the executable name. For MPI runs, this will probably be ‘mpirun’ or a similar program; this has to be chosen at an upper level.
stdin_name: the (relative) file name to be used as stdin for the program
stdout_name: the (relative) file name to be used as stdout for the program
stderr_name: the (relative) file name to be used as stderr for the program
join_files: if True, stderr is redirected to the same file specified for stdout
codes_run_mode: sets the run mode with which the (multiple) codes have to be executed. For example, parallel execution:
mpirun -np 8 a.x &
mpirun -np 8 b.x &
wait
Serial execution would be the same without the ‘&’s. Values are given by aiida.common.datastructures.CodeRunMode.
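As a rough, hypothetical illustration of how a plugin might consume some of these fields when writing the job header (the directive names below are SLURM-like placeholders and not the output of any actual AiiDA plugin):

```python
# Hypothetical sketch: rendering a scheduler header from a few
# JobTemplate-like fields, using a plain dict in place of the real class.
def render_header(template: dict) -> str:
    lines = [template.get('shebang', '#!/bin/bash')]
    if template.get('job_name'):
        lines.append(f"#SBATCH --job-name={template['job_name']}")
    if template.get('queue_name'):
        lines.append(f"#SBATCH --partition={template['queue_name']}")
    if template.get('max_wallclock_seconds'):
        secs = template['max_wallclock_seconds']
        # Convert seconds into an H:MM:SS walltime directive.
        lines.append(f"#SBATCH --time={secs // 3600}:{(secs % 3600) // 60:02d}:{secs % 60:02d}")
    if template.get('custom_scheduler_commands'):
        # Inserted after the last scheduler command, as described above.
        lines.append(template['custom_scheduler_commands'])
    return '\n'.join(lines)
```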
JobInfo
Contains properties for a job in the queue. Most of the fields are taken from DRMAA v.2.
Note that default fields may be undefined. This is expected behaviour and the application must cope with this case. For instance, exit_status is undefined for jobs that have not finished yet, as are fields not supported by the given scheduler.
job_id: the job ID on the scheduler
title: the job title, as known by the scheduler
exit_status: the exit status of the job as reported by the operating system on the execution host
terminating_signal: the UNIX signal that was responsible for the end of the job
annotation: human-readable description of the reason for the job being in the current state or substate
job_state: the job state (one of those defined in aiida.schedulers.datastructures.JobState)
job_substate: a string with the implementation-specific sub-state
allocated_machines: a list of machines used for the current job, as aiida.schedulers.datastructures.MachineInfo objects
job_owner: the job owner as reported by the scheduler
num_mpiprocs: the total number of requested MPI procs
num_cpus: the total number of requested CPUs (cores) [may be undefined]
num_machines: the number of machines (i.e., nodes) required by the job. If allocated_machines is not None, this number must be equal to len(allocated_machines). Otherwise, for schedulers that do not support retrieving the full list of allocated machines, this attribute can be used to know at least the number of machines.
queue_name: the name of the queue in which the job is queued or running
account: the account/projectid in which the job is queued or running
qos: the quality of service in which the job is queued or running
wallclock_time_seconds: the accumulated wallclock time, in seconds
requested_wallclock_time_seconds: the requested wallclock time, in seconds
cpu_time: the accumulated cpu time, in seconds
submission_time: the absolute time at which the job was submitted, of type datetime.datetime
dispatch_time: the absolute time at which the job first entered the ‘started’ state, of type datetime.datetime
finish_time: the absolute time at which the job first entered the ‘finished’ state, of type datetime.datetime
_deserialize_date
Deserialise a date.
value – the date value
the deserialised date
_deserialize_job_state
Return an instance of JobState from the job_state string.
_serialize_date
Serialise a date value.
value – the value to serialise
the serialised value
_serialize_job_state
Return the serialized value of the JobState instance.
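The date (de)serialisation pattern can be sketched as a simple string round trip; the helper names and format string below are illustrative, and the real JobInfo methods may also handle timezone information:

```python
from datetime import datetime

# Hypothetical stand-ins for JobInfo._serialize_date / _deserialize_date;
# this sketch only illustrates the round-trip pattern described above.
DATE_FORMAT = '%Y-%m-%dT%H:%M:%S'

def serialize_date(value):
    """Serialise a datetime to a string (None passes through)."""
    return None if value is None else value.strftime(DATE_FORMAT)

def deserialize_date(value):
    """Deserialise a date string back to a datetime (None passes through)."""
    return None if value is None else datetime.strptime(value, DATE_FORMAT)
```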
_special_serializers
deserialize_field
Deserialise the value of a particular field with a type.
value – the value
field_type – the field type
the deserialised value
get_dict
Serialise the current data into a dictionary that is JSON-serializable.
A dictionary
load_from_dict
Create a new instance loading the values from serialised data in dictionary form
data – The dictionary with the data to load from
load_from_serialized
Create a new instance loading the values from JSON-serialised data as a string
data – The string with the JSON-serialised data to load from
serialize
Serialize the current data (as obtained by self.get_dict()) into a JSON string.
A string with serialised representation of the current data.
serialize_field
Serialise a particular field value
value – The value to serialise
field_type – The field type
The serialised value
NodeNumberJobResource
Bases: aiida.schedulers.datastructures.JobResource
JobResource for schedulers that support the specification of a number of nodes and cpus per node.
__init__
Initialize the job resources from the passed arguments.
attribute dictionary with the parsed parameters populated
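A sketch of the consistency rule such a resource class can enforce, assuming any two of the three keys determine the third (hypothetical helper, not the actual AiiDA implementation):

```python
# Hedged sketch: any two of num_machines, num_mpiprocs_per_machine and
# tot_num_mpiprocs determine the third, and all three must satisfy
# num_machines * num_mpiprocs_per_machine == tot_num_mpiprocs.
def complete_node_resources(num_machines=None, num_mpiprocs_per_machine=None,
                            tot_num_mpiprocs=None):
    if num_machines and num_mpiprocs_per_machine:
        tot = num_machines * num_mpiprocs_per_machine
        if tot_num_mpiprocs is not None and tot_num_mpiprocs != tot:
            raise ValueError('tot_num_mpiprocs is inconsistent')
        tot_num_mpiprocs = tot
    elif tot_num_mpiprocs and num_machines:
        if tot_num_mpiprocs % num_machines:
            raise ValueError('tot_num_mpiprocs is not a multiple of num_machines')
        num_mpiprocs_per_machine = tot_num_mpiprocs // num_machines
    elif tot_num_mpiprocs and num_mpiprocs_per_machine:
        if tot_num_mpiprocs % num_mpiprocs_per_machine:
            raise ValueError('tot_num_mpiprocs is not a multiple of num_mpiprocs_per_machine')
        num_machines = tot_num_mpiprocs // num_mpiprocs_per_machine
    else:
        raise ValueError('at least two of the three keys are required')
    return num_machines, num_mpiprocs_per_machine, tot_num_mpiprocs
```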
ParEnvJobResource
JobResource for schedulers that support the specification of a parallel environment and number of MPI procs.
Initialize the job resources from the passed arguments (the valid keys can be obtained with the function self.get_valid_keys()).
MachineInfo
Similar to what DRMAA v.2 defines as SlotInfo, this identifies each machine (also called ‘node’ on some schedulers) on which a job is running, and how many CPUs it is using. (Some fields may be undefined.)
name: name of the machine
num_cpus: number of cores used by the job on this machine
num_mpiprocs: number of MPI processes used by the job on this machine
Scheduler
Bases: object
Base class for a job scheduler.
_features
_get_detailed_job_info_command
Return the command to run to get detailed information for a given job.
This is typically called after the job has finished, to retrieve the most detailed information possible about the job. This is done because most schedulers just make finished jobs disappear from the qstat command, and instead sometimes it is useful to know some more detailed information about the job exit status, etc.
aiida.common.exceptions.FeatureNotAvailable
_get_joblist_command
Return the command to get the most complete description possible of currently active jobs.
Note
Typically one can pass only either jobs or user, depending on the specific plugin. The choice can be made according to the value returned by self.get_feature('can_query_by_user').
jobs – either None to get a list of all jobs in the machine, or a list of jobs.
user – either None, or a string with the username (to show only jobs of the specific user).
_get_kill_command
Return the command to kill the job with specified jobid.
_get_run_line
Return a string with the line to execute a specific code with specific arguments.
codes_info – a list of aiida.common.datastructures.CodeInfo objects. Each contains the information needed to run the code. I.e. cmdline_params, stdin_name, stdout_name, stderr_name, join_files. See the documentation of JobTemplate and CodeInfo.
codes_run_mode – instance of aiida.common.datastructures.CodeRunMode contains the information on how to launch the multiple codes.
string with format: [executable] [args] {[ < stdin ]} {[ > stdout ]} {[2>&1 | 2> stderr]}
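The run-line format above can be sketched as follows; this hypothetical helper ignores the escaping and multi-code CodeRunMode handling of the real implementation:

```python
# Hedged sketch of assembling a run line in the format shown above;
# the real logic lives in Scheduler._get_run_line.
def build_run_line(cmdline_params, stdin_name=None, stdout_name=None,
                   stderr_name=None, join_files=False):
    line = ' '.join(cmdline_params)
    if stdin_name:
        line += f' < {stdin_name}'
    if stdout_name:
        line += f' > {stdout_name}'
    if join_files:
        # stderr goes to the same file as stdout.
        line += ' 2>&1'
    elif stderr_name:
        line += f' 2> {stderr_name}'
    return line
```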
_get_submit_command
Return the string to execute to submit a given script.
Warning
the submit_script should already have been bash-escaped
submit_script – the path of the submit script relative to the working directory.
the string to execute to submit a given script.
_get_submit_script_footer
Return the submit script final part, using the parameters from the job template.
job_tmpl – a JobTemplate instance with relevant parameters set.
_get_submit_script_header
Return the submit script header, using the parameters from the job template.
_job_resource_class
_logger
_parse_joblist_output
Parse the joblist output as returned by executing the command returned by _get_joblist_command method.
a list of JobInfo objects, one for each job, each with at least its default params implemented.
_parse_kill_output
Parse the output of the kill command.
True if everything seems ok, False otherwise.
_parse_submit_output
Parse the output of the submit command returned by calling the _get_submit_command command.
a string with the job ID.
create_job_resource
Create a suitable job resource from the kwargs specified.
get_detailed_job_info
Return the detailed job info.
This will be a dictionary with the return value, stderr and stdout content returned by calling the command that is returned by _get_detailed_job_info_command.
job_id – the job identifier
dictionary with retval, stdout and stderr.
get_detailed_jobinfo
Return a string with the output of the detailed_jobinfo command.
Deprecated since version 1.1.0: Will be removed in v2.0.0, use aiida.schedulers.scheduler.Scheduler.get_detailed_job_info() instead.
At the moment, the output text is just retrieved and stored for logging purposes, but no parsing is performed.
get_feature
get_jobs
Return the list of currently active jobs.
typically, only either jobs or user can be specified. See also comments in _get_joblist_command.
jobs (list) – a list of jobs to check; only these are checked
user (str) – a string with a user: only jobs of this user are checked
as_dict (bool) – if False (default), a list of JobInfo objects is returned. If True, a dictionary is returned, having as key the job_id and as value the JobInfo object.
list of active jobs
get_short_doc
Return the first non-empty line of the class docstring, if available.
get_submit_script
Return the submit script as a string.
job_tmpl – an aiida.schedulers.datastructures.JobTemplate instance.
The plugin returns something like
#!/bin/bash                         <- this shebang line is configurable to some extent
scheduler-dependent stuff to choose numnodes, numcores, walltime, ...
prepend_computer                    [also from calcinfo, joined with the following?]
prepend_code                        [from calcinfo]
output of _get_script_main_content
postpend_code
postpend_computer
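The assembly of these sections can be sketched as a simple join (hypothetical helper; the real method delegates to the plugin's _get_submit_script_header and _get_submit_script_footer):

```python
# Hypothetical sketch of assembling the submit script from its sections;
# the real Scheduler.get_submit_script builds these via plugin hooks.
def assemble_submit_script(shebang, header, prepend_text, run_line, append_text):
    sections = [shebang or '#!/bin/bash', header, prepend_text, run_line, append_text]
    # Empty sections are skipped so the script has no blank placeholder lines.
    return '\n'.join(section for section in sections if section)
```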
get_valid_schedulers
Return all available scheduler plugins.
Deprecated since version 1.3.0: Will be removed in 2.0.0, use aiida.plugins.entry_point.get_entry_point_names instead
job_resource_class
kill
Kill a remote job and parse the return value of the scheduler to check if the command succeeded.
Note
On some schedulers, even if the command is accepted, it may take some seconds for the job to actually disappear from the queue.
jobid – the job ID to be killed
logger
Return the internal logger.
preprocess_resources
Preprocess the resources.
Add the num_mpiprocs_per_machine key to the resources if it is not already defined and it cannot be deduced from the num_machines and tot_num_mpiprocs being defined. The value is also not added if the job resource class of this scheduler does not accept the num_mpiprocs_per_machine keyword. Note that the changes are made in place to the resources argument passed.
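A hedged sketch of this preprocessing, with hypothetical names (the actual method also checks whether the job resource class accepts the num_mpiprocs_per_machine keyword, which this sketch omits):

```python
# Hedged sketch: fill in num_mpiprocs_per_machine from a default when it
# is neither given nor deducible from num_machines and tot_num_mpiprocs.
# Mutates resources in place, mirroring the documented behaviour.
def preprocess_resources(resources: dict, default_mpiprocs_per_machine=None):
    num_machines = resources.get('num_machines')
    tot = resources.get('tot_num_mpiprocs')
    if ('num_mpiprocs_per_machine' not in resources
            and not (num_machines and tot)
            and default_mpiprocs_per_machine is not None):
        resources['num_mpiprocs_per_machine'] = default_mpiprocs_per_machine
```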
set_transport
Set the transport to be used to query the machine or to submit scripts.
This class assumes that the transport is open and active.
submit_from_script
Submit the submission script to the scheduler.
return a string with the job ID in a valid format to be used for querying.
transport
Return the transport set for this scheduler.
resources – keyword arguments to define the job resources
SchedulerError
Bases: aiida.common.exceptions.AiidaException
SchedulerParsingError
Bases: aiida.schedulers.scheduler.SchedulerError
Data structures used by Scheduler instances.
In particular, there is the definition of possible job states (job_states), the data structure to be filled for job submission (JobTemplate), and the data structure that is returned when querying for jobs in the scheduler (JobInfo).
aiida.schedulers.datastructures.
Implementation of Scheduler base class.
aiida.schedulers.scheduler.