Usage#
Note
This chapter assumes knowledge of the previous section on the basic concept of processes.
This section will explain the aspects of working with processes that apply to all processes. Details that pertain only to a specific subtype of process are documented in their respective sections:
Defining processes#
Process specification#
How a process defines the inputs that it requires or can optionally take depends on the process type.
The inputs of a CalcJob and WorkChain are given by the ProcessSpec class, which is defined through the define() method.
For process functions, the ProcessSpec is dynamically generated by the engine from the signature of the decorated function.
Therefore, to determine what inputs a process takes, one simply has to look at the process specification in the define method or the function signature.
For the CalcJob and WorkChain there is also the concept of the process builder, which allows one to inspect the inputs with tab-completion and help strings in the shell.
The three most important attributes of the ProcessSpec are:
inputs
outputs
exit_codes
Through these attributes, one can define what inputs a process takes, what outputs it will produce and what potential exit codes it can return in case of errors.
Just by looking at a process specification then, one will know exactly what will happen, just not how it will happen.
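As an illustration, the following is a minimal sketch of what such a specification could look like for a hypothetical work chain that adds two numbers (the outline and step implementations are omitted for brevity):

from aiida.engine import WorkChain
from aiida.orm import Int

class AddWorkChain(WorkChain):
    """Hypothetical work chain illustrating the three main ProcessSpec attributes."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        # Inputs: what the process takes
        spec.input('x', valid_type=Int)
        spec.input('y', valid_type=Int)
        # Outputs: what the process will produce
        spec.output('sum', valid_type=Int)
        # Exit codes: what can go wrong
        spec.exit_code(400, 'ERROR_NEGATIVE_SUM', 'the sum turned out to be negative')

Reading just this define method reveals the complete interface of the process: its inputs, outputs and failure modes.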
The inputs and outputs attributes are namespaces that contain so-called ports, each of which represents a specific input or output.
The namespaces can be arbitrarily nested with ports and are therefore called port namespaces.
The port and port namespace are implemented by the Port and PortNamespace class, respectively.
Ports and Port namespaces#
To define an input for a process specification, we only need to add a port to the inputs port namespace, as follows:
spec = ProcessSpec()
spec.input('parameters')
The input() method will create an instance of InputPort, a subclass of the base Port, and add it to the inputs port namespace of the spec.
Creating an output is just as easy, but one should use the output() method instead:
spec = ProcessSpec()
spec.output('result')
This will cause an instance of OutputPort, also a subclass of the base Port, to be created and added to the outputs attribute of the specification.
Recall that the inputs and outputs attributes are instances of a PortNamespace, which means that they can contain any port.
But the PortNamespace is itself also a port, so it can be added to another port namespace, allowing one to create nested port namespaces.
Creating a new namespace, for example in the inputs namespace, is as simple as:
spec = ProcessSpec()
spec.input_namespace('namespace')
This will create a new PortNamespace named namespace in the inputs namespace of the spec.
You can create arbitrarily nested namespaces in one statement, by separating them with a ., as shown here:
spec = ProcessSpec()
spec.input_namespace('nested.namespace')
This command will result in a PortNamespace named namespace nested inside another PortNamespace called nested.
Note
Because the period is reserved to denote different nested namespaces, it cannot be used in the name of terminal input and output ports as that could be misinterpreted later as a port nested in a namespace.
Graphically, this can be visualized as a nested dictionary and will look like the following:
'inputs': {
    'nested': {
        'namespace': {}
    }
}
The outputs attribute of the ProcessSpec is also a PortNamespace, just like the inputs, with the only difference being that it will create OutputPort instead of InputPort instances.
Therefore, the same concept of nesting through port namespaces applies to the outputs of a ProcessSpec.
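For example, a nested output namespace can be created analogously through the output_namespace() method, which mirrors input_namespace(); a minimal sketch:

spec = ProcessSpec()
spec.output_namespace('nested.namespace')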
Validation and defaults#
In the previous section, we saw that the ProcessSpec uses the PortNamespace, InputPort and OutputPort to define the structure of the inputs and outputs of the Process.
The underlying concept that allows this nesting of ports is that the PortNamespace, InputPort and OutputPort are all subclasses of Port.
As subclasses of the same class, they have several properties and attributes in common, for example those related to the concept of validation and default values.
All three have the following attributes (with the exception of the OutputPort, which does not have a default attribute):
default
required
valid_type
validator
These attributes can all be set upon construction of the port or after the fact, as long as the spec has not been sealed, which means that they can be altered freely, as long as it happens within the define method of the corresponding Process.
An example input port that explicitly sets all these attributes is the following:
spec.input('positive_number', required=False, default=lambda: Int(1), valid_type=(Int, Float), validator=is_number_positive)
Here we define an input named positive_number that should be of type Int or Float and should pass the test of the is_number_positive validator.
If no value is passed, the default will be used.
Warning
In python, it is good practice to avoid mutable defaults for function arguments, since they are instantiated at function definition and reused for each invocation. This can lead to unexpected results when the default value is changed between function calls. In the context of AiiDA, nodes (both stored and unstored) are considered mutable and should therefore not be used as default values for process ports. However, it is possible to use a lambda that returns a node instance as done in the example above. This will return a new instance of the node with the given value, each time the process is instantiated.
Note that the validator is nothing more than a free function which takes a single argument, being the value that is to be validated. If nothing is returned, the value is considered to be valid. To signal that the value is invalid and to have a validation error raised, simply return a string with the validation error message, for example:
def is_number_positive(number):
    if number < 0:
        return 'The number has to be greater or equal to zero'
The valid_type can define a single type, or a tuple of valid types.
New in version 2.1: Optional ports can now accept None
If a port is marked as optional through required=False and defines a valid_type, the port will also accept None as a value, whereas before this would raise a validation error.
This is accomplished by automatically adding the NoneType to the valid_type tuple.
Ports that do not define a valid_type are not affected.
Note
Note that by default all ports are required, but specifying a default value implies that the input is not required, and as such specifying required=False is not necessary in that case.
It was added to the example above simply for clarity.
The validation of input and output values with respect to the specification of the corresponding port happens at the instantiation of the process and when it is finalized, respectively.
If the inputs are invalid, a corresponding exception will be thrown and the process instantiation will fail.
When the outputs fail to be validated, likewise an exception will be thrown and the process state will be set to Excepted.
Dynamic namespaces#
In the previous section we described the various attributes related to validation and claimed that all the port variants share those attributes, yet we only discussed the InputPort and OutputPort explicitly.
The statement, however, is still correct and the PortNamespace has the same attributes.
You might then wonder what the meaning of a valid_type or default is for a PortNamespace, if all it does is contain InputPorts, OutputPorts or other PortNamespaces.
The answer to this question lies in the PortNamespace attribute dynamic.
Often, when designing the specification of a Process, we cannot know exactly which inputs we want to be able to pass to the process.
However, with the concept of the InputPort and OutputPort, one does need to know exactly how many values to expect, as each port has to be defined explicitly.
This is where the dynamic attribute of the PortNamespace comes in.
By default it is set to False, but by setting it to True, one indicates that the namespace can take a number of values that is unknown at the time of definition of the specification.
This now explains the meaning of the valid_type, validator and default attributes in the context of the PortNamespace.
If you do mark a namespace as dynamic, you may still want to limit the set of values that are acceptable, which you can do by specifying the valid type and/or validator.
The values that are eventually passed to the port namespace will then be validated according to these rules, exactly as a value for a regular input port would be.
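A minimal sketch of a dynamic namespace that accepts any number of Int inputs (the namespace name factors is hypothetical):

spec = ProcessSpec()
spec.input_namespace('factors', valid_type=Int, dynamic=True)

# Any number of values can now be passed under the namespace, without the
# individual ports having been declared, e.g. in the inputs dictionary:
inputs = {'factors': {'a': Int(2), 'b': Int(3), 'c': Int(5)}}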
Non storable inputs#
In principle, the only valid types for inputs and outputs are instances of a Data node, or one of its subclasses, as that is the only data type that can be recorded in the provenance graph as an input or output of a process.
However, there are cases where you might want to pass an input to a process whose provenance you do not care about, and therefore would want to pass a non-database-storable type anyway.
Note
AiiDA allows you to break the provenance, so as not to be too restrictive, but it always tries to urge and guide you in a direction that keeps the provenance. There are legitimate reasons to break it regardless, but make sure you think about the implications and whether you are really willing to lose the information.
For this situation, the InputPort has the attribute non_db.
By default this is set to False, but by setting it to True we can indicate that the values passed to the port should not be stored as a node in the provenance graph and linked to the process node.
This allows one to pass any normal value that one would also be able to pass to a normal function.
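A sketch of such a port, with a hypothetical options input:

spec.input('options', non_db=True)
# Any plain Python value, e.g. a dictionary, can now be passed for
# `options`; it will not be stored as a node in the provenance graph.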
Automatic input serialization#
Quite often, inputs which are given as Python data types need to be cast to the corresponding AiiDA type before passing them to a process.
Doing this manually can be cumbersome, so you can define a function when defining the process specification, which does the conversion automatically.
This function, passed as the serializer parameter to spec.input, is invoked if the given input is not None and not already an AiiDA type.
For inputs which are stored in the database (non_db=False), the serialization function should return an AiiDA data type.
For non_db inputs, the function must be idempotent, because it might be applied more than once.
The following example work chain takes three inputs a, b and c, and simply returns the given inputs.
The to_aiida_type() function is used as the serialization function.
from aiida.engine import WorkChain
from aiida.orm import to_aiida_type

class SerializeWorkChain(WorkChain):
    @classmethod
    def define(cls, spec):
        super().define(spec)

        spec.input('a', serializer=to_aiida_type)
        spec.input('b', serializer=to_aiida_type)
        spec.input('c', serializer=to_aiida_type)

        spec.outline(cls.echo)

    def echo(self):
        self.out('a', self.inputs.a)
        self.out('b', self.inputs.b)
        self.out('c', self.inputs.c)
This work chain can now be called with native Python types, which will automatically be converted to AiiDA types by the to_aiida_type() function.
Note that the module which defines the corresponding AiiDA type must be loaded for it to be recognized by to_aiida_type().
#!/usr/bin/env runaiida
from aiida.engine import run
from serialize_workchain import SerializeWorkChain
if __name__ == '__main__':
    print(run(SerializeWorkChain, a=1, b=1.2, c=True))
    # Result: {'a': 1, 'b': 1.2, 'c': True}
Of course, you can also use the serialization feature to perform a more complex serialization of the inputs.
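For instance, a custom serializer could wrap plain Python lists in an AiiDA List node; a minimal sketch, where the wrap_list function and the values port are purely illustrative:

from aiida.orm import List

def wrap_list(value):
    # Wrap plain lists in a List node; pass all other values through as-is.
    return List(value) if isinstance(value, list) else value

spec.input('values', serializer=wrap_list)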
Exit codes#
Any Process will most likely have one or more expected failure modes.
To clearly communicate to the caller what went wrong, the Process supports setting its exit_status.
This exit_status, a non-negative integer, is an attribute of the process node and, by convention, a value of zero means the process was successful, whereas any other value indicates failure.
This concept of an exit code, with an integer as the exit status, is common in programming and a standard way for programs to communicate the result of their execution.
Potential exit codes for the Process can be defined through the ProcessSpec, just like inputs and outputs.
An exit code consists of a positive non-zero integer, a string label to reference it and a more detailed description of the problem that triggers the exit code.
Consider the following example:
spec = ProcessSpec()
spec.exit_code(418, 'ERROR_I_AM_A_TEAPOT', 'the process had an identity crisis')
This defines an exit code for the Process with exit status 418 and exit message 'the process had an identity crisis'.
The string ERROR_I_AM_A_TEAPOT is a label that the developer can use to reference this particular exit code somewhere in the Process code itself.
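For example, in a work chain outline step, the label can be used to retrieve the exit code from the spec and return it, which terminates the process with the corresponding status; a minimal sketch (the step and its condition are hypothetical):

def inspect_result(self):
    # Returning an exit code from an outline step aborts the work chain
    # and sets the exit status and message on its process node.
    if self.ctx.is_teapot:  # hypothetical flag set in a previous step
        return self.exit_codes.ERROR_I_AM_A_TEAPOT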
Whenever a Process exits through a particular error code, the caller will be able to introspect it through the exit_status and exit_message attributes of the node.
Assume, for example, that we ran a Process that returned the exit code described above; the caller would be able to do the following:
in[1] node = load_node(<pk>)
in[2] node.exit_status
out[2] 418
in[3] node.exit_message
out[3] 'the process had an identity crisis'
This is useful, because the caller can now programmatically decide, based on the exit_status, how to proceed.
This is an infinitely more robust way of communicating specific errors to a non-human than parsing text-based logs or reports.
Additionally, the exit codes make it very easy to query for failed processes with specific error codes.
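For example, all processes that failed with the exit status defined above could be retrieved with the QueryBuilder; a minimal sketch:

from aiida.orm import ProcessNode, QueryBuilder

qb = QueryBuilder()
qb.append(ProcessNode, filters={'attributes.exit_status': 418})
failed = qb.all(flat=True)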
See also
Additional documentation, specific to certain process types, can be found in the following sections:
Exit code conventions#
In principle, the only restriction on the exit status of an exit code is that it should be a positive integer or zero. However, to make effective use of exit codes, there are some guidelines and conventions as to what integers to use. Note that since the following rules are guidelines, you can choose to ignore them and currently the engine will not complain, but this might change in the future. Regardless, we advise you to follow the guidelines, since it will improve the interoperability of your code with other existing plugins. The following integer ranges are reserved or suggested:
0 - 99: Reserved for internal use by aiida-core
100 - 199: Reserved for errors parsed from scheduler output of calculation jobs (note: this is not yet implemented)
200 - 299: Suggested to be used for process input validation errors
300 - 399: Suggested for critical process errors
For any other exit codes, one can use the integers from 400 and up.
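Applied to a process specification, these conventions might look as follows (the labels and messages are purely illustrative):

spec.exit_code(201, 'ERROR_INVALID_INPUT', 'the input parameters were incomplete')  # input validation error
spec.exit_code(301, 'ERROR_CALCULATION_FAILED', 'a critical error occurred')        # critical process error
spec.exit_code(401, 'ERROR_NOT_CONVERGED', 'the result failed to converge')         # freely usable range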
Process metadata#
Each process, in addition to the normal inputs defined through its process specification, can take optional ‘metadata’. These metadata differ from inputs in the sense that they are not nodes that will show up as inputs in the provenance graph of the executed process. Rather, they are inputs that slightly modify the behavior of the process or allow one to set attributes on the process node that represents its execution. The following metadata inputs are available for all process classes:
label: will set the label on the ProcessNode
description: will set the description on the ProcessNode
store_provenance: a boolean flag, by default True, that when set to False will ensure that the execution of the process is not stored in the provenance graph
Subclasses of the Process class can specify further metadata inputs; refer to their specific documentation for details.
To pass any of these metadata options to a process, simply pass them in a dictionary under the key metadata in the inputs when launching the process.
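As a sketch, the inputs for the ArithmeticAddCalculation used later in this chapter could include metadata as follows:

inputs = {
    'x': Int(1),
    'y': Int(2),
    'metadata': {
        'label': 'my addition',
        'description': 'an example calculation with metadata',
    },
}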
How a process can be launched is explained in the following section.
Launching processes#
Any process can be launched by ‘running’ or ‘submitting’ it. Running means to run the process in the current python interpreter in a blocking way, whereas submitting means to send it to a daemon worker over RabbitMQ. For long running processes, such as calculation jobs or complex workflows, it is advisable to submit to the daemon. This has the added benefit that it will directly return control to your interpreter and allow the daemon to save intermediate progress during checkpoints and reload the process from those if it has to restart. Running processes can be useful for trivial computational tasks, such as simple calcfunctions or workfunctions, or for debugging and testing purposes.
Process launch#
To launch a process, one can use the free functions that can be imported from the aiida.engine module.
There are four different functions:

run()
run_get_node()
run_get_pk()
submit()
As the names suggest, the first three will ‘run’ the process and the latter will ‘submit’ it to the daemon. Running means that the process will be executed in the same interpreter in which it is launched, blocking the interpreter until the process is terminated. Submitting to the daemon, in contrast, means that the process will be sent to the daemon for execution, and the interpreter is released straight away.
All functions have the exact same interface, launch(process, inputs), where:

process is the process class or process function to launch
inputs is the inputs dictionary to pass to the process.
Changed in version 2.5: Before AiiDA v2.5, the inputs could only be passed as keyword arguments.
This behavior is still supported, e.g., one can launch a process as launch(process, **inputs) or launch(process, input_a=value_a, input_b=value_b).
However, the recommended approach is now to use an input dictionary passed as the second positional argument.
The reason is that certain launchers define arguments themselves which can overlap with inputs of the process.
For example, the submit method defines the wait keyword.
If the process being launched also defines an input named wait, the launcher method cannot tell them apart.
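To illustrate the ambiguity with a hypothetical process class SomeProcess that defines an input port named wait:

node = submit(SomeProcess, wait=Bool(True))       # ambiguous: process input or launcher keyword?
node = submit(SomeProcess, {'wait': Bool(True)})  # unambiguous: clearly a process input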
What inputs can be passed depends on the exact process class that is to be launched.
For example, when we want to run an instance of the ArithmeticAddCalculation process, which takes two Int nodes as inputs under the names x and y [1], we would do the following:
from aiida import orm, plugins
from aiida.engine import submit
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
node = submit(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
The function will submit the calculation to the daemon and immediately return control to the interpreter, returning the node that is used to represent the process in the provenance graph.
Warning
For a process to be submittable, the class or function needs to be importable in the daemon environment, by a) giving it an associated entry point or b) including its module path in the PYTHONPATH that the daemon workers will have.
New in version 2.5: Waiting on a process
Use wait=True when calling submit to wait for the process to complete before returning the node.
This can be useful for tutorials and demos in interactive notebooks where the user should not continue before the process is done.
One could of course also use run (see below), but then the process would be lost if the interpreter gets accidentally shut down.
By using submit, the process is run by the daemon, which takes care of saving checkpoints so it can always be restarted in case of problems.
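A sketch of such a blocking submission, reusing the ArithmeticAddCalculation inputs from the example above:

node = submit(ArithmeticAddCalculation, {'x': orm.Int(1), 'y': orm.Int(2)}, wait=True)  # returns once the process has terminated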
If you need to launch multiple processes in parallel and want to wait for all of them to finish, simply use submit with the default wait=False and collect the returned nodes in a list.
You can then pass them to aiida.engine.launch.await_processes(), which will return once all processes have terminated:
from aiida.engine import submit, await_processes
nodes = []
for i in range(5):
    node = submit(...)
    nodes.append(node)
await_processes(nodes, wait_interval=10)
The await_processes function will loop every wait_interval seconds and check whether all processes (represented by the ProcessNodes in the nodes list) have terminated.
The run function is called identically:
from aiida import orm, plugins
from aiida.engine import run
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
result = run(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
except that it does not submit the process to the daemon, but executes it in the current interpreter, blocking it until the process is terminated.
The return value of the run function is also not the node that represents the executed process, but the results returned by the process, which is a dictionary of the nodes that were produced as outputs.
If you would still like to have the process node or the pk of the process node, you can use one of the following variants:
from aiida import orm, plugins
from aiida.engine import run_get_node, run_get_pk
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
result, node = run_get_node(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, pk = run_get_pk(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
Finally, the run() launcher has two attributes, get_node and get_pk, that are simple proxies to the run_get_node() and run_get_pk() methods.
This is a handy shortcut, as now you can choose to use any of the three variants with just a single import:
from aiida import orm, plugins
from aiida.engine import run
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
result = run(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, node = run.get_node(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, pk = run.get_pk(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
If you want to launch a process class that takes many inputs, it is often useful to define them in a dictionary and pass that as the second positional argument (alternatively, the python ** syntax can expand the dictionary into keyword argument and value pairs).
The example used above would then look like the following:
from aiida import orm, plugins
from aiida.engine import submit
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
inputs = {'x': orm.Int(1), 'y': orm.Int(2)}
node = submit(ArithmeticAddCalculation, inputs)
Process functions, i.e. calculation functions and work functions, can be launched like any other process as explained above. Process functions have two additional methods of being launched:
Simply calling the function
Using the internal run method attributes
Using a calculation function to add two numbers as an example, these two methods look like the following:
from aiida.engine import calcfunction
from aiida.orm import Int
@calcfunction
def add(x, y):
    return x + y
x = Int(1)
y = Int(2)
result = add(x, y)
result, node = add.run_get_node(x, y)
result, pk = add.run_get_pk(x, y)
Process builder#
As explained in a previous section, the inputs for a CalcJob and WorkChain are defined in the define() method.
To know what inputs they take, one would have to read the implementation, which can be annoying if you are not a developer.
To simplify this process, these two process classes provide a utility called the ‘process builder’.
The process builder is essentially a tool that helps you build the inputs for the specific process class that you want to run.
To get a builder for a particular CalcJob or WorkChain implementation, all you need is the class itself, which can be loaded through the CalculationFactory and WorkflowFactory, respectively.
Let’s take the ArithmeticAddCalculation as an example:
ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')
builder = ArithmeticAddCalculation.get_builder()
The string core.arithmetic.add is the entry point of the ArithmeticAddCalculation and passing it to the CalculationFactory will return the corresponding class.
Calling the get_builder method on that class will return an instance of the ProcessBuilder class that is tailored for the ArithmeticAddCalculation.
The builder will help you in defining the inputs that the ArithmeticAddCalculation requires and has a few handy tools to simplify this process.
To find out which inputs the builder exposes, you can simply use tab completion.
In an interactive python shell, by simply typing builder. and hitting the tab key, a complete list of all the available inputs will be shown.
Each input of the builder can also show additional information about what sort of input it expects.
In an interactive shell, you can get this information to display as follows:
builder.code?
Type: property
String form: <property object at 0x7f04c8ce1c00>
Docstring:
"name": "code",
"required": "True"
"non_db": "False"
"valid_type": "<class 'aiida.orm.nodes.data.code.abstract.AbstractCode'>"
"help": "The Code to use for this job.",
In the Docstring you will see a help string that contains more detailed information about the input port.
Additionally, it will display the valid_type, which, when defined, shows which data types are expected.
If a default value has been defined, that will also be displayed.
The non_db attribute defines whether that particular input will be stored as a proper input node in the database, if the process is submitted.
Defining an input through the builder is as simple as assigning a value to the attribute.
The following example shows how to set the x and y inputs, as well as the description and label metadata inputs:
builder.metadata.label = 'This is my calculation label'
builder.metadata.description = 'An example calculation to demonstrate the process builder'
builder.x = Int(1)
builder.y = Int(2)
If you evaluate the builder instance, simply by typing the variable name and hitting enter, the current values of the builder’s inputs will be displayed:
builder
{
    'metadata': {
        'description': 'An example calculation to demonstrate the process builder',
        'label': 'This is my calculation label',
        'options': {},
    },
    'x': Int<uuid='a1798492-bbc9-4b92-a630-5f54bb2e865c' unstored>,
    'y': Int<uuid='39384da4-6203-41dc-9b07-60e6df24e621' unstored>
}
In this example, you can see the values that we just set for the description and the label.
In addition, it will also show any namespaces, as the inputs of processes support nested namespaces, such as the metadata.options namespace in this example.
Note that nested namespaces are also all autocompleted, and you can traverse them recursively with tab-completion.
All that remains is to fill in all the required inputs and we are ready to launch.
When all the inputs have been defined for the builder, it can be used to actually launch the Process.
The process can be launched by passing the builder to any of the free functions of the launch module, just as you would a normal process, as described above, i.e.:
from aiida import orm, plugins
from aiida.engine import submit
ArithmeticAddCalculation = plugins.CalculationFactory('core.arithmetic.add')
builder = ArithmeticAddCalculation.get_builder()
builder.x = orm.Int(1)
builder.y = orm.Int(2)
node = submit(builder)
Note that the process builder is in principle designed to be used in an interactive shell, as that is where the tab-completion and automatic input documentation really shine. However, it is perfectly possible to use the same builder in scripts, where you simply use it as an input container, instead of a plain python dictionary.
Monitoring processes#
When you have launched a process, you may want to investigate its status, progress and results. The verdi command line tool provides various commands to do just this.
verdi process list#
Your first point of entry will be the verdi command verdi process list.
Each process is represented in the database by a ProcessNode; this command will print a list of all active processes through those nodes.
A typical example may look something like the following:
PK Created State Process label Process status
---- ---------- ------------ -------------------------- ----------------------
151 3h ago ⏵ Running ArithmeticAddCalculation
156 1s ago ⏹ Created ArithmeticAddCalculation
Total results: 2
The ‘State’ column is a concatenation of the process_state and the exit_status of the ProcessNode.
By default, the command will only show active items, i.e. ProcessNodes that have not yet reached a terminal state.
If you want to also show the nodes in a terminal state, you can use the -a flag and call verdi process list -a:
PK Created State Process label Process status
---- ---------- --------------- -------------------------- ----------------------
143 3h ago ⏹ Finished [0] add
146 3h ago ⏹ Finished [0] multiply
151 3h ago ⏵ Running ArithmeticAddCalculation
156 1s ago ⏹ Created ArithmeticAddCalculation
Total results: 4
For more information on the meaning of the ‘state’ column, please refer to the documentation of the process state.
The -S flag lets you query for specific process states, i.e. issuing verdi process list -S created will return:
PK Created State Process label Process status
---- ---------- ------------ -------------------------- ----------------------
156 1s ago ⏹ Created ArithmeticAddCalculation
Total results: 1
To query for a specific exit status, one can use verdi process list -E 0:
PK Created State Process label Process status
---- ---------- ------------ -------------------------- ----------------------
143 3h ago ⏹ Finished [0] add
146 3h ago ⏹ Finished [0] multiply
Total results: 2
This simple tool should give you a good idea of the current status of running processes and the status of terminated ones. For a complete list of all the available options, please refer to the documentation of verdi process.
If you are looking for information about a specific process node, the following three commands are at your disposal:
verdi process report: gives a list of the log messages attached to the process
verdi process status: prints the call hierarchy of the process and the status of all its nodes
verdi process show: prints details about the status, inputs, outputs, callers and callees of the process
In the following sections, we will briefly explain how these commands work.
For the purpose of example, we will show the output of the commands for a completed PwBaseWorkChain from the aiida-quantumespresso plugin, which simply calls a PwCalculation.
verdi process report#
The developer of a process can attach log messages to the node of a process through the report() method.
The verdi process report command will display all the log messages in chronological order:
2018-04-08 21:18:51 [164 | REPORT]: [164|PwBaseWorkChain|run_calculation]: launching PwCalculation<167> iteration #1
2018-04-08 21:18:55 [164 | REPORT]: [164|PwBaseWorkChain|inspect_calculation]: PwCalculation<167> completed successfully
2018-04-08 21:18:56 [164 | REPORT]: [164|PwBaseWorkChain|results]: work chain completed after 1 iterations
2018-04-08 21:18:56 [164 | REPORT]: [164|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
The log message will include a timestamp, followed by the level of the log, which is always REPORT.
The second block has the format pk|class name|function name, detailing information about, in this case, the work chain itself and the step in which the message was fired.
Finally, the message itself is displayed.
Of course how many messages are logged and how useful they are is up to the process developer.
In general they can be very useful for a user to understand what has happened during the execution of the process, however, one has to realize that each entry is stored in the database, so overuse can unnecessarily bloat the database.
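As an illustration, a message like those shown above would be emitted from within a work chain step as follows (a sketch; the step name and the calculation stored in the context are hypothetical):

def inspect_calculation(self):
    # Attaches a REPORT-level log message to the node of this work chain.
    self.report(f'PwCalculation<{self.ctx.calculation.pk}> completed successfully')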
verdi process status#
This command is most useful for WorkChain instances, but also works for CalcJobs.
One of the more powerful aspects of work chains is that they can call CalcJobs and other WorkChains to create a nested call hierarchy.
If you want to inspect the status of a work chain and all the children that it called, verdi process status is the go-to tool.
An example output is the following:
PwBaseWorkChain <pk=164> [ProcessState.FINISHED] [4:results]
└── PwCalculation <pk=167> [FINISHED]
The command prints a tree representation of the hierarchical call structure, which recurses all the way down.
In this example, there is just a single PwBaseWorkChain, which called a PwCalculation, as indicated by the latter being indented one level.
In addition to the call tree, each node also shows its current process state and, for work chains, at which step in the outline they are.
This tool can be very useful to inspect, while a work chain is running, at which step in the outline it currently is, as well as the status of all the children calculations it called.
verdi process show#
Finally, there is a command that displays detailed information about the ProcessNode, such as its inputs, outputs and optionally the other processes it called and/or was called by.
An example output for a PwBaseWorkChain would look like the following:
Property Value
------------- ------------------------------------
type WorkChainNode
pk 164
uuid 08bc5a3c-da7d-44e0-a91c-dda9ddcb638b
label
description
ctime 2018-04-08 21:18:50.850361+02:00
mtime 2018-04-08 21:18:50.850372+02:00
process state ProcessState.FINISHED
exit status 0
code pw-v6.1
Inputs PK Type
-------------- ---- -------------
parameters 158 Dict
structure 140 StructureData
kpoints 159 KpointsData
pseudo_family 161 Str
max_iterations 163 Int
clean_workdir 160 Bool
options 162 Dict
Outputs PK Type
----------------- ---- -------------
output_band 170 BandsData
remote_folder 168 RemoteData
output_parameters 171 Dict
output_array 172 ArrayData
Called PK Type
-------- ---- -------------
CALL 167 PwCalculation
Log messages
---------------------------------------------
There are 4 log messages for this calculation
Run 'verdi process report 164' to see them
This overview should give you all the information you need to inspect a process’ inputs and outputs in closer detail, as it provides you with their pks.
Manipulating processes#
To understand how one can manipulate running processes, one first has to understand the principles of the process/node distinction and a process’ lifetime, so be sure to have read those sections.
verdi process pause/play/kill#
The verdi command line interface provides three commands to interact with ‘live’ processes:
verdi process pause
verdi process play
verdi process kill
The first pauses a process temporarily, the second resumes any paused processes and the third one permanently kills them. The sub command names might seem to tell you this already and it might look like that is all there is to know, but the functionality underneath is quite complicated and deserves additional explanation nonetheless.
As the section on the distinction between the process and the node explained, manipulating a process means interacting with the live process instance that lives in the memory of the runner that is running it. By definition, these runners will always run in a different system process than the one from which you want to interact, because otherwise, you would be the runner, given that there can only be a single runner in an interpreter and if it is running, the interpreter would be blocked from performing any other operations. This means that in order to interact with the live process, one has to interact with another interpreter running in a different system process. This is once again facilitated by the RabbitMQ message broker. When a runner starts to run a process, it will also add listeners for incoming messages that are being sent for that specific process over RabbitMQ.
Note
This does not just apply to daemon runners, but also local runners. If you were to launch a process in a local runner, that interpreter will be blocked, but it will still set up the listeners for that process on RabbitMQ. This means that you can manipulate the process from another terminal, just as you would do with a process that is being run by a daemon runner.
In the case of ‘pause’, ‘play’ and ‘kill’, one is sending what is called a Remote Procedure Call (RPC) over RabbitMQ. The RPC will include the process identifier for which the action is intended and RabbitMQ will send it to whoever registered itself to be listening for that specific process, in this case the runner that is running the process. This immediately reveals a potential problem: the RPC will fall on deaf ears if there is no one listening, which can have multiple causes. For example, as explained in the section on a process’ lifetime, this can be the case for a submitted process, where the corresponding task is still queued, as all available process slots are occupied. But even if the task were to be with a runner, it might be too busy to respond to the RPC and the process appears to be unreachable. Whenever a process is unreachable for an RPC, the command will return an error:
Error: Process<100> is unreachable
Depending on the cause of the process being unreachable, the problem may resolve itself automatically over time and one can try again at a later time, as for example in the case of the runner being too busy to respond. To minimize these issues, the runner has been designed to have the communication happen over a separate thread and to schedule callbacks for any necessary actions on the main thread, which performs all the heavy lifting. Unfortunately, there is no easy way of telling what the actual problem is for the process not being reachable. The problem will manifest itself identically if the runner just could not respond in time or if the task has accidentally been lost forever due to a bug, even though these are two completely separate situations.
This brings us to another potential unintuitive aspect of interacting with processes. The previous paragraph already mentioned it in passing, but when a remote procedure call is sent, it first needs to be answered by the responsible runner, if applicable, but it will not directly execute the call. This is because the call will be incoming on the communication thread which is not allowed to have direct access to the process instance, but instead it will schedule a callback on the main thread which can perform the action. The callback will however not necessarily be executed directly, as there may be other actions waiting to be performed. So when you pause, play or kill a process, you are not doing so directly, but rather you are scheduling a request to do so. If the runner has successfully received the request and scheduled the callback, the command will therefore show something like the following:
Success: scheduled killing Process<100>
The ‘scheduled’ indicates that the actual killing might not necessarily have happened just yet.
This means that even after having called verdi process kill and getting the success message, the corresponding process may still be listed as active in the output of verdi process list.
By default, the pause, play and kill commands will only ask for the confirmation of the runner that the request has been scheduled, and not actually wait for the command to have been executed.
To change this behavior, you can use the --wait flag to actually wait for the action to be completed.
If workers are under heavy load, it may take some time for them to respond to the request and for the command to finish.
If you know that your daemon runners may be experiencing a heavy load, you can also increase the time that the command waits before timing out, with the -t/--timeout flag.
The processes API#
The functionality of verdi process to play, pause and kill processes is also made available through the aiida.engine.processes.control module.
Processes can be played, paused or killed through the play_processes(), pause_processes() and kill_processes() functions, respectively:
from aiida.engine.processes.control import kill_processes, pause_processes, play_processes

processes = [load_node(<PK1>), load_node(<PK2>)]
pause_processes(processes)  # Pause the processes
play_processes(processes)   # Play them again
kill_processes(processes)   # Kill the processes
Instead of specifying an explicit list of processes, the functions also take the all_entries keyword argument:
pause_processes(all_entries=True) # Pause all running processes