How to write a plugin for an external code#
Tip
Before starting to write a new plugin, check the aiida plugin registry. If a plugin for your code is already available, you can skip straight to How to run external codes.
Tip
This how to walks you through all logical steps of how AiiDA interacts with an external code. If you already know the basics and would like to get started with a new plugin package quickly, check out How to package plugins.
To run an external code with AiiDA, you need a corresponding calculation plugin, which tells AiiDA how to:
Prepare the required input files.
Run the code with the correct command line parameters.
Finally, you will probably want a parser plugin, which tells AiiDA how to:
Parse the output of the code.
This how-to takes you through the process of creating a calculation plugin, using it to run the code, and writing a parser for its outputs.
In this example, our AbstractCode
will be the diff
executable that “computes” the difference between two “input files” and prints the difference to standard output:
$ cat file1.txt
file with content
content1
$ cat file2.txt
file with content
content2
$ diff file1.txt file2.txt
2c2
< content1
---
> content2
We are using diff
here since it is available on almost every UNIX system by default, and it takes both command line arguments (the two files) and command line options (e.g. -i
for case-insensitive matching).
This is similar to how the executables of many scientific simulation codes work, making it easy to adapt this example to your use case.
We will run diff
as:
$ diff file1.txt file2.txt > diff.patch
thus writing the difference between file1.txt and file2.txt to diff.patch.
Interfacing external codes#
Start by creating a file calculations.py
and subclass the CalcJob
class:
from aiida.common import datastructures
from aiida.engine import CalcJob
from aiida.orm import SinglefileData


class DiffCalculation(CalcJob):
    """AiiDA calculation plugin wrapping the diff executable."""
In the following, we will tell AiiDA how to run our code by implementing two key methods: define and prepare_for_submission.
Defining the spec#
The define
method tells AiiDA which inputs the CalcJob
expects and which outputs it produces (exit codes will be discussed later).
This is done through an instance of the CalcJobProcessSpec
class, which is passed as the spec
argument to the define
method.
For example:
    @classmethod
    def define(cls, spec):
        """Define inputs and outputs of the calculation."""
        super().define(spec)

        # new ports
        spec.input('file1', valid_type=SinglefileData, help='First file to be compared.')
        spec.input('file2', valid_type=SinglefileData, help='Second file to be compared.')
        spec.output('diff', valid_type=SinglefileData, help='diff between file1 and file2.')

        spec.input('metadata.options.output_filename', valid_type=str, default='patch.diff')
        spec.inputs['metadata']['options']['resources'].default = {
            'num_machines': 1,
            'num_mpiprocs_per_machine': 1,
        }
        spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'

        spec.exit_code(
            300, 'ERROR_MISSING_OUTPUT_FILES', message='Calculation did not produce all expected output files.'
        )
The first line of the method calls the define
method of the CalcJob
parent class.
This necessary step defines the inputs and outputs that are common to all CalcJob
’s.
Next, we use the input()
method in order to define our two input files file1
and file2
of type SinglefileData
.
Further reading
When using SinglefileData
, AiiDA keeps track of the inputs as files.
This is very flexible but has the downside of making it difficult to query for information contained in those files and ensuring that the inputs are valid.
Exercise - Support command-line options shows how to use the Dict
class to represent the diff
command line options as a python dictionary.
The aiida-diff demo plugin goes further and adds automatic validation.
We then use output()
to define the only output of the calculation with the label diff
.
AiiDA will attach the outputs defined here to a (successfully) finished calculation using the link label provided.
Finally, we set a few default options
, such as the name of the parser (which we will implement later), the name of input and output files, and the computational resources to use for such a calculation.
These options
have already been defined on the spec
by the super().define(spec)
call, and they can be accessed through the inputs
attribute, which behaves like a dictionary.
There is no return
statement in define
: the define
method directly modifies the spec
object it receives.
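To make the in-place modification concrete, here is a minimal sketch that does not use AiiDA at all. MiniSpec, define and check_inputs are hypothetical stand-ins written for this illustration, not part of the AiiDA API; they only mimic how a define method registers ports on the object it is given and how inputs could then be checked against those ports.

```python
class MiniSpec:
    """Hypothetical stand-in for a process spec (not the AiiDA class)."""

    def __init__(self):
        self.inputs = {}

    def input(self, name, valid_type, default=None, required=True):
        # Register a port; the spec object is modified in place.
        self.inputs[name] = {'valid_type': valid_type, 'default': default, 'required': required}


def define(spec):
    """Mimic a `define` method: mutate `spec` and return nothing."""
    spec.input('file1', valid_type=str)
    spec.input('file2', valid_type=str)
    spec.input('output_filename', valid_type=str, default='patch.diff', required=False)


def check_inputs(spec, inputs):
    """Fill in defaults and raise if an input is missing or has the wrong type."""
    resolved = {}
    for name, port in spec.inputs.items():
        if name in inputs:
            value = inputs[name]
        elif not port['required']:
            value = port['default']
        else:
            raise ValueError(f"missing required input '{name}'")
        if not isinstance(value, port['valid_type']):
            raise TypeError(f"input '{name}' must be of type {port['valid_type'].__name__}")
        resolved[name] = value
    return resolved


spec = MiniSpec()
result = define(spec)  # `result` is None: all changes live on `spec`
resolved = check_inputs(spec, {'file1': 'file1.txt', 'file2': 'file2.txt'})
```

The real spec offers much more (nested namespaces, validators, outputs, exit codes), but the pattern is the same: ports are declared once on the spec, and inputs are checked against them before the calculation is prepared.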
Note
One more input required by any CalcJob
is which external executable to use.
External executables are represented by AbstractCode
instances that contain information about the computer they reside on, their path in the file system and more.
They are passed to a CalcJob
via the code
input, which is defined in the CalcJob
base class, so you don’t have to:
spec.input('code', valid_type=orm.AbstractCode, help='The `Code` to use for this job.')
Further reading
For more details on setting up your inputs and outputs (covering validation, dynamic number of inputs, etc.) see the Defining Processes topic.
Preparing for submission#
The prepare_for_submission()
method has two jobs:
Creating the input files in the format the external code expects and returning a CalcInfo
object that contains instructions for the AiiDA engine on how the code should be run.
For example:
    def prepare_for_submission(self, folder):
        """Create input files.

        :param folder: an `aiida.common.folders.Folder` where the plugin should temporarily place all files needed by
            the calculation.
        :return: `aiida.common.datastructures.CalcInfo` instance
        """
        codeinfo = datastructures.CodeInfo()
        codeinfo.cmdline_params = [self.inputs.file1.filename, self.inputs.file2.filename]
        codeinfo.code_uuid = self.inputs.code.uuid
        codeinfo.stdout_name = self.metadata.options.output_filename

        # Prepare a `CalcInfo` to be returned to the engine
        calcinfo = datastructures.CalcInfo()
        calcinfo.codes_info = [codeinfo]
        calcinfo.local_copy_list = [
            (self.inputs.file1.uuid, self.inputs.file1.filename, self.inputs.file1.filename),
            (self.inputs.file2.uuid, self.inputs.file2.filename, self.inputs.file2.filename),
        ]
        calcinfo.retrieve_list = [self.metadata.options.output_filename]

        return calcinfo
All inputs provided to the calculation are validated against the spec
before prepare_for_submission()
is called.
Therefore, when accessing the inputs
attribute, you can safely assume that all required inputs have been set and that all inputs have a valid type.
We start by creating a CodeInfo
object that lets AiiDA know how to run the code, i.e. here:
$ diff file1.txt file2.txt > diff.patch
This includes the command line parameters (here: the names of the files that we would like to diff
) and the UUID of the AbstractCode
to run.
Since diff
writes directly to standard output, we redirect standard output to the specified output filename.
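As a rough mental model, the fields set on the CodeInfo correspond to the pieces of a shell command. The sketch below is plain Python and does not use AiiDA; build_command is a hypothetical helper written for this illustration. The engine assembles the actual submission script itself and may add further elements.

```python
import shlex


def build_command(executable, cmdline_params, stdout_name=None):
    """Assemble a shell command line from CodeInfo-like fields (illustration only)."""
    command = shlex.join([executable, *cmdline_params])
    if stdout_name is not None:
        # Redirect standard output to a file, as is done for `diff` above
        command += f' > {shlex.quote(stdout_name)}'
    return command


command = build_command('diff', ['file1.txt', 'file2.txt'], stdout_name='diff.patch')
print(command)  # diff file1.txt file2.txt > diff.patch
```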
Next, we create a CalcInfo
object that lets AiiDA know which files to copy back and forth.
In our example, the two input files are already stored in the AiiDA file repository and we can use the local_copy_list
to pass them along.
Note
In other use cases you may need to create new files on the fly.
This is what the folder
argument of prepare_for_submission()
is for:
with folder.open("filename", 'w') as handle:
    handle.write("file content")
Any files and directories created in this sandbox folder will automatically be transferred to the compute resource where the actual calculation takes place.
The retrieve_list
on the other hand tells the engine which files to retrieve from the directory where the job ran after it has finished.
All files listed here will be stored in a FolderData
node that is attached as an output node to the calculation with the label retrieved
.
Finally, we pass the CodeInfo
to a CalcInfo
object.
One calculation job can involve more than one executable, so codes_info
is a list.
If you have more than one executable in your codes_info
, you can set codes_run_mode
to specify the mode with which these will be executed (CodeRunMode.SERIAL by default).
We define the retrieve_list
of filenames that the engine should retrieve from the directory where the job ran after it has finished.
The engine will store these files in a FolderData
node that will be attached as an output node to the calculation with the label retrieved
.
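The flow of files described above can be pictured with a small sketch based only on the standard library. stage_inputs and retrieve_outputs are hypothetical helpers, the dictionary stands in for the AiiDA file repository, and 'uuid-1' is a made-up identifier; the engine's actual implementation differs, for example because it transfers files through the computer's transport.

```python
import shutil
import tempfile
from pathlib import Path


def stage_inputs(repository, local_copy_list, workdir):
    """Copy files from a stand-in repository into the working directory."""
    for uuid, source_name, target_name in local_copy_list:
        shutil.copy(repository[uuid] / source_name, workdir / target_name)


def retrieve_outputs(retrieve_list, workdir, retrieved_dir):
    """Copy the listed files back; in this sketch, files that do not exist are skipped."""
    for name in retrieve_list:
        if (workdir / name).exists():
            shutil.copy(workdir / name, retrieved_dir / name)
    return sorted(path.name for path in retrieved_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    repo_dir, workdir, retrieved_dir = base / 'repo', base / 'work', base / 'retrieved'
    for directory in (repo_dir, workdir, retrieved_dir):
        directory.mkdir()
    (repo_dir / 'file1.txt').write_text('file with content\ncontent1\n')

    # Stage the input, mirroring one (uuid, source, target) entry of a local_copy_list
    stage_inputs({'uuid-1': repo_dir}, [('uuid-1', 'file1.txt', 'file1.txt')], workdir)
    staged = sorted(path.name for path in workdir.iterdir())

    # Pretend the code ran and wrote its output file
    (workdir / 'patch.diff').write_text('2c2\n')
    retrieved = retrieve_outputs(['patch.diff', 'missing.log'], workdir, retrieved_dir)
```

Because a file named in the retrieve list may turn out to be absent, a parser should check what was actually retrieved before relying on it.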
Further reading
There are other file lists available that allow you to easily customize how to move files to and from the remote working directory in order to prevent the creation of unnecessary copies.
For more details on the CalcJob
class, refer to the Topics section on defining calculations.
Parsing the outputs#
Parsing the output files produced by a code into AiiDA nodes is optional, but it can make your data queryable and therefore easier to access and analyze.
To create a parser plugin, subclass the Parser
class in a file called parsers.py
.
from aiida.engine import ExitCode
from aiida.orm import SinglefileData
from aiida.parsers.parser import Parser
from aiida.plugins import CalculationFactory

DiffCalculation = CalculationFactory('diff-tutorial')


class DiffParser(Parser):
Before the parse()
method is called, two important attributes are set on the Parser
instance:
self.retrieved: An instance of FolderData, which points to the folder containing all output files that the CalcJob instructed to retrieve, and provides the means to open() any file it contains.
self.node: The CalcJobNode representing the finished calculation, which, among other things, provides access to all of its inputs (self.node.inputs).
Now implement its parse()
method as
    def parse(self, **kwargs):
        """Parse outputs, store results in database."""
        output_filename = self.node.get_option('output_filename')

        # add output file
        self.logger.info(f"Parsing '{output_filename}'")
        with self.retrieved.open(output_filename, 'rb') as handle:
            output_node = SinglefileData(file=handle)
        self.out('diff', output_node)

        return ExitCode(0)
The get_option()
convenience method is used to get the filename of the output file.
Finally, the out()
method is used to return the output file as the diff
output of the calculation:
The first argument is the name to be used as the label for the link that connects the calculation and data node.
The second argument is the node that should be recorded as an output.
Note
The outputs and their types need to match those from the process specification of the corresponding CalcJob
(or an exception will be raised).
In this minimalist example, there isn’t actually much parsing going on – we are simply passing along the output file as a SinglefileData
node.
If your code produces output in a structured format, instead of just returning the file you may want to parse it e.g. to a python dictionary (Dict
node) to make the results easily searchable.
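As an illustration of what such parsing could look like for diff, the following sketch summarizes output in diff's default ("normal") format as a plain dictionary. parse_normal_diff and the keys of the summary are choices made for this example, not part of the tutorial plugin; in a real parser the resulting dictionary could be wrapped in a Dict node and declared as an additional output in the spec.

```python
import re

# Hunk headers in the normal format look like "2c2", "3,5d2" or "4a5,6"
HUNK_HEADER = re.compile(r'^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$')


def parse_normal_diff(text):
    """Summarize `diff` output in the default ("normal") format as a dictionary."""
    summary = {'hunks': 0, 'lines_removed': 0, 'lines_added': 0}
    for line in text.splitlines():
        if HUNK_HEADER.match(line):
            summary['hunks'] += 1
        elif line.startswith('< '):
            summary['lines_removed'] += 1
        elif line.startswith('> '):
            summary['lines_added'] += 1
    return summary


example = '2c2\n< content1\n---\n> content2\n'
summary = parse_normal_diff(example)
print(summary)  # {'hunks': 1, 'lines_removed': 1, 'lines_added': 1}
```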
Exercise
Consider the different output files produced by your favorite simulation code. Which information would you want to:
parse into the database for querying (e.g. as Dict, StructureData, …)?
store in the AiiDA file repository for safe-keeping (e.g. as SinglefileData, …)?
leave on the computer where the calculation ran (e.g. recording their remote location using RemoteData or simply ignoring them)?
Once you know the answers to these questions, you are ready to start writing a parser for your code.
In order to request automatic parsing of a CalcJob
(once it has finished), users can set the metadata.options.parser_name
input when launching the job.
If a particular parser should be used by default, the CalcJob
define
method can set a default value for the parser name as was done in the previous section:
    @classmethod
    def define(cls, spec):
        ...
        spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'
Note that the default is not set to the Parser
class itself, but to the entry point string under which the parser class is registered.
We will register the entry point for the parser in a bit.
Handling parsing errors#
So far, we have not paid much attention to potential errors that can arise when running external codes.
However, there are lots of ways in which codes can fail to execute nominally.
A Parser
can play an important role in detecting and communicating such errors, where workflows can then decide how to proceed, e.g., by modifying input parameters and resubmitting the calculation.
Parsers communicate errors through exit codes, which are defined in the spec
of the CalcJob
they parse.
The DiffCalculation
example defines the following exit code:
spec.exit_code(300, 'ERROR_MISSING_OUTPUT_FILES', message='Calculation did not produce all expected output files.')
An exit_code
defines:
an exit status (a positive integer, following the Exit code conventions),
a label that can be used to reference the code in the parse method (through the self.exit_codes property, as shown below), and
a message that provides a more detailed description of the problem.
In order to inform AiiDA about a failed calculation, simply return from the parse
method the exit code that corresponds to the detected issue.
Here is a more complete version of the example Parser
presented in the previous section:
    def parse(self, **kwargs):
        """Parse outputs, store results in database.

        :returns: non-zero exit code, if parsing fails
        """
        output_filename = self.node.get_option('output_filename')

        # Check that folder content is as expected
        files_retrieved = self.retrieved.list_object_names()
        files_expected = [output_filename]
        # Note: set(A) <= set(B) checks whether A is a subset of B
        if not set(files_expected) <= set(files_retrieved):
            self.logger.error(f"Found files '{files_retrieved}', expected to find '{files_expected}'")
            return self.exit_codes.ERROR_MISSING_OUTPUT_FILES

        # add output file
        self.logger.info(f"Parsing '{output_filename}'")
        with self.retrieved.open(output_filename, 'rb') as handle:
            output_node = SinglefileData(file=handle)
        self.out('diff', output_node)

        return ExitCode(0)
This simple check makes sure that the expected output file (patch.diff by default)
is among the files retrieved from the computer where the calculation was run.
Production plugins will often scan further aspects of the output (e.g. the standard error, the output file, etc.) for any issues that may indicate a problem with the calculation and return a corresponding exit code.
AiiDA stores the exit code returned by the parse
method on the calculation node that is being parsed, from where it can then be inspected further down the line (see the defining processes topic for more details).
Note that some scheduler plugins can detect issues at the scheduler level (by parsing the job scheduler output) and set an exit code.
The Topics section on scheduler exit codes explains how these can be inspected inside a parser and how they can optionally be overridden.
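The exit-code logic described in this section can be tried out without AiiDA. In the sketch below, MiniExitCode is a hypothetical stand-in that only carries a status and a message, and check_retrieved is a helper written for this illustration; the status 300 and its message are the ones defined in the spec of the example above.

```python
from typing import NamedTuple


class MiniExitCode(NamedTuple):
    """Hypothetical stand-in for an exit code: a status and a message."""

    status: int = 0
    message: str = ''


ERROR_MISSING_OUTPUT_FILES = MiniExitCode(300, 'Calculation did not produce all expected output files.')


def check_retrieved(files_expected, files_retrieved):
    """Return a non-zero exit code if any expected file was not retrieved."""
    missing = set(files_expected) - set(files_retrieved)
    if missing:
        return ERROR_MISSING_OUTPUT_FILES
    return MiniExitCode(0)


ok = check_retrieved(['patch.diff'], ['patch.diff', '_scheduler-stdout.txt'])
failed = check_retrieved(['patch.diff'], ['_scheduler-stdout.txt'])
print(ok.status, failed.status)  # 0 300
```

Keeping such checks in small functions makes it straightforward to add further ones later, each mapped to its own exit code in the spec.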
Customizations#
Process label#
Each time a Process
is run, a ProcessNode
is stored in the database to record the execution.
A human-readable label is stored in the process_label
attribute.
By default, the name of the process class is used as this label.
If this default is not informative enough, it can be customized by overriding the _build_process_label()
method:
class SomeProcess(Process):

    def _build_process_label(self):
        return 'custom_process_label'
Nodes created through executions of this process class will have node.process_label == 'custom_process_label'
.
Registering entry points#
Entry points are the preferred method of registering new calculation, parser and other plugins with AiiDA.
With your calculations.py
and parsers.py
files at hand, let’s register entry points for the plugins they contain:
Move your two scripts into a subfolder
aiida_diff_tutorial
:
$ mkdir aiida_diff_tutorial
$ mv calculations.py parsers.py aiida_diff_tutorial/
$ touch aiida_diff_tutorial/__init__.py
You have just created an aiida_diff_tutorial
Python package!
Add a minimal set of metadata for your package by writing a
pyproject.toml
file:
[build-system]
# build the package with [flit](https://flit.readthedocs.io)
requires = ["flit_core >=3.4,<4"]
build-backend = "flit_core.buildapi"
[project]
# See https://www.python.org/dev/peps/pep-0621/
name = "aiida-diff-tutorial"
version = "0.1.0"
description = "AiiDA demo plugin"
dependencies = [
"aiida-core>=2.0,<3",
]
[project.entry-points."aiida.calculations"]
"diff-tutorial" = "aiida_diff_tutorial.calculations:DiffCalculation"
[project.entry-points."aiida.parsers"]
"diff-tutorial" = "aiida_diff_tutorial.parsers:DiffParser"
[tool.flit.module]
name = "aiida_diff_tutorial"
Note
This allows for the project metadata to be fully specified in the pyproject.toml file, using the PEP 621 format.
Install your new
aiida-diff-tutorial
plugin package.
$ pip install -e . # install package in "editable mode"
See the How to install plugins section for details.
After this, you should see your plugins listed:
$ verdi plugin list aiida.calculations
$ verdi plugin list aiida.calculations diff-tutorial
$ verdi plugin list aiida.parsers
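Entry points are standard Python packaging metadata, so they can also be inspected with the standard library alone. The following is a minimal sketch, assuming Python 3.8 or later; list_entry_points is a hypothetical helper, and the names it returns depend on which packages are installed in your environment.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of all installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, 'select'):
        # Python 3.10+: entry points are selected by keyword
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a dictionary mapping group names to entry points
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


names = list_entry_points('aiida.calculations')
print(names)
```

After installing the package from this how-to, 'diff-tutorial' should appear in the lists for both the aiida.calculations and aiida.parsers groups.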
Running a calculation#
With the entry points set up, you are ready to launch your first calculation with the new plugin:
If you haven’t already done so, set up your computer. In the following we assume it to be the localhost:
$ verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
$ verdi computer configure core.local localhost --safe-interval 5 -n
Create the input files for our calculation
$ echo -e "File with content\ncontent1" > file1.txt
$ echo -e "File with content\ncontent2" > file2.txt
$ mkdir input_files
$ mv file1.txt file2.txt input_files
Write a
launch.py
script:
"""Launch a calculation using the 'diff-tutorial' plugin"""
from pathlib import Path

from aiida import engine, orm
from aiida.common.exceptions import NotExistent

INPUT_DIR = Path(__file__).resolve().parent / 'input_files'

# Create or load code
computer = orm.load_computer('localhost')
try:
    code = orm.load_code('diff@localhost')
except NotExistent:
    # Setting up code via python API (or use "verdi code setup")
    code = orm.InstalledCode(
        label='diff', computer=computer, filepath_executable='/usr/bin/diff', default_calc_job_plugin='diff-tutorial'
    )

# Set up inputs
builder = code.get_builder()
builder.file1 = orm.SinglefileData(file=INPUT_DIR / 'file1.txt')
builder.file2 = orm.SinglefileData(file=INPUT_DIR / 'file2.txt')
builder.metadata.description = 'Test job submission with the aiida_diff_tutorial plugin'

# Run the calculation & parse results
result = engine.run(builder)
computed_diff = result['diff'].get_content()
print(f'Computed diff between files:\n{computed_diff}')
Note
The launch.py
script sets up an AiiDA AbstractCode
instance that associates the /usr/bin/diff
executable with the DiffCalculation
class (through its entry point diff-tutorial
).
This code is automatically set on the code
input port of the builder and passed as an input to the calculation plugin.
Launch the calculation:
$ verdi run launch.py
If everything goes well, this should print the results of your calculation, something like:
$ verdi run launch.py
Computed diff between files:
2c2
< content1
---
> content2
Tip
If you encountered a parsing error, it can be helpful to make a Dry run, which allows you to inspect the input folder generated by AiiDA before any calculation is launched.
Finally, instead of running your calculation in the current shell, you can submit it to the AiiDA daemon:
(Re)start the daemon to update its Python environment:
$ verdi daemon restart
Update your launch script to use:
# Submit calculation to the aiida daemon
node = engine.submit(builder)
print("Submitted calculation {}".format(node))
Note
node
is the CalcJobNode
representing the state of the underlying calculation process (which may not be finished yet).
Launch the calculation:
$ verdi run launch.py
This should print the UUID and the PK of the submitted calculation.
You can use the verdi command line interface to monitor these processes:
$ verdi process list -a -p1
This should show the processes of both calculations you just ran.
Use verdi calcjob outputcat <pk>
to check the output of the calculation you submitted to the daemon.
Congratulations - you can now write plugins for external simulation codes and use them to submit calculations!
If you still have time left, consider going through the optional exercise below.
Writing importers for existing computations#
New users of your plugin may have completed many computations without AiiDA that they now wish to import into AiiDA.
In these cases, it is possible to write an importer for their inputs/outputs, which generates the provenance nodes for the corresponding CalcJob
.
The importer must be written as a subclass of CalcJobImporter
,
for an example see aiida.calculations.importers.arithmetic.add.ArithmeticAddCalculationImporter
.
To associate the importer with the CalcJob
class, the importer must be registered with an entry point in the group aiida.calculations.importers
.
[project.entry-points."aiida.calculations.importers"]
"core.arithmetic.add" = "aiida.calculations.importers.arithmetic.add:ArithmeticAddCalculationImporter"
Note
Note that the entry point name can be any valid entry point name.
If the importer plugin is provided by the same package as the corresponding CalcJob
plugin, it is recommended that the entry point name of the importer and CalcJob
plugin are the same.
This will allow the get_importer()
method to automatically fetch the associated importer.
If the entry point names differ, the entry point name of the desired importer implementation needs to be passed to get_importer()
as an argument.
Users can then import their calculations via the get_importer()
method:
from aiida.engine import run
from aiida.orm import RemoteData, load_computer
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')
importer = ArithmeticAddCalculation.get_importer()
remote_data = RemoteData('/some/absolute/path', computer=load_computer('computer'))
inputs = importer.parse_remote_data(remote_data)
results, node = run.get_node(ArithmeticAddCalculation, **inputs)
assert node.is_imported
See also
AEP 004: Infrastructure to import completed calculation jobs, for the design considerations around this feature.
Exercise - Support command-line options#
As discussed before, diff
knows a couple of command-line options:
$ diff --help
Usage: diff [OPTION]... FILES
Compare files line by line.
...
-i, --ignore-case ignore case differences in file contents
-E, --ignore-tab-expansion ignore changes due to tab expansion
-b, --ignore-space-change ignore changes in the amount of white space
-w, --ignore-all-space ignore all white space
-B, --ignore-blank-lines ignore changes where lines are all blank
-I, --ignore-matching-lines=RE ignore changes where all lines match RE
...
For simplicity let’s focus on the excerpt of options shown above and allow the user of our plugin to pass these along.
Notice that one of the options (--ignore-matching-lines
) requires the user to pass a regular expression string, while the other options don’t require any value.
One way to represent a set of command line options like
diff --ignore-case --ignore-matching-lines='.*ABC.*'
would be using a python dictionary:
parameters = {
    'ignore-case': True,
    'ignore-space-change': False,
    'ignore-matching-lines': '.*ABC.*'
}
Here is a simple code snippet for translating the dictionary to a list of command line options:
def cli_options(parameters):
    """Return command line options for parameters dictionary.

    :param dict parameters: dictionary with command line parameters
    """
    options = []
    for key, value in parameters.items():
        # Could validate: is key a known command-line option?
        if isinstance(value, bool) and value:
            options.append(f'--{key}')
        elif isinstance(value, str):
            # Could validate: is value a valid regular expression?
            options.append(f'--{key}')
            options.append(value)

    return options
Note
When passing parameters along to your simulation code, try validating them. This detects errors directly at submission of the calculation and thus prevents calculations with malformed inputs from ever entering the queue of your HPC system.
For the sake of brevity we are not performing validation here but there are numerous python libraries, such as voluptuous (used by aiida-diff, see example), marshmallow or pydantic, that help you define a schema to validate input against.
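If you prefer not to add a dependency, a basic check can also be written by hand. The sketch below covers only the options from the excerpt above; KNOWN_OPTIONS, validate_parameters and is_valid are names made up for this example. Note that Python's re syntax is not identical to the regular expressions accepted by diff, so the regular expression check is only approximate.

```python
import re

# Options from the excerpt above; True means the option takes a regular expression
KNOWN_OPTIONS = {
    'ignore-case': False,
    'ignore-tab-expansion': False,
    'ignore-space-change': False,
    'ignore-all-space': False,
    'ignore-blank-lines': False,
    'ignore-matching-lines': True,
}


def validate_parameters(parameters):
    """Raise ValueError if `parameters` contains unknown options or invalid values."""
    for key, value in parameters.items():
        if key not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option '{key}'")
        if KNOWN_OPTIONS[key]:
            if not isinstance(value, str):
                raise ValueError(f"option '{key}' requires a string value")
            try:
                re.compile(value)
            except re.error as exc:
                raise ValueError(f"option '{key}': invalid regular expression ({exc})") from exc
        elif not isinstance(value, bool):
            raise ValueError(f"option '{key}' must be True or False")


def is_valid(parameters):
    """Return True if `parameters` passes validation, False otherwise."""
    try:
        validate_parameters(parameters)
    except ValueError:
        return False
    return True


print(is_valid({'ignore-case': True, 'ignore-matching-lines': '.*ABC.*'}))  # True
print(is_valid({'ignore-cse': True}))  # False
```

A function like validate_parameters could be called before the options are translated to the command line, so that a misspelled option is reported before the job is submitted.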
Let’s open our previous calculations.py
file and start modifying the DiffCalculation
class:
1. In the define method, add a new input to the spec with label 'parameters' and type Dict (from aiida.orm import Dict).
2. In the prepare_for_submission method, run the cli_options function from above on self.inputs.parameters.get_dict() to get the list of command-line options. Add them to the codeinfo.cmdline_params.
Solution
For 1. add the following line to the define
method:
spec.input('parameters', valid_type=Dict, help='diff command-line parameters')
For 2. copy the cli_options
snippet at the end of calculations.py
and set the cmdline_params
to:
codeinfo.cmdline_params = cli_options(self.inputs.parameters.get_dict()) + [
    self.inputs.file1.filename,
    self.inputs.file2.filename,
]
That’s it. Let’s now open the launch.py
script and pass along our command line parameters:
...
builder.parameters = orm.Dict(dict={'ignore-case': True})
...
Change the capitalization of one of the characters in the first line of file1.txt
.
Then, restart the daemon and submit the new calculation:
$ verdi daemon restart
$ verdi run launch.py
If everything worked as intended, the capitalization difference in the first line should be ignored (and thus not show up in the output).
This marks the end of this how-to.
The CalcJob
and Parser
plugins are still rather basic and the aiida-diff-tutorial
plugin package is missing a number of useful features, such as package metadata, documentation, tests, CI, etc.
Continue with How to package plugins in order to learn how to quickly create a feature-rich new plugin package from scratch.