How to write a plugin for an external code#

Tip

Before starting to write a new plugin, check the AiiDA plugin registry. If a plugin for your code is already available, you can skip straight to How to run external codes.

Tip

This how-to walks you through all logical steps of how AiiDA interacts with an external code. If you already know the basics and would like to get started with a new plugin package quickly, check out How to package plugins.

To run an external code with AiiDA, you need a corresponding calculation plugin, which tells AiiDA how to:

  1. Prepare the required input files.

  2. Run the code with the correct command line parameters.

Finally, you will probably want a parser plugin, which tells AiiDA how to:

  1. Parse the output of the code.

This how-to takes you through the process of creating a calculation plugin, using it to run the code, and writing a parser for its outputs.

In this example, our Code will be the diff executable that “computes” the difference between two “input files” and prints the difference to standard output:

$ cat file1.txt
file with content
content1

$ cat file2.txt
file with content
content2

$ diff file1.txt file2.txt
2c2
< content1
---
> content2

We are using diff here since it is available on almost every UNIX system by default, and it takes both command line arguments (the two files) and command line options (e.g. -i for case-insensitive matching). This is similar to how the executables of many scientific simulation codes work, making it easy to adapt this example to your use case.

We will run diff as:

$ diff file1.txt file2.txt > diff.patch

thus writing the difference between file1.txt and file2.txt to diff.patch.

Interfacing external codes#

Start by creating a file calculations.py and subclass the CalcJob class:

from aiida.common import datastructures
from aiida.engine import CalcJob
from aiida.orm import SinglefileData

class DiffCalculation(CalcJob):
    """AiiDA calculation plugin wrapping the diff executable."""

In the following, we will tell AiiDA how to run our code by implementing two key methods: define() and prepare_for_submission().

Defining the spec#

The define method tells AiiDA which inputs the CalcJob expects and which outputs it produces (exit codes will be discussed later). This is done through an instance of the CalcJobProcessSpec class, which is passed as the spec argument to the define method. For example:

    @classmethod
    def define(cls, spec):
        """Define inputs and outputs of the calculation."""
        # yapf: disable
        super().define(spec)

        # new ports
        spec.input('file1', valid_type=SinglefileData, help='First file to be compared.')
        spec.input('file2', valid_type=SinglefileData, help='Second file to be compared.')
        spec.output('diff', valid_type=SinglefileData, help='diff between file1 and file2.')

        spec.input('metadata.options.output_filename', valid_type=str, default='patch.diff')
        spec.inputs['metadata']['options']['resources'].default = {
                                            'num_machines': 1,
                                            'num_mpiprocs_per_machine': 1,
                                            }
        spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'

        spec.exit_code(300, 'ERROR_MISSING_OUTPUT_FILES',
                message='Calculation did not produce all expected output files.')

The first line of the method calls the define method of the CalcJob parent class. This necessary step defines the inputs and outputs that are common to all CalcJobs.

Next, we use the input() method in order to define our two input files file1 and file2 of type SinglefileData.

Further reading

When using SinglefileData, AiiDA keeps track of the inputs as files. This is very flexible but has the downside of making it difficult to query for information contained in those files and ensuring that the inputs are valid. Exercise - Support command-line options shows how to use the Dict class to represent the diff command line options as a python dictionary. The aiida-diff demo plugin goes further and adds automatic validation.

We then use output() to define the only output of the calculation with the label diff. AiiDA will attach the outputs defined here to a (successfully) finished calculation using the link label provided.

Finally, we set a few default options, such as the name of the parser (which we will implement later), the name of input and output files, and the computational resources to use for such a calculation. These options have already been defined on the spec by the super().define(spec) call, and they can be accessed through the inputs attribute, which behaves like a dictionary.

There is no return statement in define: the define method directly modifies the spec object it receives.
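
To see why no return value is needed, consider the toy stand-in below. ToySpec, BaseJob and DiffJob are hypothetical names invented for this illustration and do not reflect how AiiDA's CalcJobProcessSpec is actually implemented. The sketch only shows how a chain of define class methods can fill in one shared object in place.

```python
class ToySpec:
    """Minimal stand-in for a process spec: just a dictionary of input ports."""

    def __init__(self):
        self.inputs = {}

    def input(self, name, valid_type, default=None):
        self.inputs[name] = {'valid_type': valid_type, 'default': default}


class BaseJob:
    @classmethod
    def define(cls, spec):
        # Ports shared by every job are added by the parent class
        spec.input('parser_name', valid_type=str)


class DiffJob(BaseJob):
    @classmethod
    def define(cls, spec):
        super().define(spec)  # first let the parent add its ports
        spec.input('file1', valid_type=str)
        # Ports created by the parent can be adjusted afterwards
        spec.inputs['parser_name']['default'] = 'diff-tutorial'


spec = ToySpec()
DiffJob.define(spec)  # nothing is returned: `spec` itself was modified
print(sorted(spec.inputs))  # ['file1', 'parser_name']
```

The same pattern explains why defaults such as parser_name can only be changed after the parent's define has run: before that call, the port does not exist yet.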

Note

One more input required by any CalcJob is which external executable to use.

External executables are represented by Code instances that contain information about the computer they reside on, their path in the file system and more. They are passed to a CalcJob via the code input, which is defined in the CalcJob base class, so you don’t have to:

spec.input('code', valid_type=orm.Code, help='The `Code` to use for this job.')

Further reading

For more details on setting up your inputs and outputs (covering validation, dynamic number of inputs, etc.) see the Defining Processes topic.

Preparing for submission#

The prepare_for_submission() method has two jobs: creating the input files in the format the external code expects, and returning a CalcInfo object that contains instructions for the AiiDA engine on how the code should be run. For example:

    def prepare_for_submission(self, folder):
        """
        Create input files.

        :param folder: an `aiida.common.folders.Folder` where the plugin should temporarily place all files needed by
            the calculation.
        :return: `aiida.common.datastructures.CalcInfo` instance
        """
        codeinfo = datastructures.CodeInfo()
        codeinfo.cmdline_params = [self.inputs.file1.filename, self.inputs.file2.filename]
        codeinfo.code_uuid = self.inputs.code.uuid
        codeinfo.stdout_name = self.metadata.options.output_filename

        # Prepare a `CalcInfo` to be returned to the engine
        calcinfo = datastructures.CalcInfo()
        calcinfo.codes_info = [codeinfo]
        calcinfo.local_copy_list = [
            (self.inputs.file1.uuid, self.inputs.file1.filename, self.inputs.file1.filename),
            (self.inputs.file2.uuid, self.inputs.file2.filename, self.inputs.file2.filename),
        ]
        calcinfo.retrieve_list = [self.metadata.options.output_filename]

        return calcinfo

All inputs provided to the calculation are validated against the spec before prepare_for_submission() is called. Therefore, when accessing the inputs attribute, you can safely assume that all required inputs have been set and that all inputs have a valid type.

We start by creating a CodeInfo object that lets AiiDA know how to run the code, i.e. here:

$ diff file1.txt file2.txt > diff.patch

This includes the command line parameters (here: the names of the files that we would like to diff) and the UUID of the Code to run. Since diff writes directly to standard output, we redirect standard output to the specified output filename.
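
As a rough mental model, the CodeInfo fields set above map onto a shell command line as sketched below. render_command is a hypothetical helper written only for this illustration. AiiDA assembles and escapes the real command itself when it writes the submission script, and the details of its output may differ.

```python
import shlex


def render_command(executable, cmdline_params, stdout_name=None):
    """Return the shell command line implied by the given parameters."""
    command = shlex.join([executable, *cmdline_params])
    if stdout_name is not None:
        # Standard output is redirected to the requested file
        command += f' > {shlex.quote(stdout_name)}'
    return command


print(render_command('diff', ['file1.txt', 'file2.txt'], stdout_name='diff.patch'))
# diff file1.txt file2.txt > diff.patch
```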

Next, we create a CalcInfo object that lets AiiDA know which files to copy back and forth. In our example, the two input files are already stored in the AiiDA file repository and we can use the local_copy_list to pass them along.
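
Each entry of local_copy_list is a tuple of three items: the UUID of the node that holds the file, the name of the file within that node, and the relative path at which the file should appear in the working directory of the calculation. The sketch below builds such a list for an arbitrary number of files. build_local_copy_list is a hypothetical helper, and SimpleNamespace objects with made-up UUIDs stand in for stored SinglefileData nodes so that the snippet runs without AiiDA.

```python
from types import SimpleNamespace


def build_local_copy_list(nodes):
    """Return one (uuid, source filename, target path) tuple per file node."""
    # Each file keeps its own name in the working directory of the calculation
    return [(node.uuid, node.filename, node.filename) for node in nodes]


# Stand-ins for stored SinglefileData nodes; the UUIDs are made up for this illustration
file1 = SimpleNamespace(uuid='00000000-0000-0000-0000-000000000001', filename='file1.txt')
file2 = SimpleNamespace(uuid='00000000-0000-0000-0000-000000000002', filename='file2.txt')

for entry in build_local_copy_list([file1, file2]):
    print(entry)
```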

Note

In other use cases you may need to create new files on the fly. This is what the folder argument of prepare_for_submission() is for:

with folder.open("filename", 'w') as handle:
    handle.write("file content")

Any files and directories created in this sandbox folder will automatically be transferred to the compute resource where the actual calculation takes place.

The retrieve_list, on the other hand, tells the engine which files to retrieve from the directory where the job ran after it has finished. All files listed here will be stored in a FolderData node that is attached as an output node to the calculation with the label retrieved.

Finally, we pass the CodeInfo to a CalcInfo object. One calculation job can involve more than one executable, so codes_info is a list. If you have more than one executable in your codes_info, you can set codes_run_mode to specify the mode with which these will be executed (CodeRunMode.SERIAL by default).

Further reading

There are other file lists available that allow you to easily customize how to move files to and from the remote working directory in order to prevent the creation of unnecessary copies. For more details on the CalcJob class, refer to the Topics section on defining calculations.

Parsing the outputs#

Parsing the output files produced by a code into AiiDA nodes is optional, but it can make your data queryable and therefore easier to access and analyze.

To create a parser plugin, subclass the Parser class in a file called parsers.py.

from aiida.engine import ExitCode
from aiida.orm import SinglefileData
from aiida.parsers.parser import Parser
from aiida.plugins import CalculationFactory

DiffCalculation = CalculationFactory('diff-tutorial')


class DiffParser(Parser):
    """Parser for the output of a DiffCalculation."""

Before the parse() method is called, two important attributes are set on the Parser instance:

  1. self.retrieved: An instance of FolderData, which points to the folder containing all output files that the CalcJob instructed to retrieve, and provides the means to open() any file it contains.

  2. self.node: The CalcJobNode representing the finished calculation, which, among other things, provides access to all of its inputs (self.node.inputs).

Now implement its parse() method as

    def parse(self, **kwargs):
        """
        Parse outputs, store results in database.
        """

        output_filename = self.node.get_option('output_filename')

        # add output file
        self.logger.info(f"Parsing '{output_filename}'")
        with self.retrieved.open(output_filename, 'rb') as handle:
            output_node = SinglefileData(file=handle)
        self.out('diff', output_node)

        return ExitCode(0)

The get_option() convenience method is used to get the filename of the output file.

Finally, the out() method is used to return the output file as the diff output of the calculation: The first argument is the name to be used as the label for the link that connects the calculation and data node. The second argument is the node that should be recorded as an output.

Note

The outputs and their types need to match those from the process specification of the corresponding CalcJob (or an exception will be raised).

In this minimalist example, there isn’t actually much parsing going on – we are simply passing along the output file as a SinglefileData node. If your code produces output in a structured format, instead of just returning the file you may want to parse it e.g. to a python dictionary (Dict node) to make the results easily searchable.
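
As an illustration of what such parsing could look like for diff, the sketch below condenses output in the default ("normal") format into a plain dictionary. parse_normal_diff and the dictionary keys are hypothetical choices made for this example. In a parser, the resulting dictionary could be wrapped in a Dict node and returned through an additional output, which would first have to be declared in the spec of the CalcJob.

```python
import re

# Change commands in the normal format look like `2c2`, `3a4,5` or `1,2d0`
CHANGE_COMMAND = re.compile(r'^\d+(?:,\d+)?([acd])\d+(?:,\d+)?$')


def parse_normal_diff(text):
    """Summarize diff output in the default ("normal") format as a plain dictionary."""
    summary = {'hunks': 0, 'lines_from_file1': 0, 'lines_from_file2': 0}
    for line in text.splitlines():
        if CHANGE_COMMAND.match(line):
            summary['hunks'] += 1
        elif line.startswith('<'):
            # Lines that only occur in the first file
            summary['lines_from_file1'] += 1
        elif line.startswith('>'):
            # Lines that only occur in the second file
            summary['lines_from_file2'] += 1
    return summary


print(parse_normal_diff('2c2\n< content1\n---\n> content2\n'))
# {'hunks': 1, 'lines_from_file1': 1, 'lines_from_file2': 1}
```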

Exercise

Consider the different output files produced by your favorite simulation code. Which information would you want to:

  1. parse into the database for querying (e.g. as Dict, StructureData, …)?

  2. store in the AiiDA file repository for safe-keeping (e.g. as SinglefileData, …)?

  3. leave on the computer where the calculation ran (e.g. recording their remote location using RemoteData or simply ignoring them)?

Once you know the answers to these questions, you are ready to start writing a parser for your code.

In order to request automatic parsing of a CalcJob (once it has finished), users can set the metadata.options.parser_name input when launching the job. If a particular parser should be used by default, the CalcJob define method can set a default value for the parser name as was done in the previous section:

@classmethod
def define(cls, spec):
    ...
    spec.inputs['metadata']['options']['parser_name'].default = 'diff-tutorial'

Note that the default is not set to the Parser class itself, but to the entry point string under which the parser class is registered. We will register the entry point for the parser in a bit.

Handling parsing errors#

So far, we have not paid much attention to potential errors that can arise when running external codes. However, there are many ways in which codes can fail to execute nominally. A Parser can play an important role in detecting and communicating such errors, so that workflows can then decide how to proceed, e.g., by modifying input parameters and resubmitting the calculation.

Parsers communicate errors through exit codes, which are defined in the spec of the CalcJob they parse. The DiffCalculation example defines the following exit code:

spec.exit_code(300, 'ERROR_MISSING_OUTPUT_FILES', message='Calculation did not produce all expected output files.')

An exit_code defines:

  • an exit status (a positive integer, following the Exit code conventions),

  • a label that can be used to reference the code in the parse method (through the self.exit_codes property, as shown below), and

  • a message that provides a more detailed description of the problem.

In order to inform AiiDA about a failed calculation, simply return from the parse method the exit code that corresponds to the detected issue. Here is a more complete version of the example Parser presented in the previous section:

    def parse(self, **kwargs):
        """
        Parse outputs, store results in database.

        :returns: non-zero exit code, if parsing fails
        """

        output_filename = self.node.get_option('output_filename')

        # Check that folder content is as expected
        files_retrieved = self.retrieved.list_object_names()
        files_expected = [output_filename]
        # Note: set(A) <= set(B) checks whether A is a subset of B
        if not set(files_expected) <= set(files_retrieved):
            self.logger.error(f"Found files '{files_retrieved}', expected to find '{files_expected}'")
            return self.exit_codes.ERROR_MISSING_OUTPUT_FILES

        # add output file
        self.logger.info(f"Parsing '{output_filename}'")
        with self.retrieved.open(output_filename, 'rb') as handle:
            output_node = SinglefileData(file=handle)
        self.out('diff', output_node)

        return ExitCode(0)

This simple check makes sure that the expected output file (patch.diff, unless a different output_filename was set) is among the files retrieved from the computer where the calculation was run. Production plugins will often scan further aspects of the output (e.g. the standard error, the output file, etc.) for any issues that may indicate a problem with the calculation and return a corresponding exit code.
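
One way to organize such additional checks is to map known error messages to exit code labels, as in the sketch below. classify_stderr, the message fragments and both labels are assumptions made for this illustration. None of these exit codes is defined by the DiffCalculation above, so each label would first have to be declared with spec.exit_code before a parser could return it through self.exit_codes.

```python
def classify_stderr(stderr_text):
    """Return the label of a (hypothetical) exit code for known error messages, else None."""
    # Message fragments and labels are illustrative assumptions, not part of the tutorial plugin
    known_problems = {
        'No such file or directory': 'ERROR_INPUT_FILE_NOT_FOUND',
        'Permission denied': 'ERROR_INPUT_FILE_NOT_READABLE',
    }
    for fragment, label in known_problems.items():
        if fragment in stderr_text:
            return label
    return None


print(classify_stderr('diff: file3.txt: No such file or directory\n'))
# ERROR_INPUT_FILE_NOT_FOUND
```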

AiiDA stores the exit code returned by the parse method on the calculation node that is being parsed, from where it can then be inspected further down the line (see the defining processes topic for more details). Note that some scheduler plugins can detect issues at the scheduler level (by parsing the job scheduler output) and set an exit code. The Topics section on scheduler exit codes explains how these can be inspected inside a parser and how they can optionally be overridden.

Registering entry points#

Entry points are the preferred method of registering new calculation, parser and other plugins with AiiDA.

With your calculations.py and parsers.py files at hand, let’s register entry points for the plugins they contain:

  • Move your two scripts into a subfolder aiida_diff_tutorial:

    $ mkdir aiida_diff_tutorial
    $ mv calculations.py parsers.py aiida_diff_tutorial/
    

    You have just created an aiida_diff_tutorial Python package!

  • Write a minimalistic setup.py script for your new package:

    from setuptools import setup
    
    setup(
        name='aiida-diff-tutorial',
        packages=['aiida_diff_tutorial'],
        entry_points={
            'aiida.calculations': ["diff-tutorial = aiida_diff_tutorial.calculations:DiffCalculation"],
            'aiida.parsers': ["diff-tutorial = aiida_diff_tutorial.parsers:DiffParser"],
        }
    )
    

    Note

    Strictly speaking, aiida-diff-tutorial is the name of the distribution, while aiida_diff_tutorial is the name of the package. The aiida-core documentation uses the term package a bit more loosely.

  • Install your new aiida-diff-tutorial plugin package.

    $ pip install -e .  # install package in "editable mode"
    

    See the How to install plugins section for details.

After this, you should see your plugins listed:

$ verdi plugin list aiida.calculations
$ verdi plugin list aiida.calculations diff-tutorial
$ verdi plugin list aiida.parsers
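
Since entry points are standard Python packaging metadata, they can also be inspected without verdi. The sketch below uses only importlib.metadata from the standard library, and list_entry_points is a hypothetical helper name. After a successful installation, both printed lists should contain diff-tutorial.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of all installed entry points in the given group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, 'select'):
        # Python 3.10 and later
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dictionary keyed by group
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


print(list_entry_points('aiida.calculations'))
print(list_entry_points('aiida.parsers'))
```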

Running a calculation#

With the entry points set up, you are ready to launch your first calculation with the new plugin:

  • If you haven’t already done so, set up your computer. In the following we assume it to be the localhost:

    $ verdi computer setup -L localhost -H localhost -T core.local -S core.direct -w `echo $PWD/work` -n
    $ verdi computer configure core.local localhost --safe-interval 5 -n
    
  • Create the input files for our calculation

    $ echo -e "File with content\ncontent1" > file1.txt
    $ echo -e "File with content\ncontent2" > file2.txt
    $ mkdir input_files
    $ mv file1.txt file2.txt input_files
    
  • Write a launch.py script:

    # -*- coding: utf-8 -*-
    """Launch a calculation using the 'diff-tutorial' plugin"""
    from pathlib import Path
    
    from aiida import engine, orm
    from aiida.common.exceptions import NotExistent
    
    INPUT_DIR = Path(__file__).resolve().parent / 'input_files'
    
    # Create or load code
    computer = orm.load_computer('localhost')
    try:
        code = orm.load_code('diff@localhost')
    except NotExistent:
        # Setting up code via python API (or use "verdi code setup")
        code = orm.Code(label='diff', remote_computer_exec=[computer, '/usr/bin/diff'], input_plugin_name='diff-tutorial')
    
    # Set up inputs
    builder = code.get_builder()
    builder.file1 = orm.SinglefileData(file=INPUT_DIR / 'file1.txt')
    builder.file2 = orm.SinglefileData(file=INPUT_DIR / 'file2.txt')
    builder.metadata.description = 'Test job submission with the aiida_diff_tutorial plugin'
    
    # Run the calculation & parse results
    result = engine.run(builder)
    computed_diff = result['diff'].get_content()
    print(f'Computed diff between files:\n{computed_diff}')
    

    Note

    The launch.py script sets up an AiiDA Code instance that associates the /usr/bin/diff executable with the DiffCalculation class (through its entry point diff-tutorial).

    This code is automatically set on the code input port of the builder and passed as an input to the calculation plugin.

  • Launch the calculation:

    $ verdi run launch.py
    

    If everything goes well, this should print the results of your calculation, something like:

    $ verdi run launch.py
    Computed diff between files:
    2c2
    < content1
    ---
    > content2
    

Tip

If you encounter a parsing error, it can be helpful to perform a Dry run, which allows you to inspect the input folder generated by AiiDA before any calculation is launched.

Finally, instead of running your calculation in the current shell, you can submit your calculation to the AiiDA daemon:

  • (Re)start the daemon to update its Python environment:

    $ verdi daemon restart --reset
    
  • Update your launch script to use:

    # Submit calculation to the aiida daemon
    node = engine.submit(builder)
    print(f'Submitted calculation {node}')
    

    Note

    node is the CalcJobNode representing the state of the underlying calculation process (which may not be finished yet).

  • Launch the calculation:

    $ verdi run launch.py
    

    This should print the UUID and the PK of the submitted calculation.

You can use the verdi command line interface to monitor these processes:

$ verdi process list -a -p1

This should show the processes of both calculations you just ran. Use verdi calcjob outputcat <pk> to check the output of the calculation you submitted to the daemon.

Congratulations - you can now write plugins for external simulation codes and use them to submit calculations!

If you still have time left, consider going through the optional exercise below.

Writing importers for existing computations#

New in version 2.0.

New users of your plugin may have completed many computations without AiiDA that they now wish to import. In these cases, it is possible to write an importer for their inputs/outputs, which generates the provenance nodes for the corresponding CalcJob.

The importer must be written as a subclass of CalcJobImporter. For an example, see aiida.calculations.importers.arithmetic.add.ArithmeticAddCalculationImporter.

To associate the importer with the CalcJob class, the importer must be registered with an entry point in the group aiida.calculations.importers.

[project.entry-points."aiida.calculations.importers"]
"core.arithmetic.add" = "aiida.calculations.importers.arithmetic.add:ArithmeticAddCalculationImporter"

Note

Note that the entry point name can be any valid entry point name. If the importer plugin is provided by the same package as the corresponding CalcJob plugin, it is recommended that the entry point name of the importer and CalcJob plugin are the same. This will allow the get_importer() method to automatically fetch the associated importer. If the entry point names differ, the entry point name of the desired importer implementation needs to be passed to get_importer() as an argument.

Users can then import their calculations via the get_importer() method:

from aiida.engine import run
from aiida.orm import RemoteData, load_computer
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('core.arithmetic.add')
importer = ArithmeticAddCalculation.get_importer()
remote_data = RemoteData('/some/absolute/path', computer=load_computer('computer'))
inputs = importer.parse_remote_data(remote_data)
results, node = run.get_node(ArithmeticAddCalculation, **inputs)
assert node.is_imported

See also

AEP 004: Infrastructure to import completed calculation jobs, for the design considerations around this feature.

Exercise - Support command-line options#

As discussed before, diff knows a couple of command-line options:

$ diff --help
Usage: diff [OPTION]... FILES
Compare files line by line.
...
-i, --ignore-case               ignore case differences in file contents
-E, --ignore-tab-expansion      ignore changes due to tab expansion
-b, --ignore-space-change       ignore changes in the amount of white space
-w, --ignore-all-space          ignore all white space
-B, --ignore-blank-lines        ignore changes where lines are all blank
-I, --ignore-matching-lines=RE  ignore changes where all lines match RE
...

For simplicity let’s focus on the excerpt of options shown above and allow the user of our plugin to pass these along.

Notice that one of the options (--ignore-matching-lines) requires the user to pass a regular expression string, while the other options don’t require any value.

One way to represent a set of command line options like

diff --ignore-case --ignore-matching-lines='.*ABC.*'

would be using a python dictionary:

parameters = {
    'ignore-case': True,
    'ignore-space-change': False,
    'ignore-matching-lines': '.*ABC.*',
}

Here is a simple code snippet for translating the dictionary to a list of command line options:

def cli_options(parameters):
    """Return command line options for parameters dictionary.

    :param dict parameters: dictionary with command line parameters
    """
    options = []
    for key, value in parameters.items():
        # Could validate: is key a known command-line option?
        if isinstance(value, bool) and value:
            options.append(f'--{key}')
        elif isinstance(value, str):
            # Could validate: is value a valid regular expression?
            options.append(f'--{key}')
            options.append(value)

    return options
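
Applied to the example dictionary, the translation gives the list shown in the comment below. The function is repeated here in condensed form only so that the snippet runs on its own.

```python
def cli_options(parameters):
    """Same translation as above, repeated so that this snippet runs on its own."""
    options = []
    for key, value in parameters.items():
        if isinstance(value, bool) and value:
            options.append(f'--{key}')
        elif isinstance(value, str):
            options.extend([f'--{key}', value])
    return options


parameters = {
    'ignore-case': True,
    'ignore-space-change': False,
    'ignore-matching-lines': '.*ABC.*',
}
print(cli_options(parameters))
# ['--ignore-case', '--ignore-matching-lines', '.*ABC.*']
```

Options set to False are simply dropped, and the regular expression is passed as a separate list item rather than as part of a quoted shell string.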

Note

When passing parameters along to your simulation code, try validating them. This detects errors directly at submission of the calculation and thus prevents calculations with malformed inputs from ever entering the queue of your HPC system.

For the sake of brevity we are not performing validation here but there are numerous python libraries, such as voluptuous (used by aiida-diff, see example), marshmallow or pydantic, that help you define a schema to validate input against.
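
Even without an extra library, a basic check is possible with the standard library alone. The sketch below is one possible approach, not the validation used by aiida-diff. validate_parameters and the two option sets are hypothetical names, and re.compile only approximates the regular expression dialect that diff itself accepts.

```python
import re

# Options from the excerpt above: flags take no value, the others take a string
FLAG_OPTIONS = {
    'ignore-case', 'ignore-tab-expansion', 'ignore-space-change',
    'ignore-all-space', 'ignore-blank-lines',
}
VALUE_OPTIONS = {'ignore-matching-lines'}


def validate_parameters(parameters):
    """Raise ValueError if the dictionary contains unknown options or malformed values."""
    for key, value in parameters.items():
        if key in FLAG_OPTIONS:
            if not isinstance(value, bool):
                raise ValueError(f"option '{key}' expects True or False, got {value!r}")
        elif key in VALUE_OPTIONS:
            if not isinstance(value, str):
                raise ValueError(f"option '{key}' expects a string, got {value!r}")
            try:
                re.compile(value)
            except re.error as exc:
                raise ValueError(f"option '{key}' is not a valid regular expression: {exc}") from exc
        else:
            raise ValueError(f"unknown option '{key}'")


# Passes silently; an unknown key or a malformed value would raise ValueError instead
validate_parameters({'ignore-case': True, 'ignore-matching-lines': '.*ABC.*'})
```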

Let’s open our previous calculations.py file and start modifying the DiffCalculation class:

  1. In the define method, add a new input to the spec with label 'parameters' and type Dict (from aiida.orm import Dict)

  2. In the prepare_for_submission method run the cli_options function from above on self.inputs.parameters.get_dict() to get the list of command-line options. Add them to the codeinfo.cmdline_params.

Solution

For 1. add the following line to the define method:

spec.input('parameters', valid_type=Dict, help='diff command-line parameters')

For 2. copy the cli_options snippet to the end of calculations.py and set the cmdline_params to:

codeinfo.cmdline_params = cli_options(self.inputs.parameters.get_dict()) + [
    self.inputs.file1.filename,
    self.inputs.file2.filename,
]

That’s it. Let’s now open the launch.py script and pass along our command line parameters:

...
builder.parameters = orm.Dict(dict={'ignore-case': True})
...

Change the capitalization of one of the characters in the first line of file1.txt. Then, restart the daemon and submit the new calculation:

$ verdi daemon restart
$ verdi run launch.py

If everything worked as intended, the capitalization difference in the first line should be ignored (and thus not show up in the output).

This marks the end of this how-to.

The CalcJob and Parser plugins are still rather basic and the aiida-diff-tutorial plugin package is missing a number of useful features, such as package metadata, documentation, tests, CI, etc. Continue with How to package plugins in order to learn how to quickly create a feature-rich new plugin package from scratch.