How to write a plugin for an external code

Tip

Before starting to write a new plugin, check the AiiDA plugin registry. If a plugin for your code is already available, you can skip straight to How to run external codes.

Tip

This how-to walks you through all logical steps of how AiiDA interacts with an external code. If you already know the basics and would like to get started with a new plugin package quickly, check out How to package plugins.

To run an external code with AiiDA, you need a corresponding calculation plugin, which tells AiiDA how to:

  1. Prepare the required input files.

  2. Run the code with the correct command line parameters.

Finally, you will probably want a parser plugin, which tells AiiDA how to:

  1. Parse the output of the code.

This how-to takes you through the process of creating a calculation plugin for a simple executable that sums two numbers, using it to run the code, and writing a parser for its outputs.

In the following, as an example, our Code will be the bash executable, and our “input file” will be a bash script aiida.in that sums two numbers and prints the result:

echo $(( numx + numy ))

We will run this as:

/bin/bash < aiida.in > aiida.out

thus writing the sum of the two numbers numx and numy (provided by the user) to the output file aiida.out.
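To see what this command does before involving AiiDA at all, the same run can be reproduced with the Python standard library. This is only a sketch: the helper name run_sum_with_bash is hypothetical, and it assumes that a bash executable is available on the PATH (rather than hard-coding /bin/bash).

```python
import subprocess
import tempfile
from pathlib import Path


def run_sum_with_bash(numx: int, numy: int) -> str:
    """Write aiida.in, run `bash < aiida.in > aiida.out` and return the output text."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = Path(workdir) / 'aiida.in'
        output_path = Path(workdir) / 'aiida.out'

        # The "input file" is a one-line bash script using arithmetic expansion.
        input_path.write_text(f'echo $(( {numx} + {numy} ))\n', encoding='utf8')

        # Redirect stdin and stdout in the same way as the shell command above.
        with input_path.open('rb') as stdin, output_path.open('wb') as stdout:
            subprocess.run(['bash'], stdin=stdin, stdout=stdout, check=True)

        return output_path.read_text(encoding='utf8')


if __name__ == '__main__':
    print(run_sum_with_bash(4, 5).strip())  # prints 9
```

The calculation plugin developed below automates exactly these two steps: writing the input file and describing how the executable should be invoked.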

Interfacing external codes

Start by creating a file calculations.py and subclass the CalcJob class:

from aiida import orm
from aiida.common.datastructures import CalcInfo, CodeInfo
from aiida.common.folders import Folder
from aiida.engine import CalcJob, CalcJobProcessSpec


class ArithmeticAddCalculation(CalcJob):
    """`CalcJob` implementation to add two numbers using bash for testing and demonstration purposes."""

In the following, we will tell AiiDA how to run our code by implementing two key methods: define() and prepare_for_submission().

Defining the spec

The define method tells AiiDA which inputs the CalcJob expects and which outputs it produces (exit codes will be discussed later). This is done through an instance of the CalcJobProcessSpec class, which is passed as the spec argument to the define method. For example:

    @classmethod
    def define(cls, spec: CalcJobProcessSpec):
        """Define the process specification, including its inputs, outputs and known exit codes.

        :param spec: the calculation job process spec to define.
        """
        super().define(spec)
        spec.input('x', valid_type=(orm.Int, orm.Float), help='The left operand.')
        spec.input('y', valid_type=(orm.Int, orm.Float), help='The right operand.')
        spec.output('sum', valid_type=(orm.Int, orm.Float), help='The sum of the left and right operand.')
        # set default options (optional)
        spec.inputs['metadata']['options']['parser_name'].default = 'arithmetic.add'
        spec.inputs['metadata']['options']['input_filename'].default = 'aiida.in'
        spec.inputs['metadata']['options']['output_filename'].default = 'aiida.out'
        spec.inputs['metadata']['options']['resources'].default = {'num_machines': 1, 'num_mpiprocs_per_machine': 1}
        # start exit codes - marker for docs
        spec.exit_code(310, 'ERROR_READING_OUTPUT_FILE', message='The output file could not be read.')
        spec.exit_code(320, 'ERROR_INVALID_OUTPUT', message='The output file contains invalid output.')
        spec.exit_code(410, 'ERROR_NEGATIVE_NUMBER', message='The sum of the operands is a negative number.')

The first line of the method calls the define method of the CalcJob parent class. This necessary step defines the inputs and outputs that are common to all CalcJobs.

Next, we use the input() method in order to define our two input numbers x and y (we support integers and floating point numbers), and we use output() to define the only output of the calculation with the label sum. AiiDA will attach the outputs defined here to a (successfully) finished calculation using the link label provided.

Note

This holds for required outputs (the default behaviour). Use required=False in order to mark an output as optional.

Tip

For the input parameters and input files of more complex simulation codes, consider using Dict (python dictionary) and SinglefileData (file wrapper) input nodes.

Finally, we set a couple of default options, such as the name of the parser (which we will implement later), the name of input and output files, and the computational resources to use for such a calculation. These options have already been defined on the spec by the super().define(spec) call, and they can be accessed through the inputs attribute, which behaves like a dictionary.

Note

One more important input required by any CalcJob is which external executable to use. External executables are represented by Code instances that contain information about the computer they reside on, their path in the file system and more.

They are passed to a CalcJob via the code input, which is defined in the CalcJob base class, so you don’t have to:

spec.input('code', valid_type=orm.Code, help='The `Code` to use for this job.')

There is no return statement in define: the define method directly modifies the spec object it receives. For more details on setting up your inputs and outputs (covering validation, dynamic number of inputs, etc.) see the Defining Processes topic.
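The pattern of calling super().define(spec) and then modifying the spec in place can be illustrated without AiiDA. The classes below (ToySpec, ToyBaseJob, ToyAddJob) are hypothetical stand-ins written for this sketch; they are not part of the AiiDA API and only mimic the idea of a spec that collects named ports.

```python
class ToySpec:
    """Hypothetical stand-in for a process spec: two dictionaries of ports."""

    def __init__(self):
        self.inputs = {}
        self.outputs = {}

    def input(self, name, **attributes):
        self.inputs[name] = attributes

    def output(self, name, **attributes):
        self.outputs[name] = attributes


class ToyBaseJob:
    @classmethod
    def define(cls, spec):
        # Ports shared by every job are declared once, in the base class.
        spec.input('code', help='The executable to run.')
        spec.output('retrieved', help='The retrieved files.')


class ToyAddJob(ToyBaseJob):
    @classmethod
    def define(cls, spec):
        # Without this call, 'code' and 'retrieved' would be missing.
        super().define(spec)
        spec.input('x', help='The left operand.')
        spec.input('y', help='The right operand.')
        spec.output('sum', help='The sum of the operands.')
        # No return statement: the spec object was modified in place.


spec = ToySpec()
result = ToyAddJob.define(spec)
print(result)                # None
print(sorted(spec.inputs))   # ['code', 'x', 'y']
print(sorted(spec.outputs))  # ['retrieved', 'sum']
```

Because the caller keeps its own reference to the spec object, everything declared by the parent class and by the subclass is visible on it after define returns.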

Preparing for submission

The prepare_for_submission() method has two jobs: creating the input files in the format the external code expects, and returning a CalcInfo object that contains instructions for the AiiDA engine on how the code should be run. For example:

    def prepare_for_submission(self, folder: Folder) -> CalcInfo:
        """Prepare the calculation for submission.

        Convert the input nodes into the corresponding input files in the format that the code will expect. In addition,
        define and return a `CalcInfo` instance, which is a simple data structure that contains information for the
        engine, for example, on what files to copy to the remote machine, what files to retrieve once it has completed,
        specific scheduler settings and more.

        :param folder: a temporary folder on the local file system.
        :returns: the `CalcInfo` instance
        """
        with folder.open(self.options.input_filename, 'w', encoding='utf8') as handle:
            handle.write(f'echo $(({self.inputs.x.value} + {self.inputs.y.value}))\n')

        codeinfo = CodeInfo()
        codeinfo.code_uuid = self.inputs.code.uuid
        codeinfo.stdin_name = self.options.input_filename
        codeinfo.stdout_name = self.options.output_filename

        calcinfo = CalcInfo()
        calcinfo.codes_info = [codeinfo]
        calcinfo.retrieve_list = [self.options.output_filename]

        return calcinfo

Note

Unlike the define method, the prepare_for_submission method is implemented from scratch and so there is no super call.

The first step is writing the simple bash script mentioned in the beginning: summing the numbers x and y, using Python’s string interpolation to replace the x and y placeholders with the actual values self.inputs.x.value and self.inputs.y.value that were provided as inputs by the caller.

All inputs provided to the calculation are validated against the spec before prepare_for_submission is called. Therefore, when accessing the inputs attribute, you can safely assume that all required inputs have been set and that all inputs have a valid type.

The folder argument (a Folder instance) allows us to write the input file to a sandbox folder, whose contents will be transferred to the compute resource where the actual calculation takes place. In this example, we only create a single input file, but you can create as many as you need, including subfolders if required.

Note

By default, the contents of the sandbox folder are also stored permanently in the file repository of the calculation node for additional provenance guarantees. There are cases (e.g. license issues, file size) where you may want to change this behavior and exclude files from being stored.

After having written the necessary input files, we let AiiDA know how to run the code via the CodeInfo object.

First, we forward the uuid of the Code instance passed by the user via the generic code input mentioned previously (in this example, the code will represent a bash executable).

Second, let’s recall how we want our executable to be run:

#!/bin/bash

'[executable path in code node]' < '[input_filename]' > '[output_filename]'

We want to pass our input file to the executable via standard input, and record standard output of the executable in the output file – this is done using the stdin_name and stdout_name attributes of the CodeInfo.

Tip

Many executables don’t read from standard input but instead require the path to an input file to be passed via command line parameters (potentially including further configuration options). In that case, use the CodeInfo cmdline_params attribute:

codeinfo.cmdline_params = ['--input', self.options.input_filename]
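As a rough mental model of how these CodeInfo attributes end up in the command line, consider the following standalone sketch. The function build_command is hypothetical and greatly simplified; it is not how the AiiDA engine or its scheduler plugins actually generate the submission script, and the executable path in the second call is a placeholder.

```python
import shlex


def build_command(executable, cmdline_params=(), stdin_name=None, stdout_name=None):
    """Assemble a shell command line from pieces similar to those stored in a CodeInfo."""
    parts = [shlex.quote(executable)]
    # Command line parameters follow the executable, in the order given.
    parts.extend(shlex.quote(param) for param in cmdline_params)
    # Standard input and output are attached through shell redirection.
    if stdin_name is not None:
        parts.append(f'< {shlex.quote(stdin_name)}')
    if stdout_name is not None:
        parts.append(f'> {shlex.quote(stdout_name)}')
    return ' '.join(parts)


print(build_command('/bin/bash', stdin_name='aiida.in', stdout_name='aiida.out'))
# /bin/bash < aiida.in > aiida.out

print(build_command('/path/to/code', ['--input', 'aiida.in'], stdout_name='aiida.out'))
# /path/to/code --input aiida.in > aiida.out
```

The first call corresponds to the bash example of this how-to, the second to an executable that takes its input file as a command line parameter.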

Tip

self.options.input_filename is just a shorthand for self.inputs.metadata['options']['input_filename'].

Finally, we pass the CodeInfo to a CalcInfo object (one calculation job can involve more than one executable, so codes_info is a list). We define the retrieve_list of filenames that the engine should retrieve from the directory where the job ran after it has finished. The engine will store these files in a FolderData node that will be attached as an output node to the calculation with the label retrieved. There are other file lists available that allow you to easily customize how to move files to and from the remote working directory in order to prevent the creation of unnecessary copies.
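Conceptually, the retrieve_list is a list of names to copy out of the working directory once the job is done. The sketch below mimics this with local folders only. The helper retrieve_files is hypothetical, and the real engine does considerably more (for instance, it works through the transport of the computer), so treat this purely as an illustration of the idea.

```python
import shutil
import tempfile
from pathlib import Path


def retrieve_files(workdir, retrieve_list, destination):
    """Copy each listed file that exists in workdir to destination; return the copied names."""
    retrieved = []
    for name in retrieve_list:
        source = Path(workdir) / name
        # In this sketch, files that are absent are simply skipped.
        if source.is_file():
            shutil.copy(source, Path(destination) / name)
            retrieved.append(name)
    return retrieved


with tempfile.TemporaryDirectory() as workdir, tempfile.TemporaryDirectory() as destination:
    # Pretend the job has run and produced its output file.
    (Path(workdir) / 'aiida.out').write_text('9\n', encoding='utf8')
    print(retrieve_files(workdir, ['aiida.out'], destination))  # ['aiida.out']
```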

This was an example of how to implement the CalcJob class to interface AiiDA with an external code. For more details on the CalcJob class, refer to the Topics section on defining calculations.

Parsing the outputs

Parsing the output files produced by a code into AiiDA nodes is optional, but it can make your data queryable and therefore easier to access and analyze.

To create a parser plugin, subclass the Parser class (for example in a file called parsers.py) and implement its parse() method. The following is an example of a simple implementation:

from aiida.parsers.parser import Parser


class SimpleArithmeticAddParser(Parser):
    """Simple parser for an `ArithmeticAddCalculation` job (for demonstration purposes only)."""

    def parse(self, **kwargs):
        """Parse the contents of the output files stored in the `retrieved` output node."""
        from aiida.orm import Int

        output_folder = self.retrieved

        with output_folder.open(self.node.get_option('output_filename'), 'r') as handle:
            result = int(handle.read())

        self.out('sum', Int(result))

Before the parse() method is called, two important attributes are set on the Parser instance:

  1. self.retrieved: An instance of FolderData, which points to the folder containing all output files that the CalcJob instructed to retrieve, and provides the means to open() any file it contains.

  2. self.node: The CalcJobNode representing the finished calculation, which, among other things, provides access to all of its inputs (self.node.inputs).

The get_option() convenience method is used to get the filename of the output file. Its content is cast to an integer, since the output file should contain the sum produced by the aiida.in bash script.

Finally, the out() method is used to link the parsed sum as an output of the calculation. The first argument is the name of the output, which will be used as the label for the link that connects the calculation and data node, and the second is the node that should be recorded as an output. Note that the type of the output should match the type that is specified by the process specification of the corresponding CalcJob. If any of the registered outputs do not match the specification, the calculation will be marked as failed.

In order to request automatic parsing of a CalcJob (once it has finished), users can set the metadata.options.parser_name input when launching the job. If a particular parser should be used by default, the CalcJob define method can set a default value for the parser name as was done in the previous section:

@classmethod
def define(cls, spec):
    ...
    spec.inputs['metadata']['options']['parser_name'].default = 'arithmetic.add'

Note that the default is not set to the Parser class itself, but to the entry point string under which the parser class is registered. How to register a parser class through an entry point is explained in the how-to section on registering plugins.

Handling parsing errors

So far, we have not paid much attention to potential errors that can arise when running external codes. However, there are lots of ways in which codes can fail to execute nominally. A Parser can play an important role in detecting and communicating such errors, so that workflows can then decide how to proceed, e.g., by modifying input parameters and resubmitting the calculation.

Parsers communicate errors through exit codes, which are defined in the spec of the CalcJob they parse. The ArithmeticAddCalculation example defines the following exit codes:

spec.exit_code(310, 'ERROR_READING_OUTPUT_FILE', message='The output file could not be read.')
spec.exit_code(320, 'ERROR_INVALID_OUTPUT', message='The output file contains invalid output.')
spec.exit_code(410, 'ERROR_NEGATIVE_NUMBER', message='The sum of the operands is a negative number.')

Each exit_code defines:

  • an exit status (a positive integer),

  • a label that can be used to reference the code in the parse method (through the self.exit_codes property, as shown below), and

  • a message that provides a more detailed description of the problem.

In order to inform AiiDA about a failed calculation, simply return from the parse method the exit code that corresponds to the detected issue. Here is a more complete version of the example Parser presented in the previous section:

from aiida.parsers.parser import Parser


class ArithmeticAddParser(Parser):
    """Parser for an `ArithmeticAddCalculation` job."""

    def parse(self, **kwargs):
        """Parse the contents of the output files stored in the `retrieved` output node."""
        from aiida.orm import Int

        try:
            with self.retrieved.open(self.node.get_option('output_filename'), 'r') as handle:
                result = int(handle.read())
        except OSError:
            return self.exit_codes.ERROR_READING_OUTPUT_FILE
        except ValueError:
            return self.exit_codes.ERROR_INVALID_OUTPUT

        self.out('sum', Int(result))

        if result < 0:
            return self.exit_codes.ERROR_NEGATIVE_NUMBER

It checks:

  1. Whether the output file can be read (whether open() or read() will throw an OSError).

  2. Whether the output file contains an integer.

  3. Whether the sum is negative.
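The control flow of this parser can be tried out without an AiiDA installation. The following sketch reimplements the same checks as a plain function; parse_output is a hypothetical helper, the exit statuses are copied from the spec shown earlier, and 0 is used here to signal success.

```python
import tempfile
from pathlib import Path

# Exit statuses as defined on the spec of ArithmeticAddCalculation.
ERROR_READING_OUTPUT_FILE = 310
ERROR_INVALID_OUTPUT = 320
ERROR_NEGATIVE_NUMBER = 410


def parse_output(path):
    """Return a tuple (result, exit_status) for the given output file."""
    try:
        result = int(Path(path).read_text(encoding='utf8'))
    except OSError:
        # The file is missing or cannot be opened or read.
        return None, ERROR_READING_OUTPUT_FILE
    except ValueError:
        # The content is not a valid integer.
        return None, ERROR_INVALID_OUTPUT

    if result < 0:
        # The result is still reported, but the run is flagged as failed.
        return result, ERROR_NEGATIVE_NUMBER

    return result, 0


with tempfile.TemporaryDirectory() as tmp:
    output_file = Path(tmp) / 'aiida.out'
    output_file.write_text('9\n', encoding='utf8')
    print(parse_output(output_file))                # (9, 0)
    print(parse_output(Path(tmp) / 'missing.out'))  # (None, 310)
```

As in the parser above, a negative sum is both reported and flagged: the value is returned together with a non-zero exit status.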

AiiDA stores the exit code returned by the parse method on the calculation node that is being parsed, from where it can then be inspected further down the line. The Topics section on defining processes provides more details on exit codes. Note that scheduler plugins can also implement parsing of the output generated by the job scheduler and in the case of problems can set an exit code. The Topics section on scheduler exit codes explains how they can be inspected inside an output parser and how they can optionally be overridden.

Registering entry points

Entry points are the preferred method of registering new calculation, parser and other plugins with AiiDA.

With your calculations.py and parsers.py files at hand, let’s register entry points for the plugins they contain:

  • Move your two scripts into a subfolder aiida_add:

    mkdir aiida_add
    mv calculations.py parsers.py aiida_add/
    

    You have just created an aiida_add Python package!

  • Write a minimalistic setup.py script for your new package:

    from setuptools import setup
    
    setup(
        name='aiida-add',
        packages=['aiida_add'],
        entry_points={
            'aiida.calculations': ["add = aiida_add.calculations:ArithmeticAddCalculation"],
            'aiida.parsers': ["add = aiida_add.parsers:ArithmeticAddParser"],
        }
    )
    

    Note

    Strictly speaking, aiida-add is the name of the distribution, while aiida_add is the name of the package. The aiida-core documentation uses the term package a bit more loosely.

  • Install your new aiida-add plugin package. See the How to install plugins section for details.

After this, you should see your plugins listed:

$ verdi plugin list aiida.calculations
$ verdi plugin list aiida.calculations add
$ verdi plugin list aiida.parsers
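Since plugins are registered as regular Python entry points, the registration can also be inspected with the standard library, independently of verdi. The sketch below uses importlib.metadata; the helper list_entry_points is hypothetical, and the branch on select is an assumption made here to support both older and newer Python versions.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return the sorted names of all installed entry points in the given group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, 'select'):
        # Newer Python versions return an object with a select() method.
        selected = all_entry_points.select(group=group)
    else:
        # Older Python versions return a plain dictionary keyed by group name.
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


if __name__ == '__main__':
    # After installing aiida-add, 'add' is expected to appear in both lists.
    print(list_entry_points('aiida.calculations'))
    print(list_entry_points('aiida.parsers'))
```

If add does not show up after installation, the entry_points argument in setup.py is a good place to start looking.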

Running a calculation

With the entry points set up, you are ready to launch your first calculation with the new plugin:

  • If you haven’t already done so, set up your computer. In the following we assume it to be the localhost:

    $ verdi computer setup -L localhost -H localhost -T local -S direct -w `echo $PWD/work` -n
    $ verdi computer configure local localhost --safe-interval 5 -n
    
  • Write a launch.py script:

    from aiida import orm, engine
    from aiida.common.exceptions import NotExistent
    
    # Setting up inputs
    computer = orm.load_computer('localhost')
    try:
        code = orm.load_code('add@localhost')
    except NotExistent:
        # Setting up code via python API (or use "verdi code setup")
        code = orm.Code(label='add', remote_computer_exec=[computer, '/bin/bash'], input_plugin_name='add')
    
    builder = code.get_builder()
    builder.x = orm.Int(4)
    builder.y = orm.Int(5)
    builder.metadata.options.withmpi = False
    builder.metadata.options.resources = {
        'num_machines': 1,
        'num_mpiprocs_per_machine': 1,
    }
    
    # Running the calculation & parsing results
    output_dict, node = engine.run_get_node(builder)
    print("Parsing completed. Result: {}".format(output_dict['sum'].value))
    

    Note

    output_dict is a dictionary containing all the output nodes keyed by their label. In this case: “remote_folder”, “retrieved” and “sum”.

  • Launch the calculation:

    $ verdi run launch.py
    

    If everything goes well, this should print the results of your calculation, something like:

    $ verdi run launch.py
    Parsing completed. Result: 9
    

Tip

If you encounter a parsing error, it can be helpful to perform a dry run, which allows you to inspect the input folder generated by AiiDA before any calculation is launched.

Finally, instead of running your calculation in the current shell, you can submit your calculation to the AiiDA daemon:

  • (Re)start the daemon to update its Python environment:

    $ verdi daemon restart --reset
    
  • Update your launch script to use:

    # Submitting the calculation
    node = engine.submit(builder)
    print("Submitted calculation {}".format(node))
    

    Note

    node is the CalcJobNode representing the state of the underlying calculation process (which may not be finished yet).

  • Launch the calculation:

    $ verdi run launch.py
    

    This should print the UUID and the PK of the submitted calculation.

You can use the verdi command line interface to monitor this process:

$ verdi process list

This marks the end of this how-to.

The CalcJob and Parser plugins are still rather basic and the aiida-add plugin package is missing a number of useful features, such as package metadata, documentation, tests, CI, etc. Continue with How to package plugins in order to learn how to quickly create a feature-rich new plugin package from scratch.