Usage#

Note

This chapter assumes that you know the basic concepts of calculation functions and calculation jobs, the difference between them, and when to use one or the other.

A calculation is a process (see the process section for details) that creates new data. Currently, there are two ways of implementing a calculation process: as a calculation function or as a calculation job.

This section will provide detailed information and best practices on how to implement these two calculation types.

Calculation functions#

The section on the concept of calculation functions already addressed their aim: automatic recording of their execution with their inputs and outputs in the provenance graph. The section on process functions subsequently detailed the rules that apply when implementing them, all of which apply to calculation functions, which are a subtype, just like work functions. However, there are some differences, given that calculation functions are ‘calculation’-like processes and work functions behave like ‘workflow’-like processes. What this entails in terms of intended usage and limitations for calculation functions is the scope of this section.

Creating data#

It has been said many times before: calculation functions, like all ‘calculation’-like processes, create data, but what does create mean exactly? In this context, the term ‘create’ is not intended to refer to the simple creation of a new data node in the graph, in an interactive shell or a script for example. But rather it indicates the creation of a new piece of data from some other data through a computation implemented by a process. This is then exactly what the calculation function does. It takes one or more data nodes as inputs and returns one or more data nodes as outputs, whose content is based on those inputs. As explained in the technical section, outputs are created simply by returning the nodes from the function. The engine will inspect the return value from the function and attach the output nodes to the calculation node that represents the calculation function. To verify that the output nodes are in fact ‘created’, the engine will check that the nodes are not stored. Therefore, it is very important that you do not store the nodes you create yourself, or the engine will raise an exception, as shown in the following example:

# -*- coding: utf-8 -*-
from aiida.engine import calcfunction
from aiida.orm import Int


@calcfunction
def add(x, y):
    result = Int(x + y).store()
    return result

result = add(Int(1), Int(2))

Because the returned node is already stored, the engine will raise the following exception:

ValueError: trying to return an already stored Data node from a @calcfunction, however, @calcfunctions cannot return data.
If you stored the node yourself, simply do not call `store()` yourself.
If you want to return an input node, use a @workfunction instead.

The reason for this strictness is that a node that was stored after being created in the function body is indistinguishable from a node that was already stored and had simply been loaded in the function body and returned, e.g.:

# -*- coding: utf-8 -*-
from aiida.engine import calcfunction
from aiida.orm import Int, load_node


@calcfunction
def add(x, y):
    result = load_node(100)
    return result

result = add(Int(1), Int(2))

The loaded node would also have gotten a create link from the calculation function, even though it was not really created by it at all. It is exactly to prevent this ambiguity that calculation functions require all returned output nodes to be unstored.
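The check described above can be pictured with a small toy model in plain Python. This is only a sketch: `ToyNode` and `toy_calcfunction` are hypothetical stand-ins that are not part of AiiDA, and the real engine does considerably more than this.

```python
import functools


class ToyNode:
    """Hypothetical stand-in for a data node; not an AiiDA class."""

    def __init__(self, value):
        self.value = value
        self.is_stored = False

    def store(self):
        self.is_stored = True
        return self


def toy_calcfunction(func):
    """Reject outputs that were already stored when the function returned them."""

    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        if result.is_stored:
            raise ValueError('returned node is already stored, so it cannot be marked as created')
        # Only now does the toy "engine" store the output itself.
        return result.store()

    return wrapper


@toy_calcfunction
def add(x, y):
    return ToyNode(x.value + y.value)  # unstored: accepted


@toy_calcfunction
def add_and_store(x, y):
    return ToyNode(x.value + y.value).store()  # stored by the user: rejected
```

In this model, `add(ToyNode(1), ToyNode(2))` returns a node that the wrapper has stored, whereas `add_and_store(ToyNode(1), ToyNode(2))` raises a `ValueError`, because the wrapper cannot tell a freshly stored node from a pre-existing one.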

Note that work functions have exactly the opposite requirement: all the outputs that they return have to be stored, because, as ‘workflow’-like processes, they cannot create new data. For more details refer to the work function section.

Calculation jobs#

To explain how a calculation job can be implemented, we will continue with the example presented in the section on the concept of the calculation job. There we described a code that adds two integers, implemented as a simple bash script, and how the CalcJob class can be used to run this code through AiiDA. Since it is a sub class of the Process class, it shares all its properties. It will be very valuable to have read the section on working with generic processes before continuing, because all the concepts explained there will apply also to calculation jobs.

Define#

To implement a calculation job, one simply subclasses the CalcJob process class and implements the define() method. You can pick any name that is a valid Python class name. The most important method of the CalcJob class is the define class method. Here you define what inputs it takes and what outputs it will generate.

# -*- coding: utf-8 -*-
from aiida import orm
from aiida.engine import CalcJob


class ArithmeticAddCalculation(CalcJob):
    """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('x', valid_type=orm.Int, help='The left operand.')
        spec.input('y', valid_type=orm.Int, help='The right operand.')

As the snippet above demonstrates, the class method takes two arguments:

  • cls this is the reference of the class itself and is mandatory for any class method

  • spec which is the ‘specification’

Warning

Do not forget to add the line super().define(spec) as the first line of the define method. This will call the define method of the parent class, which is necessary for the calculation job to work properly.

As the name suggests, the spec can be used to specify the properties of the calculation job. For example, it can be used to define inputs that the calculation job takes. In our example, we need to be able to pass two integers as input, so we define those in the spec by calling spec.input(). The first argument is the name of the input. This name should be used later to specify the inputs when launching the calculation job and it will also be used as the label for the link that connects the data node and the calculation node in the provenance graph. Additionally, as we have done here, you can specify which types are valid for that particular input. Since we expect integers, we specify that the valid type is the database-storable Int class.

Note

Since we subclass CalcJob and call its define method, our class will inherit the ports that it declares as well. If you look at the implementation, you will find that the base class CalcJob already defines an input code that takes a Code instance. This will reference the code that the user wants to run when they launch the CalcJob. For this reason, you do not have to declare this input again.

Next we should define what outputs we expect the calculation to produce:

# -*- coding: utf-8 -*-
from aiida import orm
from aiida.engine import CalcJob


class ArithmeticAddCalculation(CalcJob):
    """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('x', valid_type=orm.Int, help='The left operand.')
        spec.input('y', valid_type=orm.Int, help='The right operand.')
        spec.output('sum', valid_type=orm.Int, help='The sum of the left and right operand.')

Just as for the inputs, one can specify what node type each output should have. By default a defined output will be ‘required’, which means that if the calculation job terminates and the output has not been attached, the process will be marked as failed. To indicate that an output is optional, one can use required=False in the spec.output call. Note that the process spec, and its input() and output() methods, provide a lot more functionality. For more details, please refer to the section on process specifications.
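The rule for required and optional outputs can be sketched with a toy check in plain Python. The helper name and the optional output `log` are hypothetical; the snippet only models the rule stated above and is not the engine's implementation.

```python
def missing_required_outputs(declared, attached):
    """Return the names of required outputs that were never attached.

    :param declared: maps each output name to whether it is required
    :param attached: names of the outputs attached when the job terminated
    """
    return sorted(name for name, required in declared.items() if required and name not in attached)


# 'sum' is required by default; 'log' is a hypothetical optional output.
declared = {'sum': True, 'log': False}

complete = missing_required_outputs(declared, attached={'sum'})
incomplete = missing_required_outputs(declared, attached={'log'})
print(complete, incomplete)
```

In this model a non-empty result corresponds to a process that would be marked as failed.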

Prepare#

We have now defined through the process specification, what inputs the calculation job expects and what outputs it will create. The final remaining task is to instruct the engine how the calculation job should actually be run. To understand what the engine would have to do to accomplish this, let’s consider what one typically does when manually preparing to run a computing job through a scheduler:

  • Prepare a working directory in some scratch space on the machine where the job will run

  • Create the raw input files required by the executable

  • Create a launch script containing scheduler directives, loading of environment variables and finally calling the executable with certain command line parameters.

So all we need to do now is instruct the engine how to accomplish these things for a specific calculation job. Since these instructions will be calculation dependent, we will implement this with the prepare_for_submission() method. The implementation of the ArithmeticAddCalculation that we are considering in the example looks like the following:

# -*- coding: utf-8 -*-
from aiida import orm
from aiida.common.datastructures import CalcInfo, CodeInfo
from aiida.engine import CalcJob


class ArithmeticAddCalculation(CalcJob):
    """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('x', valid_type=orm.Int, help='The left operand.')
        spec.input('y', valid_type=orm.Int, help='The right operand.')
        spec.output('sum', valid_type=orm.Int, help='The sum of the left and right operand.')

    def prepare_for_submission(self, folder):
        """Write the input files that are required for the code to run.

        :param folder: an `~aiida.common.folders.Folder` to temporarily write files on disk
        :return: `~aiida.common.datastructures.CalcInfo` instance
        """
        input_x = self.inputs.x
        input_y = self.inputs.y

        # Write the input file based on the inputs that were passed
        with folder.open(self.options.input_filename, 'w', encoding='utf8') as handle:
            handle.write(f'{input_x.value} {input_y.value}\n')

        codeinfo = CodeInfo()
        codeinfo.code_uuid = self.inputs.code.uuid
        codeinfo.stdout_name = self.options.output_filename
        codeinfo.cmdline_params = ['-in', self.options.input_filename]

        calcinfo = CalcInfo()
        calcinfo.codes_info = [codeinfo]
        calcinfo.local_copy_list = []
        calcinfo.remote_copy_list = []
        calcinfo.retrieve_list = []

        return calcinfo

Before we go into the code line-by-line, let’s describe the big picture of what is happening here. The goal of this method is to help the engine accomplish the three steps required for preparing the submission of a calculation job, as described above. The raw input files that are required can be written to a sandbox folder that is passed in as the folder argument.

Note

The folder argument points to a temporary sandbox folder on the local file system that can be used to write the input files to. After the prepare_for_submission method returns, the engine will take those contents and copy them to the working directory where the calculation will be run. On top of that, these files will also be written to the file repository of the node that represents the calculation as an additional measure of provenance. Even though the information written there should be a derivation of the contents of the nodes that were passed as input nodes, since it is a derived form we store this explicitly nonetheless. Sometimes, this behavior is undesirable, for example for efficiency or data privacy reasons, so it can be controlled with various lists such as local_copy_list and provenance_exclude_list.

All the other required information, such as the directives of which files to copy and what command line options to use are defined through the CalcInfo datastructure, which should be returned from the method as the only value. In principle, this is what one should do in the prepare_for_submission method:

  • Write the raw input files required for the calculation to run to the folder sandbox folder.

  • Use a CalcInfo to instruct the engine which files to copy to the working directory

  • Use a CalcInfo to tell which codes should run, using which command line parameters, such as standard input and output redirection.

Note

The prepare_for_submission method does not have to write the submission script itself. The engine will know how to do this, because the codes that are to be used have been configured on a specific computer, which defines what scheduler is to be used. This gives the engine all the necessary information on how to write the launch script, such as what scheduler directives to write.

Now that we know what the prepare_for_submission is expected to do, let’s see how the implementation of the ArithmeticAddCalculation accomplishes it line-by-line. The input file required for this example calculation will consist of the two integers that are passed as inputs. The self.inputs attribute returns an attribute dictionary with the parsed and validated inputs, according to the process specification defined in the define method. This means that you do not have to validate the inputs yourself. That is to say, if an input is marked as required and of a certain type, by the time we get to the prepare_for_submission it is guaranteed that the dictionary returned by self.inputs will contain that input and of the correct type.

From the two inputs x and y that will have been passed when the calculation job was launched, we should now generate the input file, which is simply a text file with these two numbers on a single line, separated by a space. We accomplish this by opening a filehandle to the input file in the sandbox folder and writing the values of the two Int nodes to the file.

Note

The format of this input file just so happens to be the format expected by the bash script that we are using in this example. The exact number of input files and their content will of course depend on the code for which the calculation job is being written.
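To make the input file format concrete, the following plain-Python sketch writes the same single line to a temporary directory instead of the sandbox folder. The helper `write_input_file` and the filename `aiida.in` are assumptions made for illustration; in the calculation job the filename comes from the input_filename option.

```python
import tempfile
from pathlib import Path


def write_input_file(directory, filename, x, y):
    """Write the two operands on a single line, separated by a space."""
    path = Path(directory) / filename
    path.write_text(f'{x} {y}\n', encoding='utf8')
    return path


with tempfile.TemporaryDirectory() as sandbox:
    # 'aiida.in' is an assumed filename used only for this sketch.
    input_path = write_input_file(sandbox, 'aiida.in', 1, 2)
    content = input_path.read_text(encoding='utf8')

print(repr(content))
```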

With the input file written, we now have to create an instance of CalcInfo that should be returned from the method. This data structure will instruct the engine exactly what needs to be done to execute the code, such as what files should be copied to the remote computer where the code will be executed. In this simple example, we define four simple attributes:

  • codes_info: a list of CodeInfo datastructures, that tell which codes to run consecutively during the job

  • local_copy_list: a list of tuples that instruct what files to copy to the working directory from the local machine

  • remote_copy_list: a list of tuples that instruct what files to copy to the working directory from the machine on which the job will run

  • retrieve_list: a list of instructions (strings or tuples) that specify which files should be retrieved from the working directory and stored in the local repository after the job has finished

In this example we only need to run a single code, so the codes_info list has a single CodeInfo datastructure. This datastructure needs to define which code it needs to run, which is one of the inputs passed to the CalcJob, and does so by means of its UUID. Through the stdout_name attribute, we tell the engine where the output of the executable should be redirected to. In this example this is set to the value of the output_filename option. What options are available in calculation jobs, what they do and how they can be set will be explained in the section on options. Finally, the cmdline_params attribute takes a list with command line parameters that will be placed after the executable in the launch script. Here we use it to explicitly instruct the executable to read its input from the filename stored in the option input_filename.

Note

Since we instruct the executable to read the input from self.options.input_filename, this is also the filename we used when writing that very input file in the sandbox folder.
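How the CodeInfo attributes end up in the launch script can be approximated as follows. This is a sketch only: the actual line is written by the engine and may be quoted or decorated differently, and the executable path and both filenames are hypothetical values.

```python
import shlex


def build_command(executable, cmdline_params, stdout_name=None):
    """Approximate the line that runs the code in the launch script."""
    command = shlex.join([executable, *cmdline_params])
    if stdout_name is not None:
        # Redirect standard output to the requested file.
        command += f' > {shlex.quote(stdout_name)}'
    return command


# All three values below are hypothetical examples.
line = build_command('/usr/local/bin/add.sh', ['-in', 'aiida.in'], stdout_name='aiida.out')
print(line)
```

The command line parameters follow the executable in the order in which they appear in cmdline_params, and the redirection corresponds to stdout_name.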

Finally, we have to define the various “file lists” that tell what files to copy from where to where and what files to retrieve. Here we will briefly describe their intended goals. The implementation details will be described in full in the file lists section.

The local copy list is useful to instruct the engine to copy over files that you might already have stored in your database, such as instances of SinglefileData nodes, that you can define and pass as inputs of the CalcJob. You could of course have manually copied their content to the folder sandbox folder, which would also have caused them to be written to the working directory. The disadvantage of that method, however, is that all the contents written to the sandbox folder will also be stored in the repository of the CalcJobNode that will represent the execution of the CalcJob in the provenance graph. This will cause duplication of the data contained within these data nodes. By not writing them explicitly to the sandbox folder, you avoid this duplication, without losing provenance, because the data node itself will of course be recorded in the provenance graph.

The remote copy list is useful to avoid unnecessary file transfers between the machine where the engine runs and where the calculation jobs are executed. For example, imagine you have already completed a calculation job on a remote cluster and now want to launch a second one, that requires some of the output files of the first run as its inputs. The remote copy list allows you to specify exactly what output files to copy to the remote working directory, without them having to be retrieved to the engine’s machine in between.

The retrieve list, finally, allows you to instruct the engine what files should be retrieved from the working directory after the job has terminated. These files will be downloaded to the local machine, stored in a FolderData data node and attached as an output to the CalcJobNode with the link label retrieved.

Note

We didn’t explicitly define the retrieved folder data node as an output in the example ArithmeticAddCalculation implementation shown above. This is because this is already defined by the CalcJob base class. Just as the code input, the retrieved output is common for all calculation job implementations.

File lists#

Local copy list#

The local copy list takes tuples of length three, each of which represents a file or directory to be copied, defined through the following items:

  • node uuid: the node whose repository contains the file, typically a SinglefileData or FolderData node

  • source relative path: the relative path of the file or directory within the node repository

  • target relative path: the relative path within the working directory to which to copy the file or directory contents

As an example, consider a CalcJob implementation that receives a SinglefileData node as input with the name pseudopotential, to copy its contents one can specify:

calc_info.local_copy_list = [(self.inputs.pseudopotential.uuid, self.inputs.pseudopotential.filename, 'pseudopotential.dat')]

The SinglefileData node only contains a single file by definition, the relative path of which is returned by the filename attribute. If instead, you need to transfer a specific file from a FolderData, you can specify the explicit key of the file, like so:

calc_info.local_copy_list = [(self.inputs.folder.uuid, 'internal/relative/path/file.txt', 'relative/target/file.txt')]

Note that the filenames in the relative source and target path need not be the same. This depends fully on how the files are stored in the node’s repository and what files need to be written to the working directory.

To copy the contents of a directory of the source node, simply define it as the source relative path. For example, imagine we have a FolderData node that is passed as the folder input, which has the following repository virtual hierarchy:

├─ sub
│  └─ file_b.txt
└─ file_a.txt

If the entire content needs to be copied over, specify the local_copy_list as follows:

calc_info.local_copy_list = [(self.inputs.folder.uuid, '.', None)]

The '.' here indicates that the entire contents need to be copied over. Alternatively, one can specify a sub directory, e.g.:

calc_info.local_copy_list = [(self.inputs.folder.uuid, 'sub', None)]

Finally, the target relative path can be used to write the contents of the source repository to a particular sub directory in the working directory. For example, the following statement:

calc_info.local_copy_list = [(self.inputs.folder.uuid, 'sub', 'relative/target')]

will result in the following file hierarchy in the working directory of the calculation:

└─ relative
   └─ target
       └─ file_b.txt

One might wonder what the purpose of the list is, when one could just as easily use the normal API to write the file to the folder sandbox folder. It is true that in this way the file will be copied to the working directory; however, it will then also be copied into the repository of the calculation node. Since in this case it is merely a direct one-to-one copy of the file that is already part of one of the input nodes (in an unaltered form), this duplication is unnecessary and adds useless weight to the file repository. Using the local_copy_list prevents this unnecessary duplication of file content. It can also be used if the content of a particular input node is privacy sensitive and cannot be duplicated in the repository.
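The mapping from source to target paths described in this section can be modelled in plain Python. The function `resolve_local_copy` is a hypothetical helper that reproduces only the two documented cases with an explicit target path (a single file, and a sub directory); it is not how the engine performs the copy.

```python
from pathlib import PurePosixPath


def resolve_local_copy(repo_files, source, target):
    """Map files in a node repository to paths in the working directory."""
    if source in repo_files:
        # A single file: the target is the path of the copied file itself.
        return {source: target}
    mapping = {}
    for path in repo_files:
        try:
            relative = PurePosixPath(path).relative_to(source)
        except ValueError:
            continue  # this file is not located inside the source directory
        mapping[path] = str(PurePosixPath(target) / relative)
    return mapping


repo_files = ['file_a.txt', 'sub/file_b.txt']
print(resolve_local_copy(repo_files, 'sub', 'relative/target'))
print(resolve_local_copy(repo_files, 'file_a.txt', 'renamed.txt'))
```

The first call corresponds to the `('sub', 'relative/target')` example above; the second shows that source and target filenames need not be the same.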

Provenance exclude list#

The local_copy_list allows one to instruct the engine to write files from the input nodes to the working directory, without them also being copied to the file repository of the calculation node. As discussed in the corresponding section, this is useful in order to avoid duplication or in cases where the data of the nodes is proprietary or privacy sensitive and cannot be duplicated arbitrarily everywhere in the file repository. However, the limitation of the local_copy_list is that it can only target single files in their entirety and cannot be used for arbitrary files that are written to the folder sandbox folder. To provide full control over what files from the folder are stored permanently in the calculation node file repository, the provenance_exclude_list is introduced. This CalcInfo attribute is a list of filepaths, relative to the base path of the folder sandbox folder, which are not stored in the file repository.

Consider the following file structure as written by an implementation of prepare_for_submission to the folder sandbox:

├─ sub
│  ├─ file_b.txt
│  └─ personal.dat
├─ file_a.txt
└─ secret.key

Clearly, we do not want the personal.dat and secret.key files to end up permanently in the file repository. This can be achieved by defining:

calc_info.provenance_exclude_list = ['sub/personal.dat', 'secret.key']

With this specification, the final contents of the repository of the calculation node will contain:

├─ sub
│  └─ file_b.txt
└─ file_a.txt
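The filtering can be pictured with a few lines of plain Python. The helper `stored_in_repository` is hypothetical and handles only exact relative file paths, as in the example above; it is a model of the outcome, not the engine's code.

```python
def stored_in_repository(sandbox_files, provenance_exclude_list):
    """Return the sandbox files that remain after applying the exclude list."""
    excluded = set(provenance_exclude_list)
    return [path for path in sandbox_files if path not in excluded]


sandbox_files = ['sub/file_b.txt', 'sub/personal.dat', 'file_a.txt', 'secret.key']
kept = stored_in_repository(sandbox_files, ['sub/personal.dat', 'secret.key'])
print(kept)
```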

Remote copy list#

The remote copy list takes tuples of length three, each of which represents a file to be copied on the remote machine where the calculation will run, defined through the following items:

  • computer uuid: this is the UUID of the Computer on which the source file resides. For now the remote copy list can only copy files on the same machine where the job will run.

  • source absolute path: the absolute path of the source file on the remote machine

  • target relative path: the relative path within the working directory to which to copy the file

calc_info.remote_copy_list = [(self.inputs.parent_folder.computer.uuid, 'output_folder', 'restart_folder')]

Note that the source path can point to a directory, in which case its contents will be recursively copied in its entirety.

Retrieve list#

The retrieve list is a list of instructions of what files and folders should be retrieved by the engine once a calculation job has terminated. Each instruction should have one of two formats:

  • a string representing a relative filepath in the remote working directory

  • a tuple of length three that allows one to control the name of the retrieved file or folder in the retrieved folder

The retrieve list can contain any number of instructions and can use both formats at the same time. The first format is the simplest; however, it requires that one knows the exact name of the file or folder to be retrieved, and in addition any subdirectories will be ignored when it is retrieved. If the exact filename is not known and glob patterns should be used, or if the original folder structure should be (partially) kept, one should use the tuple format, which consists of the following items:

  • source relative path: the relative path, with respect to the working directory on the remote, of the file or directory to retrieve.

  • target relative path: the relative path of the directory in the retrieved folder into which the content of the source will be copied. The string '.' indicates the top level in the retrieved folder.

  • depth: the number of levels of nesting in the source path to maintain when copying, starting from the deepest file.

To illustrate the various possibilities, consider the following example file hierarchy in the remote working directory:

├─ path
│  ├─ sub
│  │  ├─ file_c.txt
│  │  └─ file_d.txt
│  └─ file_b.txt
└─ file_a.txt

Below, you will find examples for various use cases of files and folders to be retrieved. Each example starts with the format of the retrieve_list, followed by a schematic depiction of the final file hierarchy that would be created in the retrieved folder.

Explicit file or folder#

Retrieving a single toplevel file or folder (with all its contents) where the final folder structure is not important.

retrieve_list = ['file_a.txt']

└─ file_a.txt

retrieve_list = ['path']

├── sub
│   ├─ file_c.txt
│   └─ file_d.txt
└─ file_b.txt

Explicit nested file or folder#

Retrieving a single file or folder (with all its contents) that is located in a subdirectory in the remote working directory, where the final folder structure is not important.

retrieve_list = ['path/file_b.txt']

└─ file_b.txt

retrieve_list = ['path/sub']

├─ file_c.txt
└─ file_d.txt

Explicit nested file or folder keeping (partial) hierarchy#

The following examples show how the file hierarchy of the retrieved files can be controlled. By changing the depth parameter of the tuple, one can control what part of the remote folder hierarchy is kept. In the given example, the maximum depth of the remote folder hierarchy is 3. The following example shows that by specifying 3, the exact folder structure is kept:

retrieve_list = [('path/sub/file_c.txt', '.', 3)]

└─ path
    └─ sub
       └─ file_c.txt

For depth=2, only two levels of nesting are kept (including the file itself) and so the path folder is discarded.

retrieve_list = [('path/sub/file_c.txt', '.', 2)]

└─ sub
   └─ file_c.txt

The same applies to directories. By specifying a directory for the first element, all its contents will be retrieved. With depth=1, only the last level of the folder hierarchy, sub, is kept.

retrieve_list = [('path/sub', '.', 1)]

└── sub
    ├─ file_c.txt
    └─ file_d.txt

Pattern matching#

If the exact file or folder name is not known beforehand, glob patterns can be used. In the following examples, all files that match *c.txt in the directory path/sub will be retrieved. Since depth=0, the files will be copied without the path/sub subdirectory.

retrieve_list = [('path/sub/*c.txt', '.', 0)]

└─ file_c.txt

To keep the subdirectory structure, one can set the depth parameter, just as in the previous examples.

retrieve_list = [('path/sub/*c.txt', '.', 2)]

└── sub
    └─ file_c.txt

Specific target directory#

The final folder hierarchy of the retrieved files in the retrieved folder is not only determined by the hierarchy of the remote working directory, but can also be controlled through the second and third elements of the instructions tuples. The final depth element controls what level of hierarchy of the source is maintained, where the second element specifies the base path in the retrieved folder into which the remote files should be retrieved. For example, to retrieve a nested file, maintaining the remote hierarchy and storing it locally in the target directory, one can do the following:

retrieve_list = [('path/sub/file_c.txt', 'target', 3)]

└─ target
    └─ path
        └─ sub
           └─ file_c.txt

The same applies for folders that are to be retrieved:

retrieve_list = [('path/sub', 'target', 1)]

└─ target
    └── sub
        ├─ file_c.txt
        └─ file_d.txt

Note that target here is not used to rename the retrieved file or folder, but indicates the path of the directory into which the source is copied. The target relative path is also compatible with glob patterns in the source relative paths:

retrieve_list = [('path/sub/*c.txt', 'target', 0)]

└─ target
    └─ file_c.txt
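The interplay of source pattern, target directory and depth for files can be summarised in a small plain-Python model. The function `retrieved_paths` is a hypothetical helper written to reproduce the file examples in this section; it does not cover directory sources and is not the engine's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The files of the example hierarchy in the remote working directory.
REMOTE_FILES = ['file_a.txt', 'path/file_b.txt', 'path/sub/file_c.txt', 'path/sub/file_d.txt']


def retrieved_paths(instruction, remote_files=REMOTE_FILES):
    """Return the paths in the retrieved folder for a (source, target, depth) tuple."""
    source, target, depth = instruction
    results = []
    for remote in remote_files:
        if not fnmatchcase(remote, source):
            continue  # the file does not match the explicit path or glob pattern
        parts = PurePosixPath(remote).parts
        # Keep the last `depth` levels; with depth=0 only the filename remains.
        kept = parts[-depth:] if depth > 0 else parts[-1:]
        results.append(str(PurePosixPath(target, *kept)))
    return results


print(retrieved_paths(('path/sub/file_c.txt', '.', 3)))
print(retrieved_paths(('path/sub/*c.txt', 'target', 0)))
```

The two calls correspond to the first example with depth=3 and the last glob example with a target directory, respectively.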

Retrieve temporary list#

Recall that, as explained in the ‘prepare’ section, all the files that are retrieved by the engine following the ‘retrieve list’, are stored in the retrieved folder data node. This means that any file you retrieve for a completed calculation job will be stored in your repository. If you are retrieving big files, this can cause your repository to grow significantly. Often, however, you might only need a part of the information contained in these retrieved files. To solve this common issue, there is the concept of the ‘retrieve temporary list’. The specification of the retrieve temporary list is identical to that of the normal retrieve list, but it is added to the calc_info under the retrieve_temporary_list attribute:

calcinfo = CalcInfo()
calcinfo.retrieve_temporary_list = ['relative/path/to/file.txt']

The only difference is that, unlike the files of the retrieve list which will be permanently stored in the retrieved FolderData node, the files of the retrieve temporary list will be stored in a temporary sandbox folder. This folder is then passed under the retrieved_temporary_folder keyword argument to the parse method of the parser, if one was specified for the calculation job:

def parse(self, **kwargs):
    """Parse the retrieved files of the calculation job."""

    retrieved_temporary_folder = kwargs['retrieved_temporary_folder']

The parser implementation can then parse these files and store the relevant information as output nodes.

Important

The type of kwargs['retrieved_temporary_folder'] is a simple str that represents the absolute filepath to the temporary folder. You can access its contents with the os standard library module or convert it into a pathlib.Path.

After the parser terminates, the engine will automatically clean up the sandbox folder with the temporarily retrieved files. The concept of the retrieve_temporary_list is essentially that the files will be available during parsing and will be destroyed immediately afterwards.
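As a sketch of how a parser might read from this folder, the standalone function below converts the string into a pathlib.Path and reads one file. It is not a real Parser subclass: the temporary directory stands in for the folder that the engine would provide, the file content is invented, and the file is assumed to sit at the top level of the folder, following the rule for string instructions in the retrieve list.

```python
import tempfile
from pathlib import Path


def parse(**kwargs):
    """Toy stand-in for the parse method of a parser."""
    # The folder is passed as a plain string holding an absolute path.
    folder = Path(kwargs['retrieved_temporary_folder'])
    content = (folder / 'file.txt').read_text(encoding='utf8')
    return int(content.strip())


# Simulate the temporary folder that the engine would create and later clean up.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'file.txt').write_text('3\n', encoding='utf8')
    parsed = parse(retrieved_temporary_folder=tmp)

print(parsed)
```

Anything that should outlive the parsing step has to be extracted inside `parse`, because the folder no longer exists afterwards.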

Stashing on the remote#

New in version 1.6.0.

The stash option namespace allows a user to specify certain files and/or folders that are created by the calculation job to be stashed somewhere on the remote where the job is run. This can be useful if these need to be stored for a longer time on a machine where the scratch space is cleaned regularly, but they need to be kept on the remote machine and not retrieved. Examples are files that are necessary to restart a calculation but are too big to be retrieved and stored permanently in the local file repository.

The files and folders that need to be stashed are specified through their relative filepaths within the working directory in the stash.source_list option. Using the COPY mode, the target path defines another location (on the same filesystem as the calculation) to copy the files to, and is set through the stash.target_base option, for example:

from aiida.common.datastructures import StashMode

inputs = {
    'code': ....,
    ...
    'metadata': {
        'options': {
            'stash': {
                'source_list': ['aiida.out', 'output.txt'],
                'target_base': '/storage/project/stash_folder',
                'stash_mode': StashMode.COPY.value,
            }
        }
    }
}
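
The constraints on these options (relative source paths, an absolute target for the copy mode) can be checked before launching. The following is a sketch only: build_stash_options is a hypothetical helper, not part of AiiDA, and the default string 'copy' is assumed to equal StashMode.COPY.value.

```python
from pathlib import PurePosixPath


def build_stash_options(source_list, target_base, stash_mode='copy'):
    """Return a dictionary for the `stash` option namespace.

    Hypothetical helper, not part of AiiDA. The default 'copy' is assumed
    to be the value of `StashMode.COPY`.
    """
    for source in source_list:
        # Sources are given relative to the working directory of the job.
        if PurePosixPath(source).is_absolute():
            raise ValueError(f'source path must be relative: {source}')
    # For the copy mode the target should be an absolute path on the remote.
    if not PurePosixPath(target_base).is_absolute():
        raise ValueError(f'target base must be absolute: {target_base}')
    return {
        'source_list': list(source_list),
        'target_base': target_base,
        'stash_mode': stash_mode,
    }


options = {
    'stash': build_stash_options(['aiida.out', 'output.txt'], '/storage/project/stash_folder')
}
```

The resulting options dictionary has the same shape as the metadata.options entry in the example above.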

Note

In the future, other methods for stashing may be implemented, such as placing all files in a (compressed) tarball or even stash files on tape.

Important

If the stash option namespace is defined for a calculation job, the daemon will perform the stashing operations before the files are retrieved. This means that the stashing happens before the parsing of the output files (which occurs after the retrieving step), such that the files will be stashed independently of the final exit status that the parser will assign to the calculation job. This may cause files to be stashed for calculations that will later be considered to have failed.

The stashed files and folders are represented by an output node that is attached to the calculation node through the label remote_stash, as a RemoteStashFolderData node. Just like the remote_folder node, this represents a location or files on a remote machine and so is equivalent to a “symbolic link”.

Important

AiiDA does not actually control the files in the remote stash, and so the contents may disappear at some point.

Options#

In addition to the common metadata inputs, such as label and description, that all processes have, the CalcJob has an additional input called options. These options make it possible to subtly change the behavior of the calculation job, for example which parser should be used once it is finished, or which special scheduler directives should be set. The full list of available options is documented below as part of the CalcJob interface:

calcjob aiida.engine.processes.calcjobs.CalcJob

Implementation of the CalcJob process.

Inputs:

  • code, AbstractCode, optional – The Code to use for this job. This input is required, unless the remote_folder input is specified, which means an existing job is being imported and no code will actually be run.
  • metadata, Namespace
    Namespace Ports
    • call_link_label, str, optional, non_db – The label to use for the CALL link if the process is called by another process.
    • computer, Computer, optional, non_db – When using a “local” code, set the computer on which the calculation should be run.
    • description, str, optional, non_db – Description to set on the process node.
    • dry_run, bool, optional, non_db – When set to True will prepare the calculation job for submission but not actually launch it.
    • label, str, optional, non_db – Label to set on the process node.
    • options, Namespace
      Namespace Ports
      • account, str, optional, non_db – Set the account to use for the queue on the remote computer
      • additional_retrieve_list, (list, tuple), optional, non_db – List of relative file paths that should be retrieved in addition to what the plugin specifies.
      • append_text, str, optional, non_db – Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution
      • custom_scheduler_commands, str, optional, non_db – Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the prepend_text is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command
      • environment_variables, dict, optional, non_db – Set a dictionary of custom environment variables for this calculation
      • environment_variables_double_quotes, bool, optional, non_db – If set to True, use double quotes instead of single quotes to escape the environment variables specified in environment_variables.
      • import_sys_environment, bool, optional, non_db – If set to true, the submission script will load the system environment variables
      • input_filename, str, optional, non_db – Filename to which the input for the code that is to be run is written.
      • max_memory_kb, int, optional, non_db – Set the maximum memory (in kilobytes) to request from the scheduler
      • max_wallclock_seconds, int, optional, non_db – Set the wallclock time (in seconds) to request from the scheduler
      • mpirun_extra_params, (list, tuple), optional, non_db – Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] … exec.x
      • output_filename, str, optional, non_db – Filename to which the content of stdout of the code that is to be run is written.
      • parser_name, str, optional, non_db – Set a string for the output parser. Can be None if no output plugin is available or needed
      • prepend_text, str, optional, non_db – Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution
      • priority, str, optional, non_db – Set the priority of the job to be queued
      • qos, str, optional, non_db – Set the quality of service to use for the queue on the remote computer
      • queue_name, str, optional, non_db – Set the name of the queue on the remote computer
      • rerunnable, bool, optional, non_db – Determines if the calculation can be requeued / rerun.
      • resources, dict, required, non_db – Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.
      • scheduler_stderr, str, optional, non_db – Filename to which the content of stderr of the scheduler is written.
      • scheduler_stdout, str, optional, non_db – Filename to which the content of stdout of the scheduler is written.
      • stash, Namespace – Optional directives to stash files after the calculation job has completed.
        Namespace Ports
        • source_list, (tuple, list), optional, non_db – Sequence of relative filepaths representing files in the remote directory that should be stashed.
        • stash_mode, str, optional, non_db – Mode with which to perform the stashing, should be a value of aiida.common.datastructures.StashMode.
        • target_base, str, optional, non_db – The base location to where the files should be stashed. For example, for the copy stash mode, this should be an absolute filepath on the remote computer.
      • submit_script_filename, str, optional, non_db – Filename to which the job submission script is written.
      • withmpi, bool, optional, non_db – Set the calculation to use mpi
    • store_provenance, bool, optional, non_db – If set to False provenance will not be stored in the database.
  • remote_folder, RemoteData, optional – Remote directory containing the results of an already completed calculation job without AiiDA. The inputs should be passed to the CalcJob as normal but instead of launching the actual job, the engine will recreate the input files and then proceed straight to the retrieve step where the files of this RemoteData will be retrieved as if it had been actually launched through AiiDA. If a parser is defined in the inputs, the results are parsed and attached as output nodes as usual.

Outputs:

  • remote_folder, RemoteData, required – Input files necessary to run the process will be stored in this folder node.
  • remote_stash, RemoteStashData, optional – Contents of the stash.source_list option are stored in this remote folder after job completion.
  • retrieved, FolderData, required – Files that are retrieved by the daemon will be stored in this node. By default the stdout and stderr of the scheduler will be added, but one can add more by specifying them in CalcInfo.retrieve_list.

The rerunnable option enables the scheduler to re-launch the calculation if it has failed, for example due to node failure or a failure to launch the job. It corresponds to the --requeue option in SLURM, and the -r option in SGE, LSF, and PBS. The following two conditions must be met in order for this to work well with AiiDA:

  • the scheduler assigns the same job-id to the restarted job

  • the code produces the same results if it has already partially run before (not every scheduler may produce this situation)

Because this depends on the scheduler, its configuration, and the code used, we cannot say conclusively when it will work – do your own testing! It has been tested on a cluster using SLURM, but that does not guarantee other SLURM clusters behave in the same way.
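
As an illustration of how these options fit together, the sketch below assembles a metadata.options dictionary from a few of the options listed above. The helper build_options is hypothetical, and the resources key num_machines is an assumption that holds for node-based scheduler plugins such as SLURM; other scheduler plugins may expect different keys.

```python
def build_options(num_machines, max_wallclock_seconds, queue_name=None, rerunnable=False):
    """Return a dictionary for the `metadata.options` namespace of a `CalcJob`.

    Hypothetical helper, not part of AiiDA. The `num_machines` resource key
    is an assumption: the accepted keys depend on the scheduler plugin.
    """
    if max_wallclock_seconds <= 0:
        raise ValueError('max_wallclock_seconds must be a positive integer')
    options = {
        # `resources` is the only required option.
        'resources': {'num_machines': num_machines},
        'max_wallclock_seconds': max_wallclock_seconds,
        'rerunnable': rerunnable,
    }
    if queue_name is not None:
        options['queue_name'] = queue_name
    return options


inputs = {
    'metadata': {
        'options': build_options(1, 3600, queue_name='debug'),
    }
}
```

The remaining inputs, such as code and the plugin-specific input nodes, would be added to the same inputs dictionary before launching.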

Launch#

Launching a calculation job is no different from launching any other process class, so please refer to the section on launching processes. The only caveat is that calculation jobs typically take quite a bit of time. The trivial example used above runs very fast, but a typical calculation job that is submitted to a scheduler will most likely take longer than just a few seconds. For that reason it is highly advisable to submit calculation jobs instead of running them. By submitting them to the daemon, you free up your interpreter straight away and the process will be checkpointed between the various transport tasks that have to be performed. The exception is of course when you want to run a calculation job locally for testing or demonstration purposes.

Dry run#

The calculation job has one additional feature over all other processes when it comes to launching them. Since an incorrectly configured calculation job can potentially waste computational resources, one might want to inspect the input files that will be written by the plugin, before actually submitting the job. A so-called dry-run is possible by simply specifying it in the metadata of the inputs. If you are using the process builder, it is as simple as:

builder.metadata.dry_run = True

When you now launch the process builder, the engine will perform the entire process of a normal calculation job run, except that it will not actually upload and submit the job to the remote computer. However, the prepare_for_submission method will be called. The inputs that it writes to the input folder will be stored in a temporary folder called submit_test that will be created in the current working directory. Each time you perform a dry-run, a new sub folder will be created in the submit_test folder, which allows you to perform multiple dry-runs without overwriting the previous results.

Moreover, the following applies:

  • when calling run() for a calculation with the dry_run flag set, you will get back its results, which are always an empty dictionary {};

  • if you call run_get_node(), the node you get back is an unstored CalcJobNode. In this case, the unstored CalcJobNode (let’s call it node) will have an additional property node.dry_run_info. This is a dictionary that contains additional information on the dry-run output. In particular, it will have the following keys:

    • folder: the absolute path to the folder within the submit_test folder where the files have been created, e.g.: /home/user/submit_test/20190726-00019

    • script_filename: the filename of the submission script that AiiDA generated in the folder, e.g.: _aiidasubmit.sh

  • if you send a dry-run to the submit() function, this will be just forwarded to run and you will get back the unstored node (with the same properties as above).
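
The two keys described above are enough to locate and inspect the generated submission script. The helper below is a sketch under that assumption; read_submission_script is a hypothetical name and not part of AiiDA.

```python
from pathlib import Path


def read_submission_script(dry_run_info: dict) -> str:
    """Return the text of the submission script written by a dry run.

    Hypothetical helper: `dry_run_info` is assumed to contain the keys
    `folder` and `script_filename`, as described in the list above.
    """
    script = Path(dry_run_info['folder']) / dry_run_info['script_filename']
    return script.read_text()
```

With a node obtained from run_get_node(), this would be called as read_submission_script(node.dry_run_info).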

Warning

By default the storing of provenance is enabled and this goes also for a dry run. If you do not want any nodes to be created during a dry run, simply set the metadata input store_provenance to False.

Parsing#

The previous sections explained in detail how the execution of an external executable is wrapped by the CalcJob class to make it runnable by AiiDA’s engine, from the first steps of preparing the input files on the remote machine to retrieving the relevant files and storing them in a FolderData node that is attached as the retrieved output. This is the last required step for a CalcJob to terminate, but often we would like to parse the raw output and attach the results as queryable output nodes to the calculation job node. To automatically trigger the parsing of a calculation job after its output has been retrieved, specify the parser_name option. If the engine finds this option specified, it will load the corresponding parser class, which should be a sub class of Parser, and call its parse() method.

To explain the interface of the Parser class and the parse method, let’s take the ArithmeticAddParser as an example. This parser is designed to parse the output produced by the simple bash script that is wrapped by the ArithmeticAddCalculation discussed in the previous sections.

# -*- coding: utf-8 -*-
from aiida.orm import Int
from aiida.parsers.parser import Parser


class ArithmeticAddParser(Parser):

    def parse(self, **kwargs):
        """Parse the contents of the output files retrieved in the `FolderData`."""
        output_folder = self.retrieved

        try:
            with output_folder.open(self.node.get_option('output_filename'), 'r') as handle:
                result = self.parse_stdout(handle)
        except (OSError, IOError):
            return self.exit_codes.ERROR_READING_OUTPUT_FILE

        if result is None:
            return self.exit_codes.ERROR_INVALID_OUTPUT

        self.out('sum', Int(result))

    @staticmethod
    def parse_stdout(filelike):
        """Parse the sum from the output of the ArithmeticAddCalculation written to standard out

        :param filelike: filelike object containing the output
        :returns: the sum
        """
        try:
            result = int(filelike.read())
        except ValueError:
            result = None

        return result

To create a new parser implementation, simply create a new class that sub classes the Parser class. As usual, any valid Python class name will work, but the convention is to always use the Parser suffix and to use the same name as the calculation job for which the parser is designed. For example, here we are implementing a parser for the ArithmeticAddCalculation, so we name it ArithmeticAddParser, just replacing the Calculation suffix with Parser. The only method that needs to be implemented is the parse() method. Its signature should include **kwargs, the reason for which will become clear later. The goal of the parse method is very simple:

  • Open and load the content of the output files that were generated by the calculation job and retrieved by the engine

  • Create data nodes out of this raw data that are attached as output nodes

  • Log human-readable warning messages in the case of worrying output

  • Optionally return an exit code to indicate that the calculation was not successful

The advantage of attaching the raw output data in a different form, as output nodes, is that in that form the content becomes queryable. This allows one to query for calculations that produced specific outputs with a certain value, which becomes a very powerful approach for post-processing and analysis of big databases.

The retrieved attribute of the parser will return the FolderData node that should have been attached by the engine containing all the retrieved files, as specified using the retrieve list in the preparation step of the calculation job. This retrieved folder can be used to open and read the contents of the files it contains. In this example, there should be a single output file that was written by redirecting the standard output of the bash script that added the two integers. The parser opens this file, reads its content and tries to parse the sum from it:

        try:
            with output_folder.open(self.node.get_option('output_filename'), 'r') as handle:
                result = self.parse_stdout(handle)
        except (OSError, IOError):
            return self.exit_codes.ERROR_READING_OUTPUT_FILE

Note that this parsing action is wrapped in a try-except block to catch the exceptions that would be thrown if the output file could not be read. If the exception were not caught by the parser, the engine would catch it instead and set the process state of the corresponding calculation to Excepted. This happens for any uncaught exception that is thrown during parsing. Instead, we catch these exceptions and return an exit code, which is retrieved by referencing its label, such as ERROR_READING_OUTPUT_FILE in this example, through the self.exit_codes property. This call retrieves the corresponding exit code defined on the CalcJob that we are currently parsing. Returning this exit code from the parser stops the parsing immediately and instructs the engine to set its exit status and exit message on the node of this calculation job.

The parse_stdout method is just a small utility function to separate the actual parsing of the data from the main parser code. In this case, the parsing is so simple that we might have as well kept it in the main method, but this is just to illustrate that you are completely free to organize the code within the parse method for clarity. If we manage to parse the sum, produced by the calculation, we wrap it in the appropriate Int data node class, and register it as an output through the out method:

        self.out('sum', Int(result))

Note that if we encountered no problems, we do not have to return anything. The engine will interpret this as the calculation having finished successfully. You might now pose the question: “what part of the raw data should I parse and in what types of data nodes should I store it?”. This is not an easy question to answer in general, because it depends heavily on the type of raw output that is produced by the calculation and what parts you would like to be queryable. However, we can give you some guidelines:

  • Store data that you might want to query for, in the lightweight data nodes, such as Dict, List and StructureData. The contents of these nodes are stored as attributes in the database, which makes sure that they can be queried for.

  • Bigger data sets, such as large (multi-dimensional) arrays, are better stored in an ArrayData or one of its sub classes. If you were to store all this data in the database, it would become unnecessarily bloated, because it is unlikely that you would ever have to query for this data. Instead, these array type data nodes store the bulk of their content in the repository. This way you still keep the data and therewith the provenance of your calculations, while keeping your database lean and fast!
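
One way to apply the two guidelines above is to partition the parsed results before creating any nodes. The sketch below is illustrative only: split_parsed_results is a hypothetical helper and the size threshold is an arbitrary assumption, not an AiiDA rule.

```python
def split_parsed_results(results: dict, max_inline_items: int = 100):
    """Partition parsed results into queryable values and bulk sequences.

    Hypothetical helper: the threshold `max_inline_items` is arbitrary and
    should be tuned to what you actually intend to query for.
    """
    queryable, bulk = {}, {}
    for key, value in results.items():
        # Long sequences are unlikely to be queried for and are kept apart.
        if isinstance(value, (list, tuple)) and len(value) > max_inline_items:
            bulk[key] = value
        else:
            queryable[key] = value
    return queryable, bulk


queryable, bulk = split_parsed_results({'energy': -1.5, 'forces': list(range(1000))})
```

In a parser, the queryable part could then be wrapped in a Dict node and each entry of the bulk part stored in an ArrayData node, with both registered through self.out.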

Scheduler errors#

Besides the output parsers, the scheduler plugins can also provide parsing of the output generated by the job scheduler, by implementing the parse_output() method. If the scheduler plugin has implemented this method, the output generated by the scheduler, written to the stdout and stderr file descriptors as well as the output of the detailed job info command, is parsed. If the scheduler parser detects a known problem, such as an out-of-memory (OOM) or out-of-walltime (OOW) error, the corresponding exit code will already be set on the calculation job node. The output parser, if defined in the inputs, can inspect the exit status on the node and decide to keep it or override it with a different, potentially more useful, exit code.

from aiida.parsers.parser import Parser


class SomeParser(Parser):

    def parse(self, **kwargs):
        """Parse the contents of the output files retrieved in the `FolderData`."""

        # It is probably best to check for explicit exit codes.
        if self.node.exit_status == self.exit_codes.ERROR_SCHEDULER_OUT_OF_WALLTIME.status:
            # The scheduler parser detected an OOW error.
            # By returning `None`, the same exit code will be kept.
            return None

        # It is also possible to just check for any exit status to be set as a fallback.
        if self.node.exit_status is not None:
            # You can still try to parse files before exiting the parsing.
            return None

Note that in the example given above, the parser returns immediately if it finds that the scheduler detected a problem. Since it returns None, the exit code of the scheduler will be kept and will be the final exit code of the calculation job. However, the parser does not have to return immediately. It can still try to parse some of the retrieved output, if there is any. If it finds a more specific problem than the generic scheduler error, it can always return an exit code of its own to override it. The parser can even return ExitCode(0) to have the calculation marked as successfully finished, despite the scheduler having determined that there was a problem. The following table summarizes the possible scenarios of the scheduler parser and output parser returning an exit code and what the final resulting exit code will be that is set on the node:

| Scenario | Scheduler result | Retrieved result | Final result |
|---|---|---|---|
| Neither parser found any problem. | None | None | ExitCode(0) |
| Scheduler parser found an issue, but output parser does not override. | ExitCode(100) | None | ExitCode(100) |
| Only output parser found a problem. | None | ExitCode(400) | ExitCode(400) |
| Scheduler parser found an issue, but the output parser overrides with a more specific error code. | ExitCode(100) | ExitCode(400) | ExitCode(400) |
| Scheduler found issue but output parser overrides saying that despite that the calculation should be considered finished successfully. | ExitCode(100) | ExitCode(0) | ExitCode(0) |
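
The precedence expressed by these scenarios can be written as a small function. This is a sketch of the documented behavior using plain integers for the exit statuses, not the engine's actual implementation; resolve_exit_status is a hypothetical name.

```python
from typing import Optional


def resolve_exit_status(scheduler_status: Optional[int], parser_status: Optional[int]) -> int:
    """Return the final exit status for the scenarios described above.

    `None` means that the respective parser did not return an exit code.
    Hypothetical sketch, not the engine's actual code.
    """
    # An exit code returned by the output parser always takes precedence.
    if parser_status is not None:
        return parser_status
    # Otherwise the exit code set by the scheduler parser is kept.
    if scheduler_status is not None:
        return scheduler_status
    # Neither parser reported a problem.
    return 0
```

Each scenario corresponds to one call, for example resolve_exit_status(100, None) for the case where only the scheduler parser found an issue.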