AiiDA internals¶
Node¶
The Node class is the basic class that represents all the possible objects in the AiiDA world. More precisely, it is inherited by many classes including (among others) the ProcessNode class, representing computations that convert data into a different form, the Code class, representing executables and file collections that are used by calculations, and the Data class, which represents data that can be input or output of calculations.
Immutability concept¶
A node can store information through attributes. Since AiiDA guarantees a certain level of provenance, these attributes become immutable as soon as the node is stored.
This means that once a node is stored, any attempt to alter its attributes, whether by changing a value or by deleting an attribute altogether, raises an exception.
Certain subclasses of nodes need to adapt this behavior, however, as for example in the case of the ProcessNode class (see calculation updatable attributes). Since the immutability of stored nodes is a core concept of AiiDA, this behavior is nonetheless enforced at the node level. This guarantees that any subclass of the Node class will respect this behavior unless it is explicitly overridden.
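The rule above can be illustrated with a minimal sketch. The class and exception names here are hypothetical stand-ins chosen for illustration, not the AiiDA implementation; the only assumption is that every mutating method checks a "stored" flag first.

```python
class ModificationNotAllowedSketch(Exception):
    """Raised when a stored node is modified (hypothetical stand-in name)."""


class SketchNode:
    """Toy node that freezes its attributes once stored; not the AiiDA class."""

    def __init__(self):
        self._attributes = {}
        self._stored = False

    def _check_mutable(self):
        # Every mutating method goes through this single check.
        if self._stored:
            raise ModificationNotAllowedSketch('attributes of a stored node are immutable')

    def set_attribute(self, key, value):
        self._check_mutable()
        self._attributes[key] = value

    def delete_attribute(self, key):
        self._check_mutable()
        del self._attributes[key]

    def get_attribute(self, key):
        return self._attributes[key]

    def store(self):
        self._stored = True
        return self


node = SketchNode()
node.set_attribute('energy', -1.5)  # allowed: the node is not stored yet
node.store()

try:
    node.set_attribute('energy', 0.0)  # rejected: the node is stored
    rejected = False
except ModificationNotAllowedSketch:
    rejected = True
```

Because the check lives in the base class, a subclass inherits it automatically and has to override a method deliberately to relax it.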
Node methods¶
clean_value() takes a value and returns an object which can be serialized for storage in the database. Such an object must be able to be subsequently deserialized without changing value. If a simple datatype is passed (integer, float, etc.), a check is performed to see if it has a value of nan or inf, as these cannot be stored. Otherwise, if a list, tuple, dictionary, etc., is passed, this check is performed for each value it contains. This is done recursively, automatically handling the case of nested objects. It is important to note that iterable type objects are converted to lists during this process, and mappings, such as dictionaries, are converted to normal dictionaries. This cleaning process is used by default when setting node attributes via set_attribute() and append_to_attr(), although it can be disabled by setting clean=False. Values are also cleaned when setting extras on a stored node using set_extras() or reset_extras(), but this cannot be disabled.
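The recursive cleaning described above could look roughly like the following sketch. The function name clean_value_sketch and the exact set of accepted types are assumptions made for illustration; the real clean_value() may accept more types and raise a different exception.

```python
import math
from collections.abc import Iterable, Mapping


def clean_value_sketch(value):
    """Recursively validate and normalize a value (illustrative only)."""
    # Simple types that need no check are returned unchanged.
    if value is None or isinstance(value, (bool, int, str)):
        return value
    if isinstance(value, float):
        # nan and inf cannot be stored, so they are rejected.
        if math.isnan(value) or math.isinf(value):
            raise ValueError('nan and inf cannot be stored')
        return value
    if isinstance(value, Mapping):
        # Mappings become plain dictionaries, cleaned value by value.
        return {key: clean_value_sketch(val) for key, val in value.items()}
    if isinstance(value, Iterable):
        # Any other iterable (tuple, set, generator, ...) becomes a list.
        return [clean_value_sketch(item) for item in value]
    raise ValueError(f'unsupported type: {type(value).__name__}')


cleaned = clean_value_sketch({'cell': (1.0, 2.0), 'nested': {'kpoints': (4, 4, 4)}})
```

Strings are tested before the generic iterable branch because a string is itself iterable and would otherwise be split into characters.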
Node methods & properties¶
In the following sections, the most important methods and properties of the Node
class will be described.
Node subclasses organization¶
The Node class has two important variables:
_plugin_type_string characterizes the class of the object.
_query_type_string characterizes the class and all its subclasses (by pointing to the package or Python file that contains the class).
The convention for all the Node subclasses is that if a class B inherits from a class A, then there should be a package A under aiida/orm that has a file __init__.py and a B.py in that directory (or a B package with the corresponding __init__.py).
An example of this is ArrayData and KpointsData. ArrayData is placed in aiida/orm/data/array/__init__.py, and KpointsData, which inherits from ArrayData, is placed in aiida/orm/data/array/kpoints.py.
This is an implicit & quick way to check the inheritance of the Node subclasses.
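As a rough illustration of why this layout makes inheritance checks cheap, the sketch below derives two dotted strings from a module path and tests them with a prefix match. The helper names and the exact string format are assumptions made for this example and may not match what _plugin_type_string and _query_type_string actually contain.

```python
ROOT = 'aiida.orm.'  # assumed root package for the relative paths


def type_string_sketch(module_path, class_name):
    """Dotted string identifying one class (hypothetical format)."""
    relative = module_path[len(ROOT):] if module_path.startswith(ROOT) else module_path
    return f'{relative}.{class_name}.'


def query_string_sketch(module_path):
    """Dotted prefix shared by a class and everything placed below it (hypothetical format)."""
    relative = module_path[len(ROOT):] if module_path.startswith(ROOT) else module_path
    return f'{relative}.'


# ArrayData lives in the package aiida/orm/data/array/__init__.py,
# KpointsData in the module aiida/orm/data/array/kpoints.py.
array_type = type_string_sketch('aiida.orm.data.array', 'ArrayData')
kpoints_type = type_string_sketch('aiida.orm.data.array.kpoints', 'KpointsData')
array_query = query_string_sketch('aiida.orm.data.array')

# A prefix match selects the class itself and every subclass stored below its package.
matches = [name for name in (array_type, kpoints_type) if name.startswith(array_query)]
```

With this layout, "all ArrayData nodes including subclasses" reduces to a string prefix comparison, which a database can evaluate without knowing the Python class hierarchy.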
General purpose methods¶
__init__(): The initialization of the Node class can be done by not providing any attributes or by providing a DbNode as initialization. E.g.:
dbn = a_dbnode_object
n = Node(dbnode=dbn.dbnode)
ctime() and mtime() provide the creation and the modification time of the node.
computer() returns the computer associated to this node.
_validate() does a validation check for the node. This is important for Node subclasses where various attributes should be checked for consistency before storing.
user() returns the user that created the node.
uuid() returns the universally unique identifier (UUID) of the node.
Annotation methods¶
The Node can be annotated with a label, a description and comments. The following methods can be used for the management of these properties.
Label management:
label() returns the label of the node and can be used as a setter property.
Description management:
description() returns the description of the node (more detailed than the label) and can be used as a setter property.
Comment management:
add_comment() adds a comment.
get_comments() returns a sorted list of the comments.
update_comment() updates the node comment. This can also be done with verdi comment update.
remove_comment() removes the node comment. This can also be done with verdi comment remove.
Link management methods¶
Node objects and objects of its subclasses can have ancestors and descendants. These are connected with links. The following methods exist for the processing & management of these links.
has_cached_links() shows if there are cached links to other nodes.
add_incoming() adds a link to the current node from the ‘src’ node with the given label. Depending on whether the nodes are stored or not, the links are written to the database or to the cache.
Listing links example
Assume that the user wants to see the available links of a node in order to understand the structure of the graph and maybe traverse it. In the following example, we load a specific node and we list its input and output links. Each returned entry holds the link type, the link label and the linked node. Here is the code:
In [1]: # Let's load a node with a specific pk
In [2]: c = load_node(139168)
In [3]: c.get_incoming()
Out[3]:
[Neighbor(link_type='inputlink', label='code',
node=<Code: Remote code 'cp-5.1' on daint, pk: 75709, uuid: 3c9cdb7f-0cda-402e-b898-4dd0d06aa5a4>),
Neighbor(link_type='inputlink', label='parameters',
node=<Dict: uuid: 94efe64f-7f7e-46ea-922a-fe64a7fba8a5 (pk: 139166)>)
Neighbor(link_type='inputlink', label='parent_calc_folder',
node=<RemoteData: uuid: becb4894-c50c-4779-b84f-713772eaceff (pk: 139118)>)
Neighbor(link_type='inputlink', label='pseudo_Ba',
node=<UpfData: uuid: 5e53b22d-5757-4d50-bbe0-51f3b9ac8b7c (pk: 1905)>)
Neighbor(link_type='inputlink', label='pseudo_O',
node=<UpfData: uuid: 5cccd0d9-7944-4c67-b3c7-a39a1f467906 (pk: 1658)>)
Neighbor(link_type='inputlink', label='pseudo_Ti',
node=<UpfData: uuid: e5744077-8615-4927-9f97-c5f7b36ba421 (pk: 1660)>)
Neighbor(link_type='inputlink', label='settings',
node=<Dict: uuid: a5a828b8-fdd8-4d75-b674-2e2d62792de0 (pk: 139167)>)
Neighbor(link_type='inputlink', label='structure',
node=<StructureData: uuid: 3096f83c-6385-48c4-8cb2-24a427ce11b1 (pk: 139001)>)]
In [4]: c.get_outgoing()
Out[4]:
[Neighbor(link_type='createlink', label='output_parameters',
node=<Dict: uuid: f7a3ca96-4594-497f-a128-9843a1f12f7f (pk: 139257)>),
Neighbor(link_type='createlink', label='output_parameters_139257',
node=<Dict: uuid: f7a3ca96-4594-497f-a128-9843a1f12f7f (pk: 139257)>),
Neighbor(link_type='createlink', label='output_trajectory',
node=<TrajectoryData: uuid: 7c5b65bc-22bb-4b87-ac92-e8a78cf145c3 (pk: 139256)>),
Neighbor(link_type='createlink', label='output_trajectory_139256',
node=<TrajectoryData: uuid: 7c5b65bc-22bb-4b87-ac92-e8a78cf145c3 (pk: 139256)>),
Neighbor(link_type='createlink', label='remote_folder',
node=<RemoteData: uuid: 17642a1c-8cac-4e7f-8bd0-1dcebe974aa4 (pk: 139169)>),
Neighbor(link_type='createlink', label='remote_folder_139169',
node=<RemoteData: uuid: 17642a1c-8cac-4e7f-8bd0-1dcebe974aa4 (pk: 139169)>),
Neighbor(link_type='createlink', label='retrieved',
node=<FolderData: uuid: a9037dc0-3d84-494d-9616-42b8df77083f (pk: 139255)>),
Neighbor(link_type='createlink', label='retrieved_139255',
node=<FolderData: uuid: a9037dc0-3d84-494d-9616-42b8df77083f (pk: 139255)>)]
Understanding link names
Nodes may have input and output links. Every input link of a node should have a unique name, and this unique name is mapped to a specific node. On the other hand, given a node c, many output nodes may share the same output link name. To differentiate between the output nodes of c that have the same link name, the pk of the output node is appended to the link name (please see the input & output nodes in the above example).
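The labelling scheme visible in the listing above, where each output appears once under its plain label and once under the label suffixed with its pk, can be mimicked with a short sketch. The Neighbor tuple below only mirrors the fields shown in the example output, and SketchOutput and outgoing_with_pk_labels are hypothetical helpers, not AiiDA functions.

```python
from collections import namedtuple

# Mirrors the fields visible in the listing above; the real class may differ.
Neighbor = namedtuple('Neighbor', ['link_type', 'label', 'node'])
# Hypothetical stand-in for an output node that only carries a pk.
SketchOutput = namedtuple('SketchOutput', ['pk'])


def outgoing_with_pk_labels(outputs):
    """Emit each (label, node) pair twice: plain label, then label suffixed with the pk."""
    neighbors = []
    for label, node in outputs:
        neighbors.append(Neighbor('createlink', label, node))
        neighbors.append(Neighbor('createlink', f'{label}_{node.pk}', node))
    return neighbors


links = outgoing_with_pk_labels([
    ('retrieved', SketchOutput(pk=139255)),
    ('remote_folder', SketchOutput(pk=139169)),
])
labels = [neighbor.label for neighbor in links]
```

The suffixed label stays unique even when several outputs share a link name, because the pk is unique per node.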
Folder management¶
Folder objects represent directories on the disk (virtual or not) where extra information for the node is stored. These folders can be temporary or permanent.
Store & deletion¶
store_all() stores all the input nodes, then it stores the current node and, in the end, it stores the cached input links.
verify_are_parents_stored() checks that the parents are stored.
store() checks that the node data is valid, then checks whether the node’s parents are stored, then moves the contents of the temporary folder to the repository folder and, in the end, stores in the database the information that is in the cache. The latter happens within a database transaction. If this transaction fails, the data transferred to the repository folder is moved back to the temporary folder.
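The move-then-commit-with-rollback behavior of store() can be sketched with plain directories. Everything here is a simplified assumption: store_sketch and the commit callable are hypothetical, and the real method works on Folder objects and a database transaction rather than on os-level paths.

```python
import os
import shutil
import tempfile


def store_sketch(sandbox_dir, repository_dir, commit):
    """Move files into the repository, then commit; undo the move if the commit fails."""
    moved = []
    for name in os.listdir(sandbox_dir):
        shutil.move(os.path.join(sandbox_dir, name), os.path.join(repository_dir, name))
        moved.append(name)
    try:
        commit()  # hypothetical callable standing in for the database transaction
    except Exception:
        # Roll back: return every moved entry to the temporary folder.
        for name in moved:
            shutil.move(os.path.join(repository_dir, name), os.path.join(sandbox_dir, name))
        raise


def failing_commit():
    raise RuntimeError('transaction failed')


sandbox = tempfile.mkdtemp()
repository = tempfile.mkdtemp()
with open(os.path.join(sandbox, 'input.txt'), 'w') as handle:
    handle.write('data')

try:
    store_sketch(sandbox, repository, failing_commit)
except RuntimeError:
    pass

restored = sorted(os.listdir(sandbox))  # files are back in the sandbox
leftover = os.listdir(repository)       # nothing remains in the repository

shutil.rmtree(sandbox)
shutil.rmtree(repository)
```

Re-raising after the rollback keeps the failure visible to the caller while leaving the file system in its original state.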
DbNode¶
The DbNode is the Django class that corresponds to the Node class, allowing the needed information to be stored in and retrieved from the database. Other classes extending the Node class, like Data, ProcessNode and Code, also use the DbNode code to interact with the database. The main methods are:
get_simple_name(), which returns a string with the type of the class (by stripping the path before the class name).
attributes(), which returns all the attributes of the specific node as a dictionary.
extras(), which returns all the extras of the specific node as a dictionary.
Folders¶
AiiDA uses Folder and its subclasses to add an abstraction layer between the functions and methods working directly on the file-system and AiiDA. This is particularly useful when we want to easily change between different folder options (temporary, permanent, etc.) and storage options (plain local directories, compressed files, remote files & directories, etc.).
Folder¶
This is the main class of the available Folder classes. Apart from the abstraction provided to the OS operations needed by AiiDA, one of its main features is that it can restrict all the available operations within a given folder limit. The available methods are:
mode_dir() and mode_file() return the mode with which folders and files should be writable.
get_subfolder() returns the subfolder matching the given name.
get_content_list() returns the contents matching a pattern.
insert_path() adds a file/folder to a specific location and remove_path() removes a file/folder.
get_abs_path() returns the absolute path of a file/folder under a given folder and abspath() returns the absolute path of the folder.
create_symlink() creates a symlink pointing to the given location inside the folder.
create_file_from_filelike() creates a file from the given contents.
open() opens a file in the folder.
folder_limit() returns the limit under which the creation of files/folders is restrained.
exists() returns true or false depending on whether a folder exists or not.
isfile() and isdir() return true or false depending on the existence of the given file/folder.
create() creates the folder, erase() deletes the folder and replace_with_folder() copies/moves a given folder.
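The idea of a folder limit can be shown with a small sketch that refuses to resolve paths outside a given directory. LimitedFolderSketch is a hypothetical class written for this example; it only normalizes paths lexically and does not resolve symlinks, which a real implementation might need to consider.

```python
import os


class LimitedFolderSketch:
    """Toy folder that refuses paths outside a limit (not the AiiDA class)."""

    def __init__(self, abspath, folder_limit=None):
        self.abspath = os.path.abspath(abspath)
        self.folder_limit = os.path.abspath(folder_limit or abspath)
        self._check_inside_limit(self.abspath)

    def _check_inside_limit(self, path):
        # A path is inside the limit if the limit is its common ancestor.
        if os.path.commonpath([self.folder_limit, path]) != self.folder_limit:
            raise ValueError(f'{path} is outside the folder limit {self.folder_limit}')

    def get_abs_path(self, relpath):
        # Normalize first so that '..' components cannot escape unnoticed.
        path = os.path.abspath(os.path.join(self.abspath, relpath))
        self._check_inside_limit(path)
        return path


folder = LimitedFolderSketch('/tmp/repo/node', folder_limit='/tmp/repo')
inside = folder.get_abs_path('raw_input/aiida.in')

try:
    folder.get_abs_path('../../etc/passwd')
    escaped = True
except ValueError:
    escaped = False
```

No files are touched here: the check is purely on path strings, so it can run before any file-system operation is attempted.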
RepositoryFolder¶
Objects of this class correspond to the repository folders. The RepositoryFolder specific methods are:
__init__() initializes the object with the necessary folder names and limits.
get_topdir() returns the top directory.
section() returns the section to which the folder belongs. For the moment this can only be node.
subfolder() returns the subfolder within the section/uuid folder.
uuid() returns the UUID of the corresponding node.
SandboxFolder¶
SandboxFolder objects correspond to temporary (“sandbox”) folders. The main methods are:
__init__() creates a new temporary folder.
__exit__() destroys the folder on exit.
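The create-on-init, destroy-on-exit pattern described above is the standard context manager protocol. The sketch below uses a hypothetical SandboxSketch class built on tempfile and shutil; it illustrates the pattern only and is not the SandboxFolder implementation.

```python
import os
import shutil
import tempfile


class SandboxSketch:
    """Temporary folder that is removed when the with-block ends (illustrative only)."""

    def __init__(self):
        # A new temporary directory is created as soon as the object exists.
        self.abspath = tempfile.mkdtemp()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Remove the directory and its contents; do not swallow exceptions.
        shutil.rmtree(self.abspath, ignore_errors=True)
        return False


with SandboxSketch() as sandbox:
    sandbox_path = sandbox.abspath
    existed_inside = os.path.isdir(sandbox_path)

exists_after = os.path.exists(sandbox_path)
```

Returning False from __exit__() means an exception raised inside the with-block still propagates after the cleanup.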
Data¶
ProcessNode¶
CalculationNode¶
Navigating inputs and outputs¶
inputs() returns a NodeLinksManager() object that can be used to access the node’s incoming INPUT_CALC links.
The NodeLinksManager can be used to quickly go from a node to a neighboring node. For example:
In [1]: # Let's load a node with a specific pk
In [2]: c = load_node(139168)
In [3]: c
Out[3]: <CpCalculation: uuid: 49084dcf-c708-4422-8bcf-808e4c3382c2 (pk: 139168)>
In [4]: # Let's traverse the inputs of this node.
In [5]: # By typing c.inputs.<TAB> we get all the input links
In [6]: c.inputs.
c.inputs.code                c.inputs.parent_calc_folder  c.inputs.pseudo_O            c.inputs.settings
c.inputs.parameters          c.inputs.pseudo_Ba           c.inputs.pseudo_Ti           c.inputs.structure
In [7]: # We may follow any of these links to access other nodes. For example, let's follow the parent_calc_folder
In [8]: c.inputs.parent_calc_folder
Out[8]: <RemoteData: uuid: becb4894-c50c-4779-b84f-713772eaceff (pk: 139118)>
In [9]: # Let's assign to r the node reached by the parent_calc_folder link
In [10]: r = c.inputs.parent_calc_folder
In [11]: r.inputs.__dir__()
Out[11]:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__getitem__', '__hash__', '__init__', '__iter__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', u'remote_folder']
The .inputs manager for WorkflowNode and the .outputs manager both for CalculationNode and WorkflowNode work in the same way (see below).
outputs() returns a NodeLinksManager() object that can be used to access the node’s outgoing CREATE links.
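The attribute-style navigation and TAB completion shown in the session above can be reproduced with a few special methods. The following LinksManagerSketch is a hypothetical, simplified stand-in that stores a plain label-to-node dictionary; the real NodeLinksManager queries the node's links instead.

```python
class LinksManagerSketch:
    """Expose labelled neighbors as attributes and keys (illustrative only)."""

    def __init__(self, neighbors):
        self._neighbors = dict(neighbors)  # link label -> node

    def __getattr__(self, label):
        # Called only when normal attribute lookup fails.
        if label.startswith('_'):
            # Avoid recursion if _neighbors itself is not set yet.
            raise AttributeError(label)
        try:
            return self._neighbors[label]
        except KeyError:
            raise AttributeError(f'no link with label {label!r}') from None

    def __getitem__(self, label):
        return self._neighbors[label]

    def __iter__(self):
        return iter(self._neighbors)

    def __dir__(self):
        # Adding the labels here makes them appear in TAB completion.
        return sorted(set(super().__dir__()) | set(self._neighbors))


# Strings stand in for node objects in this sketch.
inputs = LinksManagerSketch({'structure': 'StructureData<139001>', 'code': 'Code<75709>'})
structure = inputs.structure
code = inputs['code']
link_labels = list(inputs)
```

Offering both attribute and key access is convenient because link labels that are not valid Python identifiers can still be reached through the key form.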
Updatable attributes¶
The ProcessNode class is a subclass of the Node class, which means that its attributes become immutable once stored.
However, for a Calculation to be runnable it needs to be stored, which would mean that its state, which is stored in an attribute, could no longer be updated.
To solve this issue the Sealable mixin is introduced. This mixin can be used for subclasses of Node that need to have updatable attributes even after the node has been stored in the database.
The mixin defines the _updatable_attributes tuple, which lists the attributes that are considered to be mutable even when the node is stored.
It also allows the node to be sealed, after which even the updatable attributes become immutable.
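A minimal sketch of how such a mixin could interact with the immutability check is given below. All class names, the 'sealed' attribute key and the use of RuntimeError are assumptions made for illustration; only the _updatable_attributes tuple and the notion of sealing come from the text above.

```python
class StoredNodeSketch:
    """Toy base class: attributes are frozen once is_stored is True."""

    def __init__(self):
        self._attributes = {}
        self.is_stored = False

    def set_attribute(self, key, value):
        if self.is_stored:
            raise RuntimeError('attributes of a stored node are immutable')
        self._attributes[key] = value


class SealableSketch:
    """Mixin that keeps selected attributes writable until the node is sealed."""

    _updatable_attributes = ()
    SEALED_KEY = 'sealed'  # hypothetical attribute name

    @property
    def is_sealed(self):
        return self._attributes.get(self.SEALED_KEY, False)

    def seal(self):
        self._attributes[self.SEALED_KEY] = True

    def set_attribute(self, key, value):
        if self.is_sealed:
            raise RuntimeError('a sealed node cannot be modified')
        if self.is_stored and key in self._updatable_attributes:
            # Whitelisted attributes bypass the immutability check.
            self._attributes[key] = value
            return
        super().set_attribute(key, value)


class CalculationSketch(SealableSketch, StoredNodeSketch):
    _updatable_attributes = ('state',)


def attempt(node, key, value):
    """Return True if the attribute could be set, False if it was rejected."""
    try:
        node.set_attribute(key, value)
        return True
    except RuntimeError:
        return False


calc = CalculationSketch()
calc.set_attribute('parser', 'default')
calc.is_stored = True

updated_state = attempt(calc, 'state', 'running')       # updatable attribute
updated_parser = attempt(calc, 'parser', 'other')       # regular attribute
calc.seal()
updated_after_seal = attempt(calc, 'state', 'finished')  # sealed node
```

The mixin is listed before the base class so that its set_attribute() runs first and can decide whether to delegate to the stricter base implementation.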
WorkflowNode¶
Navigating inputs and outputs¶
inputs() returns a NodeLinksManager() object that can be used to access the node’s incoming INPUT_WORK links.
outputs() returns a NodeLinksManager() object that can be used to access the node’s outgoing RETURN links.
ORM overview¶
Below you find an overview of the main classes in the AiiDA object-relational mapping.
For the complete API documentation see aiida.orm.
Deprecated features, renaming, and adding new methods¶
In case a method is renamed or removed, this is the procedure to follow:
(If you want to rename) move the code to the new function name. Then, in the docstring, add something like:
.. versionadded:: 0.7
   Renamed from OLDMETHODNAME
Don’t remove the old function directly, but just change its code to use the new function, and add in the docstring:
.. deprecated:: 0.7
   Use :meth:`NEWMETHODNAME` instead.
Moreover, at the beginning of the function, add something like:
import warnings
# If we call this DeprecationWarning, pycharm will properly strike out the function
from aiida.common.warnings import AiidaDeprecationWarning as DeprecationWarning  # pylint: disable=redefined-builtin
warnings.warn("<Deprecation warning here - MAKE IT SPECIFIC TO THIS DEPRECATION, as it will be shown only once per different message>", DeprecationWarning)
# <REST OF THE FUNCTION HERE>
(of course replace the parts between < > symbols with the correct strings).
The advantages of the method above are:
pycharm will still show the method crossed out.
Our AiidaDeprecationWarning does not inherit from DeprecationWarning, so it will not be “hidden” by python.
Users can disable our warnings (and only those) by using AiiDA properties with:
verdi config warnings.showdeprecations False
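Put together, the renaming procedure might look like the following self-contained sketch. The Calculator class, its two methods and SketchDeprecationWarning are invented for illustration; in AiiDA code the warning class would be the AiidaDeprecationWarning import shown above.

```python
import warnings


class SketchDeprecationWarning(Warning):
    """Derives from Warning, not DeprecationWarning, so default filters do not hide it."""


class Calculator:
    """Hypothetical class with one renamed method."""

    def add_numbers(self, left, right):
        """Add two numbers.

        .. versionadded:: 0.7
           Renamed from sum_numbers
        """
        return left + right

    def sum_numbers(self, left, right):
        """Add two numbers.

        .. deprecated:: 0.7
           Use :meth:`add_numbers` instead.
        """
        # The message is specific to this deprecation.
        warnings.warn('sum_numbers is deprecated, use add_numbers instead', SketchDeprecationWarning)
        return self.add_numbers(left, right)


# Record the warnings emitted by a call to the old name.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = Calculator().sum_numbers(1, 2)
```

The old method keeps working by delegating to the new one, so existing callers only see a warning rather than a failure.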
Changing the config.json structure¶
In general, changes to config.json should be avoided if possible. However, if there is a need to modify it, the following procedure should be used to create a migration:
Determine whether the change will be backwards-compatible. This means that an older version of AiiDA will still be able to run with the new config.json structure. It goes without saying that it’s preferable to change config.json in a backwards-compatible way.
In aiida/manage/configuration/migrations/migrations.py, increase the CURRENT_CONFIG_VERSION by one. If the change is not backwards-compatible, set OLDEST_COMPATIBLE_CONFIG_VERSION to the same value.
Write a function which transforms the old config dict into the new version. It is possible that you need user input for the migration, in which case this should also be handled in that function.
Add an entry in _MIGRATION_LOOKUP where the key is the version before the migration, and the value is a ConfigMigration object. The ConfigMigration is constructed from your migration function, and the hard-coded values of CURRENT_CONFIG_VERSION and OLDEST_COMPATIBLE_CONFIG_VERSION. If these values are not hard-coded, the migration will break as soon as the values are changed again.
Add tests for the migration, in aiida/backends/tests/manage/configuration/migrations/test_migrations.py. You can add two types of tests:
- Tests that run the entire migration, using the check_and_migrate_config function. Make sure to run it with store=False, otherwise it will overwrite your config.json file. For these tests, you will have to update the reference files.
- Tests that run a single step in the migration, using the ConfigMigration.apply method. This can be used if you need to test different edge cases of the migration.
There are examples for both types of tests.
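The lookup-table approach from the procedure above can be sketched as follows. This is a simplified stand-in written for illustration: ConfigMigrationSketch, its constructor arguments, the 'CONFIG_VERSION' key layout and the example migration are all assumptions and may differ from the real ConfigMigration class and config.json structure.

```python
class ConfigMigrationSketch:
    """Simplified stand-in for one migration step; the real class may differ."""

    def __init__(self, migrate_function, version, version_oldest_compatible):
        self.migrate_function = migrate_function
        self.version = version
        self.version_oldest_compatible = version_oldest_compatible

    def apply(self, config):
        # Work on a shallow copy, then stamp the resulting versions.
        config = self.migrate_function(dict(config))
        config['CONFIG_VERSION'] = {
            'CURRENT': self.version,
            'OLDEST_COMPATIBLE': self.version_oldest_compatible,
        }
        return config


def add_users_section(config):
    """Hypothetical migration: make sure a 'users' section exists."""
    config.setdefault('users', {})
    return config


CURRENT_CONFIG_VERSION = 1
OLDEST_COMPATIBLE_CONFIG_VERSION = 0

# Key: the version before the migration. The versions are hard-coded literals,
# not the constants above, so the entry stays valid when the constants change.
MIGRATION_LOOKUP = {
    0: ConfigMigrationSketch(add_users_section, version=1, version_oldest_compatible=0),
}


def migrate_sketch(config):
    """Apply migrations one step at a time until no further step is registered."""
    version = config.get('CONFIG_VERSION', {}).get('CURRENT', 0)
    while version in MIGRATION_LOOKUP:
        config = MIGRATION_LOOKUP[version].apply(config)
        version = config['CONFIG_VERSION']['CURRENT']
    return config


old_config = {'profiles': {}}
migrated = migrate_sketch(old_config)
```

Testing a single step then amounts to calling apply() on one entry of the lookup, while testing the whole chain amounts to calling the driver function on an old configuration.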
Daemon and signal handling¶
While the AiiDA daemon is running, interrupt signals (SIGINT and SIGTERM) are captured so that the daemon can shut down gracefully. This is implemented using Python’s signal module, as shown in the following dummy example:
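import signal

def print_foo(*args):
    # Signal handlers are called with the signal number and the current stack frame
    print('foo')

# From now on, SIGINT (e.g. Ctrl+C) calls print_foo instead of raising KeyboardInterrupt
signal.signal(signal.SIGINT, print_foo)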
You should be aware of this while developing code which runs in the daemon. In particular, it’s important when creating subprocesses. When a signal is sent, the whole process group receives that signal. As a result, the subprocess can be killed even though the Python main process captures the signal. This can be avoided by creating a new process group for the subprocess, meaning that it will not receive the signal. To do this, you need to pass preexec_fn=os.setsid to the subprocess function:
import os
import subprocess
# shell=True is required because the command is given as a single shell string
print(subprocess.check_output('sleep 3; echo bar', shell=True, preexec_fn=os.setsid))
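As a possible alternative on POSIX systems, Python's subprocess functions also accept start_new_session=True, which makes the child call setsid() without a preexec_fn. The sketch below, which is an illustration rather than code from AiiDA, lets the child report its own process group so that it can be compared with the parent's.

```python
import os
import subprocess
import sys

# The child prints the id of the process group it belongs to.
child_code = 'import os; print(os.getpgrp())'

# start_new_session=True runs setsid() in the child before it executes,
# which puts it in a new session and a new process group (POSIX only).
output = subprocess.check_output([sys.executable, '-c', child_code], start_new_session=True)

child_pgid = int(output.decode().strip())
parent_pgid = os.getpgrp()
in_separate_group = child_pgid != parent_pgid
```

Since the child is in a different process group, a signal delivered to the parent's group, such as the SIGINT sent by Ctrl+C in a terminal, does not reach it.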