Caching: implementation details

This section covers some details of the caching mechanism which are not discussed in the user guide. If you are developing plugins and want to modify the caching behavior of your classes, we recommend you read this section first.

Controlling hashing

Below are some methods you can use to control how the hashes of calculation and data classes are computed:

  • To ignore specific attributes, a Node subclass can define a _hash_ignored_attributes attribute. This is a list of attribute names that are ignored when creating the hash.

  • For calculations, the _hash_ignored_inputs attribute lists inputs that should be ignored when creating the hash.

  • To add things which should be considered in the hash, you can override the _get_objects_to_hash() method. Note that doing so overrides the behavior described above, so you should make sure to call the parent implementation through super().

  • Pass keyword arguments to get_hash(). These are passed on to make_hash().
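
The first three hooks above can be illustrated with a small stand-alone model. This is only a sketch in plain Python and not AiiDA's implementation: ToyNode, ToyStructure, the format_version attribute and the JSON/SHA-256 hashing are all assumptions made for the example. Only the names _hash_ignored_attributes, _get_objects_to_hash and get_hash are taken from the text.

```python
import hashlib
import json


class ToyNode:
    """Stand-in for a Node (NOT the AiiDA class); it only imitates the hooks named above."""

    # Attribute names left out of the hash, mirroring _hash_ignored_attributes.
    _hash_ignored_attributes = ()

    def __init__(self, **attributes):
        self.attributes = dict(attributes)

    def _get_objects_to_hash(self):
        # Default behaviour of this toy: class name plus all non-ignored attributes.
        kept = {
            key: value
            for key, value in self.attributes.items()
            if key not in self._hash_ignored_attributes
        }
        return [type(self).__name__, kept]

    def get_hash(self):
        # Assumption: hash a JSON dump of the objects; AiiDA uses its own make_hash().
        payload = json.dumps(self._get_objects_to_hash(), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyStructure(ToyNode):
    # A free-text comment should not influence cache matching.
    _hash_ignored_attributes = ("comment",)

    # Hypothetical extra ingredient that should influence the hash.
    format_version = 2

    def _get_objects_to_hash(self):
        # Extend the parent's list through super() so the ignore list still applies.
        return super()._get_objects_to_hash() + [self.format_version]


a = ToyStructure(cell=[1.0, 1.0, 1.0], comment="first try")
b = ToyStructure(cell=[1.0, 1.0, 1.0], comment="second try")
c = ToyStructure(cell=[2.0, 1.0, 1.0], comment="first try")
```

In this model a and b hash identically because they differ only in the ignored comment, while c gets a different hash because its cell differs. Returning a fresh list from the override without calling super() would silently drop the ignore list, which is the pitfall the third bullet warns about.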

Controlling caching

There are several methods you can use to disable caching for particular nodes:

On the level of generic aiida.orm.nodes.Node:

  • The is_valid_cache property determines whether a particular node can be used as a cache. This is used, for example, to prevent failed calculations from being used as a cache source.

  • Node classes have a _cachable attribute, which can be set to False to completely switch off caching for nodes of that class. This avoids performing queries for the hash altogether.
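
The difference between the two Node-level switches can be sketched with a toy lookup. This is a plain-Python model under stated assumptions, not AiiDA code: ToyNode, ToyUncachedNode, find_cache_source and the query_log list are hypothetical, and the real lookup is a database query rather than a loop. Only _cachable and is_valid_cache are names from the text.

```python
class ToyNode:
    """Stand-in for a Node (NOT the AiiDA class)."""

    _cachable = True  # class-level switch

    def __init__(self, node_hash, failed=False):
        self.node_hash = node_hash
        self.failed = failed

    @property
    def is_valid_cache(self):
        # Per-node decision: in this toy, failed nodes are never reused.
        return not self.failed


class ToyUncachedNode(ToyNode):
    _cachable = False  # caching switched off for the whole class


def find_cache_source(new_node, stored_nodes, query_log):
    """Return a stored node that new_node could be cloned from, or None."""
    if not new_node._cachable:
        return None  # no hash lookup is performed at all
    query_log.append(new_node.node_hash)  # record that a lookup happened
    for candidate in stored_nodes:
        if candidate.node_hash == new_node.node_hash and candidate.is_valid_cache:
            return candidate
    return None


stored = [ToyNode("abc", failed=True), ToyNode("abc")]

log_cached = []
match = find_cache_source(ToyNode("abc"), stored, log_cached)

log_uncached = []
no_match = find_cache_source(ToyUncachedNode("abc"), stored, log_uncached)
```

Here the failed node is skipped even though its hash matches, and the lookup for the uncachable class returns before anything is recorded in its log.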

On the level of aiida.engine.processes.process.Process and aiida.orm.nodes.process.ProcessNode:

  • The ProcessNode.is_valid_cache property calls Process.is_valid_cache, passing the node itself. This can be used in Process subclasses (e.g. in calculation plugins) to implement custom ways of invalidating the cache.

  • The spec.exit_code method has a keyword argument invalidates_cache. If this is set to True, a process that returns that exit code is not considered a valid cache. This is implemented in Process.is_valid_cache.
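
The delegation from node to process class, and the effect of invalidates_cache, can be modelled as follows. This is a stand-alone sketch and not the AiiDA classes: ToyExitCode, ToyProcess, ToyStrictProcess, ToyProcessNode, the exit statuses and their labels are all hypothetical, and treating unknown exit statuses as valid is an assumption of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyExitCode:
    status: int
    label: str
    invalidates_cache: bool = False


class ToyProcess:
    """Stand-in for a Process subclass with two declared exit codes."""

    exit_codes = {
        0: ToyExitCode(0, "SUCCESS"),
        310: ToyExitCode(310, "ERROR_OUT_OF_WALLTIME", invalidates_cache=True),
    }

    @classmethod
    def is_valid_cache(cls, node):
        exit_code = cls.exit_codes.get(node.exit_status)
        # Assumption: exit statuses that were never declared do not invalidate the cache.
        return exit_code is None or not exit_code.invalidates_cache


class ToyStrictProcess(ToyProcess):
    """A plugin with a custom rule: only successful runs may serve as a cache."""

    @classmethod
    def is_valid_cache(cls, node):
        return super().is_valid_cache(node) and node.exit_status == 0


class ToyProcessNode:
    def __init__(self, process_class, exit_status):
        self.process_class = process_class
        self.exit_status = exit_status

    @property
    def is_valid_cache(self):
        # The node hands itself to the process class, which makes the decision.
        return self.process_class.is_valid_cache(self)


finished = ToyProcessNode(ToyProcess, 0)
out_of_walltime = ToyProcessNode(ToyProcess, 310)
undeclared = ToyProcessNode(ToyProcess, 999)
strict_undeclared = ToyProcessNode(ToyStrictProcess, 999)
```

Keeping the decision on the process class lets a plugin author change the rule in one place without subclassing the node.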

The WorkflowNode example

As discussed in the user guide, nodes which can have RETURN links cannot be cached. This is enforced on two levels:

  • The _cachable attribute is set to False in Node, and only re-enabled in CalculationNode (which affects CalcJobs and calcfunctions). This means that a WorkflowNode will not be cached.

  • The _store_from_cache method, which is used to “clone” an existing node, will raise an error if the existing node has any RETURN links. This extra safeguard prevents cases where a user might incorrectly override the _cachable property on a WorkflowNode subclass.
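
The two levels of protection can be pictured with a toy class hierarchy. Again this is a plain-Python sketch and not AiiDA's code: the classes, the return_links list and the choice of ValueError are assumptions made for illustration, and the text above does not specify which exception the real method raises.

```python
class ToyNode:
    """Stand-in for Node (NOT the AiiDA class): caching is off by default."""

    _cachable = False

    def __init__(self, return_links=()):
        self.return_links = list(return_links)
        self.cloned_from = None

    def _store_from_cache(self, cache_node):
        # Second level: refuse to clone anything that has RETURN links.
        if cache_node.return_links:
            raise ValueError("cannot clone a node that has RETURN links")
        self.cloned_from = cache_node
        return self


class ToyCalculationNode(ToyNode):
    _cachable = True  # first level: re-enabled for calculations only


class ToyWorkflowNode(ToyNode):
    pass  # inherits _cachable = False


class MisconfiguredWorkflowNode(ToyWorkflowNode):
    _cachable = True  # the user mistake the second level protects against


existing_workflow = MisconfiguredWorkflowNode(return_links=["result"])
try:
    MisconfiguredWorkflowNode()._store_from_cache(existing_workflow)
    guard_triggered = False
except ValueError:
    guard_triggered = True

existing_calculation = ToyCalculationNode()
clone = ToyCalculationNode()._store_from_cache(existing_calculation)
```

Even with the class-level switch wrongly turned on, the clone of the workflow node is rejected, while the calculation node is cloned normally.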

Design guidelines

When modifying the hashing/caching behavior of your classes, keep in mind that cache matches can go wrong in two ways:

  • False negatives, where two nodes should have the same hash but do not

  • False positives, where two different nodes get the same hash by mistake

False negatives are strongly preferable because they only increase the runtime of your calculations, while false positives can lead to wrong results.
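
Both failure modes can be reproduced with a toy hash function. This sketch is independent of AiiDA: toy_hash, the attribute names cutoff and label, and the JSON/SHA-256 hashing are all assumptions chosen for the example.

```python
import hashlib
import json


def toy_hash(attributes, ignored=()):
    """Hash a dict of attributes, leaving out the ignored keys (toy model only)."""
    kept = {key: value for key, value in attributes.items() if key not in ignored}
    payload = json.dumps(kept, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


run_a = {"cutoff": 30.0, "label": "Monday run"}
run_b = {"cutoff": 30.0, "label": "Tuesday run"}  # same calculation, other label
run_c = {"cutoff": 60.0, "label": "Monday run"}  # genuinely different calculation

# False negative: the irrelevant label is hashed, so equivalent runs do not match.
false_negative = toy_hash(run_a) != toy_hash(run_b)

# Ignoring the label removes the false negative.
only_label = ("label",)
fixed = toy_hash(run_a, ignored=only_label) == toy_hash(run_b, ignored=only_label)

# False positive: ignoring too much makes different calculations collide.
too_much = ("label", "cutoff")
false_positive = toy_hash(run_a, ignored=too_much) == toy_hash(run_c, ignored=too_much)
```

The false negative only costs a repeated run of an equivalent calculation. The false positive would hand the 30.0 result to a user who asked for 60.0, which is why ignoring an attribute should be the exception and needs a clear justification.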