Querying in AiiDA¶
The advantage of storing information in a database is that questions can be asked on the data, and an answer can be rapidly provided.
Here we describe different approaches to query the data in AiiDA.
This section is still only a stub and will be significantly improved in the next versions.
Directly querying in Django¶
If you know how AiiDA stores the data internally in the database, you can
directly use Django to query the database (or even use directly SQL commands,
if you really feel the urge to do so). Documentation on how queries work
in Django can be found on the official Django documentation. The models can be found in
aiida.djsite.db.models and is directly accessible as
verdi shell via
We provide a Python class (
aiida.orm.querytool.QueryTool) to perform the most common types of queries
(mainly on nodes, links and their attributes) through an easy Python
class interface, without the need to know anything about the SQL query language.
We are working a lot on the interface for querying through the QueryTool, so the interface could change significantly in the future to allow for more advanced querying capabilities.
To use it, in your script (or within the verdi shell)
you need first to load the
from aiida.orm.querytool import QueryTool
Then, create an instance of this class, which will represent your query (you need to create a new instance for each different query you want to execute):
q = QueryTool()
Now, you can call a set of methods on the
q object to decide the filters
you want to apply. The first type of filter one may want to apply is on the
type of nodes you want to obtain (the QueryTool, in the current version,
always queries only nodes in the DB). You can do so passing the correct
Node subclass to the
method, for instance:
Then, if you want to query only calculations within a given group:
group_name is the name of the group you want to select.
exclude parameter, if
negates the query (i.e., considers all objects not included in the
give group). You can call the
multiple times to add more filters.
The most important query specification, though, is on the attributes of a given node.
If you want to query for attributes in the
q.add_attr_filter("energy", "<=", 0., relnode="res")
At this point, the query
q describes a query you still have to run, which
will return each calculation
for which the result node
calc.res.energy is less or equal to 0.
relnode parameter allows the user to perform queries not only
on the nodes you want to get out of the query (in this case, do not specify
relnode parameter) but also on the value of the attributes of
nodes linked to the result nodes. For instance, specifying
relnode, one gets as result of the query nodes whose output result
has a negative energy.
Also in this case, you can add multiple filters on attributes, or you can
use the same syntax also on data you stored in the
We remind here that while attributes are properties that describe
a node, are used internally by AiiDA and cannot be changed
after the node is stored –
for instance, the coordinates of atoms in a crystal structure, the input
parameters for a calculation, ... – extras (stored in
the same format and are at full disposal of the user for adding metadata
to each node, tagging, and later quick querying.
Finally, to run the query and get the results, you can use the
run_query() method, that will
return an iterator over the results of the query. For instance, if you
B as extra data of a given node, you can get a list
of the energy of each calculation, and the value of
the following command:
res = [(node.res.energy, node.get_extra("A"), node.get_extra("B") ) for node in q.run_query()]
After having run a query, if you want to run a new one, even if
it is a simple modification of the current one, please discard the
object and create a new one with the new filters.
The transitive closure table¶
Another type of query that is very common is the discovery of whether two nodes are linked through a path in the AiiDA graph database, regardless of how many nodes are in between.
This is particularly important because, for instance, you may be interested in discovering which crystal structures have, say, all phonon frequencies that are positive; but the information on the phonon frequencies is in a node that is typically not directly linked to the crystal structure (you typically have in between at least a SCF calculation, a phonon calculation on a coarse grid, and an interpolation of the phonon bands on a denser grid; moreover, each calculation may include multiple restarts).
In order to make these queries very efficient (and since we expect that typical workflows, especially in Physics and Materials Science, involve a lot of relatively small, disconnected graphs), we have implemented triggers at the database SQL level to automatically generate a transitive closure table, i.e., a table that for each node contains all his parents (at any depth level) and all the children (at any depth level). This means that, every time two nodes are joined by a link, this table is automatically updated to contain all the new available paths.
With the aid of such a table, discovering if two nodes are connected or not
becomes a matter of a single query.
This table is accessible using Django commands, and is called
Transitive closure paths contain a parent and a child.
Moreover, they also contain a
depth, giving how many nodes have to
be traversed to connect the two
child nodes (to make
this possible, an entry in the DbPath table is stored for each possible
path in the graph). The depth does not include the first and last node
(so, a depth of zero means that two nodes are directly connected through
Three further columns are stored, and they are mainly used to quickly (and recursively) discover which are the nodes that have been traversed.
The description of the exact meaning of the three additional
will be added soon; in the meatime, you can give a look to the
implementation of the
Finally, given a
DbPath object, we provide a
expand() method to get a list
of all the nodes (in the correct order) that are traversed by
the specific path. List elements are AiiDA nodes.
Here we present a simple example of how you can use the transitive closure
table, imagining that you want to get the path between two nodes
We will assume that only a single path exists between the two nodes. If no
path exists, an exception will be raised in the line marked below.
If more than one path exists, only the first one will be returned.
The extension to manage the exception and to manage multiple paths
n1 = load_node(NODEPK1) n2 = load_node(NODEPK2) # In the following line, we are choosing only the first # path returned by the query (with ). # Change here to manage zero or multiple paths! dbpath = models.DbPath.objects.filter(parent=n1, child=n2) # Print all nodes in the path print dbpath.expand()