Database#
Advanced querying#
The basics on using the QueryBuilder
to find the data you are interested in is explained in the finding and querying how-to.
This section explains some more advanced methods for querying your database and the QueryBuilder dictionary.
Working with edges#
Filters and projections can be applied to both the vertices of the query path and the edges that connect them.
Applying a filter or projection to an edge works the same way as for vertices, but the relevant keyword is now preceded by edge_
.
Using the ArithmeticAddCalculation
calculation job as an example, let’s say we want to query for the first input of the addition, i.e. the Int
nodes which have been provided as the input with label x
:
from aiida.orm import QueryBuilder
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_filters={'label': 'x'})
By using the edge_filters
keyword argument, we can query for only the inputs that have the label x
.
Note that any operator that can be used to filter vertices can also be applied to edges.
Say we want to find all input Int
nodes that are not connected to the CalcJobNode
’s via an edge with label x
:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_filters={'label': {'!==': 'x'}})
Here, the equality operator ==
is negated by prepending an exclamation mark !
.
See the reference table below for a table with all operators.
Similar to filters, we can project information of the edge using the edge_project
keyword argument:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_project='label')
In the example above, we are querying for the edge labels of the incoming Int
nodes of all CalcJobNode
’s.
Ordering and limiting results#
You can order the results of your query by the properties of the entity.
Say you want to return the list of Int
outputs from all CalcJobNode
’s, sorted by the time they were created in descending order, i.e. the most recent first:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: {'ctime': 'desc'}})
This can also be used to order your results based on values in a (nested) dictionary, such as the attributes
column.
However, as the QueryBuilder
cannot infer the type of the value in this case, you have to cast the type:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: {'attributes.value': {'order': 'asc', 'cast': 'i'}}})
The query above will return all Int
nodes that are output of all CalcJobNode
’s, in ascending order of their value, i.e. from small to big.
Note that in this case you have to specify the order operation with a dictionary, where the order
key details how you want to order the query results and the cast
key informs the QueryBuilder
of the attribute type.
A list of the available cast types and their aliases can be found in the table below:
Python type |
Alias |
SQLAlchemy type |
---|---|---|
int |
i |
Integer |
float |
f |
Float |
bool |
b |
Boolean |
str |
t |
String |
dict |
j |
JSONB |
datetime.datetime |
d |
DateTime |
You can also order using multiple properties by providing a list of dictionaries that each specify one sorting operation:
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: [{'attributes.value': {'order': 'asc', 'cast': 'f'}}, {'ctime': 'desc'}]})
Here the Int
nodes will first be sorted by their value in ascending order.
Nodes for which the value is equal are subsequently sorted by their modification time in descending order.
Finally, you can also limit the number of query results returned with the limit()
method.
Suppose you only want the first three results from our query:
qb = QueryBuilder()
qb.append(CalcJobNode)
qb.limit(3)
This can be easily combined with the order_by
method in order to get the last three CalcJobNode
’s that were created in the database:
qb = QueryBuilder()
qb.append(CalcJobNode)
qb.limit(3)
qb.order_by({CalcJobNode: {'ctime': 'desc'}})
Reference tables#
List of all operators:
Operator |
Datatype |
Example |
Explanation |
---|---|---|---|
|
all |
|
Filter for equality |
|
all |
|
Filter for values that are in the given list. |
|
float, integer, date |
|
Filter for values that are greater or smaller than a certain value |
|
string |
|
Filter for matching substrings where |
|
string |
|
Case insensitive version of |
|
list of expressions |
|
A list of expressions where at least one should be matched. |
|
list of expressions |
|
A list of expressions where all should be matched. |
|
dict |
|
Filter for dictionaries that contain a certain key. |
|
any |
|
Filter for values of a certain type. |
|
lists |
|
Filter for lists of a certain length. |
|
lists |
|
Filter for lists that are shorter than a certain length. |
|
lists |
|
Filter for lists that are longer than a certain length. |
|
lists |
|
Filter for lists that should contain certain values. |
New in version 2.6: Logical expression syntax
In the new logical expression syntax for filters and projections, the above operations are distributed by type. When using the autocompletion feature in constructing query expressions, only operations associated with the type of the queried field will be presented.
For example, Node.fields.uuid
is a string type, and as such, Node.fields.uuid.
will, in addition to ==
and in_
, only suggest like
and ilike
.
As mentioned in the section about operator negations all operators can be turned into their associated negation (NOT
operator) by adding a !
in front of the operator.
New in version 2.6: Programmatic filter negation
In the new logical expression syntax, negation is defined with ~
, such that ~(Node.fields.some_field < 1)
is equivalent to Node.fields.some_field >= 1
.
Note
The form of (negation) operators in the rendered SQL may differ from the ones specified in the QueryBuilder
instance.
For example, the !==
operator of the QueryBuilder
will be rendered to !=
in SQL.
List of all relationships:
Entity from |
Entity to |
Relationship |
Explanation |
---|---|---|---|
Node |
Node |
with_outgoing |
One node as input of another node |
Node |
Node |
with_incoming |
One node as output of another node |
Node |
Node |
with_descendants |
One node as the ancestor of another node (Path) |
Node |
Node |
with_ancestors |
One node as descendant of another node (Path) |
Node |
Group |
with_node |
The group of a node |
Group |
Node |
with_group |
The node is a member of a group |
Node |
Computer |
with_node |
The computer of a node |
Computer |
Node |
with_computer |
The node of a computer |
Node |
User |
with_node |
The creator of a node is a user |
User |
Node |
with_user |
The node was created by a user |
User |
Group |
with_user |
The node was created by a user |
Group |
User |
with_group |
The node was created by a user |
Node |
Log |
with_node |
The log of a node |
Log Node |
Node Comment |
with_log with_node |
The node has a log The comment of a node |
Comment |
Node |
with_comment |
The node has a comment |
User |
Comment |
with_user |
The comment was created by a user |
Comment |
User |
with_comment |
The creator of a comment is a user |
Converting the QueryBuilder to/from a dictionary#
Important
In aiida-core version 1, this dictionary was accessed with QueryBuilder.queryhelp
, which is now deprecated.
The QueryBuilder
class can be converted to a dictionary and also loaded from a dictionary, for easy serialisation and storage.
Once you have built your query using the appender method explained in the finding and querying for data how-to and the advanced sections above, you can easily store your query by saving the QueryBuilder.as_dict()
dictionary as a JSON file for later use:
import json
from aiida.orm import QueryBuilder
qb = QueryBuilder()
qb.append(CalcJobNode)
with open("querydict.json", "w") as file:
file.write(json.dumps(qb.as_dict(), indent=4))
To use this dictionary to instantiate the QueryBuilder
, you can use the from_dict
class method:
with open("querydict.json", "r") as file:
query_dict = json.load(file)
qb = QueryBuilder.from_dict(query_dict)
Alternatively, you can also use a dictionary to set up your query by specifying the path, filters and projections and constructing the dictionary by hand. To do this, you have to specify:
the
path
: Here, the user specifies the path along which to join tables as a list of dictionaries, where each list item identifies a vertex in your path. You define the vertex class with thecls
key:query_dict = { 'path':[ {'cls': Data} ] }
Each entity in the query has to have a unique tag. If the tag is not provided, it is set to the name of the class. However, this will not work if you choose the same class twice in the query. In this case you have to provide the tag using the
tag
key:query_dict = { 'path':[ { 'cls':Node, 'tag':'node_1' }, { 'cls':Node, 'tag':'node_2' } ] }
You also have to detail some information on the vertex edges, in order to connect them correctly. There are several redundant ways this can be done:
You can specify that this node is an input or output of another node preceding the current one in the list. That other node can be specified by an integer or the class or type. The following examples are all valid joining instructions, assuming there is a structure defined at index 2 of the path with tag “struc1”:
edge_specification = query_dict['path'][3] edge_specification['with_incoming'] = 2 edge_specification['with_incoming'] = StructureData edge_specification['with_incoming'] = 'struc1' edge_specification['with_outgoing'] = 2 edge_specification['with_outgoing'] = StructureData edge_specification['with_outgoing'] = 'struc1'
query_dict['path'][<i>]['direction'] = integer
If any of the above specs (“with_outgoing”, “with_incoming”) were not specified, the key “direction” is looked for. Directions are defined as distances in the tree. 1 is defined as one step down the tree along a link. This means that 1 joins the node specified in this dictionary to the node specified on list-item before as an output. Direction defaults to 1, which is why, if nothing is specified, this node is joined to the previous one as an output by default. A negative number reverse the direction of the link. The absolute value of the direction defines the table to join to with respect to your own position in the list. An absolute value of 1 joins one table above, a value of 2 to the table defined 2 indices above. The two following dictionaries yield the same query:
from aiida.orm import TrajectoryData from aiida_quantumespresso.calculations.pw import PwCalculation from aiida.orm import Dict query_dict_1 = { 'path': [ { 'cls':PwCalculation }, { 'cls':TrajectoryData }, { 'cls':Dict, 'direction':-2 } ] } # returns same query as: query_dict_2 = { 'path':[ { 'cls':PwCalculation }, { 'cls':TrajectoryData }, { 'cls':Dict, 'with_outgoing':PwCalculation } ] } # Shorter version: query_dict_3 = { 'path':[ Dict, PwCalculation, TrajectoryData, ] }
what to
project
: Determining which columns the query will return:query_dict = { 'path':[PwCalculation], 'project':{ PwCalculation:['user_id', 'id'], } }
If you are using JSONB columns, you can also project a value stored inside the json:
query_dict = { 'path':[ PwCalculation, StructureData, ], 'project':{ PwCalculation:['state', 'id'], } }
Returns the state and the id of all instances of
PwCalculation
where a structures is linked as output of a relax-calculation. The strings that you pass have to be name of the columns. If you pass an asterisk (‘*’), the query will return the instance of the AiidaClass.the
filters
: Filters enable you to further specify the query. This is an example for a query for structures that were added after a certain time (say last 4 days) and have an id larger than 50:from aiida.common import timezone from datetime import timedelta query_dict = { 'path':[ {'cls':PwCalculation}, # PwCalculation with structure as output {'cls':StructureData} ], 'filters':{ StructureData:{ 'ctime':{'>': timezone.now() - timedelta(days=4)}, 'id':{'>': 50} } } }
If you want to include filters and projections on links between nodes, you will have to add these to filters and projections in the query dictionary. Let’s take an example from before and add a few filters on the link:
query_dict = {
'path':[
{'cls':PwCalculation, 'tag':'relax'}, # PwCalculation with structure as output
{'cls':StructureData, 'tag':'structure'}
],
'filters':{
'structure':{
'id':{'>': 50}
},
'relax--structure':{
'label':{'like':'output_%'},
}
},
'project':{
'relax--structure':['label'],
'structure':['label'],
'relax':['label', 'uuid'],
}
}
Notice that the tag for the link, by default, is the tag of the two connecting nodes delimited by two dashes ‘–’ and the order DOES matter.
Alternatively, you can choose the tag for the edge in the path when defining the entity to join using edge_tag
:
query_dict = {
'path':[
{'cls':PwCalculation, 'tag':'relax'}, # Relaxation with structure as output
{
'cls':StructureData,
'tag':'structure',
'edge_tag':'ThisIsMyLinkTag' # Definining the link tag
}
],
'filters':{
'structure':{
'id':{'>': 50}
},
'ThisIsMyLinkTag':{ # Using this link tag
'label':{'like':'output_%'},
}
},
'project':{
'ThisIsMyLinkTag':['label'],
'structure':['label'],
'relax':['label', 'uuid'],
}
}
Limits and offset can be set directly like this:
query_dict = {
'path':[Node],
'limit':10,
'offset':20
}
That query_dict
would tell the QueryBuilder to return 10 rows after the first 20 have been skipped.