Database¶

Advanced querying¶

The basics on using the QueryBuilder to find the data you are interested in is explained in the finding and querying how-to. This section explains some more advanced methods for querying your database and the QueryBuilder dictionary.

Working with edges¶

Filters and projections can be applied to both the vertices of the query path and the edges that connect them. Applying a filter or projection to an edge works the same way as for vertices, but the relevant keyword is now preceded by edge_. Using the ArithmeticAddCalculation calculation job as an example, let’s say we want to query for the first input of the addition, i.e. the Int nodes which have been provided as the input with label x:

from aiida.orm import QueryBuilder
qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_filters={'label': 'x'})

By using the edge_filters keyword argument, we can query for only the inputs that have the label x. Note that any operator that can be used to filter vertices can also be applied to edges. Say we want to find all input Int nodes that are not connected to the CalcJobNode’s via an edge with label x:

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_filters={'label': {'!==': 'x'}})

Here, the equality operator == is negated by prepending an exclamation mark !. See the reference table below for a table with all operators. Similar to filters, we can project information of the edge using the edge_project keyword argument:

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_outgoing='calcjob', edge_project='label')

In the example above, we are querying for the edge labels of the incoming Int nodes of all CalcJobNode’s.

Ordering and limiting results¶

You can order the results of your query by the properties of the entity. Say you want to return the list of Int outputs from all CalcJobNode’s, sorted by the time they were created in descending order, i.e. the most recent first:

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: {'ctime': 'desc'}})

This can also be used to order your results based on values in a (nested) dictionary, such as the attributes column. However, as the QueryBuilder cannot infer the type of the value in this case, you have to cast the type:

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: {'attributes.value': {'order': 'asc', 'cast': 'i'}}})

The query above will return all Int nodes that are output of all CalcJobNode’s, in ascending order of their value, i.e. from small to big. Note that in this case you have to specify the order operation with a dictionary, where the order key details how you want to order the query results and the cast key informs the QueryBuilder of the attribute type. A list of the available cast types and their aliases can be found in the table below:

Python type	Alias	SQLAlchemy type
int	i	Integer
float	f	Float
bool	b	Boolean
str	t	String
dict	j	JSONB
datetime.datetime	d	DateTime

You can also order using multiple properties by providing a list of dictionaries that each specify one sorting operation:

qb = QueryBuilder()
qb.append(CalcJobNode, tag='calcjob')
qb.append(Int, with_incoming='calcjob')
qb.order_by({Int: [{'attributes.value': {'order': 'asc', 'cast': 'f'}}, {'ctime': 'desc'}]})

Here the Int nodes will first be sorted by their value in ascending order. Nodes for which the value is equal are subsequently sorted by their modification time in descending order.

Finally, you can also limit the number of query results returned with the limit() method. Suppose you only want the first three results from our query:

qb = QueryBuilder()
qb.append(CalcJobNode)
qb.limit(3)

This can be easily combined with the order_by method in order to get the last three CalcJobNode’s that were created in the database:

qb = QueryBuilder()
qb.append(CalcJobNode)
qb.limit(3)
qb.order_by({CalcJobNode: {'ctime': 'desc'}})

Reference tables¶

List of all operators:

Operator	Datatype	Example	Explanation
`==`	all	`'id': {'==': 123}`	Filter for equality
`in`	all	`'name': {'in': ['foo', 'bar']}`	Filter for values that are in the given list.
`>,<,<=,>=`	float, integer, date	`'ctime': {'<': datetime(2016, 03, 03)}`	Filter for values that are greater or smaller than a certain value
`like`	string	`'name': {'like': 'label%'}`	Filter for matching substrings where `%` and `_` are wildcards. To match a literal `%` or `_` escape it by prefixing it with `\\`.
`ilike`	string	`'name': {'ilike': 'lAbEl%'}`	Case insensitive version of `like`.
`or`	list of expressions	`'id': {'or': [{'<': 12}, {'==': 199}]}`	A list of expressions where at least one should be matched.
`and`	list of expressions	`'id': {'and': [{'<': 12}, {'>': 1}]}`	A list of expressions where all should be matched.
`has_key`	dict	`'attributes': {'has_key': 'some_key'}`	Filter for dictionaries that contain a certain key.
`of_type`	any	`'attributes.some_key': {'of_type': 'bool'}`	Filter for values of a certain type.
`of_length`	lists	`'attributes.some_list': {'of_length': 4}`	Filter for lists of a certain length.
`shorter`	lists	`'attributes.some_list': {'shorter': 4}`	Filter for lists that are shorter than a certain length.
`longer`	lists	`'attributes.some_list': {'longer': 4}`	Filter for lists that are longer than a certain length.
`contains`	lists	`'attributes.some_key': {'contains': ['a', 'b']}`	Filter for lists that should contain certain values.

List of all relationships:

Entity from	Entity to	Relationship	Explanation
Node	Node	with_outgoing	One node as input of another node
Node	Node	with_incoming	One node as output of another node
Node	Node	with_descendants	One node as the ancestor of another node (Path)
Node	Node	with_ancestors	One node as descendant of another node (Path)
Node	Group	with_node	The group of a node
Group	Node	with_group	The node is a member of a group
Node	Computer	with_node	The computer of a node
Computer	Node	with_computer	The node of a computer
Node	User	with_node	The creator of a node is a user
User	Node	with_user	The node was created by a user
User	Group	with_user	The node was created by a user
Group	User	with_group	The node was created by a user
Node	Log	with_node	The log of a node
Log Node	Node Comment	with_log with_node	The node has a log The comment of a node
Comment	Node	with_comment	The node has a comment
User	Comment	with_user	The comment was created by a user
Comment	User	with_comment	The creator of a comment is a user

Converting the QueryBuilder to/from a dictionary¶

Important

In aiida-core version 1, this dictionary was accessed with QueryBuilder.queryhelp, which is now deprecated.

The QueryBuilder class can be converted to a dictionary and also loaded from a dictionary, for easy serialisation and storage. Once you have built your query using the appender method explained in the finding and querying for data how-to and the advanced sections above, you can easily store your query by saving the QueryBuilder.as_dict() dictionary as a JSON file for later use:

import json
from aiida.orm import QueryBuilder

qb = QueryBuilder()
qb.append(CalcJobNode)

with open("querydict.json", "w") as file:
    file.write(json.dumps(qb.as_dict(), indent=4))

To use this dictionary to instantiate the QueryBuilder, you can use the from_dict class method:

with open("querydict.json", "r") as file:
    query_dict = json.load(file)

qb = QueryBuilder.from_dict(query_dict)

Alternatively, you can also use a dictionary to set up your query by specifying the path, filters and projections and constructing the dictionary by hand. To do this, you have to specify:

the path: Here, the user specifies the path along which to join tables as a list of dictionaries, where each list item identifies a vertex in your path. You define the vertex class with the cls key:
```
query_dict = {
    'path':[
        {'cls': Data}
    ]
}
```
Each entity in the query has to have a unique tag. If the tag is not provided, it is set to the name of the class. However, this will not work if you choose the same class twice in the query. In this case you have to provide the tag using the tag key:
```
query_dict = {
    'path':[
        {
            'cls':Node,
            'tag':'node_1'
        },
        {
            'cls':Node,
            'tag':'node_2'
        }
    ]
}
```
You also have to detail some information on the vertex edges, in order to connect them correctly. There are several redundant ways this can be done:
- You can specify that this node is an input or output of another node preceding the current one in the list. That other node can be specified by an integer or the class or type. The following examples are all valid joining instructions, assuming there is a structure defined at index 2 of the path with tag “struc1”:
```
edge_specification = query_dict['path'][3]
edge_specification['with_incoming'] = 2
edge_specification['with_incoming'] = StructureData
edge_specification['with_incoming'] = 'struc1'
edge_specification['with_outgoing']  = 2
edge_specification['with_outgoing']  = StructureData
edge_specification['with_outgoing']  = 'struc1'
```
- query_dict['path'][<i>]['direction'] = integer
  
  If any of the above specs (“with_outgoing”, “with_incoming”) were not specified, the key “direction” is looked for. Directions are defined as distances in the tree. 1 is defined as one step down the tree along a link. This means that 1 joins the node specified in this dictionary to the node specified on list-item before as an output. Direction defaults to 1, which is why, if nothing is specified, this node is joined to the previous one as an output by default. A negative number reverse the direction of the link. The absolute value of the direction defines the table to join to with respect to your own position in the list. An absolute value of 1 joins one table above, a value of 2 to the table defined 2 indices above. The two following dictionaries yield the same query:
```
from aiida.orm import TrajectoryData
from aiida_quantumespresso.calculations.pw import PwCalculation
from aiida.orm import Dict
query_dict_1 = {
    'path': [
        {
            'cls':PwCalculation
        },
        {
            'cls':TrajectoryData
        },
        {
            'cls':Dict,
            'direction':-2
        }
    ]
}

# returns same query as:

query_dict_2 = {
    'path':[
        {
            'cls':PwCalculation
        },
        {
            'cls':TrajectoryData
        },
        {
            'cls':Dict,
            'with_outgoing':PwCalculation
        }
    ]
}

# Shorter version:

query_dict_3 = {
    'path':[
        Dict,
        PwCalculation,
        TrajectoryData,
    ]
}
```
what to project: Determining which columns the query will return:
```
query_dict = {
    'path':[PwCalculation],
    'project':{
        PwCalculation:['user_id', 'id'],
    }
}
```
If you are using JSONB columns, you can also project a value stored inside the json:
```
query_dict = {
    'path':[
        PwCalculation,
        StructureData,
    ],
    'project':{
        PwCalculation:['state', 'id'],
    }
}
```
Returns the state and the id of all instances of PwCalculation where a structures is linked as output of a relax-calculation. The strings that you pass have to be name of the columns. If you pass an asterisk (‘*’), the query will return the instance of the AiidaClass.

the filters: Filters enable you to further specify the query. This is an example for a query for structures that were added after a certain time (say last 4 days) and have an id larger than 50:

from aiida.common import timezone
from datetime import timedelta

query_dict = {
    'path':[
        {'cls':PwCalculation}, # PwCalculation with structure as output
        {'cls':StructureData}
    ],
    'filters':{
        StructureData:{
            'ctime':{'>':  timezone.now() - timedelta(days=4)},
            'id':{'>': 50}
        }
    }
}

If you want to include filters and projections on links between nodes, you will have to add these to filters and projections in the query dictionary. Let’s take an example from before and add a few filters on the link:

query_dict = {
    'path':[
        {'cls':PwCalculation, 'tag':'relax'}, # PwCalculation with structure as output
        {'cls':StructureData, 'tag':'structure'}
    ],
    'filters':{
        'structure':{
            'id':{'>': 50}
        },
        'relax--structure':{
            'label':{'like':'output_%'},
        }
    },
    'project':{
        'relax--structure':['label'],
        'structure':['label'],
        'relax':['label', 'uuid'],
    }
}

Notice that the tag for the link, by default, is the tag of the two connecting nodes delimited by two dashes ‘–’ and the order DOES matter.

Alternatively, you can choose the tag for the edge in the path when defining the entity to join using edge_tag:

query_dict = {
    'path':[
        {'cls':PwCalculation, 'tag':'relax'},         # Relaxation with structure as output
        {
            'cls':StructureData,
            'tag':'structure',
            'edge_tag':'ThisIsMyLinkTag'     # Definining the link tag
        }
    ],
    'filters':{
        'structure':{
            'id':{'>': 50}
        },
        'ThisIsMyLinkTag':{                  # Using this link tag
            'label':{'like':'output_%'},
        }
    },
    'project':{
        'ThisIsMyLinkTag':['label'],
        'structure':['label'],
        'relax':['label', 'uuid'],
    }
}

Limits and offset can be set directly like this:

query_dict = {
    'path':[Node],
    'limit':10,
    'offset':20
}

That query_dict would tell the QueryBuilder to return 10 rows after the first 20 have been skipped.

Data types

Repository