psql_dos format

The PsqlDosBackend is the primary format for storing provenance data. It stores data in two places:

  1. A PostgreSQL database.

  2. A disk-objectstore repository (see The disk object store).

The database stores all “JSONable” entity data, organized into different tables (closely related to AiiDA ORM entities) and columns/fields. Larger binary data (such as input/output file content), required for nodes, are stored in the disk-objectstore, and referenced by db_dbnode.repository_metadata as a virtual file-system.

Interfacing with the database is achieved using the sqlalchemy ORM API.

The PostgreSQL database schema

The following section provides a complete schema for the PostgreSQL database.

Tables

In all tables, the primary key that uniquely identifies each of their members is a positive integer number in the id field. However, this number is only unique within the table, and thus there can be a user with an id of 2 and a node with an id of 2 in the same database (or, more trivially, two different nodes both with an id of 2, each in a different database).

Most of the entities also have a uuid value. The uuid is meant to serve as an identifier that is unique within all tables of all AiiDA databases in the world. This is a 32-position hexadecimal sequence that is stored as a string with some dash separated sections (for example: 479a312d-e9b6-4bbb-93b4-f0a7174ccbf4).

Note

  • * indicates columns with a unique constraint

  • indicate foreign keys

  • ? indicate value types that are nullable.

DbUser (db_dbuser)

Database model to store data for aiida.orm.User.

Every node that is created has a single user as its author.

The user information consists of the most basic personal contact details.

Columns:

id*

INTEGER

email*

VARCHAR(254)

first_name

VARCHAR(254)

last_name

VARCHAR(254)

institution

VARCHAR(254)

Constraints:

  • PRIMARY KEY (id)

  • UNIQUE (email)

Indexes:

  • ix_pat_db_dbuser_email (email)

DbNode (db_dbnode)

Database model to store data for aiida.orm.Node.

Each node can be categorized according to its node_type, which indicates what kind of data or process node it is. Additionally, process nodes also have a process_type that further indicates what is the specific plugin it uses.

Nodes can also store two kind of properties:

  • attributes are determined by the node_type, and are set before storing the node and can’t be modified afterwards.

  • extras, on the other hand, can be added and removed after the node has been stored and are usually set by the user.

Columns:

id*

INTEGER

uuid*

UUID

node_type

VARCHAR(255)

process_type

VARCHAR(255)?

label

VARCHAR(255)

description

TEXT

ctime

DATETIME

mtime

DATETIME

attributes

JSONB?

extras

JSONB?

repository_metadata

JSONB

→ dbcomputer_id

INTEGER?

→ user_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbnode_db_dbnode_ctime (ctime)

  • ix_db_dbnode_db_dbnode_dbcomputer_id (dbcomputer_id)

  • ix_db_dbnode_db_dbnode_label (label)

  • ix_db_dbnode_db_dbnode_mtime (mtime)

  • ix_db_dbnode_db_dbnode_node_type (node_type)

  • ix_db_dbnode_db_dbnode_process_type (process_type)

  • ix_db_dbnode_db_dbnode_user_id (user_id)

  • ix_pat_db_dbnode_label (label)

  • ix_pat_db_dbnode_node_type (node_type)

  • ix_pat_db_dbnode_process_type (process_type)

DbLink (db_dblink)

Database model to store links between aiida.orm.Node.

Each entry in this table contains not only the id information of the two nodes that are linked, but also some extra properties of the link themselves. This includes the type of the link (see the Concepts section for all possible types) as well as a label which is more specific and typically determined by the procedure generating the process node that links the data nodes.

Columns:

id*

INTEGER

→ input_id

INTEGER

→ output_id

INTEGER

label

VARCHAR(255)

type

VARCHAR(255)

Constraints:

  • FOREIGN KEY (db_dbnode.id → input_id)

  • FOREIGN KEY (db_dbnode.id → output_id)

  • PRIMARY KEY (id)

Indexes:

  • ix_db_dblink_db_dblink_input_id (input_id)

  • ix_db_dblink_db_dblink_label (label)

  • ix_db_dblink_db_dblink_output_id (output_id)

  • ix_db_dblink_db_dblink_type (type)

  • ix_pat_db_dblink_label (label)

  • ix_pat_db_dblink_type (type)

DbGroup (db_dbgroup)

Database model to store aiida.orm.Group data.

A group may contain many different nodes, but also each node can be included in different groups.

Users will typically identify and handle groups by using their label (which, unlike the labels in other models, must be unique). Groups also have a type, which serves to identify what plugin is being instanced, and the extras property for users to set any relevant information.

Columns:

id*

INTEGER

uuid*

UUID

label

VARCHAR(255)

type_string

VARCHAR(255)

time

DATETIME

description

TEXT

extras

JSONB

→ user_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (label, type_string)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbgroup_db_dbgroup_label (label)

  • ix_db_dbgroup_db_dbgroup_type_string (type_string)

  • ix_db_dbgroup_db_dbgroup_user_id (user_id)

  • ix_pat_db_dbgroup_label (label)

  • ix_pat_db_dbgroup_type_string (type_string)

DbGroupNode (db_dbgroup_dbnodes)

Database model to store group-to-nodes relations.

Columns:

id*

INTEGER

→ dbnode_id

INTEGER

→ dbgroup_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbgroup.id → dbgroup_id)

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • PRIMARY KEY (id)

  • UNIQUE (dbgroup_id, dbnode_id)

Indexes:

  • ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbgroup_id (dbgroup_id)

  • ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbnode_id (dbnode_id)

DbComputer (db_dbcomputer)

Database model to store data for aiida.orm.Computer.

Computers represent (and contain the information of) the physical hardware resources available. Nodes can be associated with computers if they are remote codes, remote folders, or processes that had run remotely.

Computers are identified within AiiDA by their label (and thus it must be unique for each one in the database), whereas the hostname is the label that identifies the computer within the network from which one can access it.

The scheduler_type column contains the information of the scheduler (and plugin) that the computer uses to manage jobs, whereas the transport_type the information of the transport (and plugin) required to copy files and communicate to and from the computer. The metadata contains some general settings for these communication and management protocols.

Columns:

metadata

JSONB

id*

INTEGER

uuid*

UUID

label*

VARCHAR(255)

hostname

VARCHAR(255)

description

TEXT

scheduler_type

VARCHAR(255)

transport_type

VARCHAR(255)

Constraints:

  • PRIMARY KEY (id)

  • UNIQUE (label)

  • UNIQUE (uuid)

Indexes:

  • ix_pat_db_dbcomputer_label (label)

DbAuthInfo (db_dbauthinfo)

Database model to store data for aiida.orm.AuthInfo, and keep computer authentication data, per user.

Specifications are user-specific of how to submit jobs in the computer. The model also has an enabled logical switch that indicates whether the device is available for use or not. This last one can be set and unset by the user.

Columns:

metadata

JSONB

id*

INTEGER

→ aiidauser_id

INTEGER

→ dbcomputer_id

INTEGER

auth_params

JSONB

enabled

BOOLEAN

Constraints:

  • FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)

  • FOREIGN KEY (db_dbuser.id → aiidauser_id)

  • PRIMARY KEY (id)

  • UNIQUE (aiidauser_id, dbcomputer_id)

Indexes:

  • ix_db_dbauthinfo_db_dbauthinfo_aiidauser_id (aiidauser_id)

  • ix_db_dbauthinfo_db_dbauthinfo_dbcomputer_id (dbcomputer_id)

DbComment (db_dbcomment)

Database model to store data for aiida.orm.Comment.

Comments can be attach to the nodes by the users.

Columns:

id*

INTEGER

uuid*

UUID

→ dbnode_id

INTEGER

ctime

DATETIME

mtime

DATETIME

→ user_id

INTEGER

content

TEXT

Constraints:

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbcomment_db_dbcomment_dbnode_id (dbnode_id)

  • ix_db_dbcomment_db_dbcomment_user_id (user_id)

DbLog (db_dblog)

Database model to data for aiida.orm.Log, corresponding to aiida.orm.ProcessNode.

Columns:

metadata

JSONB

id*

INTEGER

uuid*

UUID

time

DATETIME

loggername

VARCHAR(255)

What process recorded the message

levelname

VARCHAR(50)

How critical the message is

→ dbnode_id

INTEGER

message

TEXT

Constraints:

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dblog_db_dblog_dbnode_id (dbnode_id)

  • ix_db_dblog_db_dblog_levelname (levelname)

  • ix_db_dblog_db_dblog_loggername (loggername)

  • ix_pat_db_dblog_levelname (levelname)

  • ix_pat_db_dblog_loggername (loggername)

DbSetting (db_dbsetting)

Database model to store global settings.

Columns:

id*

INTEGER

key*

VARCHAR(1024)

val

JSONB?

description

TEXT

time

DATETIME

Constraints:

  • PRIMARY KEY (id)

  • UNIQUE (key)

Indexes:

  • ix_pat_db_dbsetting_key (key)

The many-to-one relationship

You can see an example of a many-to-one relationship between users and nodes: each node will have one and only one user that has created it, while a single user may have created many nodes. Although in that case the relationship is “mandatory”, this doesn’t need to be the case: for example, not all nodes will have a computer associated with them, but the ones that do will have only one and no more.

The following entities have a many-to-one relationship:

  • Many nodes can be created by the same user.

  • Many nodes can point to the same computer.

  • Many groups can be created by the same user.

  • Many authinfos can be set for the same user.

  • Many authinfos can be set for the same computer.

  • Many comments can be created by the same user.

  • Many comments can be attached to the same node.

  • Many logs can be attached to the same node.

The way to keep track of these relationships is by inserting a foreign key column in the table of the “many” entity that points to the corresponding id value of the “one” entity they are related to. For example, there is a user_id foreign key column in the db_dbnode table that stores the id of the user that created each node.

The many-to-many relationship

This type of relationship is a bit more difficult to track, since now both members can be related to more than one element. Recording this in the same table as one of the entities would imply storing a list of values in a column (which is often discouraged and not well supported). Therefore, it is more convenient to use an extra table in which each of the connections has its corresponding entry indicating which are the specific elements that are related.

There are only two many-to-many relationships in AiiDA:

Between groups and nodes

as specified before, many nodes can be inside the same group and a single node can belong to many different groups. This relationship is tracked in the db_dbgroup_dbnodes table.

Between nodes themselves (Links)

nodes have what is known as a “self-referencing relationship”, meaning that they can be connected among themselves. Indeed, this is one of the core principles of how the provenance graph works. This relationship is tracked in the db_dblinks table.

Storage schema migrations

Migrations of the storage schema, to bring it inline with updates to the aiida-core API, are implemented by PsqlDostoreMigrator , using alembic.

Legacy schema

The psql_dos storage format originates from the merging of the django and sqlalchemy backends, present in aiida-core version 1. Both backends had very similar PostgreSQL database schema, and there are now two separate migration branches to merge these into a single schema.