psql_dos
format#
The PsqlDosBackend
is the primary format for storing provenance data.
It stores data in two places:
A PostgreSQL database.
A disk-objectstore repository (see The disk object store).
The database stores all “JSONable” entity data, organized into different tables (closely related to AiiDA ORM entities) and columns/fields. Larger binary data (such as input/output file content), required for nodes, are stored in the disk-objectstore, and referenced by db_dbnode.repository_metadata as a virtual file-system.
Interfacing with the database is achieved using the sqlalchemy ORM API.
The PostgreSQL database schema#
The following section provides a complete schema for the PostgreSQL database.
Tables#
In all tables, the primary key that uniquely identifies each of their members is a positive integer number in the id
field.
However, this number is only unique within the table, and thus there can be a user with an id
of 2 and a node with an id
of 2 in the same database (or, more trivially, two different nodes both with an id
of 2, each in a different database).
Most of the entities also have a uuid
value.
The uuid
is meant to serve as an identifier that is unique within all tables of all AiiDA databases in the world.
This is a 32-position hexadecimal sequence that is stored as a string with some dash separated sections (for example: 479a312d-e9b6-4bbb-93b4-f0a7174ccbf4
).
Note
*
indicates columns with a unique constraint→
indicate foreign keys?
indicate value types that are nullable.
- DbUser (db_dbuser)
Database model to store data for
aiida.orm.User
.Every node that is created has a single user as its author.
The user information consists of the most basic personal contact details.
Columns:
id*
INTEGER
email*
VARCHAR(254)
first_name
VARCHAR(254)
last_name
VARCHAR(254)
institution
VARCHAR(254)
Constraints:
PRIMARY KEY (id)
UNIQUE (email)
Indexes:
ix_pat_db_dbuser_email (email)
- DbNode (db_dbnode)
Database model to store data for
aiida.orm.Node
.Each node can be categorized according to its
node_type
, which indicates what kind of data or process node it is. Additionally, process nodes also have aprocess_type
that further indicates what is the specific plugin it uses.Nodes can also store two kind of properties:
attributes
are determined by thenode_type
, and are set before storing the node and can’t be modified afterwards.extras
, on the other hand, can be added and removed after the node has been stored and are usually set by the user.
Columns:
id*
INTEGER
uuid*
UUID
node_type
VARCHAR(255)
process_type
VARCHAR(255)?
label
VARCHAR(255)
description
TEXT
ctime
DATETIME
mtime
DATETIME
attributes
JSONB?
extras
JSONB?
repository_metadata
JSONB
→ dbcomputer_id
INTEGER?
→ user_id
INTEGER
Constraints:
FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dbnode_db_dbnode_ctime (ctime)
ix_db_dbnode_db_dbnode_dbcomputer_id (dbcomputer_id)
ix_db_dbnode_db_dbnode_label (label)
ix_db_dbnode_db_dbnode_mtime (mtime)
ix_db_dbnode_db_dbnode_node_type (node_type)
ix_db_dbnode_db_dbnode_process_type (process_type)
ix_db_dbnode_db_dbnode_user_id (user_id)
ix_pat_db_dbnode_label (label)
ix_pat_db_dbnode_node_type (node_type)
ix_pat_db_dbnode_process_type (process_type)
- DbLink (db_dblink)
Database model to store links between
aiida.orm.Node
.Each entry in this table contains not only the
id
information of the two nodes that are linked, but also some extra properties of the link themselves. This includes thetype
of the link (see the Concepts section for all possible types) as well as alabel
which is more specific and typically determined by the procedure generating the process node that links the data nodes.Columns:
id*
INTEGER
→ input_id
INTEGER
→ output_id
INTEGER
label
VARCHAR(255)
type
VARCHAR(255)
Constraints:
FOREIGN KEY (db_dbnode.id → input_id)
FOREIGN KEY (db_dbnode.id → output_id)
PRIMARY KEY (id)
Indexes:
ix_db_dblink_db_dblink_input_id (input_id)
ix_db_dblink_db_dblink_label (label)
ix_db_dblink_db_dblink_output_id (output_id)
ix_db_dblink_db_dblink_type (type)
ix_pat_db_dblink_label (label)
ix_pat_db_dblink_type (type)
- DbGroup (db_dbgroup)
Database model to store
aiida.orm.Group
data.A group may contain many different nodes, but also each node can be included in different groups.
Users will typically identify and handle groups by using their
label
(which, unlike thelabels
in other models, must be unique). Groups also have atype
, which serves to identify what plugin is being instanced, and theextras
property for users to set any relevant information.Columns:
id*
INTEGER
uuid*
UUID
label
VARCHAR(255)
type_string
VARCHAR(255)
time
DATETIME
description
TEXT
extras
JSONB
→ user_id
INTEGER
Constraints:
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (label, type_string)
UNIQUE (uuid)
Indexes:
ix_db_dbgroup_db_dbgroup_label (label)
ix_db_dbgroup_db_dbgroup_type_string (type_string)
ix_db_dbgroup_db_dbgroup_user_id (user_id)
ix_pat_db_dbgroup_label (label)
ix_pat_db_dbgroup_type_string (type_string)
- DbGroupNode (db_dbgroup_dbnodes)
Database model to store group-to-nodes relations.
Columns:
id*
INTEGER
→ dbnode_id
INTEGER
→ dbgroup_id
INTEGER
Constraints:
FOREIGN KEY (db_dbgroup.id → dbgroup_id)
FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (dbgroup_id, dbnode_id)
Indexes:
ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbgroup_id (dbgroup_id)
ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbnode_id (dbnode_id)
- DbComputer (db_dbcomputer)
Database model to store data for
aiida.orm.Computer
.Computers represent (and contain the information of) the physical hardware resources available. Nodes can be associated with computers if they are remote codes, remote folders, or processes that had run remotely.
Computers are identified within AiiDA by their
label
(and thus it must be unique for each one in the database), whereas thehostname
is the label that identifies the computer within the network from which one can access it.The
scheduler_type
column contains the information of the scheduler (and plugin) that the computer uses to manage jobs, whereas thetransport_type
the information of the transport (and plugin) required to copy files and communicate to and from the computer. Themetadata
contains some general settings for these communication and management protocols.Columns:
metadata
JSONB
id*
INTEGER
uuid*
UUID
label*
VARCHAR(255)
hostname
VARCHAR(255)
description
TEXT
scheduler_type
VARCHAR(255)
transport_type
VARCHAR(255)
Constraints:
PRIMARY KEY (id)
UNIQUE (label)
UNIQUE (uuid)
Indexes:
ix_pat_db_dbcomputer_label (label)
- DbAuthInfo (db_dbauthinfo)
Database model to store data for
aiida.orm.AuthInfo
, and keep computer authentication data, per user.Specifications are user-specific of how to submit jobs in the computer. The model also has an
enabled
logical switch that indicates whether the device is available for use or not. This last one can be set and unset by the user.Columns:
metadata
JSONB
id*
INTEGER
→ aiidauser_id
INTEGER
→ dbcomputer_id
INTEGER
auth_params
JSONB
enabled
BOOLEAN
Constraints:
FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → aiidauser_id)
PRIMARY KEY (id)
UNIQUE (aiidauser_id, dbcomputer_id)
Indexes:
ix_db_dbauthinfo_db_dbauthinfo_aiidauser_id (aiidauser_id)
ix_db_dbauthinfo_db_dbauthinfo_dbcomputer_id (dbcomputer_id)
- DbComment (db_dbcomment)
Database model to store data for
aiida.orm.Comment
.Comments can be attach to the nodes by the users.
Columns:
id*
INTEGER
uuid*
UUID
→ dbnode_id
INTEGER
ctime
DATETIME
mtime
DATETIME
→ user_id
INTEGER
content
TEXT
Constraints:
FOREIGN KEY (db_dbnode.id → dbnode_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dbcomment_db_dbcomment_dbnode_id (dbnode_id)
ix_db_dbcomment_db_dbcomment_user_id (user_id)
- DbLog (db_dblog)
Database model to data for
aiida.orm.Log
, corresponding toaiida.orm.ProcessNode
.Columns:
metadata
JSONB
id*
INTEGER
uuid*
UUID
time
DATETIME
loggername
VARCHAR(255)
What process recorded the message
levelname
VARCHAR(50)
How critical the message is
→ dbnode_id
INTEGER
message
TEXT
Constraints:
FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dblog_db_dblog_dbnode_id (dbnode_id)
ix_db_dblog_db_dblog_levelname (levelname)
ix_db_dblog_db_dblog_loggername (loggername)
ix_pat_db_dblog_levelname (levelname)
ix_pat_db_dblog_loggername (loggername)
- DbSetting (db_dbsetting)
Database model to store global settings.
Columns:
id*
INTEGER
key*
VARCHAR(1024)
val
JSONB?
description
TEXT
time
DATETIME
Constraints:
PRIMARY KEY (id)
UNIQUE (key)
Indexes:
ix_pat_db_dbsetting_key (key)
The many-to-one relationship#
You can see an example of a many-to-one relationship between users and nodes: each node will have one and only one user that has created it, while a single user may have created many nodes. Although in that case the relationship is “mandatory”, this doesn’t need to be the case: for example, not all nodes will have a computer associated with them, but the ones that do will have only one and no more.
The following entities have a many-to-one relationship:
Many nodes can be created by the same user.
Many nodes can point to the same computer.
Many groups can be created by the same user.
Many authinfos can be set for the same user.
Many authinfos can be set for the same computer.
Many comments can be created by the same user.
Many comments can be attached to the same node.
Many logs can be attached to the same node.
The way to keep track of these relationships is by inserting a foreign key column in the table of the “many” entity that points to the corresponding id value of the “one” entity they are related to.
For example, there is a user_id
foreign key column in the db_dbnode table that stores the id of the user that created each node.
The many-to-many relationship#
This type of relationship is a bit more difficult to track, since now both members can be related to more than one element. Recording this in the same table as one of the entities would imply storing a list of values in a column (which is often discouraged and not well supported). Therefore, it is more convenient to use an extra table in which each of the connections has its corresponding entry indicating which are the specific elements that are related.
There are only two many-to-many relationships in AiiDA:
- Between groups and nodes
as specified before, many nodes can be inside the same group and a single node can belong to many different groups. This relationship is tracked in the db_dbgroup_dbnodes table.
- Between nodes themselves (Links)
nodes have what is known as a “self-referencing relationship”, meaning that they can be connected among themselves. Indeed, this is one of the core principles of how the provenance graph works. This relationship is tracked in the db_dblinks table.
Storage schema migrations#
Migrations of the storage schema, to bring it inline with updates to the aiida-core
API, are implemented by PsqlDosMigrator
, using alembic.
Legacy schema#
The psql_dos storage format originates from the merging of the django and sqlalchemy backends, present in aiida-core version 1. Both backends had very similar PostgreSQL database schema, and there are now two separate migration branches to merge these into a single schema.