sqlite_zip
(archive) format#
The SqliteZipBackend
is the storage format used for the AiiDA archive,
whose design draws from consideration outlined in AEP 005: Improved export archive format.
An AiiDA archive is a single file format (with canonical extension .aiida
), for long term storage of an AiiDA provenance graph.
It provides a data storage backend, integrating a database and file repository.
The standard format is a ZIP archive, containing the following files:
metadata.json
file containing information on the version of the archive.db.sqlite3
file containing the AiiDA database.repo/
directory containing the AiiDA file repository.
The central directory is written with the metadata and database records at the top of the file. Zip files are read first from the bottom, which contains the byte position of the start of the central directory, then scanning down the central directory to extract records for each file. When extracting the metadata/database only, one can simply scan for that record, then break and directly decompress the byte array for that file. In this way, we do not have to scan through all the records of the repository files
As opposed to the psql_dos format, this format is “read-only”, since zip files cannot be modified once created.
metadata schema#
This file contains important information, and it is necessary for the correct interpretation of db.sqlite3`
.
This is used to avoid any incompatibilities among different versions of AiiDA.
Here is an example metadata.json
:
{
"export_version": "1.0",
"aiida_version": "2.0.0",
"key_format": "sha256",
"compression": 6,
"ctime": "2021-12-01T03:43:11.909061",
"creation_parameters": {
"entities_starting_set": {},
"include_authinfos": false,
"include_comments": true,
"include_logs": true,
"graph_traversal_rules": {
"input_calc_forward": false,
"input_calc_backward": true,
"create_forward": true,
"create_backward": true,
"return_forward": true,
"return_backward": false,
"input_work_forward": false,
"input_work_backward": true,
"call_calc_forward": true,
"call_calc_backward": true,
"call_work_forward": true,
"call_work_backward": true
},
"entity_counts": {
"users": 8,
"computers": 14,
"groups": 2,
"nodes": 109547,
"links": 159905,
"group_nodes": 219094
}
}
}
At the beginning of the file, we see the version of the archive file (under export_version
) and the version of the AiiDA code.
New archive versions are introduced for several different reasons; this may generally be when:
a change occurs in what can or cannot be exported for each entity,
the database and/or archive schemes are updated or changed,
or standardized exported property values are updated in AiiDA.
Important
For archives of version 0.3 and older it is advisable that you manually try to convince yourself that the migration was completely successful. While all migrations are tested, trying to include reasonable edge-cases, the migrations involved from version 0.3 to 0.4 are intricate and the possibility of a missing edge-case test is quite real. It is worth noting that if you ever have an issue, please report it on GitHub, join the AiiDA mailing list, or use the contact form.
Note
If you have migrated an archive file to the newest version, there may be an extra entry in metadata.json
.
This simply states from which archive version the file was migrated.
Note
If you supply an old archive file that the current AiiDA code does not support, verdi archive import
will automatically try to migrate the archive by calling verdi archive migrate
.
repository format#
The repository is read by the ZipfileBackendRepository
.
The zip file should contain repository files with the key format: repo/<sha256 hash>
, i.e. files named by the sha256 hash of the file contents, inside a repo
directory.
database schema#
The database schema is intended to directly mirror that of the psql_dos format. The only differences are in the handling of certain data types by SQLite versus PostgreSQL:
UUID
->CHAR(32)
DateTime
->TZDateTime
JSONB
->JSON
Also, varchar_pattern_ops indexes are not possible in SQLite.
Tables#
- DbUser (db_dbuser)
Database model to store data for
aiida.orm.User
.Every node that is created has a single user as its author.
The user information consists of the most basic personal contact details.
Columns:
id*
INTEGER
email*
VARCHAR(254)
first_name
VARCHAR(254)
last_name
VARCHAR(254)
institution
VARCHAR(254)
Constraints:
PRIMARY KEY (id)
UNIQUE (email)
- DbNode (db_dbnode)
Database model to store data for
aiida.orm.Node
.Each node can be categorized according to its
node_type
, which indicates what kind of data or process node it is. Additionally, process nodes also have aprocess_type
that further indicates what is the specific plugin it uses.Nodes can also store two kind of properties:
attributes
are determined by thenode_type
, and are set before storing the node and can’t be modified afterwards.extras
, on the other hand, can be added and removed after the node has been stored and are usually set by the user.
Columns:
id*
INTEGER
uuid*
VARCHAR(32)
node_type
VARCHAR(255)
process_type
VARCHAR(255)?
label
VARCHAR(255)
description
TEXT
ctime
DATETIME
mtime
DATETIME
attributes
JSON?
extras
JSON?
repository_metadata
JSON
→ dbcomputer_id
INTEGER?
→ user_id
INTEGER
Constraints:
FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dbnode_db_dbnode_ctime (ctime)
ix_db_dbnode_db_dbnode_dbcomputer_id (dbcomputer_id)
ix_db_dbnode_db_dbnode_label (label)
ix_db_dbnode_db_dbnode_mtime (mtime)
ix_db_dbnode_db_dbnode_node_type (node_type)
ix_db_dbnode_db_dbnode_process_type (process_type)
ix_db_dbnode_db_dbnode_user_id (user_id)
- DbLink (db_dblink)
Database model to store links between
aiida.orm.Node
.Each entry in this table contains not only the
id
information of the two nodes that are linked, but also some extra properties of the link themselves. This includes thetype
of the link (see the Concepts section for all possible types) as well as alabel
which is more specific and typically determined by the procedure generating the process node that links the data nodes.Columns:
id*
INTEGER
→ input_id
INTEGER
→ output_id
INTEGER
label
VARCHAR(255)
type
VARCHAR(255)
Constraints:
FOREIGN KEY (db_dbnode.id → input_id)
FOREIGN KEY (db_dbnode.id → output_id)
PRIMARY KEY (id)
Indexes:
ix_db_dblink_db_dblink_input_id (input_id)
ix_db_dblink_db_dblink_label (label)
ix_db_dblink_db_dblink_output_id (output_id)
ix_db_dblink_db_dblink_type (type)
- DbGroup (db_dbgroup)
Database model to store
aiida.orm.Group
data.A group may contain many different nodes, but also each node can be included in different groups.
Users will typically identify and handle groups by using their
label
(which, unlike thelabels
in other models, must be unique). Groups also have atype
, which serves to identify what plugin is being instanced, and theextras
property for users to set any relevant information.Columns:
id*
INTEGER
uuid*
VARCHAR(32)
label
VARCHAR(255)
type_string
VARCHAR(255)
time
DATETIME
description
TEXT
extras
JSON
→ user_id
INTEGER
Constraints:
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (label, type_string)
UNIQUE (uuid)
Indexes:
ix_db_dbgroup_db_dbgroup_label (label)
ix_db_dbgroup_db_dbgroup_type_string (type_string)
ix_db_dbgroup_db_dbgroup_user_id (user_id)
- DbGroupNodes (db_dbgroup_dbnodes)
Database model to store group-to-nodes relations.
Columns:
id*
INTEGER
→ dbnode_id
INTEGER
→ dbgroup_id
INTEGER
Constraints:
FOREIGN KEY (db_dbgroup.id → dbgroup_id)
FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (dbgroup_id, dbnode_id)
Indexes:
ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbgroup_id (dbgroup_id)
ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbnode_id (dbnode_id)
- DbComputer (db_dbcomputer)
Database model to store data for
aiida.orm.Computer
.Computers represent (and contain the information of) the physical hardware resources available. Nodes can be associated with computers if they are remote codes, remote folders, or processes that had run remotely.
Computers are identified within AiiDA by their
label
(and thus it must be unique for each one in the database), whereas thehostname
is the label that identifies the computer within the network from which one can access it.The
scheduler_type
column contains the information of the scheduler (and plugin) that the computer uses to manage jobs, whereas thetransport_type
the information of the transport (and plugin) required to copy files and communicate to and from the computer. Themetadata
contains some general settings for these communication and management protocols.Columns:
metadata
JSON
id*
INTEGER
uuid*
VARCHAR(32)
label*
VARCHAR(255)
hostname
VARCHAR(255)
description
TEXT
scheduler_type
VARCHAR(255)
transport_type
VARCHAR(255)
Constraints:
PRIMARY KEY (id)
UNIQUE (label)
UNIQUE (uuid)
- DbAuthInfo (db_dbauthinfo)
Database model to store data for
aiida.orm.AuthInfo
, and keep computer authentication data, per user.Specifications are user-specific of how to submit jobs in the computer. The model also has an
enabled
logical switch that indicates whether the device is available for use or not. This last one can be set and unset by the user.Columns:
metadata
JSON
id*
INTEGER
→ aiidauser_id
INTEGER
→ dbcomputer_id
INTEGER
auth_params
JSON
enabled
BOOLEAN
Constraints:
FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → aiidauser_id)
PRIMARY KEY (id)
UNIQUE (aiidauser_id, dbcomputer_id)
Indexes:
ix_db_dbauthinfo_db_dbauthinfo_aiidauser_id (aiidauser_id)
ix_db_dbauthinfo_db_dbauthinfo_dbcomputer_id (dbcomputer_id)
- DbComment (db_dbcomment)
Database model to store data for
aiida.orm.Comment
.Comments can be attach to the nodes by the users.
Columns:
id*
INTEGER
uuid*
VARCHAR(32)
→ dbnode_id
INTEGER
ctime
DATETIME
mtime
DATETIME
→ user_id
INTEGER
content
TEXT
Constraints:
FOREIGN KEY (db_dbnode.id → dbnode_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dbcomment_db_dbcomment_dbnode_id (dbnode_id)
ix_db_dbcomment_db_dbcomment_user_id (user_id)
- DbLog (db_dblog)
Database model to data for
aiida.orm.Log
, corresponding toaiida.orm.ProcessNode
.Columns:
metadata
JSON
id*
INTEGER
uuid*
VARCHAR(32)
time
DATETIME
loggername
VARCHAR(255)
What process recorded the message
levelname
VARCHAR(50)
How critical the message is
→ dbnode_id
INTEGER
message
TEXT
Constraints:
FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (uuid)
Indexes:
ix_db_dblog_db_dblog_dbnode_id (dbnode_id)
ix_db_dblog_db_dblog_levelname (levelname)
ix_db_dblog_db_dblog_loggername (loggername)