`sqlite_zip` (archive) format#

The SqliteZipBackend is the storage format used for the AiiDA archive, whose design draws from consideration outlined in AEP 005: Improved export archive format.

An AiiDA archive is a single file format (with canonical extension .aiida), for long term storage of an AiiDA provenance graph. It provides a data storage backend, integrating a database and file repository.

The standard format is a ZIP archive, containing the following files:

metadata.json file containing information on the version of the archive.
db.sqlite3 file containing the AiiDA database.
repo/ directory containing the AiiDA file repository.

../../_images/archive-file-structure.svg

Fig. 32 sqlite_zip zip file format.#

The central directory is written with the metadata and database records at the top of the file. Zip files are read first from the bottom, which contains the byte position of the start of the central directory, then scanning down the central directory to extract records for each file. When extracting the metadata/database only, one can simply scan for that record, then break and directly decompress the byte array for that file. In this way, we do not have to scan through all the records of the repository files

As opposed to the psql_dos format, this format is “read-only”, since zip files cannot be modified once created.

metadata schema#

This file contains important information, and it is necessary for the correct interpretation of db.sqlite3`. This is used to avoid any incompatibilities among different versions of AiiDA.

Here is an example metadata.json:

{
  "export_version": "1.0",
  "aiida_version": "2.0.0",
  "key_format": "sha256",
  "compression": 6,
  "ctime": "2021-12-01T03:43:11.909061",
  "creation_parameters": {
    "entities_starting_set": {},
    "include_authinfos": false,
    "include_comments": true,
    "include_logs": true,
    "graph_traversal_rules": {
      "input_calc_forward": false,
      "input_calc_backward": true,
      "create_forward": true,
      "create_backward": true,
      "return_forward": true,
      "return_backward": false,
      "input_work_forward": false,
      "input_work_backward": true,
      "call_calc_forward": true,
      "call_calc_backward": true,
      "call_work_forward": true,
      "call_work_backward": true
    },
    "entity_counts": {
      "users": 8,
      "computers": 14,
      "groups": 2,
      "nodes": 109547,
      "links": 159905,
      "group_nodes": 219094
    }
  }
}

At the beginning of the file, we see the version of the archive file (under export_version) and the version of the AiiDA code. New archive versions are introduced for several different reasons; this may generally be when:

a change occurs in what can or cannot be exported for each entity,
the database and/or archive schemes are updated or changed,
or standardized exported property values are updated in AiiDA.

Important

For archives of version 0.3 and older it is advisable that you manually try to convince yourself that the migration was completely successful. While all migrations are tested, trying to include reasonable edge-cases, the migrations involved from version 0.3 to 0.4 are intricate and the possibility of a missing edge-case test is quite real. It is worth noting that if you ever have an issue, please report it on GitHub, join the AiiDA mailing list, or use the contact form.

Note

If you have migrated an archive file to the newest version, there may be an extra entry in metadata.json. This simply states from which archive version the file was migrated.

Note

If you supply an old archive file that the current AiiDA code does not support, verdi archive import will automatically try to migrate the archive by calling verdi archive migrate.

repository format#

The repository is read by the ZipfileBackendRepository.

The zip file should contain repository files with the key format: repo/<sha256 hash>, i.e. files named by the sha256 hash of the file contents, inside a repo directory.

database schema#

The database schema is intended to directly mirror that of the psql_dos format. The only differences are in the handling of certain data types by SQLite versus PostgreSQL:

UUID -> CHAR(32)
DateTime -> TZDateTime
JSONB -> JSON

Also, varchar_pattern_ops indexes are not possible in SQLite.

Tables#

DbUser (db_dbuser)

Database model to store data for aiida.orm.User.

Every node that is created has a single user as its author.

The user information consists of the most basic personal contact details.

Columns:

id*	INTEGER
email*	VARCHAR(254)
first_name	VARCHAR(254)
last_name	VARCHAR(254)
institution	VARCHAR(254)

Constraints:

PRIMARY KEY (id)
UNIQUE (email)

DbNode (db_dbnode)

Database model to store data for aiida.orm.Node.

Each node can be categorized according to its node_type, which indicates what kind of data or process node it is. Additionally, process nodes also have a process_type that further indicates what is the specific plugin it uses.

Nodes can also store two kind of properties:

attributes are determined by the node_type, and are set before storing the node and can’t be modified afterwards.

extras, on the other hand, can be added and removed after the node has been stored and are usually set by the user.

Columns:

id*	INTEGER
uuid*	VARCHAR(32)
node_type	VARCHAR(255)
process_type	VARCHAR(255)?
label	VARCHAR(255)
description	TEXT
ctime	DATETIME
mtime	DATETIME
attributes	JSON?
extras	JSON?
repository_metadata	JSON
→ dbcomputer_id	INTEGER?
→ user_id	INTEGER

Constraints:

FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)

Indexes:

ix_db_dbnode_db_dbnode_ctime (ctime)
ix_db_dbnode_db_dbnode_dbcomputer_id (dbcomputer_id)
ix_db_dbnode_db_dbnode_label (label)
ix_db_dbnode_db_dbnode_mtime (mtime)
ix_db_dbnode_db_dbnode_node_type (node_type)
ix_db_dbnode_db_dbnode_process_type (process_type)
ix_db_dbnode_db_dbnode_user_id (user_id)

DbLink (db_dblink)

Database model to store links between aiida.orm.Node.

Each entry in this table contains not only the id information of the two nodes that are linked, but also some extra properties of the link themselves. This includes the type of the link (see the Concepts section for all possible types) as well as a label which is more specific and typically determined by the procedure generating the process node that links the data nodes.

Columns:

id*	INTEGER
→ input_id	INTEGER
→ output_id	INTEGER
label	VARCHAR(255)
type	VARCHAR(255)

Constraints:

FOREIGN KEY (db_dbnode.id → input_id)
FOREIGN KEY (db_dbnode.id → output_id)
PRIMARY KEY (id)

Indexes:

ix_db_dblink_db_dblink_input_id (input_id)
ix_db_dblink_db_dblink_label (label)
ix_db_dblink_db_dblink_output_id (output_id)
ix_db_dblink_db_dblink_type (type)

DbGroup (db_dbgroup)

Database model to store aiida.orm.Group data.

A group may contain many different nodes, but also each node can be included in different groups.

Users will typically identify and handle groups by using their label (which, unlike the labels in other models, must be unique). Groups also have a type, which serves to identify what plugin is being instanced, and the extras property for users to set any relevant information.

Columns:

id*	INTEGER
uuid*	VARCHAR(32)
label	VARCHAR(255)
type_string	VARCHAR(255)
time	DATETIME
description	TEXT
extras	JSON
→ user_id	INTEGER

Constraints:

FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (label, type_string)
UNIQUE (uuid)

Indexes:

ix_db_dbgroup_db_dbgroup_label (label)
ix_db_dbgroup_db_dbgroup_type_string (type_string)
ix_db_dbgroup_db_dbgroup_user_id (user_id)

DbGroupNodes (db_dbgroup_dbnodes)

Database model to store group-to-nodes relations.

Columns:

id*	INTEGER
→ dbnode_id	INTEGER
→ dbgroup_id	INTEGER

Constraints:

FOREIGN KEY (db_dbgroup.id → dbgroup_id)
FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (dbgroup_id, dbnode_id)

Indexes:

ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbgroup_id (dbgroup_id)
ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbnode_id (dbnode_id)

DbComputer (db_dbcomputer)

Database model to store data for aiida.orm.Computer.

Computers represent (and contain the information of) the physical hardware resources available. Nodes can be associated with computers if they are remote codes, remote folders, or processes that had run remotely.

Computers are identified within AiiDA by their label (and thus it must be unique for each one in the database), whereas the hostname is the label that identifies the computer within the network from which one can access it.

The scheduler_type column contains the information of the scheduler (and plugin) that the computer uses to manage jobs, whereas the transport_type the information of the transport (and plugin) required to copy files and communicate to and from the computer. The metadata contains some general settings for these communication and management protocols.

Columns:

metadata	JSON
id*	INTEGER
uuid*	VARCHAR(32)
label*	VARCHAR(255)
hostname	VARCHAR(255)
description	TEXT
scheduler_type	VARCHAR(255)
transport_type	VARCHAR(255)

Constraints:

PRIMARY KEY (id)
UNIQUE (label)
UNIQUE (uuid)

DbAuthInfo (db_dbauthinfo)

Database model to store data for aiida.orm.AuthInfo, and keep computer authentication data, per user.

Specifications are user-specific of how to submit jobs in the computer. The model also has an enabled logical switch that indicates whether the device is available for use or not. This last one can be set and unset by the user.

Columns:

metadata	JSON
id*	INTEGER
→ aiidauser_id	INTEGER
→ dbcomputer_id	INTEGER
auth_params	JSON
enabled	BOOLEAN

Constraints:

FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)
FOREIGN KEY (db_dbuser.id → aiidauser_id)
PRIMARY KEY (id)
UNIQUE (aiidauser_id, dbcomputer_id)

Indexes:

ix_db_dbauthinfo_db_dbauthinfo_aiidauser_id (aiidauser_id)
ix_db_dbauthinfo_db_dbauthinfo_dbcomputer_id (dbcomputer_id)

DbComment (db_dbcomment)

Database model to store data for aiida.orm.Comment.

Comments can be attach to the nodes by the users.

Columns:

id*	INTEGER
uuid*	VARCHAR(32)
→ dbnode_id	INTEGER
ctime	DATETIME
mtime	DATETIME
→ user_id	INTEGER
content	TEXT

Constraints:

FOREIGN KEY (db_dbnode.id → dbnode_id)
FOREIGN KEY (db_dbuser.id → user_id)
PRIMARY KEY (id)
UNIQUE (uuid)

Indexes:

ix_db_dbcomment_db_dbcomment_dbnode_id (dbnode_id)
ix_db_dbcomment_db_dbcomment_user_id (user_id)

DbLog (db_dblog)

Database model to data for aiida.orm.Log, corresponding to aiida.orm.ProcessNode.

Columns:

metadata	JSON
id*	INTEGER
uuid*	VARCHAR(32)
time	DATETIME
loggername	VARCHAR(255)	What process recorded the message
levelname	VARCHAR(50)	How critical the message is
→ dbnode_id	INTEGER
message	TEXT

Constraints:

FOREIGN KEY (db_dbnode.id → dbnode_id)
PRIMARY KEY (id)
UNIQUE (uuid)

Indexes:

ix_db_dblog_db_dblog_dbnode_id (dbnode_id)
ix_db_dblog_db_dblog_levelname (levelname)
ix_db_dblog_db_dblog_loggername (loggername)

psql_dos format

Plugin system

sqlite_zip (archive) format#

metadata schema#

repository format#

database schema#

Tables#

`sqlite_zip` (archive) format#