sqlite_zip (archive) format#

The SqliteZipBackend is the storage format used for the AiiDA archive, whose design draws from consideration outlined in AEP 005: Improved export archive format.

An AiiDA archive is a single file format (with canonical extension .aiida), for long term storage of an AiiDA provenance graph. It provides a data storage backend, integrating a database and file repository.

The standard format is a ZIP archive, containing the following files:

  • metadata.json file containing information on the version of the archive.

  • db.sqlite3 file containing the AiiDA database.

  • repo/ directory containing the AiiDA file repository.

../../_images/archive-file-structure.svg

Fig. 31 sqlite_zip zip file format.#

The central directory is written with the metadata and database records at the top of the file. Zip files are read first from the bottom, which contains the byte position of the start of the central directory, then scanning down the central directory to extract records for each file. When extracting the metadata/database only, one can simply scan for that record, then break and directly decompress the byte array for that file. In this way, we do not have to scan through all the records of the repository files

As opposed to the psql_dos format, this format is “read-only”, since zip files cannot be modified once created.

metadata schema#

This file contains important information, and it is necessary for the correct interpretation of db.sqlite3`. This is used to avoid any incompatibilities among different versions of AiiDA.

Here is an example metadata.json:

{
  "export_version": "1.0",
  "aiida_version": "2.0.0",
  "key_format": "sha256",
  "compression": 6,
  "ctime": "2021-12-01T03:43:11.909061",
  "creation_parameters": {
    "entities_starting_set": {},
    "include_authinfos": false,
    "include_comments": true,
    "include_logs": true,
    "graph_traversal_rules": {
      "input_calc_forward": false,
      "input_calc_backward": true,
      "create_forward": true,
      "create_backward": true,
      "return_forward": true,
      "return_backward": false,
      "input_work_forward": false,
      "input_work_backward": true,
      "call_calc_forward": true,
      "call_calc_backward": true,
      "call_work_forward": true,
      "call_work_backward": true
    },
    "entity_counts": {
      "users": 8,
      "computers": 14,
      "groups": 2,
      "nodes": 109547,
      "links": 159905,
      "group_nodes": 219094
    }
  }
}

At the beginning of the file, we see the version of the archive file (under export_version) and the version of the AiiDA code. New archive versions are introduced for several different reasons; this may generally be when:

  • a change occurs in what can or cannot be exported for each entity,

  • the database and/or archive schemes are updated or changed,

  • or standardized exported property values are updated in AiiDA.

Important

For archives of version 0.3 and older it is advisable that you manually try to convince yourself that the migration was completely successful. While all migrations are tested, trying to include reasonable edge-cases, the migrations involved from version 0.3 to 0.4 are intricate and the possibility of a missing edge-case test is quite real. It is worth noting that if you ever have an issue, please report it on GitHub, join the AiiDA mailing list, or use the contact form.

Note

If you have migrated an archive file to the newest version, there may be an extra entry in metadata.json. This simply states from which archive version the file was migrated.

Note

If you supply an old archive file that the current AiiDA code does not support, verdi archive import will automatically try to migrate the archive by calling verdi archive migrate.

repository format#

The repository is read by the ZipfileBackendRepository.

The zip file should contain repository files with the key format: repo/<sha256 hash>, i.e. files named by the sha256 hash of the file contents, inside a repo directory.

database schema#

The database schema is intended to directly mirror that of the psql_dos format. The only differences are in the handling of certain data types by SQLite versus PostgreSQL:

  • UUID -> CHAR(32)

  • DateTime -> TZDateTime

  • JSONB -> JSON

Also, varchar_pattern_ops indexes are not possible in SQLite.

Tables#

DbUser (db_dbuser)

Database model to store data for aiida.orm.User.

Every node that is created has a single user as its author.

The user information consists of the most basic personal contact details.

Columns:

id*

INTEGER

email*

VARCHAR(254)

first_name

VARCHAR(254)

last_name

VARCHAR(254)

institution

VARCHAR(254)

Constraints:

  • PRIMARY KEY (id)

  • UNIQUE (email)

DbNode (db_dbnode)

Database model to store data for aiida.orm.Node.

Each node can be categorized according to its node_type, which indicates what kind of data or process node it is. Additionally, process nodes also have a process_type that further indicates what is the specific plugin it uses.

Nodes can also store two kind of properties:

  • attributes are determined by the node_type, and are set before storing the node and can’t be modified afterwards.

  • extras, on the other hand, can be added and removed after the node has been stored and are usually set by the user.

Columns:

id*

INTEGER

uuid*

VARCHAR(32)

node_type

VARCHAR(255)

process_type

VARCHAR(255)?

label

VARCHAR(255)

description

TEXT

ctime

DATETIME

mtime

DATETIME

attributes

JSON?

extras

JSON?

repository_metadata

JSON

→ dbcomputer_id

INTEGER?

→ user_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbnode_db_dbnode_ctime (ctime)

  • ix_db_dbnode_db_dbnode_dbcomputer_id (dbcomputer_id)

  • ix_db_dbnode_db_dbnode_label (label)

  • ix_db_dbnode_db_dbnode_mtime (mtime)

  • ix_db_dbnode_db_dbnode_node_type (node_type)

  • ix_db_dbnode_db_dbnode_process_type (process_type)

  • ix_db_dbnode_db_dbnode_user_id (user_id)

DbLink (db_dblink)

Database model to store links between aiida.orm.Node.

Each entry in this table contains not only the id information of the two nodes that are linked, but also some extra properties of the link themselves. This includes the type of the link (see the Concepts section for all possible types) as well as a label which is more specific and typically determined by the procedure generating the process node that links the data nodes.

Columns:

id*

INTEGER

→ input_id

INTEGER

→ output_id

INTEGER

label

VARCHAR(255)

type

VARCHAR(255)

Constraints:

  • FOREIGN KEY (db_dbnode.id → input_id)

  • FOREIGN KEY (db_dbnode.id → output_id)

  • PRIMARY KEY (id)

Indexes:

  • ix_db_dblink_db_dblink_input_id (input_id)

  • ix_db_dblink_db_dblink_label (label)

  • ix_db_dblink_db_dblink_output_id (output_id)

  • ix_db_dblink_db_dblink_type (type)

DbGroup (db_dbgroup)

Database model to store aiida.orm.Group data.

A group may contain many different nodes, but also each node can be included in different groups.

Users will typically identify and handle groups by using their label (which, unlike the labels in other models, must be unique). Groups also have a type, which serves to identify what plugin is being instanced, and the extras property for users to set any relevant information.

Columns:

id*

INTEGER

uuid*

VARCHAR(32)

label

VARCHAR(255)

type_string

VARCHAR(255)

time

DATETIME

description

TEXT

extras

JSON

→ user_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (label, type_string)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbgroup_db_dbgroup_label (label)

  • ix_db_dbgroup_db_dbgroup_type_string (type_string)

  • ix_db_dbgroup_db_dbgroup_user_id (user_id)

DbGroupNodes (db_dbgroup_dbnodes)

Database model to store group-to-nodes relations.

Columns:

id*

INTEGER

→ dbnode_id

INTEGER

→ dbgroup_id

INTEGER

Constraints:

  • FOREIGN KEY (db_dbgroup.id → dbgroup_id)

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • PRIMARY KEY (id)

  • UNIQUE (dbgroup_id, dbnode_id)

Indexes:

  • ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbgroup_id (dbgroup_id)

  • ix_db_dbgroup_dbnodes_db_dbgroup_dbnodes_dbnode_id (dbnode_id)

DbComputer (db_dbcomputer)

Database model to store data for aiida.orm.Computer.

Computers represent (and contain the information of) the physical hardware resources available. Nodes can be associated with computers if they are remote codes, remote folders, or processes that had run remotely.

Computers are identified within AiiDA by their label (and thus it must be unique for each one in the database), whereas the hostname is the label that identifies the computer within the network from which one can access it.

The scheduler_type column contains the information of the scheduler (and plugin) that the computer uses to manage jobs, whereas the transport_type the information of the transport (and plugin) required to copy files and communicate to and from the computer. The metadata contains some general settings for these communication and management protocols.

Columns:

metadata

JSON

id*

INTEGER

uuid*

VARCHAR(32)

label*

VARCHAR(255)

hostname

VARCHAR(255)

description

TEXT

scheduler_type

VARCHAR(255)

transport_type

VARCHAR(255)

Constraints:

  • PRIMARY KEY (id)

  • UNIQUE (label)

  • UNIQUE (uuid)

DbAuthInfo (db_dbauthinfo)

Database model to store data for aiida.orm.AuthInfo, and keep computer authentication data, per user.

Specifications are user-specific of how to submit jobs in the computer. The model also has an enabled logical switch that indicates whether the device is available for use or not. This last one can be set and unset by the user.

Columns:

metadata

JSON

id*

INTEGER

→ aiidauser_id

INTEGER

→ dbcomputer_id

INTEGER

auth_params

JSON

enabled

BOOLEAN

Constraints:

  • FOREIGN KEY (db_dbcomputer.id → dbcomputer_id)

  • FOREIGN KEY (db_dbuser.id → aiidauser_id)

  • PRIMARY KEY (id)

  • UNIQUE (aiidauser_id, dbcomputer_id)

Indexes:

  • ix_db_dbauthinfo_db_dbauthinfo_aiidauser_id (aiidauser_id)

  • ix_db_dbauthinfo_db_dbauthinfo_dbcomputer_id (dbcomputer_id)

DbComment (db_dbcomment)

Database model to store data for aiida.orm.Comment.

Comments can be attach to the nodes by the users.

Columns:

id*

INTEGER

uuid*

VARCHAR(32)

→ dbnode_id

INTEGER

ctime

DATETIME

mtime

DATETIME

→ user_id

INTEGER

content

TEXT

Constraints:

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • FOREIGN KEY (db_dbuser.id → user_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dbcomment_db_dbcomment_dbnode_id (dbnode_id)

  • ix_db_dbcomment_db_dbcomment_user_id (user_id)

DbLog (db_dblog)

Database model to data for aiida.orm.Log, corresponding to aiida.orm.ProcessNode.

Columns:

metadata

JSON

id*

INTEGER

uuid*

VARCHAR(32)

time

DATETIME

loggername

VARCHAR(255)

What process recorded the message

levelname

VARCHAR(50)

How critical the message is

→ dbnode_id

INTEGER

message

TEXT

Constraints:

  • FOREIGN KEY (db_dbnode.id → dbnode_id)

  • PRIMARY KEY (id)

  • UNIQUE (uuid)

Indexes:

  • ix_db_dblog_db_dblog_dbnode_id (dbnode_id)

  • ix_db_dblog_db_dblog_levelname (levelname)

  • ix_db_dblog_db_dblog_loggername (loggername)