
Utilities to perform the migrations.


Data structures for mapping legacy JobCalculation data to new process attributes.

class StateMapping(state, process_state, exit_status, process_status)



__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

__match_args__ = ('state', 'process_state', 'exit_status', 'process_status')
__module__ = ''
static __new__(_cls, state, process_state, exit_status, process_status)

Create new instance of StateMapping(state, process_state, exit_status, process_status)


__repr__()

Return a nicely formatted representation string

__slots__ = ()

_asdict()

Return a new dict which maps field names to their values.

_field_defaults = {}
_fields = ('state', 'process_state', 'exit_status', 'process_status')
classmethod _make(iterable)

Make a new StateMapping object from a sequence or iterable


_replace(**kwds)

Return a new StateMapping object replacing specified fields with new values


exit_status

Alias for field number 2


process_state

Alias for field number 1


process_status

Alias for field number 3


state

Alias for field number 0
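The named-tuple behaviour listed above can be tried out with a stand-in built from the same field names. This is a minimal sketch: the field order follows the documented `_fields`, but the example values are made up for illustration and are not taken from the real mapping table.

```python
from collections import namedtuple

# Stand-in with the documented field order; the values used below are hypothetical.
StateMapping = namedtuple(
    'StateMapping', ['state', 'process_state', 'exit_status', 'process_status']
)

mapping = StateMapping('FINISHED', 'finished', 0, None)

# Fields are reachable by name or by position (state is field number 0, and so on).
as_dict = mapping._asdict()

# _replace returns a new tuple and leaves the original untouched.
updated = mapping._replace(exit_status=100)

# _make builds an instance from any iterable of the right length.
rebuilt = StateMapping._make(['FAILED', 'excepted', None, 'hypothetical status message'])
```

Because the tuple is immutable, `_replace` is the only way to derive a modified mapping.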

Create an old style node attribute/extra, via the db_dbattribute/db_dbextra tables.

Adapted from: aiida/backends/djsite/db/migrations/

…str, value, node_id: int) → list[dict]

Create an old style node attribute/extra, via the db_dbattribute/db_dbextra tables.


Note: no queries are made to the database; in particular, the existence of the given nodes is not checked.

  • key – a string with the key to create (can contain the separator self._sep if this is a sub-attribute: indeed, this function calls itself recursively)

  • value – the value to store (a basic data type or a list or a dict)

  • node_id – the node id to store the attribute/extra


A list of column name -> value dictionaries, with which to instantiate database rows
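The recursive flattening described above can be sketched as follows. This is only an illustration of the idea: the function name `flatten_to_rows`, the column names in the row dictionaries, and the `.` separator are assumptions, not the actual db_dbattribute/db_dbextra schema.

```python
def flatten_to_rows(key, value, node_id, sep='.'):
    """Flatten a possibly nested value into one row dictionary per (sub-)attribute.

    Hypothetical sketch: the column names used here are illustrative only.
    """
    if isinstance(value, dict):
        # The container itself gets a row, then each item is handled recursively.
        rows = [{'key': key, 'datatype': 'dict', 'value': len(value), 'dbnode_id': node_id}]
        for subkey, subvalue in value.items():
            rows.extend(flatten_to_rows(f'{key}{sep}{subkey}', subvalue, node_id, sep))
        return rows
    if isinstance(value, (list, tuple)):
        rows = [{'key': key, 'datatype': 'list', 'value': len(value), 'dbnode_id': node_id}]
        for index, subvalue in enumerate(value):
            rows.extend(flatten_to_rows(f'{key}{sep}{index}', subvalue, node_id, sep))
        return rows
    # Basic data type: a single row.
    return [{'key': key, 'datatype': type(value).__name__, 'value': value, 'dbnode_id': node_id}]


rows = flatten_to_rows('cell', {'a': [1.0, 2.0], 'label': 'x'}, node_id=7)
keys = [row['key'] for row in rows]
```

No database access happens here, mirroring the note above: the rows are plain dictionaries that a caller could later insert.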

Shared function for django_0024 and sqlalchemy ea2f50e7f615.

…, profile)

Export the log records that correspond to legacy workflows and to unknown entities (write them to files and remove them from the DbLog table).

Get the number of the log records that correspond to legacy workflows

Get the number of the log records that correspond to nodes that were deleted

Get the serialized log records that correspond to legacy workflows

Get the serialized log records that correspond to nodes that were deleted

Get the serialized log records that correspond to unknown entities

Get the number of the log records that correspond to unknown entities

Set new and distinct UUIDs to all the logs

Generic functions to verify the integrity of the database and optionally apply patches to fix problems.

…str, connection)

Check whether database table contains rows with duplicate UUIDs.

…str, connection)

Check whether database table contains rows with duplicate UUIDs.
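A duplicate-UUID check of this kind usually boils down to a `GROUP BY ... HAVING COUNT(*) > 1` query. The sketch below shows the idea on an in-memory SQLite database; the helper name `find_duplicate_uuids` and the table contents are assumptions for illustration, and the real functions run against the profile's own database connection.

```python
import sqlite3


def find_duplicate_uuids(connection, table):
    """Return (uuid, count) pairs for every UUID that occurs more than once.

    The table name is interpolated into the SQL, so only pass trusted identifiers.
    """
    query = (
        f'SELECT uuid, COUNT(*) FROM {table} '
        'GROUP BY uuid HAVING COUNT(*) > 1 ORDER BY uuid'
    )
    return connection.execute(query).fetchall()


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE db_dbnode (id INTEGER PRIMARY KEY, uuid TEXT)')
connection.executemany(
    'INSERT INTO db_dbnode (uuid) VALUES (?)', [('aaa',), ('bbb',), ('aaa',)]
)

duplicates = find_duplicate_uuids(connection, 'db_dbnode')
```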

Methods to validate the database integrity and fix violations.

…, hash_extra_key: str, entry_point_string: str | None = None) → None

Drop hashes of nodes.

Print warning only if the DB actually contains nodes.

  • hash_extra_key – The key in the extras used to store the hash at the time of this migration.

  • entry_point_string – Optional entry point string of a node type to narrow the subset of nodes to reset. The value should be a complete entry point string, e.g., aiida.node:process.calculation.calcjob to drop the hash of all CalcJobNode rows.

Try to infer a calculation entry point name for all the calculation type strings that are found in the database.

Before the plugin system was introduced, the type column of the node table was a string based on the base node type with the module path and class name appended. For example, the PwCalculation class, which was a subclass of JobCalculation, would get such a string as its type. At that point, JobCalculation also still fulfilled the role of both the Process class and the Node class. In the migration for v1.0.0, this had to be changed: the type became that of the actual node, i.e. node.process.calculation.calcjob.CalcJobNode., which would lose the information about which actual subclass it represented. This information should be stored in the process_type column, where the value is the name of the entry point of that calculation class.

This function will, for a given set of calculation type strings of pre v1.0.0, try to map them on the known entry points for the calculation category. This is the union of those entry points registered at the AiiDA registry (see the mapping above) and those available in the environment in which this function is run.

If a type string cannot be mapped onto an entry point name, a fallback process_type string will be generated which is based on part of the old type string. For example, calculation.job.unknown.UnknownCalculation. would get the process type string ~unknown.UnknownCalculation.

The function will return a mapping of type strings onto their inferred process type strings.
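The lookup-with-fallback logic described above can be sketched as follows. The function names and the `known_entry_points` table are hypothetical (including the `myplugin` entry); only the fallback example, `calculation.job.unknown.UnknownCalculation.` becoming `~unknown.UnknownCalculation`, comes from the text.

```python
def fallback_process_type(type_string):
    """Build the fallback process type from part of the old type string."""
    prefix = 'calculation.job.'
    trimmed = type_string.rstrip('.')  # old type strings end with a trailing dot
    if trimmed.startswith(prefix):
        trimmed = trimmed[len(prefix):]
    return f'~{trimmed}'


def infer_process_types(type_strings, known_entry_points):
    """Map each old type string onto an entry point name, or onto the fallback."""
    return {
        type_string: known_entry_points.get(type_string, fallback_process_type(type_string))
        for type_string in type_strings
    }


# Hypothetical lookup table standing in for the registry plus the local environment.
known_entry_points = {'calculation.job.myplugin.MyCalculation.': 'myplugin.my'}

inferred = infer_process_types(
    {'calculation.job.myplugin.MyCalculation.', 'calculation.job.unknown.UnknownCalculation.'},
    known_entry_points,
)
```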


type_strings – a set of type strings whose entry point is to be inferred


a mapping of current node type string to the inferred entry point name

…, headers, reason_message, action_message=None)

Emit an integrity violation warning and write the violating records to a log file in the current directory

  • results – a list of tuples representing the violating records

  • headers – a tuple of strings that will be used as a header for the log file. Should have the same length as each tuple in the results list.

  • reason_message – a human readable message detailing the reason of the integrity violation

  • action_message – an optional human readable message detailing a performed action, if any
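The log-writing half of this function can be sketched as below. This is an assumption-laden illustration: the helper name `write_violation_log`, the file naming, the tab-separated layout, and the extra `directory` argument are all hypothetical, and the warning emission is left out.

```python
import tempfile


def write_violation_log(results, headers, reason_message, action_message=None, directory='.'):
    """Write violating records to a uniquely named log file and return its path."""
    handle = tempfile.NamedTemporaryFile(
        mode='w', prefix='migration-', suffix='.log', dir=directory, delete=False
    )
    with handle:
        handle.write(f'# Reason: {reason_message}\n')
        if action_message is not None:
            handle.write(f'# Action: {action_message}\n')
        # The header tuple is expected to have the same length as each result tuple.
        handle.write('\t'.join(headers) + '\n')
        for row in results:
            handle.write('\t'.join(str(value) for value in row) + '\n')
    return handle.name


with tempfile.TemporaryDirectory() as directory:
    path = write_violation_log(
        [(1, 'abc')], ('id', 'uuid'), 'duplicate UUIDs', directory=directory
    )
    with open(path) as stream:
        lines = stream.read().splitlines()
```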

Utilities for removing legacy workflows.

…, profile)

Export existing legacy workflow data to a JSON file.

JSON serializer for objects not serializable by default json code
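Such a serializer is typically passed to `json.dumps` through its `default` hook, which is called only for objects the encoder cannot handle itself. A minimal sketch, assuming the objects of interest are UUIDs and datetimes (the actual set of supported types is not stated here):

```python
import datetime
import json
import uuid


def json_serializer(obj):
    """Fallback for json.dumps: convert UUIDs and datetimes to strings."""
    if isinstance(obj, uuid.UUID):
        return str(obj)
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    # Signal to the encoder that this type is still unsupported.
    raise TypeError(f'{obj!r} is not JSON serializable')


payload = {'uuid': uuid.UUID(int=1), 'ctime': datetime.datetime(2020, 1, 2, 3, 4, 5)}
encoded = json.dumps(payload, default=json_serializer)
decoded = json.loads(encoded)
```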

Migrate the file repository to the new disk object store based implementation.

…, profile)

Migrations for the upgrade.

Utilities for synchronizing the django and sqlalchemy schema.

…<module 'alembic.op' from '/home/docs/checkouts/'>) → None

This function is used by the final migration step, of django/sqlalchemy branches, to synchronize their schemas.

  1. Remove and recreate all (non-unique) indexes, with standard names and postgresql ops.

  2. Remove and recreate all unique constraints, with standard names.

  3. Remove and recreate all foreign key constraints, with standard names and other rules.

Schema naming conventions are defined in aiida/storage/sqlalchemy/models/

Note we assume here that (a) all primary keys are already correct, and (b) there are no check constraints.

SQL statements to detect invalid/not understood links for the provenance redesign migration.

Scan the database for any links that are unexpected.

The checks will verify that there are no outgoing call or return links from calculation nodes and that if a workflow node has a create link, it has at least an accompanying return link to the same data node, or it has a call link to a calculation node that takes the created data node as input.

Set the process type for calculation nodes by inferring it from their type string.

Utility for performing schema migrations, via reflection of the current database.

class ReflectMigrations(op: <module 'alembic.op' from '/home/docs/checkouts/'>)


Perform schema migrations, via reflection of the current database.

In django, it is not possible to explicitly specify constraints/indexes and their names; instead, they are implicitly created by internal “auto-generation” code (as opposed to sqlalchemy, where one can explicitly specify the names). For a specific django version, this auto-generation code is deterministic; however, it has changed over time. So it is not possible to know declaratively exactly which constraints/indexes are present on a user’s database without knowing the exact django version that created it (and ran the migrations). Therefore, we need to reflect the database’s schema to determine what is present on the database, and hence what to drop.
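The reflect-then-drop idea can be illustrated without alembic. The sketch below uses SQLite's `PRAGMA index_list` in place of PostgreSQL reflection, so it is an analogy rather than the class's implementation; the helper `drop_all_indexes` and the auto-generated-looking index names are made up.

```python
import sqlite3


def drop_all_indexes(connection, table_name, unique=False):
    """Discover the indexes that exist on a table and drop those matching `unique`."""
    rows = connection.execute(f'PRAGMA index_list({table_name})').fetchall()
    dropped = []
    for _seq, name, is_unique, origin, _partial in rows:
        if origin != 'c':
            # Skip indexes that back constraints; only CREATE INDEX ones are dropped.
            continue
        if bool(is_unique) != unique:
            continue
        connection.execute(f'DROP INDEX "{name}"')
        dropped.append(name)
    return dropped


connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE db_dbnode (id INTEGER PRIMARY KEY, uuid TEXT, label TEXT)')
# Names that the migration code could not have predicted in advance.
connection.execute('CREATE INDEX db_dbnode_label_7f3a9c_like ON db_dbnode (label)')
connection.execute('CREATE UNIQUE INDEX db_dbnode_uuid_1b2c3d_uniq ON db_dbnode (uuid)')

dropped = drop_all_indexes(connection, 'db_dbnode')
remaining = [row[1] for row in connection.execute('PRAGMA index_list(db_dbnode)')]
```

The caller never needs to know the index names up front, which is the point of reflecting the schema.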

__init__(op: <module 'alembic.op' from '/home/docs/checkouts/'>) → None
__module__ = ''

list of weak references to the object (if defined)

drop_all_foreign_keys(table_name: str) → None

Drop all foreign keys set for this table.

drop_all_indexes(table_name: str, unique: bool = False) → None

Drop all non-unique indexes set for this table.

drop_all_unique_constraints(table_name: str) → None

Drop all unique constraints set for this table.

drop_foreign_keys(table_name: str, columns: list[str], ref_tbl: str, ref_columns: list[str]) → None

Drop all foreign keys set for this column name group and referring column set.

drop_indexes(table_name: str, column: str | list[str], unique: bool = False) → None

Drop all indexes set for this column name group.

drop_unique_constraints(table_name: str, column_names: list[str]) → None

Drop all unique constraints set for this column name group.

replace_foreign_key(label: str, table_name: str, columns: list[str], ref_tbl: str, ref_columns: list[str], **kwargs) → None

Create foreign key, dropping any existing foreign key with the same constraints.

replace_index(label: str, table_name: str, column: str, unique: bool = False) → None

Create index, dropping any existing index with the same table+columns.

replace_unique_constraint(label: str, table_name: str, columns: list[str]) → None

Create unique constraint, dropping any existing unique constraint with the same table+columns.

reset_cache() → None

Reset the inspector cache.

Various utils that should be used during migrations and migrations tests because the AiiDA ORM cannot be used.

class LazyFile(name: str = '', file_type: FileType = FileType.DIRECTORY, key: str | None | LazyOpener = None, objects: Dict[str, File] | None = None)


Subclass of File where key also allows LazyOpener in addition to a string.

This subclass is necessary because the migration will be storing instances of LazyOpener as the key which should normally only be a string. This subclass updates the key type check to allow this.

__annotations__ = {}
__init__(name: str = '', file_type: FileType = FileType.DIRECTORY, key: str | None | LazyOpener = None, objects: Dict[str, File] | None = None)

Construct a new instance.

  • name – The final element of the file path

  • file_type – Identifies whether the File is a file or a directory

  • key – A key to map the file to its contents in the backend repository (file only)

  • objects – Mapping of child names to child Files (directory only)


ValueError – If a key is defined for a directory, or objects are defined for a file

__module__ = ''
class …AbstractRepositoryBackend | None = None)


Subclass of Repository that uses LazyFile instead of File as its file class.

__annotations__ = {}
__module__ = ''

alias of LazyFile



Implementation of the AbstractRepositoryBackend where all write operations are no-ops.

This repository backend is used to use the Repository interface to build repository metadata but instead of actually writing the content of the current repository to disk elsewhere, it will simply open a lazy file opener. In a subsequent step, all these streams are passed to the new Disk Object Store that will write their content directly to pack files for optimal efficiency.
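The deferred-write idea can be sketched as follows. Both classes are hypothetical stand-ins written for this example: the "put" operation stores nothing and simply returns an object that remembers where the content lives, so that a later step can open and stream it.

```python
import os
import tempfile


class LazyOpener:
    """Hypothetical stand-in: remembers a path and opens it only on demand."""

    def __init__(self, path):
        self.path = path

    def open(self):
        return open(self.path, 'rb')


class NoopBackend:
    """Hypothetical stand-in for a backend whose write operations are no-ops."""

    def put_object_from_filelike(self, handle):
        # Nothing is copied: the returned "key" is a lazy opener for the source file.
        return LazyOpener(handle.name)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')
    with open(path, 'wb') as handle:
        handle.write(b'content')

    backend = NoopBackend()
    with open(path, 'rb') as handle:
        key = backend.put_object_from_filelike(handle)

    # Later, the consumer of the metadata opens the stream and reads the bytes.
    with key.open() as stream:
        recovered = stream.read()
```

Returning an opener instead of a string key is what makes a file class that accepts such keys necessary.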

__abstractmethods__ = frozenset({})
__module__ = ''
_abc_impl = <_abc._abc_data object>
_put_object_from_filelike(handle: BufferedIOBase) → str
delete_objects(keys: List[str]) → None

Delete the objects from the repository.


keys – list of fully qualified identifiers for the objects within the repository.


Delete the repository itself and all its contents.


This should not merely delete the contents of the repository but any resources it created. For example, if the repository is essentially a folder on disk, the folder itself should also be deleted, not just its contents.

get_info(detailed: bool = False, **kwargs) → dict

Returns relevant information about the content of the repository.


detailed – flag to enable extra information (detailed=False by default, only returns basic information).


a dictionary with the information.

has_objects(keys: List[str]) → List[bool]

Return whether the repository has an object with the given key.


keys – list of fully qualified identifiers for objects within the repository.


list of logicals, in the same order as the keys provided, with value True if the respective object exists and False otherwise.

initialise(**kwargs) → None

Initialise the repository if it hasn’t already been initialised.


kwargs – parameters for the initialisation.

property is_initialised: bool

Return whether the repository has been initialised.

iter_object_streams(keys: List[str])

Return an iterator over the (read-only) byte streams of objects identified by key.


handles should only be read within the context of this iterator.


keys – fully qualified identifiers for the objects within the repository.


an iterator over the object byte streams.

property key_format: str | None

Return the format for the keys of the repository.

Important for when migrating between backends (e.g. archive -> main), as if they are not equal then it is necessary to re-compute all the Node.base.repository.metadata before importing (otherwise they will not match with the repository).

list_objects() → Iterable[str]

Return iterable that yields all available objects by key.


An iterable for all the available object keys.

maintain(dry_run: bool = False, live: bool = True, **kwargs) → None

Performs maintenance operations.

  • dry_run – flag to only print the actions that would be taken without actually executing them.

  • live – flag to indicate to the backend whether AiiDA is live or not (i.e. if the profile of the backend is currently being used/accessed). The backend is expected then to only allow (and thus set by default) the operations that are safe to perform in this state.

property uuid: str | None

Return the unique identifier of the repository.


A sandbox folder does not have the concept of a unique identifier and so always returns None.

…, uuid, name)

Delete the numpy array with a given name from the repository corresponding to a node with a given uuid.

  • uuid – the UUID of the node

  • name – the name of the numpy array

Transforms all datetime objects into ISO format and then returns the JSON.

…, uuid)

Make sure that the repository sub folder for the node with the given UUID exists or create it.


uuid – UUID of the node

…, basepath, shard=None)

Return a mapping of node UUIDs onto the path to their current repository folder in the old repository.

  • basepath – the absolute path of the base folder of the old file repository.

  • shard – optional shard to define which first shard level to check. If None, all shard levels are checked.


a dictionary mapping node UUIDs onto absolute filepaths, and a list of node repositories that are missing one of the two known sub folders (path or raw_input), which is unexpected.


StorageMigrationError – if the repository contains node folders that contain both the path and raw_input subdirectories, which should never happen.

…, uuid, subfolder='path')

Return the absolute path to the sub folder path within the repository of the node with the given UUID.
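The sharded layout can be sketched from the example path documented for this function, `/some/path/repository/node/12/ab/c123134-a123/path`: the first two characters of the UUID form the first shard level, the next two the second, and the remainder names the node folder. The function name and the `node_root` argument below are assumptions made for this illustration.

```python
import os


def node_repository_sub_folder(node_root, uuid, subfolder='path'):
    """Build the sharded folder path for a node UUID (sketch, not the real helper)."""
    return os.path.join(node_root, uuid[:2], uuid[2:4], uuid[4:], subfolder)


folder = node_repository_sub_folder('/some/path/repository/node', '12abc123134-a123')
raw_input_folder = node_repository_sub_folder(
    '/some/path/repository/node', '12abc123134-a123', subfolder='raw_input'
)
```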


uuid – UUID of the node


absolute path to node repository folder, i.e. /some/path/repository/node/12/ab/c123134-a123/path

…, uuid, name)

Return the absolute path of a numpy array with the given name in the repository of the node with the given uuid.

  • uuid – the UUID of the node

  • name – the name of the numpy array


the absolute path of the numpy array file

…, hashkey)

Return the content of an object stored in the disk object store repository for the given hashkey.

…, uuid, name)

Load and return a numpy array from the repository folder of a node.

  • uuid – the node UUID

  • name – the name under which to store the array


the numpy array

…, shard=None)

Migrate the legacy file repository to the new disk object store and return mapping of repository metadata.


this method assumes that the new disk object store container has been initialized.

The format of the return value will be a dictionary where the keys are the UUIDs of the nodes whose repository folder contents have been migrated to the disk object store. The values are the repository metadata that contain the keys for the generated files with which the files in the disk object store can be retrieved. The format of the repository metadata follows exactly that of what is generated normally by the ORM.

This implementation consciously uses the Repository interface in order to not have to rewrite the logic that builds the nested repository metadata based on the contents of a folder on disk. The advantage is that in this way it is guaranteed that the exact same repository metadata is generated as it would have been during normal operation. However, if the Repository interface or its implementation ever changes, it is possible that this solution will have to be adapted and the significant parts of the implementation will have to be copied here.


mapping of node UUIDs onto the new repository metadata.

…, uuid, name, content)

Write a file with the given content in the repository sub folder of the given node.

  • uuid – UUID of the node

  • name – name to use for the file

  • content – the content to write to the file

Convert all datetime objects in the given value to string representations in ISO format.
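A recursive conversion of this kind can be sketched as follows. The function name `serialize_datetimes` is hypothetical, and the sketch assumes that mappings become dictionaries and non-string sequences become lists.

```python
import datetime
from collections.abc import Mapping, Sequence


def serialize_datetimes(value):
    """Return a copy of value with every datetime replaced by its ISO format string."""
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    if isinstance(value, Mapping):
        return {key: serialize_datetimes(item) for key, item in value.items()}
    # Strings and bytes are sequences too, but must be left untouched.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return [serialize_datetimes(item) for item in value]
    return value


converted = serialize_datetimes({'a': [datetime.datetime(2020, 1, 1), 1], 'b': 'x'})
```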


value – a mapping, sequence or single value optionally containing datetime objects

…Repository) → dict

Serialize the metadata into a JSON-serializable format.


the serialization format is optimized to reduce the size in bytes.


dictionary with the content metadata.

…, uuid, name, array)

Store a numpy array in the repository folder of a node.

  • uuid – the node UUID

  • name – the name under which to store the array

  • array – the numpy array to store