aiida.storage.psql_dos.migrations.utils package#

Utilities to perform the migrations.

Submodules#

Data structures for mapping legacy JobCalculation data to new process attributes.

class aiida.storage.psql_dos.migrations.utils.calc_state.StateMapping(state, process_state, exit_status, process_status)#

Bases: tuple

__getnewargs__()#

Return self as a plain tuple. Used by copy and pickle.

__match_args__ = ('state', 'process_state', 'exit_status', 'process_status')#
__module__ = 'aiida.storage.psql_dos.migrations.utils.calc_state'#
static __new__(_cls, state, process_state, exit_status, process_status)#

Create new instance of StateMapping(state, process_state, exit_status, process_status)

__repr__()#

Return a nicely formatted representation string

__slots__ = ()#
_asdict()#

Return a new dict which maps field names to their values.

_field_defaults = {}#
_fields = ('state', 'process_state', 'exit_status', 'process_status')#
classmethod _make(iterable)#

Make a new StateMapping object from a sequence or iterable

_replace(**kwds)#

Return a new StateMapping object replacing specified fields with new values

exit_status#

Alias for field number 2

process_state#

Alias for field number 1

process_status#

Alias for field number 3

state#

Alias for field number 0
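A minimal sketch of how this namedtuple could be used; the values shown are illustrative assumptions, since the actual state mappings used by the migration are defined in the calc_state module itself:

    from aiida.storage.psql_dos.migrations.utils.calc_state import StateMapping

    # Hypothetical mapping of a legacy calculation state onto the new process attributes.
    mapping = StateMapping(state='FINISHED', process_state='finished', exit_status=0, process_status=None)
    assert mapping.process_state == 'finished'
    assert mapping._asdict()['exit_status'] == 0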

Create an old style node attribute/extra, via the db_dbattribute/db_dbextra tables.

Adapted from: aiida/backends/djsite/db/migrations/__init__.py

aiida.storage.psql_dos.migrations.utils.create_dbattribute.create_rows(key: str, value, node_id: int) list[dict][source]#

Create an old style node attribute/extra, via the db_dbattribute/db_dbextra tables.

Note:

No database queries are performed; in particular, no check is done on the existence of the given nodes.

Parameters:
  • key – a string with the key to create (can contain the separator self._sep if this is a sub-attribute: indeed, this function calls itself recursively)

  • value – the value to store (a basic data type or a list or a dict)

  • node_id – the node id to store the attribute/extra

Returns:

A list of column name -> value dictionaries, with which to instantiate database rows
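A minimal usage sketch; the exact column names of the returned dictionaries depend on the legacy db_dbattribute/db_dbextra schema and are not spelled out here, and the node id is a placeholder:

    from aiida.storage.psql_dos.migrations.utils.create_dbattribute import create_rows

    # Flatten a nested attribute into old-style rows; no database access is performed.
    rows = create_rows('resources', {'num_machines': 1, 'num_mpiprocs_per_machine': 8}, node_id=42)
    # `rows` is a list of column-name -> value dictionaries: one row for the parent key
    # and one per sub-key (e.g. 'resources.num_machines'), ready to be inserted.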

Shared functions for the django_0024 and sqlalchemy ea2f50e7f615 migrations.

aiida.storage.psql_dos.migrations.utils.dblog_update.export_and_clean_workflow_logs(connection, profile)[source]#

Export the log records that correspond to legacy workflows and to unknown entities (write them to files and remove them from the DbLog table).

aiida.storage.psql_dos.migrations.utils.dblog_update.get_legacy_workflow_log_number(connection)[source]#

Get the number of the log records that correspond to legacy workflows

aiida.storage.psql_dos.migrations.utils.dblog_update.get_logs_with_no_nodes_number(connection)[source]#

Get the number of the log records that correspond to nodes that were deleted

aiida.storage.psql_dos.migrations.utils.dblog_update.get_serialized_legacy_workflow_logs(connection)[source]#

Get the serialized log records that correspond to legacy workflows

aiida.storage.psql_dos.migrations.utils.dblog_update.get_serialized_logs_with_no_nodes(connection)[source]#

Get the serialized log records that correspond to nodes that were deleted

aiida.storage.psql_dos.migrations.utils.dblog_update.get_serialized_unknown_entity_logs(connection)[source]#

Get the serialized log records that correspond to unknown entities

aiida.storage.psql_dos.migrations.utils.dblog_update.get_unknown_entity_log_number(connection)[source]#

Get the number of the log records that correspond to unknown entities

aiida.storage.psql_dos.migrations.utils.dblog_update.set_new_uuid(connection)[source]#

Set new and distinct UUIDs to all the logs
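A hypothetical sketch of how these helpers could be combined in a DbLog migration; the helper function name is made up, and `connection` and `profile` would be supplied by the migration framework:

    from aiida.storage.psql_dos.migrations.utils import dblog_update

    def _update_dblog(connection, profile):
        # Export and clean up legacy/unknown log records only if any are present.
        if dblog_update.get_legacy_workflow_log_number(connection) or \
                dblog_update.get_unknown_entity_log_number(connection):
            dblog_update.export_and_clean_workflow_logs(connection, profile)
        # Ensure every remaining log record gets a distinct UUID.
        dblog_update.set_new_uuid(connection)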

Generic functions to verify the integrity of the database and optionally apply patches to fix problems.

aiida.storage.psql_dos.migrations.utils.duplicate_uuids._get_duplicate_uuids(table: str, connection)[source]#

Check whether database table contains rows with duplicate UUIDs.

aiida.storage.psql_dos.migrations.utils.duplicate_uuids.verify_uuid_uniqueness(table: str, connection)[source]#

Check whether database table contains rows with duplicate UUIDs.
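A minimal sketch of a possible call inside an alembic migration; db_dbnode is the standard AiiDA node table name, and the upgrade function shown is illustrative rather than actual migration code:

    from alembic import op

    from aiida.storage.psql_dos.migrations.utils.duplicate_uuids import verify_uuid_uniqueness

    def upgrade():
        # Abort the migration early if the node table contains duplicate UUIDs.
        verify_uuid_uniqueness('db_dbnode', op.get_bind())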

Methods to validate the database integrity and fix violations.

aiida.storage.psql_dos.migrations.utils.integrity.drop_hashes(conn, hash_extra_key: str, entry_point_string: str | None = None) None[source]#

Drop hashes of nodes.

Print warning only if the DB actually contains nodes.

Parameters:
  • hash_extra_key – The key in the extras used to store the hash at the time of this migration.

  • entry_point_string – Optional entry point string of a node type to narrow the subset of nodes to reset. The value should be a complete entry point string, e.g., aiida.node:process.calculation.calcjob to drop the hash of all CalcJobNode rows.
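A hypothetical usage sketch inside an alembic migration; the extras key '_aiida_hash' is an assumption about what the migrations pass for hash_extra_key, while the entry point string is taken from the example above:

    from alembic import op

    from aiida.storage.psql_dos.migrations.utils.integrity import drop_hashes

    def upgrade():
        # Reset the cached hash of all CalcJobNode rows.
        drop_hashes(
            op.get_bind(),
            hash_extra_key='_aiida_hash',
            entry_point_string='aiida.node:process.calculation.calcjob',
        )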

aiida.storage.psql_dos.migrations.utils.integrity.infer_calculation_entry_point(type_strings)[source]#

Try to infer a calculation entry point name for all the calculation type strings that are found in the database.

Before the plugin system was introduced, the type column of the node table was a string based on the base node type with the module path and class name appended. For example, the PwCalculation class, which was a subclass of JobCalculation, would get calculation.job.quantumespresso.pw.PwCalculation. as its type string. At that point, the JobCalculation still fulfilled the role of both the Process class and the Node class. In the migration for v1.0.0, the type had to become that of the actual node, i.e. node.process.calculation.calcjob.CalcJobNode., which loses the information of which actual subclass it represented. That information should instead be stored in the process_type column, whose value is the name of the entry point of the calculation class.

This function will, for a given set of calculation type strings from pre v1.0.0, try to map them onto the known entry points for the calculation category. This is the union of the entry points registered at the AiiDA registry (see the mapping above) and those available in the environment in which this function is run.

If a type string cannot be mapped onto an entry point name, a fallback process_type string will be generated which is based on part of the old type string. For example, calculation.job.unknown.UnknownCalculation. would get the process type string ~unknown.UnknownCalculation.

The function will return a mapping of type strings onto their inferred process type strings.

Parameters:

type_strings – a set of type strings whose entry point is to be inferred

Returns:

a mapping of current node type string to the inferred entry point name
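An illustrative sketch only: the inferred entry points depend on the registry mapping and on the plugins installed in the environment, so the outcome described in the comment is an assumption rather than a guaranteed result:

    from aiida.storage.psql_dos.migrations.utils.integrity import infer_calculation_entry_point

    type_strings = {
        'calculation.job.quantumespresso.pw.PwCalculation.',
        'calculation.job.unknown.UnknownCalculation.',
    }
    mapping = infer_calculation_entry_point(type_strings)
    # Recognized type strings map onto an entry point name; unknown ones fall back to a
    # '~'-prefixed string such as '~unknown.UnknownCalculation'.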

aiida.storage.psql_dos.migrations.utils.integrity.write_database_integrity_violation(results, headers, reason_message, action_message=None)[source]#

Emit an integrity violation warning and write the violating records to a log file in the current directory.

Parameters:
  • results – a list of tuples representing the violating records

  • headers – a tuple of strings that will be used as a header for the log file. Should have the same length as each tuple in the results list.

  • reason_message – a human readable message detailing the reason of the integrity violation

  • action_message – an optional human readable message detailing a performed action, if any
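A minimal sketch with made-up records; the headers tuple must have the same length as each tuple in results, and both the records and the messages shown here are purely illustrative:

    from aiida.storage.psql_dos.migrations.utils.integrity import write_database_integrity_violation

    results = [(101, 'calculation.job.SomeCalculation.'), (102, 'calculation.job.OtherCalculation.')]
    headers = ('id', 'type')
    write_database_integrity_violation(
        results,
        headers,
        reason_message='found calculation nodes whose type string could not be mapped onto an entry point',
        action_message='their process type was set to a fallback value',
    )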

Utilities for removing legacy workflows.

aiida.storage.psql_dos.migrations.utils.legacy_workflows.export_workflow_data(connection, profile)[source]#

Export existing legacy workflow data to a JSON file.

aiida.storage.psql_dos.migrations.utils.legacy_workflows.json_serializer(obj)[source]#

JSON serializer for objects not serializable by default json code

Migrate the file repository to the new disk object store based implementation.

aiida.storage.psql_dos.migrations.utils.migrate_repository.migrate_repository(connection, profile)[source]#

Migrations for the upgrade.

Utilities for synchronizing the django and sqlalchemy schema.

aiida.storage.psql_dos.migrations.utils.parity.synchronize_schemas(alembic_op: alembic.op) None[source]#

This function is used by the final migration step of the django/sqlalchemy branches to synchronize their schemas.

  1. Remove and recreate all (non-unique) indexes, with standard names and postgresql ops.

  2. Remove and recreate all unique constraints, with standard names.

  3. Remove and recreate all foreign key constraints, with standard names and other rules.

Schema naming conventions are defined in aiida/storage/sqlalchemy/models/base.py::naming_convention.

Note we assume here that (a) all primary keys are already correct, and (b) there are no check constraints.
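A minimal sketch of how the parity migration could invoke the synchronization from within an alembic revision; the bare upgrade function is illustrative, not the actual migration module:

    from alembic import op

    from aiida.storage.psql_dos.migrations.utils.parity import synchronize_schemas

    def upgrade():
        # Recreate indexes, unique constraints and foreign keys with standard names.
        synchronize_schemas(op)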

SQL statements to detect invalid or not-understood links for the provenance redesign migration.

Scan the database for any links that are unexpected.

The checks will verify that there are no outgoing call or return links from calculation nodes and that if a workflow node has a create link, it has at least an accompanying return link to the same data node, or it has a call link to a calculation node that takes the created data node as input.
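A minimal sketch (not the migration's actual SQL) of how one such check could be expressed; the table and column names (db_dblink, db_dbnode, input_id, type) and the link type value are assumptions based on the legacy pre-v1.0 schema:

    from sqlalchemy import text

    def count_outgoing_call_links_from_calculations(connection):
        # Count 'call' links whose source is a calculation node, which should not exist.
        query = text(
            """
            SELECT COUNT(*)
            FROM db_dblink AS link
            JOIN db_dbnode AS node ON link.input_id = node.id
            WHERE node.type LIKE 'calculation.%' AND link.type = 'calllink'
            """
        )
        return connection.execute(query).scalar()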

aiida.storage.psql_dos.migrations.utils.provenance_redesign.migrate_infer_calculation_entry_point(alembic_op)[source]#

Set the process type for calculation nodes by inferring it from their type string.

Utility for performing schema migrations, via reflection of the current database.

class aiida.storage.psql_dos.migrations.utils.reflect.ReflectMigrations(op: alembic.op)[source]#

Bases: object

Perform schema migrations, via reflection of the current database.

In django, it is not possible to explicitly specify constraints/indexes and their names; instead they are implicitly created by internal “auto-generation” code (as opposed to sqlalchemy, where one can explicitly specify the names). For a specific django version, this auto-generation code is deterministic; however, over time it has changed. It is therefore not possible to know declaratively exactly which constraints/indexes are present on a user's database, without knowing the exact django version that created it (and ran the migrations). Therefore, we need to reflect the database's schema to determine what is present and hence what to drop.

__init__(op: alembic.op) None[source]#
__module__ = 'aiida.storage.psql_dos.migrations.utils.reflect'#
__weakref__#

list of weak references to the object (if defined)

drop_all_foreign_keys(table_name: str) None[source]#

Drop all foreign keys set for this table.

drop_all_indexes(table_name: str, unique: bool = False) None[source]#

Drop all non-unique indexes set for this table.

drop_all_unique_constraints(table_name: str) None[source]#

Drop all unique constraints set for this table.

drop_foreign_keys(table_name: str, columns: list[str], ref_tbl: str, ref_columns: list[str]) None[source]#

Drop all foreign keys set for this column name group and referring column set.

drop_indexes(table_name: str, column: str | list[str], unique: bool = False) None[source]#

Drop all indexes set for this column name group.

drop_unique_constraints(table_name: str, column_names: list[str]) None[source]#

Drop all unique constraints set for this column name group.

replace_foreign_key(label: str, table_name: str, columns: list[str], ref_tbl: str, ref_columns: list[str], **kwargs) None[source]#

Create foreign key, dropping any existing foreign key with the same constraints.

replace_index(label: str, table_name: str, column: str, unique: bool = False) None[source]#

Create index, dropping any existing index with the same table+columns.

replace_unique_constraint(label: str, table_name: str, columns: list[str]) None[source]#

Create unique constraint, dropping any existing unique constraint with the same table+columns.

reset_cache() None[source]#

Reset the inspector cache.
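A hypothetical sketch using the method signatures documented above; the table name, column names and constraint label are placeholders based on the standard AiiDA schema, not the actual migration code:

    from alembic import op

    from aiida.storage.psql_dos.migrations.utils.reflect import ReflectMigrations

    def upgrade():
        reflect = ReflectMigrations(op)
        # Recreate the uniqueness constraint on the node UUID column under a standard
        # name, dropping whatever auto-generated constraint currently exists.
        reflect.replace_unique_constraint('db_dbnode_uuid_key', 'db_dbnode', ['uuid'])
        # Drop all non-unique indexes of a table before recreating them elsewhere.
        reflect.drop_all_indexes('db_dbgroup')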

Various utils that should be used during migrations and migrations tests because the AiiDA ORM cannot be used.

class aiida.storage.psql_dos.migrations.utils.utils.LazyFile(name: str = '', file_type: FileType = FileType.DIRECTORY, key: str | None | LazyOpener = None, objects: Dict[str, File] | None = None)[source]#

Bases: File

Subclass of File where key also allows LazyOpener in addition to a string.

This subclass is necessary because the migration will be storing instances of LazyOpener as the key which should normally only be a string. This subclass updates the key type check to allow this.

__annotations__ = {}#
__init__(name: str = '', file_type: FileType = FileType.DIRECTORY, key: str | None | LazyOpener = None, objects: Dict[str, File] | None = None)[source]#

Construct a new instance.

Parameters:
  • name – The final element of the file path

  • file_type – Identifies whether the File is a file or a directory

  • key – A key to map the file to its contents in the backend repository (file only)

  • objects – Mapping of child names to child Files (directory only)

Raises:

ValueError – If a key is defined for a directory, or objects are defined for a file

__module__ = 'aiida.storage.psql_dos.migrations.utils.utils'#
class aiida.storage.psql_dos.migrations.utils.utils.MigrationRepository(backend: AbstractRepositoryBackend | None = None)[source]#

Bases: Repository

Subclass of Repository that uses LazyFile instead of File as its file class.

__annotations__ = {}#
__module__ = 'aiida.storage.psql_dos.migrations.utils.utils'#
_file_cls#

alias of LazyFile

class aiida.storage.psql_dos.migrations.utils.utils.NoopRepositoryBackend[source]#

Bases: AbstractRepositoryBackend

Implementation of the AbstractRepositoryBackend where all write operations are no-ops.

This repository backend makes it possible to use the Repository interface to build repository metadata, but instead of actually writing the content of the current repository to disk elsewhere, it simply opens a lazy file opener. In a subsequent step, all these streams are passed to the new Disk Object Store, which writes their content directly to pack files for optimal efficiency.

__abstractmethods__ = frozenset({})#
__module__ = 'aiida.storage.psql_dos.migrations.utils.utils'#
_abc_impl = <_abc._abc_data object>#
_put_object_from_filelike(handle: BufferedIOBase) str[source]#
delete_objects(keys: List[str]) None[source]#

Delete the objects from the repository.

Parameters:

keys – list of fully qualified identifiers for the objects within the repository.

Raises:

FileNotFoundError – if any of the files does not exist.

erase()[source]#

Delete the repository itself and all its contents.

Note

This should not merely delete the contents of the repository but any resources it created. For example, if the repository is essentially a folder on disk, the folder itself should also be deleted, not just its contents.

get_info(detailed: bool = False, **kwargs) dict[source]#

Returns relevant information about the content of the repository.

Parameters:

detailed – flag to enable extra information (detailed=False by default, only returns basic information).

Returns:

a dictionary with the information.

has_objects(keys: List[str]) List[bool][source]#

Return whether the repository has an object with the given key.

Parameters:

keys – list of fully qualified identifiers for objects within the repository.

Returns:

list of booleans, in the same order as the keys provided, with value True if the respective object exists and False otherwise.

initialise(**kwargs) None[source]#

Initialise the repository if it hasn’t already been initialised.

Parameters:

kwargs – parameters for the initialisation.

property is_initialised: bool#

Return whether the repository has been initialised.

iter_object_streams(keys: List[str])[source]#

Return an iterator over the (read-only) byte streams of objects identified by key.

Note

handles should only be read within the context of this iterator.

Parameters:

keys – fully qualified identifiers for the objects within the repository.

Returns:

an iterator over the object byte streams.

Raises:

FileNotFoundError – if the file does not exist.

property key_format: str | None#

Return the format for the keys of the repository.

This is important when migrating between backends (e.g. archive -> main): if the key formats are not equal, it is necessary to re-compute all the Node.base.repository.metadata before importing, otherwise they will not match the repository.

list_objects() Iterable[str][source]#

Return iterable that yields all available objects by key.

Returns:

An iterable for all the available object keys.

maintain(dry_run: bool = False, live: bool = True, **kwargs) None[source]#

Performs maintenance operations.

Parameters:
  • dry_run – flag to only print the actions that would be taken without actually executing them.

  • live – flag to indicate to the backend whether AiiDA is live or not (i.e. if the profile of the backend is currently being used/accessed). The backend is expected then to only allow (and thus set by default) the operations that are safe to perform in this state.

property uuid: str | None#

Return the unique identifier of the repository.

Note

A sandbox folder does not have the concept of a unique identifier and so always returns None.
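A hypothetical sketch of how the no-op backend and MigrationRepository could be combined to build repository metadata without writing any file contents; put_object_from_tree is part of the generic Repository interface, and the folder path is a placeholder:

    from aiida.storage.psql_dos.migrations.utils.utils import (
        MigrationRepository,
        NoopRepositoryBackend,
        serialize_repository,
    )

    repository = MigrationRepository(backend=NoopRepositoryBackend())
    repository.put_object_from_tree('/path/to/legacy/node/folder')  # placeholder path
    metadata = serialize_repository(repository)  # JSON-serializable repository metadata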

aiida.storage.psql_dos.migrations.utils.utils.delete_numpy_array_from_repository(repository_path, uuid, name)[source]#

Delete the numpy array with a given name from the repository corresponding to a node with a given uuid.

Parameters:
  • uuid – the UUID of the node

  • name – the name of the numpy array

aiida.storage.psql_dos.migrations.utils.utils.dumps_json(dictionary)[source]#

Transform all datetime objects into ISO format and then return the JSON.

aiida.storage.psql_dos.migrations.utils.utils.ensure_repository_folder_created(repository_path, uuid)[source]#

Make sure that the repository sub folder for the node with the given UUID exists or create it.

Parameters:

uuid – UUID of the node

aiida.storage.psql_dos.migrations.utils.utils.get_node_repository_dirpaths(profile, basepath, shard=None)[source]#

Return a mapping of node UUIDs onto the path to their current repository folder in the old repository.

Parameters:
  • basepath – the absolute path of the base folder of the old file repository.

  • shard – optional shard to define which first shard level to check. If None, all shard levels are checked.

Returns:

a dictionary mapping node UUIDs onto the absolute filepath of their repository folder, and a list of node repositories missing one of the two known sub folders, path or raw_input, which is unexpected.

Raises:

StorageMigrationError – if the repository contains node folders that contain both the path and raw_input subdirectories, which should never happen.

aiida.storage.psql_dos.migrations.utils.utils.get_node_repository_sub_folder(repository_path, uuid, subfolder='path')[source]#

Return the absolute path to the given sub folder (by default path) within the repository of the node with the given UUID.

Parameters:

uuid – UUID of the node

Returns:

absolute path to the node repository sub folder, e.g. /some/path/repository/node/12/ab/c123134-a123/path

aiida.storage.psql_dos.migrations.utils.utils.get_numpy_array_absolute_path(repository_path, uuid, name)[source]#

Return the absolute path of a numpy array with the given name in the repository of the node with the given uuid.

Parameters:
  • uuid – the UUID of the node

  • name – the name of the numpy array

Returns:

the absolute path of the numpy array file

aiida.storage.psql_dos.migrations.utils.utils.get_repository_object(profile, hashkey)[source]#

Return the content of an object stored in the disk object store repository for the given hashkey.

aiida.storage.psql_dos.migrations.utils.utils.load_numpy_array_from_repository(repository_path, uuid, name)[source]#

Load and return a numpy array from the repository folder of a node.

Parameters:
  • uuid – the node UUID

  • name – the name under which the array is stored

Returns:

the numpy array

aiida.storage.psql_dos.migrations.utils.utils.migrate_legacy_repository(profile, shard=None)[source]#

Migrate the legacy file repository to the new disk object store and return mapping of repository metadata.

Warning

this method assumes that the new disk object store container has been initialized.

The format of the return value will be a dictionary where the keys are the UUIDs of the nodes whose repository folder contents have been migrated to the disk object store. The values are the repository metadata containing the keys of the generated files, with which the files in the disk object store can be retrieved. The format of the repository metadata follows exactly that which is generated normally by the ORM.

This implementation consciously uses the Repository interface in order not to have to rewrite the logic that builds the nested repository metadata based on the contents of a folder on disk. The advantage is that this guarantees that exactly the same repository metadata is generated as would have been during normal operation. However, if the Repository interface or its implementation ever changes, this solution may have to be adapted and significant parts of the implementation copied here.

Returns:

mapping of node UUIDs onto the new repository metadata.
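A hypothetical usage sketch; the wrapper function is made up, and it assumes the disk object store container of the given profile has already been initialised, as the warning above states:

    from aiida.storage.psql_dos.migrations.utils.utils import migrate_legacy_repository

    def _migrate(profile):
        # `profile` would be provided by the migration context.
        mapping = migrate_legacy_repository(profile)
        for node_uuid, repository_metadata in (mapping or {}).items():
            # Each value follows the ORM's repository metadata format and would be
            # written to the corresponding node row by the calling migration.
            ...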

aiida.storage.psql_dos.migrations.utils.utils.put_object_from_string(repository_path, uuid, name, content)[source]#

Write a file with the given content in the repository sub folder of the given node.

Parameters:
  • uuid – UUID of the node

  • name – name to use for the file

  • content – the content to write to the file

aiida.storage.psql_dos.migrations.utils.utils.recursive_datetime_to_isoformat(value)[source]#

Convert all datetime objects in the given value to string representations in ISO format.

Parameters:

value – a mapping, sequence or single value optionally containing datetime objects
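A small illustration of the conversion; the nested input value is made up, and the exact ISO string depends on the datetime's timezone information:

    import datetime

    from aiida.storage.psql_dos.migrations.utils.utils import recursive_datetime_to_isoformat

    value = {'ctime': datetime.datetime(2020, 1, 1, 12, 0, 0), 'versions': [1, 2]}
    converted = recursive_datetime_to_isoformat(value)
    # converted == {'ctime': '2020-01-01T12:00:00', 'versions': [1, 2]}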

aiida.storage.psql_dos.migrations.utils.utils.serialize_repository(repository: Repository) dict[source]#

Serialize the metadata into a JSON-serializable format.

Note

the serialization format is optimized to reduce the size in bytes.

Returns:

dictionary with the content metadata.

aiida.storage.psql_dos.migrations.utils.utils.store_numpy_array_in_repository(repository_path, uuid, name, array)[source]#

Store a numpy array in the repository folder of a node.

Parameters:
  • uuid – the node UUID

  • name – the name under which to store the array

  • array – the numpy array to store
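A hypothetical store/load round trip using the helpers above; the repository path and node UUID are placeholders and would normally come from the profile and the node being migrated:

    import numpy as np

    from aiida.storage.psql_dos.migrations.utils import utils

    repository_path = '/path/to/.aiida/repository'  # placeholder
    node_uuid = '00000000-0000-0000-0000-000000000000'  # placeholder

    array = np.arange(10)
    utils.ensure_repository_folder_created(repository_path, node_uuid)
    utils.store_numpy_array_in_repository(repository_path, node_uuid, 'default', array)
    loaded = utils.load_numpy_array_from_repository(repository_path, node_uuid, 'default')
    assert np.array_equal(array, loaded)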