Repository

In addition to the database, AiiDA also stores information in the repository in the form of files. The repository is optimized to store large amounts of files, which allows AiiDA to scale to high-throughput loads. As a result, the files cannot be accessed directly using file system tools, despite the fact that they are stored somewhere on the local file system. Instead, you should interact with the repository through the API.

Since each node can have its own virtual file hierarchy, the repository contents of a node are accessed through the Node class. The hierarchy is virtual because the files may not actually be written to disk with the same hierarchy. For more technical information on the implementation, please refer to the repository internals section.

Writing to the repository

To write files to a node, you can use one of the following three methods:

Let’s assume that you have a file on your local file system called /some/path/file.txt that you want to copy to a node. The most straightforward solution is the following:

node = Node()
node.put_object_from_file('/some/path/file.txt', 'file.txt')

Note that the first argument should be an absolute filepath. The second argument is the filename with which the file will be written to the repository of the node. It can be any valid filename as long as it is relative. The target filename can contain nested subdirectories, for example some/relative/path/file.txt. The nested directories do not have to exist.

Alternatively, it is also possible to write a file to a node from a stream or filelike-object. This is useful when the content of the file is already in memory and prevents having to write it to the local filesystem first. For example, one can do the following:

with open('/some/path/file.txt') as handle:
    node = Node()
    node.put_object_from_filelike(handle, 'file.txt')

which is the same as the previous example, except the file is opened first in a context manager and then the filelike-object is passed in. The put_object_from_filelike() method should work with any filelike-object, for example also byte- and textstreams:

import io
node = Node()
node.put_object_from_filelike(io.BytesIO(b'some content'), 'file.txt')

Finally, instead of writing one file at a time, you can write the contents of an entire directory to the node’s repository:

node = Node()
node.put_object_from_tree('/some/directory')

The contents of the entire directory will be recursively written to the node’s repository. Optionally, you can write the content to a subdirectory in the repository:

node = Node()
node.put_object_from_tree('/some/directory', 'some/sub/path')

As with put_object_from_file(), the sub directories do not have to be explicitly created first.

Listing repository content

To determine the contents of a node’s repository, you can use the following methods:

The first method will return a list of file objects contained within the node’s repository, where an object can be either a directory or a file:

In [1]: node.list_object_names()
Out[1]: ['sub', 'file.txt']

To determine the contents of a subdirectory, simply pass the path as an argument:

In [1]: node.list_object_names('sub/directory')
Out[1]: ['nested.txt']

Note that the elements in the returned list are simple strings and so one cannot tell if they correspond to a directory or a file. If this information is needed, use list_objects() instead. This method returns a list of File objects. These objects have a file_type() and name() property which returns the type and name of the file object, respectively. An example usage would be the following:

from aiida.repository.common import FileType

for obj in node.list_objects():
    if obj.file_type == FileType.DIRECTORY:
        print(f'{obj.name} is a directory.)
    elif obj.file_type == FileType.FILE:
        print(f'{obj.name} is a file.)

To retrieve a specific file object with a particular relative path, use get_object():

In [1]: node.get_object('sub/directory/nested.txt')
Out[1]: File(file_type=FileType.FILE, name='nested.txt')

Finally, if you want to recursively iterate over the contents of a node’s repository, you can use the walk() method. It operates exactly as the os.walk method of the Python standard library:

In [1]: for root, dirnames, filenames in node.walk():
            print(root, dirnames, filenames)
Out[1]: '.', ['sub'], ['file.txt']
        'sub', ['directory'], []
        'sub/directory', [], ['nested.txt']

Reading from the repository

To retrieve the content of files stored in a node’s repository, you can use the following methods:

The first method functions exactly as Python’s open built-in function:

with node.open('some/file.txt', 'r') as handle:
    content = handle.read()

The get_object_content() method provides a short-cut for this operation in case you want to directly read the content into memory:

content node.get_object_content('some/file.txt', 'r')

Both methods accept a second argument to determine whether the file should be opened in text- or binary-mode. The valid values are 'r' and 'rb', respectively. Note that these methods can only be used to read content from the repository and so any other read modes, such as 'wb', will result in an exception. To write files to the repository, use the methods that are described in the section on writing to the repository.

Copying from the repository

If you want to copy specific files from a node’s repository, the section on reading from the repository shows how to read their content which can then be written elsewhere. However, sometimes you want to copy the entire contents of the node’s repository, or a subdirectory of it. The copy_tree() method makes this easy and can be used as follows:

node.copy_tree('/some/target/directory')

which will write the entire repository content of node to the directory /some/target/directory on the local file system. If you only want to copy a particular subdirectory of the repository, you can pass this as the second path argument:

node.copy_tree('/some/target/directory', path='sub/directory')

This method, combined with put_object_from_tree(), makes it easy to copy the entire repository content (or a subdirectory) from one node to another:

import tempfile
node_source = load_node(<PK>)
node_target = Node()

with tempfile.TemporaryDirectory() as dirpath:
    node_source.copy_tree(dirpath)
    node_target.put_object_from_tree(dirpath)

Note that this method is not the most efficient as the files are first written from node_a to a temporary directory on disk, before they are read in memory again and written to the repository of node_b. There is a more efficient method which requires a bit more code and that directly uses the walk() method explained in the section on listing repository content.

node_source = load_node(<PK>)
node_target = Node()

for root, dirnames, filenames in node_source.walk():
    for filename in filenames:
        filepath = root / filename
        with node_source.open(filepath) as handle:
            node_target.put_object_from_filelike(handle, filepath)

Note

In the example above, only the files are explicitly copied over. Any intermediate nested directories will be automatically created in the virtual hierarchy. However, currently it is not possible to create a directory explicitly. Empty directories are not yet supported.