.. _topics:provenance:concepts:

========
Concepts
========

Nodes and links
===============

Two of the most important concepts in AiiDA are **data** and **processes**.
The former are pieces of data, ranging from a simple integer or float all the way to more complex concepts such as a dictionary of parameters, a folder of files or a crystal structure.
Processes operate on this data in order to produce new data.

Processes come in two different forms:

* **Calculations** are processes that are able to **create** new data. This is the case, for instance, for external simulation codes, which generate new data.
* **Workflows** are processes that **orchestrate** other workflows and calculations, i.e. they manage the logical flow, being able to **call** other processes. Workflows have data inputs, but cannot generate new data. They can only return data that is already in the database (one typical case is to return data created by a calculation they called).

Data and processes are represented in the AiiDA provenance graph as the **nodes** of that graph.
The graph edges are referred to as **links** and come in different forms:

* **input** links: connect data nodes to the process nodes that used them as input, both calculations and workflows
* **create** links: connect calculation nodes to the data nodes that they created
* **return** links: connect workflow nodes to the data nodes that they returned
* **call** links: connect workflow nodes to the process nodes that they directly called, be it calculations or workflows

Note that the **create** and **return** links are often collectively referred to as **output** links.

Data provenance and logical provenance
======================================

AiiDA automatically stores entities in its database and links them, forming a **directed graph**.
This directed graph automatically tracks the **provenance** of all data produced by calculations or returned by workflows.
By tracking the provenance in this way, one can always fully retrace how a particular piece of data came into existence, thus ensuring its reproducibility.

In particular, we define two types of provenance:

* The **data provenance** is the part of the graph that consists *only* of data and calculation nodes (i.e. without considering workflows), together with the **input** and **create** links that connect them. The data provenance records the full history of how data has been generated. Due to the causality principle, the data provenance part of the graph is a **directed acyclic graph** (DAG), i.e. its nodes are connected by directed edges and it does not contain any cycles.
* The **logical provenance** consists of workflow and data nodes, together with the **input**, **return** and **call** links that connect them. The logical provenance is *not* guaranteed to be acyclic: for example, a workflow that acts as a filter can return one of its own inputs, directly introducing a cycle.

The data provenance is essentially a log of which calculation generated what data using certain inputs.
The data provenance alone already guarantees reproducibility (one could rerun the calculations one by one with the provided inputs and would obtain the same outputs).
The logical provenance gives additional information on why a specific calculation was run.
Imagine the case in which you start from 100 structures, you have a filter operation that picks one, and then you run a simulation on it.
The data provenance only shows the simulation you ran on the structure that was picked, while the logical provenance can also show that the specific structure was not picked at random but via a specific workflow logic.

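
The distinction between the two kinds of provenance can be illustrated with a small, self-contained sketch. It does not use the AiiDA API: the node names, the ``subgraph`` and ``has_cycle`` helpers and the plain-string link labels are all hypothetical and chosen only for illustration. The toy graph mirrors the filter example above with three structures instead of 100, and checks that the data provenance is acyclic while the logical provenance contains a cycle.

```python
# Link types that belong to each kind of provenance, as described in the text.
# These strings are illustrative labels, not identifiers from the AiiDA API.
DATA_PROVENANCE_LINKS = {"input", "create"}
LOGICAL_PROVENANCE_LINKS = {"input", "return", "call"}

# Hypothetical toy graph: node name -> node kind.
nodes = {
    "structure_1": "data",
    "structure_2": "data",
    "structure_3": "data",
    "filter_workflow": "workflow",
    "simulation": "calculation",
    "result": "data",
}

# Directed links as (source, target, link type).
links = [
    ("structure_1", "filter_workflow", "input"),
    ("structure_2", "filter_workflow", "input"),
    ("structure_3", "filter_workflow", "input"),
    # The filter returns one of its own inputs.
    ("filter_workflow", "structure_2", "return"),
    ("structure_2", "simulation", "input"),
    ("simulation", "result", "create"),
]


def subgraph(kinds, link_types):
    """Keep links of an allowed type whose two endpoints have an allowed kind."""
    return [
        (src, dst, label)
        for src, dst, label in links
        if label in link_types and nodes[src] in kinds and nodes[dst] in kinds
    ]


def has_cycle(edges):
    """Return True if the directed graph given by ``edges`` contains a cycle."""
    adjacency = {}
    for src, dst, _ in edges:
        adjacency.setdefault(src, []).append(dst)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            # Reached a node that is still on the current search path.
            return True
        visiting.add(node)
        found = any(visit(child) for child in adjacency.get(node, []))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in adjacency)


data_provenance = subgraph({"data", "calculation"}, DATA_PROVENANCE_LINKS)
logical_provenance = subgraph({"data", "workflow"}, LOGICAL_PROVENANCE_LINKS)

print(has_cycle(data_provenance))     # False
print(has_cycle(logical_provenance))  # True
```

In this sketch the data provenance keeps only the link from ``structure_2`` to ``simulation`` and the link from ``simulation`` to ``result``, whereas the logical provenance contains the loop between ``structure_2`` and ``filter_workflow`` created by the **return** link.
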

Other entities
==============

Besides nodes (data and processes), AiiDA defines a few more entities, such as a :py:class:`~aiida.orm.computers.Computer` (representing a computer, supercomputer or computer cluster where calculations are run or data is stored), a :py:class:`~aiida.orm.groups.Group` (which groups nodes together for organizational purposes) and a :py:class:`~aiida.orm.users.User` (to keep track of the user who first generated a given node, computer or group).

In the following section we describe in more detail how the general provenance concepts above are actually implemented in AiiDA, with specific reference to the Python classes that implement them and the class-inheritance relationships.