# Usage

## Calculation functions

### Creating data

    # -*- coding: utf-8 -*-
    from aiida.engine import calcfunction
    from aiida.orm import Int

    @calcfunction
    def add(x, y):
        # Storing the node inside the calcfunction is what raises the error shown next.
        result = Int(x + y).store()
        return result



    ValueError: trying to return an already stored Data node from a @calcfunction, however, @calcfunctions cannot return data.
    If you stored the node yourself, simply do not call `store()` yourself.
    If you want to return an input node, use a @workfunction instead.


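    # -*- coding: utf-8 -*-
    from aiida.engine import calcfunction
    from aiida.orm import Int

    @calcfunction
    def add(x, y):
        # Return the node unstored; the engine stores it and links it to the function.
        result = Int(x + y)
        return result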
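The rule behind the error message above can be imitated without AiiDA. The following is only a toy model under stated assumptions: `ToyNode` and `toy_calcfunction` are hypothetical stand-ins invented for this sketch and are not part of the AiiDA API. The wrapper plays the role of the engine: it refuses a result that is already stored and otherwise stores the result itself.

```python
import functools


class ToyNode:
    """Hypothetical stand-in for a data node; NOT an AiiDA class."""

    def __init__(self, value):
        self.value = value
        self.is_stored = False

    def store(self):
        self.is_stored = True
        return self


def toy_calcfunction(func):
    """Hypothetical decorator imitating the 'do not store outputs yourself' rule."""

    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        if result.is_stored:
            # The wrapped function stored the node itself: reject it.
            raise ValueError('returned node is already stored')
        # The wrapper, playing the role of the engine, stores the new node.
        return result.store()

    return wrapper


@toy_calcfunction
def add_wrong(x, y):
    # Mirrors the failing snippet: the function calls store() itself.
    return ToyNode(x.value + y.value).store()


@toy_calcfunction
def add_right(x, y):
    # Mirrors the working snippet: the node is returned unstored.
    return ToyNode(x.value + y.value)


total = add_right(ToyNode(1), ToyNode(2))
print(total.value, total.is_stored)

try:
    add_wrong(ToyNode(1), ToyNode(2))
except ValueError as exc:
    print('rejected:', exc)
```

In the sketch, only the wrapper ever calls `store()` on a result, which is the division of labour the error message asks for.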



## Calculation jobs

### Define

    # -*- coding: utf-8 -*-
    from aiida import orm
    from aiida.engine import CalcJob


    class ArithmeticAddCalculation(CalcJob):
        """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

        @classmethod
        def define(cls, spec):
            super().define(spec)
            spec.input('x', valid_type=orm.Int, help='The left operand.')
            spec.input('y', valid_type=orm.Int, help='The right operand.')


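• `cls` refers to the class itself and gives access to all of its class methods.

• `spec` is the "specification": it holds the parameters and details of the calculation.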

    # -*- coding: utf-8 -*-
    from aiida import orm
    from aiida.engine import CalcJob


    class ArithmeticAddCalculation(CalcJob):
        """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

        @classmethod
        def define(cls, spec):
            super().define(spec)
            spec.input('x', valid_type=orm.Int, help='The left operand.')
            spec.input('y', valid_type=orm.Int, help='The right operand.')
            # Declare the output port so the parser can attach the result.
            spec.output('sum', valid_type=orm.Int, help='The sum of the left and right operand.')


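### Prepare

• Prepare a working folder for the job in the available space of the computing resource.

• Create the raw input files required by the executable.

• Create a launch script containing the scheduler directives, load the environment variables, and finally call the executable with the specific command line parameters.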

    # -*- coding: utf-8 -*-
    from aiida import orm
    from aiida.common.datastructures import CalcInfo, CodeInfo
    from aiida.engine import CalcJob


    class ArithmeticAddCalculation(CalcJob):
        """Implementation of CalcJob to add two numbers for testing and demonstration purposes."""

        @classmethod
        def define(cls, spec):
            super().define(spec)
            spec.input('x', valid_type=orm.Int, help='The left operand.')
            spec.input('y', valid_type=orm.Int, help='The right operand.')
            spec.output('sum', valid_type=orm.Int, help='The sum of the left and right operand.')

        def prepare_for_submission(self, folder):
            """Write the input files that are required for the code to run.

            :param folder: an `~aiida.common.folders.Folder` to temporarily write files on disk
            :return: `~aiida.common.datastructures.CalcInfo` instance
            """
            input_x = self.inputs.x
            input_y = self.inputs.y

            # Write the input file based on the inputs that were passed
            with folder.open(self.options.input_filename, 'w', encoding='utf8') as handle:
                handle.write(f'{input_x.value} {input_y.value}\n')

            # Describe which code to run, where stdout goes, and the command line arguments
            codeinfo = CodeInfo()
            codeinfo.code_uuid = self.inputs.code.uuid
            codeinfo.stdout_name = self.options.output_filename
            codeinfo.cmdline_params = ['-in', self.options.input_filename]

            # Collect the code and the (here empty) file lists for the engine
            calcinfo = CalcInfo()
            calcinfo.codes_info = [codeinfo]
            calcinfo.local_copy_list = []
            calcinfo.remote_copy_list = []
            calcinfo.retrieve_list = []

            return calcinfo


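The `folder` argument points to a temporary sandbox folder on the local file system to which the input files can be written. After `prepare_for_submission` returns, the engine takes those files and copies them to the working directory where the calculation will run.

In addition, these files are written to the file repository of the node that represents the calculation, as an extra measure of provenance. Even though the information written there should be derived from the contents of the nodes passed as inputs, it is still stored explicitly. Sometimes this behavior is undesirable, for example for efficiency or data privacy reasons, so it can be controlled with various lists such as `local_copy_list` and `provenance_exclude_list`.

• Write the raw input files required to run the calculation to the `folder` sandbox folder.

• Use a `CalcInfo` to tell the engine which files to copy to the working directory.

• Use a `CalcInfo` to specify which codes should run and with which command line parameters, such as standard input and output redirection.

• `codes_info`: a list of `CodeInfo` data structures that tell which codes to run consecutively during the job

• `local_copy_list`: a list of tuples that indicate which files to copy from the local machine to the working directory

• `remote_copy_list`: a list of tuples that indicate which files to copy from the machine on which the job runs to the working directory

• `retrieve_list`: a list of entries (path strings or tuples) that indicate which files should be retrieved from the working directory and stored in the local repository after the job has finished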

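### File lists

#### Local copy list

• node uuid: the node whose repository contains the file, typically a `SinglefileData` or `FolderData` node

• source relative path: the relative path of the file within the node repository

• target relative path: the relative path within the working directory to which to copy the file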

calc_info.local_copy_list = [(self.inputs.pseudopotential.uuid, self.inputs.pseudopotential.filename, 'pseudopotential.dat')]


calc_info.local_copy_list = [(self.inputs.folder.uuid, 'internal/relative/path/file.txt', 'relative/target/file.txt')]


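#### Provenance exclude list

The `local_copy_list` allows one to instruct the engine to write files from input nodes to the working directory without *also* copying them to the file repository of the calculation node. As discussed in the corresponding section, this is useful to avoid duplication, or when the data of the nodes is proprietary or privacy sensitive and must not be duplicated arbitrarily in the file repository. However, the limitation of the `local_copy_list` is that it can only target single files in their entirety and cannot be used for arbitrary files that are written to the `folder` sandbox folder. To provide full control over which files from the `folder` are stored permanently in the calculation node file repository, the `provenance_exclude_list` is introduced. This `CalcInfo` attribute is a list of file paths, relative to the base path of the `folder` sandbox folder, which are *not stored* in the file repository.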

├─ sub
│  ├─ file_b.txt
│  └─ personal.dat
├─ file_a.txt
└─ secret.key


calc_info.provenance_exclude_list = ['sub/personal.dat', 'secret.key']


├─ sub
│  └─ file_b.txt
└─ file_a.txt
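The effect of the exclude list on the directory listing above can be sketched as a plain filter over relative paths. This is a simplified model, not AiiDA code: `files_kept_in_repository` is a hypothetical helper, and it assumes that only exact relative-path matches are excluded.

```python
from pathlib import PurePosixPath


def files_kept_in_repository(sandbox_files, provenance_exclude_list):
    """Return the sandbox files that remain after applying the exclude list.

    Simplified model (assumption): only exact relative-path matches are excluded.
    """
    excluded = {PurePosixPath(path) for path in provenance_exclude_list}
    return [path for path in sandbox_files if PurePosixPath(path) not in excluded]


# Relative paths of the files written to the sandbox folder in the example above.
sandbox_files = ['sub/file_b.txt', 'sub/personal.dat', 'file_a.txt', 'secret.key']
exclude = ['sub/personal.dat', 'secret.key']

kept = files_kept_in_repository(sandbox_files, exclude)
print(kept)
```

The two remaining entries correspond to the second listing above, which no longer contains `personal.dat` and `secret.key`.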


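#### Remote copy list

• computer uuid: the UUID of the `Computer` on which the source file resides. For now, the remote copy list can only copy files that are on the same machine as the one where the job runs.

• source absolute path: the absolute path of the source file on the remote machine

• target relative path: the relative path within the working directory to which to copy the file

calc_info.remote_copy_list = [(self.inputs.parent_folder.computer.uuid, 'output_folder', 'restart_folder')]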


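#### Retrieve list

• a string representing a relative file path in the remote working directory

• a tuple of length three that allows control over the name of the retrieved file or folder in the retrieved folder:

  • source relative path: the relative path, with respect to the remote working directory, of the file or directory to retrieve

  • target relative path: the relative path of the directory in the retrieved folder into which the content of the source will be copied. The string `'.'` indicates the top level of the retrieved folder.

  • depth: the number of levels of nesting in the source path to maintain when copying, starting from the deepest file

├─ path
│  ├─ sub
│  │  ├─ file_c.txt
│  │  └─ file_d.txt
│  └─ file_b.txt
└─ file_a.txt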


##### Explicit file or folder

retrieve_list = ['file_a.txt']

└─ file_a.txt

retrieve_list = ['path']

├── sub
│   ├─ file_c.txt
│   └─ file_d.txt
└─ file_b.txt

##### Explicit nested file or folder

retrieve_list = ['path/file_b.txt']

└─ file_b.txt

retrieve_list = ['path/sub']

├─ file_c.txt
└─ file_d.txt

##### Explicit nested file or folder, keeping (partial) hierarchy

retrieve_list = [('path/sub/file_c.txt', '.', 3)]

└─ path
   └─ sub
      └─ file_c.txt


retrieve_list = [('path/sub/file_c.txt', '.', 2)]

└─ sub
   └─ file_c.txt


retrieve_list = [('path/sub', '.', 1)]

└─ sub
   ├─ file_c.txt
   └─ file_d.txt

##### Pattern matching

retrieve_list = [('path/sub/*c.txt', '.', 0)]

└─ file_c.txt


retrieve_list = [('path/sub/*c.txt', '.', 2)]

└─ sub
   └─ file_c.txt

##### Specific target folder

retrieve_list = [('path/sub/file_c.txt', 'target', 3)]

└─ target
   └─ path
      └─ sub
         └─ file_c.txt


retrieve_list = [('path/sub', 'target', 1)]

└─ target
   └─ sub
      ├─ file_c.txt
      └─ file_d.txt


retrieve_list = [('path/sub/*c.txt', 'target', 0)]

└─ target
   └─ file_c.txt
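The tuple examples above can be summarized by a small path computation. The sketch below is a simplified reading of those examples, not AiiDA's implementation: `retrieved_path` is a hypothetical helper, glob patterns are assumed to have been expanded to a concrete path already, and a depth of 0 is modelled as keeping only the final path component, which is what the pattern-matching example shows.

```python
from pathlib import PurePosixPath


def retrieved_path(source, target, depth):
    """Model where one resolved source path lands inside the retrieved folder.

    Simplified reading of the examples (assumption): keep the last `depth`
    components of the source path, at least the final one, under `target`.
    """
    parts = PurePosixPath(source).parts
    kept = parts[-max(depth, 1):]
    # PurePosixPath drops the '.' component, so '.' means the top level.
    return str(PurePosixPath(target, *kept))


examples = [
    ('path/sub/file_c.txt', '.', 3),
    ('path/sub/file_c.txt', '.', 2),
    ('path/sub', '.', 1),
    ('path/sub/file_c.txt', '.', 0),  # 'path/sub/*c.txt' after glob expansion
    ('path/sub/file_c.txt', 'target', 3),
    ('path/sub', 'target', 1),
]

for source, target, depth in examples:
    print((source, target, depth), '->', retrieved_path(source, target, depth))
```

When the source is a folder, as in `('path/sub', '.', 1)`, the computed path is the folder itself and its contents follow along beneath it.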


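#### Stashing on the remote

New in version 1.6.0.

The `stash` option namespace allows a user to specify certain files created by the calculation job to be stashed somewhere on the remote. This is useful when those files need to be kept longer than the scratch space where the job ran, which is typically cleaned periodically, but are large and should stay on the remote machine rather than be retrieved. Examples are files that are necessary to restart a calculation but are too big to be retrieved and stored permanently in the local file repository.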

    from aiida.common.datastructures import StashMode

    inputs = {
        'code': ....,
        ...
        'metadata': {
            'options': {
                'stash': {
                    'source_list': ['aiida.out', 'output.txt'],
                    'target_base': '/storage/project/stash_folder',
                    'stash_mode': StashMode.COPY.value,
                }
            }
        }
    }


AiiDA does not actually own the files in the remote stash, so their contents may disappear at some point.
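Because nothing but the option descriptions constrains the stash dictionary in the example above, a quick sanity check before submission can be helpful. The sketch below is an illustration only: `check_stash_options` is a hypothetical helper, not an AiiDA function, and the string `'copy'` is an assumed stand-in for `StashMode.COPY.value` so that the example runs without AiiDA installed. It checks that the source paths are relative and that the target base is absolute, as the copy mode expects.

```python
from pathlib import PurePosixPath


def check_stash_options(stash):
    """Return a list of problems found in a `stash` options dictionary.

    Hypothetical helper: it only encodes the 'relative sources, absolute
    target' expectation for the copy mode.
    """
    problems = []
    for path in stash.get('source_list', []):
        if PurePosixPath(path).is_absolute():
            problems.append(f'source path should be relative: {path}')
    if not PurePosixPath(stash.get('target_base', '')).is_absolute():
        problems.append('target_base should be an absolute path')
    return problems


good = {
    'source_list': ['aiida.out', 'output.txt'],
    'target_base': '/storage/project/stash_folder',
    'stash_mode': 'copy',  # assumed value of StashMode.COPY.value
}
bad = {
    'source_list': ['/scratch/aiida.out'],
    'target_base': 'stash_folder',
    'stash_mode': 'copy',
}

print(check_stash_options(good))
print(check_stash_options(bad))
```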

### Options

calcjob: `aiida.engine.processes.calcjobs.CalcJob`

Implementation of the CalcJob process.

Inputs:

• code, Code, required – The Code to use for this job.
• metadata, Namespace
  Namespace Ports
  • call_link_label, str, optional, non_db – The label to use for the CALL link if the process is called by another process.
  • computer, Computer, optional, non_db – When using a “local” code, set the computer on which the calculation should be run.
  • description, str, optional, non_db – Description to set on the process node.
  • dry_run, bool, optional, non_db – When set to True will prepare the calculation job for submission but not actually launch it.
  • label, str, optional, non_db – Label to set on the process node.
  • options, Namespace
    Namespace Ports
    • account, str, optional, non_db – Set the account to use for the queue on the remote computer
    • additional_retrieve_list, (list, tuple), optional, non_db – List of relative file paths that should be retrieved in addition to what the plugin specifies.
    • append_text, str, optional, non_db – Set the calculation-specific append text, which is going to be appended in the scheduler-job script, just after the code execution
    • custom_scheduler_commands, str, optional, non_db – Set a (possibly multiline) string with the commands that the user wants to manually set for the scheduler. The difference of this option with respect to the prepend_text is the position in the scheduler submission file where such text is inserted: with this option, the string is inserted before any non-scheduler command
    • environment_variables, dict, optional, non_db – Set a dictionary of custom environment variables for this calculation
    • import_sys_environment, bool, optional, non_db – If set to true, the submission script will load the system environment variables
    • input_filename, str, optional, non_db – Filename to which the input for the code that is to be run is written.
    • max_memory_kb, int, optional, non_db – Set the maximum memory (in kilobytes) to request from the scheduler
    • max_wallclock_seconds, int, optional, non_db – Set the wallclock time in seconds to request from the scheduler
    • mpirun_extra_params, (list, tuple), optional, non_db – Set the extra params to pass to the mpirun (or equivalent) command after the one provided in computer.mpirun_command. Example: mpirun -np 8 extra_params[0] extra_params[1] … exec.x
    • output_filename, str, optional, non_db – Filename to which the content of stdout of the code that is to be run is written.
    • parser_name, str, optional, non_db – Set a string for the output parser. Can be None if no output plugin is available or needed
    • prepend_text, str, optional, non_db – Set the calculation-specific prepend text, which is going to be prepended in the scheduler-job script, just before the code execution
    • priority, str, optional, non_db – Set the priority of the job to be queued
    • qos, str, optional, non_db – Set the quality of service to use for the queue on the remote computer
    • queue_name, str, optional, non_db – Set the name of the queue on the remote computer
    • resources, dict, required, non_db – Set the dictionary of resources to be used by the scheduler plugin, like the number of nodes, cpus etc. This dictionary is scheduler-plugin dependent. Look at the documentation of the scheduler for more details.
    • scheduler_stderr, str, optional, non_db – Filename to which the content of stderr of the scheduler is written.
    • scheduler_stdout, str, optional, non_db – Filename to which the content of stdout of the scheduler is written.
    • stash, Namespace – Optional directives to stash files after the calculation job has completed.
      Namespace Ports
      • source_list, (tuple, list), optional, non_db – Sequence of relative filepaths representing files in the remote directory that should be stashed.
      • stash_mode, str, optional, non_db – Mode with which to perform the stashing, should be value of aiida.common.datastructures.StashMode.
      • target_base, str, optional, non_db – The base location to where the files should be stashed. For example, for the copy stash mode, this should be an absolute filepath on the remote computer.
    • submit_script_filename, str, optional, non_db – Filename to which the job submission script is written.
    • withmpi, bool, optional, non_db – Set the calculation to use mpi
  • store_provenance, bool, optional, non_db – If set to False provenance will not be stored in the database.

Outputs:

• remote_folder, RemoteData, required – Input files necessary to run the process will be stored in this folder node.
• remote_stash, RemoteStashData, optional – Contents of the stash.source_list option are stored in this remote folder after job completion.
• retrieved, FolderData, required – Files that are retrieved by the daemon will be stored in this node. By default the stdout and stderr of the scheduler will be added, but one can add more by specifying them in CalcInfo.retrieve_list.

### Dry run

builder.metadata.dry_run = True


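• When calling `run()` for a calculation with the `dry_run` flag set, the returned results will always be an empty dictionary `{}`.

• If you call `run_get_node()`, you will get back an unstored `CalcJobNode` as the node. In this case, the unstored `CalcJobNode` (let's call it `node`) will have an additional property `node.dry_run_info`. This is a dictionary that contains additional information on the dry-run output. In particular, it will have the following keys:

  • `folder`: the absolute path to the folder within the `submit_test` folder where the files have been created, e.g. `/home/user/submit_test/20190726-00019`

  • `script_filename`: the filename of the submission script that AiiDA generated in the folder, e.g. `_aiidasubmit.sh`

• If you pass a dry run to `submit()`, it is simply forwarded to `run()` and you will get back the unstored node (with the same properties as above).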

### Parsing

    # -*- coding: utf-8 -*-
    from aiida.orm import Int
    from aiida.parsers.parser import Parser


    class ArithmeticAddParser(Parser):

        def parse(self, **kwargs):
            """Parse the contents of the output files retrieved in the `FolderData`."""
            output_folder = self.retrieved

            try:
                with output_folder.open(self.node.get_option('output_filename'), 'r') as handle:
                    result = self.parse_stdout(handle)
            except (OSError, IOError):
                return self.exit_codes.ERROR_READING_OUTPUT_FILE

            if result is None:
                return self.exit_codes.ERROR_INVALID_OUTPUT

            self.out('sum', Int(result))

        @staticmethod
        def parse_stdout(filelike):
            """Parse the sum from the output of the ArithmeticAddCalculation written to standard out

            :param filelike: filelike object containing the output
            :returns: the sum
            """
            try:
                result = int(filelike.read())
            except ValueError:
                result = None

            return result

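• Open and load the content of the output files that were generated by the calculation job and retrieved by the engine.

• Create data nodes from the raw data and attach them as output nodes.

• Log human-readable warning messages when the output gives cause for concern.

• Optionally return an exit code to indicate that the calculation was not successful.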

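    try:
        with output_folder.open(self.node.get_option('output_filename'), 'r') as handle:
            result = self.parse_stdout(handle)
    except (OSError, IOError):
        return self.exit_codes.ERROR_READING_OUTPUT_FILE

The `parse_stdout` method is just a small utility function that separates the actual parsing of the data from the main parser code. Here the parsing is so simple that it could have been kept in the main `parse` method; the split only illustrates that you are completely free to organize the code within the `parse` method for clarity. If the sum produced by the calculation is parsed successfully, it is wrapped in the appropriate `Int` data node class and registered as an output through the `out` method: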

    self.out('sum', Int(result))
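Keeping the parsing in a separate helper also makes it easy to exercise without a retrieved folder or any AiiDA setup. The sketch below is a standalone copy of the `parse_stdout` logic shown above, written as a plain function and fed with in-memory file-like objects; the sample contents are made up for illustration.

```python
import io


def parse_stdout(filelike):
    """Return the integer read from `filelike`, or None if it is not an integer."""
    try:
        return int(filelike.read())
    except ValueError:
        return None


# A well-formed output file: a single integer, possibly followed by a newline.
valid = parse_stdout(io.StringIO('3\n'))

# A malformed output file: the parser would map None to ERROR_INVALID_OUTPUT.
invalid = parse_stdout(io.StringIO('not a number'))

print(valid, invalid)
```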

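• Store data that you might want to query for in lightweight data nodes, such as `Dict`, `List` and `StructureData`. The contents of these nodes are stored as attributes in the database, which ensures that they can be queried.

• Larger data sets, such as large (multi-dimensional) arrays, are better stored in an `ArrayData` node or one of its subclasses. Storing all of this data in the database would bloat it unnecessarily, because you are unlikely to query for it. Instead, these array-type data nodes store the bulk of their content in the repository. This way you still keep the data and therefore the provenance of your calculations, while keeping your database lean and fast.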

### Scheduler errors

    from aiida.parsers.parser import Parser


    class SomeParser(Parser):

        def parse(self, **kwargs):
            """Parse the contents of the output files retrieved in the `FolderData`."""
            if self.node.exit_status is not None:
                # If an exit status is already set on the node, that means the
                # scheduler plugin detected a problem.
                return


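| Scheduler result | Parser result | Final result |
|---|---|---|
| None | None | ExitCode(0) |
| ExitCode(100) | None | ExitCode(100) |
| None | ExitCode(400) | ExitCode(400) |
| ExitCode(100) | ExitCode(400) | ExitCode(400) |
| ExitCode(100) | ExitCode(0) | ExitCode(0) |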
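The combinations listed above follow a simple precedence rule: whatever the output parser returns wins, an exit code set by the scheduler plugin is kept only when the parser returns nothing, and with neither set the result is zero. The sketch below encodes that reading with plain integers instead of `ExitCode` instances; `final_exit_status` is a hypothetical helper written for this illustration and is not how the engine is implemented.

```python
def final_exit_status(scheduler_status, parser_status):
    """Combine scheduler and parser results following the precedence read from the listing.

    Hypothetical helper: statuses are plain integers, or None when nothing was returned.
    """
    if parser_status is not None:
        # The output parser has the last word, even if it reports success.
        return parser_status
    if scheduler_status is not None:
        # The parser returned nothing, so the scheduler's exit code is kept.
        return scheduler_status
    return 0


# (scheduler result, parser result) pairs in the same order as listed above.
combinations = [(None, None), (100, None), (None, 400), (100, 400), (100, 0)]
results = [final_exit_status(scheduler, parser) for scheduler, parser in combinations]
print(results)
```

The last combination is the one worth noting: a parser that returns `ExitCode(0)` overrides a scheduler error, which is why the `SomeParser` example returns early with nothing when `exit_status` is already set.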