Developer code plugin tutorial¶
In this chapter we give you a brief guide that teaches you how to write a plugin to support a new code.
Generally speaking, each code has its own peculiarities, so sometimes a new strategy for a code plugin needs to be carefully thought out. Here we show how we implemented the plugin for Quantum ESPRESSO, so that you can replicate the task for other codes. It is therefore assumed that you have already run a QE example, and that you know, more or less, how the AiiDA interface works.
In fact, when writing your own plugin, keep in mind that you need to satisfy multiple users, and the interface needs to be simple (even if the code behind it is not). Always try to follow the Zen of Python:
Simple is better than complex.
Complex is better than complicated.
Readability counts.
There are two kinds of plugins: input and output. The former converts Python objects into text input files that can be executed by external software. The latter converts the text output of that software back into Python dictionaries/objects that can be put back in the database.
InputPlugin¶
In abstract terms, this plugin must contain two pieces of information:
what the input objects of the calculation are
how to convert the input objects into an input file
That is it: a minimal input plugin must provide at least these two things.
Create a new file with the same name as the class you are creating (in this way, it will be possible to load it with CalculationFactory). Save it in a subfolder at the path aiida/orm/calculation/job.
Step 1: inheritance¶
First define the class:
class SubclassCalculation(JobCalculation):
(Substitute Subclass with the name of your plugin.)
Take care of inheriting from the JobCalculation class, or the plugin will not work.
Note
The base Calculation
class should only be used as the abstract
base class. Any calculation that needs to run on a remote scheduler must
inherit from JobCalculation
, that
contains all the methods to run on a remote scheduler, get the calculation
state, copy files remotely and retrieve them, ...
Now, you will likely need to define some variables that belong to
SubclassCalculation
.
In order to be sure that you don’t lose any variables belonging to the
inherited class, every subclass of calculation needs to have a method which is
called _init_internal_params()
.
An example of it would look like:
def _init_internal_params(self):
    super(SubclassCalculation, self)._init_internal_params()
    self.A_NEW_VARIABLE = 'nabucco'
This method is called by __init__ and initializes the variable A_NEW_VARIABLE at the moment of instancing.
The second line calls the _init_internal_params() of the parent class and loads any other variables defined there.
Now you are able to access the variable A_NEW_VARIABLE
also in the rest of
the class by calling self.A_NEW_VARIABLE
.
Note
Even if you don’t need to define new variables, it is safer to define
the method with the call to super()
.
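To see why the super() call matters, here is a minimal, AiiDA-free sketch of the mechanism (the class and variable names are invented for illustration): each _init_internal_params in the inheritance chain adds its own variables, and skipping the super() call would silently lose the parent's ones.

```python
class BaseCalculation(object):
    def __init__(self):
        # mimics how AiiDA's __init__ triggers the parameter setup
        self._init_internal_params()

    def _init_internal_params(self):
        self.BASE_VARIABLE = 'base'


class SubclassCalculation(BaseCalculation):
    def _init_internal_params(self):
        # without this super() call, BASE_VARIABLE would never be set
        super(SubclassCalculation, self)._init_internal_params()
        self.A_NEW_VARIABLE = 'nabucco'


calc = SubclassCalculation()
print(calc.BASE_VARIABLE, calc.A_NEW_VARIABLE)  # base nabucco
```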
Note
It is not recommended to rewrite an __init__
by yourself: this
method is inherited from the classes Node
and Calculation
, and you
shouldn’t alter it unless you really know the code down to the lowest-level.
Note
The following is a list of relevant parameters you may want to (re)define in _init_internal_params:
- self._default_parser: set to the string of the default parser to be used, in the form accepted by the plugin loader (e.g., for the Quantum ESPRESSO plugin for phonons, this would be "quantumespresso.ph", loaded from the aiida.parsers.plugins module).
- self._DEFAULT_INPUT_FILE: specify here the relative path to the filename of the default file that should be shown by verdi calculation outputcat --default. If not specified, the default value is None and verdi calculation outputcat will not accept the --default option, but will instead always ask for a specific path name.
- self._DEFAULT_OUTPUT_FILE: same as _DEFAULT_INPUT_FILE, but for the default output file.
Step 2: define input nodes¶
First, you need to specify what are the objects that are going to be
accepted as input to the calculation class.
This is done by the class property _use_methods
.
An example is as follows:
from aiida.common.utils import classproperty

class SubclassCalculation(JobCalculation):

    def _init_internal_params(self):
        super(SubclassCalculation, self)._init_internal_params()

    @classproperty
    def _use_methods(cls):
        retdict = JobCalculation._use_methods
        retdict.update({
            "settings": {
                'valid_types': ParameterData,
                'additional_parameter': None,
                'linkname': 'settings',
                'docstring': "Use an additional node for special settings",
            },
            "pseudo": {
                'valid_types': UpfData,
                'additional_parameter': 'kind',
                'linkname': cls._get_pseudo_linkname,
                'docstring': ("Use a pseudopotential node for the given "
                              "structure kind"),
            },
        })
        return retdict

    @classmethod
    def _get_pseudo_linkname(cls, kind):
        """
        Return the linkname for a pseudopotential associated to a given
        structure kind.
        """
        return "pseudo_{}".format(kind)
With this piece of code we have defined two methods of the calculation that specify which DB objects can be set as input (and that draw the corresponding links in the DB graph). Specifically, we now have the two methods:
calculation.use_settings(an_object)
calculation.use_pseudo(another_object,'object_kind')
What did we do?

- We implicitly added the two new use_settings and use_pseudo methods (because the dictionary returned by _use_methods now contains a settings and a pseudo key).
- We did not lose the use_code call defined in the Calculation base class, because we are extending Calculation._use_methods. Therefore: don't specify a code as input in the plugin!
- use_settings will accept only one parameter, the node specifying the settings, since the additional_parameter value is None.
- use_pseudo will require two parameters instead, since the additional_parameter value is not None. If the second parameter is passed via kwargs, its name must be 'kind' (the value of additional_parameter). That is, you can call use_pseudo in one of the two following ways:

  use_pseudo(pseudo_node, 'He')
  use_pseudo(pseudo_node, kind='He')

  to associate the pseudopotential node pseudo_node (which you must have loaded before) to helium (He) atoms.
- The type of the node that you pass as first parameter will be checked against the type (or the tuple of types) specified with valid_types (the check is internally done using the isinstance Python call).
- The name of the link is taken from the linkname value. Note that if additional_parameter is None, this is simply a string; otherwise, it must be a callable that accepts one single parameter (the further parameter passed to the use_XXX function) and returns a string with the proper name. This functionality is provided to have a single use_XXX method define more than one input node, as is the case for pseudopotentials, where one input pseudopotential node must be specified for each atomic species or kind.
- Finally, docstring will contain the documentation of the function, which the user can obtain by printing, e.g., use_pseudo.__doc__.
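The string-vs-callable behaviour of linkname can be mimicked without AiiDA. Below is a simplified, hypothetical stand-in for the _use_methods dictionary, showing how a link name would be resolved for both cases:

```python
def get_pseudo_linkname(kind):
    return "pseudo_{}".format(kind)

# simplified, illustrative stand-in for the _use_methods dictionary
use_methods = {
    "settings": {'additional_parameter': None, 'linkname': 'settings'},
    "pseudo": {'additional_parameter': 'kind',
               'linkname': get_pseudo_linkname},
}

def resolve_linkname(method_name, additional_value=None):
    """Return the DB link name for a given use_XXX call."""
    entry = use_methods[method_name]
    if entry['additional_parameter'] is None:
        return entry['linkname']                 # plain string
    return entry['linkname'](additional_value)   # callable

print(resolve_linkname("settings"))      # settings
print(resolve_linkname("pseudo", "He"))  # pseudo_He
```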
Note
The actual implementation of the use_pseudo
method in the
Quantum ESPRESSO tutorial is slightly different, as it allows the user
to specify a list of kinds that are associated with the same pseudopotential
file (while in the example above only one kind string can be passed).
Step 3: prepare a text input¶
How are the input nodes used internally? Every plugin class is required to have the following method:
def _prepare_for_submission(self,tempfolder,inputdict):
This function is called by the daemon when it is trying to create a new calculation.
There are two arguments:
1. tempfolder: an object of kind SandboxFolder, which behaves exactly like a folder. You are going to write the input files in this placeholder. This tempfolder will then be copied to the remote cluster.
2. inputdict: contains all the input data nodes as a dictionary, in the same format that is returned by the get_inputdata_dict() method, i.e. a linkname as key, and the object as value.
Note
inputdict should contain all input Data nodes, but not the code (this is what the get_inputdata_dict() method does, by the way).
In general, you simply want to do:
inputdict = self.get_inputdata_dict()
right before calling _prepare_for_submission
.
The reason for having this explicitly passed is that the plugin does not have
to perform explicit database queries, and moreover this is useful to test
for submission without the need to store all nodes on the DB.
For the sake of clarity, it’s probably going to be easier looking at
an implemented example. Take a look at the NamelistsCalculation
located in
aiida.orm.calculation.job.quantumespresso.namelists
.
How does the method _prepare_for_submission
work in practice?
You should start by checking if the input nodes passed in inputdict are logically sufficient to run an actual calculation. Remember to raise an exception (for example InputValidationError) if something is missing or if something unexpected is found. Ideally, it is better to discover now that something is wrong, rather than waiting in the queue on the cluster only to see that the job has crashed. Also, if some nodes are left unused, you would leave in the DB a graph more complicated than what was really used, so it is better to stop the calculation now.
Then create an input file (or more than one, if needed). In the Namelists plugin this is done like:

input_filename = tempfolder.get_abs_path(self.INPUT_FILE_NAME)
with open(input_filename, 'w') as infile:
    # Here write the information of a ParameterData inside this
    # file
Note that everything here depends on how you decided the ParameterData should be written. In the Namelists plugin we adopted the convention that a ParameterData of the format:
ParameterData(dict={"INPUT":{'smearing':2, 'cutoff':30} })
is written in the input file as:
&INPUT
    smearing = 2,
    cutoff = 30,
/
Of course, it’s up to you to decide a convention which defines how to convert the dictionary to the input file. You can also impose some default values for simplicity. For example, the location of the scratch directory, if needed, should be imposed by the plugin and not by the user, and similarly you can/should decide the naming of output files.
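A minimal, standalone sketch of such a convention (much simplified with respect to the actual Namelists plugin) could look like this:

```python
def write_namelist(params):
    """Convert a dict like {'INPUT': {'smearing': 2, 'cutoff': 30}}
    into a Fortran-namelist-formatted string."""
    lines = []
    for namelist, variables in params.items():
        lines.append("&{}".format(namelist))
        # sort keys for a reproducible output order
        for key in sorted(variables):
            lines.append("    {} = {},".format(key, variables[key]))
        lines.append("/")
    return "\n".join(lines)


text = write_namelist({"INPUT": {'smearing': 2, 'cutoff': 30}})
print(text)
```

Note that this function knows nothing about which variables the code accepts, in line with the maintenance advice below.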
Note
It is convenient to avoid hard-coding all the variables that your code accepts. The convention stated above is sufficient for all inputs structured as Fortran cards, without the need of knowing which variables are accepted. Hard-coding variable names implies that every time the external software is updated, you need to modify the plugin: in practice, the plugin will easily become obsolete if poorly maintained. Ease of maintenance here wins over user comfort!
Copy inside this folder any auxiliary files that reside on your local machine, such as pseudopotentials.
Return a CalcInfo object. This object contains some accessory information. Here's a template of what it may look like:
calcinfo = CalcInfo()
calcinfo.uuid = self.uuid
calcinfo.cmdline_params = settings_dict.pop('CMDLINE', [])
calcinfo.local_copy_list = local_copy_list
calcinfo.remote_copy_list = remote_copy_list
### Modify here and put a name for standard input/output files
calcinfo.stdin_name = self.INPUT_FILE_NAME
calcinfo.stdout_name = self.OUTPUT_FILE_NAME
###
calcinfo.retrieve_list = []
### Modify here !
calcinfo.retrieve_list.append('Every file/folder you want to store back locally')
### Modify here!
calcinfo.retrieve_singlefile_list = []
return calcinfo
There are a few things to be set:
- stdin_name: the name of the standard input file.
- stdout_name: the name of the standard output file.
- cmdline_params: command-line parameters, like parallelization flags, that will be used when running the code.
- retrieve_list: a list of relative file pathnames that will be copied from the cluster to the AiiDA server after the calculation has run on the cluster. Note that all the file names you need to specify are not absolute path names (you don't know the name of the folder where the calculation will run) but rather paths relative to the scratch folder.
- local_copy_list: a list of length-two tuples: (localabspath, relativedestpath). Files to be copied from the AiiDA server to the cluster.
- remote_copy_list: a list of tuples: (remotemachinename, remoteabspath, relativedestpath). Files/folders to be copied from a remote source to a remote destination, both sitting on the same machine.
- retrieve_singlefile_list: a list of triplets, in the form ["linkname_from calc to singlefile", "subclass of singlefile", "filename"]. If this is specified, at the end of the calculation a SinglefileData-like object will be created in the database as a child of the calculation, if of course the file is found on the cluster.
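In practice, filling those fields boils down to building plain Python lists of strings and tuples. A hypothetical example (all file names and paths below are invented for illustration):

```python
# files to copy from the AiiDA server to the cluster:
# (source absolute path, destination path relative to the scratch folder)
local_copy_list = [
    ('/home/user/pseudo/He.upf', 'pseudo/He.upf'),
]

# files/folders to copy between two locations on the same remote machine:
# (remote machine name, source absolute path, relative destination path)
remote_copy_list = [
    ('mycluster', '/scratch/old_calc/restart_file', './restart_file'),
]

# files to bring back to the AiiDA server after the run
retrieve_list = ['aiida.out', 'output_data.xml']

# triplets: [linkname, singlefile subclass name, filename]
retrieve_singlefile_list = [
    ['output_structure', 'singlefile', 'final_structure.dat'],
]

# destination paths must be relative, never absolute
for src, dest in local_copy_list:
    print(dest.startswith('/'))  # False
```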
If you need other settings to make the plugin work, you likely need to add more information to the calcinfo than what we showed here. For the full definition of CalcInfo, refer to the source in aiida.common.datastructures.
That's all that is needed to write an input plugin.
To test that everything works properly, remember to use the calculation.submit_test() method, which creates the folder to be sent to the cluster locally, without submitting the calculation to the cluster.
OutputPlugin¶
Well done! You now have a working input plugin.
Let's see what you need to do for an output plugin.
First of all, let's create a new folder, $path_to_aiida/aiida/parsers/plugins/the_name_of_new_code, and put an empty __init__.py file there.
Here you will write in a new python file the output parser class.
It is actually a rather simple class, performing only a few (but tedious) tasks.
After the calculation has been computed and retrieved from the cluster, that is, at the moment when the parser is going to be called, the calculation has two children: a RemoteData and a FolderData. The RemoteData is an object which represents the scratch folder on the cluster: you don’t need it for the parsing phase. The FolderData is the folder in the AiiDA server which contains the files that have been retrieved from the cluster. Moreover, if you specified a retrieve_singlefile_list, at this stage there is also going to be some children of SinglefileData kind.
Let's say that you copied the standard output into the FolderData. The parser then has just a few tasks:
1. open the files in the FolderData
2. read them
3. convert the information into objects that can be saved in the database
4. return the objects and the linknames.
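As a toy example of step 3, here is a standalone raw-parsing function for a hypothetical output format (the `key = value` layout and the 'warnings' key are assumptions for this sketch, not the format of any real code):

```python
def parse_raw_output(out_file_lines):
    """Parse lines like 'energy = -1.5' into a dict; collect anything
    unexpected into a 'warnings' list and flag success accordingly."""
    out_dict = {'warnings': []}
    successful = True
    for line in out_file_lines:
        line = line.strip()
        if not line:
            continue
        if '=' in line:
            key, _, value = line.partition('=')
            try:
                out_dict[key.strip()] = float(value)
            except ValueError:
                out_dict['warnings'].append("Could not parse: " + line)
                successful = False
        else:
            out_dict['warnings'].append("Unexpected line: " + line)
            successful = False
    return out_dict, successful


out_dict, ok = parse_raw_output(["energy = -1.5", "forces = 0.01"])
print(out_dict['energy'], ok)  # -1.5 True
```

Storing the warnings in a dedicated key, as suggested in the note below, lets a workflow inspect them later and react to failures.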
Note
The parser should not save any object in the DB, that is
a task of the daemon: never use a .store()
method!
Basically, you just need to specify an __init__() method, and a method parse_with_retrieved(self, retrieved), which does the actual work.
The difficult and long part is point 3, the actual parsing stage, which converts text into Python objects. Here, you should try to parse as much as you can from the output files: the more you parse, the better.
Note
You should not only parse physical values: a very important thing that can be used by workflows is the set of exceptions or other errors occurring during the calculation. You could save them in a dedicated key of the dictionary (say 'warnings'); later, a workflow can easily read the exceptions from the results and perform a dedicated correction!
In principle, you can save the information in an arbitrary number of objects. The most useful classes to store the information back into the DB are:
- ParameterData: this is the DB representation of a Python dictionary. If you put everything in a single ParameterData, it can easily be accessed from the calculation with the .res method. If you have to store arrays, large lists, or matrices, consider using ArrayData instead.
- ArrayData: if you need to store large arrays of values, for example a list of points or a molecular dynamics trajectory, we strongly encourage you to use this class. At variance with ParameterData, the values are not stored in the DB, but are written to a file (mapped back in the DB). If you instead store large arrays of numbers in the DB with ParameterData, you might soon realize that: a) the DB grows large really rapidly; b) the time it takes to save an object in the DB gets very large.
- StructureData: if your code relaxes an input structure, you can end up with an output structure.
Of course, you can create new classes to be stored in the DB, and use them at your own advantage.
A kind of template for writing such parser for the calculation class
NewCalculation
is as follows:
class NewParser(Parser):
    """
    A doc string
    """
    def __init__(self, calc):
        """
        Initialize the instance of NewParser
        """
        # check for valid input
        if not isinstance(calc, NewCalculation):
            raise ParsingError("Input calc must be a NewCalculation")
        super(NewParser, self).__init__(calc)

    def parse_with_retrieved(self, retrieved):
        """
        Parses the calculation-output datafolder, and stores
        results.

        :param retrieved: a dictionary of retrieved nodes, where the keys
            are the link names of retrieved nodes, and the values are the
            nodes.
        """
        # check the calc status, not to overwrite anything
        state = self._calc.get_state()
        if state != calc_states.PARSING:
            raise InvalidOperation("Calculation not in {} state"
                                   .format(calc_states.PARSING))

        # retrieve the whole list of input links
        calc_input_parameterdata = self._calc.get_inputs(type=ParameterData,
                                                         also_labels=True)
        # then look for parameterdata only
        input_param_name = self._calc.get_linkname('parameters')
        params = [i[1] for i in calc_input_parameterdata
                  if i[0] == input_param_name]
        if len(params) != 1:
            # Use self.logger to log errors, warnings, ...
            # This will also add an entry to the DbLog table associated
            # to the calculation that we are trying to parse, that can
            # be then seen using 'verdi calculation logshow'
            self.logger.error("Found {} input_params instead of one"
                              .format(len(params)))
            return False, ()
        calc_input = params[0]

        # Check that the retrieved folder is there
        try:
            out_folder = retrieved[self._calc._get_linkname_retrieved()]
        except KeyError:
            self.logger.error("No retrieved folder found")
            return False, ()

        # check what is inside the folder
        list_of_files = out_folder.get_folder_list()
        # at least the stdout should exist
        if self._calc.OUTPUT_FILE_NAME not in list_of_files:
            raise ParsingError("Standard output not found")
        # get the path to the standard output
        out_file = os.path.join(out_folder.get_abs_path('.'),
                                self._calc.OUTPUT_FILE_NAME)

        # read the file
        with open(out_file) as f:
            out_file_lines = f.readlines()

        # call the raw parsing function. Here it is assumed to return a
        # dictionary with all keys and values parsed from the out_file
        # (i.e. energy, forces, etc...) and a boolean indicating whether
        # the calculation was successful or not.
        # In practice, this is the function deciding the final status
        # of the calculation
        out_dict, successful = parse_raw_output(out_file_lines)

        # convert the dictionary into an AiiDA object, here a
        # ParameterData for instance
        output_params = ParameterData(dict=out_dict)

        # prepare the list of output nodes to be returned
        # this must be a list of tuples having 2 elements each: the name of the
        # linkname in the database (the one below, self.get_linkname_outparams(),
        # is defined in the Parser class), and the object to be saved
        new_nodes_list = [(self.get_linkname_outparams(), output_params)]

        # The calculation state will be set to failed if successful=False,
        # to finished otherwise
        return successful, new_nodes_list
Example¶
In this example, we support a code that performs the sum of two integers.
We imagine creating a calculation plugin that supports the code, and that can be run using a script like this one.
First, we need to create an executable on the remote machine (which might as well be your localhost, if you installed a scheduler there).
Therefore, put this script on your remote computer and install it as a code in AiiDA.
Such a script takes an input file on the command line, reads the JSON file and sums the two keys that it finds in it.
The output produced is another JSON file.
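The logic of such an executable can be sketched in a few lines (the key names value1, value2 and sum below are assumptions for this example, not necessarily those used by the actual script):

```python
import json
import os
import tempfile


def run_sum(input_path, output_path):
    """Read a JSON file with two addends and write their sum as JSON."""
    with open(input_path) as f:
        in_dict = json.load(f)
    out_dict = {'sum': in_dict['value1'] + in_dict['value2']}
    with open(output_path, 'w') as f:
        json.dump(out_dict, f)


# usage: write an input file, run the "code", read the result back
workdir = tempfile.mkdtemp()
inp = os.path.join(workdir, 'in.json')
out = os.path.join(workdir, 'out.json')
with open(inp, 'w') as f:
    json.dump({'value1': 3, 'value2': 4}, f)
run_sum(inp, out)
with open(out) as f:
    print(json.load(f))  # {'sum': 7}
```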
Therefore, we create an input plugin for a SumCalculation, which can be done in a few lines as in the file aiida/orm/calculation/job/sum.py.
The test can now be run, but the calculation node will only have a RemoteData and a retrieved FolderData, which are not queryable.
So, we create a parser (aiida/parsers/plugins/sum.py
)
which will read the output files and will create a ParameterData in output.
As you can see, with a few lines we can support a new simple code. The most time-consuming part in the development of a plugin is hidden in this example for simplicity. For the input plugin, it consists in converting the input nodes into the files used by the calculation. For the parser, the problem is the opposite: converting a text file produced by the executable into AiiDA objects. Here we only have a dictionary in input and in output, so that its conversion to and from a JSON file can be done in one line; in general, however, the difficulty of these operations depends on the details of the code you want to support.
Remember also that you can introduce new Data types to support new features, or just to have a simpler and more intuitive interface. For example, the code above is not optimal if you want to pass the results of two SumCalculations to a third one and sum their results (the names of the keys of the output dictionary differ from those of the input). As a relatively simple exercise, before jumping in to develop the support for a serious code, try to create a new FloatData class, which saves the value of a number in the DB:
class FloatData(Data):

    @property
    def value(self):
        """
        The value of the Float
        """
        return self.get_attr('number')

    @value.setter
    def value(self, value):
        """
        Set the value of the Float
        """
        self._set_attr('number', value)
Try to adapt the previous SumCalculation so that it accepts two FloatData nodes as input and produces a FloatData in output. Note that you can do this without changing the executable (a rather useless remark in this example, but more interesting if you want to support a real code!).