Processes

Before you start working with processes, make sure you have read and understood the basic concepts. This section explains the aspects of working with processes that apply to all types of processes. Details that pertain only to a specific subtype of process are documented in their respective sections:

Since all of these are types of processes, everything explained in this section applies to each and every one of them. That makes it worthwhile to read and understand this section well, as the concepts apply so broadly. For the same reason, however, this section may at times feel a bit abstract. It may therefore be advisable to first read the section on one of the more specific processes listed above to get a concrete example, and then refer back here for a more extensive explanation of the details.

Defining processes

Process specification

How a process defines the inputs that it requires or can optionally take depends on the process type. The inputs of CalcJob and WorkChain are given by the ProcessSpec class, which is defined through the define() method. For process functions, the ProcessSpec is dynamically generated by the engine from the signature of the decorated function. Therefore, to determine what inputs a process takes, one simply has to look at the process specification in the define method or at the function signature. For the CalcJob and WorkChain there is also the concept of the process builder, which allows one to inspect the inputs with tab-completion and help strings in the shell.

The three most important attributes of the ProcessSpec are:

  • inputs
  • outputs
  • exit_codes

Through these attributes, one can define what inputs a process takes, what outputs it will produce and what potential exit codes it can return in case of errors. Just by looking at a process specification, then, one knows exactly what will happen, just not how it will happen. The inputs and outputs attributes are namespaces that contain so-called ports, each of which represents a specific input or output. These namespaces can be arbitrarily nested with ports and are therefore called port namespaces. The port and port namespace are implemented by the Port and PortNamespace classes, respectively.

Ports and Port namespaces

To define an input for a process specification, we only need to add a port to the inputs port namespace, as follows:

spec = ProcessSpec()
spec.input('parameters')

The input() method will create an instance of InputPort, a subclass of the base Port, and add it to the inputs port namespace of the spec. Creating an output is just as easy, but one should use the output() method instead:

spec = ProcessSpec()
spec.output('result')

This will cause an instance of OutputPort, also a subclass of the base Port, to be created and added to the outputs attribute of the specification. Recall that inputs and outputs are instances of a PortNamespace, which means that they can contain any port. But the PortNamespace is also a port itself, so it can be added to another port namespace, allowing one to create nested port namespaces. Creating a new namespace in, for example, the inputs namespace is as simple as:

spec = ProcessSpec()
spec.input_namespace('namespace')

This will create a new PortNamespace named namespace in the inputs namespace of the spec. You can create arbitrarily nested namespaces in one statement by separating the names with a period (.), as shown here:

spec = ProcessSpec()
spec.input_namespace('nested.namespace')

This command will result in the PortNamespace named namespace being nested inside another PortNamespace called nested.

Note

Because the period is reserved to denote different nested namespaces, it cannot be used in the name of terminal input and output ports as that could be misinterpreted later as a port nested in a namespace.

Graphically, this can be visualized as a nested dictionary and will look like the following:

'inputs': {
    'nested': {
        'namespace': {}
    }
}
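To make the dotted-name convention concrete, here is a toy stand-in written in plain Python. It is not AiiDA's PortNamespace: the class ToyPortNamespace and its methods create_namespace, add_port and as_dict are hypothetical, and only model how a dotted name expands into nested namespaces and why a period cannot appear in a terminal port name.

```python
# Toy model only: ToyPortNamespace is hypothetical, not AiiDA's PortNamespace.
class ToyPortNamespace:
    """A namespace that holds named ports or further nested namespaces."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def create_namespace(self, dotted_name):
        """Create (or reuse) nested namespaces from a name like 'nested.namespace'."""
        namespace = self
        for part in dotted_name.split('.'):
            namespace = namespace.children.setdefault(part, ToyPortNamespace(part))
        return namespace

    def add_port(self, name):
        """Add a terminal port; a period would be ambiguous, so it is rejected."""
        if '.' in name:
            raise ValueError(f"port name '{name}' may not contain a period")
        self.children[name] = None  # None stands in for a terminal port

    def as_dict(self):
        """Render the structure as a nested dictionary, as in the text above."""
        return {
            key: child.as_dict() if isinstance(child, ToyPortNamespace) else child
            for key, child in self.children.items()
        }


inputs = ToyPortNamespace('inputs')
inputs.create_namespace('nested.namespace')
print({'inputs': inputs.as_dict()})  # {'inputs': {'nested': {'namespace': {}}}}
```

Calling create_namespace again with a prefix that already exists returns the existing namespace rather than replacing it, which mirrors the idea that nesting only adds structure.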

The outputs attribute of the ProcessSpec is also a PortNamespace, just like inputs, with the only difference that it creates OutputPort instead of InputPort instances. Therefore the same concept of nesting through PortNamespaces applies to the outputs of a ProcessSpec.

Validation and defaults

In the previous section, we saw that the ProcessSpec uses the PortNamespace, InputPort and OutputPort to define the inputs and outputs structure of the Process. The underlying concept that allows this nesting of ports is that the PortNamespace, InputPort and OutputPort are all subclasses of Port. As subclasses of the same class, they share more properties and attributes, for example related to validation and default values. All three have the following attributes (with the exception that the OutputPort does not have a default attribute):

  • default
  • required
  • valid_type
  • validator

These attributes can all be set upon construction of the port or afterwards, as long as the spec has not been sealed. In practice, this means that they can be altered freely within the define method of the corresponding Process. An example input port that explicitly sets all these attributes is the following:

spec.input('positive_number', required=False, default=Int(1), valid_type=(Int, Float), validator=is_number_positive)

Here we define an input named positive_number that is not required: if a value is not explicitly passed, the default Int(1) will be used, and if a value is passed, it should be of type Int or Float and it should be valid according to the is_number_positive validator. Note that the validator is nothing more than a free function that takes a single argument, the value to be validated. If nothing is returned, the value is considered valid. To signal that the value is invalid and to have a validation error raised, simply return a string with the validation error message, for example:

def is_number_positive(number):
    if number < 0:
        return 'The number has to be greater or equal to zero'

The valid_type can define a single type, or a tuple of valid types.

Note

By default all ports are required, but specifying a default value implies that the input is not required, so specifying required=False is not necessary in that case. It was added to the example above simply for clarity.

The validation of input or output values with respect to the specification of the corresponding port, happens at the instantiation of the process and when it is finalized, respectively. If the inputs are invalid, a corresponding exception will be thrown and the process instantiation will fail. When the outputs fail to be validated, likewise an exception will be thrown and the process state will be set to Excepted.
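The interplay of default, required, valid_type and validator can be pictured with a small plain-Python model. This is a sketch under stated assumptions, not AiiDA's implementation: ToyInputPort and its validate method are hypothetical, plain int and float stand in for Int and Float, and the order of the checks (default, then type, then validator) is an assumption chosen for illustration.

```python
# Toy model only: ToyInputPort is hypothetical and mimics the attributes
# default, required, valid_type and validator described above.
_NO_DEFAULT = object()  # sentinel meaning "no default was specified"


class ToyInputPort:
    def __init__(self, name, required=True, default=_NO_DEFAULT,
                 valid_type=None, validator=None):
        self.name = name
        # Specifying a default implies the port is not required.
        self.required = required and default is _NO_DEFAULT
        self.default = default
        self.valid_type = valid_type  # a single type or a tuple of types
        self.validator = validator

    def validate(self, value=_NO_DEFAULT):
        """Return (value, error); error is None when the value is valid."""
        if value is _NO_DEFAULT:
            if self.default is not _NO_DEFAULT:
                value = self.default
            elif self.required:
                return None, f"required value was not provided for '{self.name}'"
            else:
                return None, None
        # Check the type first, so the validator only ever sees valid types.
        if self.valid_type is not None and not isinstance(value, self.valid_type):
            return None, f"value for '{self.name}' is not of type {self.valid_type}"
        if self.validator is not None:
            message = self.validator(value)
            if message is not None:
                return None, message
        return value, None


def is_number_positive(number):
    if number < 0:
        return 'The number has to be greater or equal to zero'


port = ToyInputPort('positive_number', default=1, valid_type=(int, float),
                    validator=is_number_positive)
print(port.validate())     # (1, None)
print(port.validate(2.5))  # (2.5, None)
print(port.validate(-3))   # (None, 'The number has to be greater or equal to zero')
```

Note how passing default=1 alone is enough to make the port optional, matching the note about required=False above.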

Dynamic namespaces

In the previous section we described the various attributes related to validation and claimed that all the port variants share those attributes, yet we only discussed the InputPort and OutputPort explicitly. The statement, however, is still correct and the PortNamespace has the same attributes. You might then wonder what the meaning is of a valid_type or default for a PortNamespace if all it does is contain InputPorts, OutputPorts or other PortNamespaces. The answer to this question lies in the PortNamespace attribute dynamic.

Often, when designing the specification of a Process, we cannot know exactly which inputs we want to be able to pass to the process. However, with the InputPort and OutputPort one does need to know exactly how many values to expect at minimum, as each of them has to be defined explicitly. This is where the dynamic attribute of the PortNamespace comes in. By default it is set to False, but by setting it to True one indicates that the namespace can take a number of values that is unknown at the time the specification is defined. This explains the meaning of the valid_type, validator and default attributes in the context of the PortNamespace. If you mark a namespace as dynamic, you may still want to limit the set of acceptable values, which you can do by specifying the valid_type and/or validator. The values that are eventually passed to the port namespace will then be validated according to these rules, exactly as a value for a regular input port would be.
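The idea of a dynamic namespace, one set of rules applied to an unknown number of keys, can be sketched in plain Python. ToyDynamicNamespace, its validate method and the error-message format are hypothetical; they only illustrate that every value passed into the namespace is checked against the same valid_type and validator.

```python
# Toy model only: ToyDynamicNamespace is hypothetical. It accepts any
# number of keys and checks every value against one valid_type/validator.
class ToyDynamicNamespace:
    def __init__(self, name, valid_type=None, validator=None):
        self.name = name
        self.valid_type = valid_type
        self.validator = validator

    def validate(self, values):
        """Return a list of error messages, empty if all values are valid."""
        errors = []
        for key, value in values.items():
            if self.valid_type is not None and not isinstance(value, self.valid_type):
                errors.append(f'{self.name}.{key}: expected {self.valid_type}, '
                              f'got {type(value).__name__}')
                continue  # skip the validator for values of the wrong type
            if self.validator is not None:
                message = self.validator(value)
                if message is not None:
                    errors.append(f'{self.name}.{key}: {message}')
        return errors


def is_non_negative(number):
    if number < 0:
        return 'must not be negative'


weights = ToyDynamicNamespace('weights', valid_type=(int, float), validator=is_non_negative)
print(weights.validate({'a': 1, 'b': 0.5, 'c': 3}))  # []
print(weights.validate({'a': -1, 'b': 'x'}))
```

The key names a, b and c were never declared anywhere; only the rules were.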

Non storable inputs

In principle, the only valid types for inputs and outputs should be instances of a Data node, or one of its subclasses, as that is the only data type that can be recorded in the provenance graph as an input or output of a process. However, there are cases where you might want to pass an input to a process whose provenance you do not care about, and would therefore want to pass a type that cannot be stored in the database.

Note

AiiDA allows you to break the provenance so as not to be too restrictive, but it always tries to urge and guide you toward keeping the provenance. There are legitimate reasons to break it regardless, but make sure you think about the implications and whether you are really willing to lose the information.

For this situation, the InputPort has the attribute non_db. By default it is set to False, but setting it to True marks the port such that the values passed to it are not stored as nodes in the provenance graph nor linked to the process node. This allows one to pass any value that one would also be able to pass to a normal function.
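The effect of the non_db flag on what ends up in the provenance graph can be pictured with a tiny sketch. The PORTS table and split_inputs helper are hypothetical and do not reflect how AiiDA handles inputs internally; they only show which inputs would be stored and linked versus passed through untouched.

```python
# Toy model only: the port table and the helper below are hypothetical.
# Each entry maps an input name to its non_db flag.
PORTS = {'x': False, 'y': False, 'verbose': True}


def split_inputs(inputs, ports=PORTS):
    """Separate inputs that become linked nodes from non_db pass-through values."""
    to_store, not_stored = {}, {}
    for name, value in inputs.items():
        if ports[name]:
            not_stored[name] = value  # passed to the process as-is, never stored
        else:
            to_store[name] = value    # would be stored and linked as an input node
    return to_store, not_stored


stored, passed = split_inputs({'x': 1, 'y': 2, 'verbose': True})
print(stored)  # {'x': 1, 'y': 2}
print(passed)  # {'verbose': True}
```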

Automatic input serialization

Quite often, inputs that are given as Python data types need to be cast to the corresponding AiiDA type before being passed to a process. Doing this manually can be cumbersome, so you can define a function in the process specification that does the conversion automatically. This function, passed as the serializer parameter to spec.input, is invoked if the given input is not already an AiiDA type.

For inputs which are stored in the database (non_db=False), the serialization function should return an AiiDA data type. For non_db inputs, the function must be idempotent because it might be applied more than once.

The following example work chain takes three inputs a, b, c, and simply returns the given inputs. The aiida.orm.nodes.data.base.to_aiida_type() function is used as serialization function.

# -*- coding: utf-8 -*-
from aiida.engine import WorkChain
from aiida.orm.nodes.data import to_aiida_type
# The basic types need to be loaded such that they are registered with
# the 'to_aiida_type' function.
from aiida.orm.nodes.data.base import *


class SerializeWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super(SerializeWorkChain, cls).define(spec)

        spec.input('a', serializer=to_aiida_type)
        spec.input('b', serializer=to_aiida_type)
        spec.input('c', serializer=to_aiida_type)

        spec.outline(cls.echo)

    def echo(self):
        self.out('a', self.inputs.a)
        self.out('b', self.inputs.b)
        self.out('c', self.inputs.c)

This work chain can now be called with native Python types, which will automatically be converted to AiiDA types by the aiida.orm.nodes.data.base.to_aiida_type() function. Note that the module that defines the corresponding AiiDA type must be loaded for the type to be recognized by aiida.orm.nodes.data.base.to_aiida_type().

#!/usr/bin/env runaiida
from aiida.engine import run

from serialize_workchain import SerializeWorkChain

if __name__ == '__main__':
    print(run(
        SerializeWorkChain,
        a=1, b=1.2, c=True
    ))
    # Result: {'a': 1, 'b': 1.2, 'c': True}

Of course, you can also use the serialization feature to perform a more complex serialization of the inputs.
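When writing your own serializer, keep in mind the requirement stated earlier that, for non_db inputs, it must be idempotent: applying it twice must give the same result as applying it once. The sketch below illustrates that property with a hypothetical ToyInt class standing in for an AiiDA data type; to_toy_type is likewise hypothetical and is not to_aiida_type.

```python
# Toy model only: ToyInt stands in for an AiiDA data type; it is hypothetical.
class ToyInt:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, ToyInt) and other.value == self.value

    def __repr__(self):
        return f'ToyInt({self.value})'


def to_toy_type(value):
    """Serializer that wraps a plain int but leaves an already-wrapped value alone."""
    if isinstance(value, ToyInt):
        return value  # already converted: applying it again changes nothing
    if isinstance(value, int) and not isinstance(value, bool):
        return ToyInt(value)
    raise TypeError(f'cannot serialize {type(value).__name__}')


once = to_toy_type(5)
twice = to_toy_type(once)
print(once, twice, once == twice)  # ToyInt(5) ToyInt(5) True
```

The early return for values that are already converted is what makes the function safe to apply more than once.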

Exit codes

Any Process most likely will have one or more expected failure modes. To clearly communicate to the caller what went wrong, the Process supports setting its exit_status. This exit_status, a non-negative integer, is an attribute of the process node. By convention, zero means the process was successful, whereas any other value indicates failure. This concept of an exit code, with an integer exit status, is common in programming and a standard way for programs to communicate the result of their execution.

Potential exit codes for the Process can be defined through the ProcessSpec, just like inputs and outputs. Each exit code consists of a positive non-zero integer, a string label to reference it and a more detailed description of the problem that triggers the exit code. Consider the following example:

spec = ProcessSpec()
spec.exit_code(418, 'ERROR_I_AM_A_TEAPOT', 'the process had an identity crisis')

This defines an exit code for the Process with exit status 418 and exit message ‘the process had an identity crisis’. The string ERROR_I_AM_A_TEAPOT is a label that the developer can use to reference this particular exit code somewhere in the Process code itself.
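The triple of status, label and message can be modeled in a few lines of plain Python. ToyExitCode and ToyExitCodes are hypothetical names for this sketch; they only show how a label can serve as a readable handle for a registered exit code and why zero is excluded (it is reserved for success).

```python
from collections import namedtuple

# Toy model only: ToyExitCode and ToyExitCodes are hypothetical stand-ins
# for the exit codes registered on a process specification.
ToyExitCode = namedtuple('ToyExitCode', ['status', 'label', 'message'])


class ToyExitCodes:
    def __init__(self):
        self._by_label = {}

    def add(self, status, label, message):
        if status <= 0:
            raise ValueError('exit status must be a positive non-zero integer')
        self._by_label[label] = ToyExitCode(status, label, message)

    def __getattr__(self, label):
        # Only called for attributes not found normally, i.e. exit code labels.
        try:
            return self._by_label[label]
        except KeyError:
            raise AttributeError(label) from None


exit_codes = ToyExitCodes()
exit_codes.add(418, 'ERROR_I_AM_A_TEAPOT', 'the process had an identity crisis')
print(exit_codes.ERROR_I_AM_A_TEAPOT.status)  # 418
```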

Whenever a Process exits with a particular exit code, the caller is able to introspect it through the exit_status and exit_message attributes of the node. Assume for example that we ran a Process that returned the exit code described above; the caller would then be able to do the following:

In [1]: node = load_node(<pk>)
In [2]: node.exit_status
Out[2]: 418
In [3]: node.exit_message
Out[3]: 'the process had an identity crisis'

This is useful because the caller can now decide programmatically, based on the exit_status, how to proceed. This is a far more robust way of communicating specific errors to a non-human than parsing text-based logs or reports. Additionally, exit codes make it very easy to query for failed processes with specific error codes.
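A minimal sketch of the caller side, assuming simple records that carry the exit_status and exit_message attributes described above. The records, the next_action policy and the status 300 entry are hypothetical illustration data; in AiiDA such a query would run against the database rather than a Python list.

```python
from types import SimpleNamespace

# Toy model only: these records stand in for process nodes; the fields
# mirror the exit_status and exit_message attributes described above.
nodes = [
    SimpleNamespace(pk=1, exit_status=0, exit_message=None),
    SimpleNamespace(pk=2, exit_status=418, exit_message='the process had an identity crisis'),
    SimpleNamespace(pk=3, exit_status=418, exit_message='the process had an identity crisis'),
    SimpleNamespace(pk=4, exit_status=300, exit_message='some other failure'),
]


def next_action(node):
    """Decide how to proceed from the exit status alone, without parsing logs."""
    if node.exit_status == 0:
        return 'continue'
    if node.exit_status == 418:
        return 'restart'  # hypothetical policy for this particular failure mode
    return 'abort'


failed_with_418 = [node.pk for node in nodes if node.exit_status == 418]
print([next_action(node) for node in nodes])  # ['continue', 'restart', 'restart', 'abort']
print(failed_with_418)                        # [2, 3]
```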

Process metadata

Each process, in addition to the normal inputs defined through its process specification, can take optional ‘metadata’. These metadata differ from inputs in the sense that they are not nodes that will show up as inputs in the provenance graph of the executed process. Rather, these are inputs that slightly modify the behavior of the process or allow to set attributes on the process node that represents its execution. The following metadata inputs are available for all process classes:

  • label: will set the label on the ProcessNode
  • description: will set the description on the ProcessNode
  • store_provenance: boolean flag, by default True, that when set to False, will ensure that the execution of the process is not stored in the provenance graph

Subclasses of the Process class can specify further metadata inputs; refer to their specific documentation for details. To pass any of these metadata options to a process, simply pass them in a dictionary under the key metadata in the inputs when launching the process. How a process can be launched is explained in the following section.
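The shape of such an inputs dictionary, with the reserved metadata key next to the regular inputs, is sketched below in plain Python. The helper split_metadata is hypothetical and only illustrates the structure and the documented default of store_provenance; it is not how the engine parses its inputs.

```python
# Toy model only: split_metadata is hypothetical and just shows the shape of
# a launch call's inputs with the reserved 'metadata' key.
inputs = {
    'x': 1,
    'y': 2,
    'metadata': {
        'label': 'my addition',
        'description': 'adds two numbers',
        'store_provenance': False,
    },
}


def split_metadata(inputs):
    """Separate the metadata dictionary from the regular inputs."""
    regular = {key: value for key, value in inputs.items() if key != 'metadata'}
    metadata = dict(inputs.get('metadata', {}))
    metadata.setdefault('store_provenance', True)  # documented default is True
    return regular, metadata


regular, metadata = split_metadata(inputs)
print(regular)                       # {'x': 1, 'y': 2}
print(metadata['store_provenance'])  # False
```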

Launching processes

Any process can be launched by ‘running’ or ‘submitting’ it. Running means running the process in the current Python interpreter in a blocking way, whereas submitting means sending it to a daemon worker over RabbitMQ. For long-running processes, such as calculation jobs or complex workflows, it is best to submit to the daemon. This has the added benefit that control is returned to your interpreter directly, and the daemon can save intermediate progress during checkpoints and reload the process from those if it has to restart. Running processes can be useful for trivial computational tasks, such as simple calcfunctions or workfunctions, or for debugging and testing purposes.

Process launch

To launch a process, one can use the free functions that can be imported from the aiida.engine module. There are four different functions:

  • run
  • run_get_node
  • run_get_pk
  • submit

As the names suggest, the first three will ‘run’ the process and the last will ‘submit’ it to the daemon. Running means that the process is executed in the same interpreter in which it is launched, blocking the interpreter until the process has terminated. Submitting to the daemon, in contrast, means that the process is sent to the daemon for execution and the interpreter is released straight away.

All functions have the exact same interface launch(process, **inputs) where:

  • process is the process class or process function to launch
  • inputs are the inputs as keyword arguments to pass to the process.

What inputs can be passed depends on the exact process class that is to be launched. For example, when we want to run an instance of the ArithmeticAddCalculation process, which takes two Int nodes as inputs under the names x and y [1], we would do the following:

from aiida import orm
from aiida.engine import submit
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
node = submit(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))

The function will submit the calculation to the daemon and immediately return control to the interpreter, returning the node that is used to represent the process in the provenance graph.

Warning

Process functions, i.e. python functions decorated with the calcfunction or workfunction decorators, cannot be submitted but can only be run.

The run function is called identically:

from aiida import orm
from aiida.engine import run
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
result = run(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))

except that it does not submit the process to the daemon, but executes it in the current interpreter, blocking it until the process is terminated. The return value of the run function is also not the node that represents the executed process, but the results returned by the process, which is a dictionary of the nodes that were produced as outputs. If you would still like to have the process node or the pk of the process node you can use one of the following variants:

from aiida import orm
from aiida.engine import run_get_node, run_get_pk
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
result, node = run_get_node(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, pk = run_get_pk(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))

Finally, the run() launcher has two attributes get_node and get_pk that are simple proxies to the run_get_node() and run_get_pk() functions. This is a handy shortcut, as you can now use any of the three variants with just a single import:

from aiida import orm
from aiida.engine import run
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
result = run(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, node = run.get_node(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
result, pk = run.get_pk(ArithmeticAddCalculation, x=orm.Int(1), y=orm.Int(2))
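The proxy attributes rely on an ordinary Python feature: functions are objects, so other functions can be attached to them as attributes. The sketch below reproduces only that wiring with hypothetical launchers that call a plain Python callable and fabricate a dictionary as a stand-in ‘node’; none of it is AiiDA code.

```python
# Toy model only: these launchers are hypothetical and run a plain callable,
# but the proxy wiring mirrors the run.get_node / run.get_pk shortcut.
import itertools

_pk_counter = itertools.count(1)


def run_get_node(process, **inputs):
    """Run the 'process' and return (results, node)."""
    node = {'pk': next(_pk_counter), 'inputs': inputs}
    return process(**inputs), node


def run_get_pk(process, **inputs):
    """Run the 'process' and return (results, pk)."""
    result, node = run_get_node(process, **inputs)
    return result, node['pk']


def run(process, **inputs):
    """Run the 'process' and return only its results."""
    result, _ = run_get_node(process, **inputs)
    return result


# Attach the variants as attributes, so a single import gives access to all three.
run.get_node = run_get_node
run.get_pk = run_get_pk


def add(x, y):
    return {'sum': x + y}


print(run(add, x=1, y=2))  # {'sum': 3}
result, pk = run.get_pk(add, x=1, y=2)
```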

If you want to launch a process class that takes many more inputs, it is often useful to define them in a dictionary and use the Python ** syntax, which expands it into keyword argument and value pairs. The examples used above would look like the following:

from aiida import orm
from aiida.engine import submit
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
inputs = {
    'x': orm.Int(1),
    'y': orm.Int(2)
}
node = submit(ArithmeticAddCalculation, **inputs)

Process functions, i.e. calculation functions and work functions, can be launched like any other process as explained above, with the only exception that they cannot be submitted. In addition to this limitation, process functions have two additional methods of being launched:

  • Simply calling the function
  • Using the internal run method attributes

Using a calculation function to add two numbers as an example, these two methods look like the following:

from aiida.engine import calcfunction
from aiida.orm import Int

@calcfunction
def add(x, y):
    return x + y

x = Int(1)
y = Int(2)

result = add(x, y)
result, node = add.run_get_node(x, y)
result, pk = add.run_get_pk(x, y)
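How a decorated function can remain directly callable while also exposing run_get_node and run_get_pk can be illustrated with a plain decorator. toy_calcfunction is hypothetical and records nothing in any database; the dictionary it returns as a ‘node’ is a stand-in used only to show the three calling styles side by side.

```python
import functools
import itertools

# Toy model only: toy_calcfunction is hypothetical. It shows how a decorator
# can keep a function directly callable while adding run_get_node/run_get_pk.
_pks = itertools.count(1)


def toy_calcfunction(func):
    def run_get_node(*args, **kwargs):
        node = {'pk': next(_pks), 'function': func.__name__}
        return func(*args, **kwargs), node

    def run_get_pk(*args, **kwargs):
        result, node = run_get_node(*args, **kwargs)
        return result, node['pk']

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Calling the function directly returns just the result.
        return run_get_node(*args, **kwargs)[0]

    wrapper.run_get_node = run_get_node
    wrapper.run_get_pk = run_get_pk
    return wrapper


@toy_calcfunction
def add(x, y):
    return x + y


print(add(1, 2))                 # 3
result, node = add.run_get_node(1, 2)
print(result, node['function'])  # 3 add
```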

Process builder

As explained in a previous section, the inputs for a CalcJob and WorkChain are defined in the define() method. To know then what inputs they take, one would have to read the implementation, which can be annoying if you are not a developer. To simplify this process, these two process classes provide a utility called the ‘process builder’. The process builder is essentially a tool that helps you build the inputs for the specific process class that you want to run. To get a builder for a particular CalcJob or a WorkChain implementation, all you need is the class itself, which can be loaded through the CalculationFactory and WorkflowFactory, respectively. Let’s take the ArithmeticAddCalculation as an example:

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')
builder = ArithmeticAddCalculation.get_builder()

The string arithmetic.add is the entry point of the ArithmeticAddCalculation and passing it to the CalculationFactory will return the corresponding class. Calling the get_builder method on that class will return an instance of the ProcessBuilder class that is tailored for the ArithmeticAddCalculation. The builder will help you in defining the inputs that the ArithmeticAddCalculation requires and has a few handy tools to simplify this process.

To find out which inputs the builder exposes, you can simply use tab completion. In an interactive python shell, by simply typing builder. and hitting the tab key, a complete list of all the available inputs will be shown. Each input of the builder can also show additional information about what sort of input it expects. In an interactive shell, you can get this information to display as follows:

builder.code?
Type:        property
String form: <property object at 0x7f04c8ce1c00>
Docstring:
    "name": "code",
    "required": "True"
    "non_db": "False"
    "valid_type": "<class 'aiida.orm.nodes.data.code.Code'>"
    "help": "The Code to use for this job.",

In the Docstring you will see a help string that contains more detailed information about the input port. Additionally, it will display a valid_type, which when defined shows which data types are expected. If a default value has been defined, that will also be displayed. The non_db attribute defines whether that particular input will be stored as a proper input node in the database, if the process is submitted.

Defining an input through the builder is as simple as assigning a value to the attribute. The following example shows how to set the x and y inputs, as well as the description and label metadata inputs:

builder.metadata.label = 'This is my calculation label'
builder.metadata.description = 'An example calculation to demonstrate the process builder'
builder.x = Int(1)
builder.y = Int(2)

If you evaluate the builder instance, simply by typing the variable name and hitting enter, the current values of the builder’s inputs will be displayed:

builder
{
    'metadata': {
        'description': 'An example calculation to demonstrate the process builder',
        'label': 'This is my calculation label',
        'options': {},
    },
    'x': Int<uuid='a1798492-bbc9-4b92-a630-5f54bb2e865c' unstored>,
    'y': Int<uuid='39384da4-6203-41dc-9b07-60e6df24e621' unstored>
}

In this example, you can see the value that we just set for the description and the label. In addition, it will also show any namespaces, as the inputs of processes support nested namespaces, such as the metadata.options namespace in this example. Note that nested namespaces are also all autocompleted, and you can traverse them recursively with tab-completion.
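The behavior described above (attribute assignment for known inputs only, nested namespaces, tab-completion and a dictionary-like display) can be approximated with a small plain-Python class. ToyBuilder is hypothetical and is not AiiDA's ProcessBuilder; its spec format, a dictionary mapping names to None for a port or to a nested dictionary for a namespace, is an assumption made for this sketch.

```python
# Toy model only: ToyBuilder is hypothetical. It accepts only declared
# input names, supports nested namespaces, and renders itself as a dict.
class ToyBuilder:
    def __init__(self, spec):
        # spec maps names to None (a port) or to a nested dict (a namespace).
        object.__setattr__(self, '_spec', spec)
        object.__setattr__(self, '_values', {})
        for name, sub in spec.items():
            if isinstance(sub, dict):
                self._values[name] = ToyBuilder(sub)

    def __setattr__(self, name, value):
        if name not in self._spec or isinstance(self._spec[name], dict):
            raise AttributeError(f"'{name}' is not a valid input port")
        self._values[name] = value

    def __getattr__(self, name):
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Interactive shells use this list for tab completion.
        return list(self._spec)

    def as_dict(self):
        return {
            name: value.as_dict() if isinstance(value, ToyBuilder) else value
            for name, value in self._values.items()
        }


builder = ToyBuilder({'x': None, 'y': None,
                      'metadata': {'label': None, 'description': None, 'options': {}}})
builder.metadata.label = 'This is my calculation label'
builder.x = 1
builder.y = 2
print(builder.as_dict())
```

Rejecting unknown names at assignment time is the design choice that makes a builder safer than a plain dictionary: a typo fails immediately instead of at launch.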

All that remains is to fill in all the required inputs and we are ready to launch the process builder. Once all the inputs have been defined, the builder can be used to actually launch the Process: simply pass it to any of the launch functions of the aiida.engine module, just as you would a normal process as described above, i.e.:

from aiida import orm
from aiida.engine import submit
from aiida.plugins import CalculationFactory

ArithmeticAddCalculation = CalculationFactory('arithmetic.add')

builder = ArithmeticAddCalculation.get_builder()
builder.x = orm.Int(1)
builder.y = orm.Int(2)

node = submit(builder)

Note that the process builder is in principle designed to be used in an interactive shell, since that is where the tab-completion and automatic input documentation really shine. However, it is perfectly possible to use the same builder in scripts, where you simply use it as an input container instead of a plain Python dictionary.

Monitoring processes

When you have launched a process, you may want to investigate its status, progression and the results. The verdi command line tool provides various commands to do just this.

verdi process list

Your first point of entry will be the verdi command verdi process list. This command prints a list of all active processes, based on the ProcessNode that each process stores in the database to represent its execution. A typical example may look something like the following:

  PK  Created     State           Process label                 Process status
----  ----------  ------------    --------------------------    ----------------------
 151  3h ago      ⏵ Running       ArithmeticAddCalculation
 156  1s ago      ⏹ Created       ArithmeticAddCalculation


Total results: 2

The ‘State’ column is a concatenation of the process_state and the exit_status of the ProcessNode. By default, the command only shows active items, i.e. ProcessNodes that have not yet reached a terminal state. If you also want to show the nodes in a terminal state, you can use the -a flag and call verdi process list -a:

  PK  Created     State              Process label                  Process status
----  ----------  ---------------    --------------------------     ----------------------
 143  3h ago      ⏹ Finished [0]     add
 146  3h ago      ⏹ Finished [0]     multiply
 151  3h ago      ⏵ Running          ArithmeticAddCalculation
 156  1s ago      ⏹ Created          ArithmeticAddCalculation


Total results: 4

For more information on the meaning of the ‘State’ column, please refer to the documentation of the process state. The -S flag lets you query for specific process states; for example, issuing verdi process list -S created will return:

  PK  Created     State           Process label                  Process status
----  ----------  ------------    --------------------------     ----------------------
 156  1s ago      ⏹ Created       ArithmeticAddCalculation


Total results: 1

To query for a specific exit status, one can use verdi process list -E 0:

  PK  Created     State             Process label                 Process status
----  ----------  ------------      --------------------------    ----------------------
 143  3h ago      ⏹ Finished [0]    add
 146  3h ago      ⏹ Finished [0]    multiply


Total results: 2

This simple tool should give you a good idea of the current status of running processes and the status of terminated ones. For a complete list of all the available options, please refer to the documentation of verdi process.
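The filtering behavior shown in the example tables can be summarized in a short plain-Python sketch. The rows reproduce the example output above; list_processes is hypothetical, and the rule that an explicit -S or -E filter lifts the default ‘active only’ restriction is an assumption inferred from the example outputs rather than from the verdi source.

```python
# Toy model only: the rows reproduce the example tables above; the filter
# mimics the default (active only), -a, -S and -E behavior of verdi process list.
TERMINAL_STATES = {'finished', 'excepted', 'killed'}

ROWS = [
    {'pk': 143, 'state': 'finished', 'exit_status': 0, 'label': 'add'},
    {'pk': 146, 'state': 'finished', 'exit_status': 0, 'label': 'multiply'},
    {'pk': 151, 'state': 'running', 'exit_status': None, 'label': 'ArithmeticAddCalculation'},
    {'pk': 156, 'state': 'created', 'exit_status': None, 'label': 'ArithmeticAddCalculation'},
]


def list_processes(rows, all_entries=False, states=None, exit_status=None):
    """Return the pks that the given filters would select."""
    # Assumption: without -a, -S or -E only active (non-terminal) processes are shown.
    active_only = not all_entries and states is None and exit_status is None
    selected = []
    for row in rows:
        if active_only and row['state'] in TERMINAL_STATES:
            continue
        if states is not None and row['state'] not in states:
            continue
        if exit_status is not None and row['exit_status'] != exit_status:
            continue
        selected.append(row['pk'])
    return selected


print(list_processes(ROWS))                      # [151, 156]
print(list_processes(ROWS, all_entries=True))    # [143, 146, 151, 156]
print(list_processes(ROWS, states={'created'}))  # [156]
print(list_processes(ROWS, exit_status=0))       # [143, 146]
```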

If you are looking for information about a specific process node, the following three commands are at your disposal:

  • verdi process report gives a list of the log messages attached to the process
  • verdi process status prints the call hierarchy of the process and the status of all its nodes
  • verdi process show prints details about the status, inputs, outputs, callers and callees of the process

In the following sections, we will explain briefly how the commands work. For the purpose of example, we will show the output of the commands for a completed PwBaseWorkChain from the aiida-quantumespresso plugin, which simply calls a PwCalculation.

verdi process report

The developer of a process can attach log messages to the node of a process through the report() method. The verdi process report command will display all the log messages in chronological order:

2018-04-08 21:18:51 [164 | REPORT]: [164|PwBaseWorkChain|run_calculation]: launching PwCalculation<167> iteration #1
2018-04-08 21:18:55 [164 | REPORT]: [164|PwBaseWorkChain|inspect_calculation]: PwCalculation<167> completed successfully
2018-04-08 21:18:56 [164 | REPORT]: [164|PwBaseWorkChain|results]: work chain completed after 1 iterations
2018-04-08 21:18:56 [164 | REPORT]: [164|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned

The log message will include a timestamp followed by the level of the log, which is always REPORT. The second block has the format pk|class name|function name detailing information about, in this case, the work chain itself and the step in which the message was fired. Finally, the message itself is displayed. Of course how many messages are logged and how useful they are is up to the process developer. In general they can be very useful for a user to understand what has happened during the execution of the process, however, one has to realize that each entry is stored in the database, so overuse can unnecessarily bloat the database.
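If you want to post-process report output, for example to filter messages by the step in which they were fired, each line can be split into its parts with a regular expression. The pattern below is inferred from the example output above and is an assumption, not a format guaranteed by AiiDA; REPORT_LINE is a hypothetical name.

```python
import re

# Toy sketch: parse one line in the report format shown above. The regular
# expression is inferred from the example output, not taken from AiiDA itself.
REPORT_LINE = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'\[(?P<node_pk>\d+) \| (?P<level>\w+)\]: '
    r'\[(?P<pk>\d+)\|(?P<class_name>[^|]+)\|(?P<function_name>[^\]]+)\]: '
    r'(?P<message>.*)$'
)

line = ('2018-04-08 21:18:51 [164 | REPORT]: [164|PwBaseWorkChain|run_calculation]: '
        'launching PwCalculation<167> iteration #1')
fields = REPORT_LINE.match(line).groupdict()
print(fields['class_name'], fields['function_name'])  # PwBaseWorkChain run_calculation
print(fields['message'])  # launching PwCalculation<167> iteration #1
```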

verdi process status

This command is most useful for WorkChain instances, but also works for CalcJobs. One of the more powerful aspects of work chains is that they can call CalcJobs and other WorkChains to create a nested call hierarchy. If you want to inspect the status of a work chain and all the children that it called, verdi process status is the go-to tool. An example output is the following:

PwBaseWorkChain <pk=164> [ProcessState.FINISHED] [4:results]
    └── PwCalculation <pk=167> [FINISHED]

The command prints a tree representation of the hierarchical call structure that recurses all the way down. In this example, there is just a single PwBaseWorkChain, which called a PwCalculation, as indicated by the latter being indented one level. In addition to the call tree, each node also shows its current process state and, for work chains, at which step in the outline it is. This makes the tool very useful to check, while a work chain is running, which step of the outline it is currently at, as well as the status of all the child calculations it called.
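The recursive, indented layout of such a call tree can be reproduced with a few lines of Python. This is only a sketch of the drawing logic: format_tree and the nested node dictionaries are hypothetical, and the texts are copied from the example output above rather than obtained from AiiDA.

```python
# Toy sketch: print a call hierarchy in the spirit of verdi process status.
# The node dictionaries are hypothetical; only the tree drawing is illustrated.
def format_tree(node, prefix=''):
    """Return the lines of a tree, indenting each child one level deeper."""
    lines = []
    children = node.get('children', [])
    for index, child in enumerate(children):
        last = index == len(children) - 1
        connector = '└── ' if last else '├── '
        lines.append(prefix + connector + child['text'])
        # Recurse into the child, continuing the vertical bar if siblings follow.
        lines.extend(format_tree(child, prefix + ('    ' if last else '│   ')))
    return lines


root = {
    'text': 'PwBaseWorkChain <pk=164> [ProcessState.FINISHED] [4:results]',
    'children': [{'text': 'PwCalculation <pk=167> [FINISHED]'}],
}
print('\n'.join([root['text']] + format_tree(root, prefix='    ')))
```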

verdi process show

Finally, there is a command that displays detailed information about the ProcessNode, such as its inputs, outputs and, optionally, the other processes it called and/or was called by. An example output for a PwBaseWorkChain would look like the following:

Property       Value
-------------  ------------------------------------
type           WorkChainNode
pk             164
uuid           08bc5a3c-da7d-44e0-a91c-dda9ddcb638b
label
description
ctime          2018-04-08 21:18:50.850361+02:00
mtime          2018-04-08 21:18:50.850372+02:00
process state  ProcessState.FINISHED
exit status    0
code           pw-v6.1

Inputs            PK  Type
--------------  ----  -------------
parameters       158  Dict
structure        140  StructureData
kpoints          159  KpointsData
pseudo_family    161  Str
max_iterations   163  Int
clean_workdir    160  Bool
options          162  Dict

Outputs              PK  Type
-----------------  ----  -------------
output_band         170  BandsData
remote_folder       168  RemoteData
output_parameters   171  Dict
output_array        172  ArrayData

Called      PK  Type
--------  ----  -------------
CALL       167  PwCalculation

Log messages
---------------------------------------------
There are 4 log messages for this calculation
Run 'verdi process report 164' to see them

This overview should give you all the information you need to inspect a process’ inputs and outputs in closer detail, as it provides you with their PKs.

Manipulating processes

To understand how one can manipulate running processes, one has to understand the principles of the process/node distinction and a process’ lifetime, so be sure to have read those sections first.

verdi process pause/play/kill

The verdi command line interface provides three commands to interact with ‘live’ processes.

  • verdi process pause
  • verdi process play
  • verdi process kill

The first pauses a process temporarily, the second resumes any paused processes and the third permanently kills them. The sub command names may seem to say it all, but the functionality underneath is quite complicated and deserves some additional explanation.

As the section on the distinction between the process and the node explained, manipulating a process means interacting with the live process instance that lives in the memory of the runner that is running it. By definition, such a runner always runs in a different system process than the one from which you want to interact with it. Otherwise, you would be the runner: there can only be a single runner per interpreter, and while it is running, the interpreter is blocked from performing any other operations. In order to interact with the live process, one therefore has to communicate with another interpreter running in a different system process. This is once again facilitated by the RabbitMQ message broker: when a runner starts to run a process, it also adds listeners for incoming messages that are sent for that specific process over RabbitMQ.

Note

This does not just apply to daemon runners, but also to normal runners. If you launch a process in a local runner, that interpreter will be blocked, but it will still set up the listeners for that process on RabbitMQ. This means that you can manipulate the process from another terminal, just as you would with a process that is being run by a daemon runner.

In the case of ‘pause’, ‘play’ and ‘kill’, one is sending what is called a Remote Procedure Call (RPC) over RabbitMQ. The RPC includes the identifier of the process for which the action is intended, and RabbitMQ will deliver it to whoever registered itself as listening for that specific process, in this case the runner that is running the process. This immediately reveals a potential problem: the RPC will fall on deaf ears if no one is listening, which can have multiple causes. For example, as explained in the section on a process’ lifetime, this can be the case for a submitted process whose corresponding task is still queued because all available process slots are occupied. But even if the task has been picked up by a runner, the runner might be too busy to respond to the RPC, in which case the process appears to be unreachable. Whenever a process is unreachable for an RPC, the command will return an error:

Error: Process<100> is unreachable
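To make this routing and its failure mode concrete, the following toy model keeps a table of listeners keyed by process identifier, much like runners registering themselves for the processes they run. The ToyBroker class, its listen and send_rpc methods and the ProcessUnreachable exception are hypothetical names for this sketch only. This is not the RabbitMQ or AiiDA API, and it models only the case where nobody is listening, not a runner that is too busy to answer in time.

```python
from typing import Callable, Dict


class ProcessUnreachable(Exception):
    """Raised when nobody is listening for the targeted process (hypothetical)."""


class ToyBroker:
    """Hypothetical in-memory stand-in for the broker's RPC routing."""

    def __init__(self) -> None:
        # Maps a process identifier to the handler of the runner running it.
        self._listeners: Dict[int, Callable[[str], str]] = {}

    def listen(self, pid: int, handler: Callable[[str], str]) -> None:
        # A runner registers a listener as soon as it starts running a process.
        self._listeners[pid] = handler

    def send_rpc(self, pid: int, action: str) -> str:
        handler = self._listeners.get(pid)
        if handler is None:
            # E.g. the task is still queued because all process slots are occupied.
            raise ProcessUnreachable(f"Process<{pid}> is unreachable")
        return handler(action)


broker = ToyBroker()
broker.listen(42, lambda action: f"scheduled {action}")
print(broker.send_rpc(42, "pause"))
try:
    broker.send_rpc(100, "kill")  # no runner has registered for this process
except ProcessUnreachable as exc:
    print(f"Error: {exc}")
```

Note that, from the sender's side, a missing listener looks the same regardless of why it is missing, which mirrors the ambiguity discussed next.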

Depending on the cause of the process being unreachable, the problem may resolve itself over time and one can try again later, for example when the runner was simply too busy to respond. To prevent this from happening in the first place, the runner has been designed to handle communication on a separate thread and to schedule callbacks for any necessary actions on the main thread, which performs all the heavy lifting. This should make occurrences of the runner being too busy to respond very rare. Unfortunately, however, there is no way of telling what the actual reason is for a process not being reachable. The problem will manifest itself identically whether the runner simply could not respond in time or the task has accidentally been lost forever due to a bug, even though these are two completely separate situations.

This brings us to another potentially unintuitive aspect of interacting with processes. The previous paragraph already mentioned it in passing: when a remote procedure call is sent, it first needs to be answered by the responsible runner, if there is one, but the runner will not execute the call directly. This is because the call comes in on the communication thread, which is not allowed direct access to the process instance; instead, it schedules a callback on the main thread, which can perform the action. That callback will not necessarily be executed immediately either, as there may be other actions waiting to be performed. So when you pause, play or kill a process, you are not doing so directly, but rather scheduling a request to do so. If the runner has successfully received the request and scheduled the callback, the command will therefore show something like the following:

Success: scheduled killing Process<100>

The ‘scheduled’ indicates that the actual killing might not necessarily have happened just yet. This means that even after having called verdi process kill and getting the success message, the corresponding process may still be listed as active in the output of verdi process list.
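The split between a communication thread that only acknowledges and schedules, and a main thread that actually performs the action, can be sketched with Python's asyncio, where loop.call_soon_threadsafe is the standard way for another thread to schedule a callback on an event loop. This is a conceptual model under that assumption, not AiiDA's actual runner code; ToyProcess and on_rpc are hypothetical names.

```python
import asyncio
import threading
from typing import List, Tuple


class ToyProcess:
    """Hypothetical live process instance; only the main thread may touch it."""

    def __init__(self, pid: int) -> None:
        self.pid = pid
        self.paused = False

    def pause(self) -> None:
        self.paused = True


def on_rpc(loop: asyncio.AbstractEventLoop, process: ToyProcess) -> str:
    """Runs on the communication thread: schedule the action, do not perform it."""
    loop.call_soon_threadsafe(process.pause)
    return f"scheduled pausing Process<{process.pid}>"


async def main() -> Tuple[str, bool, bool]:
    loop = asyncio.get_running_loop()
    process = ToyProcess(100)
    replies: List[str] = []

    # Simulate an incoming RPC being handled on a separate communication thread.
    comms = threading.Thread(target=lambda: replies.append(on_rpc(loop, process)))
    comms.start()
    comms.join()  # briefly blocks the loop; acceptable in this toy example

    # The reply has been produced, but the callback has not run on the main thread yet.
    paused_when_acknowledged = process.paused
    await asyncio.sleep(0)  # yield to the event loop so the scheduled callback runs
    return replies[0], paused_when_acknowledged, process.paused


print(asyncio.run(main()))
```

The acknowledgement is available before the process is actually paused, which is exactly why a successful command only reports that the action was scheduled.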

By default, the pause, play and kill commands will only ask the runner to confirm that the request has been scheduled, and will not wait for the action to actually be executed. This is because, as explained, the action might not be performed instantaneously, since the runner may be busy working with other processes, which would mean that the command could block for a long time. If you want to send requests to many processes in one go, this would be inefficient, as each request would have to wait for the previous one to be completed. To change the default and actually wait for the action to be completed, you can use the --wait flag. If you know that your daemon runners may be experiencing a heavy load, you can also increase the time that the command waits before timing out with the -t/--timeout flag.
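The trade-off between returning as soon as the request is scheduled and waiting for completion with a timeout can be modelled with a single-worker thread pool standing in for a busy runner. This is a hypothetical sketch: send_request, its wait and timeout parameters and the kill function only mimic the behaviour of the --wait and -t/--timeout flags; they are not how verdi implements them.

```python
import concurrent.futures
import time

# A single worker stands in for a busy runner's main thread (hypothetical model).
runner = concurrent.futures.ThreadPoolExecutor(max_workers=1)


def kill(pid: int, delay: float) -> str:
    time.sleep(delay)  # the runner may first be busy with other work
    return f"Process<{pid}> killed"


def send_request(pid: int, delay: float, wait: bool = False, timeout: float = 5.0) -> str:
    """Either acknowledge the scheduled request, or also wait for its result."""
    future = runner.submit(kill, pid, delay)
    if not wait:
        # Default behaviour: return as soon as the request has been scheduled.
        return f"scheduled killing Process<{pid}>"
    try:
        # Waiting behaviour: block until the action completes or the timeout expires.
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Giving up waiting does not cancel the already scheduled action.
        return f"timed out waiting for Process<{pid}>"


print(send_request(1, 0.2))                          # returns immediately
print(send_request(2, 0.2, wait=True, timeout=2.0))  # waits behind request 1
print(send_request(3, 0.5, wait=True, timeout=0.1))  # gives up after the timeout
```

With many requests, the non-waiting variant returns after each submission, whereas the waiting variant serializes them behind whatever the runner is already doing.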

Footnotes

[1] Note that the ArithmeticAddCalculation process class also takes a code as input, but that has been omitted for the purposes of the example.