Caching: implementation details¶
This section covers some details of the caching mechanism which are not discussed in the user guide. If you are developing a plugin and want to modify the caching behavior of your classes, we recommend you read this section first.
Controlling hashing¶
Below are some methods you can use to control how the hashes of calculation and data classes are computed:
To ignore specific attributes, a Node subclass can have a _hash_ignored_attributes attribute. This is a list of attribute names, which are ignored when creating the hash.
For calculations, the _hash_ignored_inputs attribute lists inputs that should be ignored when creating the hash.
To add things which should be considered in the hash, you can override the _get_objects_to_hash() method. Note that doing so overrides the behavior described above, so you should make sure to call the parent implementation through super().
Pass keyword arguments to get_hash(). These are passed on to make_hash().
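How these hooks interact can be illustrated with a toy model. The sketch below does not use AiiDA itself: ToyNode, ToyStructure, toy_make_hash, the salt keyword and the JSON/SHA-256 hashing scheme are all assumptions made for illustration. Only the names _hash_ignored_attributes, _get_objects_to_hash() and get_hash() are taken from the text above.

```python
import hashlib
import json


def toy_make_hash(objects, salt=''):
    """Hypothetical stand-in for make_hash(): SHA-256 of a canonical JSON dump."""
    payload = json.dumps(objects, sort_keys=True) + salt
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


class ToyNode:
    """Hypothetical stand-in for a Node class; not AiiDA's implementation."""

    # Attribute names that are left out of the hash.
    _hash_ignored_attributes = ()

    def __init__(self, **attributes):
        self.attributes = attributes

    def _get_objects_to_hash(self):
        # Collect everything that contributes to the hash, minus ignored attributes.
        return [
            type(self).__name__,
            {
                key: value
                for key, value in self.attributes.items()
                if key not in self._hash_ignored_attributes
            },
        ]

    def get_hash(self, **kwargs):
        # Keyword arguments are forwarded to the hashing function.
        return toy_make_hash(self._get_objects_to_hash(), **kwargs)


class ToyStructure(ToyNode):
    _hash_ignored_attributes = ('label',)

    def __init__(self, version, **attributes):
        super().__init__(**attributes)
        self.version = version

    def _get_objects_to_hash(self):
        # Extend the parent's list instead of replacing it, so that the
        # ignored attributes are still filtered out.
        return super()._get_objects_to_hash() + [self.version]


a = ToyStructure(1, label='first', cell=[1.0, 2.0])
b = ToyStructure(1, label='second', cell=[1.0, 2.0])
c = ToyStructure(2, label='first', cell=[1.0, 2.0])

same_despite_label = a.get_hash() == b.get_hash()
differs_by_version = a.get_hash() != c.get_hash()
```

In this toy model, a and b share a hash because label is ignored, while c differs because the extra version object is part of the hash. Had ToyStructure returned only [self.version] without calling super(), the class name and the filtered attributes would have silently dropped out of the hash.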
Controlling caching¶
There are two methods you can use to disable caching for particular nodes:
The is_valid_cache() property determines whether a particular node can be used as a cache. This is used for example to disable caching from failed calculations.
Node classes have a _cachable attribute, which can be set to False to completely switch off caching for nodes of that class. This avoids performing queries for the hash altogether.
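The difference between the two switches can be sketched with a toy cache lookup. Everything below is hypothetical (ToyNode, ToyUncachedNode, find_cache_source, the failed flag and the query log) and only mirrors the roles of _cachable and is_valid_cache described above. It is not how AiiDA performs the lookup.

```python
class ToyNode:
    """Hypothetical stand-in for a Node class; not AiiDA's implementation."""

    _cachable = True

    def __init__(self, node_hash, failed=False):
        self.node_hash = node_hash
        self.failed = failed

    @property
    def is_valid_cache(self):
        # Assumed rule for this sketch: failed nodes are never used as a cache source.
        return not self.failed


class ToyUncachedNode(ToyNode):
    # Caching is switched off for every node of this class.
    _cachable = False


def find_cache_source(new_node, stored_nodes, query_log):
    """Return a stored node that can be reused for new_node, or None."""
    if not new_node._cachable:
        # Class-level switch: the hash query is skipped entirely.
        return None
    query_log.append(new_node.node_hash)
    for candidate in stored_nodes:
        # Node-level switch: a matching hash is not enough on its own.
        if candidate.node_hash == new_node.node_hash and candidate.is_valid_cache:
            return candidate
    return None


stored = [ToyNode('abc', failed=True), ToyNode('abc')]
queries = []
match = find_cache_source(ToyNode('abc'), stored, queries)
skipped = find_cache_source(ToyUncachedNode('abc'), stored, queries)
```

Here match is the second stored node, because the first one is rejected by is_valid_cache. The lookup for ToyUncachedNode returns None without adding anything to queries, which corresponds to avoiding the hash query altogether.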
The WorkflowNode example¶
As discussed in the user guide, nodes which can have RETURN links cannot be cached.
This is enforced on two levels:
The _cachable property is set to False in ProcessNode, and only re-enabled in CalcJobNode and CalcFunctionNode. This means that a WorkflowNode will not be cached.
The _store_from_cache method, which is used to “clone” an existing node, will raise an error if the existing node has any RETURN links. This extra safeguard prevents cases where a user might incorrectly override the _cachable property on a WorkflowNode subclass.
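The two levels of protection can be mimicked with a small class hierarchy. The classes below are hypothetical toys named after their AiiDA counterparts. The return_links list, the cloned_from attribute and the use of ValueError are assumptions for this sketch, since the text above does not say which error is raised.

```python
class ToyProcessNode:
    """Hypothetical stand-in for ProcessNode; not AiiDA's implementation."""

    # Level 1: caching is off by default for process nodes.
    _cachable = False

    def __init__(self, return_links=()):
        self.return_links = list(return_links)
        self.cloned_from = None

    def _store_from_cache(self, cache_node):
        # Level 2: refuse to clone a node that has RETURN links,
        # whatever the value of _cachable is.
        if cache_node.return_links:
            raise ValueError('cannot clone a node that has RETURN links')
        self.cloned_from = cache_node


class ToyCalcJobNode(ToyProcessNode):
    # Calculations cannot have RETURN links, so caching is re-enabled.
    _cachable = True


class ToyWorkflowNode(ToyProcessNode):
    # Inherits _cachable = False from ToyProcessNode.
    pass


class MisconfiguredWorkflowNode(ToyWorkflowNode):
    # An incorrect override that the first level cannot catch.
    _cachable = True


existing = ToyWorkflowNode(return_links=['result'])
try:
    MisconfiguredWorkflowNode()._store_from_cache(existing)
except ValueError as exc:
    outcome = f'refused: {exc}'
else:
    outcome = 'cloned'
```

In this sketch, outcome records a refusal: even though the subclass switched _cachable back on, the check inside _store_from_cache still blocks the clone.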
Design guidelines¶
When modifying the hashing/caching behavior of your classes, keep in mind that cache matches can go wrong in two ways:
False negatives, where two nodes should have the same hash but do not
False positives, where two different nodes get the same hash by mistake
False negatives are highly preferable because they only increase the runtime of your calculations, while false positives can lead to wrong results.
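Both failure modes are easy to reproduce with a toy hash function. The example below is only a sketch: toy_hash, the attribute names cutoff and created, and the JSON/SHA-256 scheme are invented for illustration and are not part of AiiDA.

```python
import hashlib
import json


def toy_hash(attributes, ignored=()):
    """Hypothetical hash: SHA-256 of the attributes that are not ignored."""
    kept = {key: value for key, value in attributes.items() if key not in ignored}
    payload = json.dumps(kept, sort_keys=True)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()


run_1 = {'cutoff': 30, 'created': '2020-01-01T10:00'}
run_2 = {'cutoff': 30, 'created': '2020-01-02T11:30'}
run_3 = {'cutoff': 50, 'created': '2020-01-01T10:00'}

# False negative: the same calculation, but the timestamp is hashed,
# so the cache is missed and the calculation is simply run again.
false_negative = toy_hash(run_1) != toy_hash(run_2)

# Ignoring the irrelevant timestamp restores the match.
fixed = toy_hash(run_1, ignored=('created',)) == toy_hash(run_2, ignored=('created',))

# False positive: ignoring too much makes two different calculations
# collide, so the result for cutoff=30 would be reused for cutoff=50.
too_much = ('created', 'cutoff')
false_positive = toy_hash(run_1, ignored=too_much) == toy_hash(run_3, ignored=too_much)
```

All three flags evaluate to True. When in doubt about whether an attribute affects the result, leaving it in the hash errs on the side of a false negative.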