AiiDA graphs

AiiDA offers tools for exporting selected parts of an AiiDA graph to a file for backup or sharing purposes.

Export

Use verdi export create to export a selection of nodes, users or computers.

  • Selection: Nodes can be selected via PK, or by exporting predefined groups of nodes.
  • Augmentation: By default, the export function augments a selection of nodes by their parents in order to preserve the provenance. For calculation nodes, the direct outputs are also added.
  • File content: The export file contains all information pertaining to the exported nodes: both the information stored in the database and the files stored in the AiiDA repository.
  • Compression: By default, the export file is compressed using zip. Other options are available.
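The augmentation rule can be illustrated with a small graph traversal. This is only a sketch under stated assumptions, not the code used by verdi export create: it assumes that "parents" means every ancestor reachable through incoming links, and that direct outputs are added for every calculation node in the augmented selection. The function name augment_selection and its arguments are hypothetical.

```python
def augment_selection(selected, links, calculations):
    """Return the selected nodes plus the nodes added for provenance.

    selected:     iterable of node identifiers chosen by the user
    links:        iterable of (source, target) pairs (hypothetical input format)
    calculations: set of identifiers that are calculation nodes
    """
    parents = {}
    children = {}
    for source, target in links:
        parents.setdefault(target, set()).add(source)
        children.setdefault(source, set()).add(target)

    # Assumption: walk up through all ancestors, not only the direct parents.
    result = set(selected)
    stack = list(selected)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)

    # Assumption: add the direct outputs of every calculation collected so far.
    for node in list(result):
        if node in calculations:
            result.update(children.get(node, ()))
    return result
```

Selecting a single output of a calculation would, under these assumptions, also pull in the calculation, its inputs, and its other direct outputs.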

See verdi export create -h for a full list of available options.

Import

Use verdi import to import an AiiDA export file generated by verdi export create.

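  • Duplication: On import, AiiDA avoids identifier collisions and does not insert a node that is already present; entities are matched by their unique identifiers (for example, the UUID of a node).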

See verdi import -h for a full list of available options.

Export file format

An AiiDA export file is an archive of .zip or .tar.gz format with the following content:

  • metadata.json file containing information on the version of AiiDA and of the export file, as well as the schema of the exported data.
  • data.json file containing the exported nodes and their links.
  • nodes/ directory containing the repository files corresponding to the exported nodes.
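As an illustration of this layout, the following sketch reads one of the JSON files directly from an export archive using only the Python standard library. It assumes that metadata.json and data.json sit at the top level of the archive; the helper name read_archive_json is hypothetical and is not part of AiiDA.

```python
import json
import tarfile
import zipfile


def read_archive_json(archive_path, member="metadata.json"):
    """Return the parsed JSON content of `member` from a .zip or .tar.gz archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            with archive.open(member) as handle:
                return json.loads(handle.read().decode("utf-8"))

    if tarfile.is_tarfile(archive_path):
        with tarfile.open(archive_path, "r:*") as archive:
            for info in archive.getmembers():
                # Member names in a tar archive may carry a leading "./".
                name = info.name[2:] if info.name.startswith("./") else info.name
                if name == member:
                    handle = archive.extractfile(info)
                    return json.loads(handle.read().decode("utf-8"))
        raise KeyError("{} not found in {}".format(member, archive_path))

    raise ValueError("{} is neither a zip nor a tar archive".format(archive_path))
```

Calling read_archive_json(path, "data.json") returns the node data in the same way.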

metadata.json

This file is required to interpret data.json correctly. Besides the data schema, it records the AiiDA version and the export file version, which are used to detect incompatibilities between different versions of AiiDA. Note that the schema described in metadata.json refers to the data itself, that is, an abstract schema focused on the exported information, and not to how the data is stored in the database (the database schema). This makes the import/export mechanism independent of the database system, the selected backend, and the way the data is organised in the database.

Let’s have a look at the contents of the metadata.json:

{
    "conversion_info": [
        "Converted from version 0.2 to 0.3 with external script"
    ],
    "export_version": "0.3",
    "aiida_version": "0.10.0",
    "unique_identifiers": {
        "Computer": "uuid",
        "Group": "uuid",
        "User": "email",
        "Node": "uuid",
        "Attribute": null,
        "Link": null
    },
    "all_fields_info": {
        "Computer": {
            "description": {},
            "transport_params": {},
            "hostname": {},
            "enabled": {},
            "name": {},
            "transport_type": {},
            "metadata": {},
            "scheduler_type": {},
            "uuid": {}
        },
        "Link": {
            "input": {
                "related_name": "output_links",
                "requires": "Node"
            },
            "label": {},
            "type": {},
            "output": {
                "related_name": "input_links",
                "requires": "Node"
            }
        },
        "User": {
            "first_name": {},
            "last_name": {},
            "email": {},
            "institution": {}
        },
        "Node": {
            "nodeversion": {},
            "description": {},
            "dbcomputer": {
                "related_name": "dbnodes",
                "requires": "Computer"
            },
            "ctime": {
                "convert_type": "date"
            },
            "user": {
                "related_name": "dbnodes",
                "requires": "User"
            },
            "mtime": {
                "convert_type": "date"
            },
            "label": {},
            "type": {},
            "public": {},
            "uuid": {}
        },
        "Attribute": {
            "dbnode": {
                "related_name": "dbattributes",
                "requires": "Node"
            },
            "dval": {
                "convert_type": "date"
            },
            "datatype": {},
            "fval": {},
            "tval": {},
            "key": {},
            "ival": {},
            "bval": {}
        },
        "Group": {
            "description": {},
            "name": {},
            "user": {
                "related_name": "dbgroups",
                "requires": "User"
            },
            "time": {
                "convert_type": "date"
            },
            "type": {},
            "uuid": {}
        }
    }
}

At the beginning of the file, we see the version of the export file (export_version) and the version of the AiiDA code (aiida_version).

The exported entities are listed next, together with their unique identifiers (unique_identifiers). Knowing the unique identifiers makes it possible to avoid duplicates, for example inserting the same node multiple times.
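To make the role of the unique identifiers concrete, here is a minimal sketch of duplicate avoidance. It is not AiiDA's import code: the function filter_new_entries and the set of already known identifiers are hypothetical, and the sketch only shows how the field named in unique_identifiers (for instance "uuid" for Node, "email" for User) can be used to skip entries that already exist.

```python
def filter_new_entries(entries, unique_field, existing_identifiers):
    """Keep only the entries whose unique identifier is not yet known.

    entries:              mapping of export key -> dictionary of fields
    unique_field:         field name taken from "unique_identifiers"
    existing_identifiers: set of identifiers already present in the target
    """
    return {
        key: fields
        for key, fields in entries.items()
        if fields[unique_field] not in existing_identifiers
    }
```

Entities whose unique identifier is null in metadata.json (Attribute and Link in the sample) cannot be matched this way and would need separate handling.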

Then, all_fields_info lists the properties of each entity, together with its relations to other entities. For example, the entity Node is related to a computer and a user. The names of the related entities appear nested under the corresponding properties (the requires key) to show this relation.
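The relations recorded in all_fields_info can be extracted with a few lines of Python. The sketch below collects, for each entity, the entities named in its "requires" keys, and then derives one possible order in which entities could be processed so that every entity comes after the ones it refers to. Both function names are hypothetical, and the ordering is an illustration rather than a description of how AiiDA imports data.

```python
def entity_dependencies(all_fields_info):
    """Map each entity name to the sorted list of entities it requires."""
    return {
        entity: sorted(
            {spec["requires"] for spec in fields.values() if "requires" in spec}
        )
        for entity, fields in all_fields_info.items()
    }


def dependency_order(dependencies):
    """Order entities so that each one follows all entities it requires."""
    ordered = []
    remaining = dict(dependencies)
    while remaining:
        ready = sorted(
            entity
            for entity, required in remaining.items()
            if all(name in ordered for name in required)
        )
        if not ready:
            raise ValueError("circular or unresolved dependency")
        ordered.extend(ready)
        for entity in ready:
            del remaining[entity]
    return ordered
```

For the sample above, Node depends on Computer and User, while Link and Attribute depend on Node.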

data.json

A sample of the data.json file follows:

{
    "links_uuid": [
        {
            "output": "c208c9da-23b4-4c32-8f99-f9141ab28363",
            "label": "parent_calc_folder",
            "input": "eaaa114d-3d5b-42eb-a269-cf0e7a3a935d",
            "type": "inputlink"
        },
        ...
    ],
    "export_data": {
        "User": {
            "2": {
                "first_name": "AiiDA",
                "last_name": "theossrv2",
                "institution": "EPFL, Lausanne",
                "email": "aiida@theossrv2.epfl.ch"
            },
            ...
        },
        "Computer": {
            "1": {
                "name": "theospc14-direct_",
                "transport_params": "{}",
                "description": "theospc14 (N. Mounet's PC) with direct scheduler",
                "hostname": "theospc14.epfl.ch",
                "enabled": true,
                "transport_type": "ssh",
                "metadata": "{\"default_mpiprocs_per_machine\": 8, \"workdir\": \"/scratch/{username}/aiida_run/\", \"append_text\": \"\", \"prepend_text\": \"\", \"mpirun_command\": [\"mpirun\", \"-np\", \"{tot_num_mpiprocs}\"]}",
                "scheduler_type": "direct",
                "uuid": "fb7729ff-8254-4bc0-bbec-acbdb573cfe2"
            },
            ...
        },
        "Node": {
            "5921143": {
                "uuid": "628ba258-ccc1-47bf-bab7-8aee64b563ea",
                "description": "",
                "dbcomputer": null,
                "label": "",
                "user": 2,
                "mtime": "2016-08-21T11:55:53.132925",
                "nodeversion": 1,
                "type": "data.parameter.ParameterData.",
                "public": false,
                "ctime": "2016-08-21T11:55:53.118306"
            },
            "20063": {
                "uuid": "1024e35e-166b-4104-95f6-c1706df4ce15",
                "description": "",
                "dbcomputer": 1,
                "label": "",
                "user": 2,
                "mtime": "2016-02-16T10:33:54.095973",
                "nodeversion": 16,
                "type": "calculation.job.codtools.ciffilter.CiffilterCalculation.",
                "public": false,
                "ctime": "2015-10-02T20:08:06.628472"
            },
            ...
        }
    },
    "groups_uuid": {

    },
    "node_attributes_conversion": {
        "5921143": {
            "CONTROL": {
                "calculation": null,
                "restart_mode": null,
                "max_seconds": null
            },
            "ELECTRONS": {
                "electron_maxstep": null,
                "conv_thr": null
            },
            "SYSTEM": {
                "ecutwfc": null,
                "input_dft": null,
                "occupations": null,
                "degauss": null,
                "smearing": null,
                "ecutrho": null
            }
        },
        "20063": {
            "retrieve_list": [
                null,
                null,
                null,
                null
            ],
            "last_jobinfo": null,
            "scheduler_state": null,
            "parser": null,
            "linkname_retrieved": null,
            "jobresource_params": {
                "num_machines": null,
                "num_mpiprocs_per_machine": null,
                "default_mpiprocs_per_machine": null
            },
            "remote_workdir": null,
            "state": null,
            "max_wallclock_seconds": null,
            "retrieve_singlefile_list": [

            ],
            "scheduler_lastchecktime": "date",
            "job_id": null
        },
        ...
    },
    "node_attributes": {
        "5921143": {
            "CONTROL": {
                "calculation": "vc-relax",
                "restart_mode": "from_scratch",
                "max_seconds": 83808
            },
            "ELECTRONS": {
                "electron_maxstep": 100,
                "conv_thr": 3.6e-10
            },
            "SYSTEM": {
                "ecutwfc": 90.0,
                "input_dft": "vdw-df2-c09",
                "occupations": "smearing",
                "degauss": 0.02,
                "smearing": "cold",
                "ecutrho": 1080.0
            }
        },
        "20063": {
            "retrieve_list": [
                "aiida.out",
                "aiida.err",
                "_scheduler-stdout.txt",
                "_scheduler-stderr.txt"
            ],
            "last_jobinfo": "{\"job_state\": \"DONE\", \"detailedJobinfo\": \"AiiDA MESSAGE: This scheduler does not implement the routine get_detailed_jobinfo to retrieve the information on a job after it has finished.\", \"job_id\": \"13489\"}",
            "scheduler_state": "DONE",
            "parser": "codtools.ciffilter",
            "linkname_retrieved": "retrieved",
            "jobresource_params": {
                "num_machines": 1,
                "num_mpiprocs_per_machine": 1,
                "default_mpiprocs_per_machine": 8
            },
            "remote_workdir": "/scratch/aiida/aiida_run/10/24/e35e-166b-4104-95f6-c1706df4ce15",
            "state": "FINISHED",
            "max_wallclock_seconds": 900,
            "retrieve_singlefile_list": [

            ],
            "scheduler_lastchecktime": "2015-10-02T20:30:36.481951",
            "job_id": "13489"
        },
        "6480111": {
        },
        ...
    }
}

The JSON file starts with the links among the AiiDA nodes (the links_uuid field). For every link, the UUIDs (universally unique identifiers) of the connected nodes are given, together with the label and the type of the link.
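A short sketch shows how the links_uuid list can be turned into a lookup of incoming and outgoing links per node. It assumes that "input" holds the UUID of the node the link starts from and "output" the UUID of the node it points to; the function name group_links_by_node is hypothetical.

```python
from collections import defaultdict


def group_links_by_node(links_uuid):
    """Return (incoming, outgoing) dictionaries keyed by node UUID.

    Each value is a list of (label, other_node_uuid) tuples.
    """
    incoming = defaultdict(list)
    outgoing = defaultdict(list)
    for link in links_uuid:
        # Assumption: the link goes from link["input"] to link["output"].
        outgoing[link["input"]].append((link["label"], link["output"]))
        incoming[link["output"]].append((link["label"], link["input"]))
    return dict(incoming), dict(outgoing)
```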

The export data follows (the export_data field), listing the data of every entity. Note the references between instances of different entities. For example, the Node with identifier 20063 belongs to the user with identifier 2 and refers to the computer with identifier 1, whereas the Node with identifier 5921143 belongs to the same user but has no computer (dbcomputer is null).
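The references inside export_data can be followed with plain dictionary lookups. The sketch below resolves the user and the computer of a node; it relies on the fact that JSON object keys are strings while the references stored in the "user" and "dbcomputer" fields are integers (or null). The function name resolve_node_references is hypothetical.

```python
def resolve_node_references(export_data, node_pk):
    """Return the user and computer entries a node refers to.

    The computer is None when the node's "dbcomputer" field is null.
    """
    node = export_data["Node"][str(node_pk)]
    # References are integers, but the keys of a JSON object are strings.
    user = export_data["User"][str(node["user"])]
    computer_pk = node["dbcomputer"]
    computer = None
    if computer_pk is not None:
        computer = export_data["Computer"][str(computer_pk)]
    return {"user": user, "computer": computer}
```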

The names of the entities are, for the moment, references to the model classes of the Django backend. This holds for both backends (Django and SQLAlchemy), ensuring that export files are cross-backend compatible. These names will change in the future to more abstract names.

If any groups are exported, they are listed in the corresponding field (groups_uuid).

The attributes of the exported nodes are described at the end of the JSON file, keyed by the identifier of the corresponding node. The field node_attributes_conversion specifies the type of each attribute for which a conversion is needed. For example, JSON has no native date type, so the schema states explicitly when the value of an attribute is a date. The node_attributes section follows node_attributes_conversion and contains the actual values.
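The interplay of the two sections can be sketched as a recursive walk that applies the conversion markers to the stored values. This is an illustration only, not AiiDA's deserialization code: the function name deserialize_attributes is hypothetical, and the timestamp format is an assumption based on the sample values shown above (ISO 8601 with microseconds and no time zone).

```python
from datetime import datetime

# Assumed format, inferred from the sample timestamps in data.json.
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def deserialize_attributes(values, conversions):
    """Apply the markers in `conversions` to the matching entries of `values`.

    Both arguments share the same nested structure: dictionaries and lists
    are walked in parallel, and a leaf marker of "date" turns the stored
    string into a datetime object. A marker of None leaves the value as is.
    """
    if isinstance(conversions, dict):
        return {
            key: deserialize_attributes(value, conversions.get(key))
            for key, value in values.items()
        }
    if isinstance(conversions, list):
        return [
            deserialize_attributes(value, marker)
            for value, marker in zip(values, conversions)
        ]
    if conversions == "date" and values is not None:
        return datetime.strptime(values, DATE_FORMAT)
    return values
```

For a given node identifier, the call would be deserialize_attributes(data["node_attributes"][pk], data["node_attributes_conversion"][pk]), where data is the parsed content of data.json.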