How to publish a reconstructed volume to the data portal?
This tutorial explains how to publish a volume reconstructed by Nabu to the ESRF data portal.
Today the ESRF data portal catalog is based on DRAC (the successor of ICAT). Because the switch is recent, both names are still in use, and you can treat ICAT and DRAC as interchangeable.
DRAC processed dataset
There are two types of datasets in DRAC: raw datasets (published automatically by Bliss-tomo) and processed datasets (the ones we will publish in this tutorial).
A DRAC processed dataset is related to:
(a) one or several raw datasets (usually one, but there can be several, for example in the case of a stitching)
(b) a set of metadata keys (voxel size, phase retrieval options, etc.)
(c) one beamline
(d) one proposal
(e) one dataset
(f) one folder (the folder containing the reconstructed volume)
Retrieve all the data needed for ICAT
(a) raw parameter
Path(s) to the raw dataset(s), i.e. the source(s) of the processed dataset. It should be a tuple, possibly with a single element.
You can get the original dataset paths from a TomoScanBase instance by calling get_bliss_original_files().
Warning: these paths can contain a ‘/mnt/multipath-shares’ prefix that must not be passed to ICAT/DRAC. To filter it out, you can use the from_bliss_original_file_to_raw helper function:
from tomoscan.esrf.scan.utils import from_bliss_original_file_to_raw
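To show what the prefix filtering and the tuple requirement amount to, here is a minimal hand-written sketch. The helper build_raw_parameter is hypothetical. It assumes the only unwanted part of each path is a leading ‘/mnt/multipath-shares’ component, and it does not reproduce the exact behaviour of from_bliss_original_file_to_raw, which remains the recommended tool.

```python
from pathlib import PurePosixPath

# Assumption: the unwanted mount prefix is always this leading component.
MULTIPATH_PREFIX = PurePosixPath("/mnt/multipath-shares")


def build_raw_parameter(original_files):
    """Hypothetical helper: strip the multipath prefix and return a tuple.

    Duplicated paths are dropped while the original order is preserved.
    """
    cleaned = []
    for file_path in original_files:
        path = PurePosixPath(file_path)
        if path.is_relative_to(MULTIPATH_PREFIX):  # Python >= 3.9
            # e.g. /mnt/multipath-shares/data/... -> /data/...
            path = PurePosixPath("/") / path.relative_to(MULTIPATH_PREFIX)
        if str(path) not in cleaned:
            cleaned.append(str(path))
    # The raw parameter must be a tuple, even with a single element.
    return tuple(cleaned)
```

With a scan object as described above, a call would look like raw = build_raw_parameter(scan.get_bliss_original_files()).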
(b) metadata parameter
The metadata to be published to ICAT can be obtained from a VolumeBase instance by calling its build_drac_metadata() method.
For example, for an HDF5Volume you can have:
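from tomoscan.esrf.volume import HDF5Volume

# file_path: HDF5 file of the volume, data_path: location of the volume inside it
volume = HDF5Volume(
    file_path=...,
    data_path=...,
)
drac_metadata = volume.build_drac_metadata()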
Note: see the tutorial on volumes for more information.
Warning: at the moment, the DRAC metadata does not contain the ‘Sample_name’ field, which is mandatory (without it, no processing will be done). You therefore need to add it yourself:
drac_metadata["Sample_name"] = ...
It can be obtained from the TomoScanBase instance through scan.sample_name.
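The two steps can be wrapped in a small function that rejects an empty name and does not modify the original dictionary. This is a sketch: add_sample_name is a hypothetical helper, and the only assumption is that the metadata behaves like a Python dict, as in the snippet above. It keeps an existing ‘Sample_name’ so that it would not override a value that build_drac_metadata might provide in the future.

```python
def add_sample_name(drac_metadata, sample_name):
    """Hypothetical helper: return a copy of the metadata with 'Sample_name' set."""
    if not sample_name:
        # 'Sample_name' is mandatory: without it, DRAC does no processing.
        raise ValueError("a non-empty sample name is required")
    metadata = dict(drac_metadata)  # copy to avoid mutating the caller's dict
    # Keep a value that is already present instead of overwriting it.
    metadata.setdefault("Sample_name", sample_name)
    return metadata
```

Typical use: drac_metadata = add_sample_name(volume.build_drac_metadata(), scan.sample_name).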
Note: the available DRAC keys are defined here (see the Tomo group, reconstruction section).
(c) beamline parameter
Name of the beamline in lower case, such as ‘bm05’ or ‘bm18’.
(d) proposal parameter
Name of the proposal.
(e) dataset parameter
Name of the dataset, i.e. the processed dataset in the DRAC context.
DRAC creates a key associating this name with the folder path, so the name must be unique.
We propose ‘reconstructed_volumes’ as the default value.
(f) path parameter
Path to the folder containing the volume reconstructed by Nabu.
Warning 1: the path should be cleaned of any ESRF mounting points such as ‘/mnt/multipath-shares’ or ‘/gpfs/easy’. If needed, you can use filter_esrf_mounting_points from tomoscan.esrf.scan.utils.
Warning 2: all files contained in this folder will be published to ICAT. There is no mechanism to publish only a single file or a subset of files.
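Because the whole folder is published, it is worth checking its content before contacting DRAC. Here is a minimal sketch that uses only the standard library. list_published_files is a hypothetical helper and is not part of tomoscan or pyicat_plus.

```python
import os


def list_published_files(folder):
    """Hypothetical helper: return every file under `folder`, relative to it.

    All of these files would be published together with the processed dataset.
    """
    published = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            published.append(os.path.relpath(os.path.join(root, name), folder))
    return sorted(published)
```

Printing the result of list_published_files(path) lets you spot leftovers, such as temporary files, before they end up in the data portal.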
Here is the recommended structure for an HDF5 reconstruction, when the folder given as path is named ‘reconstructed_volumes’:
reconstructed_volumes
|
|------ nabu_rec.hdf5 - nabu reconstructed volume master file (1)
|------ nabu_rec
|       |---------- nabu_rec_0000_0256.hdf5 - nabu reconstructed volume sub file 1
|------ gallery - gallery related to the processed dataset (2)
|       |------ screenshot_1.png
|       |------ screenshot_2.png
|------ nabu_cfg_files - folder containing nabu configuration files (3)
        |------ nabu_config.cfg
(1) The Nabu reconstruction. It can be replaced by a folder containing a volume saved as .tiff files.
(2) Optional. A set of images (.png or .jpg) linked to the reconstructed volume, for example 3 slices along each axis.
(3) nabu_cfg_files: location of the configuration used to obtain the volume(s). In the future, it should be used to reprocess a volume.
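The optional parts of this layout can be prepared with pathlib. In this sketch, prepare_publication_folder is a hypothetical helper. It assumes that Nabu writes nabu_rec.hdf5 (and nabu_rec/) into the folder on its own, so it only handles the gallery and nabu_cfg_files sub-folders.

```python
import shutil
from pathlib import Path


def prepare_publication_folder(root, nabu_config_files=(), screenshots=()):
    """Hypothetical helper: create the recommended sub-folders and copy files.

    Nabu is assumed to write nabu_rec.hdf5 (and nabu_rec/) into `root` itself.
    """
    root = Path(root)
    # Validate gallery images first so that nothing is copied on error.
    for image in screenshots:
        if Path(image).suffix.lower() not in (".png", ".jpg"):
            raise ValueError(f"unsupported gallery image: {image}")
    cfg_dir = root / "nabu_cfg_files"
    cfg_dir.mkdir(parents=True, exist_ok=True)  # also creates root if needed
    for cfg in nabu_config_files:
        shutil.copy2(cfg, cfg_dir / Path(cfg).name)
    if screenshots:  # the gallery is optional
        gallery = root / "gallery"
        gallery.mkdir(exist_ok=True)
        for image in screenshots:
            shutil.copy2(image, gallery / Path(image).name)
    return root
```

The gallery folder is created only when images are given. This keeps the published folder free of empty directories.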
Publication to DRAC / ICAT
To publish a processed dataset to ICAT, we use pyicat_plus.
Instantiate the IcatClient
from pyicat_plus.client.main import IcatClient
icat_client = IcatClient(
metadata_urls=("bcu-mq-01.esrf.fr:61613", "bcu-mq-02.esrf.fr:61613")
)
Publish to ICAT
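# raw, drac_metadata and path are the values retrieved in the previous sections,
# proposal is the name of the proposal (d)
icat_client.store_processed_data(
    raw=raw,  # (a) tuple of raw dataset paths
    metadata=drac_metadata,  # (b) must contain "Sample_name"
    beamline="id16a",  # (c) lower case
    proposal=proposal,  # (d)
    dataset="reconstructed_volumes",  # (e)
    path=path,  # (f) folder without ESRF mounting points
)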
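A wrong argument is only discovered on the DRAC side, so a quick local check of the warnings listed above can save a round trip. check_publication_inputs is a hypothetical helper. Its list of forbidden prefixes only contains the two mounting points mentioned in this tutorial and may not be exhaustive.

```python
from pathlib import PurePosixPath

# Mounting points mentioned in this tutorial; the real list may be longer (assumption).
ESRF_MOUNTING_POINTS = ("/mnt/multipath-shares", "/gpfs/easy")


def _has_mounting_point(path):
    # Component-wise check: '/gpfs/easyX' does not match '/gpfs/easy'.
    return any(PurePosixPath(path).is_relative_to(m) for m in ESRF_MOUNTING_POINTS)


def check_publication_inputs(raw, metadata, beamline, proposal, dataset, path):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    if not isinstance(raw, tuple) or not raw:
        problems.append("raw must be a non-empty tuple of paths")
    elif any(_has_mounting_point(p) for p in raw):
        problems.append("raw contains an ESRF mounting point")
    if "Sample_name" not in metadata:
        problems.append("metadata is missing the mandatory 'Sample_name'")
    if beamline != beamline.lower():
        problems.append("beamline must be in lower case")
    if not proposal:
        problems.append("proposal is empty")
    if not dataset:
        problems.append("dataset name is empty")
    if _has_mounting_point(path):
        problems.append("path contains an ESRF mounting point")
    return problems
```

Call it just before store_processed_data, and publish only if it returns an empty list.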