CI datasets

All the datasets for the Continuous Integration (CI) have been moved to https://gitlab.esrf.fr/tomotools/ci_datasets

Tomoscan embeds the code needed to retrieve datasets from this project. The class to use is “GitlabProject”. Here is an example:

import os

from tomoscan.tests.datasets import GitlabProject

# create the object used to access the ci_datasets from the 'tomoscan' branch:
GitlabDataset = GitlabProject(
    branch_name="tomoscan",
    host="https://gitlab.esrf.fr",
    cache_dir=os.path.join(
        os.path.dirname(__file__),
        "__archive__",
    ),
    token=None,
    project_id=4299,  # id of the project https://gitlab.esrf.fr/tomotools/ci_datasets
)

# access a dataset from the project (path relative to the repository root)
file = GitlabDataset.get_dataset(
    "h5_datasets/frm_edftomomill_twoentries.nx",
)

Cache

Downloaded files are saved under ‘cache_dir’. When a requested file is already in the cache, it is not downloaded again. The cache can be cleared using the clear_cache function.
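The caching behaviour described above can be illustrated with a minimal sketch. This is not tomoscan's implementation: `get_cached_file`, `clear_cache_dir` and the `download` callable are hypothetical names introduced here only to show the "download once, then reuse" logic.

```python
import os
import shutil


def get_cached_file(cache_dir, relative_path, download):
    """Return the local path of `relative_path`, downloading it only if missing.

    `download` is a hypothetical callable taking (relative_path, destination);
    it stands in for whatever actually fetches the file from GitLab.
    """
    local_path = os.path.join(cache_dir, relative_path)
    if not os.path.exists(local_path):
        # first request: create the sub-folders and fetch the file
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        download(relative_path, local_path)
    # later requests: the cached copy is returned directly
    return local_path


def clear_cache_dir(cache_dir):
    """Remove the whole cache folder (illustrative stand-in, not tomoscan's clear_cache)."""
    shutil.rmtree(cache_dir, ignore_errors=True)
```

With this logic, requesting the same file twice triggers a single download, and clearing the cache forces a fresh download on the next request.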

How to add new datasets

The primary goal of the project is to give access to large datasets for testing, so files are expected to be stored with Git LFS. If the file extension is not already among the tracked extensions, please run (here for .iso files, as an example):

git lfs track "*.iso"

To make sure the files expected to use LFS are correctly registered, you can use:

git lfs ls-files

Your large file should be there.
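To check beforehand whether an extension is already tracked, one option is to read the repository's .gitattributes file, where `git lfs track` records patterns with the `filter=lfs` attribute. The helpers below are a hypothetical sketch (the function names are not part of tomoscan or Git LFS), and the matching is simplified: it only compares the file's base name against shell-style patterns such as "*.iso", which does not cover every gitattributes pattern rule.

```python
import fnmatch
import os


def lfs_tracked_patterns(gitattributes_text):
    """Return the patterns marked with `filter=lfs` in the text of a .gitattributes file."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if "filter=lfs" in fields[1:]:
            patterns.append(fields[0])
    return patterns


def is_lfs_tracked(filename, patterns):
    """Simplified check: match the base name against each tracked pattern."""
    base = os.path.basename(filename)
    return any(fnmatch.fnmatch(base, pattern) for pattern in patterns)
```

For example, with a .gitattributes containing `*.iso filter=lfs diff=lfs merge=lfs -text`, `lfs_tracked_patterns` returns `["*.iso"]`, and a file such as "scan.nx" would be reported as not tracked.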

Warning

Before pushing your modifications to GitLab, please check that the branch size is still ‘sustainable’ (i.e. not taking up too much space).
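One rough way to estimate how much space the files of a checked-out branch occupy is to sum the file sizes in the working tree. The sketch below is an assumption-laden helper, not a tool provided by the project: `directory_size_bytes` is a hypothetical name, the `.git` folder is skipped so only the branch content is counted, and what counts as ‘sustainable’ is left to the maintainers.

```python
import os


def directory_size_bytes(root, skip_dirs=(".git",)):
    """Sum the sizes of regular files under `root`, ignoring folders in `skip_dirs`."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # prune skipped folders in place so os.walk does not descend into them
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(size_bytes):
    """Format a byte count in MiB for quick reading."""
    return f"{size_bytes / (1024 * 1024):.1f} MiB"
```

Calling `human_readable(directory_size_bytes("."))` from the root of the checkout gives a quick figure to compare before and after adding a dataset.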