Current state of tomography processing software

We now analyze the current ESRF tomography software, the solutions used outside of the ESRF, and the ESRF guidelines for software development.

Current situation: limitations and opportunities

Currently, at the ESRF, numerous codes and solutions exist for the processing of tomographic data (a situation similar to that of data acquisition). Over time, each beamline (in some cases, each person on the same beamline) has adopted different solutions for processing the collected tomographic data. This is due to differences in the types of data, in the data collection modalities, and in the approaches to data processing.
The current software is scattered over different repositories and machines, written in different languages (namely Matlab, Octave, and Python), and generally exhibits poor maintainability and performance.

The ESRF is also currently the main developer and maintainer of PyHST: a monolithic reconstruction code optimized for full-field tomography, largely based on a C core, with CUDA-accelerated functions and a Python interface. Although PyHST has supported highly impactful studies for more than a decade, its now outdated design, the lack of systematic testing, and the scarce resources currently available do not allow proper maintenance of the code. Thus, in spite of the remarkable effort from DAU, many existing features are now broken beyond repair or outright disabled (such as HDF5 file support). Moreover, PyHST currently offers only a few specific reconstruction algorithms and some limited pre-processing functionality.

The resources required for the maintenance and acceleration of all the existing solutions are large and unjustifiable. However, the aforementioned tomography-based techniques offered on the ESRF beamlines share many common points, which allows for a far-reaching reorganization and unification of the processing codes.

Reference software outside ESRF

The most widely used software for tomographic data processing is currently TomoPy, which takes a rather simple but effective approach. It bundles all the processing code in one package (in stark contrast to the current situation at the ESRF), and it organizes this code into functional modules with self-explanatory names. Most of TomoPy's code is in Python, which greatly reduces the maintenance cost and increases accessibility for external users. The basic reconstruction routines are in C++ (based on the gridrec algorithm), with Python bindings, but TomoPy can also transparently incorporate other reconstruction codes, based on external reconstruction engines. Typical examples are the use of UFO from KIT, and of algorithms based on the ASTRA toolbox from UA (University of Antwerp, Belgium) and CWI (Centrum Wiskunde & Informatica, Amsterdam, The Netherlands).
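
As a minimal sketch of this design (the arrays below are synthetic placeholders for projections, flats, and darks, with arbitrary sizes; a real pipeline would load measured data from disk), the same tomopy.recon call can run the built-in gridrec algorithm or delegate to the ASTRA engine:

    import numpy as np
    import tomopy

    # Synthetic stand-ins for measured data: (angles, rows, columns).
    proj = (0.5 + 0.5 * np.random.rand(180, 4, 256)).astype(np.float32)
    flat = np.ones((1, 4, 256), dtype=np.float32)
    dark = np.zeros((1, 4, 256), dtype=np.float32)
    theta = tomopy.angles(proj.shape[0])  # equispaced angles over 180 degrees

    # Pre-processing: flat-field normalization and -log transform.
    proj = tomopy.normalize(proj, flat, dark)
    proj = tomopy.minus_log(proj)

    # Reconstruction with the built-in gridrec algorithm.
    rec = tomopy.recon(proj, theta, algorithm='gridrec')

    # The same call, delegated to the ASTRA engine (requires the
    # astra-toolbox package and a CUDA-capable GPU).
    rec_astra = tomopy.recon(proj, theta, algorithm=tomopy.astra,
                             options={'method': 'SIRT_CUDA',
                                      'proj_type': 'cuda',
                                      'num_iter': 200})

Only the algorithm argument changes between the two reconstruction calls, which is what makes swapping engines transparent to the rest of the pipeline.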

In particular, the ASTRA toolbox is seeing widespread adoption as a back-end solution for tomographic reconstruction across Europe. Key factors for this are its extreme flexibility, very good computational performance, detailed documentation, good and responsive community support, and frequent training sessions. Its modularity and flexibility allow it to be employed in very diverse scenarios, ranging from classic absorption and phase-contrast applications, as in the Syrmep tomography code from Elettra (which uses TomoPy with the ASTRA backend), to more exotic applications such as the 6D grain reconstructions of DCT.
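
This flexibility stems from ASTRA's decoupling of geometry, data, and algorithm objects. The following is a minimal 2D parallel-beam sketch using the ASTRA Python layer (the sizes, phantom, and choice of SIRT are arbitrary, and a CUDA-capable GPU is assumed):

    import numpy as np
    import astra

    # Arbitrary geometry: 256x256 volume, 384 detector pixels,
    # 180 projection angles over half a turn.
    vol_geom = astra.create_vol_geom(256, 256)
    angles = np.linspace(0, np.pi, 180, endpoint=False)
    proj_geom = astra.create_proj_geom('parallel', 1.0, 384, angles)

    # Forward-project a simple square phantom to obtain a sinogram.
    phantom = np.zeros((256, 256), dtype=np.float32)
    phantom[96:160, 96:160] = 1.0
    proj_id = astra.create_projector('cuda', proj_geom, vol_geom)
    sino_id, sino = astra.create_sino(phantom, proj_id)

    # Reconstruct with GPU-accelerated SIRT; switching method (e.g. to
    # 'FBP_CUDA' or 'CGLS_CUDA') only changes the configuration dictionary.
    rec_id = astra.data2d.create('-vol', vol_geom)
    cfg = astra.astra_dict('SIRT_CUDA')
    cfg['ReconstructionDataId'] = rec_id
    cfg['ProjectionDataId'] = sino_id
    alg_id = astra.algorithm.create(cfg)
    astra.algorithm.run(alg_id, 150)
    rec = astra.data2d.get(rec_id)

    # Release the objects held by ASTRA.
    astra.algorithm.delete(alg_id)
    astra.data2d.delete([rec_id, sino_id])
    astra.projector.delete(proj_id)

Because geometries are first-class objects, the same algorithm configuration can be reused with fan-beam, cone-beam, or fully custom vector geometries, which is what enables applications as diverse as those mentioned above.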

ESRF software guidelines

The ESRF promotes free software and the free exchange of such software. This translates into a preference for open programming platforms and open software licenses. This policy allows the developments made at the ESRF to be easily shared with users, researchers, and other synchrotrons.
For instance, the silx library is distributed under an open source license, and it is written in Python (an open source platform). Dependencies on closed source packages, libraries, or platforms should be avoided whenever possible. This does not rule out the use of proprietary solutions as optional replacements for the basic core logic offered by the ESRF code.

ESRF software development request procedure

Software development performed by DAU needs to be approved and scheduled through a well-defined procedure. Software development requests are usually collected in the form of ECAPS (ESRF Collaborative Activities and Project System) projects. The original intent of ECAPS projects was to allow global project organization, better tracking of resources and progress, and easier teamwork.
In practice, the current inter-division developments are regulated by the following procedure:

  1. Have a development idea (at any time throughout the year).
  2. Submit an ECAPS project proposal in October (once a year).
  3. Wait for the directors' approval in March.
  4. Wait for up to a year for the project to be tackled and concluded.

This procedure introduces a variable delay of between 6 and 30 months (0.5 to 2.5 years) for any code development from DAU: in the best case, an idea conceived just before the October deadline is approved in March and tackled immediately; in the worst case, an idea conceived just after the deadline waits almost a year for submission, five more months for approval, and up to another year for completion. Moreover, this procedure is incompatible with the planning of events in the short term, and with quick reactions to requests from users, beamline scientists, and collaborators.
For these reasons, DAU does not allocate 100% of its employees' time to ECAPS projects. This, in turn, leaves time for prompt intervention in case urgent fixes to existing code are required. However, this is still not compatible with the timely scheduling of larger projects.

In software development terms, the procedure of large ECAPS projects just described is known as waterfall development (WD). Smaller development cycles are also possible through "trouble tickets", but only for bug fixes.
However, the ESRF also allows the level of detail tracked through ECAPS to be reduced, and ECAPS to be integrated with other tools. The multi-year ECAPS project still provides milestones for every year (or even for every six-month term), and it specifies the steering process. The management of tasks is instead done with tools like GitLab (the software revision control system used at the ESRF), which allows "packets of work" to be organized in the form of issues and tracked on a dashboard.
We propose to use this second solution, which borrows some concepts from agile development (AD).