Current tomography processing software state
We now proceed to the analysis of the current ESRF tomography software, the solutions used outside of the ESRF, and the guidelines for software development at the ESRF.
Current situation: limitations and opportunities
Currently, at the ESRF, numerous codes and solutions exist for the processing of
tomographic data (a situation similar to that of data acquisition).
Over time, each beamline (in some cases, each person on the same
beamline) has adopted different solutions for the processing of the collected tomographic
data. This is due to differences in the types of data, in the data collection modalities, and
in the approach to data processing.
The current software is scattered over different repositories and machines, is written in
different languages (namely Matlab, Octave, and Python), and generally exhibits poor
maintainability and performance.
The ESRF is also currently the main developer and maintainer of PyHST: a monolithic reconstruction code optimized for full-field tomography, which is largely based on a C core, with CUDA-accelerated functions and a Python interface. Although it has supported highly impactful studies for more than a decade, its now outdated design, the lack of systematic testing, and the scarcity of available resources do not allow proper maintenance of the code. Thus, in spite of the remarkable effort from DAU, many existing features are now broken beyond repair or outright disabled (like HDF5 file support). Moreover, it currently offers only a few specific reconstruction algorithms and some limited pre-processing functionality.
The resources required for the maintenance and acceleration of all the existing solutions are large and cannot be justified. The tomography-based techniques offered on the ESRF beamlines share many common points, which makes a broad reorganization and unification of the processing codes possible.
Reference software outside ESRF
The most widely used software for tomographic data processing is currently TomoPy, which takes a rather simple but effective approach. It bundles all the processing code in one package (in complete opposition to the current situation at the ESRF), and it organizes this code in functional modules with self-explanatory names. Most of TomoPy’s code is in Python, which greatly reduces the maintenance cost and increases accessibility for external users. The basic reconstruction routines are in C++ (based on the gridrec algorithm), with Python bindings, but TomoPy can also transparently incorporate other reconstruction codes based on external reconstruction engines. Typical examples are the use of UFO from KIT, and algorithms based on the ASTRA toolbox from UA (University of Antwerp, Belgium) and CWI (Centrum Wiskunde & Informatica, Amsterdam, The Netherlands).
In particular, the ASTRA toolbox is experiencing widespread adoption across Europe as a back-end solution for tomographic reconstruction. Key factors for this are its extreme flexibility, very good computational performance, detailed documentation, good and responsive community support, and frequent training sessions. Its modularity and flexibility allow it to be employed in very diverse scenarios, ranging from classic absorption and phase contrast applications, as in the Syrmep tomography code from Elettra (which uses TomoPy with the ASTRA back-end), to more exotic applications like the 6D grain reconstructions of DCT.
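The organization described in this section (one package, functional modules with self-explanatory names, and interchangeable reconstruction engines) can be illustrated with a minimal Python sketch. All names below (`register_backend`, `normalize`, `minus_log`, `reconstruct`, and the toy `column_mean` engine) are hypothetical and chosen for illustration only. They are not the TomoPy or ASTRA API, and the toy engine is a placeholder rather than a real reconstruction algorithm.

```python
import math
from typing import Callable, Dict, List

Image = List[List[float]]  # rows: projection angles, columns: detector pixels
Engine = Callable[[Image], List[float]]

# Registry of reconstruction engines, keyed by name (hypothetical design).
_BACKENDS: Dict[str, Engine] = {}


def register_backend(name: str) -> Callable[[Engine], Engine]:
    """Decorator that makes an engine selectable by name in reconstruct()."""
    def decorator(func: Engine) -> Engine:
        _BACKENDS[name] = func
        return func
    return decorator


def normalize(raw: Image, flat: List[float], dark: List[float]) -> Image:
    """Flat-field correction, per detector pixel: (raw - dark) / (flat - dark)."""
    return [[(r - d) / (f - d) for r, f, d in zip(row, flat, dark)] for row in raw]


def minus_log(transmission: Image) -> Image:
    """Convert transmission values to line integrals of the attenuation."""
    return [[-math.log(t) for t in row] for row in transmission]


@register_backend("column_mean")
def _column_mean(sinogram: Image) -> List[float]:
    """Toy placeholder engine: averages each detector column over all angles.

    A real engine (e.g. one wrapping an external library) would be registered
    in the same way; this one only keeps the sketch self-contained.
    """
    return [sum(column) / len(column) for column in zip(*sinogram)]


def reconstruct(sinogram: Image, backend: str = "column_mean") -> List[float]:
    """Dispatch to the selected engine; the calling code does not change."""
    if backend not in _BACKENDS:
        raise ValueError(f"unknown backend {backend!r}; available: {sorted(_BACKENDS)}")
    return _BACKENDS[backend](sinogram)


# Usage: two projections of a two-pixel detector.
raw = [[5.0, 3.0], [5.0, 3.0]]
flat, dark = [9.0, 9.0], [1.0, 1.0]
result = reconstruct(minus_log(normalize(raw, flat, dark)))
```

In such a design, swapping the engine only requires registering another function and passing its name to `reconstruct`; the pre-processing steps are unaffected.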
ESRF software guidelines
The ESRF promotes free software and the free exchange of such software. This translates
into a preference for open programming platforms and open software licenses. This
policy makes it easy to share the developments done at the ESRF with users,
researchers, and other synchrotrons.
For instance, the Silx library is distributed under an open
source license, and it is written in Python (an open source platform).
The dependency on closed source packages, libraries or platforms should be avoided
whenever possible. This does not limit the possibility of using proprietary solutions as
optional replacements of the basic core logic offered by the ESRF code.
ESRF software development request procedure
Software development performed by DAU needs to be approved and scheduled through a
well-defined procedure. Software development requests are usually collected in the form of
ECAPS (ESRF Collaborative Activities and Project System) projects. The original intent of
ECAPS projects was to allow global project organization, to better track resources and
progress, and to ease teamwork.
In practice, inter-division developments are currently regulated by the following
procedure:
- Have a development idea (any time throughout the year).
- Submit an ECAPS project proposal in October of each year.
- Wait for the directors’ approval in March.
- Wait for up to a year for the project to be tackled and concluded.
This procedure introduces a variable delay of between 6 and 30 months
(0.5 to 2.5 years) for any code development from DAU. Moreover, this procedure is
incompatible with short-term planning of events, and with quick reactions to
requests from users, beamline scientists, and collaborators.
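The 6 to 30 month range quoted above can be reproduced with a short calculation. This is a sketch under assumptions that are not spelled out in the text: an idea waits between 0 and 12 months for the next October submission, approval takes about 6 months (start of October to end of March), and the project is then concluded within 0 to 12 months. The `Stage` class and its field names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One waiting stage of the procedure, with its delay bounds in months."""
    name: str
    min_months: int
    max_months: int


# Delay bounds per stage (assumed values, see the paragraph above).
STAGES = [
    Stage("wait for the October submission", 0, 12),
    Stage("wait for directors' approval (October to March)", 6, 6),
    Stage("wait for the project to be tackled and concluded", 0, 12),
]


def total_delay(stages: List[Stage]) -> Tuple[int, int]:
    """Sum the best-case and worst-case delays over all stages."""
    best = sum(stage.min_months for stage in stages)
    worst = sum(stage.max_months for stage in stages)
    return best, worst


low, high = total_delay(STAGES)
print(f"{low} to {high} months ({low / 12:.1f} to {high / 12:.1f} years)")
# prints: 6 to 30 months (0.5 to 2.5 years)
```

The worst case corresponds to an idea that arises just after the October submission and a project that takes the full year to be concluded.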
For these reasons, DAU does not allocate 100% of its employees’ time to ECAPS projects.
This, in turn, leaves time for timely intervention in case urgent
fixes to existing code are required. However, this is still not compatible with the timely
scheduling of larger projects.
In software development, the procedure just described for large ECAPS projects is
known as waterfall development (WD). Smaller development cycles are
also possible through "trouble tickets", but only for bug fixes.
However, the ESRF also allows the level of detail tracked through ECAPS to be reduced, and
ECAPS to be integrated with other tools. In this case, the multi-year ECAPS project still
provides milestones for every year (or even every six months), and it specifies the steering
process. The management of tasks is instead done with tools like GitLab (the software
revision control system used at the ESRF), which makes it possible to organize "packets of
work" in the form of issues and to track them on a dashboard.
We propose to adopt this second approach, which borrows some concepts from agile
development (AD).