# Advanced usage

## Save checkpoints, resume from a checkpoint

Nabu can save the result of arbitrary processing steps to a file. This is useful when you want to resume the processing from a "checkpoint" without re-doing everything. A good example is manual estimation of the center of rotation: in this case, it is best to restart from the sinogram.

In the [configuration file](nabu_config_file), the `[pipeline]` section lets you choose which checkpoints to save with the `save_steps` parameter. For example, to save checkpoints after phase retrieval and after sinogram generation, set `save_steps = phase, sinogram`.

Each processing step has a name. The ones that make sense to save are: flatfield, double_flatfield, ccd_correction, projs_rot, phase_retrieval, unsharp_mask, mlog, radios_movements, sino_normalization, sinogram.

```{note}
Each "checkpoint" is a dump of all the data used for reconstruction. This is usually not a problem for single-slice reconstruction, but for full-volume reconstruction it ends up duplicating all the data!
```

To resume the processing from a checkpoint, simply specify the name of the step in `resume_from_step` (see above for the step names).

Both `save_steps` and `resume_from_step` can be activated. In this case:

- If the checkpoint is found and valid, nabu resumes from it
- Otherwise, nabu restarts from the beginning and saves the checkpoint

Importantly, nabu checks that the checkpoint is valid with respect to the current configuration file. If the current configuration differs from the one used to create the checkpoint, nabu restarts the whole processing. For example, a checkpoint saved with `start_z = 1070, end_z = 1089` can be used to reconstruct any slice (or sub-volume) between `z=1070` and `z=1089`; otherwise the checkpoint is recreated.
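
For instance, to save the sinogram on the first run and reuse it afterwards, the relevant part of the configuration file might look like the excerpt below (only the `[pipeline]` entries discussed above are shown; the step name `sinogram` is taken from the list above and is just an example):

```ini
# Excerpt of a nabu configuration file:
# save the sinogram checkpoint, and resume from it when it is available and valid.
[pipeline]
save_steps = sinogram
resume_from_step = sinogram
```

With both options set, the first run creates the checkpoint, and subsequent runs (e.g. after changing the center of rotation) start directly from the saved sinogram, as long as the checkpoint is still valid for the current configuration.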

## Processing modes

Currently, nabu has two "processing modes" (i.e. two pipeline implementations):

1. "chunked mode" (faster, default)
2. "grouped mode" (slower, fall-back)

Let's detail these two modes.

### Chunked pipeline

In this mode, nabu loads several lines of **all** the projections. This is usually possible for standard-sized datasets (currently 2k x 2k images, 4k projections). The rationale is that reconstructing one horizontal slice requires all the angles (a sinogram), therefore at least one line of every projection has to be loaded. Loading/storing a couple of lines of all the projections is usually fine in terms of memory.

The processing is first done in the "projections domain" (filtering, phase retrieval), then in the sinogram domain (normalization, de-ring, reconstruction).

### Grouped pipeline

In certain cases (e.g. phase retrieval), many lines of each projection have to be read/stored at once, which might exceed the available memory. For example, a dataset with 24k projections, each 16k pixels wide, processed with Paganin phase retrieval using a 50 pixels margin, needs 155 GB of memory (each sinogram is 1.6 GB). When the chunk of data does not fit in RAM or GPU memory, a solution is to load only a part of the projections at a time. In Python syntax: "if `projections[:, :100, :]` does not fit in memory, process `projections[:500, :100, :]`, then `projections[500:1000, :100, :]`, etc.".

This processing mode ("groups of radios") can handle very large datasets (in theory of any size), at the expense of speed.

```{warning}
Currently, nabu partly duplicates the data in this mode: the sinogram is saved to disk in addition to the original data. You can remove the sinogram (`sinogram_{output_prefix}.hdf5` and the folder `sinogram_{output_prefix}`) once you are satisfied with the reconstruction.
```

### In the command line

Normally, nabu automatically picks the appropriate pipeline mode, i.e. it detects when loading `N` lines of all the projections is not possible. If for some reason nabu does not switch automatically to the grouped mode, you can force it to do so: `nabu nabu.conf --force_use_grouped_pipeline`

You can also tweak the `--max_chunk_size` and `--phase_margin` parameters.

## Fine-tune the CPU/GPU resources

By default, nabu automatically estimates the amount of memory (RAM and GPU) needed. In certain cases, this estimation fails and results in a crash (CUDA out of memory, or a killed process). The `nabu` command provides several parameters:

- To limit the amount of GPU memory used, tweak `--gpu_mem_fraction` (0.8 by default).
- To limit the amount of host RAM used, tweak `--cpu_mem_fraction` (0.8 by default).

Note that the CPU memory estimation normally accounts for memory limits imposed by the environment (e.g. SLURM).

## Understanding speed bottlenecks

Nabu is designed to be as fast as reasonably possible (where reasonable means "without writing unmaintainable code"). Yet, reconstructing a slice/volume can be unacceptably slow. Experience shows that the main bottlenecks are:

- Reading data
- The reconstruction step (FBP)

Note that writing is usually fast, because data is written in large blocks with buffered I/O.

### Reading data

Reading is usually slow because the data is usually stored image-wise, i.e.:

- One file per projection
- Image-wise HDF5 chunks

There is unfortunately little that can be done on the nabu side. It has been found that using fast network file systems (GPFS) improves things a lot in certain conditions. Using HDF5 files with a contiguous data layout (i.e. `chunks=None`) or "tomography-friendly" chunks dramatically speeds up reading, but it might penalize the acquisition side. In the future, nabu might come with an "interleaved read/process" approach to hide the reading latency.

```{caution}
When reading HDF5 data with image-wise chunks, subsampling (e.g. `projections[:, :, ::10]`) results in a fearfully slow read. Don't use subsampling with HDF5 if you want to speed up reading!
```

### Reconstruction step

When it comes to pure processing (no I/O), FBP is the most demanding step. All other operations are pixel-wise, convolution-like, or FFT-like. When slices are large (more than 4000 pixels wide), the FBP implementation becomes sub-optimal. There is ongoing work to implement alternative, faster reconstruction algorithms.
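
As a rough order-of-magnitude sketch of why FBP dominates (assuming a direct backprojection whose cost scales with the number of slice pixels times the number of angles; the sizes below are illustrative, not measured figures):

```python
# Back-of-envelope comparison of per-slice costs (illustrative numbers only):
# a direct backprojection scales as (slice width)^2 * (number of angles),
# while a pixel-wise step scales as (slice width)^2.
n_pix = 4096      # slice width in pixels (hypothetical large slice)
n_angles = 4000   # number of projections (hypothetical)

backprojection_ops = n_pix * n_pix * n_angles   # ~6.7e10 point updates per slice
pixelwise_ops = n_pix * n_pix                   # ~1.7e7 operations per slice

print(f"backprojection: ~{backprojection_ops:.1e} updates per slice")
print(f"pixel-wise step: ~{pixelwise_ops:.1e} operations per slice")
```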