Architecture
Each software has some kind of architecture, and this is the place to describe it in broad terms, to make it easier for developers to get around the code. Following the scheme of a layered architecture as featured by the “Clean Architecture” (Robert Martin) or the “hexagonal architecture”, alternatively known as “ports and adapters” (Alistair Cockburn), the different layers are described successively from the inside out.
Just to make matters a bit more interesting, the evedata package has basically two disjunctive interfaces: one towards the data file format (eveH5), the other towards the users of the actual data, e.g. the radiometry
and evedataviewer
packages. See Fig. 3 for a first overview.
Fig. 3 The two interfaces of the evedata package: eveH5 is the persistence layer of the recorded data and metadata, and evedataviewer and radiometry are Python packages for graphically displaying and processing and analysing these data, respectively. Most people concerned with the actual data will either use evedataviewer or the radiometry package, but not evedata directly.
An alternative view that may be more helpful in the long run, leading to a better overall architecture, rests on the distinction between two dimensions of layers: functional and technical. While for long time, the first line of organisation in code was technical layers, grouping software into functional blocks (packages or whatever the name) that are each independently technically layered preserves the concepts of the application much better. A first idea for the evedata package is presented in Fig. 4.
Fig. 4 A two-dimensional view of the architecture, with technical and functional layers. The primary line of organisation in the code, according to different authors, should be the functional layers or packages, and each of the functional blocks should be organised into three technical layers: boundaries, controllers, entities. The idea and name of these three technical layers goes back to Ivar Jacobson (1992). The boundaries (sometimes called interfaces) contain two different kinds of elements: facades and resources. While resources are typically concerned with the persistence layer and similar things, facades are the user-facing elements providing access to the underlying entities and controllers.
There are three functional layers with dependencies in one direction: measurement, evefile, and scan. Each of the functional layers is divided into three technical layers, i.e. boundaries (“interfaces”), controllers, and entities (BCE). The boundaries layer contains both, the “interfaces” (“adapters”) pointing to the user (facades) and those pointing to the underlying infrastructure (resources). Here, “user” can be either actual human users or other functional layers. This means that high-level functional layers shall not directly depend on elements of the lower technical layers of lower-level functional layers, but use the respective facades of the lower-level functional layers. As an example for the evedata
package: the measurement functional layer uses the evefile and scan facades of the evefile and scan functional layers.
A corresponding UML package diagram for Fig. 4 is shown in the figure below. Here, only the dependencies within the individual functional layers are shown, not the dependencies between the functional layers. The latter proceed from left to right: measurement > evefile > scan.
Fig. 5 An UML package diagram of the evedata package following the organisation in functional layers that each contain three technical layers, as shown in Fig. 4. To hide the names of the technical layers from the user, one could think of importing the relevant classes (basically the facades) in the __init__.py
files of the respective top-level functional packages.
For each of the functional layers, the corresponding technical layers are described below. Deviating from the direction of dependencies as shown in Fig. 4, we start with the evefile functional layer, and for each of the layers we start with the entities and proceed via the controllers to the boundaries. From a user perspective interested in measured data, the journey starts with the data file (eveH5), represented on a low level by the evefile functional layer and on a high level as user interface by the measurement functional layer. The scan functional layer representing the information originally contained in the SCML file, while technically at the bottom of the dependencies chain, is the least interesting from a user’s perspective primarily interested in the data, and is probably the layer fully implemented last.
General remarks on the UML class diagrams
The UML class diagrams in this document try to consistently follow a series of conventions listed below. This list is not meant to be exhaustive and may change over time.
Capitalising attribute types
Attribute types that are default types of the (Python) language are not capitalised.
Attribute types that are instances of self-defined classes are capitalised and spelled exactly as the corresponding class.
Singular and plural forms of attributes
Scalar attributes have singular names.
Attributes containing containers (lists, dictionaries, …) have plural names.
Naming conventions
Generally, naming conventions follow PEP8: class names are in CamelCase, attributes and methods in snake_case.
Attributes of enumerations
No convention has yet been agreed upon. Possibilities would be ALLCAPS (as the attributes could be interpreted as constants) or snake_case.
Dictionaries
Attributes that contain dictionaries as container have the container type followed by curly braces
{}
, although this seems not to be part of the UML standard.
Important
Partly due to the conventions for the UML class diagrams outlined above and due to the reasons leading to these conventions in the first place, the data model described in the UML class diagrams differs often in subtle details of attribute names from the currently existing data models and, e.g., the SCML schema definition. Eventually, it would be good to agree upon a list of conventions and try to consistently apply them throughout the different interconnected parts (SCML, GUI, engine, evedata, …). These conventions are primarily concerned with a shared vocabulary for the concepts, not with CamelCase vs. snake_case and alike, as this will differ for different languages (and we can agree on mapping rules).
Evefile
Generally, the evefile functional layer, as mentioned already in the Concepts section, provides the interface towards the persistence layer (eveH5 files). This is a rather low-level interface focussing on a faithful representation of all information available in an eveH5 file as well as the corresponding scan description (SCML), as long as the latter is available.
Furthermore, the evefile functional layer provides a stable abstraction of the contents of an eveH5 file and is hence not concerned with different versions of both, the eveH5 file structure and the SCML schema. The data model provided via its entities needs to be powerful (and modular) enough to allow for representing all currently existing data files (regardless of their eveH5 and SCML schema versions) and future-proof to not change incompatibly (open-closed principle, the “O” in “SOLID) when new requirements arise.
Important
As the evefile functional layer is not meant as a (human-facing) user interface, it is not concerned with concepts such as “fill modes” (joining), but represents the data “as is”. This means that the different data can generally not be plotted against each other. This is a deliberate decision, as joining data for a (two-dimensional) data array, although generally desirable for (simple) plotting purposes, masks/removes some highly important information, e.g. whether a value has not been measured in the first place, or whether obtaining a value has failed for some reason.
Entities
Entities are the innermost technical layer: everything depends on the entities, but the entities depend on nothing but themselves. Furthermore, entities may have little to no behaviour (i.e., data classes). For the evefile functional layer, the entities consist of three modules: file, data, and metadata, in the order of their dependencies.
file module
Despite the opposite chain of dependencies, starting with the file
module seems sensible, as its File
class represents a single eveH5 file and provides kind of an entry point.
Fig. 6 Class hierarchy of the evefile.entities.file
module. The File
class is sort of the central interface to the entire subpackage, as this class provides a faithful representation of all information available from a given eveH5 file. To this end, it incorporates instances of classes of the other modules of the subpackage. Furthermore, “Scan” inherits from the identically named facade of the scan functional layer and contains the full information of the SCML file (if the SCML file is present in the eveH5 file).
Points to discuss further (without claiming to be complete)
Split “Scan” and “Station”, similarly to how they are stored in eveH5 in the future?
The
scan.boundaries.scan
subpackage now contains two facades:Scan
with the scan description andStation
with the description of all devices (machine, beamline, setup), currently “setup” in the SCML.The setup part of the SCML file sent to the engine is not necessarily identical to the XML file with the setup description loaded by the engine. Hence, it may make sense to have both stored separately, or have a “Scan” facade that contains only the scan description, and a “Station” facade containing the information on the relevant devices.
Although the engine saves the SCML file to the HDF5 file, it saves the SCML as obtained when submitting a scan. The “only” part the engine currently looks up in the station-related XML (
messplatz.xml
) is the information necessary for the dynamic snapshots. This somewhat unclear situation led and still leads sometimes to confusion. Which part of what file is saved where in theFile
class remains to be seen.
Some comments (not discussions any more):
“data”, “snapshots”, “monitors”: lists or dicts?
In the meantime, the three attributes are modelled as dictionaries. How about modelling them as dictionaries, with the keys being the names of the corresponding datasets (i.e., the last part of the path within the HDF5 file).
Organisation of datasets in main according to the scan module structure
Despite the current structure of the eveH5 files, datasets will be organised and split according to their use in the different scan modules. In case no SCML is available, a “dummy” scan module will be created containing all the datasets in main.
data module
Data are organised in “datasets” within HDF5, and the data
module provides the relevant entities to describe these datasets. Although currently (as of 08/2024, eve version 2.1) neither average nor interval detector channels save the individual data points, at least the former is a clear need of the engineers/scientists. Hence, the data model already respects this use case. As per position (count) there can be a variable number of measured points, the resulting array is no longer rectangular, but a “ragged array”. While storing such arrays is possible directly in HDF5, the implementation within evedata is entirely independent of the actual representation in the eveH5 file.
Fig. 7 Class hierarchy of the data
module. Each class has a corresponding metadata class in the metadata
module. While in this diagram, some child classes seem to be identical, they have a different type of metadata (see the metadata
module below). Generally, having different types serves to discriminate where necessary between detector channels and motor axes. For details on the ArrayChannelData
and AreaChannelData
classes, see Fig. 8 and Fig. 10, respectively. You may click on the image for a larger view.
Points to discuss further (without claiming to be complete)
Currently none… ;-)
Some comments (not discussions any more, though):
MonitorData with more than one value per time
MonitorData should have only one value per time, although it can currently not completely be excluded that the same value is monitored multiple times, most probably resulting in identical values at identical times, see #7688, note-11. However, this should be regarded as a bug (and if actually occurring in an eveH5 file, treated in some deterministic way). A special case are monitor data occurring before starting the actual scan, as these all get the special timestamp
-1
, see #7688, note-10. In this case, only the last (youngest) data point should be retained/used.Values of MonitorData
MonitorData can have textual (non-numeric) values. This should not be too much of a problem given that numpy can handle string arrays (though <v2.0 only fixed-size string values. Hence, evedata may need to depend on numpy>=2.0).
raw (i.e. individual) values of AverageChannelData and IntervalChannelData
Currently, the measurement program only collects the average values in both cases. However, there is the frequent request to collect the raw values as well. The data structure already supports this.
References to external data/files
There are measurements where for a given position count spectra (1D) or entire images (2D) are recorded. At least for the latter, the data usually reside in external files. Currently, the file name (including the full path, starting with which version of the eveH5 schema?) is stored as value in the dataset in these cases. For a discussion, see #7732. An additional complication: historically, the has been some mismatch between file number stored in the HDF5 dataset and actual file number. Hence, some way of correcting the mapping after reading the file needs to be possible.
Generally, spectra (1D data per position count) contained within an eveH5 file in the “arraydata” group are modelled as
ArrayChannelData
, with the_data
attribute being a 2D numpy array. In case of storing images (2D data per position count), these data are modelled asAreaChannelData
, with the_data
attribute being either a list of 2D/3D numpy arrays (containing the image data for one or different channels), a numpy array of arrays, or a 3D/4D array.While for a usual HDF5 dataset, the
DataImporter
object contains the eveH5 filename and dataset path for accessing the data, in case of external data, it contains a list of external filenames/references.Joined (“filled”) data
Only axis data can and will be filled, and they will be filled differently depending on the channel data they are plotted against.
If we allow several channels to be plotted against one axis, things will get slightly more involved, as the axis data need to be joined with respect to both channels in this case, and probably the channel data filled with NaN values as well. Alternatives would be to have the axis data joined individually for the individual channels, or to delete those points in the channel datasets where the other channel(s) don’t have corresponding values (hooray, there we are again with our different “fill modes”…).
Joining takes place by objects located in the controllers technical layer of the measurement functional layer, and the joined data will be stored in a separate attribute, retaining the original unfilled data.
Detector channels that are redefined within an experiment/scan
Generally, detector channels can be redefined within an experiment/scan, i.e. can have different operational modes (standard/average vs. interval) in different scan modules. Currently (eveH5 v7), all data are stored in the identical dataset on HDF5 level and only by “informed guessing” (if at all possible) can one deduce that they served different purposes. Generally, we need separate datasets on the HDF5 level for detector channels that change their type or attributes within a scan, see #6879, note 16.
The current state of affairs (as of 06/2024) regarding a new eveH5 scheme (v8) is to separate single-point channels from average and interval channels and have average and interval channel datasets per se be suffixed by the scan module ID. Given that one and the same channel can only be used once in a scan module, this should be unique.
While the future way of storing those detector channels in eveH5 files is discussed in #7726, we need a solution for legacy data solving two problems:
separating the values for the different channels into separate datasets
This is rather complicated, but probably possible by looking at the different HDF5 datasets where present – although this would require reading the data of the HDF5 datasets if corresponding datasets are available in the “averagemeta” or/and “standarddev” group to check for changes in these data.
Separating the data is but only necessary if corresponding datasets are available in the “averagemeta” or/and “standarddev” groups. I.e., loading the data needs only to happen once this condition is met. However, as soon as this condition is met, data for legacy files need to be loaded to separate the data into separate datasets and not to have the surprise afterwards when first accessing the presumably single detector channel to all of a sudden have it split into several datasets.
sensibly naming the resulting multiple datasets.
Generally, the same strategy as proposed for the new eveH5 scheme should be used here, i.e. suffixing the average and interval detector channels with the scan module ID. Given that one and the same channel can only be used once in a scan module, this should be unique. The type of detector channel can be deduced from the class type.
Getting the scan module ID requires to read the SCML, though, as usually, the SMCounter pseudo-detector channel will not be present. Furthermore, mapping position counts to scan modules is far from simple. Hence, an alternative option may be to suffix the respective datasets with increasing integer numbers, without relation to the scan module ID.
Additional class
DeviceData
, but notOptionData
Devices seem currently only to be saved as monitors in the “device” section of the eveH5 file and appear as
MonitorData
. Generally, starting with eve v1.32, all pre-/postscan devices (and options) are automatically stored as monitors, i.e. in the “devices” section of the eveH5 file.When timestamps of monitor data should be mapped to position counts (while retaining the original monitor data), this most probably means to create new instances of (subclasses of)
MeasureData
. If these monitor data are devices, this is the case forDeviceData
.Options should generally be mapped to the respective classes the options belong to. For options, we additionally need to distinguish between “scalar” options that do not change within one scan module (and should in the future appear as attributes on the HDF5 level), and options whose values need to be saved for each individual position count (and should in the future appear as additional dataset columns on the HDF5 level).
As of now, scalar options appear as dictionary
options
in theMetadata
class hierarchy, while variable options with individual values per position count appear as dictionaryoptions
in theData
class hierarchy. Note that this only applies to options that are not mapped to attributes of an explicit class in the data model.Dealing with axes read-back values (RBV)
With the advent of precise optical encoders, it turned out that axes do move after arriving at their set point. Hence, for some measurements, axes RBVs are read as pseudo-detector channels. Currently (eveH5 v7), these data end up as detector channels in distinct HDF5 datasets. However, logically they belong to the corresponding axes. Further complications may arise from the fact that there exists the use case for recording these axes RBVs during averaging of detector channel data.
The data model distinguishes between axes with encoders and those without. In the future, for axes with encoders the RBV will be read synchronously to every detector channel readout. This makes filling for those axes unnecessary, as for every detector channel readout there exists a “true” axis value. For average channels, this further means that as many axes RBVs are present as maximum detector channel readouts for this position, resulting in general in a “ragged array”.
As for axes without encoder, the RBV does not change after the axis has arrived at its position, additionally reading these axes RBVs would not make sense, as this would suggest real values.
Sorting non-monotonic positions in eveH5 datasets
Due to the (intrinsic) way the engine handles scans, position counts can be non-monotonic (#4562, #7722). However, this will usually be a problem for the analysis. Therefore, positions need to be sorted monotonically, and this is done during data import.
Array channels
Array channels in their general form are channels collecting 1D data. Typical devices used here are MCAs, but oscilloscopes and vector signal analysers (VSA) would be other typical array channels. Hence, for these quite different types of array channels, distinct subclasses of the generic ArrayChannelData
class exist, see Fig. 8.
Fig. 8 Preliminary data model for the ArrayChannelData
classes. The basic hierarchy is identical to Fig. 7. Details for the MCAChannelData
class can be found in Fig. 9. The ScopeChannelData
class is a container for scopes with several channels read simultaneously. Further array detector channels can be added by subclassing ArrayChannelData
. Probably the next class will be VSAChannelData
for Vector Signal Analyser (VSA) data.
Multi Channel Analysers (MCA) generally collect 1D data and typically have separate regions of interest (ROI) defined, containing the sum of the counts for the given region. For the EPICS MCA record, see https://millenia.cars.aps.anl.gov/software/epics/mcaRecord.html.
Fig. 9 Preliminary data model for the MCAChannelData
classes. The basic hierarchy is identical to Fig. 7, and here, the relevant part of the metadata class hierarchy from Fig. 12 is shown as well. Separating the MCAChannelCalibration
class from the ArrayChannelMetadata
allows to add distinct behaviour, e.g. creating calibration curves from the parameters.
Note: The scalar attributes for ArrayChannelROIs will currently be saved as snapshots regardless of whether the actual ROI has been defined/used. Hence, the evedata package needs to decide based on the existence of the actual data whether to create a ROI object and attach it to ArrayChannelData
.
The calibration parameters are needed to convert the x axis of the MCA spectrum into a real energy axis. Hence, the MCAChannelCalibration
class has a method calibrate()
for performing exactly this conversion. The relationship between calibrated units (cal) and channel number (chan) is defined as cal=CALO + chan*CALS + chan^2*CALQ. The first channel in the spectrum is defined as chan=0. However, not all MCAs/SDDs have these calibration values: Ketek SDDs seem to not have these values (internal calibration?).
The real_time and life_time values can be used to get an idea of the amount of pile up occurring, i.e. having two photons with same energy within a short time interval reaching the detector being detected as one photon with twice the energy. Hence, latest in the radiometry package, distinct methods for this kind of analysis should be implemented.
Area channels
Area channels are basically 2D channels, i.e., cameras. There are (at least) two distinct types of cameras in use, namely scientific cameras and standard consumer digital cameras for taking pictures (of sample positions in the setup). While scientific cameras usually record only greyscale images, but have additional parameters and can define regions of interest (ROI), consumer cameras are much simpler in terms of their data model and typically record RGB images. These different types of images need to be handled differently in the data processing and analysis pipeline.
Fig. 10 Preliminary data model for the AreaChannelData
class. The basic hierarchy is identical to Fig. 7. As scientific cameras are quite different from standard consumer digital cameras for taking pictures, but both are used from within the measurement program, distinct subclasses of the basic AreaChannelData
class exist. For details on the ScientificCameraData
classes, see Fig. 11.
Fig. 11 Preliminary data model for the ScientificCameraData
classes. The
basic hierarchy is identical to Fig. 7, and here, the relevant part of the
metadata class hierarchy from Fig. 12 is shown as well. As different
area detectors (scientific cameras) have somewhat different options,
probably there will appear a basic AreaChannelData
class with
more specific subclasses.
Regarding file names/paths: Some of the scientific cameras are operated from Windows, hence there is usually no unique mapping of paths to actual places of the files, particularly given that Windows allows to map a drive letter to arbitrary network paths. It seems as if these paths are quite different for the different detectors, and therefore, some externally configurable mapping should be used.
Note for Pilatus cameras: Those cameras seem to have three sensors each for temperature and humidity. Probably the simplest solution would be to store those values in an array rather than having three individual fields each. In case of temperature (and humidity) readings for each individual image, the array would become a 2D array.
Points to discuss further (without claiming to be complete)
Label for the ROIs
The camera controller seems to provide the possibility to set labels for ROIs. Is this supported by the EPICS driver currently in use, and could we just add the PV to the template file(s)?
Marker for the ROIs
The camera controller seems to set MinX, MinY, SizeX, SizeY. Is this the generally agreed and consistent way of defining the marker area? What should the four elements of the
ScientificCameraROIData.marker
attribute represent?
Some comments (not discussions any more, though):
Sample cameras: additional fields
The parameters
beam_x
andbeam_y
don’t change between images within a scan module, but are set once when adjusting the camera, and otherwise never change. Hence, they have been moved to the metadata.Note, however, that currently, these parameters are marked as
autoacquire=measurement
in the template file.
metadata module
Data without context (i.e. metadata) are mostly useless. Hence, to every class (type) of data in the evefile.data module, there exists a corresponding metadata class.
Note
As compared to the UML schemata for the IDL interface, the decision of whether a certain piece of information belongs to data or metadata is slightly different here. The main reason for this is the problem in current (as of eveH5 v7) files and redefined detector channels, leading to a loss of information that needs to be changed anyway and is discussed above for the data module. With separate datasets for different detector channels, the problem is solved and the immutable metadata belong to the metadata classes (and are converted to attributes on the HDF5 level in the future scheme, v8).
Fig. 12 Class hierarchy of the evefile.metadata module. Each concrete class in the evefile.data module has a corresponding metadata class in this module. You may click on the image for a larger view.
A note on the AbstractDeviceMetadata
interface class: The eveH5 dataset corresponding to the TimestampMetadata class is special in sense of having no PV and transport type nor an id. Several options have been considered to address this problem:
Moving these three attributes down the line and copying them multiple times (feels bad).
Leaving the attributes blank for the “special” dataset (feels bad, too).
Introduce another class in the hierarchy, breaking the parallel to the Data class hierarchy (potentially confusing).
Create a mixin class (abstract interface) with the three attributes and use multiple inheritance/implements.
As obvious from the UML diagram, the last option has been chosen. The name “DeviceMetadata” resembles the hierarchy in the scan.entities.setup
module and clearly distinguishes actual devices from datasets not containing data read from some instrument.
Points to discuss further (without claiming to be complete)
Currently none… ;-)
Some comments (not discussions any more, though):
Attributes “pv” and “access_mode”
“pv” is the EPICS process variable, “access_mode” refers to the access mode (local vs. ca, in the future additionally pva). Both are currently (as of eveH5 v7) stored as one attribute “access” in the eveH5 datasets, separated by “:” in the form
<access_mode>:<pv>
. In the new eveH5 schema (v8), these attributes will be split into two attributes with the corresponding names.Attributes for
AverageChannelMetadata
andIntervalChannelMetadata
The current model in the UML schemas of data and metadata assumes different data(sets) in case a detector channel gets redefined within a scan, see #7726 and the discussion above. This should be verified and specified.
Todo
Extend Metadata classes to contain all relevant attributes from the SCML/setup description. This should already include metadata regarding the actual devices used not yet available from the SCML/XML but relevant for a true traceability of changes in the setup.
Controllers
Code in the controllers technical layer operate on the entities and provide the required behaviour (the “business logic”).
What may be in here:
Mapping different versions of eveH5 files to the entities
Mapping timestamps to position counts (=> move to measurement subpackage)
Converting MPSKIP scans into average detector channel with adaptive number of recorded points (=> move to measurement subpackage?)
Separating datasets for channels redefined within one scan and currently (up to eveH5 v7) stored in one HDF5 dataset
Extract set values for axes (requires access to SCML)
Correct mapping of file numbers for external files
version_mapping module
For details, see the documentation of the version_mapping
module.
Being version agnostic with respect to eveH5 and SCML schema versions is a central aspect of the evedata package. This requires facilities mapping the actual eveH5 files to the data model provided by the entities technical layer of the evefile subpackage. The EveFile
facade obtains the correct VersionMapper
object via the VersionMapperFactory
, providing an HDF5File
resource object to the factory. It is the duty of the factory to obtain the “version” attribute from the HDF5File
object (possibly requiring to explicitly get the attributes of the root group of the HDF5File
object).
Fig. 13 Class hierarchy of the evefile.controllers.version_mapping module, providing the functionality to map different eveH5 file schemas to the data structure provided by the HDF5File
class. The VersionMapperFactory
is used to get the correct mapper for a given eveH5 file. For each eveH5 schema version, there exists an individual VersionMapperVx
class dealing with the version-specific mapping. The idea behind the Mapping
class is to provide simple mappings for attributes and alike that need not be hard-coded and can be stored externally, e.g. in YAML files. This would make it easier to account for (simple) changes.
For each eveH5 schema version, there exists an individual VersionMapperVx
class dealing with the version-specific mapping. That part of the mapping common to all versions of the eveH5 schema takes place in the VersionMapper
parent class, e.g. removing the chain. The idea behind the Mapping
class is to provide simple mappings for attributes and alike that can be stored externally, e.g. in YAML files. This would make it easier to account for (simple) changes.
Converting MPSKIP scans into average detector channel
- Module name:
- Dependencies:
Scan
, and here in particular the channels used in the scan module with the MPSKIP channel
Given the data model to not correspond to the current eveH5 structure (v7), it makes sense to convert scans using MPSKIP to “fake” average detector channels storing the individual data points on this level.
Important
MPSKIP is exclusively used by one group, and not only for storing the individual data points for averaging, but for recording axes RBVs for each individual detector channel readout as well, due to motor axes changing their position slightly. The axes RBVs are recorded for using pseudo-detectors are/should be equipped with encoders to ensure actual values being read. The data model now supports axes objects to have more than one value for a position to account for this situation. This means, however, to convert the MPSKIP scans with individual position counts for each detector readout to scans with multiple values available per individual position.
A typical scan using MPSKIP uses the MPSKIP detector in the innermost scan module. Here, the only motor axis is a counter (defining the maximum number of attempts to record data for a single averaging), and the detector channels are the primary readout (electrometer), often a secondary electrometer, the ring current and lifetime, and a series of RBVs from motor axes. Additionally, the SM-Counter detector is used in this scan module. All the actual motor axes are set in outer scan modules, typically an overall positioning of the sample (by means of a goniometer) in the outer scan module and the monochromator in the inner scan module. This accounts for a doubly nested scan in total, and the actual detector values having positions where no motor axis (besides the RBVs that are currently pseudo-detector channels, hence marked as channel) has corresponding positions.
The MPSKIP detector channel datasets get mapped to a SkipData
object, and if such an object is present, all detectors in the same scan module (i.e., with identical positions) need to be converted:
Actual detectors (not RBVs as pseudo-detector channels) need to be mapped to either
AverageChannelData
orAverageNormalizedChannelData
.The Counter (axis) data can be used to conveniently determine the positions for each individual averaging, the data object should afterwards be removed.
Axes RBVs (present als pseudo-detector channels) need to be mapped to
AxisData
objects with the individual axis values stored as ragged array.The SM-Counter (channel) data can be removed.
What about the Time(r) data? This is the (cumulative) time in seconds.
If the SCML is present, reading the scan part of the SCML and inferring the motor axes and detector channels where the MPSKIP detector is present makes it much easier to get the names of the data objects that need to be modified. Hence, it might be sensible to (i) implement the minimum functionality of the scan
subpackage necessary and (ii) rely on the SCML to be present for the time being. It might be a sensible option to check for the SCML to be present if a SkipData
object has been created, and in those (probably rare) cases to issue a warning that this is currently not supported.
Separating datasets for redefined channels
- Module name:
dataset_separation
- Dependencies:
Scan
, and here the channels defined in a scan module as well as the corresponding scan module ID
Generally, detector channels can be redefined within an experiment/scan, i.e. can have different operational modes (standard/average vs. interval) in different scan modules. Currently (eveH5 v7), all data are stored in the identical dataset on HDF5 level and only by “informed guessing” (if at all possible) can one deduce that they served different purposes. Generally, we need separate datasets on the HDF5 level for detector channels that change their type or attributes within a scan, see #6879, note 16.
The current state of affairs (as of 09/2024) regarding a new eveH5 scheme (v8) is to separate single-point channels from average and interval channels and have average and interval channel datasets per se be suffixed by the scan module ID. Given that one and the same channel can only be used once in a scan module, this should be unique.
While the future way of storing those detector channels in eveH5 files is discussed in #7726, we need a solution for legacy data solving two problems:
separating the values for the different channels into separate datasets
This is rather complicated, but probably possible by looking at the different HDF5 datasets where present – although this would require reading the data of the HDF5 datasets if corresponding datasets are available in the “averagemeta” or/and “standarddev” group to check for changes in these data.
Separating the data is but only necessary if corresponding datasets are available in the “averagemeta” or/and “standarddev” groups. I.e., loading the data needs only to happen once this condition is met. However, as soon as this condition is met, data for legacy files need to be loaded to separate the data into separate datasets and not to have the surprise afterwards when first accessing the presumably single detector channel to all of a sudden have it split into several datasets.
sensibly naming the resulting multiple datasets.
Generally, the same strategy as proposed for the new eveH5 scheme should be used here, i.e. suffixing the average and interval detector channels with the scan module ID. Given that one and the same channel can only be used once in a scan module, this should be unique. The type of detector channel can be deduced from the class type.
Getting the scan module ID requires to read the SCML, though, as usually, the SMCounter pseudo-detector channel will not be present. Furthermore, mapping position counts to scan modules is far from simple. Hence, an alternative option may be to suffix the respective datasets with increasing integer numbers, without relation to the scan module ID.
Extract set values for axes
- Module name:
axes_set_values
- Dependencies:
Scan
, and here the details of the axes positions defined in each scan module – thus requiring parsing of the different ways how to define axes positions.
The axes positions stored in the HDF5 file are the RBVs after positioning. If, however, an axis never reached the set value due to limit violation or other constraints, this is usually not visible from the HDF5 file, as the severity is typically not recorded. However, the set values for each axis can be inferred from the scan description. Having this information would be helpful for routine checks whether a scan ran as expected. Set values are stored in the set_values
attribute of the AxisData
class.
Note
In case the axis set values in the SCML are only a reference to an external file, these values cannot reliably be read afterwards. Hence, in this case, probably either a warning should be issued or the situation silently ignored. In a future eveH5 scheme, it may be sensible to store those values in some way in the file.
Correct mapping of file numbers for external files
In the past, there has been some occasions where the stored file numbers for external files (usually images from 2D detectors) do not match with the correct files. Hence, we need some mechanism to modify the file numbers after loading a file for a correct mapping. It is unclear so far whether there is a way to automatically and reliably detect when to apply this correction.
Boundaries
What may be in here:
facade:
resources:
Interfaces towards additional files, e.g. images
Images in particular are usually not stored in the eveH5 files, but only pointers to these files.
Import routines for the different files (or at least a sensible modular mechanism involving an importer factory) need to be implemented.
Is the
evedata
package the correct place for these importers? One could think of theradiometry
package as the better place, but on the other hand, theevedataviewer
package would need to be able to display those data as well, hence need the import to be done.Given the
evedata
package to act as a (complicated) importer, all importer mechanisms for additional data not stored in the HDF5 file should be implemented here.
Interfaces towards other file formats
One potential candidate for an exchange format would be the NeXus format. However, there is not one NeXus file format, but there are several schemas for different types of experiments. For details, see the NeXus application definitions. Hence, those exporters may better be located in the
radiometry
package.
evefile module (facade)
Fig. 14 Class hierarchy of the evefile.boundaries.evefile module, providing the facade for an eveH5 file. Currently, the basic idea is to inherit from the File
entity and extend it accordingly, adding behaviour.
As per Fig. 14, the EveFile
class inherits from the File
class of the entities
subpackage. Reading (loading) an eveH5 file results in calling out to HDF5File.read()
, followed by mapping the eveH5 contents to the data model. Additionally, for eveH5 v7 and below, datasets for detector channels that have been redefined within one scan and scans using MPSKIP are mapped to the respective datasets accordingly. Last but not least, the corresponding SCML (and setup description, where applicable) is loaded and the metadata contained therein mapped to the metadata of the corresponding datasets.
Some comments (not discussions any more, though):
Metadata from SCML file
There is more information available from the SCML file (and the measurement station/beam line description - but that is generally not available when reading eveH5 files if it is not contained in the SCML). This information needs to be mapped to the respective metadata classes (and those classes be extended accordingly). This mapping will take place here, as per schema of the functional and technical layers, the
evefile
subpackage depends on thescan
subpackage.Non-monotonic position counts in eveH5 datasets
Due to the (intrinsic) way the engine handles scans, position counts can be non-monotonic (#4562, #7722). However, this will usually be a problem for the analysis. Therefore, the sorting logic is implemented in the entities layer in the load method(s).
eveh5 module (resource)
The aim of this module is to provide a Python representation (in form of a hierarchy of objects) of the contents of an eveH5 file that can be mapped to both, the evefile and measurement interfaces. While the Python h5py package already provides the low-level access and gets used, the eveh5 module contains Python objects that are independent of an open HDF5 file, represent the hierarchy of HDF5 items (groups and datasets), and contain the attributes of each HDF5 item in form of a Python dictionary. Furthermore, each object contains a reference to both, the original HDF5 file and the HDF5 item, thus making reading dataset data on demand as simple as possible.
Fig. 15 Class hierarchy of the evedata.evefile.boundaries.eveh5
module. The HDF5Item
class and children represent the individual HDF5 items on a Python level, similarly to the classes provided in the h5py package, but without requiring an open HDF5 file. Furthermore, reading actual data (dataset values) is deferred by default.
As such, the HDF5Item
class hierarchy shown above is pretty generic and should work with all eveH5 versions. However, it is not meant as a generic HDF5 interface, as it does make some assumptions based on the eveH5 file structure and format.
Some comments (not discussions any more, though):
Reading the entire content of an eveH5 file at once vs. deferred reading?
Reading relevant metadata (e.g., to decide about what data to plot) should be rather fast. And generally, only two “columns” will be displayed (as f(x,y) plot) at any given time – at least if we don’t radically change the way data are looked at compared to the IDL Cruncher.
References to the internal datasets of a given HDF5 file are stored in the corresponding Python data structures (together with the HDF5 file name). Hence, HDF5 files are closed after each operation, such as not to have open file handles that may be problematic (but see the quote from A. Collette below).
Plotting requires data to be properly filled, although filling will most probably not take place globally once, but on a per plot base. See the discussion on fill modes, currently below in the Dataset subpackage section.
From the book “Python and HDF5” by Andrew Collette:
You might wonder what happens if your program crashes with open files. If the program exits with a Python exception, don’t worry! The HDF library will automatically close every open file for you when the application exits.
—Andrew Collette, 2014 (p. 18)
Measurement
Generally, the measurement
subpackage, as mentioned already in the Concepts section, provides the interface towards the “user”, where user mostly means the evedataviewer
and radiometry
packages. However, besides these two Python packages, human users will want to use the evedata
package as well. Hence, it should be as human-friendly as possible.
The overall package structure of the evedata package is shown in Fig. 5. Furthermore, a series of (still higher-level) UML schemata for the measurement
subpackage are shown below, reflecting the current state of affairs (and thinking).
Note
The mapping of the information contained in both, the HDF5 and SCML layers of an eveH5 file, to the measurement is far from being properly modelled or understood. This is partly due to the step-wise progress in understanding. On a rather fundamental level, it remains to be decided whether a Measurement
should allow for reconstructing how a measurement has actually been carried out (i.e., provide access to the SCML and hence the anatomy of the scan).
What is the main difference between the evefile
and the measurement
subpackages? Basically, the information contained in an eveH5 file needs to be “interpreted” to be able to process, analyse, and plot the data. While the evefile
subpackage provides the necessary data structures to faithfully represent all information contained in an eveH5 file, the measurement
subpackage provides the result of an “interpretation” of this information in a way that facilitates data processing, analysis and plotting.
However, the measurement
subpackage is still general enough to cope with all the different kinds of measurements the eve measurement program can deal with. Hence, it may be a wise idea to create dedicated dataset classes in the radiometry
package for different types of experiments. The NeXus file format may be a good source of inspiration here, particularly their application definitions. The evedataviewer
package in contrast aims at displaying whatever kind of measurement has been performed using the eve measurement program. Hence it will deal directly with Measurement
(facade) objects of the measurement
subpackage.
Arguments against the 2D data array as sensible representation
Currently, one very common and heavily used abstraction of the data contained in an eveH5 file is a two-dimensional data array (basically a table with column headers, implemented as pandas dataframe). As it stands, many problems in the data analysis and preprocessing of data come from the inability of this abstraction to properly represent the data. Two obvious cases, where this 2D approach simply breaks down, are:
subscans – essentially a 2D dataset on its own
adaptive average detector channel saving the individual, non-averaged values (implemented using MPSKIP)
Furthermore, as soon as spectra (1D) or images (2D) are recorded for a given position (count), the 2D data array abstraction breaks down as well.
Other problems inherent in the 2D data array abstraction are the necessary filling of values that have not been obtained. Currently, once filled there is no way to figure out for an individual position whether values have been recorded (in case of LastFill) or whether a value has not been recorded or recording failed (in case of NaNFill).
A few ideas/comments for further modelling this subpackage:
evefile
represents the eveH5 file, whilemeasurement
maps the different datasets to more sensible abstractions.Not all abstractions will necessarily be reflected in the future in the eveH5 file. Currently (eveH5 v7), most of the abstractions are clearly not visible there. To deal with this situation, the entities in the
evefile
subpackage should reflect the future eveH5 scheme and abstractions therein, with the version mapping from the controller technical layer responsible for the mapping of older eveH5 schemata to the entities.
Entities
A few general remarks:
Measurement (entity)
is still distinct from the “dataset” concept used in theevedataviewer
andradiometry
packages.However, the
Measurement (facade)
is very close to the “dataset” concept of the ASpecD framework and used in theevedataviewer
andradiometry
packages.
Metadata need to be attached to the individual data objects, as is the case in the
evedata.evefile.entities
subpackage.Clear differences between the
measurement
andevefile
subpackages:measurement
takes care of data joining (“filling”), subscans, and probably mapping of monitors to datasets with positions.The distinction between “main”, “snapshot”, and “monitor” are discarded in favour of more useful abstractions (“devices/setup”, “beamline”, “machine”). This requires, however, a thorough discussion of the concepts of monitors and snapshots, how they are currently used and how they should be used (and mapped) in the future.
measurement module
A Measurement
generally reflects all the data obtained during a measurement. However, to plot data or to perform some analysis, usually we need data together with an axis (or multiple axes for nD data), where data (“intensity values”) and axis values come from different HDF5 datasets. Furthermore, there may generally be the need/interest to plot arbitrary data against each other, i.e. channel data vs. channel data and axis data vs. axis data, not only channel data vs. axis data. This is the decisive difference between the Measurement entity
and the Measurement facade
, with the latter having the additional attributes for storing the data to work on.
Fig. 16 Class hierarchy of the measurement.entities.measurement
module. Devices are all the detector(channel)s and motor(axe)s used in a scan. Distinguishing between detector(channel)s/motor(axe)s, beamline, and machine can (at least partially) happen based on the data type: detector(channel)s are ChannelData
, motor(axe)s are AxisData
. Machine and beamline parameters are more tricky, as they can be DeviceData
(from the “monitor” section) as well as ChannelData
(and AxisData
for shutters and alike?) from the original “main” and “snapshot” sections. Scan
and Station
inherit directly from their counterparts in the evefile.entities.file
module. Data
and Axis
seem not to be used here, but become essential in the corresponding Measurement
facade. See Fig. 18 for details.
Points to discuss further (without claiming to be complete)
How to deal with snapshots and monitors?
Snapshots and monitors mostly provide information (“metadata”) on the state of beamline and machine. As such, usually they are not used in plots. This would mean that the current/energy that is used as the one standard detector in (nearly) every measurement shall not be part of the
machine
section, but ratherdetectors
(or whatever this section may be named eventually).How to reliably identify those HDF5 datasets that belong to the
machine
andbeamline
sections?Basically, all of these datasets should come from either the
snapshot
ordevice
(aka monitor) section of the eveH5 file. How about (in the future) explicitly mark the corresponding devices in the SCML and setup file as belonging to either machine or beamline, record them automatically, and place them in distinct groups in the eveH5 file?
Some comments (not discussions any more, though):
Combine
detectors
andmotors
to one sectiondevices
.The distinction between detector(channel)s and motor(axe)s is somewhat blurred when we start allowing options to be handled like currently only axes, i.e. to record detector channel data as function of an option value (exposure/acquisition time and similar).
On the other hand, there still is a clear distinction between motor axes and detector channels, but this is apparent by the type of the individual objects. Furthermore, the difference between
devices
on one side andbeamline
andmachine
on the other side would more clearly reflect the current distinction betweenmain
andsnapshot
/monitor
.The name
devices
may somewhat collide with the current use of the term for devices that are neither motor axes nor detector channels. However, this collision may be leveraged in the future, when the concepts and abstractions available in the measurement program change as well.Reproducibility, i.e. history of tasks performed on the data.
Background: In the ASpecD framework, reproducibility is an essential concept, and this revolves about having a dataset with one clear data array and n corresponding axes. The original data array is stored internally, making undo and redo possible, and each processing and analysis step always operates on the (current state of the) data array. In case of the datasets we deal with here, there is usually no such thing as the one obvious data array, and users can at any time decide to focus on another set of “axes”, i.e. data and corresponding axis values, to operate on.
The
evedata
package does not deal with the concept of reproducibility, but delegates this to theradiometry
package. There, the first step would be to decide which of the available channels accounts as the “primary” data (if not set as preferred in the scan already and read from the eveH5 file accordingly).
Dealing with images stored in files separate from the eveH5 file.
The eveH5 file contains only the links (i.e. filenames) to these files. The
evefile
subpackage comes with importers for images as well. Hence, in anEveFile
object, the images will be directly accessible.Accessing images of course means having access to the image files stored separately from the eveH5 file.
Regarding deferred loading: Loading images is a rather cheap operation (typically <1 ms per image), hence one could load all images at once when first accessing the data.
metadata module
Metadata are a crucial part of reproducibility. Furthermore, metadata allow analysis routines to gain a “semantic” understanding of the data and hence perform (at least some) actions unattended and fully automatically. While all metadata corresponding to the individual devices used in recording data are stored in the metadata
attribute of the Data
class (and its descendants), there are other types of metadata that belong to the measurement as such itself. These are modelled in the metadata
module in the Metadata
class.
Fig. 17 While the Metadata
class inherits directly from its counterpart from the evedata.evefile.entities.file
module, it is extended in crucial ways, reflecting the aim for more reproducible measurements and having datasets containing all crucial information in one place. This involves information on both, the machine (BESSY-II, MLS) and the beamline, but on the sample(s) as well. Perhaps the sample
attribute should be a dictionary rather than a plain list, with (unique) labels for each sample as keys.
As of now, the eveH5 files do not contain any metadata regarding machine, beamline, or sample(s). The names of the machine and beamline can most probably be inferred, having the name of the beamline obviously allows to assign the machine as well.
Given that one measurement (i.e. one scan resulting in one eveH5 file) can span multiple samples, and will often do, in the future, at least some basic information regarding the sample(s) should be added to the eveH5 file and read and mapped accordingly to instances of the Sample
class, one instance per sample. Potentially, this allows to assign parts of the measured data to individual samples and hence automate data processing and analysis, e.g. splitting the data for the different samples into separate datasets/files.
Controllers
What may be in here:
Fill modes, now termed “joining”
Mapping monitor time stamps to position counts
Results probably in
DeviceData
objects.
Converting scan with subscans into appropriate subscan data structure
Joining: “fill modes”
For details see the joining
module.
For each motor axis and detector channel, in the original eveH5 file only those values appear—typically together with a “position counter” (PosCount) value—that are actually set or measured. Hence, the number of values (i.e., the length of the data vector) will generally be different for different detectors/channels and devices/axes. To be able to plot arbitrary data against each other, the corresponding data vectors need to be brought to the same dimensions (i.e., “joined”, originally somewhat misleadingly termed “filled”).
For numpy set operations, see in particular numpy.intersect1d()
and numpy.union1d()
. Operating on more than two arrays can be done using functools.reduce()
, as mentioned in the numpy documentation (with examples). Helpful may be numpy.digitize()
as well.
Points to discuss further (without claiming to be complete)
Which fill modes are relevant/needed?
It seems that LastNaNFill is widely used as a default fill mode. Depending on the origin of the data, additional post-processing (see below) is necessary to have usable data.
As NoFill does not only not fill, but actually reduce data, “fill mode” may not be the ideal term. Other opinions/ideas/names?
Given that the
EveFile
class provides a faithful representation of the actual data contained in an eveH5 file, one could think of mechanisms to highlight those values that were actually recorded (as compared to filled afterwards). Would this help to reduce the number of fill modes available?Will there always be only one fill mode for one dataset?
Currently, this seems to be the case for the interfaces (IDL, eveFile) used, although one could probably create multiple datasets with different fill modes (and different channels/detectors and axes/motors involved) from a single
EveFile
object.How to deal with monitors?
It seems that currently, the monitors are not used at all/too much by the users, as they are not part of the famous pandas dataframe.
How to deal with channel/detector snapshots?
Currently, fill modes do not care about channel/detector snapshots, as channel/detector values are never filled. So what is the purpose of these snapshots, and are they (currently) used in any sensible way beyond recording the data? (Technically speaking, people should be able to read the data using eveFile, though…)
How to deal with “fancy” scans “monitoring” axes as pseudo-detectors?
Some scans additionally “monitor” an axis by means of a pseudo-detector. This generally leads to an additional position count for reading this “detector”, and without manually post-processing the filled data matrix, we end up plotting NaN vs. NaN values when trying to plot a real detector vs. the pseudo-detector reused as an axis (and as a result seeing no plotted data).
There was the idea of “compressing” all position counts for detector reads where no axis moves in between into one position count. Can we make sure that this is valid in all cases?
In context of mapping such scans beforehand in the
evefile
subpackage to more appropriate data objects, i.e. with multiple values per position, the problem should be solved.
Some comments (not discussions any more, though):
How to handle data filling?
Obviously, if one wants to plot arbitrary HDF5 datasets against each other (as currently possible), data dimensions need to be made compatible.
The original values should always be retained, to be able to show/tell which values have actually been obtained (and to discriminate between not recorded and failed to record, i.e. no entry vs. NaN in the original HDF5 dataset)
Could there be different (and changing) filling of the data depending on which “axes” should be plotted against each other? – Yes, that should always be the case.
Where/when to apply filling?
The
EveFile
class contains the data as read from the eveH5 file, i.e. the not at all filled data for each channel/detector and axis/motor (faithful representation of the eveH5 file contents). Hence, filling is a task theMeasurement
object is responsible for, calling out to the respective classes from thecontrollers
layer upon setting the data attribute.
Mapping timestamps to position counts
Note
This has been discussed as part of the evefile.controllers
subpackage before and was recently moved here to the measurement.controllers
subpackage.
Points to discuss further (without claiming to be complete)
Mapping MonitorData to MeasureData
Monitor data (with time in milliseconds as primary axis) need to be mapped to measured data (with position counts as primary axis). Mapping position counts to time stamps is trivial (lookup), but vice versa is not unique and the algorithm generally needs to be decided upon. There is an age-long discussion on this topic (#5295 note 3). For a current discussion see #7722.
Besides the question how to best map one to the other (that needs to be discussed, decided, clearly documented and communicated, and eventually implemented): This mapping should most probably take place in the controllers technical layer of the measurement functional layer. The individual
MonitorData
class cannot do the mapping without having access to the mapping table.
For a detailed discussion/summary of the current state of affairs regarding the algorithm and its specification, see #7722. In short:
Monitors corresponding to motor axes should be mapped to the next position.
Monitors corresponding to detector channels should be mapped to the previous position.
For monitors corresponding to devices, there is no sensible decision possible.
Just to make things slightly more entertaining, up to eveH5 v7, monitor datasets do not provide any hint which type (axis, channel, device) they belong to. Hence, this decision can not be made sensibly. For safety reasons, mapping monitors to the previous position seems sensible, as the event could have occurred in the readout phase of the detectors (the position is incremented after moving the axes and before triggering the detector readout and start of nested scan modules).
The TimestampData
class got a method get_position()
to return position counts for given timestamps. Currently, the idea is to have one method handling both, scalars and lists/arrays of values, returning the same data type, respectively.
This means that for a given EveFile
object, the controller carrying out the mapping knows to ask the TimestampData
object via its get_position()
method for the position counts corresponding to a given timestamp.
Special cases that need to be addressed either here or during import of the data of a monitor:
Multiple values with timestamp
-1
, i.e. before the scan has been started.Probably the best solution here would be to skip all values except of the last (newest) with the special timestamp
-1
. See #7688, note 10 for details.Multiple (identical) values with identical timestamp
Not clear whether this situation can actually occur, but if so, most probably in this case only one value should be contained in the data. See #7688, note 11 for details.
Furthermore, a requirement is that the original monitor data are retained when converting timestamps to position counts. This most probably means to create a new MeasureData
object. This is the case for the additional DeviceData
class as subclass of MeasureData
. The next question: Where to place these new objects in the Measurement
(facade) class?
Subscans
Different types of subscans exist, and currently (as of eveH5 v7), neither the scan structure nor proper information on the subscans materialises in the eveH5 files. Typical subscans are nested scan modules, but sometimes, chained scan modules exist as well (e.g., for bolometers with subsequent scans with shutter open and closed).
Furthermore, things get more complicated for measuring different samples within one scan, but not subsequently, but “row-wise” with one parameter set for all samples in one “row”. In any case, future schema versions for the eveH5 files need to account for the different types of subscans and need to store information that allows for reproducible and straight-forward extraction of the data belonging to the individual samples.
Boundaries
What may be in here:
facade:
Measurement
Interface towards users (i.e., mainly the
radiometry
andevedataviewer
packages)Given a filename of an eveH5 file, loads all contained information into the
Measurement
object.
resources:
Saver to (to be defined) standard format
Exporters for different formats
measurement module (facade)
The Measurement
facade is the main entry point to the evedata
package and hence the main interface both, for evedataviewer
and radiometry
packages as well as for human users. Technically speaking, you are free to extract individual data from the structure and do whatever you like with it. However, one big difference between the current state of affairs and the evedata
package is the Measurement
object as a central unit containing all relevant information.
Fig. 18 Class hierarchy of the measurement.boundaries.measurement
module. The Measurement
facade inherits directly from its entities counterpart. The crucial extension here is the data
attribute. This contains the actual data that should be plotted or processed further. Thus, the Measurement
facade resembles the dataset concept known from the ASpecD framework and underlying, i.a., the radiometry
package. The Measurement.set_axes()
method takes care of data joining if necessary. Furthermore, you can set the attribute of the underlying data object to obtain the data from, if it is not data
, but, e.g., std
or an option.
A few comments on the Measurement
facade:
The big difference to the
Measurement entity
is the additional attributedata
. It contains the actual data that should be plotted or processed further. Thus, theMeasurement
facade resembles the dataset concept known from the ASpecD framework and underlying, i.a., theradiometry
package.The
data
attribute will usually be pre-filled with the data from the “preferred_axis” and “preferred_channel” attributes of the eveH5 file, if present. They can be set/changed using theMeasurement.set_data()
andMeasurement.set_axes()
methods.The
Measurement.set_data()
method takes care of joining (“filling”) if necessary. Furthermore, you can set the attribute of the underlying data object to obtain the data from, if it is notdata
, but, e.g.,std
or an option. The latter is crucial, as the data model of theevefile
subpackage provides abstractions from the plain HDF5 datasets (up to eveH5 v7) where options are stored as separate datasets independent of the devices they actually belong to.The
Measurement.set_axes()
method allows to set more than one axis, hence thename
parameter is a list. As an option of a dataset could be used as axis, e.g. theAcquireTime
of images, there is an additionalfield
parameter similar toMeasurement.set_data()
.What is the difference between the
Measurement.save()
andMeasurement.export()
methods? The idea is to have “save” save to a (to be defined) standard format, while “export” would export to whatever given format.This distinction is somewhat arbitrary and may be removed in the future.
In any case, saving/exporting data is a feature for somewhat later, although at least some “sensible” export format (and if it be plain text) may be a high demand for providing data to to collaboration partners. However, here, the
radiometry
package and the underlying ASpecD exporters may come in handy.
Scan
The overall package structure of the evedata package is shown in Fig. 5. Furthermore, a series of (still higher-level) UML schemata for the scan
functional layer are shown below, reflecting the current state of affairs (and thinking).
The scan
functional layer contains all classes necessary to represent the contents of an SCML file. The general idea behind is to have all relevant information contained in the scan description and saved together with the data in the eveH5 file at hand. The SCML file is generally stored within the eveH5 file, and it is the information used by the GUI of the measurement program. One big advantage of having the information of the SCML file as compared to the information stored in the eveH5 file itself: The structure of the scan is available, making it possible to infer much more information relevant for interpreting the data.
Important
The SCML file contained in (most) eveH5 files “only” saves the scan description and a reduced set of devices in the setup section, not the entire description of the measurement station as stored in the <measurement-station-name>.xml
file. Furthermore, it saves the SCML in a way that it can be reused directly by the GUI of the measurement program, i.e. with variables not replaced. Why is this important?
Variables are not replaced by their actual values
Certain fields contain the variables, but not the actual replaced values. Some of this information is currently stored in the HDF5 layer of the eveH5 files and can be read from there. This is important to have in mind when thinking about reducing the metadata stored in the HDF5 layer.
Only the scan description is available, with devices defined therein.
A dynamic snapshot saves the state of all currently defined motors and/or detectors. However, there are usually many more motors/detectors defined in the measurement station description not appearing in the SCML file available from the eveH5 file. Hence, there is no way to generally rely on the SCML file contents for metadata corresponding to whatever dataset that exists in the HDF5 layer of the eveH5 file.
This does not mean that we should save the complete description of the measurement station in the future. It is just important to be aware of this situation, particularly when (further) designing the data model(s).
One big difference between the SCML schema and the class hierarchy defined in this functional layer: As the evedata package can safely assume only ever to receive validated SCML files, some of the types of attributes are more relaxed as compared to the schema definition. This makes it much easier to map the types to standard Python types.
Entities
Although the scan
subpackage has a different goal as compared to the original SCML, namely its read-only interest, the discussion below is currently heading towards a restructuring of the SCML XML schema definition (XSD), mainly informed by the data model developed above.
file module
This module contains the main File
class and probably as well the Plugin
class and its dependencies. Generally an SCML file can be split in two (three) parts: a description of the setup/instrumentation used for a scan (module scan.entities.setup
) and a description of the actual scan/measurement (module scan.entities.scan
). The plugins would be the third part.
Fig. 19 Class hierarchy of the scan.entities.file
module, closely resembling the schema of the SCML file. Currently, the location of the Plugin
class and its dependencies is not decided, as it is not entirely clear whether this information is relevant enough to be mapped. For a class diagram see the separate figure below.
Fig. 20 Class hierarchy of the Plugin
class, probably located in the module scan.entities.file
module and closely resembling the schema of the SCML file. Currently, the location of the Plugin
class and its dependencies is not decided, as it is not entirely clear whether this information is relevant enough to be mapped.
Points to discuss further (without claiming to be complete)
Renaming “setup” to “station”?
In the
Measurement
class, there is a distinction between setup, beamline, and machine. The current notion of “setup” in the SCML file is a summary of these three. However, it might be sensible to distinguish the devices belonging to either of the three types already on the SCML level.Moving “Plugin” to its own module for consistency?
“Scan” and “Setup” are contained in their own modules, as is “Plot” and “Event” that are both used in “Scan”.
Remove the entire “Plugin” part from the SCML file?
What is the reasoning behind defining plugins in the SCML that are eventually a function of the eve engine and not of the individual scan? Positioning plugins are a prime example, as are the (now removed) save plugins.
Probably, originally the idea was to have a generic definition of plugins and allow for defining libraries here that would get accessed by a running engine dynamically. However, it seems that this has never really materialised.
Are position plugins (for creating axes positions) something the GUI would know how to handle if we define something new in the SCML? Probably not, and if not, than we could remove it from the SCML as well.
What are “display” plugins?
Is there any need to have them in the SCML?
scan module
This module contains all classes storing information on the actual scan. An SCML file can contain exactly one scan. Furthermore, as has been decided to remove multiple chains in one scan, and hence the concept of chains altogether, the hierarchy is a bit simpler as compared to the current (Version 9.2, as of 08/2024) SCML XML schema. One scan consists of n scan modules.
To slightly reduce the already rather complex list of classes, plots, events, and pause conditions have been outsourced into separate modules, with the latter two together in one module. These modules are described separately below.
Fig. 21 Class hierarchy of the scan.entities.scan
module, closely resembling the schema of the SCML file. As the scan module is already quite complicated, event and plot-related classes have been separated into their own modules and are described below. Hint: For a larger view, you may open the image in a separate tab. As it is vectorised (SVG), it scales well.
Comments:
The chains have been removed entirely.
The snapshot modules have been reduced to one single module (similar to a combination of the previously available dynamic snapshot modules).
The saveplugin attribute on the scan level has been removed.
Points to discuss further (without claiming to be complete)
Controller class
The Controller class is part of the Scan, Positioning, and ScanModuleAxis classes, and referred to from attributes named “plugin” (or “saveplugin” in case of Scan). Why the different naming?
plot module
Fig. 22 Class hierarchy of the scan.entities.plot
module, closely resembling the schema of the SCML file. One ClassicScanModule class can have n plots. For the context of the ClassicScanModule, see the scan.entities.scan
module.
event module
Fig. 23 Class hierarchy of the scan.entities.event
module, closely resembling the schema of the SCML file. The “Event” and “PauseCondition” classes have both close ties with the “scml.scan” module. Grouping them in one module seems justified, as eventually, a “PauseCondition” could be understood as an event, too.
setup module
Fig. 24 Class hierarchy of the scan.entities.setup
module, closely resembling the schema of the SCML file. Differing from the SCML schema definition, an additional class Setup
is introduced here containing objects of the subclasses “Detector, “Motor”, and “Device” of “AbstractDevice”. The SCML schema contains these three at the same level as “Scan” and “Plugins”.
Points to discuss further (without claiming to be complete)
Better name for “Device”?
All three, Detector, Motor and Device (as well as Axis and Channel, and Option), are abstract devices.
Information on the individual devices
Is there somewhere (e.g. in the SCML file) more information on the individual devices, such as the exact type and manufacturer for commercial devices? This might be relevant in terms of traceability of changes in the setup.
Looks like as of now there is no such information stored anywhere. It might be rather straight-forward to expand the SCML schema for this purpose, not affecting the GUI or engine (both do not care about this information).
Rename into “station” to distinguish better between setup, beamline, and machine?
Furthermore, distinguish for each (abstract) device where it belongs to (setup, beamline, machine) and make these three sections on the top level of the station SCML tag?
How to get the data model into the SCML?
The data model for the devices (detectors, motors, …) as detailed above for Evefile and Measurement should somehow be reflected in the SCML file as well. Do we need distinct tags/classes for that, or is it sufficient to have the
class
attribute with the appropriate name?
Controllers
What may be in here:
Mapping different versions of SCML files to the entities
Extracting set values for axes
Get a mapping of position counts to scan modules?
version_mapping module
For details, see the documentation of the version_mapping
module.
Fig. 25 Class hierarchy of the evedata.scan.controllers.version_mapping
module, providing the functionality to map different SCML file
schemas to the data structure provided by the Scan
and Station
classes. The factory
will be used to get the correct mapper for a given SCML file.
For each SCML schema version, there exists an individual
VersionMapperVxmy
class dealing with the version-specific mapping.
The idea behind the Mapping
class is to provide simple mappings for
attributes and alike that need not be hard-coded and can be stored
externally, e.g. in YAML files. This would make it easier to account
for (simple) changes.
Extract set values for axes
The axes positions stored in the HDF5 file are the RBVs after positioning. If, however, an axis never reached the set value due to limit violation or other constraints, this is usually not visible from the HDF5 file, as the severity is typically not recorded. However, the set values for each axis can be inferred from the scan description. Having this information would be helpful for routine checks whether a scan ran as expected. Set values are stored in the set_values
attribute of the AxisData
class.
Note
In case the axis set values in the SCML are only a reference to an external file, these values cannot reliably be read afterwards. Hence, in this case, probably either a warning should be issued or the situation silently ignored. In a future eveH5 scheme, it may be sensible to store those values in some way in the file.
The set_axis
module of the scan.controllers
subpackage contains the logic to actually extract the information from the SCML representation, while the set_axis
module of the evefile.controllers
subpackage does the mapping to the axes objects.
Boundaries
What may be in here:
facades:
scan
(station)
resources:
scml
reading separate SCML files if present (https://redmine.ahf.ptb.de/issues/2740)
scan module (facade)
Fig. 26 Class hierarchy of the scan.boundaries.scan
module, providing the facades for the scan and setup descriptions. Currently, the basic idea is to inherit from the File
interface. The difference between the load()
and extract()
methods: While load()
loads a file from the file system, extract()
extracts the SCML from the
user block of a given HDF5 file.
When loading an SCML/XML file, the SCML
class is called to read the actual XML, and afterwards, the contents are mapped to the entities defined in the entities
subpackage.
scml module (resource)
Most probably a DOM parser converting the SCML/XML read to a hierarchy of Python objects/structures.
The resource must be able to cope with both, actual file handles and streams/text, as typically, SCML files are not present as files in the file system, but get extracted from an HDF5 file. For the time being (as of eveH5 v7), the SCML is stored in the user section of the HDF5 file. In the future, the SCML and station XML (not currently available in retrospect) are stored within the HDF5 file as separate datasets.