evedata.measurement.controllers.joining module
Ensure data and axes values are commensurate and compatible.
For each motor axis and detector channel, in the original eveH5 file only those values appear—together with a “position” (PosCount) value—that have actually been set or measured. Hence, the number of values (i.e., the length of the data vector) will generally be different for different devices. To be able to plot arbitrary data against each other, the corresponding data vectors need to be commensurate. If this is not the case, they need to be brought to the same dimensions (i.e., “joined”, originally somewhat misleadingly termed “filled”).
To be exact, being commensurate is only a necessary, but not a sufficient criterion: not only do the shapes need to match, but the indices (in this case the positions) need to be identical as well.
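As a minimal illustration of this distinction, the following sketch compares two position vectors of equal length. The variable names and values are made up for illustration and are not taken from an actual eveH5 file.

```python
import numpy as np

# Hypothetical position (PosCount) vectors of a channel and an axis.
channel_positions = np.array([1, 2, 3])
axis_positions = np.array([2, 3, 4])

# Same shape: the vectors are commensurate ...
same_shape = channel_positions.shape == axis_positions.shape
# ... but the positions differ, so the values do not belong together.
same_positions = np.array_equal(channel_positions, axis_positions)
# Positions present in both vectors.
common_positions = np.intersect1d(channel_positions, axis_positions)

print(same_shape)        # True
print(same_positions)    # False
print(common_positions)  # [2 3]
```

Plotting these two vectors against each other would work technically, but would pair values recorded at different positions.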
A bit of history
In the previous interface (EveFile), there are four “fill modes” available for data: NoFill, LastFill, NaNFill, LastNaNFill. From the documentation of eveFile:
- NoFill
“Use only data from positions where at least one axis and one channel have values.”
Actually, not a filling, but mathematically an intersection, or, in terms of relational databases, an SQL INNER JOIN. In any case, data are reduced.
- LastFill
“Use all channel data and fill in the last known position for all axes without values.”
Similar to an SQL LEFT JOIN with data left and axes right, but additionally explicitly setting the missing axes values in the join to the last known axis value.
- NaNFill
“Use all axis data and fill in NaN for all channels without values.”
Similar to an SQL LEFT JOIN with axes left and data right. To be exact, the NULL values of the join operation will be replaced by NaN.
- LastNaNFill
“Use all data and fill in NaN for all channels without values and fill in the last known position for all axes without values.”
Similar to an SQL OUTER JOIN, but additionally explicitly setting the missing axes values in the join to the last known axis value and replacing the NULL values of the join operation by NaN.
Furthermore, for the Last*Fill modes, snapshots are inspected for axes values that are newer than the last recorded axis in the main/standard section.
Note that none of the fill modes guarantees that there are no NaNs (or comparable null values) in the resulting data.
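To make the four modes more tangible, here is a minimal pure-Python sketch for one axis and one channel. It is not the EveFile implementation: the function name `fill`, the dict-based input (position to value), and the sample values are assumptions for illustration, and snapshots are ignored entirely.

```python
import math


def fill(axis, channel, mode):
    """Join one axis and one channel (dicts: position -> value) on position.

    Returns a list of (position, axis_value, channel_value) tuples.
    """
    if mode == "NoFill":  # intersection, like an INNER JOIN
        positions = sorted(set(axis) & set(channel))
    elif mode == "LastFill":  # all channel positions
        positions = sorted(channel)
    elif mode == "NaNFill":  # all axis positions
        positions = sorted(axis)
    elif mode == "LastNaNFill":  # union, like an OUTER JOIN
        positions = sorted(set(axis) | set(channel))
    else:
        raise ValueError(f"Unknown fill mode: {mode}")

    fill_axes = mode in ("LastFill", "LastNaNFill")
    rows = []
    for position in positions:
        axis_value = axis.get(position, math.nan)
        if fill_axes and position not in axis:
            # Use the last known axis value at an earlier position, if any.
            earlier = [p for p in axis if p < position]
            if earlier:
                axis_value = axis[max(earlier)]
        rows.append((position, axis_value, channel.get(position, math.nan)))
    return rows


axis = {2: 10.0, 5: 20.0}
channel = {1: 0.1, 2: 0.2, 3: 0.3}

print(fill(axis, channel, "NoFill"))
# [(2, 10.0, 0.2)]
print(fill(axis, channel, "LastFill"))
# [(1, nan, 0.1), (2, 10.0, 0.2), (3, 10.0, 0.3)]
print(fill(axis, channel, "NaNFill"))
# [(2, 10.0, 0.2), (5, 20.0, nan)]
print(fill(axis, channel, "LastNaNFill"))
# [(1, nan, 0.1), (2, 10.0, 0.2), (3, 10.0, 0.3), (5, 20.0, nan)]
```

In this example, position 1 has no earlier axis value, so even LastFill leaves a NaN there, which illustrates why a fill mode alone cannot rule out null values.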
Important
The IDL Cruncher seems to use LastNaNFill combined with applying some “dirty” fixes to account for scans using MPSKIP and those scans “monitoring” a motor position via a pseudo-detector. The EveHDF class (DS) uses LastNaNFill as a default as well but does not apply any additional post-processing.
Should fill modes be changeable in a viewer? And which fill modes are used in practice (and do we have any chance to find this out)?
How to deal with missing values?
Depending on the concrete situation, there may be no value available to fill a gap in an axis. Hence, how to deal with this situation?
Numeric values
For numeric values, some kind of “NaN” (not a number) could be used.
In NumPy, only floating-point dtypes have a dedicated “NaN”, but no other dtype. Hence, in case of missing values, a masked array (numpy.ma.MaskedArray) is used and numpy.ma.masked set explicitly for those missing values. For all practical purposes, this should work similarly to numpy.nan. In particular, when trying to plot a numpy.ma.MaskedArray, the masked values are simply ignored. For further details of how to work with masked arrays, see the numpy.ma documentation.
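A minimal sketch of this behaviour, using made-up integer values rather than actual measurement data:

```python
import numpy as np

# Integer values: this dtype has no way to represent NaN.
values = np.array([3, 7, 11], dtype=np.int64)

# Wrap in a masked array and mark the first entry as missing.
masked_values = np.ma.masked_array(values)
masked_values[0] = np.ma.masked

print(masked_values.dtype)    # int64
print(masked_values.count())  # 2 (number of unmasked entries)
print(masked_values.mean())   # 9.0 (the masked entry is ignored)
```

The dtype stays integer, and reductions such as `mean()` skip the masked entry, much like `numpy.nanmean()` would skip a NaN in a float array.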
Non-numeric values
First of all: Does this situation occur in reality? Yes, there are axes with non-numeric values. But are these axes ever joined? If so, some textual value such as “N/A” (not available) may be used.
Note
The default fill value of a numpy.ma.MaskedArray with a string dtype is N/A, and this is (only) used when calling numpy.ma.MaskedArray.filled(). Otherwise, the masked values are in most cases simply ignored. For an overview of the default fill values of masked arrays, see the numpy.ma.MaskedArray.fill_value attribute.
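The following short sketch shows the default fill values for a string and an integer masked array. The array contents are invented for illustration.

```python
import numpy as np

# Hypothetical non-numeric axis values with one missing entry.
states = np.ma.masked_array(
    ["open", "closed", "open"], mask=[False, True, False]
)
print(states.fill_value)         # N/A
print(states.filled().tolist())  # ['open', 'N/A', 'open']

# Integer arrays get a numeric default fill value instead.
counts = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
print(counts.fill_value)         # 999999
print(counts.filled().tolist())  # [1, 999999, 3]
```

Note that the fill value only appears in the output of `filled()`; the masked array itself keeps the entry masked.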
Join modes currently implemented
Currently, there is exactly one join mode implemented:
- AxesLastFill
Inflate axes to data dimensions, using the last known value for missing values.
If no previous axes value is available, convert the data into a numpy.ma.MaskedArray object and mask the value.
This mode is equivalent to the “LastFill” mode described above.
For developers
To implement additional join modes, create a class inheriting from the Join base class and implement the actual joining in the private method _join().
There is a factory class JoinFactory that you can ask to get a Join object:
factory = JoinFactory()
join = factory.get_join(mode="AxesLastFill")
This would return an AxesLastFill object. For further details, see the JoinFactory documentation.
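The pattern described here can be sketched with self-contained stand-in classes. These are not the evedata classes: the signature of `_join()`, the class `AxesNaNFill`, and the explicit registry inside the factory are assumptions made purely for illustration. Only the three ValueError checks mirror the documented behaviour of join().

```python
class Join:
    """Stand-in base class: validates input, then delegates to _join()."""

    def __init__(self, measurement=None):
        self.measurement = measurement

    def join(self, data=None, axes=None):
        if not self.measurement:
            raise ValueError("Need a measurement to join data.")
        if not data:
            raise ValueError("Need data to join data.")
        if not axes:
            raise ValueError("Need axes to join data.")
        return self._join(data=data, axes=axes)

    def _join(self, data=None, axes=None):
        # The base class does not implement any joining.
        return []


class AxesNaNFill(Join):
    """Hypothetical additional join mode (name invented for this sketch)."""

    def _join(self, data=None, axes=None):
        # The actual joining logic would go here; this just passes through.
        return [data, *axes]


class JoinFactory:
    """Stand-in factory returning a join object for a given mode name."""

    # Explicit registry (an assumption; the real lookup may differ).
    _classes = {"Join": Join, "AxesNaNFill": AxesNaNFill}

    def __init__(self, measurement=None):
        self.measurement = measurement

    def get_join(self, mode="Join"):
        return self._classes[mode](measurement=self.measurement)


factory = JoinFactory(measurement="dummy measurement")
join = factory.get_join(mode="AxesNaNFill")
result = join.join(data=[1, 2], axes=[[10, 20]])
print(type(join).__name__)  # AxesNaNFill
print(result)               # [[1, 2], [10, 20]]
```

Keeping validation in the public `join()` and the strategy in `_join()` means each new mode only has to supply the joining logic itself.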
Module documentation
- class evedata.measurement.controllers.joining.Join(measurement=None)
Bases: object
Base class for joining data.
For each motor axis and detector channel, in the original eveH5 file only those values appear—together with a “position counter” (PosCount) value—that have actually been set or measured. Hence, the number of values (i.e., the length of the data vector) will generally be different for different devices. To be able to plot arbitrary data against each other, the corresponding data vectors need to be commensurate. If this is not the case, they need to be brought to the same dimensions (i.e., “joined”, originally somewhat misleadingly termed “filled”).
The main “quantisation” axis of the values for a device and the common reference is the list of positions. Hence, to join, first of all the lists of positions are compared, and gaps handled accordingly.
As there are different strategies for dealing with gaps in the positions list, there will generally be different subclasses of the Join class, each dealing with a particular strategy.
- measurement
Measurement the Join should be performed for.
Although joining is carried out for a small subset of the device data of a measurement, additional information from the measurement may be necessary to perform the task.
- Parameters:
measurement (evedata.measurement.boundaries.measurement.Measurement) – Measurement the join should be performed for.
Examples
Usually, joining takes place in the set_data() and set_axes() methods. Furthermore, a Measurement object will have a Join instance of the appropriate type. To join data, in this case of a detector channel and a motor axis, call join() with the respective parameters:
join = Join(measurement=my_measurement)
data, *axes = join.join(
    data=("SimChan:01", None),
    axes=(("SimMot:02", None),),
)
Note the use of two variables for the return of the method, and in particular the use of *axes ensuring that axes is always a list and takes all remaining return arguments, regardless of their count.
Important
While it may be tempting to use this class on your own and work further with the returned arrays, you will lose all metadata and context. Hence, simply don’t. Just use the interface provided by Measurement instead.
- join(data=None, axes=None, scan_module='')
Harmonise data.
The main “quantisation” axis of the values for a device and the common reference is the list of positions. Hence, to join, first of all the lists of positions are compared, and gaps handled accordingly.
As there are different strategies for dealing with gaps in the positions list, there will generally be different subclasses of the Join class, each dealing with a particular strategy.
- Parameters:
data – Name of the device and its attribute data are taken from. If the attribute is set to None, data will be used instead.
axes – Names of the devices and their attribute axes values are taken from. If an attribute is set to None, data will be used instead. Each element of the tuple/list is itself a two-element tuple/list with name and attribute.
scan_module (str) – Scan module ID the device belongs to
- Returns:
data – Joined data and axes values.
The first element is always the data, the following the (variable number of) axes. To separate the two and always get a list of axes, you may call it like this:
data, *axes = join.join(...)
- Return type:
- Raises:
ValueError – Raised if no measurement is present
ValueError – Raised if no data are provided
ValueError – Raised if no axes are provided
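The `data, *axes` unpacking recommended for the return value can be tried out with a stand-in function. The function `fake_join` and its values are hypothetical and only mimic the documented return layout (data first, then a variable number of axes).

```python
def fake_join(n_axes):
    """Stand-in returning one data list followed by n_axes axes lists."""
    data = [0.1, 0.2, 0.3]
    axes = [[float(index)] * 3 for index in range(n_axes)]
    return [data, *axes]


# One axis: axes is still a list, with a single element.
data, *axes = fake_join(1)
print(len(axes))  # 1

# Three axes: the same call pattern works unchanged.
data_3, *axes_3 = fake_join(3)
print(len(axes_3))  # 3
```

Because the starred target always collects into a list, calling code does not need to special-case the single-axis situation.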
- class evedata.measurement.controllers.joining.AxesLastFill(measurement=None)
Bases: Join
Inflate axes to data dimensions, using the last known value for missing values.
This was previously known as “LastFill” mode and was described as “Use all channel data and fill in the last known position for all axes without values.” In SQL terms (relational database), this would be similar to a left join with data left and axes right, but additionally explicitly setting the missing axes values in the join to the last known axis value.
While the terms “channel” and “axis” have different meanings than in the context of the joining module, the behaviour is qualitatively similar:
- The device used as “data” is taken as reference and its values are not changed.
- The values of devices used as “axes” are inflated to the same dimension as the data.
- For values originally missing for an axis, the last value of the previous position is used.
- If no previous value exists for a missing value, the data are converted into a numpy.ma.MaskedArray object and the values masked with numpy.ma.masked.
- The snapshots are checked for values corresponding to the axis, and if present, are taken into account.
Of course, as in all cases, the (integer) positions are used as common reference for the values of all devices.
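The inflation step described above can be sketched with NumPy as follows. This is an illustrative approximation, not the actual AxesLastFill code: the function name `inflate_axis` is hypothetical, axis positions are assumed to be sorted in ascending order and non-empty, snapshots are ignored, and a masked array is always returned, even if nothing needs to be masked.

```python
import numpy as np


def inflate_axis(data_positions, axis_positions, axis_values):
    """Inflate axis values to the data positions using the last known value."""
    data_positions = np.asarray(data_positions)
    axis_positions = np.asarray(axis_positions)
    axis_values = np.asarray(axis_values)
    # Index of the last axis position <= each data position (-1 if none).
    indices = np.searchsorted(axis_positions, data_positions, side="right") - 1
    missing = indices < 0
    # Clip to keep the indexing valid; those entries are masked anyway.
    filled = axis_values[np.clip(indices, 0, None)]
    return np.ma.masked_array(filled, mask=missing)


inflated = inflate_axis(
    data_positions=[1, 2, 3, 4, 5],
    axis_positions=[2, 4],
    axis_values=[10.0, 20.0],
)
print(inflated.tolist())  # [None, 10.0, 10.0, 20.0, 20.0]
```

Position 1 precedes the first axis value and is therefore masked; all later positions carry the most recent axis value forward.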
Important
If there is more than one snapshot, always the newest snapshot previous to the current axis position should be used. Check whether this is implemented already.
- measurement
Measurement the join should be performed for.
Although joining is carried out for a small subset of the device data of a measurement, additional information from the measurement may be necessary to perform the task, e.g., the snapshots.
- Parameters:
measurement (evedata.measurement.boundaries.measurement.Measurement) – Measurement the join should be performed for.
Examples
See the Join base class for examples – and replace the class name accordingly.
- join(data=None, axes=None, scan_module='')
Harmonise data.
The main “quantisation” axis of the values for a device and the common reference is the list of positions. Hence, to join, first of all the lists of positions are compared, and gaps handled accordingly.
As there are different strategies for dealing with gaps in the positions list, there will generally be different subclasses of the Join class, each dealing with a particular strategy.
- Parameters:
data – Name of the device and its attribute data are taken from. If the attribute is set to None, data will be used instead.
axes – Names of the devices and their attribute axes values are taken from. If an attribute is set to None, data will be used instead. Each element of the tuple/list is itself a two-element tuple/list with name and attribute.
scan_module (str) – Scan module ID the device belongs to
- Returns:
data – Joined data and axes values.
The first element is always the data, the following the (variable number of) axes. To separate the two and always get a list of axes, you may call it like this:
data, *axes = join.join(...)
- Return type:
- Raises:
ValueError – Raised if no measurement is present
ValueError – Raised if no data are provided
ValueError – Raised if no axes are provided
- class evedata.measurement.controllers.joining.JoinFactory(measurement=None)
Bases: object
Factory for getting the correct join object.
For background on the need for joining, see the documentation of the entire joining module, and of the Join class.
Given a decision which type of join you would like to apply to your data, this factory class allows you to get the correct join instance without hassle. And you can even change your mind in between and don’t have to change any code, which is the whole idea behind factories.
- measurement
Measurement the join should be performed for.
- Parameters:
measurement (evedata.measurement.boundaries.measurement.Measurement) – Measurement the join should be performed for.
Examples
Getting a join object is as simple as calling a single method on the factory object:
factory = JoinFactory()
join = factory.get_join(mode="AxesLastFill")
This will provide you with the appropriate AxesLastFill instance.
As joins need a Measurement object, you can set one on the factory, and it will get added automatically to the join instance for you:
factory = JoinFactory(measurement=my_measurement)
join = factory.get_join(mode="AxesLastFill")
Thus, when used from within a Measurement object, set the measurement attribute to self.
- get_join(mode='Join')
Obtain a Join instance for a particular mode.
If no mode is provided, this defaults to the base class. As the Join class does not implement any functionality, this is rather useless.
If the measurement attribute is set, it is automatically set in the Join instance returned.