nilspodlib.dataset.Dataset#
- class nilspodlib.dataset.Dataset(sensor_data: Dict[str, ndarray], counter: ndarray, info: Header)[source]#
Class representing a logged session of a single NilsPod.
Warning
Some operations on the dataset should not be performed after each other, as they can lead to unexpected results. The respective methods have specific warnings in their docstring.
Each instance has 3 important (groups of) attributes:
- self.info: An instance of nilspodlib.header.Header containing all the meta info about the measurement.
- self.counter: The continuous counter created by the sensor. It is particularly important for synchronising multiple datasets that were recorded at the same time (see nilspodlib.session.SyncedSession).
- datastreams: The actual sensor data, accessed directly by the name of the sensor type (e.g. acc, gyro, baro, …). Each datastream is wrapped in a nilspodlib.datastream.Datastream object.
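A minimal usage sketch (the file name is a placeholder; see from_bin_file below for the loading options):

```python
from nilspodlib.dataset import Dataset

# Load a recorded session from its binary file (placeholder path).
dataset = Dataset.from_bin_file("./NilsPod_recording.bin")

dataset.info     # Header instance with all meta information about the measurement
dataset.counter  # continuous counter created by the sensor
dataset.acc      # accelerometer Datastream (only present if it was part of the recording)
```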
- Attributes:
- path
Path pointing to the recording file (if dataset was loaded from a file)
- info
Metadata of the recording
- size
Get the number of samples in the Dataset.
- counter
The continuous counter of the sensor.
- time_counter
Counter in seconds since the first sample.
- utc_counter
Counter as UTC timestamps.
- utc_datetime_counter
Counter as pandas datetime series in the UTC timezone.
- active_sensor
The enabled sensors in the dataset.
- datastreams
Iterate through all available datastreams.
- acc
Optional accelerometer datastream.
- gyro
Optional gyroscope datastream.
- mag
Optional magnetometer datastream.
- baro
Optional barometer datastream.
- analog
Optional analog datastream. Its content will depend on the exact recording and sensor used.
- ecg
Optional ECG datastream.
- ppg
Optional PPG datastream.
- temperature
Optional temperature reading datastream.
Methods
calibrate_imu(calibration[, inplace])
Apply a calibration to the Acc and Gyro datastreams.
cut([start, stop, step, inplace])
Cut all datastreams of the dataset.
cut_counter_val([start, stop, step, inplace])
Cut the dataset based on values in the counter and not the index.
cut_to_syncregion([start, end, warn_thres, ...])
Cut the dataset to the region indicated by the first and last sync package received from master.
data_as_df([datastreams, index, include_units])
Export the datastreams of the dataset in a single pandas DataFrame.
downsample(factor[, inplace])
Downsample all datastreams by a factor.
find_calibrations([folder, recursive, ...])
Find all calibration infos that belong to a given sensor_type.
find_closest_calibration([folder, ...])
Find the closest calibration info to the start of the measurement.
from_bin_file(path, *[, legacy_support, ...])
Create a new Dataset from a valid .bin file.
imu_data_as_df([index, include_units])
Export the acc and gyro datastreams of the dataset in a single pandas DataFrame.
- __init__(sensor_data: Dict[str, ndarray], counter: ndarray, info: Header)[source]#
Get new Dataset instance.
Note
Usually you shouldn’t use this init directly. Use the provided
from_bin_file
constructor to handle loading recorded NilsPod sessions.
- Parameters:
- sensor_data
Dictionary mapping the name of a sensor type to its data as a np.array. The data needs to be 2D with time/counter as the first dimension.
- counter
The counter created by the sensor. It should have the same length as all provided sensor data arrays.
- info
Header instance containing all meta info about the measurement.
- calibrate_imu(calibration: Union[CalibrationInfo, path_t], inplace: bool = False) Self [source]#
Apply a calibration to the Acc and Gyro datastreams.
The final units of the datastreams will depend on the used calibration values, but most likely they will be “g” for the Acc and “dps” (degrees per second) for the Gyro.
- Parameters:
- calibration
Calibration object, or path to a .json file that can be used to create one.
- inplace
If True this method modifies the current dataset object. If False, a copy of the dataset and all datastream objects is created.
Notes
This just combines calibrate_acc and calibrate_gyro.
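A sketch of a typical calibration workflow, assuming calibration files created with imucal are stored in a local folder (paths are placeholders):

```python
from nilspodlib.dataset import Dataset

dataset = Dataset.from_bin_file("./NilsPod_recording.bin")  # placeholder path

# Pick the calibration file closest to the start of this measurement ...
cal_path = dataset.find_closest_calibration(folder="./calibrations")

# ... and apply it. With inplace=False (the default) a calibrated copy is returned.
calibrated = dataset.calibrate_imu(cal_path)
```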
- cut(start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None, inplace: bool = False) Self [source]#
Cut all datastreams of the dataset.
This is equivalent to applying the following slicing to all datastreams and the counter: array[start:stop:step]
Warning
This will not modify any values in the header/info of the dataset (e.g. the number of samples in the header or the sync index values). Using methods that rely on these values might result in unexpected behaviour. For example cut_to_syncregion will not work correctly, if cut or cut_counter_val was used before.
- Parameters:
- start
Start index
- stop
Stop index
- step
Step size of the cut
- inplace
If True this method modifies the current dataset object. If False, a copy of the dataset and all datastream objects is created.
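For illustration (the indices are arbitrary placeholders):

```python
# Keep samples 500 to 10000 and take every second sample.
trimmed = dataset.cut(start=500, stop=10000, step=2)
```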
- cut_counter_val(start: Optional[int] = None, stop: Optional[int] = None, step: Optional[int] = None, inplace: bool = False) Self [source]#
Cut the dataset based on values in the counter and not the index.
Instead of just cutting the datastream based on its index, it is cut based on the counter value. This is equivalent to applying the following pandas style slicing to all datastreams and the counter: array.loc[start:stop:step]
Warning
This will not modify any values in the header/info of the dataset (e.g. the number of samples in the header or the sync index values). Using methods that rely on these values might result in unexpected behaviour. For example cut_to_syncregion will not work correctly, if cut or cut_counter_val was used before.
- Parameters:
- start
Start value in counter
- stop
Stop value in counter
- step
Step size of the cut
- inplace
If True this method modifies the current dataset object. If False, a copy of the dataset and all datastream objects is created.
Notes
The method searches the respective index for the start and the stop value in the
counter
and callscut
with these values. The step size will be passed directly and not modified (i.e. the step size will not respect downsampling or similar operations done beforehand).
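A short sketch (the counter values are placeholders):

```python
# Cut based on counter values instead of array indices.
section = dataset.cut_counter_val(start=120000, stop=240000)
```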
- cut_to_syncregion(start: bool = True, end: bool = False, warn_thres: Optional[int] = 30, inplace: bool = False) Self [source]#
Cut the dataset to the region indicated by the first and last sync package received from master.
This cuts the dataset to the values indicated by info.sync_index_start and info.sync_index_stop. In case the dataset was a sync-master (info.sync_role = 'master') this will have no effect and the dataset will be returned unmodified.
Warning
This function should not be used after any other methods that can modify the counter (e.g. cut or downsample).
Warning
This will not modify any values in the header/info of the dataset (e.g. the number of samples in the header or the sync index values). Using methods that rely on these values might result in unexpected behaviour.
- Parameters:
- start
Whether the dataset should be cut at info.sync_index_start. If this is False, a jump in the counter will remain. The only use case for not cutting at the start is when the counters are already perfectly aligned.
- end
Whether the dataset should be cut at info.sync_index_stop. Usually it can be assumed that the data will be synchronous for multiple seconds after the last sync package. Therefore, it might be acceptable to ignore the last sync package and only cut the start of the dataset.
- warn_thres
Threshold in seconds from the end of the dataset. If the last sync package occurred more than warn_thres seconds before the end of the dataset, a warning is emitted. Use warn_thres = None to silence it. This is not relevant if the end of the dataset is cut (i.e. end=True).
- inplace
If True this method modifies the current dataset object. If False, a copy of the dataset and all datastream objects is created.
- Raises:
- ValueError
If the dataset does not have any sync infos
Warning
- UserWarning
If a sync package occurred far before the last sample in the dataset. See arg warn_thres.
Notes
Usually, to work with multiple synchronised datasets, a SyncedSession should be used instead of cutting the datasets manually. SyncedSession.cut_to_syncregion will cover multiple edge cases involving multiple datasets, which cannot be handled by this method.
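A sketch of the intended call order, assuming this dataset was recorded as a sync-slave (the file name is a placeholder):

```python
from nilspodlib.dataset import Dataset

# Cut right after loading, before any other operation that modifies the counter.
dataset = Dataset.from_bin_file("./slave_recording.bin")
synced = dataset.cut_to_syncregion(start=True, end=False, warn_thres=30)
```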
- data_as_df(datastreams: Optional[Sequence[str]] = None, index: Optional[str] = None, include_units: Optional[bool] = False) DataFrame [source]#
Export the datastreams of the dataset in a single pandas DataFrame.
- Parameters:
- datastreams
Optional list of datastream names, if only specific ones should be included. Datastreams that are not part of the current dataset will be silently ignored.
- index
Specify which index should be used for the dataset. The options are:
“counter”: the actual counter
“time”: the time in seconds since the first sample
“utc”: the UTC timestamp of each sample
“utc_datetime”: a pandas DateTime index in UTC time
“local_datetime”: a pandas DateTime index in the timezone set for the session
None: a simple index (0…N)
- include_units
If True the column names will have the unit of the datastream concatenated with an _.
- Raises:
- ValueError
If any other than the allowed index values are used.
Notes
This method calls the data_as_df method of each Datastream object and then concatenates the results. Therefore, it will use the column information of each datastream.
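For example (the selected datastreams depend on what was actually recorded):

```python
# Export acc and gyro with a timezone-aware datetime index and units in the column names.
df = dataset.data_as_df(
    datastreams=["acc", "gyro"],
    index="local_datetime",
    include_units=True,
)
```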
- downsample(factor: int, inplace: bool = False) Self [source]#
Downsample all datastreams by a factor.
This applies scipy.signal.decimate to all datastreams and the counter of the dataset. See nilspodlib.datastream.Datastream.downsample for details.
Warning
This will not modify any values in the header/info of the dataset (e.g. the number of samples in the header or the sync index values). Using methods that rely on these values might result in unexpected behaviour. For example cut_to_syncregion will not work correctly, if cut, cut_counter_val, or downsample was used before.
- Parameters:
- factor
Factor by which the dataset should be downsampled.
- inplace
If True this method modifies the current dataset object. If False, a copy of the dataset and all datastream objects is created.
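A short sketch (the factor is arbitrary):

```python
# Reduce the sampling rate of all datastreams (and the counter) by a factor of 4.
downsampled = dataset.downsample(factor=4)
```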
- find_calibrations(folder: Optional[path_t] = None, recursive: bool = True, filter_cal_type: Optional[str] = None, ignore_file_not_found: Optional[bool] = False) List[Path] [source]#
Find all calibration infos that belong to a given sensor_type.
As this only checks the filenames, this might return a false positive depending on your folder structure and naming.
- Parameters:
- folder
Basepath of the folder to search. If None, tries to find a default calibration
- recursive
Whether the folder should be searched recursively or not.
- filter_cal_type
Whether only files obtained with a certain calibration type should be found. This will look for the CalType inside the json file and can hence cause performance problems. If None, all found files will be returned. For possible values, see the imucal library.
- ignore_file_not_found
If True this function will not raise an error, but rather return an empty list, if no calibration files were found for the specific sensor_type.
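An illustrative call (the folder is a placeholder and "ferraris" is only an example of a calibration type defined by the imucal library):

```python
cal_files = dataset.find_calibrations(
    folder="./calibrations",
    recursive=True,
    filter_cal_type="ferraris",
    ignore_file_not_found=True,  # return an empty list instead of raising
)
```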
- find_closest_calibration(folder: Optional[path_t] = None, recursive: bool = True, filter_cal_type: Optional[str] = None, before_after: Optional[str] = None, warn_thres: timedelta = datetime.timedelta(days=30), ignore_file_not_found: Optional[bool] = False) Path [source]#
Find the closest calibration info to the start of the measurement.
As this only checks the filenames, this might return a false positive depending on your folder structure and naming.
- Parameters:
- folder
Basepath of the folder to search. If None, tries to find a default calibration
- recursive
Whether the folder should be searched recursively or not.
- filter_cal_type
Whether only files obtained with a certain calibration type should be found. This will look for the CalType inside the json file and can hence cause performance problems. If None, all found files will be returned. For possible values, see the imucal library.
- before_after
Can either be ‘before’ or ‘after’, if the search should be limited to calibrations that were recorded either before or after the specified date.
- warn_thres
If the distance to the closest calibration is larger than this threshold, a warning is emitted.
- ignore_file_not_found
If True this function will not raise an error, but rather return None, if no calibration files were found for the specific sensor_type.
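For example, restricted to calibrations recorded before the measurement (the folder is a placeholder):

```python
cal_path = dataset.find_closest_calibration(
    folder="./calibrations",
    before_after="before",
)
```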
- classmethod from_bin_file(path: path_t, *, legacy_support: str = 'error', force_version: Optional[Version] = None, tz: Optional[str] = None) Self [source]#
Create a new Dataset from a valid .bin file.
- Parameters:
- path
Path to the file
- legacy_support
This indicates how to deal with old firmware versions:
“error”: An error is raised, if an unsupported version is detected.
“warn”: A warning is raised, but the file is parsed without modification.
“resolve”: A legacy conversion is performed to load old files. If no suitable conversion is found, an error is raised. See the legacy package and the README to learn more about available conversions.
- force_version
Instead of relying on the version provided in the session header, the legacy support will be determined based on the version provided here. This is only used, if legacy_support="resolve". This option can be helpful when testing with development firmware images that don’t have official version numbers.
- tz
Optional timezone str of the recording. This can be used to localize the start and end time. Note, this should not be the timezone of your current PC, but the timezone relevant for the specific recording.
- Raises:
- VersionError
If an unsupported firmware version is detected and legacy_support is set to "error".
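A loading sketch (the file name and timezone are placeholders):

```python
from nilspodlib.dataset import Dataset

dataset = Dataset.from_bin_file(
    "./NilsPod_recording.bin",
    legacy_support="resolve",  # attempt a legacy conversion for old firmware versions
    tz="Europe/Berlin",        # timezone of the recording, not of your PC
)
```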
- imu_data_as_df(index: Optional[str] = None, include_units: Optional[bool] = False) DataFrame [source]#
Export the acc and gyro datastreams of the dataset in a single pandas DataFrame.
- Parameters:
- index
Specify which index should be used for the dataset. The options are:
“counter”: the actual counter
“time”: the time in seconds since the first sample
“utc”: the UTC timestamp of each sample
“utc_datetime”: a pandas DateTime index in UTC time
“local_datetime”: a pandas DateTime index in the timezone set for the session
None: a simple index (0…N)
- include_units
If True the column names will have the unit of the datastream concatenated with an _.
- Raises:
- ValueError
If any other than the allowed index values are used.
Notes
This method calls the data_as_df method of each Datastream object and then concatenates the results. Therefore, it will use the column information of each datastream.
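For example (assuming acc and gyro data were recorded):

```python
# Only the IMU channels, indexed by seconds since the first sample.
imu_df = dataset.imu_data_as_df(index="time", include_units=False)
```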