perda.core_data_structures.data_instance

pydantic model perda.core_data_structures.data_instance.DataInstance

Bases: BaseModel

A single time-series variable, pairing a 1D timestamp array with a 1D value array.

Config:
  • arbitrary_types_allowed: bool = True

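Fields:
  • timestamp_np (ndarray)
  • value_np (ndarray)
  • label (str | None)
  • var_id (int | None)
  • cpp_name (str | None)

Validators:
  • validate_timestamp » timestamp_np
  • validate_value » value_np
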
field timestamp_np: ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]] [Required]

Timestamps as a 1D NumPy array

Validated by:
  • validate_timestamp

field value_np: ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]] [Required]

Values as a 1D NumPy array

Validated by:
  • validate_value

field label: str | None = None

Human-readable label for this variable

field var_id: int | None = None

Unique variable ID

field cpp_name: str | None = None

C++ variable name

validator validate_timestamp  »  timestamp_np

Validate that the timestamp array is 1-dimensional, positive, and strictly increasing.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Parameters:

v (Any)

validator validate_value  »  value_np

Validate that the value array is 1-dimensional.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Parameters:

v (Any)

model_post_init(_DataInstance__context)

Validate, after initialization, that the timestamp and value arrays have the same length.

Return type:

None

Parameters:

_DataInstance__context (Any)
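
The checks described for validate_timestamp, validate_value, and model_post_init can be sketched with plain NumPy. This is only an illustration of the documented rules, not perda's code: check_series is a hypothetical helper, and "positive" is assumed to mean strictly greater than zero.

```python
import numpy as np


def check_series(timestamps, values):
    """Hypothetical helper mirroring the documented checks; not part of perda."""
    ts = np.asarray(timestamps)
    vals = np.asarray(values)

    # Both arrays must be 1-dimensional.
    if ts.ndim != 1:
        raise ValueError("timestamps must be 1-dimensional")
    if vals.ndim != 1:
        raise ValueError("values must be 1-dimensional")

    # Assumption: "positive" is read as strictly greater than zero.
    if not np.all(ts > 0):
        raise ValueError("timestamps must be positive")

    # Strictly increasing: every consecutive difference is > 0.
    if not np.all(np.diff(ts) > 0):
        raise ValueError("timestamps must be strictly increasing")

    # Post-init style check: one value per timestamp.
    if ts.shape[0] != vals.shape[0]:
        raise ValueError("timestamps and values must have the same length")
    return ts, vals


ts, vals = check_series([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

With the real model, pydantic would report such failures as validation errors when the DataInstance is constructed.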

trim(ts_start=None, ts_end=None)

Return a new DataInstance containing only points within the given timestamp range.

Parameters:
  • ts_start (float | None, optional) – Lower bound in raw timestamp units (inclusive). Default is None (no lower bound).

  • ts_end (float | None, optional) – Upper bound in raw timestamp units (inclusive). Default is None (no upper bound).

Returns:

New DataInstance with only in-range data points.

Return type:

DataInstance

Examples

>>> clipped = di.trim(ts_start=10_000, ts_end=30_000)
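
The inclusive bounds can be illustrated on plain arrays. The sketch below is a NumPy-only stand-in for the behaviour described above; trim_arrays is a hypothetical name and does not exist in perda.

```python
import numpy as np


def trim_arrays(timestamps, values, ts_start=None, ts_end=None):
    """Hypothetical NumPy-only stand-in for the documented trim behaviour."""
    ts = np.asarray(timestamps)
    vals = np.asarray(values)

    # Start by keeping everything, then narrow with each bound that is given.
    keep = np.ones(ts.shape, dtype=bool)
    if ts_start is not None:
        keep &= ts >= ts_start  # inclusive lower bound
    if ts_end is not None:
        keep &= ts <= ts_end  # inclusive upper bound
    return ts[keep], vals[keep]


ts = np.array([5_000, 10_000, 20_000, 30_000, 40_000])
vals = np.array([0.5, 1.0, 2.0, 3.0, 4.0])
clipped_ts, clipped_vals = trim_arrays(ts, vals, ts_start=10_000, ts_end=30_000)
```

Here the points at 10_000 and 30_000 survive because both bounds are inclusive; omitting a bound leaves that side of the range open.
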
class perda.core_data_structures.data_instance.FilterOptions(*values)

Bases: Enum

Specifies which array(s) a filter function receives as input.

BOTH = 'both'
TIMESTAMPS = 'right_only'
VALUES = 'left_only'
perda.core_data_structures.data_instance.apply_ufunc_filter(data, filter_func, apply_to=FilterOptions.VALUES)

Apply a filter function to a DataInstance.

Parameters:
  • data (DataInstance) – Input DataInstance

  • filter_func (Callable) – Function that receives the values, the timestamps, or both (as selected by apply_to) and returns a boolean mask

  • apply_to (FilterOptions, optional) – Whether to pass values, timestamps, or both to the filter function. Default is FilterOptions.VALUES.

Returns:

Filtered DataInstance

Return type:

DataInstance
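
A mask-based filter of this kind can be sketched on plain arrays. The code below is an illustration only and makes several assumptions: filter_arrays is a hypothetical name, apply_to is a plain string rather than a FilterOptions member, points where the mask is True are kept, and in the "both" case the callable is assumed to receive (values, timestamps) in that order.

```python
import numpy as np


def filter_arrays(timestamps, values, filter_func, apply_to="values"):
    """Hypothetical stand-in: apply_to is a plain string, not FilterOptions."""
    ts = np.asarray(timestamps)
    vals = np.asarray(values)

    if apply_to == "values":
        mask = filter_func(vals)
    elif apply_to == "timestamps":
        mask = filter_func(ts)
    elif apply_to == "both":
        # Assumption: the callable takes (values, timestamps) in that order.
        mask = filter_func(vals, ts)
    else:
        raise ValueError(f"unknown apply_to: {apply_to!r}")

    # Assumption: points where the mask is True are the ones kept.
    mask = np.asarray(mask, dtype=bool)
    return ts[mask], vals[mask]


ts = np.array([1.0, 2.0, 3.0, 4.0])
vals = np.array([1.0, np.nan, 3.0, np.inf])

# np.isfinite is a NumPy ufunc that returns a boolean mask.
finite_ts, finite_vals = filter_arrays(ts, vals, np.isfinite)

# A timestamp-based filter: keep points at or after t = 3.
late_ts, late_vals = filter_arrays(ts, vals, lambda t: t >= 3.0, apply_to="timestamps")
```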

perda.core_data_structures.data_instance.apply_ufunc_inner_join(left, right, ufunc, *, tolerance)

Apply a binary operation to two DataInstances using inner join.

Parameters:
  • left (DataInstance) – Left DataInstance

  • right (DataInstance) – Right DataInstance

  • ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)

  • tolerance (float) – Maximum allowed distance between left and right timestamps for a match.

Returns:

New DataInstance with combined values

Return type:

DataInstance

perda.core_data_structures.data_instance.apply_ufunc_left_join(left, right, ufunc)

Apply a binary operation to two DataInstances using left join.

Parameters:
  • left (DataInstance) – Left DataInstance (all timestamps are kept)

  • right (DataInstance) – Right DataInstance (values interpolated to left)

  • ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)

Returns:

New DataInstance with combined values

Return type:

DataInstance
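
The left-join pattern (keep the left timestamps, resample the right series onto them, then apply the ufunc) can be sketched with NumPy alone. This is not perda's implementation: ufunc_left_join is a hypothetical name, linear interpolation is an assumption (the documentation says only "interpolated"), and np.interp holds the edge values outside the right series' range, which the library may handle differently.

```python
import numpy as np


def ufunc_left_join(left_ts, left_vals, right_ts, right_vals, ufunc):
    """Hypothetical sketch: resample right onto left timestamps, then combine."""
    # Assumption: linear interpolation; np.interp clamps to the edge values
    # for left timestamps outside the range of right_ts.
    right_on_left = np.interp(left_ts, right_ts, right_vals)
    return np.asarray(left_ts), ufunc(left_vals, right_on_left)


left_ts = np.array([1.0, 2.0, 3.0])
left_vals = np.array([10.0, 20.0, 30.0])
right_ts = np.array([0.0, 4.0])
right_vals = np.array([0.0, 8.0])

# The right series is 2.0, 4.0, 6.0 at the left timestamps, so the
# difference is 8.0, 16.0, 24.0.
out_ts, out_vals = ufunc_left_join(left_ts, left_vals, right_ts, right_vals, np.subtract)
```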

perda.core_data_structures.data_instance.apply_ufunc_outer_join(left, right, ufunc, *, drop_nan=True, fill=0.0)

Apply a binary operation to two DataInstances using outer join.

Parameters:
  • left (DataInstance) – Left DataInstance

  • right (DataInstance) – Right DataInstance

  • ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)

  • drop_nan (bool, optional) – If True, drop rows where either series has NaN after interpolation. Default is True.

  • fill (float, optional) – Fill value for NaNs when drop_nan is False. Default is 0.0.

Returns:

New DataInstance with combined values

Return type:

DataInstance

perda.core_data_structures.data_instance.inner_join_data_instances(left, right, *, tolerance)

Inner join two DataInstances: keep only left timestamps with matching right timestamps.

Parameters:
  • left (DataInstance) – Left DataInstance

  • right (DataInstance) – Right DataInstance

  • tolerance (float) – Maximum allowed distance between left and right timestamps for a match. Timestamps with distance > tolerance are dropped.

Return type:

Tuple[DataInstance, DataInstance]

Returns:

  • left_result (DataInstance) – Left DataInstance with only matched timestamps

  • right_result (DataInstance) – Right DataInstance with only matched timestamps
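
One way to realize tolerance-based matching is a nearest-timestamp lookup with np.searchsorted. The sketch below is an assumption about how such a join could work, not a description of perda's internals: inner_join_arrays is a hypothetical name, both timestamp arrays are assumed sorted and non-empty, the right result keeps its own timestamps, and a right point may be matched by more than one left point.

```python
import numpy as np


def inner_join_arrays(left_ts, left_vals, right_ts, right_vals, tolerance):
    """Hypothetical nearest-timestamp matcher; assumes sorted, non-empty inputs."""
    left_ts, left_vals = np.asarray(left_ts, dtype=float), np.asarray(left_vals)
    right_ts, right_vals = np.asarray(right_ts, dtype=float), np.asarray(right_vals)

    # Candidate neighbours on either side of each left timestamp.
    pos = np.searchsorted(right_ts, left_ts)
    below = np.clip(pos - 1, 0, right_ts.size - 1)
    above = np.clip(pos, 0, right_ts.size - 1)

    # Pick whichever neighbour is closer in time.
    use_below = np.abs(left_ts - right_ts[below]) <= np.abs(right_ts[above] - left_ts)
    nearest = np.where(use_below, below, above)

    # Keep only pairs whose distance does not exceed the tolerance.
    matched = np.abs(left_ts - right_ts[nearest]) <= tolerance
    idx = nearest[matched]
    return (left_ts[matched], left_vals[matched]), (right_ts[idx], right_vals[idx])


left = (np.array([1.0, 2.0, 3.0, 10.0]), np.array([10.0, 20.0, 30.0, 40.0]))
right = (np.array([1.1, 2.9, 5.0]), np.array([1.0, 2.0, 3.0]))

(l_ts, l_vals), (r_ts, r_vals) = inner_join_arrays(*left, *right, tolerance=0.2)
```

In this example the left points at 2.0 and 10.0 have no right timestamp within 0.2, so only the pairs (1.0, 1.1) and (3.0, 2.9) remain.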

perda.core_data_structures.data_instance.left_join_data_instances(left, right)

Left join two DataInstances: keep all left timestamps, interpolate right values.

Parameters:
  • left (DataInstance) – Left DataInstance (all timestamps are kept)

  • right (DataInstance) – Right DataInstance (values are matched/interpolated to left)

Return type:

Tuple[DataInstance, DataInstance]

Returns:

  • left_result (DataInstance) – Left DataInstance with aligned timestamps

  • right_result (DataInstance) – Right DataInstance with values interpolated to left timestamps

perda.core_data_structures.data_instance.outer_join_data_instances(left, right, *, drop_nan=True, fill=0.0)

Outer join two DataInstances: union of timestamps with interpolation.

Parameters:
  • left (DataInstance) – Left DataInstance

  • right (DataInstance) – Right DataInstance

  • drop_nan (bool, optional) – If True, drop rows where either series has NaN after interpolation. Default is True.

  • fill (float, optional) – Fill value for NaNs when drop_nan is False. Default is 0.0.

Return type:

Tuple[DataInstance, DataInstance]

Returns:

  • left_result (DataInstance) – Left DataInstance with values interpolated to union timestamps

  • right_result (DataInstance) – Right DataInstance with values interpolated to union timestamps
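
The interaction between drop_nan and fill is easiest to see on a small example. The NumPy-only sketch below is an illustration under stated assumptions, not perda's implementation: outer_join_arrays is a hypothetical name, interpolation is assumed linear, and points outside a series' own time range are assumed to become NaN rather than being extrapolated.

```python
import numpy as np


def outer_join_arrays(left_ts, left_vals, right_ts, right_vals, drop_nan=True, fill=0.0):
    """Hypothetical sketch of an outer join on the union of timestamps."""
    union_ts = np.union1d(left_ts, right_ts)  # sorted and de-duplicated

    # Assumption: outside a series' own range the result is NaN.
    left_u = np.interp(union_ts, left_ts, left_vals, left=np.nan, right=np.nan)
    right_u = np.interp(union_ts, right_ts, right_vals, left=np.nan, right=np.nan)

    if drop_nan:
        # Drop rows where either series is NaN.
        keep = ~(np.isnan(left_u) | np.isnan(right_u))
        return union_ts[keep], left_u[keep], right_u[keep]

    # Otherwise keep every row and replace NaNs with the fill value.
    left_u = np.where(np.isnan(left_u), fill, left_u)
    right_u = np.where(np.isnan(right_u), fill, right_u)
    return union_ts, left_u, right_u


left_ts = np.array([1.0, 2.0, 3.0])
left_vals = np.array([10.0, 20.0, 30.0])
right_ts = np.array([2.0, 3.0, 4.0])
right_vals = np.array([1.0, 2.0, 3.0])

dropped = outer_join_arrays(left_ts, left_vals, right_ts, right_vals)
filled = outer_join_arrays(left_ts, left_vals, right_ts, right_vals, drop_nan=False, fill=0.0)
```

With drop_nan=True only the overlapping timestamps 2.0 and 3.0 remain; with drop_nan=False all four union timestamps are kept and the gaps at either end are set to fill.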