perda.core_data_structures.data_instance
- pydantic model perda.core_data_structures.data_instance.DataInstance
Bases: BaseModel
A single time-series variable, pairing a 1D timestamp array with a 1D value array.
- Config:
arbitrary_types_allowed: bool = True
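- Fields:
timestamp_np, value_np, label, var_id, cpp_name
- Validators:
validate_timestamp » timestamp_np
validate_value » value_np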
- field timestamp_np: ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]] [Required]
Timestamps as a 1D NumPy array.
- Validated by:
validate_timestamp
- field value_np: ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]] [Required]
Values as a 1D NumPy array.
- Validated by:
validate_value
- field label: str | None = None
Human-readable label for this variable.
- field var_id: int | None = None
Unique variable ID.
- field cpp_name: str | None = None
C++ variable name.
- validator validate_timestamp » timestamp_np
Validate that the timestamp array is 1-dimensional, positive, and strictly increasing.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Parameters:
v (Any)
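The three checks named above can be sketched on a plain NumPy array. This is an illustrative sketch, not perda's implementation: `check_timestamps` and `is_rejected` are hypothetical names, and treating "positive" as strictly greater than zero is an assumption.

```python
import numpy as np


def check_timestamps(ts):
    """Hypothetical stand-in for the documented timestamp checks."""
    arr = np.asarray(ts)
    if arr.ndim != 1:
        raise ValueError("timestamps must be 1-dimensional")
    # Assumption: "positive" means strictly greater than zero.
    if np.any(arr <= 0):
        raise ValueError("timestamps must be positive")
    # Strictly increasing: every consecutive difference is > 0.
    if np.any(np.diff(arr) <= 0):
        raise ValueError("timestamps must be strictly increasing")
    return arr


def is_rejected(ts):
    """Return True if check_timestamps raises for this input."""
    try:
        check_timestamps(ts)
    except ValueError:
        return True
    return False


ok = check_timestamps([1.0, 2.0, 3.5])
```

A repeated timestamp such as `[1.0, 1.0, 2.0]` fails the last check because one difference is zero.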
- validator validate_value » value_np
Validate that the value array is 1-dimensional.
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- Parameters:
v (Any)
- model_post_init(_DataInstance__context)
Post-initialization check that the timestamp and value arrays have the same length.
- Return type:
None
- Parameters:
_DataInstance__context (Any)
- trim(ts_start=None, ts_end=None)
Return a new DataInstance containing only points within the given timestamp range.
- Parameters:
ts_start (float | None, optional) – Lower bound in raw timestamp units (inclusive). Default is None (no lower bound).
ts_end (float | None, optional) – Upper bound in raw timestamp units (inclusive). Default is None (no upper bound).
- Returns:
New DataInstance with only in-range data points.
- Return type:
DataInstance
Examples
>>> clipped = di.trim(ts_start=10_000, ts_end=30_000)
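The inclusive-bounds behaviour described above can be sketched with a boolean mask over raw arrays. `trim_arrays` is a hypothetical helper written for illustration; it mirrors the documented semantics but is not perda's code.

```python
import numpy as np


def trim_arrays(ts, vals, ts_start=None, ts_end=None):
    """Keep points with ts_start <= t <= ts_end; None means unbounded."""
    mask = np.ones(ts.shape, dtype=bool)
    if ts_start is not None:
        mask &= ts >= ts_start  # inclusive lower bound
    if ts_end is not None:
        mask &= ts <= ts_end  # inclusive upper bound
    return ts[mask], vals[mask]


ts = np.array([5_000, 10_000, 20_000, 30_000, 40_000])
vals = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
trimmed_ts, trimmed_vals = trim_arrays(ts, vals, ts_start=10_000, ts_end=30_000)
```

Both endpoints survive here because each bound is inclusive.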
- class perda.core_data_structures.data_instance.FilterOptions(*values)
Bases: Enum
Specifies which array(s) a filter function receives as input.
- BOTH = 'both'
- TIMESTAMPS = 'right_only'
- VALUES = 'left_only'
- perda.core_data_structures.data_instance.apply_ufunc_filter(data, filter_func, apply_to=FilterOptions.VALUES)
Apply a filter function to a DataInstance.
- Parameters:
data (DataInstance) – Input DataInstance
filter_func (Callable) – Function that takes in values and/or timestamps and returns a boolean mask
apply_to (FilterOptions, optional) – Which array(s) to pass to the filter function: values, timestamps, or both. Default is FilterOptions.VALUES.
- Returns:
Filtered DataInstance
- Return type:
DataInstance
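A minimal sketch of mask-based filtering on raw arrays may help show what a filter function is expected to return. `ApplyTo` and `filter_arrays` are hypothetical stand-ins for `FilterOptions` and `apply_ufunc_filter`; in particular, the `(timestamps, values)` argument order used for the "both" case is an assumption, not something the reference confirms.

```python
from enum import Enum

import numpy as np


class ApplyTo(Enum):
    """Hypothetical stand-in for FilterOptions."""
    VALUES = "values"
    TIMESTAMPS = "timestamps"
    BOTH = "both"


def filter_arrays(ts, vals, filter_func, apply_to=ApplyTo.VALUES):
    """Keep the points where filter_func returns True."""
    if apply_to is ApplyTo.VALUES:
        mask = filter_func(vals)
    elif apply_to is ApplyTo.TIMESTAMPS:
        mask = filter_func(ts)
    else:
        # Assumption: the function receives (timestamps, values) in this order.
        mask = filter_func(ts, vals)
    mask = np.asarray(mask, dtype=bool)
    # The same mask is applied to both arrays so they stay aligned.
    return ts[mask], vals[mask]


ts = np.array([1.0, 2.0, 3.0, 4.0])
vals = np.array([-1.0, 5.0, -2.0, 7.0])

pos_ts, pos_vals = filter_arrays(ts, vals, lambda v: v > 0)
late_ts, late_vals = filter_arrays(ts, vals, lambda t: t >= 3.0, ApplyTo.TIMESTAMPS)
both_ts, both_vals = filter_arrays(
    ts, vals, lambda t, v: (t >= 2.0) & (v > 0), ApplyTo.BOTH
)
```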
- perda.core_data_structures.data_instance.apply_ufunc_inner_join(left, right, ufunc, *, tolerance)
Apply a binary operation to two DataInstances using inner join.
- Parameters:
left (DataInstance) – Left DataInstance
right (DataInstance) – Right DataInstance
ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)
tolerance (float) – Maximum allowed distance between left and right timestamps for a match.
- Returns:
New DataInstance with combined values
- Return type:
DataInstance
- perda.core_data_structures.data_instance.apply_ufunc_left_join(left, right, ufunc)
Apply a binary operation to two DataInstances using left join.
- Parameters:
left (DataInstance) – Left DataInstance (all timestamps are kept)
right (DataInstance) – Right DataInstance (values interpolated to left)
ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)
- Returns:
New DataInstance with combined values
- Return type:
DataInstance
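The left-join-then-apply pattern can be sketched on raw arrays as follows. This assumes linear interpolation via `np.interp`, which the reference does not specify; `left_join_apply` is a hypothetical name, not a perda function.

```python
import numpy as np


def left_join_apply(left_ts, left_vals, right_ts, right_vals, ufunc):
    """Resample right onto the left timestamps, then combine element-wise."""
    # Assumption: linear interpolation. np.interp clamps to the end values
    # outside the right series' time range; perda may behave differently there.
    right_on_left = np.interp(left_ts, right_ts, right_vals)
    return left_ts, ufunc(left_vals, right_on_left)


left_ts = np.array([1.0, 2.0, 3.0])
left_vals = np.array([10.0, 20.0, 30.0])
right_ts = np.array([0.0, 4.0])
right_vals = np.array([0.0, 8.0])

out_ts, out_vals = left_join_apply(
    left_ts, left_vals, right_ts, right_vals, np.subtract
)
```

Every left timestamp is kept, so the result has the same length as the left series.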
- perda.core_data_structures.data_instance.apply_ufunc_outer_join(left, right, ufunc, *, drop_nan=True, fill=0.0)
Apply a binary operation to two DataInstances using outer join.
- Parameters:
left (DataInstance) – Left DataInstance
right (DataInstance) – Right DataInstance
ufunc (Callable) – NumPy universal function to apply (e.g., np.add, np.subtract)
drop_nan (bool, optional) – If True, drop rows where either series has NaN after interpolation. Default is True.
fill (float, optional) – Fill value for NaNs when drop_nan is False. Default is 0.0.
- Returns:
New DataInstance with combined values
- Return type:
DataInstance
- perda.core_data_structures.data_instance.inner_join_data_instances(left, right, *, tolerance)
Inner join two DataInstances: keep only left timestamps with matching right timestamps.
- Parameters:
left (DataInstance) – Left DataInstance
right (DataInstance) – Right DataInstance
tolerance (float) – Maximum allowed distance between left and right timestamps for a match. Timestamps with distance > tolerance are dropped.
- Return type:
Tuple[DataInstance, DataInstance]
- Returns:
left_result (DataInstance) – Left DataInstance with only matched timestamps
right_result (DataInstance) – Right DataInstance with only matched timestamps
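One way to realise tolerance-based matching is a nearest-neighbour lookup with `np.searchsorted`. This is a sketch under assumptions: that each left timestamp is matched to its nearest right timestamp, and that the right series is non-empty and sorted. `inner_join_arrays` is a hypothetical name, and perda's actual matching rule may differ.

```python
import numpy as np


def inner_join_arrays(left_ts, left_vals, right_ts, right_vals, tolerance):
    """Keep left points whose nearest right timestamp is within tolerance."""
    last = len(right_ts) - 1
    # Index of the first right timestamp >= each left timestamp.
    idx = np.searchsorted(right_ts, left_ts)
    lo = np.clip(idx - 1, 0, last)  # neighbour at or before
    hi = np.clip(idx, 0, last)      # neighbour at or after
    # Pick whichever neighbour is closer in time.
    nearest = np.where(
        np.abs(left_ts - right_ts[lo]) <= np.abs(right_ts[hi] - left_ts), lo, hi
    )
    # Distances greater than tolerance are dropped.
    keep = np.abs(right_ts[nearest] - left_ts) <= tolerance
    return left_ts[keep], left_vals[keep], right_vals[nearest[keep]]


left_ts = np.array([1.0, 2.0, 3.0, 10.0])
left_vals = np.array([10.0, 20.0, 30.0, 40.0])
right_ts = np.array([1.1, 2.6, 3.0])
right_vals = np.array([1.0, 2.0, 3.0])

matched_ts, matched_left, matched_right = inner_join_arrays(
    left_ts, left_vals, right_ts, right_vals, tolerance=0.2
)
```

Here the left points at 2.0 and 10.0 are dropped because their nearest right timestamps are 0.6 and 7.0 away.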
- perda.core_data_structures.data_instance.left_join_data_instances(left, right)
Left join two DataInstances: keep all left timestamps, interpolate right values.
- Parameters:
left (DataInstance) – Left DataInstance (all timestamps are kept)
right (DataInstance) – Right DataInstance (values are matched/interpolated to left)
- Return type:
Tuple[DataInstance, DataInstance]
- Returns:
left_result (DataInstance) – Left DataInstance with aligned timestamps
right_result (DataInstance) – Right DataInstance with values interpolated to left timestamps
- perda.core_data_structures.data_instance.outer_join_data_instances(left, right, *, drop_nan=True, fill=0.0)
Outer join two DataInstances: union of timestamps with interpolation.
- Parameters:
left (DataInstance) – Left DataInstance
right (DataInstance) – Right DataInstance
drop_nan (bool, optional) – If True, drop rows where either series has NaN after interpolation. Default is True.
fill (float, optional) – Fill value for NaNs when drop_nan is False. Default is 0.0.
- Return type:
Tuple[DataInstance, DataInstance]
- Returns:
left_result (DataInstance) – Left DataInstance with values interpolated to union timestamps
right_result (DataInstance) – Right DataInstance with values interpolated to union timestamps
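The interaction between `drop_nan` and `fill` can be illustrated on raw arrays. The sketch below assumes linear interpolation onto the union of timestamps, with NaN wherever a union timestamp falls outside a series' own time range; the reference does not confirm either detail. `outer_join_arrays` is a hypothetical name.

```python
import numpy as np


def outer_join_arrays(left_ts, left_vals, right_ts, right_vals,
                      drop_nan=True, fill=0.0):
    """Resample both series onto the sorted union of their timestamps."""
    union_ts = np.union1d(left_ts, right_ts)
    # Assumption: points outside a series' time range become NaN.
    left_u = np.interp(union_ts, left_ts, left_vals, left=np.nan, right=np.nan)
    right_u = np.interp(union_ts, right_ts, right_vals, left=np.nan, right=np.nan)
    if drop_nan:
        # Drop rows where either series is NaN.
        keep = ~(np.isnan(left_u) | np.isnan(right_u))
        return union_ts[keep], left_u[keep], right_u[keep]
    # Otherwise keep every row and replace NaN with the fill value.
    left_u = np.where(np.isnan(left_u), fill, left_u)
    right_u = np.where(np.isnan(right_u), fill, right_u)
    return union_ts, left_u, right_u


left_ts = np.array([0.0, 2.0])
left_vals = np.array([0.0, 20.0])
right_ts = np.array([1.0, 3.0])
right_vals = np.array([100.0, 300.0])

d_ts, d_left, d_right = outer_join_arrays(left_ts, left_vals, right_ts, right_vals)
f_ts, f_left, f_right = outer_join_arrays(
    left_ts, left_vals, right_ts, right_vals, drop_nan=False, fill=0.0
)
```

With `drop_nan=True` only the overlapping span (timestamps 1.0 and 2.0) remains; with `drop_nan=False` all four union timestamps are kept and the out-of-range entries take the `fill` value.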