perda.utils.search

pydantic model perda.utils.search.SearchEntry

Bases: BaseModel

One entry in the search deck, holding raw variable data alongside its search card.

Fields:

card (str)
cpp_name (str)
descript (str)
var_id (int)

field var_id: int [Required]: Internal variable ID.

field cpp_name: str [Required]: C++ variable name used for data access.

field descript: str [Required]: Human-readable variable description.

field card: str [Required]: Space-separated search card text for scoring.

pydantic model perda.utils.search.SearchResult

Bases: BaseModel

A single ranked result returned by search().

Fields:

cpp_name (str)
descript (str)
rank (int)
score (float)
var_id (int)

field rank: int [Required]: 1-based position in the result list (1 = best match).

field score: float [Required]: Relevance score (higher is better).

field var_id: int [Required]: Internal variable ID.

field cpp_name: str [Required]: C++ variable name used for data access.

field descript: str [Required]: Human-readable variable description.

perda.utils.search.build_search_card(cpp_name, descript)

Build a search card for one variable.

Splits the C++ identifier on separators and camelCase boundaries, expands known abbreviations inline, and appends the description.

Parameters:

cpp_name (str) – C++ variable name (e.g. “pcm.requestedTorque”).
descript (str) – Human-readable variable description.

Returns:

Space-separated card text ready for the cross-encoder and keyword scorer.

Return type:

str
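The splitting and expansion steps described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the `ABBREVIATIONS` table, the helper names `split_identifier` and `sketch_search_card`, the separator set, and the lowercasing of tokens are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; the mapping perda actually uses is not shown here.
ABBREVIATIONS = {"pcm": "powertrain control module", "bat": "battery"}


def split_identifier(cpp_name: str) -> list[str]:
    """Split a C++ identifier on separators and camelCase boundaries."""
    tokens = []
    for part in re.split(r"[._:\-\s]+", cpp_name):
        # Insert a space at each lowercase/digit -> uppercase boundary.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        tokens.extend(t.lower() for t in spaced.split())
    return tokens


def sketch_search_card(cpp_name: str, descript: str) -> str:
    """Build a space-separated card: tokens, inline expansions, description."""
    words = []
    for token in split_identifier(cpp_name):
        words.append(token)
        if token in ABBREVIATIONS:
            # Expand inline, right next to the abbreviated token.
            words.append(ABBREVIATIONS[token])
    words.append(descript)
    return " ".join(words)


print(sketch_search_card("pcm.requestedTorque", "Requested torque"))
# pcm powertrain control module requested torque Requested torque
```

Keeping the original token next to its expansion lets both an abbreviated query and a spelled-out one match the same card.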

perda.utils.search.build_search_deck(data)

Build the search deck from all variables in a run.

Parameters:

data (SingleRunData) – Parsed CSV telemetry data.

Returns:

One entry per variable, containing its ID, names, description, and search card.

Return type:

list[SearchEntry]

perda.utils.search.combine_scores(semantic_score, keyword_score, num_terms)

Combine semantic and keyword scores using a weighted blend.

Short queries (fewer terms) get more keyword weight; longer queries lean on semantic relevance.

Parameters:

semantic_score (float) – Relevance score from the cross-encoder.
keyword_score (float) – Relevance score from fuzzy keyword matching.
num_terms (int) – Number of terms in the original query.

Returns:

Combined relevance score (higher is better).

Return type:

float
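One way such a length-dependent blend could look is sketched below. The weights (0.7 for a single term, dropping by 0.15 per extra term, floored at 0.3) and the function name `sketch_combine_scores` are hypothetical and chosen only for illustration; the sketch also assumes both input scores are already on a comparable scale.

```python
def sketch_combine_scores(
    semantic_score: float, keyword_score: float, num_terms: int
) -> float:
    """Blend two scores, giving short queries more keyword weight."""
    # Hypothetical schedule: 0.7 keyword weight for one term, minus 0.15
    # for each additional term, never below 0.3.
    extra_terms = max(num_terms, 1) - 1
    keyword_weight = max(0.3, 0.7 - 0.15 * extra_terms)
    semantic_weight = 1.0 - keyword_weight
    return semantic_weight * semantic_score + keyword_weight * keyword_score


# A one-term query leans on the keyword score; a five-term query leans on
# the semantic score.
short = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=1)
long = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=5)
print(short > long)  # True
```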

perda.utils.search.install_encoder()

Download and save the cross-encoder model for semantic search.

Returns:

True if the model loaded successfully, False otherwise.

Return type:

bool

Notes

Returns False immediately if sentence-transformers is not installed (i.e. the perda[semantic] extra was not requested). Any download or filesystem error is caught and printed, and the function then returns False so that callers fall back to keyword-only search.
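The guard-then-catch behaviour described in these notes might be structured along the following lines. This is only a sketch: `semantic_extra_available`, `sketch_install_encoder`, and the `download` callable are hypothetical stand-ins, and the real function's download logic is not reproduced here.

```python
import importlib.util
from typing import Callable


def semantic_extra_available() -> bool:
    """Return True if the sentence_transformers package can be found."""
    return importlib.util.find_spec("sentence_transformers") is not None


def sketch_install_encoder(download: Callable[[], None]) -> bool:
    """Run `download` (a hypothetical callable that fetches and saves the model)."""
    if not semantic_extra_available():
        return False  # optional dependency missing: keyword-only search
    try:
        download()
    except Exception as exc:  # download or filesystem failure
        print(f"Could not install encoder: {exc}")
        return False
    return True
```

Because every failure path returns False rather than raising, a caller can branch on the result and carry on with keyword-only scoring.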

perda.utils.search.keyword_score(query_terms, entry)

Score a card against query terms using fuzzy partial matching.

Scores each term with rapidfuzz.fuzz.partial_ratio, then averages the per-term scores. Partial matching tolerates prefixes, substrings, and minor typos.

Parameters:

query_terms (list[str]) – Tokenized query terms.
entry (SearchEntry) – The search entry to score.

Returns:

Mean fuzzy match score in [0, 1].

Return type:

float
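The per-term averaging can be sketched as below. Several things here are assumptions: the sketch takes the card text directly rather than a SearchEntry, it lowercases both sides, and it divides by 100 because rapidfuzz reports ratios on a 0 to 100 scale while the documented result lies in [0, 1]. When rapidfuzz is not installed, a rough difflib-based stand-in is used so the example still runs; that fallback is not part of perda.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz

    def partial_ratio(term: str, text: str) -> float:
        return fuzz.partial_ratio(term, text)  # 0..100

except ImportError:

    def partial_ratio(term: str, text: str) -> float:
        # Stand-in only: best similarity between `term` and any window of
        # `text` with the same length, rescaled to 0..100.
        if not term or not text:
            return 0.0
        width = min(len(term), len(text))
        best = max(
            SequenceMatcher(None, term, text[i : i + width]).ratio()
            for i in range(len(text) - width + 1)
        )
        return 100.0 * best


def sketch_keyword_score(query_terms: list[str], card: str) -> float:
    """Mean partial-match score over all query terms, scaled to [0, 1]."""
    if not query_terms:
        return 0.0
    card = card.lower()
    scores = [partial_ratio(term.lower(), card) / 100.0 for term in query_terms]
    return sum(scores) / len(scores)
```

An exact substring such as "torque" in "pcm requested torque" scores 1.0, while a misspelling such as "torqe" scores lower but remains well above zero.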

perda.utils.search.preprocess_query(query)

Expand domain abbreviations in a search query for semantic ranking.

Parameters:

query (str) – Raw user query string.

Returns:

Query with known abbreviations expanded and duplicate tokens removed.

Return type:

str
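A minimal sketch of expansion followed by de-duplication is shown below. The `ABBREVIATIONS` table and the name `sketch_preprocess_query` are hypothetical, and the choices to lowercase the query and to replace each abbreviation with its expansion (rather than keep both) are assumptions made for the example.

```python
# Hypothetical abbreviation table; perda's actual mapping is not shown here.
ABBREVIATIONS = {"bat": "battery", "temp": "temperature"}


def sketch_preprocess_query(query: str) -> str:
    """Expand known abbreviations, then drop repeated tokens in order."""
    expanded = []
    for token in query.lower().split():
        # An expansion may contain several words, so split it as well.
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    # dict.fromkeys keeps the first occurrence of each token, in order.
    return " ".join(dict.fromkeys(expanded))


print(sketch_preprocess_query("bat battery wheel"))  # battery wheel
```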

perda.utils.search.search(data, query, top_n=10)

Search telemetry variables, print the top matches, and return them.

Parameters:

data (SingleRunData) – Parsed CSV telemetry data.
query (str) – Free-text search query (e.g. “bat wheel”).
top_n (int) – Maximum number of results to return and display (default 10).

Returns:

Top matches in descending relevance order (at most top_n entries). Each entry exposes rank, score, var_id, cpp_name, and descript for programmatic access.

Return type:

list[SearchResult]

Notes

When perda[semantic] is installed and the cross-encoder model loads successfully, results are ranked by a weighted blend of the semantic score and the rapidfuzz keyword score. Otherwise, search falls back to keyword-only scoring without raising an error.

Short queries lean on keyword matching; longer queries lean on semantic ranking when the model is available.

Examples

>>> results = aly.search("front wheel speed")
>>> names = [r.cpp_name for r in results]
