perda.utils.search
- pydantic model perda.utils.search.SearchEntry
Bases: BaseModel
One entry in the search deck, holding raw variable data alongside its search card.
- field var_id: int [Required]
Internal variable ID.
- field cpp_name: str [Required]
C++ variable name used for data access.
- field descript: str [Required]
Human-readable variable description.
- field card: str [Required]
Space-separated search card text for scoring.
- pydantic model perda.utils.search.SearchResult
Bases: BaseModel
A single ranked result returned by search().
- field rank: int [Required]
1-based position in the result list (1 = best match).
- field score: float [Required]
Relevance score (higher is better).
- field var_id: int [Required]
Internal variable ID.
- field cpp_name: str [Required]
C++ variable name used for data access.
- field descript: str [Required]
Human-readable variable description.
- perda.utils.search.build_search_card(cpp_name, descript)
Build a search card for one variable.
Splits the C++ identifier on separators and camelCase boundaries, expands known abbreviations inline, and appends the description.
- Parameters:
cpp_name (str) – C++ variable name (e.g. “pcm.requestedTorque”).
descript (str) – Human-readable variable description.
- Returns:
Space-separated card text ready for the cross-encoder and keyword scorer.
- Return type:
str
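The card-building steps described above can be sketched roughly as follows. This is only an illustration, not perda's implementation: the function name `sketch_build_search_card`, the `ABBREVIATIONS` table and its entries, and the choice to keep the original token next to its expansion are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation table; the mapping perda actually uses is not shown here.
ABBREVIATIONS = {"pcm": "powertrain control module", "bat": "battery"}


def sketch_build_search_card(cpp_name: str, descript: str) -> str:
    """Illustrative card builder: split, expand, then append the description."""
    tokens = []
    # Split on separators such as "." and "_" (anything that is not alphanumeric).
    for part in re.split(r"[^A-Za-z0-9]+", cpp_name):
        # Split each part on camelCase boundaries, keeping acronyms and digit runs whole.
        for word in re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part):
            word = word.lower()
            tokens.append(word)
            # Assumption: the expansion is added inline, right after the abbreviation.
            if word in ABBREVIATIONS:
                tokens.append(ABBREVIATIONS[word])
    # The description is appended unchanged at the end of the card.
    tokens.append(descript)
    return " ".join(tokens)


card = sketch_build_search_card("pcm.requestedTorque", "Requested torque")
```

Keeping both the abbreviation and its expansion in the card lets either form of a query term match; whether perda does the same is not stated in this reference.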
- perda.utils.search.build_search_deck(data)
Build the search deck from all variables in a run.
- Parameters:
data (SingleRunData) – Parsed CSV telemetry data.
- Returns:
One entry per variable, containing its ID, names, description, and search card.
- Return type:
list[SearchEntry]
- perda.utils.search.combine_scores(semantic_score, keyword_score, num_terms)
Combine semantic and keyword scores using a weighted blend.
Short queries (fewer terms) get more keyword weight; longer queries lean on semantic relevance.
- Parameters:
semantic_score (float) – Relevance score from the cross-encoder.
keyword_score (float) – Relevance score from fuzzy keyword matching.
num_terms (int) – Number of terms in the original query.
- Returns:
Combined score
- Return type:
float
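A weighted blend of this kind might look like the sketch below. The weight schedule (0.7 keyword weight for a one-term query, dropping by 0.1 per extra term to a floor of 0.3) is invented for illustration, and both scores are assumed to be on a comparable [0, 1] scale; the weights perda actually uses are not documented here.

```python
def sketch_combine_scores(semantic_score: float, keyword_score: float, num_terms: int) -> float:
    """Illustrative blend: fewer query terms means more keyword weight."""
    # Hypothetical schedule, not perda's: 0.7 for one term, minus 0.1 per
    # additional term, never below 0.3.
    extra_terms = max(num_terms, 1) - 1
    keyword_weight = max(0.3, 0.7 - 0.1 * extra_terms)
    semantic_weight = 1.0 - keyword_weight
    return keyword_weight * keyword_score + semantic_weight * semantic_score


# A keyword-only match counts for more on a short query than on a long one.
short_query = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=1)
long_query = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=6)
```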
- perda.utils.search.install_encoder()
Download and save the cross-encoder model for semantic search.
- Returns:
True if the model loaded successfully, False otherwise.
- Return type:
bool
Notes
Returns False immediately if sentence-transformers is not installed (i.e. the perda[semantic] extra was not requested). Any download or filesystem error is caught and printed; the function returns False so callers fall back to keyword-only search.
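The fail-soft behaviour in the notes can be sketched as below. This is a minimal illustration under stated assumptions, not the library's code: `sketch_install_encoder` and its `download` callback are hypothetical names, and the real function takes no arguments. The only external fact relied on is that the sentence-transformers package is imported as `sentence_transformers`.

```python
import importlib.util
from typing import Callable


def sketch_install_encoder(download: Callable[[], object]) -> bool:
    """Illustrative fail-soft installer; `download` stands in for the model fetch."""
    # Return False immediately when the optional dependency is missing.
    if importlib.util.find_spec("sentence_transformers") is None:
        return False
    try:
        download()
    except Exception as exc:
        # Report the problem instead of raising, so callers can fall back
        # to keyword-only search.
        print(f"Encoder install failed: {exc}")
        return False
    return True


def failing_download() -> None:
    raise OSError("simulated download failure")


# False whether the dependency is missing or the download itself fails.
ok = sketch_install_encoder(failing_download)
```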
- perda.utils.search.keyword_score(query_terms, entry)
Score a card against query terms using fuzzy partial matching.
Uses rapidfuzz.fuzz.partial_ratio per term then averages. Handles prefixes, substrings, and minor typos naturally.
- Parameters:
query_terms (list[str]) – Tokenized query terms.
entry (SearchEntry) – The search entry to score.
- Returns:
Mean fuzzy match score in [0, 1].
- Return type:
float
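The per-term averaging can be sketched as follows. The sketch takes the card text directly instead of a SearchEntry, lowercases both sides, and returns 0.0 for an empty query; those are assumptions for the example. If rapidfuzz is not importable, a difflib-based stand-in is used, which approximates `partial_ratio` but is not the same algorithm.

```python
from difflib import SequenceMatcher

try:
    # rapidfuzz's partial_ratio returns a similarity in [0, 100].
    from rapidfuzz.fuzz import partial_ratio
except ImportError:
    def partial_ratio(s1: str, s2: str) -> float:
        """difflib stand-in (an approximation, not rapidfuzz's algorithm)."""
        if not s1 or not s2:
            return 0.0
        short, long_ = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
        best = 0.0
        # Slide a window the size of the shorter string across the longer one.
        for start in range(len(long_) - len(short) + 1):
            window = long_[start:start + len(short)]
            best = max(best, SequenceMatcher(None, short, window).ratio())
        return 100.0 * best


def sketch_keyword_score(query_terms: list[str], card: str) -> float:
    """Mean per-term partial match against the card, rescaled to [0, 1]."""
    if not query_terms:
        return 0.0  # Assumption: an empty query scores zero.
    card = card.lower()
    scores = [partial_ratio(term.lower(), card) for term in query_terms]
    return sum(scores) / len(scores) / 100.0


exact = sketch_keyword_score(["torque"], "pcm requested torque")
noisy = sketch_keyword_score(["torqe", "zzzz"], "pcm requested torque")
```

Because each term is matched against the best-aligned substring of the card, a term that appears verbatim scores the maximum regardless of how long the card is.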
- perda.utils.search.preprocess_query(query)
Expand domain abbreviations in a search query for semantic ranking.
- Parameters:
query (str) – Raw user query string.
- Returns:
Query with known abbreviations expanded and duplicate tokens removed.
- Return type:
str
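One way to picture the expansion and de-duplication is the sketch below. The abbreviation table, the lowercasing step, and the choice to replace an abbreviation with its expansion (as opposed to keeping both) are assumptions for illustration only.

```python
# Hypothetical abbreviation table; perda's actual mapping is not shown here.
ABBREVIATIONS = {"bat": "battery", "temp": "temperature"}


def sketch_preprocess_query(query: str) -> str:
    """Illustrative query cleanup: expand abbreviations, drop repeated tokens."""
    seen = set()
    output = []
    for token in query.lower().split():
        # Replace a known abbreviation with its expansion (may be several words).
        expanded = ABBREVIATIONS.get(token, token)
        for word in expanded.split():
            # Keep only the first occurrence of each word, preserving order.
            if word not in seen:
                seen.add(word)
                output.append(word)
    return " ".join(output)


cleaned = sketch_preprocess_query("bat battery temp")
```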
- perda.utils.search.search(data, query, top_n=10)
Search telemetry variables, print the top matches, and return them.
- Parameters:
data (SingleRunData) – Parsed CSV telemetry data.
query (str) – Free-text search query (e.g. “bat wheel”).
top_n (int) – Maximum number of results to return and display (default 10).
- Returns:
Top matches in descending relevance order (at most top_n entries). Each entry exposes rank, score, var_id, cpp_name, and descript for programmatic access.
- Return type:
list[SearchResult]
Notes
When perda[semantic] is installed and the cross-encoder model loads successfully, results are ranked by a weighted blend of the semantic score and the rapidfuzz keyword score. Otherwise, search falls back to keyword-only scoring and raises no error.
Short queries lean on keyword matching; longer queries lean on semantic ranking when the model is available.
Examples
>>> results = aly.search("front wheel speed")
>>> names = [r.cpp_name for r in results]
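The shape of the returned list (descending score, 1-based rank, at most top_n entries) can be illustrated with a small stand-in. `ResultSketch` and `rank_results` are hypothetical names that mirror the documented SearchResult fields using a plain dataclass; they are not part of perda, and the scores below are made-up inputs.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Dataclass stand-in mirroring the documented SearchResult fields."""
    rank: int
    score: float
    var_id: int
    cpp_name: str
    descript: str


def rank_results(scored, top_n=10):
    """Sort (score, var_id, cpp_name, descript) tuples and assign 1-based ranks."""
    ordered = sorted(scored, key=lambda item: item[0], reverse=True)
    return [
        ResultSketch(rank=position, score=score, var_id=var_id,
                     cpp_name=cpp_name, descript=descript)
        for position, (score, var_id, cpp_name, descript)
        in enumerate(ordered[:top_n], start=1)
    ]


# Made-up scores and names, purely to show ordering and truncation.
results = rank_results(
    [(0.2, 1, "a.x", "First"), (0.9, 2, "b.y", "Second"), (0.5, 3, "c.z", "Third")],
    top_n=2,
)
names = [r.cpp_name for r in results]
```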