perda.utils.search
- pydantic model perda.utils.search.SearchEntry
Bases: BaseModel
One entry in the search deck, holding raw variable data alongside its search card.
- field var_id: int [Required]
- field cpp_name: str [Required]
- field descript: str [Required]
- field card: str [Required]
- perda.utils.search.build_search_card(cpp_name, descript)
Build a search card for one variable.
Splits the C++ identifier on separators and camelCase boundaries, expands known abbreviations inline, and appends the description.
- Parameters:
cpp_name (str) – C++ variable name (e.g. “pcm.requestedTorque”).
descript (str) – Human-readable variable description.
- Returns:
Space-separated card text ready for the cross-encoder and keyword scorer.
- Return type:
str
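To make the card-building steps concrete, here is a minimal sketch of splitting an identifier and expanding abbreviations. It is not perda's implementation: the `ABBREVIATIONS` table, the helper names, and the choice to keep the original token next to its expansion are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; the mapping perda actually uses is not shown in this reference.
ABBREVIATIONS = {"pcm": "powertrain control module", "bat": "battery"}


def split_identifier(cpp_name: str) -> list[str]:
    """Split a C++ identifier on separators and camelCase boundaries."""
    tokens: list[str] = []
    # First split on anything that is not a letter or digit (dots, underscores, colons, ...).
    for part in re.split(r"[^A-Za-z0-9]+", cpp_name):
        # Then insert a space at each lower-to-upper boundary and split on it.
        tokens.extend(re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part).split())
    return [token.lower() for token in tokens]


def build_card_sketch(cpp_name: str, descript: str) -> str:
    """Join identifier tokens, inline expansions, and the description into one card."""
    words: list[str] = []
    for token in split_identifier(cpp_name):
        words.append(token)
        # Assumption: the expansion is added right after the token rather than replacing it.
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
    words.append(descript)
    return " ".join(words)


print(build_card_sketch("pcm.requestedTorque", "Torque requested"))
```

Keeping both the raw token and its expansion on the card would let a keyword scorer match either the short or the long form of a query term.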
- perda.utils.search.build_search_deck(data)
Build the search deck from all variables in a run.
- Parameters:
data (SingleRunData) – Parsed CSV telemetry data.
- Returns:
One entry per variable, containing its ID, names, description, and search card.
- Return type:
list[SearchEntry]
- perda.utils.search.combine_scores(semantic_score, keyword_score, num_terms)
Combine semantic and keyword scores using a weighted blend.
Short queries (fewer terms) get more keyword weight; longer queries lean on semantic relevance.
- Parameters:
semantic_score (float) – Relevance score from the cross-encoder.
keyword_score (float) – Relevance score from fuzzy keyword matching.
num_terms (int) – Number of terms in the original query.
- Returns:
Combined relevance score.
- Return type:
float
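The blend described above can be sketched as a linear interpolation whose keyword weight shrinks as the query grows. The weights here (0.7 for a single term, 0.15 less per extra term, floor of 0.25) are illustrative assumptions, not the values perda uses, and the sketch assumes both input scores are already on a comparable [0, 1] scale.

```python
def combine_scores_sketch(semantic_score: float, keyword_score: float, num_terms: int) -> float:
    """Blend two scores, giving keywords more weight on short queries."""
    # Hypothetical schedule: 0.7 keyword weight for one term, decaying to a floor of 0.25.
    extra_terms = max(num_terms, 1) - 1
    keyword_weight = max(0.25, 0.7 - 0.15 * extra_terms)
    # The semantic score receives the remaining weight, so the weights sum to 1.
    return keyword_weight * keyword_score + (1.0 - keyword_weight) * semantic_score


# A pure keyword hit counts for more on a one-term query than on a four-term query.
print(combine_scores_sketch(0.0, 1.0, 1))
print(combine_scores_sketch(0.0, 1.0, 4))
```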
- perda.utils.search.install_encoder()
Download and save the cross-encoder model for semantic search.
- Return type:
None
- perda.utils.search.keyword_score(query_terms, entry)
Score a card against query terms using fuzzy partial matching.
Computes rapidfuzz.fuzz.partial_ratio for each query term and averages the results, which tolerates prefixes, substrings, and minor typos.
- Parameters:
query_terms (list[str]) – Tokenized query terms.
entry (SearchEntry) – The search entry to score.
- Returns:
Mean fuzzy match score in [0, 1].
- Return type:
float
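The per-term averaging can be illustrated without third-party dependencies. The sketch below swaps in a standard-library stand-in (`difflib.SequenceMatcher` over a sliding window) for rapidfuzz's `partial_ratio`, so its numbers will differ from the real scorer. It also takes the card text directly instead of a `SearchEntry`, which is a simplification for self-containment.

```python
from difflib import SequenceMatcher


def partial_ratio_standin(term: str, text: str) -> float:
    """Best similarity in [0, 1] between term and any window of text of the same length."""
    term, text = term.lower(), text.lower()
    if not term or not text:
        return 0.0
    if len(term) >= len(text):
        return SequenceMatcher(None, term, text).ratio()
    width = len(term)
    # Slide a term-sized window across the text and keep the best match.
    return max(
        SequenceMatcher(None, term, text[start:start + width]).ratio()
        for start in range(len(text) - width + 1)
    )


def keyword_score_sketch(query_terms: list[str], card: str) -> float:
    """Mean of the per-term partial match scores; 0.0 for an empty query."""
    if not query_terms:
        return 0.0
    return sum(partial_ratio_standin(term, card) for term in query_terms) / len(query_terms)


card = "pcm requested torque"
print(keyword_score_sketch(["torque"], card))  # exact substring
print(keyword_score_sketch(["torqe"], card))   # minor typo still scores high
```

Because each term is compared against its best-matching window rather than the whole card, a short term is not penalized for the card being long.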
- perda.utils.search.preprocess_query(query)
Expand domain abbreviations in a search query for semantic ranking.
- Parameters:
query (str) – Raw user query string.
- Returns:
Query with known abbreviations expanded and duplicate tokens removed.
- Return type:
str
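As a rough illustration of the expansion and de-duplication described above, the following sketch replaces known abbreviations and drops repeated tokens while preserving order. The `QUERY_ABBREVIATIONS` table, the lowercasing, and the replace-rather-than-append behavior are assumptions; the actual function may differ on each point.

```python
# Hypothetical abbreviation table, for illustration only.
QUERY_ABBREVIATIONS = {"bat": "battery", "temp": "temperature"}


def preprocess_query_sketch(query: str) -> str:
    """Expand known abbreviations and drop duplicate tokens, keeping first occurrences."""
    expanded: list[str] = []
    for token in query.lower().split():
        # An expansion may contain several words, so split it back into tokens.
        expanded.extend(QUERY_ABBREVIATIONS.get(token, token).split())
    # dict.fromkeys preserves insertion order, which gives an ordered de-duplication.
    return " ".join(dict.fromkeys(expanded))


print(preprocess_query_sketch("bat battery wheel"))
```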
- perda.utils.search.print_result(entry, score)
Print a single ranked search result.
- Parameters:
entry (SearchEntry) – The search entry to display.
score (float) – Combined relevance score.
- Return type:
None
- perda.utils.search.search(data, query)
Search telemetry variables and print the top matches.
- Parameters:
data (SingleRunData) – Parsed CSV telemetry data.
query (str) – Free-text search query (e.g. “bat wheel”).
- Return type:
None
Notes
Results are ranked by a weighted blend of the cross-encoder semantic score and the rapidfuzz keyword score. Each variable is represented by a search card that combines its expanded C++ identifier tokens with its description.
Short queries lean on keyword matching; longer queries lean on semantic ranking.