perda.utils.search

pydantic model perda.utils.search.SearchEntry

Bases: BaseModel

One entry in the search deck, holding raw variable data alongside its search card.

Fields:
  • field var_id: int [Required]

  • field cpp_name: str [Required]

  • field descript: str [Required]

  • field card: str [Required]

perda.utils.search.build_search_card(cpp_name, descript)

Build a search card for one variable.

Splits the C++ identifier on separators and camelCase boundaries, expands known abbreviations inline, and appends the description.

Parameters:
  • cpp_name (str) – C++ variable name (e.g. “pcm.requestedTorque”).

  • descript (str) – Human-readable variable description.

Returns:

Space-separated card text ready for the cross-encoder and keyword scorer.

Return type:

str
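
The card-building steps described above can be sketched roughly as follows. This is an illustration only, not the module's implementation: the abbreviation table, the helper names `split_identifier` and `sketch_search_card`, the lowercasing, and the choice to keep each abbreviation next to its expansion are all assumptions.

```python
import re

# Hypothetical abbreviation table; the mapping perda actually uses is not shown here.
HYPOTHETICAL_ABBREVIATIONS = {"pcm": "powertrain control module", "bat": "battery"}


def split_identifier(cpp_name: str) -> list[str]:
    """Split a C++ identifier on separators and camelCase boundaries."""
    tokens: list[str] = []
    # Split on any run of non-alphanumeric separators ('.', '_', '::', ...).
    for part in re.split(r"[^A-Za-z0-9]+", cpp_name):
        # Insert a space wherever a lowercase letter or digit meets an uppercase letter.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        tokens.extend(token.lower() for token in spaced.split())
    return tokens


def sketch_search_card(cpp_name: str, descript: str) -> str:
    """Assumed card layout: identifier tokens, inline expansions, then the description."""
    words: list[str] = []
    for token in split_identifier(cpp_name):
        words.append(token)
        if token in HYPOTHETICAL_ABBREVIATIONS:
            # Assumption: the expansion is placed right after the abbreviation.
            words.append(HYPOTHETICAL_ABBREVIATIONS[token])
    words.append(descript)
    return " ".join(words)


print(sketch_search_card("pcm.requestedTorque", "Requested torque"))
# pcm powertrain control module requested torque Requested torque
```

Keeping the original token alongside its expansion lets a keyword scorer match either form; whether the real function does this is not stated in this page.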

perda.utils.search.build_search_deck(data)

Build the search deck from all variables in a run.

Parameters:

data (SingleRunData) – Parsed CSV telemetry data.

Returns:

One entry per variable, containing its ID, C++ name, description, and search card.

Return type:

list[SearchEntry]

perda.utils.search.combine_scores(semantic_score, keyword_score, num_terms)

Combine semantic and keyword scores using a weighted blend.

Short queries (fewer terms) get more keyword weight; longer queries lean on semantic relevance.

Parameters:
  • semantic_score (float) – Relevance score from the cross-encoder.

  • keyword_score (float) – Relevance score from fuzzy keyword matching.

  • num_terms (int) – Number of terms in the original query.

Returns:

Combined relevance score.

Return type:

float
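
One way such a length-dependent blend could look is sketched below. The weight schedule (0.7 for a single term, falling by 0.15 per extra term to a floor of 0.2) and the name `sketch_combine_scores` are invented for illustration; the actual weights used by `combine_scores` are not documented here.

```python
def sketch_combine_scores(semantic_score: float, keyword_score: float, num_terms: int) -> float:
    """Blend two scores, giving keyword matching more weight for short queries."""
    # Hypothetical schedule: 0.7 for one term, minus 0.15 per extra term, floor 0.2.
    extra_terms = max(num_terms, 1) - 1
    keyword_weight = max(0.2, 0.7 - 0.15 * extra_terms)
    semantic_weight = 1.0 - keyword_weight
    return keyword_weight * keyword_score + semantic_weight * semantic_score


# A one-term query trusts the keyword score more than a four-term query does.
short_query = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=1)
long_query = sketch_combine_scores(semantic_score=0.0, keyword_score=1.0, num_terms=4)
print(short_query > long_query)  # True
```

Because the two weights sum to 1, the result stays in the same range as the inputs when both scores share a scale.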

perda.utils.search.install_encoder()

Download and save the cross-encoder model for semantic search.

Return type:

None

perda.utils.search.keyword_score(query_terms, entry)

Score a card against query terms using fuzzy partial matching.

Computes rapidfuzz.fuzz.partial_ratio for each query term against the card, then averages the results. Partial matching tolerates prefixes, substrings, and minor typos.

Parameters:
  • query_terms (list[str]) – Tokenized query terms.

  • entry (SearchEntry) – The search entry to score.

Returns:

Mean fuzzy match score in [0, 1].

Return type:

float
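
The per-term averaging can be illustrated with a standard-library stand-in. The sketch below does not call rapidfuzz: it approximates partial matching with `difflib.SequenceMatcher` over sliding windows, and it takes the card text directly rather than a `SearchEntry`. The function names are hypothetical. Note that rapidfuzz's `partial_ratio` reports scores on a 0 to 100 scale, so the documented [0, 1] range suggests a rescaling step that is not shown in this page.

```python
from difflib import SequenceMatcher


def partial_similarity(term: str, text: str) -> float:
    """Best similarity in [0, 1] between `term` and any same-length window of `text`."""
    term, text = term.lower(), text.lower()
    if not term or not text:
        return 0.0
    if len(term) >= len(text):
        return SequenceMatcher(None, term, text).ratio()
    width = len(term)
    # Slide a window of the term's length across the text and keep the best match.
    return max(
        SequenceMatcher(None, term, text[start:start + width]).ratio()
        for start in range(len(text) - width + 1)
    )


def sketch_keyword_score(query_terms: list[str], card: str) -> float:
    """Mean partial-match score over all query terms (0.0 for an empty query)."""
    if not query_terms:
        return 0.0
    return sum(partial_similarity(term, card) for term in query_terms) / len(query_terms)


# Prefixes such as "bat" and "torq" match their full words exactly.
print(sketch_keyword_score(["bat", "torq"], "battery voltage torque"))  # 1.0
```

The sliding-window approach is quadratic in the worst case, which is one reason a dedicated library is the better choice for real workloads.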

perda.utils.search.preprocess_query(query)

Expand domain abbreviations in a search query for semantic ranking.

Parameters:

query (str) – Raw user query string.

Returns:

Query with known abbreviations expanded and duplicate tokens removed.

Return type:

str
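
A minimal sketch of this kind of query preprocessing is shown below. The abbreviation table, the lowercasing, the whitespace tokenization, and the decision to replace an abbreviation with its expansion are assumptions made for illustration; `sketch_preprocess_query` is a hypothetical name.

```python
# Hypothetical abbreviation table; the real domain mapping is not shown here.
HYPOTHETICAL_ABBREVIATIONS = {"bat": "battery", "temp": "temperature"}


def sketch_preprocess_query(query: str) -> str:
    """Expand known abbreviations, then drop repeated tokens while keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for token in query.lower().split():
        # Assumption: an abbreviation is replaced by its (possibly multi-word) expansion.
        expanded = HYPOTHETICAL_ABBREVIATIONS.get(token, token)
        for word in expanded.split():
            if word not in seen:
                seen.add(word)
                result.append(word)
    return " ".join(result)


print(sketch_preprocess_query("bat battery temp"))  # battery temperature
```

Deduplicating after expansion keeps a query such as "bat battery" from counting the same concept twice.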

perda.utils.search.print_result(entry, score)

Print a single ranked search result.

Parameters:
  • entry (SearchEntry) – The search entry to display.

  • score (float) – Combined relevance score.

Return type:

None

perda.utils.search.search(data, query)

Search telemetry variables and print the top matches.

Parameters:
  • data (SingleRunData) – Parsed CSV telemetry data.

  • query (str) – Free-text search query (e.g. “bat wheel”).

Return type:

None

Notes

Results are ranked by a weighted blend of the cross-encoder semantic score and the rapidfuzz keyword score. Each variable is represented by a search card that combines its expanded C++ identifier tokens with its description.

Short queries lean on keyword matching; longer queries lean on semantic ranking.