Tessera Advanced Search Guide

This guide covers Tessera's power-user search features: FTS5 operators, custom RRF weights, search mode overrides, and document-specific filtering. If you haven't read the basic search guide, start there. This document assumes you're comfortable with the search tool and want more control over how results are ranked and filtered.

Quick Reference

Feature	Syntax	Example
Phrase matching	`"exact phrase"` + `advanced_fts=True`	`"def hybrid_search"`
Negation	`term NOT excluded` + `advanced_fts=True`	`error NOT warning`
Prefix matching	`prefix*` + `advanced_fts=True`	`hybrid*`
Proximity search	`NEAR(term1 term2, distance)` + `advanced_fts=True`	`NEAR(search query, 5)`
Keyword-only mode	`lex:query`	`lex:ProjectDB`
Semantic-only mode	`vec:query` or `hyde:query`	`vec:error handling`
Custom weights	`weights="key=val,key=val"`	`weights="keyword=3.0,semantic=1.0"`
Language filter	`filter_language="lang"`	`filter_language="python"`
Document format filter	`formats="fmt1,fmt2"` (`doc_search_tool` only)	`formats="markdown,yaml"`
Source type filter	`source_type="type"`	`source_type="code"`

1. FTS5 Advanced Operators

Tessera uses SQLite FTS5 for keyword search. By default, operators like "...", NOT, *, and NEAR(...) are escaped and treated as literal strings. Enable advanced_fts=True to activate these operators.

Phrase Matching

Operator: "exact phrase"

Phrase matching finds exact word sequences in order. Useful for finding function definitions or specific patterns.

search('"def hybrid_search"', advanced_fts=True)

Expected: Returns chunks containing the exact phrase def hybrid_search in sequence. Without quotes, the search returns any chunk with both def and hybrid_search anywhere in the content (not necessarily adjacent).
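To see the difference concretely, here is a minimal sketch that runs raw FTS5 MATCH expressions (the syntax that advanced_fts=True enables) against a throwaway in-memory SQLite table. The table name `chunks`, its single `content` column, and the rows are illustrative assumptions, not Tessera's schema, and the sketch assumes your Python's sqlite3 module includes FTS5, as most builds do.

```python
import sqlite3

# Throwaway index standing in for Tessera's chunk store (schema is hypothetical).
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")
db.executemany(
    "INSERT INTO chunks(content) VALUES (?)",
    [
        ("def hybrid_search(query, db):",),             # terms adjacent, in order
        ("hybrid_search is called from def main():",),  # both terms, not adjacent
        ("def keyword_search(query):",),                # no hybrid_search at all
    ],
)

def match(expr: str) -> list:
    """Return the chunks matching a raw FTS5 expression, in insertion order."""
    rows = db.execute(
        "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rowid", (expr,)
    )
    return [row[0] for row in rows]

print(match('"def hybrid_search"'))  # phrase: only the definition
print(match("def hybrid_search"))    # implicit AND: definition and the call site
```

With SQLite's default unicode61 tokenizer the underscore is a separator, so hybrid_search is indexed as the adjacent tokens hybrid and search; that is why the quoted phrase still matches the definition.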

Real benchmark data:

Query	Phrase results	Unquoted results	Takeaway
`"def hybrid_search"`	9	20	Phrase is more precise

When to use: Finding specific function signatures, exact error messages, or code patterns that must appear together.

Negation

Operator: term NOT excluded

Exclude results containing a specific term.

search("error NOT warning", advanced_fts=True)

Expected: Returns chunks with "error" but not "warning".

When to use: Narrowing broad searches. For example, "authentication NOT test" excludes every chunk that mentions "test" (which removes most test code) while keeping authentication logic.
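A minimal sketch of NOT against a throwaway in-memory FTS5 table; the table and rows are illustrative assumptions, and the sketch needs a sqlite3 build with FTS5 (most have it). In FTS5, `a NOT b` keeps rows that match a and do not match b.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")  # hypothetical schema
db.executemany(
    "INSERT INTO chunks(content) VALUES (?)",
    [
        ("raise an error if the token is invalid",),  # error only: kept
        ("log a warning and an error on retry",),     # both terms: dropped
        ("emit a warning for deprecated flags",),     # no error: never matched
    ],
)

rows = db.execute(
    "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rowid",
    ("error NOT warning",),
)
negation_hits = [row[0] for row in rows]
print(negation_hits)  # ['raise an error if the token is invalid']
```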

Prefix Matching

Operator: prefix*

Match any term starting with a prefix. Useful for function name families or related variable names.

search("hybrid*", advanced_fts=True)

Expected: Returns chunks with words starting with hybrid (e.g., hybrid_search, hybrid_rrf, hybrid_merge).
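A minimal prefix sketch on a throwaway in-memory FTS5 table (illustrative rows; an FTS5-enabled sqlite3 is assumed). A prefix matches any token that starts with it, so a plain word such as hybridize matches as well as the identifiers.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")  # hypothetical schema
db.executemany(
    "INSERT INTO chunks(content) VALUES (?)",
    [
        ("def hybrid_search(query):",),
        ("def hybrid_rrf(rankings):",),
        ("hybridize the index",),
        ("def keyword_search(query):",),  # no token starts with "hybrid"
    ],
)

rows = db.execute(
    "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rowid", ("hybrid*",)
)
prefix_hits = [row[0] for row in rows]
print(prefix_hits)  # the first three rows
```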

Benchmark latency:

Query	Latency (ms)
`hybrid*`	0.33
`hybrid_search` (exact)	0.70

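Prefix matching is fast because FTS5 keeps its index terms in sorted order, so a prefix query reads one contiguous range of terms (a dedicated prefix index, if configured, speeds this up further).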

When to use: Exploring function families (e.g., search_* for all search-related functions) or variable naming patterns.

Proximity Search

Operator: NEAR(term1 term2, distance)

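Find terms separated by at most N tokens. Tokens are the units produced by the FTS5 tokenizer, usually words; with SQLite's default tokenizer, punctuation separates tokens rather than counting as one.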

search("NEAR(search query, 5)", advanced_fts=True)

Expected: Returns chunks where "search" and "query" appear with at most 5 tokens between them, in either order.
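A minimal sketch of NEAR on a throwaway in-memory FTS5 table (illustrative rows; an FTS5-enabled sqlite3 is assumed). FTS5 counts the tokens between the two terms, and the terms may appear in either order.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")  # hypothetical schema
db.executemany(
    "INSERT INTO chunks(content) VALUES (?)",
    [
        ("search the query cache first",),  # 1 token between the terms
        ("search results are ranked before we ever look at the original query",),  # 10 between
        ("the query is passed to search",),  # reversed order, 3 between
    ],
)

def match(expr: str) -> list:
    """Return the chunks matching a raw FTS5 expression, in insertion order."""
    rows = db.execute(
        "SELECT content FROM chunks WHERE chunks MATCH ? ORDER BY rowid", (expr,)
    )
    return [row[0] for row in rows]

near_hits = match("NEAR(search query, 5)")  # rows 1 and 3
and_hits = match("search query")            # all three rows
print(near_hits)
print(and_hits)
```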

Real benchmark data:

Query	Results with NEAR	Results without
`NEAR(search query, 5)`	3	10+

When to use: Finding concepts that are discussed together. For example, NEAR(error handling, 3) matches chunks where "error" and "handling" sit within a few tokens of each other, which is usually more relevant than scattered mentions.

Mixing FTS5 Operators

You can combine operators in a single query:

search('"error handling" NOT test', advanced_fts=True)
search('NEAR(authentication scope, 5) NOT mock', advanced_fts=True)

Caveat: FTS5 operator syntax is strict. If a query fails, fall back to advanced_fts=False (the default, which treats operators as literal strings) or simplify the query.

Safe Mode (Default)

When advanced_fts=False (the default), all special characters are escaped. This prevents syntax errors but treats operators as literal strings:

search("error NOT warning", advanced_fts=False)  # Searches for the literal string "error NOT warning"

Note: Safe mode's main advantage is reliability: because every query is escaped, it can never fail with an FTS5 syntax error.
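One common way to implement this kind of escaping is to wrap the entire query in a single FTS5 string, doubling any embedded double quotes, so the query is matched as a plain phrase. The sketch below assumes that approach (Tessera's actual escaping rule isn't documented here); `escape_fts5`, the table, and the row are hypothetical, and an FTS5-enabled sqlite3 is assumed.

```python
import sqlite3

def escape_fts5(query: str) -> str:
    """Hypothetical safe-mode escaper: turn the whole query into one FTS5 string.

    Inside an FTS5 string a double quote is written as two double quotes, and
    NOT, *, NEAR(...) and parentheses have no special meaning.
    """
    return '"' + query.replace('"', '""') + '"'

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")
db.execute(
    "INSERT INTO chunks(content) VALUES (?)",
    ("the parser treats error NOT warning as plain text",),
)

def count(expr: str) -> int:
    return db.execute(
        "SELECT count(*) FROM chunks WHERE chunks MATCH ?", (expr,)
    ).fetchone()[0]

# Unescaped, an unbalanced NEAR( is an FTS5 syntax error.
try:
    count("NEAR(search")
    raw_failed = False
except sqlite3.OperationalError:
    raw_failed = True

escaped_count = count(escape_fts5("NEAR(search"))        # valid query, no match
literal_count = count(escape_fts5("error NOT warning"))  # matched as a phrase
print(raw_failed, escaped_count, literal_count)
```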

2. Custom RRF Weights

Tessera merges three ranking signals via Weighted Reciprocal Rank Fusion:

keyword (FTS5 BM25 score) — Exact term matching, precise for code
semantic (FAISS vector similarity) — Conceptual matching, good for design docs
graph (PageRank on symbol call graph) — Structural importance, good for understanding dependencies

Default weights: keyword=1.5, semantic=1.0, graph=0.8

Keyword is weighted highest because FTS5 precision is higher for code search. You can override these weights to adjust what Tessera prioritizes.
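Weighted RRF scores each result as the sum, over signals, of that signal's weight divided by (k + the result's rank in that signal's list), so a result ranked highly by several heavily weighted signals wins. A minimal sketch, assuming k = 60 (the constant proposed in the original RRF paper; this guide doesn't state Tessera's value) and 1-based ranks; the file names and per-signal rankings are made up.

```python
from collections import defaultdict

DEFAULT_WEIGHTS = {"keyword": 1.5, "semantic": 1.0, "graph": 0.8}  # defaults from this guide

def weighted_rrf(rankings, weights, k=60):
    """Fuse ranked lists with weighted Reciprocal Rank Fusion.

    score(doc) = sum over signals of weights[signal] / (k + rank), ranks 1-based.
    A document missing from a signal's list gets nothing from that signal.
    """
    scores = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        weight = weights.get(signal, 0.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-signal rankings for one query.
rankings = {
    "keyword": ["a.py", "b.md"],
    "semantic": ["b.md", "c.md"],
    "graph": ["c.md", "a.py"],
}
default_order = weighted_rrf(rankings, DEFAULT_WEIGHTS)
semantic_order = weighted_rrf(rankings, {"keyword": 1.0, "semantic": 2.0, "graph": 0.8})
print(default_order)   # ['b.md', 'a.py', 'c.md']
print(semantic_order)  # ['b.md', 'c.md', 'a.py']
```

With the semantic boost, c.md (found by the semantic and graph signals) overtakes a.py (found by keyword and graph), which is the kind of reordering the weights parameter exists for.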

Boost Keyword (Identifier-Heavy Searches)

search("ProjectDB hybrid_search", weights="keyword=3.0,semantic=1.0,graph=0.8")

Use when you're looking for specific functions, classes, or variables by name. Keyword matching is already precise; boosting it suppresses noisier semantic results.
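A hypothetical sketch of how a "key=val,key=val" weights string could be turned into numbers. The helper name is ours, and the rule that omitted signals keep their default weight is an assumption, not documented Tessera behavior.

```python
from typing import Dict

DEFAULT_WEIGHTS = {"keyword": 1.5, "semantic": 1.0, "graph": 0.8}  # defaults from this guide

def parse_weights(spec: str) -> Dict[str, float]:
    """Parse "key=val,key=val" into a weights dict (hypothetical helper).

    Assumption: signals you leave out keep their default weight.
    """
    weights = dict(DEFAULT_WEIGHTS)
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep or key not in DEFAULT_WEIGHTS:
            raise ValueError(f"invalid weight entry: {part!r}")
        weights[key] = float(value)  # float() also rejects non-numeric values
    return weights

print(parse_weights("keyword=3.0,semantic=1.0,graph=0.8"))
print(parse_weights("keyword=2.0"))  # semantic and graph fall back to defaults
```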

Real benchmark impact:

Weights	Result Change	Notes
Equal (keyword=1.0, semantic=1.0, graph=1.0)	Baseline	Semantic noise bleeds through
Keyword boost (keyword=2.0, semantic=1.0)	2 of the top 3 results change position	Top-1 and top-3 swap

Boost Semantic (Conceptual Searches)

search("how do I implement retry logic", weights="keyword=1.0,semantic=2.0,graph=0.8")

Use when you're thinking in concepts, not exact names. A question like "how does authentication work" can surface relevant design docs and patterns even when those documents never use the word "authentication".

When to use:
- Searching for design patterns ("factory pattern", "dependency injection")
- Exploring how a system works ("how does caching work")
- Looking for examples in documentation

Boost Graph (Dependency-Aware Searches)

search("how does ProjectDB impact search", weights="keyword=1.0,semantic=1.0,graph=2.0")

Use when you want to find structurally important code. PageRank ranks a symbol higher when many symbols call it, and higher still when those callers are themselves highly ranked; such symbols tend to be central to the system.

When to use:
- Understanding critical paths in the codebase
- Finding bottleneck functions that many other functions depend on
- Mapping influence (what breaks if I change this function?)
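A minimal power-iteration PageRank over a caller-to-callee graph shows why widely called symbols float to the top. The damping factor 0.85, the iteration count, the dangling-node handling, and the symbol names are assumptions for illustration; this guide doesn't specify how Tessera builds or scores its graph.

```python
from typing import Dict, List

def pagerank(calls: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank where an edge caller -> callee passes rank to the callee."""
    nodes = set(calls)
    for callees in calls.values():
        nodes.update(callees)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Symbols that call nothing would leak rank; spread theirs evenly instead.
        dangling = sum(rank[node] for node in nodes if not calls.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for caller, callees in calls.items():
            for callee in callees:
                new_rank[callee] += damping * rank[caller] / len(callees)
        rank = new_rank
    return rank

# Hypothetical call graph: three different symbols call db_connect.
calls = {
    "cli": ["search", "index_files"],
    "search": ["db_connect"],
    "index_files": ["db_connect"],
    "watch": ["db_connect"],
}
ranks = pagerank(calls)
for symbol, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{symbol:12} {score:.3f}")
```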

Defaults: When NOT to Override

Default weights are tuned for general-purpose search and work well for most queries. Don't override unless you have a specific reason:

Identifier lookups (finding a function by name): Don't bother tuning; exact keyword matches dominate anyway
Broad exploratory searches: Defaults blend all signals well
Single-project searches: Defaults assume a balanced index in which semantic and graph signals are reliable

3. Search Mode Override

By default, Tessera runs both keyword (LEX) and semantic (VEC) search and merges results. You can override this behavior to run only one search type, or force a specific semantic mode.

Keyword-Only (LEX)

Inline syntax: lex:query

search("lex:hybrid_search")  # Force keyword-only
search("lex:ProjectDB")      # Look for exact function/class name

Equivalent to explicit mode:

search("ProjectDB", search_mode="lex")
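The inline prefixes and search_mode are two spellings of the same setting. A hypothetical sketch of how a prefix might be split off the query text; the helper name and the "hybrid" label for the default mode are assumptions, not Tessera's code.

```python
from typing import Tuple

INLINE_MODES = ("lex", "vec", "hyde")

def split_inline_mode(query: str, default_mode: str = "hybrid") -> Tuple[str, str]:
    """Split a leading "lex:", "vec:", or "hyde:" off a query (hypothetical helper).

    Any other text before a colon (e.g. "TODO: fix") stays part of the query.
    """
    prefix, sep, rest = query.partition(":")
    if sep and prefix in INLINE_MODES:
        return prefix, rest.strip()
    return default_mode, query

print(split_inline_mode("lex:ProjectDB"))                # ('lex', 'ProjectDB')
print(split_inline_mode("vec:how do I validate input"))  # ('vec', 'how do I validate input')
print(split_inline_mode("TODO: fix retry"))              # ('hybrid', 'TODO: fix retry')
```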

Speed: Keyword-only search is fast (sub-millisecond for most queries). No embedding cost, no semantic ranking overhead.

Real benchmark, keyword-only vs. full hybrid for the same query:

Query	Keyword-only (ms)	Full hybrid (ms)	Saved (ms)
`ProjectDB`	0.81	14.3	13.5

When to use:
- You know the exact name of what you're searching for
- Latency is critical (e.g., real-time code navigation)
- Semantic results are too noisy (rare, but it happens with short queries)

Semantic-Only (VEC)

Inline syntax: vec:query

search("vec:error handling strategy")  # Conceptual search only
search("vec:how do I validate input")  # Question-like query

Runs embedding and FAISS vector search only. Skips keyword matching.

When to use:
- Your query is a question or description, not an exact name
- You want pure conceptual matching (useful for design docs and architecture comments)
- Keyword results are too strict (e.g., searching for "retry" misses "exponential backoff")

Hypothetical Document Embedding (HYDE)

Inline syntax: hyde:query

search("hyde:how to set up authentication")
search("hyde:deploy to production safely")

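HYDE (Hypothetical Document Embeddings) uses a different embedding strategy: instead of prepending a retrieval prefix (which expects input phrased like a search query), it embeds your query as-is, as if it were text from a hypothetical document. This is lighter than the HyDE technique from the research literature, which first has an LLM write a hypothetical answer document and then embeds that answer.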

When VEC vs HYDE matters:

VEC uses embed_query(), which adds a retrieval prefix optimized for short search queries. HYDE uses embed_single() without the prefix, treating your query as natural language.

In a small benchmark, VEC and HYDE returned the same top-1 result for only one of three queries:

Query	VEC Top-1	HYDE Top-1	Same?
`normalize_bm25_score`	search.py	test_search_ben	Different
`error handling`	test_server.py	test_server.py	Same
`graph traversal`	research.md	spec-v1.md	Different

Try HYDE when VEC results aren't relevant. The difference is subtle but can matter for natural-language questions.

When to use:
- Asking full-sentence questions ("how do I handle errors safely")
- Searching for design patterns described in docs
- Pure exploration (let the embedding decide relevance)

4. Language Filtering

Restrict search to specific programming languages.

search("hybrid_search", filter_language="python")
search("error NOT warning", filter_language="typescript")

Supported languages: PHP, TypeScript, JavaScript, Python, Swift

This filter works at the database level: chunks in other languages are excluded before ranking, so no compute is spent on irrelevant results.
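A sketch of filtering inside the database query, assuming a hypothetical schema that stores each chunk's path and language in unindexed FTS5 columns; Tessera's actual tables may differ, and an FTS5-enabled sqlite3 is assumed.

```python
import sqlite3
from typing import List, Optional

db = sqlite3.connect(":memory:")
# Hypothetical schema: path and language are stored with each chunk but not full-text indexed.
db.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5(content, path UNINDEXED, language UNINDEXED)"
)
db.executemany(
    "INSERT INTO chunks(content, path, language) VALUES (?, ?, ?)",
    [
        ("def hybrid_search(query): ...", "search.py", "python"),
        ("export function hybridSearch(query: string) {}", "search.ts", "typescript"),
        ("def parse_query(text): ...", "parser.py", "python"),
    ],
)

def keyword_search(query: str, language: Optional[str] = None) -> List[str]:
    sql = "SELECT path FROM chunks WHERE chunks MATCH ?"
    params = [query]
    if language is not None:
        # The language test runs inside the same SQL query, so chunks in
        # other languages never leave the database.
        sql += " AND language = ?"
        params.append(language)
    sql += " ORDER BY rank"  # FTS5's built-in BM25 ordering, best match first
    return [row[0] for row in db.execute(sql, params)]

print(keyword_search("query"))                # all three files
print(keyword_search("query", "python"))      # search.py and parser.py only
print(keyword_search("query", "typescript"))  # ['search.ts']
```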

When to use:
- Large multi-language codebases where you want results from one language only
- Avoiding false matches from other languages (e.g., "error" appears in code and comments in every language)

5. Source Type Filtering

Restrict search to code only, or to a single document type.

search("authentication", source_type="code")      # Only source files
search("authentication", source_type="markdown")  # Only Markdown documents

Valid source types: code, markdown, yaml, json, html, xml, text, txt, rst, csv, tsv, log, ini, cfg, toml, conf, pdf, asset

Document Search (Convenience Wrapper)

For document-only searches, use doc_search_tool instead of search:

doc_search_tool("authentication flow")  # Searches only docs, auto-excludes code

With format filtering:

doc_search_tool("database config", formats="yaml,toml")  # Only YAML and TOML files

6. Output Format Selection

Control how Tessera returns results.

JSON (Default)

search("hybrid_search", output_format="json")

Returns full metadata: file path, line numbers, content, scores, rank sources. Best for programmatic processing.

Markdown

search("hybrid_search", output_format="markdown")

Formats results as markdown with code blocks. Best for reading in a text editor or passing to an LLM.

Sample output:

### 1. `search.py:100-110` (score: 0.89)
```
def hybrid_search(query, query_embedding, db, limit=10):
    """Hybrid search combining keyword and semantic results."""
    results = []
    ...
```

CSV

search("hybrid_search", output_format="csv")

Tabular format with file path, line numbers, score, snippet. Best for import into spreadsheets or analysis tools.

Files Only

search("hybrid_search", output_format="files")

Returns only file paths, one per line. Deduplicated, no snippets. Best when you just want to know which files to read.
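A sketch of how the CSV and files-only views relate to the same ranked results. The record fields (path, start, end, score, content) are hypothetical, not Tessera's JSON schema; the point is that the files view keeps each path once, in rank order.

```python
import csv
import io

# Hypothetical ranked results; these field names are assumptions.
results = [
    {"path": "search.py", "start": 100, "end": 110, "score": 0.89,
     "content": "def hybrid_search(query, query_embedding, db, limit=10):"},
    {"path": "server.py", "start": 40, "end": 52, "score": 0.71,
     "content": "results = hybrid_search(q, emb, db)"},
    {"path": "search.py", "start": 220, "end": 231, "score": 0.55,
     "content": "# helper used by hybrid_search\nreturn merged"},
]

def to_files(rows):
    """Files-only view: each path once, in rank order (dicts keep insertion order)."""
    return list(dict.fromkeys(row["path"] for row in rows))

def to_csv(rows):
    """CSV view: one row per result, using the chunk's first line as the snippet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "start_line", "end_line", "score", "snippet"])
    for row in rows:
        snippet = row["content"].splitlines()[0]
        writer.writerow([row["path"], row["start"], row["end"], row["score"], snippet])
    return buf.getvalue()

print(to_files(results))  # ['search.py', 'server.py']
print(to_csv(results))
```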

7. Snippet Context Modes

Control how much surrounding code context Tessera shows in results.

Lines Mode (Default)

search("hybrid_search", expand_context="lines")

Shows the best-matching lines surrounded by a collapsed nesting skeleton. This reveals the function/class hierarchy without showing all intermediate code.

Example output for a query inside a nested function:

32 | class MyClass:
        ...  (42 lines)
74 | def method():
        ...  (8 lines)
82 | key = value  <-- match is here

Each "..." shows how many lines are hidden. This lets you see structure without code bloat.

max_depth parameter: Limit how many nesting levels to show.

search("value", expand_context="lines", max_depth=1)

This shows only the immediate parent (e.g., the function containing the match), hiding any classes/modules wrapping it.
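A simplified sketch of the collapsed-skeleton idea, assuming indentation marks nesting (true for Python; Tessera may use a real parser instead). It walks upward from the match, keeps each non-blank line that is less indented than the last one kept, and replaces the gaps with a hidden-line count. The function name and output format are illustrative, not Tessera's.

```python
from typing import List, Optional

def nesting_skeleton(lines: List[str], match: int,
                     max_depth: Optional[int] = None) -> List[str]:
    """Show a matching line with its enclosing lines, collapsing everything between.

    `match` is a 0-based index; the output uses 1-based line numbers.
    """
    def indent(text: str) -> int:
        return len(text) - len(text.lstrip())

    # Walk upward, keeping each non-blank line less indented than the last kept line.
    ancestors = []
    level = indent(lines[match])
    for i in range(match - 1, -1, -1):
        if lines[i].strip() and indent(lines[i]) < level:
            ancestors.append(i)
            level = indent(lines[i])
            if level == 0:
                break  # reached top level; nothing can enclose it
    ancestors.reverse()  # outermost first

    # max_depth keeps only the innermost enclosing levels.
    if max_depth is not None:
        ancestors = ancestors[-max_depth:] if max_depth > 0 else []

    out, prev = [], None
    for i in ancestors + [match]:
        hidden = 0 if prev is None else i - prev - 1
        if hidden:
            out.append(f"    ...  ({hidden} line{'s' if hidden != 1 else ''})")
        out.append(f"{i + 1} | {lines[i].strip()}")
        prev = i
    return out

source = [
    "import os",
    "",
    "class MyClass:",
    "    x = 1",
    "    def method(self):",
    "        a = 1",
    "        b = 2",
    "        key = value",
]
print("\n".join(nesting_skeleton(source, match=7)))
print("\n".join(nesting_skeleton(source, match=7, max_depth=1)))
```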

Full Mode

search("hybrid_search", expand_context="full")

Expands all ancestor code without collapsing. Shows the complete function/class from definition to end, even if it's 100+ lines.

Use this when you need the full context to understand what's happening.

8. BM25 Short-Circuit (Informational)

Note: This behavior is automatic and not configurable, but knowing about it explains why some searches are faster than others.

When a keyword search result is unambiguously the best match, Tessera skips expensive semantic and graph searches. Specifically:

Trigger: Top BM25 result score ≥ 0.85 AND gap to second-place ≥ 0.15
Savings: ~30-50ms per query (skips embedding, FAISS search, PageRank computation)
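The trigger rule is simple enough to express directly. A sketch using the thresholds above, assuming scores are already normalized to [0, 1] as in this section's benchmark table; how Tessera treats a query with a single hit isn't stated, so the fallback below is an assumption.

```python
from typing import Sequence

TOP_SCORE_MIN = 0.85  # trigger threshold from this guide
GAP_MIN = 0.15        # required lead over second place, from this guide

def should_short_circuit(bm25_scores: Sequence[float]) -> bool:
    """True when the best normalized BM25 score is dominant enough to skip the
    semantic and graph stages."""
    if not bm25_scores:
        return False
    ranked = sorted(bm25_scores, reverse=True)
    top = ranked[0]
    # Assumption: a lone hit is compared against a second place of 0.0.
    second = ranked[1] if len(ranked) > 1 else 0.0
    return top >= TOP_SCORE_MIN and top - second >= GAP_MIN

# The two benchmark queries from this section's table, plus a dominant case.
print(should_short_circuit([0.876, 0.876 - 0.009]))   # False: gap too small
print(should_short_circuit([0.904, 0.904 - 0.0003]))  # False: gap too small
print(should_short_circuit([0.95, 0.70]))             # True: high score, clear lead
```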

Real benchmark data:

Query	Top score	Gap to 2nd	Triggers?	Reason
`normalize_bm25_score`	0.876	0.009	No	Gap too small
`error handling`	0.904	0.0003	No	Gap too small

Most queries don't trigger the short-circuit because a single keyword result is rarely that dominant; both benchmark queries above, including the exact function name normalize_bm25_score, missed the gap threshold. When a query does qualify, you get the latency savings automatically.

9. Combining Advanced Features

Here are realistic query patterns combining multiple advanced features:

Find exact function implementation, fast

search("lex:hybrid_search", filter_language="python", output_format="files")

Keyword-only (fast), Python-specific, returns just file paths. Latency: <2ms.

Search for patterns in docs only

doc_search_tool("retry with exponential backoff", formats="markdown")

Excludes code files, searches only documentation. Latency: ~10-50ms depending on doc size.

Boost semantic for architecture questions

search(
    "how does the indexing pipeline work",
    weights="keyword=1.0,semantic=2.0,graph=0.8",
    expand_context="full"
)

Prioritizes semantic signal (good for conceptual questions), shows full context for understanding the system. Latency: ~20-30ms.

Find "error" handling but not tests

search(
    "error NOT test",
    advanced_fts=True,
    filter_language="typescript"
)

Negation to exclude noisy test results, TypeScript-only. Latency: ~1-2ms (FTS5 is very fast).

Proximity search for related concepts

search(
    "NEAR(graph PageRank, 3)",
    advanced_fts=True,
    weights="keyword=2.0"
)

Finds "graph" and "PageRank" discussed together, boosts keyword matching. Latency: ~1-2ms.

10. Troubleshooting Advanced Queries

Advanced FTS operator syntax errors

Problem: A query with operators returns 0 results or an error.

Fix: FTS5 operator syntax is strict. If a query fails:
1. Fall back to advanced_fts=False (escapes all operators)
2. Simplify the query (e.g., a short quoted phrase is more likely to match than a long one)
3. Use simpler operators first (phrases work most reliably; NEAR is pickier about syntax)

Semantic results feel irrelevant (vec mode)

Problem: search("vec:my query") returns off-topic results.

Fix:
1. Try search("lex:my query") first to see if keyword results are better
2. Switch to HYDE: search("hyde:my query")
3. Boost keyword in hybrid mode: weights="keyword=2.0,semantic=1.0"
4. Provide more context in the query (longer, more descriptive queries embed better)

Too many results (one language/format has too much content)

Problem: 1000+ chunks match, but you only see 10 results.

Fix:
1. Use filter_language to narrow by language
2. Use source_type to narrow by code vs. docs
3. Use doc_search_tool's formats parameter to narrow by file type (e.g., formats="yaml" instead of all configs)

Keyword search is too strict

Problem: search("lex:function_name") finds nothing, but you know it exists.

Fix:
1. Try search("vec:function_name") for semantic matching
2. Use a prefix: search("lex:function*", advanced_fts=True)
3. Provide context: search("lex:function_name authentication")

11. Weight Tuning Guidelines

The default weights keyword=1.5,semantic=1.0,graph=0.8 are tuned for general-purpose search. Here's how to think about tuning:

Goal	Suggested Weights	Reasoning
Find by exact name	`keyword=2.0,semantic=0.5`	Suppress semantic noise so exact matches dominate
Understand design	`keyword=1.0,semantic=2.0`	Maximize conceptual matching
Find critical code	`keyword=1.0,semantic=1.0,graph=2.0`	Structural importance wins
Balanced search	`keyword=1.5,semantic=1.0,graph=0.8`	Default (use this most of the time)

Don't over-tune: For most searches, defaults work fine. Reserve custom weights for specific, repeatable queries where you notice results are consistently off.

Summary

Advanced search in Tessera gives you control over:

Precision (FTS5 operators, negation)
Speed (keyword-only mode, short-circuit)
Signal balance (custom weights)
Scope (language, source type, format filters)
Presentation (output format, snippet context)

Start with defaults. When results aren't what you expect, change one advanced feature at a time. The benchmark figures in this guide are real performance numbers; use them to decide whether a feature is worth its latency cost for your use case.
