Python API Reference

Mainline Surface

The mainline local API is Index(..., engine="shard").

engine="auto" is still the constructor default for compatibility, but if you want the shard production path described in the docs, pass engine="shard" explicitly.

The optional Latence graph lane is not exposed through Index.search() on local shard collections. The programmatic graph-aware surface today is SearchPipeline, and the HTTP surface is the reference API.

colsearch.Index

Primary local interface for creating, mutating, and querying indexes.

Constructor

Index(
    path: str,
    dim: int,
    *,
    engine: str = "auto",
    mode: str | None = None,
    embedding_fn: Any | None = None,
    n_fine: int = 256,
    n_coarse: int = 32,
    max_degree: int = 32,
    ef_construction: int = 200,
    n_probes: int = 4,
    enable_wal: bool = True,
    **kwargs,
)

Shard kwargs

These keyword arguments are passed through when engine="shard":

| Argument | Meaning |
| --- | --- |
| n_shards | Number of sealed shards |
| k_candidates | LEMUR candidate budget before exact scoring |
| compression | Stored representation. Default: rroq158 (Riemannian 1.58-bit, K=8192, group_size=128 SOTA, GPU+CPU). Other options: fp16, int8, roq4, rroq4_riem (Riemannian 4-bit asymmetric; the safe-fallback lane for zero-regression workloads). The rroq158 and rroq4_riem codecs need n_tokens >= K to train the codebook: they auto-shrink K to the largest valid power of two when the corpus is too small, and silently downgrade to fp16 when even that does not fit. See docs/guides/quantization-tuning.md. |
| rroq158_k | rroq158 spherical k-means centroid count. Default: 8192. Must be a power of two and ≥ the effective rroq158_group_size |
| rroq158_seed | rroq158 FWHT rotator + k-means initialisation seed. Default: 42 |
| rroq158_group_size | rroq158 ternary group size. Default: 128 (SOTA: one scale per token at dim=128, ~13% smaller storage and ~10–30% faster CPU p95 than the previous default of 32; NDCG@10 within ±0.005 on Pareto-clean BEIR datasets). Must be a positive multiple of 32 so that the ternary planes pack into int32 words. When dim is not a multiple of the requested value (for example dim=64, 96, or 160), the encoder transparently steps down through {128, 64, 32} and logs a warning, so those corpora still build cleanly. Pin to 64 for the safest cross-corpus choice (e.g. arguana-class corpora). See docs/guides/quantization-tuning.md for the per-dim recipe and override guidance. |
| rroq4_riem_k | rroq4_riem spherical k-means centroid count. Default: 8192. Must be a power of two and ≥ rroq4_riem_group_size |
| rroq4_riem_seed | rroq4_riem FWHT rotator + k-means initialisation seed. Default: 42 |
| rroq4_riem_group_size | rroq4_riem 4-bit asymmetric residual group size. Default: 32. Must be a positive even integer that divides dim |
| quantization_mode | Active scoring mode: none, int8, fp8, roq4, rroq158, rroq4_riem |
| transfer_mode | CPU->GPU transfer path: pageable, pinned, double_buffered |
| router_device | Device for the LEMUR router, usually cpu or cuda |
| lemur_epochs | Router training epochs |
| lemur_search_k_cap | Router search cap |
| max_docs_exact | Exact-stage doc budget |
| n_full_scores | Proxy shortlist size before full scoring |
| pinned_pool_buffers | Pinned-memory buffer pool size |
| pinned_buffer_max_tokens | Max tokens per pinned transfer buffer |
| gpu_corpus_rerank_topn | GPU-corpus rerank frontier |
| n_centroid_approx | Optional centroid-approx candidate stage |
| variable_length_strategy | Variable-length exact scheduling mode |
| uniform_shard_tokens | Optional shard packing knob |
| seed | Random seed |
| device | Scoring device for the manager, typically cpu or cuda |
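
The sizing rules for rroq158_group_size and the codebook size K can be read as a small decision procedure. The sketch below is one interpretation of those rules, not the library's implementation: resolve_group_size and resolve_k are hypothetical helper names, the error raised for a dim that is not a multiple of 32 is an assumption, and so is the choice to treat "K below the group size" as the point where the fp16 fallback kicks in.

```python
from typing import Optional


def resolve_group_size(dim: int, requested: int = 128) -> int:
    """Return the requested group size, or step down through {128, 64, 32}."""
    if requested <= 0 or requested % 32 != 0:
        raise ValueError("group size must be a positive multiple of 32")
    if dim % requested == 0:
        return requested
    # Step down to the largest smaller candidate that divides dim.
    for candidate in (128, 64, 32):
        if candidate < requested and dim % candidate == 0:
            return candidate
    # Assumption: the document does not say what happens in this case.
    raise ValueError(f"dim={dim} is not a multiple of 32")


def resolve_k(n_tokens: int, group_size: int, requested_k: int = 8192) -> Optional[int]:
    """Return the effective K, or None when the codec would fall back to fp16."""
    if requested_k <= 0 or requested_k & (requested_k - 1):
        raise ValueError("K must be a power of two")
    k = requested_k
    # Halve K until the corpus has at least K tokens to train on.
    while k > n_tokens:
        k //= 2
    # Assumption: K must still be at least the effective group size.
    return k if k >= group_size else None


if __name__ == "__main__":
    for dim in (64, 96, 128, 160):
        print(dim, resolve_group_size(dim))
    print(resolve_k(n_tokens=5000, group_size=128))
```

Under these assumptions a corpus of 5,000 tokens would train with K=4096, while a corpus of 100 tokens with group size 128 would not fit any valid K and would take the fp16 path.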

Core methods

| Method | Signature | Notes |
| --- | --- | --- |
| add | (vectors, *, ids=None, payloads=None) | Add multivector documents |
| add_batch | (vectors, *, ids=None, payloads=None) | Alias for add() |
| add_texts | (texts, *, ids=None, payloads=None) | Uses embedding_fn |
| upsert | (vectors, *, ids, payloads=None) | Insert or replace by ID |
| search | (query, k=10, *, ef=100, n_probes=4, filters=None, explain=False) | Main query path |
| search_text | (text, k=10, *, ef=100, filters=None, explain=False) | Uses embedding_fn |
| search_batch | (queries, k=10, *, ef=100, n_probes=4, filters=None) | Batch query path |
| delete | (ids) | Tombstone documents by ID |
| update_payload | (doc_id, payload) | Payload-only update |
| get | (ids) | Retrieve stored payloads |
| scroll | (limit=100, offset=0, *, filters=None) | Pagination |
| stats | () | Returns IndexStats |
| snapshot | (output_path) | Tarball snapshot |
| flush | () | Force pending writes to disk |
| close | () | Release resources |
| set_metrics_hook | (hook) | Metrics callback |
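
A typical write-then-read sequence over these methods might look like the sketch below. The helper name index_and_query is hypothetical, and it takes an already-constructed index so that it makes no assumption about the vector format or about what search() returns; only the method names and keyword-only arguments come from the table.

```python
from typing import Any, Sequence


def index_and_query(idx: Any, docs: Sequence[Any], query: Any, k: int = 10) -> Any:
    """Add documents, persist them, run one search, then release the index.

    idx is any object exposing add, flush, search, and close as listed
    in the method table. docs and query use whatever format the index
    accepts; this sketch passes them through untouched.
    """
    ids = list(range(len(docs)))  # assumption: integer document IDs
    try:
        idx.add(docs, ids=ids)    # ids is keyword-only in the signature
        idx.flush()               # force pending writes to disk
        return idx.search(query, k=k)
    finally:
        idx.close()               # release resources even if a call fails
```

Wrapping the calls in try/finally keeps close() on the error path too, which matters when the WAL is enabled and file handles are held open.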

Properties

| Property | Type |
| --- | --- |
| path | Path |
| dim | int |
| engine | str |

Example

from colsearch import Index

idx = Index(
    "my-index",
    dim=128,
    engine="shard",
    n_shards=64,
    k_candidates=512,
    # compression defaults to "rroq158" (K=8192, group_size=128 SOTA).
    # Override with "fp16" / "int8" / "roq4" if required.
    quantization_mode="fp8",
)

colsearch.IndexBuilder

Fluent builder for the same surface.

from colsearch import IndexBuilder

idx = (
    IndexBuilder("my-index", dim=128)
    .with_shard(
        n_shards=64,
        k_candidates=512,
        # compression defaults to "rroq158" (K=8192, group_size=128 SOTA).
        # Override here if you need the legacy "fp16" lane.
        quantization_mode="fp8",
        transfer_mode="pinned",
    )
    .with_wal(enabled=True)
    .build()
)

| Method | Meaning |
| --- | --- |
| with_shard(**kwargs) | Select shard engine; recommended path |
| with_wal(enabled=True) | Enable WAL-backed mutation safety |
| with_quantization(n_fine=256, n_coarse=32) | Codebook config helper |
| with_gpu_rerank(device="cuda") | Legacy compatibility helper |
| with_roq(bits=4, device="cuda") | Legacy compatibility helper |
| with_gem(**kwargs) | Compatibility backend, not the documented mainline |
| with_hnsw(**kwargs) | Compatibility backend, not the documented mainline |
| build() | Returns Index |

Transport Helpers

Use these helpers for the preferred HTTP wire format:

from colsearch import VectorPayload, decode_payload, encode_roq_payload, encode_vector_payload

| Helper | Meaning |
| --- | --- |
| encode_vector_payload(vectors, dtype="float32") | Encode float vectors to JSON-ready base64 |
| encode_roq_payload(vectors, num_bits=4, seed=42) | Encode ROQ payloads |
| decode_payload(payload) | Decode a transport payload back to numpy.ndarray |
| VectorPayload | Public transport payload type |
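
To illustrate what "JSON-ready base64" means in general, here is a standard-library sketch of the idea. It is not the colsearch wire format: encode_sketch, decode_sketch, and the dtype, shape, and data_b64 field names are all invented for illustration. Use encode_vector_payload and decode_payload for real traffic.

```python
import base64
import json
import struct
from typing import List


def encode_sketch(vectors: List[List[float]]) -> str:
    """Pack a non-empty 2-D list as little-endian float32, then base64 it."""
    rows, cols = len(vectors), len(vectors[0])
    flat = [x for row in vectors for x in row]
    raw = struct.pack(f"<{rows * cols}f", *flat)
    return json.dumps({
        "dtype": "float32",        # hypothetical field name
        "shape": [rows, cols],     # hypothetical field name
        "data_b64": base64.b64encode(raw).decode("ascii"),
    })


def decode_sketch(payload: str) -> List[List[float]]:
    """Reverse encode_sketch: base64 -> float32 bytes -> nested lists."""
    obj = json.loads(payload)
    rows, cols = obj["shape"]
    raw = base64.b64decode(obj["data_b64"])
    flat = struct.unpack(f"<{rows * cols}f", raw)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]
```

Sending the raw bytes as base64 instead of a JSON array of numbers keeps the payload compact and avoids the cost of parsing each float as decimal text.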

Data Classes

colsearch.SearchResult

@dataclass
class SearchResult:
    doc_id: int
    score: float
    payload: Optional[Dict[str, Any]] = None
    token_scores: Optional[List[float]] = None
    matched_tokens: Optional[List[int]] = None

colsearch.ScrollPage

@dataclass
class ScrollPage:
    results: List[SearchResult]
    next_offset: Optional[int] = None
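
A pagination loop over scroll() can be sketched as follows. It assumes that next_offset is None on the last page, which the Optional[int] default suggests but this reference does not state; iter_scroll is a hypothetical helper name.

```python
from typing import Any, Iterator


def iter_scroll(idx: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every result by following next_offset until it is None."""
    offset = 0
    while offset is not None:
        page = idx.scroll(limit=page_size, offset=offset)
        yield from page.results
        # Assumption: None marks the final page.
        offset = page.next_offset
```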

colsearch.IndexStats

@dataclass
class IndexStats:
    total_documents: int = 0
    sealed_segments: int = 0
    active_documents: int = 0
    dim: int = 0
    engine: str = ""

Search And Config Exports

colsearch.BM25Config

@dataclass
class BM25Config:
    k1: float = 1.5
    b: float = 0.75
    epsilon: float = 0.25
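
For orientation, k1 and b play the following roles in the standard Okapi BM25 term score. This is a textbook sketch, assuming colsearch follows the usual formulation; the function name is hypothetical, and epsilon is left out because this reference does not say how it is applied.

```python
def bm25_term_score(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    idf: float,
    k1: float = 1.5,
    b: float = 0.75,
) -> float:
    """Okapi BM25 contribution of one query term to one document."""
    # b controls how strongly long documents are penalized (0 = not at all).
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of a term saturate.
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)
```

With these defaults, a term that occurs once in an average-length document scores exactly its IDF, and the score can never exceed idf * (k1 + 1) however often the term repeats.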

colsearch.FusionConfig

@dataclass
class FusionConfig:
    strategy: str = "rrf"
    weights: Optional[Dict[str, float]] = None
    normalization: str = "minmax"
    top_k: int = 10
    min_score: float = 0.0
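
The default strategy name "rrf" presumably refers to reciprocal rank fusion. A minimal sketch of that technique follows, under stated assumptions: the smoothing constant 60 is the value commonly used in the literature and is not taken from colsearch, weights and normalization are not modeled, and rrf_fuse is a hypothetical name.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]],
    rrf_k: int = 60,          # assumption: conventional RRF constant
    top_k: int = 10,
    min_score: float = 0.0,
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked ID lists: each list adds 1 / (rrf_k + rank) per document."""
    scores: Dict[Hashable, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by insertion order (stable sort).
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    return [(d, s) for d, s in ordered if s >= min_score][:top_k]
```

Because RRF only looks at ranks, it can combine dense and BM25 result lists without first putting their raw scores on a common scale.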

colsearch.IndexConfig

Higher-level configuration surface used by package helpers.

colsearch.Neo4jConfig

@dataclass
class Neo4jConfig:
    uri: str = "bolt://localhost:7687"
    username: str = "neo4j"
    password: str = ""
    database: str = "neo4j"
    max_hop_distance: int = 2
    relationship_types: Optional[List[str]] = None

Neo4jConfig is a legacy graph-adjacent config surface. It is not the shipped Latence graph sidecar product lane documented elsewhere in this repo.

colsearch.SearchPipeline

Programmatic dense + BM25 retrieval surface.

This is the main local Python entry point for graph-aware retrieval today. It accepts:

  • query_payload for ontology hints, workflow hints, or graph policy cues
  • graph_mode with off, auto, or force
  • graph_options such as local_budget, community_budget, evidence_budget, max_hops, and explain

Example:

import numpy as np

from colsearch import SearchPipeline

pipeline = SearchPipeline("graph-demo", dim=128, use_roq=False, on_disk=False)
query = np.random.default_rng(7).normal(size=(128,)).astype("float32")

response = pipeline.search(
    query,
    top_k_retrieval=16,
    query_text="service c lineage policy",
    query_payload={
        "ontology_terms": ["Service C", "Export Control"],
        "workflow_type": "compliance",
    },
    graph_mode="auto",
    graph_options={
        "local_budget": 4,
        "community_budget": 4,
        "evidence_budget": 8,
        "max_hops": 2,
        "explain": True,
    },
)

Behavioral notes:

  • SearchPipeline is where dense + BM25 + optional graph retrieval comes together in-process
  • shard HTTP search remains vector-only, so use query_payload rather than query_text to steer graph policy there
  • graph candidates are merged additively after first-stage retrieval

colsearch.ColbertIndex

Higher-level late-interaction text helper exported by the package.

colsearch.ColPaliEngine

Multimodal retrieval engine for ColPali-family embeddings.

colsearch.MultiModalEngine

Combined multimodal retrieval surface.

Multimodal And Preprocessing Exports

colsearch.MultimodalModelSpec

@dataclass(frozen=True)
class MultimodalModelSpec:
    plugin_name: str
    model_id: str
    architecture: str
    embedding_style: str
    modalities: tuple[str, ...]
    pooling_task: str
    serve_command: str

colsearch.VllmPoolingProvider

Shared vLLM-compatible embedding provider for multimodal flows.

colsearch.enumerate_renderable_documents

Discovers supported source documents under a directory tree.

colsearch.render_documents

Renders those documents into page-level assets for embedding and indexing.

Triton Kernel Exports

All GPU kernels require the gpu extra and are optional.

colsearch.fast_colbert_scores

Exact MaxSim late-interaction scoring.
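
As a reference for what the kernel computes, MaxSim sums, over the query tokens, the best dot product each query token achieves against any document token. The pure-Python sketch below shows the definition only; it says nothing about the signature or tensor layout of fast_colbert_scores, and maxsim_reference is a hypothetical name.

```python
from typing import Sequence


def maxsim_reference(
    query_tokens: Sequence[Sequence[float]],
    doc_tokens: Sequence[Sequence[float]],
) -> float:
    """Sum over query tokens of the max dot product with any document token."""
    total = 0.0
    for q in query_tokens:
        total += max(
            sum(qi * di for qi, di in zip(q, d))  # dot product of q and d
            for d in doc_tokens
        )
    return total


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 0.0], [0.5, 0.5]]
    print(maxsim_reference(query, doc))  # 1.0 + 0.5 = 1.5
```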

colsearch.roq_maxsim_1bit

colsearch.roq_maxsim_2bit

colsearch.roq_maxsim_4bit

colsearch.roq_maxsim_8bit

ROQ scoring kernels exported through the public package.

colsearch.TRITON_AVAILABLE

Boolean flag indicating whether Triton is available.

Server Export

The public server module is:

from colsearch.server import app, create_app, main

Use colsearch-server for the packaged CLI entry point.