## Problem
When embedding large datasets (thousands or millions of texts), the current `embed()` method accumulates all results in memory before returning. This causes:
- Out-of-memory errors for very large datasets
- Memory pressure when processing many texts sequentially
- No way to process results incrementally (e.g., save to database as embeddings arrive)
For enterprise workloads processing large document corpora, this is a significant limitation.
## Proposed Solution
A new `embed_stream()` method that:
- Processes texts in configurable batches
- Yields embeddings one at a time via an iterator
- Keeps memory usage proportional to `batch_size` rather than total dataset size
- Works with both v1 and v2 clients
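The core of the idea is an ordinary Python generator that fills a batch, calls the existing batch embed endpoint, and yields results one by one. A minimal sketch, assuming a pluggable `embed_fn` as a stand-in for the client's batch call (the names here are illustrative, not the PR's actual implementation):

```python
from typing import Callable, Iterable, Iterator, List, NamedTuple


class StreamedEmbedding(NamedTuple):
    """Index of the source text plus its embedding vector."""
    index: int
    embedding: List[float]


def embed_stream(
    embed_fn: Callable[[List[str]], List[List[float]]],
    texts: Iterable[str],
    batch_size: int = 20,
) -> Iterator[StreamedEmbedding]:
    """Yield embeddings one at a time, holding at most one batch in memory.

    `embed_fn` stands in for the client's batch embed call and must return
    one vector per input text, in order.
    """
    batch: List[str] = []
    index = 0
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            for vector in embed_fn(batch):
                yield StreamedEmbedding(index, vector)
                index += 1
            batch = []
    if batch:  # flush the final partial batch
        for vector in embed_fn(batch):
            yield StreamedEmbedding(index, vector)
            index += 1
```

Because the generator is lazy, the caller controls pacing: each `next()` pulls at most one new batch from the API, and nothing beyond the current batch is retained.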
## Usage Example
```python
import cohere

client = cohere.Client()

# Process large dataset incrementally
for embedding in client.embed_stream(
    texts=large_text_list,  # Can be thousands of texts
    model="embed-english-v3.0",
    input_type="classification",
    batch_size=20,
):
    save_to_database(embedding.index, embedding.embedding)
    # Only batch_size worth of embeddings in memory at a time
```
## Memory Impact
| Dataset Size | Current `embed()` | Proposed `embed_stream()` |
|---|---|---|
| 1,000 texts | ~4 MB | ~20 KB |
| 100,000 texts | ~400 MB | ~20 KB |
| 1,000,000 texts | ~4 GB+ (OOM) | ~20 KB |
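The rough arithmetic behind the left column, assuming 1024-dimensional float32 vectors (~4 KB each, the output size of embed-english-v3.0) held entirely in memory:

```python
# Back-of-envelope memory estimate; 1024-dim float32 vectors is an
# assumption based on embed-english-v3.0's output size.
DIMS = 1024
BYTES_PER_FLOAT = 4
BYTES_PER_EMBEDDING = DIMS * BYTES_PER_FLOAT  # 4096 bytes ~= 4 KB


def total_mb(n_texts: int) -> float:
    """MB needed to hold all n_texts embeddings at once."""
    return n_texts * BYTES_PER_EMBEDDING / 1024 / 1024


print(total_mb(1_000))      # ~4 MB
print(total_mb(100_000))    # ~400 MB
print(total_mb(1_000_000))  # ~4000 MB, i.e. ~4 GB
```

Under the same assumption, the streaming figure corresponds to only a handful of vectors in flight at any moment, independent of dataset size.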
## Context
We are using the Cohere Python SDK at Oracle for processing large embedding workloads. We have a working implementation in PR #698 that has been tested with the real Cohere API, passes all unit tests, and is backward compatible (no changes to existing `embed()`).
## Additional Details
- No breaking changes to existing APIs
- Optional dependency on `ijson` for more efficient incremental parsing (works without it)
- Supports both `embeddings_floats` and `embeddings_by_type` response formats
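Supporting both formats mostly means normalizing where the vectors live in the response before yielding them. A simplified sketch of that normalization, using plain dicts as illustrative stand-ins for the SDK's actual response models:

```python
from typing import Any, Dict, List


def extract_vectors(response: Dict[str, Any]) -> List[List[float]]:
    """Normalize either response shape to a flat list of float vectors.

    Illustrative shapes (not the SDK's real response objects):
    - embeddings_floats style:  {"embeddings": [[...], [...]]}
    - embeddings_by_type style: {"embeddings": {"float": [[...], [...]]}}
    """
    embeddings = response["embeddings"]
    if isinstance(embeddings, dict):  # embeddings_by_type
        return embeddings["float"]
    return embeddings  # embeddings_floats
```

With this in place, the streaming generator can yield from `extract_vectors(...)` regardless of which client version produced the response.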