feat: Soft Delete documents#16

Draft
Darkheir wants to merge 10 commits into sekoia from feat/soft_delete_documents

Collaborator

@Darkheir Darkheir commented Mar 16, 2026

Description

This PR allows soft-deleting documents from an index.

Tasks

  • Update quickwit-proto with new messages
  • Update metastore to store soft deleted documents
  • Update quickwit search to ignore soft deleted documents
  • Add new REST endpoint to soft delete documents
  • Add Elasticsearch compatible endpoint to delete documents
  • Ignore deleted documents on merge of splits
  • Enforce limit of soft deleted events for a given split

@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from b6c0dec to bb179a6 Compare March 17, 2026 08:20
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from bb179a6 to 3c56d41 Compare March 17, 2026 08:39
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from 89355de to 0734ed6 Compare March 18, 2026 09:49
@Darkheir Darkheir requested a review from rdettai-sk March 18, 2026 11:25
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from c86fbfb to 817c329 Compare March 19, 2026 14:36
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir requested a review from rdettai-sk March 19, 2026 17:00
Collaborator

@rdettai-sk rdettai-sk left a comment

Not 100% through with the review (I only glanced over the search part). I think an integration test would be really nice.

@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch 6 times, most recently from eb8afbf to c2300ad Compare March 26, 2026 13:19
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from c2300ad to a2b75d8 Compare March 26, 2026 13:32
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir requested a review from rdettai-sk March 26, 2026 14:34
Collaborator

@rdettai-sk rdettai-sk left a comment

My biggest remaining concern is the list splits endpoint. It is actually very slow (a couple of seconds) on big indexes.

metastore: &MetastoreServiceClient,
progress: &Progress,
) {
let list_splits_request = match ListSplitsRequest::try_from_index_uid(index_uid.clone()) {
Collaborator

This can gather a lot of splits from the metastore and needs to be called at every merge upload. We should try to cap the overhead. We can easily:

  • filter on the current node
  • filter on published splits only

I think we can also filter on immature splits. Not 100% sure how that works, but it would be the most efficient.
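
The three filters suggested above (current node, published state, immature only) can be sketched as follows. This is a minimal, self-contained illustration rather than the code in this PR: `SplitInfo`, `SplitState`, `matures_at`, and `merge_candidates` are hypothetical names that stand in for Quickwit's real split metadata and metastore query types, and the filtering is done in memory here, whereas the point of the comment is to push it into the metastore request.

```rust
// Hypothetical, simplified model of split metadata. These are NOT
// Quickwit's real types; they only illustrate the filtering idea.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SplitState {
    Staged,
    Published,
    MarkedForDeletion,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SplitInfo {
    pub split_id: String,
    /// Node that produced the split (assumed field).
    pub node_id: String,
    pub state: SplitState,
    /// Unix timestamp (seconds) at which the split becomes mature.
    /// `None` means the split is already mature (assumed encoding).
    pub matures_at: Option<i64>,
}

/// Keeps only the splits that can still take part in a merge on this node:
/// same node, published, and not yet mature at time `now`.
pub fn merge_candidates<'a>(
    splits: &'a [SplitInfo],
    current_node: &str,
    now: i64,
) -> Vec<&'a SplitInfo> {
    splits
        .iter()
        .filter(|split| split.node_id == current_node)
        .filter(|split| split.state == SplitState::Published)
        .filter(|split| matches!(split.matures_at, Some(ts) if ts > now))
        .collect()
}
```

Applying the same predicates on the metastore side would avoid transferring the full split list at every merge upload, which is the overhead the comment is concerned about.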

Collaborator

Of course, if we bring it to the metastore, we can more easily get only the splits we need 😄

Comment on lines +386 to +389
replaced_splits.push(ReplacedSplit {
split_id: metadata.split_id().to_string(),
soft_deleted_doc_ids: metadata.soft_deleted_doc_ids.clone(),
});
Collaborator

@rdettai-sk rdettai-sk Mar 27, 2026

I think you are telling this poor split to replace itself

Collaborator Author

You are right, 4 hours lost on 1 wrong line, not bad 😅
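
For readers following the thread, the self-replacement problem can be illustrated with a small sketch. It is not the fix applied in this PR, which is not shown in the conversation. `SplitMeta`, `collect_replaced_splits`, and the `Vec<u32>` type for `soft_deleted_doc_ids` are assumptions made for illustration; only the `ReplacedSplit` field names come from the diff above.

```rust
// Illustrative types only. The field names of `ReplacedSplit` mirror the
// diff; the `Vec<u32>` element type is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ReplacedSplit {
    pub split_id: String,
    pub soft_deleted_doc_ids: Vec<u32>,
}

/// Hypothetical stand-in for the metadata of a split taking part in a merge.
#[derive(Debug, Clone)]
pub struct SplitMeta {
    pub split_id: String,
    pub soft_deleted_doc_ids: Vec<u32>,
}

/// Builds the list of splits replaced by the merged split `new_split_id`,
/// skipping any entry whose ID is the new split itself so that a split is
/// never asked to replace itself.
pub fn collect_replaced_splits(new_split_id: &str, inputs: &[SplitMeta]) -> Vec<ReplacedSplit> {
    inputs
        .iter()
        .filter(|meta| meta.split_id.as_str() != new_split_id)
        .map(|meta| ReplacedSplit {
            split_id: meta.split_id.clone(),
            soft_deleted_doc_ids: meta.soft_deleted_doc_ids.clone(),
        })
        .collect()
}
```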

Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
@Darkheir Darkheir force-pushed the feat/soft_delete_documents branch from a24023f to d945645 Compare March 27, 2026 19:08
2 participants