SaveSync: paginate recompute task scan by primary key

get_all_saves() materialized every Save row across all users into a single .all() list. On instances with very large libraries, peak memory therefore grows with the total number of saves, and every row stays referenced for the lifetime of the recompute run.

Replace it with get_saves_after_id(after_id, limit) and have the recompute task drive keyset pagination in PAGE_SIZE-row chunks. SQLAlchemy streaming via .execution_options(yield_per=...) is incompatible with the per-call session lifetime that @begin_session enforces (the session exits before the consumer iterates), so keyset paging from the caller is the cleanest fit.

Behavior is unchanged: same row coverage, same idempotency, same counters. Memory usage for the row scan drops from O(all saves) to O(PAGE_SIZE).
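
For readers unfamiliar with keyset pagination, the caller-side loop can be sketched roughly as below. This is an illustration under stated assumptions, not the task's actual code: FakeSave, make_fetcher, and iter_all_saves are hypothetical names, the PAGE_SIZE value is made up, and starting the cursor at 0 assumes primary keys are positive integers. The fetcher mimics the contract of get_saves_after_id (rows with id > after_id, ordered by id, at most limit rows) over an in-memory list rather than a database.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence

# Hypothetical value for illustration; the real PAGE_SIZE is defined elsewhere.
PAGE_SIZE = 3


@dataclass
class FakeSave:
    """Stand-in for the Save model; only the primary key matters here."""

    id: int


def make_fetcher(rows: Sequence[FakeSave]) -> Callable[[int, int], List[FakeSave]]:
    """Build an in-memory function with the same contract as get_saves_after_id."""
    ordered = sorted(rows, key=lambda row: row.id)

    def get_saves_after_id(after_id: int, limit: int) -> List[FakeSave]:
        # Equivalent of: WHERE id > after_id ORDER BY id ASC LIMIT limit
        return [row for row in ordered if row.id > after_id][:limit]

    return get_saves_after_id


def iter_all_saves(
    fetch_page: Callable[[int, int], Sequence[FakeSave]],
    page_size: int = PAGE_SIZE,
) -> Iterator[FakeSave]:
    """Walk every row in id order while holding at most one page at a time."""
    after_id = 0  # assumption: ids are positive, so 0 sorts before every row
    while True:
        page = fetch_page(after_id, page_size)
        if not page:
            return  # an empty page means the scan has passed the last row
        yield from page
        # Advance the cursor to the last id seen; gaps in the id sequence are fine.
        after_id = page[-1].id


if __name__ == "__main__":
    saves = [FakeSave(id=i) for i in (1, 2, 5, 8, 9, 13, 21)]
    for save in iter_all_saves(make_fetcher(saves)):
        print(save.id)
```

Because each page is selected by id > after_id rather than by OFFSET, every page is an index range scan of bounded size, and rows deleted or inserted behind the cursor during the run do not shift later pages.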
2026-06-28 06:46:00 +00:00 · 2026-05-29 17:38:49 +09:00
parent ec50f75d77
commit 5bb10dacd1
3 changed files with 123 additions and 40 deletions
--- a/backend/handler/database/saves_handler.py
+++ b/backend/handler/database/saves_handler.py
@@ -196,10 +196,18 @@ class DBSavesHandler(DBBaseHandler):
        )

    @begin_session
-    def get_all_saves(
+    def get_saves_after_id(
        self,
+        after_id: int,
+        limit: int,
        session: Session = None,  # type: ignore
    ) -> Sequence[Save]:
-        """Every Save row across all users, ordered by id. Used by the
-        recompute_save_content_hashes maintenance task."""
-        return session.scalars(select(Save).order_by(asc(Save.id))).all()
+        """Page Save rows by primary key. Returns up to ``limit`` rows with
+        ``id > after_id``, ordered by id. Used by the
+        recompute_save_content_hashes maintenance task to walk every row in
+        bounded-memory batches: streaming via ``yield_per`` is incompatible
+        with the per-call session lifetime that ``@begin_session`` enforces,
+        so the caller drives pagination with this method instead."""
+        return session.scalars(
+            select(Save).where(Save.id > after_id).order_by(asc(Save.id)).limit(limit)
+        ).all()