SaveSync: paginate recompute task scan by primary key

get_all_saves() materialized every Save row across all users into a
single .all() list. On instances with very large libraries that is a
real memory cost: peak RAM grows with the table size, and every row
stays referenced for the lifetime of the recompute run.

Replace it with get_saves_after_id(after_id, limit) and have the
recompute task drive keyset pagination in PAGE_SIZE-row chunks. SQLAlchemy
streaming via .execution_options(yield_per=...) is incompatible with the
per-call session lifetime that @begin_session enforces (the session
exits before the consumer iterates), so keyset paging from the caller is
the cleanest fit.
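
A minimal sketch of the caller-driven loop described above, assuming
positive integer ids. The names iter_pages, fake_get_saves_after_id and
ROWS are hypothetical stand-ins (the real task and the value of PAGE_SIZE
are not part of this hunk); only the (after_id, limit) contract comes
from this commit.

```python
from typing import Callable, Dict, Iterator, List, Sequence

PAGE_SIZE = 3  # illustrative value only; the task's real PAGE_SIZE is not shown here


def iter_pages(
    fetch_page: Callable[[int, int], Sequence[Dict[str, int]]],
    page_size: int = PAGE_SIZE,
) -> Iterator[Sequence[Dict[str, int]]]:
    """Yield successive pages until the source returns an empty page."""
    after_id = 0  # assumption: ids are positive, so 0 sorts before every row
    while True:
        page = fetch_page(after_id, page_size)
        if not page:
            return
        yield page
        # Resume strictly after the last row seen; only one page is held at a time.
        after_id = page[-1]["id"]


# Hypothetical stand-in for DBSavesHandler.get_saves_after_id, backed by a list.
ROWS: List[Dict[str, int]] = [{"id": i} for i in (1, 2, 5, 8, 9, 12, 13)]


def fake_get_saves_after_id(after_id: int, limit: int) -> List[Dict[str, int]]:
    return [row for row in ROWS if row["id"] > after_id][:limit]


pages = list(iter_pages(fake_get_saves_after_id))
page_ids = [[row["id"] for row in page] for page in pages]
```

Gaps in the id sequence do not matter here, because each call resumes from
the last id actually returned rather than from a computed offset.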

Behavior is unchanged: same row coverage, same idempotency, same
counters. Memory usage drops from O(all saves) to O(PAGE_SIZE).
nendo
2026-05-29 17:38:49 +09:00
parent ec50f75d77
commit 5bb10dacd1
3 changed files with 123 additions and 40 deletions

@@ -196,10 +196,18 @@ class DBSavesHandler(DBBaseHandler):
     )
     @begin_session
-    def get_all_saves(
+    def get_saves_after_id(
         self,
+        after_id: int,
+        limit: int,
         session: Session = None,  # type: ignore
     ) -> Sequence[Save]:
-        """Every Save row across all users, ordered by id. Used by the
-        recompute_save_content_hashes maintenance task."""
-        return session.scalars(select(Save).order_by(asc(Save.id))).all()
+        """Page Save rows by primary key. Returns up to ``limit`` rows with
+        ``id > after_id``, ordered by id. Used by the
+        recompute_save_content_hashes maintenance task to walk every row in
+        bounded-memory batches: streaming via ``yield_per`` is incompatible
+        with the per-call session lifetime that ``@begin_session`` enforces,
+        so the caller drives pagination with this method instead."""
+        return session.scalars(
+            select(Save).where(Save.id > after_id).order_by(asc(Save.id)).limit(limit)
+        ).all()
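
For reference, the new query is roughly equivalent to the SQL below. This
is a sketch using the standard-library sqlite3 module rather than the
project's SQLAlchemy session; the table name saves, the file_name column
and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a hypothetical "saves" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE saves (id INTEGER PRIMARY KEY, file_name TEXT)")
conn.executemany(
    "INSERT INTO saves (id, file_name) VALUES (?, ?)",
    [(i, f"save-{i}.sav") for i in (1, 2, 3, 4, 7, 8, 10)],
)


def get_saves_after_id(after_id: int, limit: int) -> list:
    """Return up to `limit` rows with id > after_id, ordered by id."""
    return conn.execute(
        "SELECT id, file_name FROM saves WHERE id > ? ORDER BY id ASC LIMIT ?",
        (after_id, limit),
    ).fetchall()


first = get_saves_after_id(0, 3)
second = get_saves_after_id(first[-1][0], 3)
third = get_saves_after_id(second[-1][0], 3)
fourth = get_saves_after_id(third[-1][0], 3)  # past the last row
```

Because the filter is on the primary key, each page can be served by an
index range scan instead of skipping rows as OFFSET-based paging would.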