Commit Graph

19 Commits

Author SHA1 Message Date
nendo
db0f714b4f SaveSync: use pathlib joins for asset content-hash paths
FSAssetsHandler.compute_content_hash and _compute_zip_hash were
building full paths via f"{self.base_path}/{file_path}". self.base_path
is already a pathlib.Path (resolved by FSHandler.__init__), so the
f-string forced it to str, hard-coded the separator, and re-parsed --
fine on Linux but a footgun if a caller ever sneaks a leading slash or
the path needs Path semantics elsewhere.

Switch both spots to self.base_path / file_path, which is what every
other FSHandler subclass in this module already does (e.g.
FSRomsHandler, FSResourcesHandler, FSSyncHandler all join Path objects
directly).
2026-05-29 17:40:56 +09:00
nendo
edb5d15420 Fix save-sync hash drift, archival save leak, and dedupe scoping
Cleanup pass on save-sync addressing three independent failure modes
that interact in production data: content_hash drift between client
and server, null-slot archival saves leaking into sync flows, and
content-hash dedupe collapsing legitimately-distinct slots.

Bug fixes
- compute_content_hash dispatched on zipfile.is_zipfile(relative_path),
  which silently returned False whenever the process's CWD wasn't
  ASSETS_BASE_PATH. Every zip save fell through to the raw-MD5 branch,
  persisting hashes that disagreed with clients computing the intended
  per-entry zip-hash. Resolve to a full path before the dispatch.
- _build_negotiate_plan, sync_push_pull_task, and sync_watcher all
  treated null-slot saves as sync-eligible. Null-slot saves represent
  web-UI / archival uploads; including them in negotiate plans matched
  them against device pushes by filename and overwrote archival data.
  Filter null-slot saves at all three call sites.
- get_save_by_content_hash matched on (rom_id, user_id, content_hash)
  only, so identical bytes uploaded to different slots collapsed into
  one record. Scope the lookup by slot when provided so clone-save-
  to-new-slot creates a distinct row per slot.
- get_save_by_filename matched on (rom_id, user_id, file_name) only.
  When two uploads to different slots happened in the same wall-clock
  second (the datetime tag is per-second), the second upload UPDATED
  the first record's slot instead of creating a distinct row. Scope
  the filename lookup by slot too.

One-shot recovery
- New recompute_save_content_hashes manual task walks every Save row,
  recomputes via the fixed dispatch, and updates rows whose values
  differ. Idempotent; safe to re-run.
- Backend startup runs a COUNT(content_hash IS NULL) query and, if
  any rows exist, enqueues the recompute task on the low-priority
  RQ queue. The API process moves on; the worker handles the
  recompute out-of-band. Subsequent restarts find zero NULL hashes
  and skip. Admins can also trigger the task manually.

Test infrastructure
- Added tests/_zipfile_shim.reload_zipfile() mirroring the pattern
  from utils/zip_cache.py for the same zipfile-inflate64 + CPython
  3.13.5 incompatibility. Test fixtures that build ZIPs call it
  immediately before opening the archive.
2026-05-29 17:00:01 +09:00
Georges-Antoine Assi
90945685e4 Stuff 2026-05-17 12:43:33 -04:00
Georges-Antoine Assi
e3aaa106a2 perf(backend): reuse libmagic instance for image upload validation
magic.Magic(mime=True) loads the magic database from disk on construction;
instantiating it per request was adding pointless overhead to every avatar
and artwork upload. Share a module-level instance guarded by a lock (the
underlying magic_t handle is not thread-safe), and surface MagicException
as a 400 so a sniffing failure fails closed instead of bubbling a 500.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 10:14:38 -04:00
Georges-Antoine Assi
53f14f5710 fix(backend): validate uploaded images with libmagic before storing
Avatar, ROM artwork, and collection artwork uploads now sniff the file
header with libmagic and reject anything that isn't PNG/JPEG/WebP/GIF,
saving the file with an extension derived from the detected MIME rather
than the user-supplied filename. Pairs with the raw asset endpoint,
which decides inline vs attachment from the on-disk extension.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 09:18:02 -04:00
Georges-Antoine Assi
6db9d45928 actually fix 2026-04-07 22:53:44 -04:00
Georges-Antoine Assi
6c88e098ba [ROMM-3232] Fix content_hash not updated 2026-04-07 21:48:51 -04:00
Georges-Antoine Assi
77823c168d [AIKIDO-13126604] Stream file when building file hash 2026-02-16 13:51:20 -05:00
nendo
bf8cb92e93 refactor(assets): move content hash functions to assets_handler
Move compute_file_hash, compute_zip_hash, and compute_content_hash from
scan_handler.py to filesystem/assets_handler.py as standalone module-level
functions. This follows the existing pattern for utility functions in
filesystem handlers.
2026-02-03 20:07:18 +09:00
Georges-Antoine Assi
aaf6741e93 Create safe filesystem handler 2025-07-17 12:30:57 -04:00
zurdi
44bfcaa23c fix: fixed assets path for screenshots 2025-05-14 16:28:46 +00:00
zurdi
5f94440c9d Merge branch 'master' into refactor/saves-states-path 2025-05-14 13:35:28 +00:00
zurdi
6f08912fc0 refactor: remove unnecessary logging highlights and improve log messages for clarity 2025-05-09 10:25:11 +00:00
zurdi
14761c2c83 refactor: enhance logging with highlighted output for improved readability 2025-05-09 09:05:59 +00:00
zurdi
83cb0674c5 fix: include rom_id in file path building for saves and states 2025-04-25 16:54:36 +00:00
Georges-Antoine Assi
7bde4aee70 complete the rst of the files 2024-12-20 23:45:25 -05:00
Georges-Antoine Assi
1840390c8a finish mypy fixes 2024-05-21 21:28:17 -04:00
Georges-Antoine Assi
a7cf0d389a run trunk format on all files 2024-05-21 10:18:13 -04:00
Georges-Antoine Assi
dc33054ba1 more name refactoring 2024-05-05 16:45:58 -04:00