Commit Graph

1656 Commits

Author SHA1 Message Date
Spinnich
9ba4e12fa8 Match IGDB regional-twin platforms in scans (#3462)
IGDB catalogues a console and its regional twin as two separate
platforms (SNES/Super Famicom, NES/Famicom). RomM locked each IGDB
search to a single platform id, so a region-exclusive title catalogued
under only the twin — e.g. the Japan-only Super Famicom game
"Rudra no Hihou" (platform 58) scanned from an `snes` folder
(platform 19) — was filtered out before name matching ran and never
matched.

Include a platform's regional twin in the IGDB platform filter so both
are searched. A non-twin platform keeps the exact existing query
(`platforms=[19]`); a twin produces an OR group
(`(platforms=[19] | platforms=[58])`), leaving all other platforms and
recorded cassettes unchanged.

Written primarily by Claude Code.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 13:40:42 +00:00
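The platform filter described in 9ba4e12fa8 can be sketched roughly as follows. This is only an illustration, not RomM's actual code: the `REGIONAL_TWINS` table and the `build_platform_filter` name are assumptions. Only the ids 19 (SNES) and 58 (Super Famicom) and the two query shapes come from the commit message.

```python
# Hypothetical twin table. Ids 19 and 58 are taken from the commit message;
# the mapping name and its shape are assumptions made for this sketch.
REGIONAL_TWINS: dict[int, int] = {19: 58, 58: 19}


def build_platform_filter(platform_id: int) -> str:
    """Build the platform clause of an IGDB query for one platform id."""
    twin_id = REGIONAL_TWINS.get(platform_id)
    if twin_id is None:
        # Non-twin platforms keep the single-platform query unchanged.
        return f"platforms=[{platform_id}]"
    # Twin platforms search both ids through an OR group.
    return f"(platforms=[{platform_id}] | platforms=[{twin_id}])"
```

Keeping the non-twin branch byte-for-byte identical to the old query is presumably what lets existing recorded cassettes stay valid.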
Georges-Antoine Assi
85789466d1 Merge pull request #3461 from Spinnich/fix/igdb-localized-name-match
fix(igdb): match ROMs by localized/alternative titles in scan
2026-05-31 13:05:58 -04:00
Spinnich
7e08a43e12 fix(igdb): match ROMs by localized/alternative titles in scan
IGDB scans dropped games whose filename uses a localized (non-English)
title even when that title exists in IGDB's alternative_names. The
alternative_name wildcard search surfaced the correct game, but
_search_rom() rebuilt its name->game candidate dict using only the
primary English name, so the Jaro-Winkler re-check scored the localized
term below threshold and discarded the match (issue #3435).

Add _index_games_by_searchable_name(), which indexes each game by its
primary name plus alternative_names and game_localizations titles, and
use it for both candidate-building passes in _search_rom(). Primary
names keep precedence (lowest-igdb-id tiebreak); alternative/
localization titles fill in only names not already claimed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-31 02:31:46 +00:00
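A minimal sketch of the indexing rule from 7e08a43e12, assuming each game is a dict with `id`, `name`, and optional `alternative_names` / `game_localizations` lists whose entries carry a `name` key. The real `_index_games_by_searchable_name()` may differ in signature and data shape; only the precedence rule is taken from the commit message.

```python
from typing import Any

Game = dict[str, Any]


def index_games_by_searchable_name(games: list[Game]) -> dict[str, Game]:
    """Map every searchable title to a game, primary names first."""
    index: dict[str, Game] = {}
    # Sorting by id makes the lowest IGDB id win any tie.
    ordered = sorted(games, key=lambda game: game["id"])

    # Pass 1: primary names claim their keys.
    for game in ordered:
        index.setdefault(game["name"], game)

    # Pass 2: alternative and localized titles fill only unclaimed names.
    for game in ordered:
        extras = game.get("alternative_names", []) + game.get("game_localizations", [])
        for entry in extras:
            index.setdefault(entry["name"], game)
    return index
```

Because the second pass uses `setdefault`, a localized title can never displace a game that owns the same string as its primary name.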
Georges-Antoine Assi
5144e78767 Merge branch 'master' into fix/ra-hash-missing-for-archives 2026-05-30 20:24:29 -04:00
Spinnich
1d9963ac63 fix(hashing): compute RA hash for archive ROMs on cartridge platforms
The archive branch of get_rom_files (introduced in #3412) was missing
the RAHasherService.calculate_hash call that exists in the non-archive
branch, causing all archive-format ROMs to produce an empty ra_hash
during scanning regardless of platform.

The RA hash call is now made for archive ROMs, mirroring the existing
non-archive behaviour. The RA_BUFFER_HASH_UNSUPPORTED skip logic in
RAHasherService already handles disc-based platforms (PSX, PS2, PSP,
Saturn, Dreamcast, etc.) so those continue to be excluded automatically.

Also improves handling of folder-based multi-file ROMs whose directories
contain compressed files. RAHasher cannot process archives via the /*
glob and fails with "Could not open file". The fix mirrors the existing
CHD folder logic: for cartridge platforms the largest archive in the
folder is passed directly to RAHasher for buffer hashing; for disc
platforms the call is skipped as buffer hashing is unsupported.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 14:55:42 +00:00
Georges-Antoine Assi
77de623834 Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-30 04:34:15 -04:00
Georges-Antoine Assi
69e2373453 cleanup archive member 2026-05-29 20:54:32 -04:00
Spinnich
19d50e86b9 fix(screenscraper): use internal filename as romnom for single-file archives
When sending a hash lookup to ScreenScraper, romnom was always set to the
archive filename on disk (e.g. Mario.zip). For single-file archives, the hash
is computed from the internal file (e.g. mario.n64), so sending the archive
name sends slightly incorrect info to ss.fr during a KO scrape.

When archive_members has exactly one entry, romnom now uses that member's
name. Multi-file archives and non-archive files continue to use the filesystem
filename unchanged.

Closes #3444

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 20:34:40 +00:00
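The romnom selection in 19d50e86b9 amounts to a small branch. The sketch below assumes `archive_members` is a list of dicts with a `name` key; the function name `pick_romnom` is hypothetical and not taken from the codebase.

```python
from __future__ import annotations


def pick_romnom(fs_name: str, archive_members: list[dict[str, str]] | None) -> str:
    """Choose the filename reported to ScreenScraper for a hash lookup."""
    if archive_members and len(archive_members) == 1:
        # Single-file archive: the hash was computed from the inner file,
        # so report that file's name.
        return archive_members[0]["name"]
    # Multi-file archives and plain files keep the on-disk filename.
    return fs_name
```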
Georges-Antoine Assi
10d731d823 cleanup 2026-05-29 11:58:53 -04:00
Georges-Antoine Assi
ae60d14f81 Merge branch 'master' into feat/composite-hashing-archives 2026-05-29 11:50:17 -04:00
Georges-Antoine Assi
29f90c027f Merge pull request #3448 from tmgast/fix-save-sync-hash-and-archival
Fix save-sync hash drift, archival save leak, and dedupe scoping
2026-05-29 11:47:52 -04:00
copilot-swe-agent[bot]
54dc059e15 Fix 500 error when char_index contains None key from NULL ROM names
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-29 11:23:55 +00:00
nendo
db0f714b4f SaveSync: use pathlib joins for asset content-hash paths
FSAssetsHandler.compute_content_hash and _compute_zip_hash were
building full paths via f"{self.base_path}/{file_path}". self.base_path
is already a pathlib.Path (resolved by FSHandler.__init__), so the
f-string forced it to str, hard-coded the separator, and re-parsed --
fine on Linux but a footgun if a caller ever sneaks a leading slash or
the path needs Path semantics elsewhere.

Switch both spots to self.base_path / file_path, which is what every
other FSHandler subclass in this module already does (e.g.
FSRomsHandler, FSResourcesHandler, FSSyncHandler all join Path objects
directly).
2026-05-29 17:40:56 +09:00
nendo
41c91fdd5b SaveSync: push null-slot exclusion into the SQL query
Four sync callsites across three modules (endpoints/sync.py,
sync_watcher.py, and both branches of tasks/sync_push_pull_task.py) ran
get_saves(...) and then
discarded archival null-slot rows in a Python list comprehension. On
libraries with many archival/web-UI uploads that's a strict waste:
those rows are pulled from MariaDB, hydrated into Save model instances,
and then immediately filtered out.

Add a slot_not_null bool kwarg to DBSavesHandler.get_saves and apply
the filter in the SQL query. Update all four callsites to use it and
drop the Python-side comprehension. Default stays False so unrelated
callers keep the current behavior.
2026-05-29 17:40:18 +09:00
nendo
5bb10dacd1 SaveSync: paginate recompute task scan by primary key
get_all_saves() materialized every Save row across all users into a
single .all() list. On instances with very large libraries that's a
real RAM ceiling and pins every row for the lifetime of the recompute
run.

Replace it with get_saves_after_id(after_id, limit) and have the
recompute task drive keyset pagination in PAGE_SIZE-row chunks. SQLAlchemy
streaming via .execution_options(yield_per=...) is incompatible with the
per-call session lifetime that @begin_session enforces (the session
exits before the consumer iterates), so keyset paging from the caller is
the cleanest fit.

Behavior is unchanged: same row coverage, same idempotency, same
counters. Memory usage drops from O(all saves) to O(PAGE_SIZE).
2026-05-29 17:38:49 +09:00
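Keyset pagination as described in 5bb10dacd1 can be sketched with `sqlite3` as a stand-in for the real SQLAlchemy handler. The table layout and the `iter_all_saves` driver are assumptions; only the `get_saves_after_id(after_id, limit)` shape and the `PAGE_SIZE` name come from the commit message.

```python
import sqlite3
from collections.abc import Iterator

PAGE_SIZE = 2  # deliberately tiny for illustration; the real value is not stated


def get_saves_after_id(
    conn: sqlite3.Connection, after_id: int, limit: int
) -> list[tuple[int, str]]:
    """Return up to `limit` rows whose primary key is greater than `after_id`."""
    return conn.execute(
        "SELECT id, file_name FROM saves WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit),
    ).fetchall()


def iter_all_saves(conn: sqlite3.Connection) -> Iterator[tuple[int, str]]:
    """Walk every row while holding at most PAGE_SIZE rows in memory."""
    after_id = 0
    while True:
        page = get_saves_after_id(conn, after_id, PAGE_SIZE)
        if not page:
            return
        yield from page
        # The last id on this page becomes the cursor for the next query.
        after_id = page[-1][0]
```

Each call is a complete, short-lived query, so it fits a per-call session lifetime in a way that a streamed cursor would not.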
nendo
edb5d15420 Fix save-sync hash drift, archival save leak, and dedupe scoping
Cleanup pass on save-sync addressing three independent failure modes
that interact in production data: content_hash drift between client
and server, null-slot archival saves leaking into sync flows, and
content-hash dedupe collapsing legitimately-distinct slots.

Bug fixes
- compute_content_hash dispatched on zipfile.is_zipfile(relative_path),
  which silently returned False whenever the process's CWD wasn't
  ASSETS_BASE_PATH. Every zip save fell through to the raw-MD5 branch,
  persisting hashes that disagreed with clients computing the intended
  per-entry zip-hash. Resolve to a full path before the dispatch.
- _build_negotiate_plan, sync_push_pull_task, and sync_watcher all
  treated null-slot saves as sync-eligible. Null-slot saves represent
  web-UI / archival uploads; including them in negotiate plans matched
  them against device pushes by filename and overwrote archival data.
  Filter null-slot saves at all three call sites.
- get_save_by_content_hash matched on (rom_id, user_id, content_hash)
  only, so identical bytes uploaded to different slots collapsed into
  one record. Scope the lookup by slot when provided so clone-save-
  to-new-slot creates a distinct row per slot.
- get_save_by_filename matched on (rom_id, user_id, file_name) only.
  When two uploads to different slots happened in the same wall-clock
  second (the datetime tag is per-second), the second upload UPDATED
  the first record's slot instead of creating a distinct row. Scope
  the filename lookup by slot too.

One-shot recovery
- New recompute_save_content_hashes manual task walks every Save row,
  recomputes via the fixed dispatch, and updates rows whose values
  differ. Idempotent; safe to re-run.
- Backend startup runs a COUNT(content_hash IS NULL) query and, if
  any rows exist, enqueues the recompute task on the low-priority
  RQ queue. The API process moves on; the worker handles the
  recompute out-of-band. Subsequent restarts find zero NULL hashes
  and skip. Admins can also trigger the task manually.

Test infrastructure
- Added tests/_zipfile_shim.reload_zipfile() mirroring the pattern
  from utils/zip_cache.py for the same zipfile-inflate64 + CPython
  3.13.5 incompatibility. Test fixtures that build ZIPs call it
  immediately before opening the archive.
2026-05-29 17:00:01 +09:00
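The first bug in edb5d15420 comes from `zipfile.is_zipfile` returning False for a path it cannot open, instead of raising. The sketch below reproduces that with a temporary directory standing in for ASSETS_BASE_PATH; `pick_hash_branch`, `demo`, and the file names are hypothetical, and the real per-entry zip hash is not shown.

```python
import tempfile
import zipfile
from pathlib import Path


def pick_hash_branch(base_path: Path, relative_path: str) -> str:
    """Return which hashing branch a save would take: "zip" or "raw"."""
    # Resolve to a full path first so the check does not depend on the CWD.
    full_path = base_path / relative_path
    return "zip" if zipfile.is_zipfile(full_path) else "raw"


def demo() -> tuple[bool, str]:
    """Compare the CWD-relative check with the resolved-path check."""
    with tempfile.TemporaryDirectory() as tmp:
        base_path = Path(tmp)
        relative_path = "romm-sketch-save-3f9c.zip"  # assumed absent from the CWD
        with zipfile.ZipFile(base_path / relative_path, "w") as zf:
            zf.writestr("slot0.srm", b"save-bytes")
        # Old behaviour: the relative path is looked up against the CWD,
        # the open fails, and is_zipfile quietly reports False.
        relative_check = zipfile.is_zipfile(relative_path)
        return relative_check, pick_hash_branch(base_path, relative_path)
```

Calling `demo()` from any directory other than the temporary one shows the relative check missing a file that the resolved check identifies as a zip.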
Georges-Antoine Assi
8f08769670 run fmt 2026-05-28 20:05:24 -04:00
copilot-swe-agent[bot]
d29ed39a6a Add miximage_v2 media type mapping to SS.fr mixrbv2
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-28 20:15:40 +00:00
Georges-Antoine Assi
207d0dc4c6 feat(hashing): persist per-member hashes on archive RomFile
Internal members of multi-file archives (zip/tar/7z/rar) are now hashed
individually (crc/md5/sha1) and stored in a new `archive_members` JSON
column on the archive's RomFile, alongside the existing composite hash
used for hash-database matching. Only the archive itself is surfaced as
a RomFile so full_path keeps pointing at a file that exists on disk,
which is the constraint that previously forced us to choose between
composite-only or broken downloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:41:04 -04:00
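Per-member hashing as described in 207d0dc4c6 might look like the following for the zip case. The output keys (`name`, `crc_hash`, `md5_hash`, `sha1_hash`) are assumptions about what the `archive_members` JSON holds, and only zip is covered here; the commit also mentions tar, 7z, and rar.

```python
import hashlib
import zipfile
import zlib

CHUNK_SIZE = 64 * 1024


def hash_zip_members(archive) -> list[dict[str, str]]:
    """Hash each file inside a zip (path or file-like) without extracting it."""
    members: list[dict[str, str]] = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            crc = 0
            md5 = hashlib.md5()
            sha1 = hashlib.sha1()
            with zf.open(info) as member:
                # Stream in chunks so large members are never fully in memory.
                while chunk := member.read(CHUNK_SIZE):
                    crc = zlib.crc32(chunk, crc)
                    md5.update(chunk)
                    sha1.update(chunk)
            members.append(
                {
                    "name": info.filename,
                    "crc_hash": f"{crc:08x}",
                    "md5_hash": md5.hexdigest(),
                    "sha1_hash": sha1.hexdigest(),
                }
            )
    return members
```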
Georges-Antoine Assi
9111f70d0a refactor(filesystem): merge archive_7zip.py into archives.py
Consolidate all archive readers (zip/tar/7z/rar) and 7z-internal helpers
into a single utils/archives.py module to keep the archive surface area
in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:10:01 -04:00
Georges-Antoine Assi
a1194dc5e0 changes from bot review 2026-05-28 09:02:26 -04:00
Georges-Antoine Assi
a170649fe6 fix(hashing): emit single RomFile for multi-file archives
Per-internal-member RomFiles produced full_paths that didn't exist on
disk, breaking downloads and zip-building. Stream entries into the
composite hash only and emit one RomFile pointing at the archive itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 21:35:01 -04:00
Georges-Antoine Assi
0bfe369425 run fmt 2026-05-27 21:03:08 -04:00
Georges-Antoine Assi
30451d5651 fix(security): move SSRF defense into the HTTP client path
The previous validator did a preflight `socket.getaddrinfo` before each
httpx request. Two problems:

  * DNS rebinding / TOCTOU: httpx re-resolves at connect time, so a
    hostname can answer with a public IP for the validator and a
    private IP for the real request. The preflight check did not
    constrain the connection.
  * Event-loop blocking: `socket.getaddrinfo` is synchronous, and the
    media-download callers are async. Slow resolvers stalled
    unrelated requests.

Replace it with two layers, both wired automatically onto every httpx
client built by `utils.context`:

  1. A request event hook running `validate_url_for_http_request`
     (syntactic checks only: scheme, reserved hostnames, literal IPs,
     internal TLDs). No DNS, no call-site responsibility.
  2. `SSRFProtectedAsyncBackend` / `SSRFProtectedSyncBackend`, custom
     httpcore network backends that resolve the hostname inside
     `connect_tcp`, reject any address in a forbidden range, then
     connect to that *same* validated address. The async variant uses
     `loop.getaddrinfo` so it doesn't block the loop. httpcore calls
     `start_tls(server_hostname=<URL host>)` after `connect_tcp`, so
     TLS SNI and cert verification still use the original hostname
     even though the TCP layer connects by IP.

Drop the explicit `validate_url_for_http_request(...)` calls from
`resources_handler.py` — the event hook covers them. Consolidate the
URL validator and its tests under `utils/ssrf.py` /
`tests/utils/test_ssrf.py` so the SSRF surface lives in one module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 17:58:14 -04:00
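The address check at the heart of 30451d5651 can be approximated with the standard-library `ipaddress` module. The exact forbidden ranges RomM uses are not listed in the commit, so the set below is an assumption, and both function names are hypothetical. DNS resolution and the httpcore backend wiring are left out.

```python
import ipaddress


def is_forbidden_address(address: str) -> bool:
    """Return True for addresses an outbound media request should not reach."""
    ip = ipaddress.ip_address(address)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def pick_validated_address(resolved: list[str]) -> str:
    """Validate every resolved address and return the one to connect to."""
    if not resolved:
        raise ValueError("hostname did not resolve to any address")
    for address in resolved:
        if is_forbidden_address(address):
            raise ValueError(f"refusing to connect to {address}")
    # The caller connects to this exact address, so a second DNS answer
    # cannot swap in a different target after validation.
    return resolved[0]
```

Connecting to the returned address, not to the hostname, is what closes the rebinding window the commit describes.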
Georges-Antoine Assi
acff688f11 refactor(hashing): use _make_file_hash helper at remaining sites
Apply the helper to the three other per-file FileHash constructions
(folder-walk hash, empty-archive fallback, single-file hash). The
all-empty FileHash literals are left alone since the helper would be
strictly more obscure for that case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:12:11 -04:00
Georges-Antoine Assi
f255b5a7d9 feat(hashing): add RAR support to multi-file archive composite hashing
Add read_rar_archive_files via the existing 7zz binary (which natively
handles RAR3/RAR5 read), and collapse the per-extension reader dispatch
into an ARCHIVE_READERS dict so future formats are one entry away. Also
extract a small _make_file_hash helper to remove the repeated nested
ternaries in the inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:09:37 -04:00
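The dict-based dispatch from f255b5a7d9 could be sketched as below. The reader signatures, the `(name, bytes)` yield shape, and the `read_archive` wrapper are assumptions; only the `ARCHIVE_READERS` name and the `read_zip_archive_files` / `read_tar_archive_files` names appear in this log. The 7z and RAR readers shell out to a `7zz` binary and are omitted.

```python
import tarfile
import zipfile
from collections.abc import Callable, Iterator
from pathlib import Path

Member = tuple[str, bytes]


def read_zip_archive_files(path: Path) -> Iterator[Member]:
    """Yield (name, content) for each file in a zip archive."""
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)


def read_tar_archive_files(path: Path) -> Iterator[Member]:
    """Yield (name, content) for each regular file in a tar archive."""
    with tarfile.open(path) as tf:
        for member in tf:
            if member.isfile():
                handle = tf.extractfile(member)
                if handle is not None:
                    yield member.name, handle.read()


# Supporting a new format means adding one entry here.
ARCHIVE_READERS: dict[str, Callable[[Path], Iterator[Member]]] = {
    ".zip": read_zip_archive_files,
    ".tar": read_tar_archive_files,
}


def read_archive(path: Path) -> Iterator[Member]:
    """Dispatch to the reader registered for the file extension."""
    reader = ARCHIVE_READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"unsupported archive type: {path.suffix}")
    return reader(path)
```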
Georges-Antoine Assi
438c03facc refactor(filesystem): extract archive/CHD helpers to utils/archives.py
Pull file/archive readers (zip/tar/gz/bz2/7z), CHD parsing, and the
shared libmagic MIME detector out of roms_handler.py into a new
utils/archives.py. Rename the previously underscore-prefixed
read_zip_archive_files / read_tar_archive_files to match the existing
read_7z_archive_files convention, and consolidate the duplicated
"with lock: detector.from_file()" pattern into a detect_mime_type helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:41:45 -04:00
Georges-Antoine Assi
84f9dd2e2d Merge pull request #3434 from rommapp/copilot/fix-region-specific-release-date
Use region-prioritized release dates from ScreenScraper
2026-05-26 21:14:21 -04:00
Georges-Antoine Assi
29ce936c7d Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-26 20:45:41 -04:00
copilot-swe-agent[bot]
511f5e4272 Revert IGDB handler and test changes
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-27 00:06:50 +00:00
Georges-Antoine Assi
f5b1d44313 changes from bot review 2026-05-26 19:52:16 -04:00
Georges-Antoine Assi
04241169d7 fix 2026-05-26 19:36:38 -04:00
Georges-Antoine Assi
09aecc81bf cleanup 2026-05-26 18:22:12 -04:00
Spinnich
3c2f421dbb fix(screenscraper): inject user credentials for cover, manual, and screenshot downloads
Standard media fields (url_cover, url_manual, url_screenshots) were downloaded
using the stored credential-less URLs, causing them to count against the anonymous
IP quota instead of the user's SS account. Apply add_ss_auth_to_url() at each
download call site in the scan and ROM update paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(screenscraper): guard add_ss_auth_to_url against non-SS URLs

Only inject ssid/sspassword into screenscraper.fr URLs to prevent
leaking user credentials to third-party sources (IGDB, LaunchBox, etc.)
when url_cover/url_manual/url_screenshots originate from other providers.

Add tests for the non-SS no-op and empty-string edge cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(screenscraper): verify SS credentials injected for all media download paths

- TestAddSsAuthToUrl: add guards for non-SS URLs (IGDB, LaunchBox) and
  empty string inputs
- test_update_rom: verify ssid/sspassword appear in url_cover and
  url_manual args passed to get_cover/get_manual for screenscraper.fr
  URLs; verify IGDB URLs are NOT decorated with SS credentials
- TestScanCredentialInjection: verify the scan-path ternary pattern
  correctly applies add_ss_auth_to_url to cover and screenshot URLs,
  and that a None cover URL passes through without error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

test(screenscraper): empirical audit — every SS request carries ssid/sspassword

Intercepts both HTTP clients at the transport/session level to verify
that every outgoing screenscraper.fr request is decorated with the user's
ssid and sspassword credentials:

  aiohttp (API calls via auth_middleware):
  - jeuInfos.php, jeuRecherche.php, ssinfraInfos.php, ssuserInfos.php

  httpx (media downloads via FSResourcesHandler):
  - get_cover          → url_cover
  - get_manual         → url_manual
  - get_rom_screenshots → url_screenshots (each URL)
  - store_media_file   → extra media (fanart, bezel, etc.)

Also verifies the domain guard: IGDB URLs passed through add_ss_auth_to_url
are NOT decorated with SS credentials.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-26 20:50:18 +00:00
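The domain guard described in 3c2f421dbb can be sketched with `urllib.parse`. The function below is named `add_ss_auth_sketch` to make clear it is not the real `add_ss_auth_to_url()`, whose signature is not shown in this log; the explicit credential arguments and the host-matching rule are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_ss_auth_sketch(url: str, ssid: str, sspassword: str) -> str:
    """Append ScreenScraper credentials, but only to screenscraper.fr URLs."""
    if not url:
        return url
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != "screenscraper.fr" and not host.endswith(".screenscraper.fr"):
        # Any other provider: return the URL untouched so credentials
        # never leak to a third party.
        return url
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["ssid"] = ssid
    query["sspassword"] = sspassword
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Matching on the parsed hostname, not on a substring of the URL, keeps a host such as `screenscraper.fr.evil.example` from passing the guard.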
copilot-swe-agent[bot]
536f6ac815 fix: use region-aware release dates for SS and IGDB metadata
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-26 13:21:39 +00:00
Georges-Antoine Assi
fb3cc1da87 perf 2026-05-25 12:00:14 -04:00
Georges-Antoine Assi
9e9a282286 fix(roms): dedupe and sort sibling IDs for stable API output
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 11:35:16 -04:00
Georges-Antoine Assi
5e8fac36a7 Merge branch 'master' into claude/awesome-gates-NUpke 2026-05-25 11:10:04 -04:00
Georges-Antoine Assi
b9b82c751b more cleanup 2026-05-25 11:07:03 -04:00
Georges-Antoine Assi
9a87348c22 cleanup 2026-05-25 11:02:35 -04:00
Claude
95b1a99f2a perf(roms): avoid hydrating full Rom rows for siblings on list endpoint
The paginated ROM list eager-loaded sibling_roms via selectinload, which
hydrated full Rom ORM instances (including heavy JSON metadata columns)
for every sibling even though only an existence/count check was needed
on the frontend. On large collections this dominated request latency.

Split sibling handling by response shape:
- SimpleRomSchema (list): siblings is now list[int]; populated per page
  by a single SELECT against the sibling_roms view projecting only
  (rom_id, sibling_rom_id) — no Rom row hydration.
- DetailedRomSchema (detail): keeps full SiblingRomSchema objects, with
  load_only on (id, name, fs_name_no_tags, fs_name_no_ext) so sibling
  rows stop dragging in JSON metadata.

Frontend usage already only consumes siblings.length on list views; the
detail-page VersionSwitcher continues to receive the richer schema.
2026-05-24 23:17:34 +00:00
Georges-Antoine Assi
9b97e32f54 Merge pull request #3425 from rommapp/claude/ecstatic-dirac-dFQQO
Denormalize ROM file stats for efficient gallery rendering
2026-05-24 19:04:12 -04:00
Georges-Antoine Assi
1a560d3660 cleanup 2026-05-24 18:57:38 -04:00
Georges-Antoine Assi
fb13f54f48 cleanup 2026-05-24 17:43:01 -04:00
Claude
8fcc16bad2 refactor(roms): replace denormalized columns with deferred column_property
Drop the migration and the multi_file / top_level_file_count columns on
roms; express both as deferred column_property correlated subqueries
against rom_files instead. The gallery list and detail queries opt in
via undefer, so they get the values computed in the same SELECT via
indexed subqueries (rom_id index already in place); other code paths
that don't read the flags pay nothing.

This keeps the gallery perf win (no rom_files load for cards) without
introducing schema state that has to stay in sync with rom_files at
write time.
2026-05-24 20:41:44 +00:00
Georges-Antoine Assi
63644d0c6f Merge pull request #3426 from rommapp/claude/loving-darwin-pveIr
Defer optional handler initialization with lazy factories
2026-05-24 16:18:24 -04:00
Georges-Antoine Assi
0eec8b0e47 Merge pull request #3424 from rommapp/copilot/fix-csrf-token-issue
Refresh CSRF cookie on OIDC session authentication changes
2026-05-24 15:57:30 -04:00
Georges-Antoine Assi
be476cb7dc Only set CSRF cookie on http.response.start
ASGI spec only allows headers on the http.response.start message;
appending Set-Cookie to body messages is out-of-spec and may break on
some servers. Early-return for non-start messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-24 15:46:50 -04:00
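The early return in be476cb7dc follows from the ASGI message shapes: only `http.response.start` carries headers. A minimal sketch of a `send` wrapper, with a hypothetical `wrap_send_with_cookie` name and an arbitrary cookie value, might read:

```python
from collections.abc import Awaitable, Callable
from typing import Any

Message = dict[str, Any]
Send = Callable[[Message], Awaitable[None]]


def wrap_send_with_cookie(send: Send, cookie_header: bytes) -> Send:
    """Wrap an ASGI send callable so it adds one Set-Cookie header."""

    async def send_wrapper(message: Message) -> None:
        if message["type"] != "http.response.start":
            # Body and other messages pass through untouched.
            await send(message)
            return
        headers = list(message.get("headers", []))
        headers.append((b"set-cookie", cookie_header))
        await send({**message, "headers": headers})

    return send_wrapper
```

A middleware would pass `send_wrapper` to the inner application in place of the original `send`.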
Georges-Antoine Assi
8af556ee46 run fmt 2026-05-24 14:07:45 -04:00
Georges-Antoine Assi
acc1e630b7 Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-24 14:05:58 -04:00