Commit Graph

454 Commits

Author SHA1 Message Date
Georges-Antoine Assi
ab9b7bd775 changes from self review 2026-06-07 08:29:49 -04:00
Claude
cd41422660 Use a concurrency limiter for ScreenScraper, honoring account threads
ScreenScraper enforces a per-account *thread* (concurrency) cap rather than
a request rate. Requests can take several seconds, so spacing out request
starts at 1/s could still leave multiple requests overlapping in flight,
exceeding the cap and getting rejected with ScreenScraper's custom HTTP codes.

- Add ConcurrencyLimiter: a runtime-resizable, loop-agnostic limiter that
  bounds simultaneous in-flight operations (held for the whole request via
  async context manager), instead of spacing request starts.
- Switch the ScreenScraper service from the req/s RateLimiter to a module-level
  ConcurrencyLimiter defaulting to a single thread.
- Recognize contributor/donor accounts: parse `ssuser.maxthreads` from each
  response and raise the concurrency allowance to match, so supporters scrape
  with their full thread count instead of the conservative default.

Adds unit tests for the limiter (blocking, wake-on-release, runtime resize)
and for the ScreenScraper slot-holding and thread-allowance updates.

https://claude.ai/code/session_01133QQuWvq8Zm25DZMP9PVr
2026-06-06 16:02:13 +00:00
DevYukine
8006e4391d feat(playmatch): pre-emptive 4 req/s rate limiting with best-effort lookups 2026-06-06 04:16:21 +02:00
Georges-Antoine Assi
064a65698c add non-global IPs as forbidden 2026-06-03 16:21:27 -04:00
Georges-Antoine Assi
ad576909d3 changes from bot review 2026-05-30 20:26:29 -04:00
Georges-Antoine Assi
ae60d14f81 Merge branch 'master' into feat/composite-hashing-archives 2026-05-29 11:50:17 -04:00
copilot-swe-agent[bot]
d29ed39a6a Add miximage_v2 media type mapping to SS.fr mixrbv2
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-28 20:15:40 +00:00
Georges-Antoine Assi
207d0dc4c6 feat(hashing): persist per-member hashes on archive RomFile
Internal members of multi-file archives (zip/tar/7z/rar) are now hashed
individually (crc/md5/sha1) and stored in a new `archive_members` JSON
column on the archive's RomFile, alongside the existing composite hash
used for hash-database matching. Only the archive itself is surfaced as
a RomFile so full_path keeps pointing at a file that exists on disk,
which is the constraint that previously forced us to choose between
composite-only or broken downloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:41:04 -04:00
Georges-Antoine Assi
9111f70d0a refactor(filesystem): merge archive_7zip.py into archives.py
Consolidate all archive readers (zip/tar/7z/rar) and 7z-internal helpers
into a single utils/archives.py module to keep the archive surface area
in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:10:01 -04:00
Georges-Antoine Assi
a1194dc5e0 changes from bot review 2026-05-28 09:02:26 -04:00
Georges-Antoine Assi
5f63668996 cleanup 2026-05-27 20:58:33 -04:00
Georges-Antoine Assi
c3adbd3f71 fix(ssrf): bound DNS lookup by caller timeout; clear lint findings
The async backend's `loop.getaddrinfo` ran without any timeout, so a
slow or hanging resolver could outlive the timeout the caller passed —
the previous code only bounded the TCP connect inside the inner
backend. Wrap the resolution in `asyncio.timeout(timeout)` and surface
the timeout as `httpcore.ConnectTimeout`.

Also tidy the test stubs (mypy func-returns-value) and add explicit
type annotations to the `calls` lists (mypy var-annotated). A targeted
`# noqa: ASYNC109` sits on the `timeout` parameter of `connect_tcp` /
`connect_unix_socket` with an explanatory comment: the rule advises
against `timeout` parameters on async APIs we author, but here we're
implementing `AsyncNetworkBackend`, and the timeout is consumed in the
asyncio-native pattern the rule endorses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:31:42 -04:00
Georges-Antoine Assi
30451d5651 fix(security): move SSRF defense into the HTTP client path
The previous validator did a preflight `socket.getaddrinfo` before each
httpx request. Two problems:

  * DNS rebinding / TOCTOU: httpx re-resolves at connect time, so a
    hostname can answer with a public IP for the validator and a
    private IP for the real request. The preflight check did not
    constrain the connection.
  * Event-loop blocking: `socket.getaddrinfo` is synchronous, and the
    media-download callers are async. Slow resolvers stalled
    unrelated requests.

Replace it with two layers, both wired automatically onto every httpx
client built by `utils.context`:

  1. A request event hook running `validate_url_for_http_request`
     (syntactic checks only: scheme, reserved hostnames, literal IPs,
     internal TLDs). No DNS, no call-site responsibility.
  2. `SSRFProtectedAsyncBackend` / `SSRFProtectedSyncBackend`, custom
     httpcore network backends that resolve the hostname inside
     `connect_tcp`, reject any address in a forbidden range, then
     connect to that *same* validated address. The async variant uses
     `loop.getaddrinfo` so it doesn't block the loop. httpcore calls
     `start_tls(server_hostname=<URL host>)` after `connect_tcp`, so
     TLS SNI and cert verification still use the original hostname
     even though the TCP layer connects by IP.

Drop the explicit `validate_url_for_http_request(...)` calls from
`resources_handler.py` — the event hook covers them. Consolidate the
URL validator and its tests under `utils/ssrf.py` /
`tests/utils/test_ssrf.py` so the SSRF surface lives in one module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 17:58:14 -04:00
Georges-Antoine Assi
2b0ed2296b cleanup 2026-05-27 09:29:21 -04:00
Georges-Antoine Assi
3ae7f998b9 run fmt 2026-05-27 09:20:23 -04:00
Georges-Antoine Assi
f255b5a7d9 feat(hashing): add RAR support to multi-file archive composite hashing
Add read_rar_archive_files via the existing 7zz binary (which natively
handles RAR3/RAR5 read), and collapse the per-extension reader dispatch
into an ARCHIVE_READERS dict so future formats are one entry away. Also
extract a small _make_file_hash helper to remove the repeated nested
ternaries in the inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:09:37 -04:00
Georges-Antoine Assi
438c03facc refactor(filesystem): extract archive/CHD helpers to utils/archives.py
Pull file/archive readers (zip/tar/gz/bz2/7z), CHD parsing, and the
shared libmagic MIME detector out of roms_handler.py into a new
utils/archives.py. Rename the previously underscore-prefixed
read_zip_archive_files / read_tar_archive_files to match the existing
read_7z_archive_files convention, and consolidate the duplicated
"with lock: detector.from_file()" pattern into a detect_mime_type helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:41:45 -04:00
Claude
009d358175 fix(security): resolve hostnames in SSRF URL validator
validate_url_for_http_request previously skipped DNS resolution, so
attacker-controlled domains that resolve to private/loopback/link-local
addresses (e.g. 127.0.0.1.nip.io) passed validation and the subsequent
httpx GET hit internal services. Resolve the hostname via getaddrinfo
and reject any result whose IP is private, loopback, link-local,
reserved, multicast, or unspecified. Unresolvable hostnames are
rejected as well.

https://claude.ai/code/session_01T335ZvA825YhuzPctmYzUy
2026-05-27 12:33:36 +00:00
Spinnich
a9f9ea2edc fix(hashing): address trunk lint issues in composite archive hashing
- Use AnyioPath.stat() instead of os.path.getmtime in async context (ASYNC240)
- Add assert to narrow rom_md5_h/rom_sha1_h from HASH|None to HASH (mypy/union-attr)
- Auto-formatted long log.error calls in archive_7zip.py (ruff)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:14:39 +00:00
Spinnich
c20d48bbf8 feat(hashing): compute both composite hash & individual files hash for multi-file archives
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:04:04 +00:00
Georges-Antoine Assi
584f35b797 changes from bot review 2026-05-19 07:52:43 -04:00
Georges-Antoine Assi
757fafae5f feat(fs): hardlink import/export assets when possible, harden sync init
Importer (gamelist/launchbox file:// flows) and exporters (gamelist.xml,
metadata.pegasus.txt local exports) now hardlink media assets when source
and destination share a filesystem, falling back transparently to a copy
on EXDEV / EPERM / EOPNOTSUPP / EMLINK / EACCES (cross-device, FAT32,
exFAT, network mounts, etc.). Saves disk space and is effectively
instantaneous on large files (videos, manuals, miximages).

Covers keep a real copy (allow_link=False) because _store_cover resizes
the small cover in place via PIL.Image.save, which would truncate the
shared inode and corrupt the user's source image.

Also makes FSSyncHandler tolerate a missing/unwritable /romm/sync at
startup: an OSError from mkdir now logs a warning instead of crashing
the whole app at module-import time. Sync calls still fail at use time
if the mount remains broken — the right place to surface the error.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 07:38:11 -04:00
Georges-Antoine Assi
4f03cdf03d Copy gamelist media into a local assets/ folder
Mirror the pegasus exporter pattern: collect each ROM's media into a
canonical asset-kind dict, then either copy the files under
<platform>/assets/<subdir>/ for local exports or build absolute URLs
from request.base_url for remote exports. Exclude the generated assets/
directory from filesystem scans.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 08:15:41 -04:00
Georges-Antoine Assi
f4227cafaf Build absolute resource URLs and simplify traversal check
Use URLPath.make_absolute_url with request.base_url to build resource
URLs for non-local exports, and simplify the local-export traversal
check with Path.is_relative_to.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-15 22:22:09 -04:00
Gavin Tronset
4a2a4d6d38 PR feedback for tests and relpath validation 2026-05-13 15:23:54 -07:00
Gavin Tronset
87f0cbd3de Fix YouTube URL and conditionally use relative 2026-05-13 13:37:45 -07:00
Gavin Tronset
17d5178586 Use relative paths for resources in gamelist exports 2026-05-13 13:37:45 -07:00
Yukine
a00bb281e4 refactor(utils): move fire_and_forget helper into utils.background_tasks 2026-04-21 12:50:10 +02:00
Georges-Antoine Assi
e8a6e9f01d final fixes 2026-04-12 18:43:24 -04:00
Georges-Antoine Assi
d45afb5dde more fixes 2026-04-12 18:32:15 -04:00
Georges-Antoine Assi
8f1b8f41d7 perf: skip RAHasher subprocess for archived disc-platform ROMs
RAHasher was being spawned for every hashable ROM regardless of file
type. When the source file is a zip/7z/tar and the RA platform needs
an on-disk disc image (PSX, PS2, PSP, Saturn, Dreamcast, Sega CD,
3DO, PC-FX, Neo Geo CD, TurboGrafx CD, Atari Jaguar CD, Wii), the
subprocess fails with "Unsupported console for buffer hash: {id}"
after paying full process-spawn overhead per ROM — a serious slowdown
when indexing large zipped collections (e.g. myrient PS2/PSP sets).

calculate_hash now short-circuits those combinations with a debug log
and no subprocess. Raw disc images (.iso, .chd, .cue/.bin) and
archives on cartridge platforms still go through RAHasher as before.

Also centralize COMPRESSED_FILE_EXTENSIONS in utils/filesystem.py so
roms_handler (is_compressed_file / hashing), rahasher (skip logic),
and feeds (PKGi passthrough) share one source of truth. The shared
set adds .rar, which is_compressed_file now recognizes too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:18:14 -04:00
Georges-Antoine Assi
62e540d60e changes from bot review 2026-04-12 15:43:14 -04:00
Georges-Antoine Assi
4928041593 manual cleanu 2026-04-12 11:04:12 -04:00
Georges-Antoine Assi
2dc1678931 changes from bot review 2026-04-06 11:22:44 -04:00
Georges-Antoine Assi
f2619ac0d1 Merge branch 'master' into pegasus-metadata-export 2026-04-06 11:06:08 -04:00
Georges-Antoine Assi
21eee327b0 Merge branch 'master' into save-sync 2026-04-06 09:09:53 -04:00
Georges-Antoine Assi
af69630481 more self review 2026-04-05 23:17:57 -04:00
Georges-Antoine Assi
1501f45220 more changes from review 2026-04-05 23:15:42 -04:00
Georges-Antoine Assi
fafb804bc6 mega cleanup 2026-04-05 22:35:37 -04:00
Georges-Antoine Assi
a61ff81e22 Merge branch 'master' into gamelist-customize 2026-04-05 22:11:02 -04:00
Georges-Antoine Assi
f2e8e337b2 Merge branch 'master' into save-sync 2026-04-05 21:47:53 -04:00
Georges-Antoine Assi
b79bcbcfce remove clud meta ID 2026-04-05 19:50:37 -04:00
Georges-Antoine Assi
7c41fb5bac revert fs_name sibling roms 2026-04-05 17:57:48 -04:00
Georges-Antoine Assi
ef35ecaea9 props rom updte endpoint 2026-04-04 14:16:00 -04:00
Georges-Antoine Assi
4a8c3423df run trunk fmt 2026-04-02 10:21:43 -04:00
Eric Daras
9d659bf00b move proxy env handling into config 2026-04-02 06:23:53 +02:00
Vargash
11827d1700 fix: retrieve config from get_config() 2026-03-31 15:08:19 +02:00
Vargash
f4c965be33 feat: add new export.gamelist.media.image configuration
also fix broken title_screen/miximage/physical key name usage
2026-03-31 11:16:40 +02:00
Vargash
ec30b18bba feat: add bezel tag to gamelist export 2026-03-31 11:04:13 +02:00
Vargash
3e072873e1 feat: new export.gamelist.media.thumbnail configuration 2026-03-31 11:00:49 +02:00