ScreenScraper enforces a per-account *thread* (concurrency) cap rather than
a request rate. Requests can take several seconds, so spacing out request
starts at 1/s could still leave multiple requests overlapping in flight,
exceeding the cap and getting rejected with ScreenScraper's custom HTTP codes.
- Add ConcurrencyLimiter: a runtime-resizable, loop-agnostic limiter that
bounds simultaneous in-flight operations (held for the whole request via
async context manager), instead of spacing request starts.
- Switch the ScreenScraper service from the req/s RateLimiter to a module-level
ConcurrencyLimiter defaulting to a single thread.
- Recognize contributor/donor accounts: parse `ssuser.maxthreads` from each
response and raise the concurrency allowance to match, so supporters scrape
with their full thread count instead of the conservative default.
Adds unit tests for the limiter (blocking, wake-on-release, runtime resize)
and for the ScreenScraper slot-holding and thread-allowance updates.
https://claude.ai/code/session_01133QQuWvq8Zm25DZMP9PVr
Internal members of multi-file archives (zip/tar/7z/rar) are now hashed
individually (crc/md5/sha1) and stored in a new `archive_members` JSON
column on the archive's RomFile, alongside the existing composite hash
used for hash-database matching. Only the archive itself is surfaced as
a RomFile so full_path keeps pointing at a file that exists on disk,
which is the constraint that previously forced us to choose between
composite-only or broken downloads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidate all archive readers (zip/tar/7z/rar) and 7z-internal helpers
into a single utils/archives.py module to keep the archive surface area
in one place.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The async backend's `loop.getaddrinfo` ran without any timeout, so a
slow or hanging resolver could outlive the timeout the caller passed —
the previous code only bounded the TCP connect inside the inner
backend. Wrap the resolution in `asyncio.timeout(timeout)` and surface
the timeout as `httpcore.ConnectTimeout`.
Also tidy the test stubs (mypy func-returns-value) and add explicit
type annotations to the `calls` lists (mypy var-annotated). A targeted
`# noqa: ASYNC109` sits on the `timeout` parameter of `connect_tcp` /
`connect_unix_socket` with an explanatory comment: the rule advises
against `timeout` parameters on async APIs we author, but here we're
implementing `AsyncNetworkBackend`, and the timeout is consumed in the
asyncio-native pattern the rule endorses.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous validator did a preflight `socket.getaddrinfo` before each
httpx request. Two problems:
* DNS rebinding / TOCTOU: httpx re-resolves at connect time, so a
hostname can answer with a public IP for the validator and a
private IP for the real request. The preflight check did not
constrain the connection.
* Event-loop blocking: `socket.getaddrinfo` is synchronous, and the
media-download callers are async. Slow resolvers stalled
unrelated requests.
Replace it with two layers, both wired automatically onto every httpx
client built by `utils.context`:
1. A request event hook running `validate_url_for_http_request`
(syntactic checks only: scheme, reserved hostnames, literal IPs,
internal TLDs). No DNS, no call-site responsibility.
2. `SSRFProtectedAsyncBackend` / `SSRFProtectedSyncBackend`, custom
httpcore network backends that resolve the hostname inside
`connect_tcp`, reject any address in a forbidden range, then
connect to that *same* validated address. The async variant uses
`loop.getaddrinfo` so it doesn't block the loop. httpcore calls
`start_tls(server_hostname=<URL host>)` after `connect_tcp`, so
TLS SNI and cert verification still use the original hostname
even though the TCP layer connects by IP.
Drop the explicit `validate_url_for_http_request(...)` calls from
`resources_handler.py` — the event hook covers them. Consolidate the
URL validator and its tests under `utils/ssrf.py` /
`tests/utils/test_ssrf.py` so the SSRF surface lives in one module.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add read_rar_archive_files via the existing 7zz binary (which natively
handles RAR3/RAR5 read), and collapse the per-extension reader dispatch
into an ARCHIVE_READERS dict so future formats are one entry away. Also
extract a small _make_file_hash helper to remove the repeated nested
ternaries in the inner loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull file/archive readers (zip/tar/gz/bz2/7z), CHD parsing, and the
shared libmagic MIME detector out of roms_handler.py into a new
utils/archives.py. Rename the previously underscore-prefixed
read_zip_archive_files / read_tar_archive_files to match the existing
read_7z_archive_files convention, and consolidate the duplicated
"with lock: detector.from_file()" pattern into a detect_mime_type helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
validate_url_for_http_request previously skipped DNS resolution, so
attacker-controlled domains that resolve to private/loopback/link-local
addresses (e.g. 127.0.0.1.nip.io) passed validation and the subsequent
httpx GET hit internal services. Resolve the hostname via getaddrinfo
and reject any result whose IP is private, loopback, link-local,
reserved, multicast, or unspecified. Unresolvable hostnames are
rejected as well.
https://claude.ai/code/session_01T335ZvA825YhuzPctmYzUy
- Use AnyioPath.stat() instead of os.path.getmtime in async context (ASYNC240)
- Add assert to narrow rom_md5_h/rom_sha1_h from HASH|None to HASH (mypy/union-attr)
- Auto-formatted long log.error calls in archive_7zip.py (ruff)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Importer (gamelist/launchbox file:// flows) and exporters (gamelist.xml,
metadata.pegasus.txt local exports) now hardlink media assets when source
and destination share a filesystem, falling back transparently to a copy
on EXDEV / EPERM / EOPNOTSUPP / EMLINK / EACCES (cross-device, FAT32,
exFAT, network mounts, etc.). Saves disk space and is effectively
instantaneous on large files (videos, manuals, miximages).
Covers keep a real copy (allow_link=False) because _store_cover resizes
the small cover in place via PIL.Image.save, which would truncate the
shared inode and corrupt the user's source image.
Also makes FSSyncHandler tolerate a missing/unwritable /romm/sync at
startup: an OSError from mkdir now logs a warning instead of crashing
the whole app at module-import time. Sync calls still fail at use time
if the mount remains broken — the right place to surface the error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the pegasus exporter pattern: collect each ROM's media into a
canonical asset-kind dict, then either copy the files under
<platform>/assets/<subdir>/ for local exports or build absolute URLs
from request.base_url for remote exports. Exclude the generated assets/
directory from filesystem scans.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Use URLPath.make_absolute_url with request.base_url to build resource
URLs for non-local exports, and simplify the local-export traversal
check with Path.is_relative_to.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RAHasher was being spawned for every hashable ROM regardless of file
type. When the source file is a zip/7z/tar and the RA platform needs
an on-disk disc image (PSX, PS2, PSP, Saturn, Dreamcast, Sega CD,
3DO, PC-FX, Neo Geo CD, TurboGrafx CD, Atari Jaguar CD, Wii), the
subprocess fails with "Unsupported console for buffer hash: {id}"
after paying full process-spawn overhead per ROM — a serious slowdown
when indexing large zipped collections (e.g. myrient PS2/PSP sets).
calculate_hash now short-circuits those combinations with a debug log
and no subprocess. Raw disc images (.iso, .chd, .cue/.bin) and
archives on cartridge platforms still go through RAHasher as before.
Also centralize COMPRESSED_FILE_EXTENSIONS in utils/filesystem.py so
roms_handler (is_compressed_file / hashing), rahasher (skip logic),
and feeds (PKGi passthrough) share one source of truth. The shared
set adds .rar, which is_compressed_file now recognizes too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>