Apply the helper to the three other per-file FileHash constructions
(folder-walk hash, empty-archive fallback, single-file hash). The
all-empty FileHash literals are left alone since the helper would be
strictly more obscure for that case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add read_rar_archive_files via the existing 7zz binary (which natively
handles RAR3/RAR5 read), and collapse the per-extension reader dispatch
into an ARCHIVE_READERS dict so future formats are one entry away. Also
extract a small _make_file_hash helper to remove the repeated nested
ternaries in the inner loop.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull file/archive readers (zip/tar/gz/bz2/7z), CHD parsing, and the
shared libmagic MIME detector out of roms_handler.py into a new
utils/archives.py. Rename the previously underscore-prefixed
read_zip_archive_files / read_tar_archive_files to match the existing
read_7z_archive_files convention, and consolidate the duplicated
"with lock: detector.from_file()" pattern into a detect_mime_type helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User-configured EXCLUDED_MULTI_PARTS_EXT/FILES are intentionally not
applied to archive internal files. Archives are curated ROM sets where
every file is relevant — user custom exclusions (e.g. "bin") could
silently produce incorrect composite hashes. Only the hardcoded
DEFAULT_EXCLUDED_FILES/EXTENSIONS (junk like .DS_Store, gamelist.xml)
are applied.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Use AnyioPath.stat() instead of os.path.getmtime in async context (ASYNC240)
- Add assert to narrow rom_md5_h/rom_sha1_h from HASH|None to HASH (mypy/union-attr)
- Auto-formatted long log.error calls in archive_7zip.py (ruff)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Importer (gamelist/launchbox file:// flows) and exporters (gamelist.xml,
metadata.pegasus.txt local exports) now hardlink media assets when source
and destination share a filesystem, falling back transparently to a copy
on EXDEV / EPERM / EOPNOTSUPP / EMLINK / EACCES (cross-device, FAT32,
exFAT, network mounts, etc.). Saves disk space and is effectively
instantaneous on large files (videos, manuals, miximages).
Covers keep a real copy (allow_link=False) because _store_cover resizes
the small cover in place via PIL.Image.save, which would truncate the
shared inode and corrupt the user's source image.
Also makes FSSyncHandler tolerate a missing/unwritable /romm/sync at
startup: an OSError from mkdir now logs a warning instead of crashing
the whole app at module-import time. Sync calls still fail at use time
if the mount remains broken — the right place to surface the error.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CHD files now follow the same hash logic as all other file types — CRC32,
MD5, and SHA1 are computed from raw container bytes. This allows
ScreenScraper to log KO entries for unrecognised CHD files, which it
could not do when only the disc-data SHA1 was being computed.
The CHD header SHA1 (disc-data SHA1) is separately extracted and stored
in a new chd_sha1_hash field on RomFile, with a migration adding the
column to rom_files. Hasheous receives only this disc-data SHA1 (no
CRC/MD5) since it indexes disc-based games by disc-data SHA1, not raw
file hashes.
The RAHasher multi-file path now passes the largest CHD directly instead
of a /* wildcard, which RAHasher cannot expand. Hash computations are
wrapped in asyncio.to_thread to avoid blocking the event loop during
large reads.
Hash-lookup metadata handlers (ScreenScraper, Hasheous, Playmatch) now
fall back to rom.files (stored DB hashes) when fs_rom files are not
rehashed, fixing hash-based matching for UNMATCHED and UPDATE scan types.
The Disc SHA-1 is displayed in the ROM detail view for both single-file
(FileInfo.vue) and multi-file (FileSelectItem.vue) CHD games.
Rom.regions can contain raw filename text like "europe" or "EUROPE"
(filename parsing in roms_handler doesn't normalize casing), so the
direct dict lookup missed those tags and the locale silently fell back
to scan.priority.region. Replace the dict access with a helper that
lowercases both sides.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a ROM filename carries a region tag (e.g. (Europe)), use that
region first when picking artwork and localized titles, falling back to
the configured scan.priority.region. Previously the configured priority
was the only signal, so a US-first config would force US covers onto
European ROMs even when an EU asset was available.
Adds a shared name->provider-shortcode map and threads the rom through
the IGDB and SS lookup APIs so the rom-aware locale/region selection
can run for both providers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
magic.Magic(mime=True) loads the magic database from disk on construction;
instantiating it per request was adding pointless overhead to every avatar
and artwork upload. Share a module-level instance guarded by a lock (the
underlying magic_t handle is not thread-safe), and surface MagicException
as a 400 so a sniffing failure fails closed instead of bubbling a 500.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avatar, ROM artwork, and collection artwork uploads now sniff the file
header with libmagic and reject anything that isn't PNG/JPEG/WebP/GIF,
saving the file with an extension derived from the detected MIME rather
than the user-supplied filename. Pairs with the raw asset endpoint,
which decides inline vs attachment from the on-disk extension.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the single HIGH_PRIO_STRUCTURE_PATH config attribute with two
glob patterns (STRUCTURE_PATH_A = roms/*, STRUCTURE_PATH_B = */roms) and
update all call sites to detect Structure B via glob.glob, defaulting to
Structure A when no match is found.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LaunchBox produced file:// URIs relative to /romm/launchbox, but the
resources handler resolved them under /romm/library via fs_rom_handler,
so local images/manuals/screenshots were never found. Switch LaunchBox
to a distinct launchbox-file:// scheme and add FSLaunchboxHandler +
_resolve_local_file_uri to route each scheme to the correct root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Callers now pass the full platform dict and rom.fs_extension; the service
normalizes the extension (optional leading dot, case-insensitive) before
checking the compressed-archive skip set, so ROMs stored with bare
extensions like "zip" correctly hit the skip path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RAHasher was being spawned for every hashable ROM regardless of file
type. When the source file is a zip/7z/tar and the RA platform needs
an on-disk disc image (PSX, PS2, PSP, Saturn, Dreamcast, Sega CD,
3DO, PC-FX, Neo Geo CD, TurboGrafx CD, Atari Jaguar CD, Wii), the
subprocess fails with "Unsupported console for buffer hash: {id}"
after paying full process-spawn overhead per ROM — a serious slowdown
when indexing large zipped collections (e.g. myrient PS2/PSP sets).
calculate_hash now short-circuits those combinations with a debug log
and no subprocess. Raw disc images (.iso, .chd, .cue/.bin) and
archives on cartridge platforms still go through RAHasher as before.
Also centralize COMPRESSED_FILE_EXTENSIONS in utils/filesystem.py so
roms_handler (is_compressed_file / hashing), rahasher (skip logic),
and feeds (PKGi passthrough) share one source of truth. The shared
set adds .rar, which is_compressed_file now recognizes too.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Problem
_check_content_type used the full Content-Type header string (lowercased) and matched it with startswith(...) against allowed prefixes.
That is mostly fine when the server sends a bare type like application/pdf. It breaks down when vendors send parameters on the same header (e.g. name="…", charset=…). In theory application/force-download; name="…" should still start with application/force-download, but in practice you can get:
Leading whitespace or a UTF‑8 BOM before the type token, so the string no longer starts with your prefix even though the MIME type is correct.
Confusing logs: logging only the lowercased full header is fine, but the decision should be based on the standardized MIME essence (type + subtype, no parameters), which is what other stacks use for “what is this?”
So the fix is to parse the header the usual way and only then apply your allowlist.
What changed
_content_type_essence(header_value)
Takes everything before the first ; (the essence).
Strips whitespace, lowercases, strips a leading BOM (\ufeff) so odd clients/proxies don’t break the check.
_check_content_type
Reads the raw content-type header once.
Runs startswith on the essence, not on the full header with parameters.
Rejects if the essence is empty (missing or useless header).
Logging uses the raw header string (or (missing header)), so operators still see exactly what the server sent.
Call sites and allowed prefixes (image/, application/pdf, etc.) are unchanged; only how the string is normalized before comparison changes.
Security / SSRF
This does not replace URL / SSRF controls; it only makes post-fetch type checking consistent with how Content-Type is defined (essence vs parameters). You are not widening the allowlist—same prefixes, stricter handling of “empty” and clearer matching on the actual type token.
Risk / regression
Low: same allowed prefixes, strictly more tolerant of benign formatting (whitespace, BOM, parameters). The only stricter case is empty essence after strip (e.g. malformed header), which correctly fails the check.
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
I have reviewed the proposal and these edits will handle cases where the string we match against for the content_type is cleaned up more before comparing against the allow list of content_types.
I have tested this, and confirm that I do not get any errors loading PDFs for game manuals using this. Please consider this, as this should be compatible with the existing content type allowlist, and easily work with any new types added to it.