
132 Commits

Author SHA1 Message Date
Georges-Antoine Assi
207d0dc4c6 feat(hashing): persist per-member hashes on archive RomFile
Internal members of multi-file archives (zip/tar/7z/rar) are now hashed
individually (crc/md5/sha1) and stored in a new `archive_members` JSON
column on the archive's RomFile, alongside the existing composite hash
used for hash-database matching. Only the archive itself is surfaced as
a RomFile so full_path keeps pointing at a file that exists on disk,
which is the constraint that previously forced us to choose between
composite-only hashing and broken downloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:41:04 -04:00
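A minimal sketch of per-member hashing alongside a composite hash, assuming a zip archive read with Python's standard zipfile module. The member ordering (sorted by name), the dict keys, and the way the composite SHA1 is built (member bytes concatenated in that order) are illustrative assumptions, not the actual `archive_members` schema or RomM's composite algorithm.

```python
import hashlib
import zipfile
import zlib

CHUNK_SIZE = 1 << 16  # 64 KiB reads; an arbitrary choice for this sketch


def hash_zip_members(archive: zipfile.ZipFile) -> tuple[list[dict[str, str]], str]:
    """Hash each file member individually and all members as one composite.

    Assumptions: members are visited in name order, and the composite SHA1
    is taken over their concatenated bytes in that order. Keys are illustrative.
    """
    members: list[dict[str, str]] = []
    composite = hashlib.sha1()
    for info in sorted(archive.infolist(), key=lambda i: i.filename):
        if info.is_dir():
            continue
        crc = 0
        md5 = hashlib.md5()
        sha1 = hashlib.sha1()
        with archive.open(info) as fh:
            # Stream each member once, feeding every hasher from the same chunk.
            while chunk := fh.read(CHUNK_SIZE):
                crc = zlib.crc32(chunk, crc)
                md5.update(chunk)
                sha1.update(chunk)
                composite.update(chunk)
        members.append(
            {
                "name": info.filename,
                "crc_hash": f"{crc:08x}",
                "md5_hash": md5.hexdigest(),
                "sha1_hash": sha1.hexdigest(),
            }
        )
    return members, composite.hexdigest()
```

Each member is read only once and its chunks feed all four hashers, and the returned list is plain JSON-serializable data of the kind a JSON column could hold.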
Georges-Antoine Assi
9111f70d0a refactor(filesystem): merge archive_7zip.py into archives.py
Consolidate all archive readers (zip/tar/7z/rar) and 7z-internal helpers
into a single utils/archives.py module to keep the archive surface area
in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 09:10:01 -04:00
Georges-Antoine Assi
a1194dc5e0 changes from bot review 2026-05-28 09:02:26 -04:00
Georges-Antoine Assi
a170649fe6 fix(hashing): emit single RomFile for multi-file archives
Per-internal-member RomFiles produced full_paths that didn't exist on
disk, breaking downloads and zip-building. Stream entries into the
composite hash only and emit one RomFile pointing at the archive itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 21:35:01 -04:00
Georges-Antoine Assi
0bfe369425 run fmt 2026-05-27 21:03:08 -04:00
Georges-Antoine Assi
acff688f11 refactor(hashing): use _make_file_hash helper at remaining sites
Apply the helper to the three other per-file FileHash constructions
(folder-walk hash, empty-archive fallback, single-file hash). The
all-empty FileHash literals are left alone since the helper would be
strictly more obscure for that case.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:12:11 -04:00
Georges-Antoine Assi
f255b5a7d9 feat(hashing): add RAR support to multi-file archive composite hashing
Add read_rar_archive_files via the existing 7zz binary (which natively
reads RAR3/RAR5 archives), and collapse the per-extension reader dispatch
into an ARCHIVE_READERS dict so future formats are one entry away. Also
extract a small _make_file_hash helper to remove the repeated nested
ternaries in the inner loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:09:37 -04:00
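The dispatch-table idea from this commit can be sketched as follows. Only the `ARCHIVE_READERS` name comes from the commit; the `iter_zip_members` / `iter_tar_members` readers, their `(name, bytes)` signature, and keying by file suffix are hypothetical stand-ins built on the standard library. The 7z/RAR readers (which shell out to `7zz`) are omitted.

```python
import tarfile
import zipfile
from collections.abc import Callable, Iterator
from pathlib import Path

# Hypothetical reader signature: yield (member_name, member_bytes) pairs.
ArchiveReader = Callable[[Path], Iterator[tuple[str, bytes]]]


def iter_zip_members(path: Path) -> Iterator[tuple[str, bytes]]:
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                yield info.filename, zf.read(info)


def iter_tar_members(path: Path) -> Iterator[tuple[str, bytes]]:
    # Default mode "r" transparently handles gzip/bzip2/xz-compressed tars.
    with tarfile.open(path) as tf:
        for member in tf.getmembers():
            fh = tf.extractfile(member) if member.isfile() else None
            if fh is not None:
                yield member.name, fh.read()


# Supporting another format means adding one entry here (e.g. 7z or RAR
# readers backed by the 7zz binary, not shown in this sketch).
ARCHIVE_READERS: dict[str, ArchiveReader] = {
    ".zip": iter_zip_members,
    ".tar": iter_tar_members,
}


def read_archive(path: Path) -> Iterator[tuple[str, bytes]]:
    # Plain function (not a generator), so unsupported types fail immediately.
    reader = ARCHIVE_READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"no archive reader for {path.suffix!r}")
    return reader(path)
```

The dict replaces an if/elif chain over extensions, so the lookup and the error path stay in one place no matter how many formats are registered.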
Georges-Antoine Assi
438c03facc refactor(filesystem): extract archive/CHD helpers to utils/archives.py
Pull file/archive readers (zip/tar/gz/bz2/7z), CHD parsing, and the
shared libmagic MIME detector out of roms_handler.py into a new
utils/archives.py. Rename the previously underscore-prefixed
read_zip_archive_files / read_tar_archive_files to match the existing
read_7z_archive_files convention, and consolidate the duplicated
"with lock: detector.from_file()" pattern into a detect_mime_type helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 08:41:45 -04:00
Spinnich
242dc9e357 fix(hashing): use only default exclusions for archive internal files
User-configured EXCLUDED_MULTI_PARTS_EXT/FILES are intentionally not
applied to archive internal files. Archives are curated ROM sets where
every file is relevant — user custom exclusions (e.g. "bin") could
silently produce incorrect composite hashes. Only the hardcoded
DEFAULT_EXCLUDED_FILES/EXTENSIONS (junk like .DS_Store, gamelist.xml)
are applied.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:28:49 +00:00
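A sketch of this filtering rule, assuming case-insensitive matching on the member's base name. The two example entries come from the commit message. The real `DEFAULT_EXCLUDED_FILES` / `DEFAULT_EXCLUDED_EXTENSIONS` contents are not reproduced here, so the sets below and the `is_archive_member_kept` name are placeholders.

```python
from pathlib import PurePosixPath

# Placeholder defaults: only the two examples named in the commit message.
DEFAULT_EXCLUDED_FILES = frozenset({".ds_store", "gamelist.xml"})
DEFAULT_EXCLUDED_EXTENSIONS: frozenset[str] = frozenset()  # contents not listed


def is_archive_member_kept(member_name: str) -> bool:
    """Apply only the hardcoded defaults; user EXCLUDED_* config is ignored."""
    name = PurePosixPath(member_name).name.lower()
    if name in DEFAULT_EXCLUDED_FILES:
        return False
    return not any(name.endswith(f".{ext}") for ext in DEFAULT_EXCLUDED_EXTENSIONS)
```

Because user-configured exclusions never reach this function, a user setting such as "bin" cannot silently drop `track01.bin` from the composite hash.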
Spinnich
a9f9ea2edc fix(hashing): address trunk lint issues in composite archive hashing
- Use AnyioPath.stat() instead of os.path.getmtime in async context (ASYNC240)
- Add assert to narrow rom_md5_h/rom_sha1_h from HASH|None to HASH (mypy/union-attr)
- Auto-formatted long log.error calls in archive_7zip.py (ruff)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:14:39 +00:00
Spinnich
c20d48bbf8 feat(hashing): compute both composite hash & individual files hash for multi-file archives
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-23 12:04:04 +00:00
Georges-Antoine Assi
1be2ca2b3c simplify 2026-05-21 17:17:30 -04:00
copilot-swe-agent[bot]
98bc9a9eea Optimize multi-ROM exclusion matching pass
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-21 18:52:55 +00:00
copilot-swe-agent[bot]
5a1e238a5f perf: pre-normalize exclusions once and use set for O(1) lookup in exclude_multi_roms
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-21 18:50:45 +00:00
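This optimization normalizes the exclusion list once, outside the loop, and then tests membership against a set. A minimal sketch follows. The `filter_multi_roms` name and the strip-plus-lowercase normalization are assumptions, not the actual `exclude_multi_roms` code.

```python
from collections.abc import Iterable


def filter_multi_roms(rom_names: Iterable[str], excluded: Iterable[str]) -> list[str]:
    # Normalize the exclusions once (O(m)) instead of on every comparison.
    # Case-insensitive matching is an assumption of this sketch.
    excluded_set = {name.strip().lower() for name in excluded if name.strip()}
    # Each membership test is then an average O(1) set lookup.
    return [name for name in rom_names if name.strip().lower() not in excluded_set]
```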
copilot-swe-agent[bot]
9e3f85b085 Fix ES-DE multi-folder exclusion matching
Agent-Logs-Url: https://github.com/rommapp/romm/sessions/2213cb94-9971-48a6-8d17-9efc5c209db4

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-21 11:22:21 +00:00
Georges-Antoine Assi
591b07ec49 changes from bot review 2026-05-18 14:44:52 -04:00
Georges-Antoine Assi
e6d4ede939 cleanup 2026-05-18 07:40:59 -04:00
Spinnich
01f0b1d2b5 feat(hashing): compute raw CHD hashes and route disc-data SHA1 to Hasheous
CHD files now follow the same hash logic as all other file types — CRC32,
MD5, and SHA1 are computed from raw container bytes. This allows
ScreenScraper to log KO entries for unrecognised CHD files, which it
could not do when only the disc-data SHA1 was being computed.

The CHD header SHA1 (disc-data SHA1) is separately extracted and stored
in a new chd_sha1_hash field on RomFile, with a migration adding the
column to rom_files. Hasheous receives only this disc-data SHA1 (no
CRC/MD5) since it indexes disc-based games by disc-data SHA1, not raw
file hashes.

The RAHasher multi-file path now passes the largest CHD directly instead
of a /* wildcard, which RAHasher cannot expand. Hash computations are
wrapped in asyncio.to_thread to avoid blocking the event loop during
large reads.

Hash-lookup metadata handlers (ScreenScraper, Hasheous, Playmatch) now
fall back to rom.files (stored DB hashes) when fs_rom files are not
rehashed, fixing hash-based matching for UNMATCHED and UPDATE scan types.

The Disc SHA-1 is displayed in the ROM detail view for both single-file
(FileInfo.vue) and multi-file (FileSelectItem.vue) CHD games.
2026-05-17 08:01:05 -04:00
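Three parts of this commit can be shown in isolation: reading SHA1 values from a CHD v5 header, picking the largest CHD for RAHasher, and offloading blocking reads with `asyncio.to_thread`. The offsets below follow the 124-byte CHD v5 header layout (raw-data SHA1 at byte 64, combined raw+metadata SHA1 at byte 84). This log does not say which of the two RomM stores in `chd_sha1_hash`, and all function names are hypothetical.

```python
import asyncio
import struct
from pathlib import Path
from typing import Optional

CHD_MAGIC = b"MComprHD"
CHD_V5_HEADER_LEN = 124


def read_chd_v5_sha1s(path: Path) -> Optional[dict[str, str]]:
    """Return both SHA1s from a CHD v5 header, or None if not a v5 CHD."""
    with path.open("rb") as fh:
        header = fh.read(CHD_V5_HEADER_LEN)
    if len(header) < CHD_V5_HEADER_LEN or header[:8] != CHD_MAGIC:
        return None
    # Header length and version are big-endian uint32s after the magic tag.
    _header_len, version = struct.unpack(">II", header[8:16])
    if version != 5:
        return None  # older CHD versions use a different header layout
    return {
        "raw_sha1": header[64:84].hex(),  # SHA1 of the raw disc data
        "sha1": header[84:104].hex(),  # SHA1 of raw data plus metadata
    }


def largest_chd(paths: list[Path]) -> Path:
    # Pass RAHasher one concrete file instead of an unexpanded "/*" wildcard.
    # Raises ValueError if no .chd file is present.
    return max(
        (p for p in paths if p.suffix.lower() == ".chd"),
        key=lambda p: p.stat().st_size,
    )


async def read_chd_v5_sha1s_async(path: Path) -> Optional[dict[str, str]]:
    # Run the blocking file read in a worker thread, off the event loop.
    return await asyncio.to_thread(read_chd_v5_sha1s, path)
```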
Georges-Antoine Assi
5e3a2707b0 cleanup 2026-05-03 19:39:19 -04:00
copilot-swe-agent[bot]
9593c30292 Address PR review: normalize exclusion sets, avoid duplicates, add multi-dot test for get_rom_files
Agent-Logs-Url: https://github.com/rommapp/romm/sessions/8cbbc2ca-a3e3-4c61-9e47-f8544d59231a

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-03 23:34:30 +00:00
copilot-swe-agent[bot]
101629628e Simplify extension exclusion to use ends-with check instead of sub-extension iteration
Agent-Logs-Url: https://github.com/rommapp/romm/sessions/a81b2023-a243-4721-bc5e-c6fa1a473a79

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-03 22:46:21 +00:00
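The ends-with approach from these commits can be sketched with `str.endswith`, which accepts a tuple of suffixes. A compound exclusion such as "hash.txt" then matches `game.nds.hash.txt` without iterating over sub-extensions. The function names and normalization rules (trim, lowercase, strip leading dots, drop blanks) are assumptions for illustration.

```python
from collections.abc import Iterable


def build_excluded_suffixes(excluded_exts: Iterable[str]) -> tuple[str, ...]:
    # Normalize once: trim, lowercase, drop leading dots and blanks, dedupe.
    cleaned = {ext.strip().lower().lstrip(".") for ext in excluded_exts}
    return tuple(sorted(f".{ext}" for ext in cleaned if ext))


def is_excluded_file(file_name: str, suffixes: tuple[str, ...]) -> bool:
    # An empty tuple never matches, so no exclusions means nothing is excluded.
    return file_name.lower().endswith(suffixes)
```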
copilot-swe-agent[bot]
55cd0cfc4f Support compound suffix exclusions like "hash.txt" for multi-dot filenames
Agent-Logs-Url: https://github.com/rommapp/romm/sessions/d1c69638-bfa0-480e-8050-d565b234ea44

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-03 01:29:04 +00:00
copilot-swe-agent[bot]
21de7e21f8 Fix file exclusion for multi-dot filenames (e.g. game.nds.hash.txt)
Agent-Logs-Url: https://github.com/rommapp/romm/sessions/2f711770-100b-4e9e-a66e-ab1a74f025f8

Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-05-02 16:50:30 +00:00
Georges-Antoine Assi
962a9bfa7e one more 2026-04-30 14:39:52 -04:00
Georges-Antoine Assi
fc8d69dc0c update tests 2026-04-30 14:18:49 -04:00
Georges-Antoine Assi
96c3634b80 refactor: split HIGH_PRIO_STRUCTURE_PATH into STRUCTURE_PATH_A/B
Replace the single HIGH_PRIO_STRUCTURE_PATH config attribute with two
glob patterns (STRUCTURE_PATH_A = roms/*, STRUCTURE_PATH_B = */roms) and
update all call sites to detect Structure B via glob.glob, defaulting to
Structure A when no match is found.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 12:52:22 -04:00
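A minimal sketch of the detection rule described above, using the Structure B glob pattern named in the commit (Structure A, `roms/*`, is the fallback and needs no glob). The `detect_structure` name and its "A"/"B" return values are hypothetical. `library_root` is assumed to be the library directory containing either `roms/{platform}` or `{platform}/roms`.

```python
import glob
import os

STRUCTURE_PATH_B = "*/roms"  # {library}/{platform}/roms


def detect_structure(library_root: str) -> str:
    # Structure B wins only if its glob matches something on disk; otherwise
    # fall back to Structure A (roms/*), including for an empty library.
    if glob.glob(os.path.join(library_root, STRUCTURE_PATH_B)):
        return "B"
    return "A"
```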
Georges-Antoine Assi
e8a6e9f01d final fixes 2026-04-12 18:43:24 -04:00
Georges-Antoine Assi
d45afb5dde more fixes 2026-04-12 18:32:15 -04:00
Georges-Antoine Assi
628d8d8bae refactor: pass RAGamesPlatform dict into calculate_hash, normalize extension
Callers now pass the full platform dict and rom.fs_extension; the service
normalizes the extension (optional leading dot, case-insensitive) before
checking the compressed-archive skip set, so ROMs stored with bare
extensions like "zip" correctly hit the skip path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 18:05:44 -04:00
Georges-Antoine Assi
8f1b8f41d7 perf: skip RAHasher subprocess for archived disc-platform ROMs
RAHasher was being spawned for every hashable ROM regardless of file
type. When the source file is a zip/7z/tar and the RA platform needs
an on-disk disc image (PSX, PS2, PSP, Saturn, Dreamcast, Sega CD,
3DO, PC-FX, Neo Geo CD, TurboGrafx CD, Atari Jaguar CD, Wii), the
subprocess fails with "Unsupported console for buffer hash: {id}"
after paying full process-spawn overhead per ROM — a serious slowdown
when indexing large zipped collections (e.g. myrient PS2/PSP sets).

calculate_hash now short-circuits those combinations with a debug log
and no subprocess. Raw disc images (.iso, .chd, .cue/.bin) and
archives on cartridge platforms still go through RAHasher as before.

Also centralize COMPRESSED_FILE_EXTENSIONS in utils/filesystem.py so
roms_handler (is_compressed_file / hashing), rahasher (skip logic),
and feeds (PKGi passthrough) share one source of truth. The shared
set adds .rar, which is_compressed_file now recognizes too.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 17:18:14 -04:00
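The short-circuit can be sketched as a pure predicate, combined with the extension normalization added later in 628d8d8bae. The platform slugs and the contents of both sets below are illustrative assumptions. The commit names the affected consoles and mentions zip/7z/tar plus .rar, but not the exact values. `should_skip_rahasher` is a hypothetical name, not the actual `calculate_hash` code.

```python
# Illustrative subsets: the commit centralizes COMPRESSED_FILE_EXTENSIONS in
# utils/filesystem.py; the disc-platform slugs here are hypothetical.
COMPRESSED_FILE_EXTENSIONS = frozenset({".zip", ".7z", ".tar", ".rar"})
DISC_IMAGE_PLATFORMS = frozenset({"psx", "ps2", "psp", "saturn", "dc", "segacd"})


def normalize_extension(ext: str) -> str:
    # Accept "zip", ".zip" or ".ZIP" alike: optional leading dot, any case.
    ext = ext.strip().lower().lstrip(".")
    return f".{ext}" if ext else ""


def should_skip_rahasher(platform_slug: str, fs_extension: str) -> bool:
    # Only archived ROMs on disc-image platforms skip the subprocess; raw
    # disc images and archives on cartridge platforms are still hashed.
    return (
        platform_slug in DISC_IMAGE_PLATFORMS
        and normalize_extension(fs_extension) in COMPRESSED_FILE_EXTENSIONS
    )
```

Keeping the decision in a side-effect-free function means the per-ROM cost of the skip check is a pair of set lookups rather than a process spawn.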
Georges-Antoine Assi
f9f3dfd927 changes from bot review 2026-04-12 09:50:54 -04:00
Georges-Antoine Assi
ec8583016b mega ton of fixes for 4.8 2026-04-03 10:54:31 -04:00
Georges-Antoine Assi
744d92d6d1 lower fs name 2026-03-11 21:05:01 -04:00
copilot-swe-agent[bot]
24fe5b941f refactor: move get_pico8_cover_url to FSRomsHandler, use validate_path for safe path construction
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
2026-03-11 22:17:22 +00:00
Georges-Antoine Assi
ee8b55e6ef last set of changes 2026-03-07 09:56:17 -05:00
Georges-Antoine Assi
8a56e9b333 [ROMM-3026] Region/language shortcodes should be case sensitive 2026-02-18 10:19:12 -05:00
Georges-Antoine Assi
f867968f37 refactor get_rom_files return value 2025-12-30 11:42:38 -05:00
Georges-Antoine Assi
0971026f95 Add support for version tag 2025-12-30 11:37:06 -05:00
Zurdi
0d9a2e9380 Update backend/handler/filesystem/roms_handler.py
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-20 15:21:02 +01:00
zurdi
c82caa81b8 Add calculate hashes option to scanning process and update translations 2025-11-20 10:49:22 +00:00
sftwninja
7581c0a8e8 fix: Address Gemini PR comments 2025-11-17 01:55:55 -06:00
sftwninja
90a5a66a12 Use internal SHA1 hash if CHD file is v5 2025-11-16 23:41:32 -06:00
Georges-Antoine Assi
f8b0ae63a1 fix scanning multi file games with ssfr 2025-10-31 10:50:51 -04:00
Georges-Antoine Assi
24a5acce5d [ROMM-2552] Rom hashes should only include top-level nested files 2025-10-18 18:05:57 -04:00
Georges-Antoine Assi
b5776be475 Split rom.multi into more specific fields 2025-09-25 18:48:27 -04:00
Georges-Antoine Assi
3c4113f8a8 Merge branch 'master' into flashpoint-metadata-handler 2025-09-11 21:27:48 -04:00
Michael Manganiello
e4e3928d1b misc: Apply import sorting 2025-09-04 11:17:00 -03:00
Georges-Antoine Assi
ef2546ec08 fix base handler filename 2025-08-27 12:40:16 -04:00
Georges-Antoine Assi
82f527b3ad Remove check for non extension 2025-08-20 20:48:30 -04:00
Georges-Antoine Assi
8fb4769776 changes from code review 2025-08-13 14:03:45 -04:00