Commit Graph

290 Commits

Author SHA1 Message Date
rishikanthc
4dc6810015 Add partial index predicate regression tests 2026-04-23 12:32:36 -07:00
rishikanthc
5993f418d0 Fix partial index WHERE extraction 2026-04-23 12:30:40 -07:00
rishikanthc
42ff560afc Harden partial index predicate verification 2026-04-23 12:29:02 -07:00
rishikanthc
4960a2d528 Finalize database migration correctness fixes 2026-04-23 12:22:01 -07:00
rishikanthc
a5f88fb638 Fix database migration backfill regressions 2026-04-23 12:10:36 -07:00
rishikanthc
f926509ac9 Harden database schema detection and migration errors 2026-04-23 12:01:25 -07:00
rishikanthc
5d6a60d793 Refactor database migration and persistence layer 2026-04-23 11:38:25 -07:00
rishikanthc
c5f758cddd repo: make execution creation deterministic and add user-scoped APIs 2026-04-23 11:24:13 -07:00
rishikanthc
fc3e933104 db: enforce single default per user and migration normalization 2026-04-23 11:24:08 -07:00
rishikanthc
5067a40790 test database migration with real sqlite fixtures 2026-04-23 10:35:59 -07:00
rishikanthc
e3a7c48bd7 fix schema compatibility gaps after migration 2026-04-23 10:35:15 -07:00
rishikanthc
0be71a63a0 refactor database schema and legacy migration flow 2026-04-23 10:17:03 -07:00
Booth
bdb8838b8b fix: pin torchcodec to 0.7.x for compatibility with PyTorch 2.8.x
torchcodec>=0.6.0 (upstream default) resolves to 0.10.0+ which requires
PyTorch 2.9. Scriberr ships PyTorch 2.8.x, causing a C++ ABI symbol
mismatch at load time. Pin to ~=0.7.0, the last release compatible with
PyTorch 2.8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:41:40 -07:00
Claude
9fd0943b92 fix: move override-dependencies to correct TOML scope [tool.uv]
The override-dependencies key was placed after [tool.uv.sources], causing
it to be parsed as tool.uv.sources.override-dependencies instead of
tool.uv.override-dependencies. uv would silently ignore it, meaning
torchcodec was never actually excluded on Linux aarch64.

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
Claude
6221843866 fix: restore diarization on Linux ARM64 and add WhisperX model selector
- Default sortformer output format to json; RTTM path fails silently
  on NeMo annotation objects, producing zero diarization segments
- Exclude torchcodec on Linux aarch64 via uv platform marker; no
  wheels exist for any torchcodec version on manylinux aarch64, causing
  pyannote environment setup to fail entirely on ARM64 Docker
- Add diarization model selector to WhisperX config UI; Parakeet and
  Canary sections already had this but WhisperX was missing it, making
  it impossible to select nvidia_sortformer as the diarization backend

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
scnerd
77c3365a4a Add TODO comment noting that this fix works around technical debt; the legacy workers argument should be removed entirely. 2026-04-21 10:38:09 -07:00
scnerd
d0a1dcbd6c docs(queue): add comments to test functions for consistency
All six test functions now have // TestFoo verifies... comments
matching the project's existing convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
scnerd
8b3f25f801 refactor(queue): use testify assertions and t.Setenv in queue tests
Switch from raw t.Errorf to testify/assert for consistency with the
rest of the codebase. Use t.Setenv() instead of manual os.Setenv/defer
os.Unsetenv for automatic cleanup. Simplify table structs where min
and max are always equal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
scnerd
8d05a7cdd9 fix(queue): allow QUEUE_WORKERS env var to override hardcoded worker count
The QUEUE_WORKERS environment variable was defined and read in
getOptimalWorkerCount(), but NewTaskQueue() unconditionally overwrote
the result with the hardcoded legacyWorkers parameter (always 2).
This made QUEUE_WORKERS effectively dead code.

Now legacyWorkers is only used as a fallback when QUEUE_WORKERS is
not set, preserving the default of 2 workers while allowing users
to control concurrency via the environment variable.

Set QUEUE_WORKERS=1 to serialize all transcription jobs and prevent
system overload during bulk uploads.

Fixes: rishikanthc/Scriberr#379

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
scnerd
1ab08ddc11 test(queue): add unit tests for QUEUE_WORKERS env var behavior
Add tests verifying that getOptimalWorkerCount() respects the
QUEUE_WORKERS environment variable and that NewTaskQueue() should
allow QUEUE_WORKERS to override the hardcoded legacy worker count.

Includes a failing test (TestNewTaskQueue_EnvOverridesLegacy) that
reproduces the bug where QUEUE_WORKERS is always overridden by the
hardcoded legacyWorkers parameter.

Ref: rishikanthc/Scriberr#379

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
Peter Somlo
175832e8e7 fix: voxtral duration comparison and "auto" language handling
- use shared LANGUAGES constant in frontend config dialog
2026-02-28 10:59:07 -08:00
Fran Fitzpatrick
4e75295019 feat: add speaker identification toggle to summary templates
Add option to include speaker labels in summary prompts when diarization
is available. When enabled, transcripts are formatted as:
[SPEAKER_NAME] Text here...

The prompt also includes a hint to the LLM that speaker labels are present,
helping it produce summaries that attribute statements to specific speakers.

Changes:
- Add IncludeSpeakerInfo field to SummaryTemplate model
- Add toggle UI in summary template dialog
- Format transcript with speaker labels when generating summary
- Update prompt prefix to indicate speaker labels are present

Closes #353

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 10:58:49 -08:00
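
The transcript format described in 4e75295019 can be sketched roughly as below. The Segment type, its field names, and formatTranscript are hypothetical; only the "[SPEAKER_NAME] Text" line shape comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical transcript segment; the field names are
// assumptions made for this sketch.
type Segment struct {
	Speaker string
	Text    string
}

// formatTranscript renders one segment per line. When includeSpeakers is true
// and a segment has a speaker label, the line is prefixed with "[SPEAKER] ".
func formatTranscript(segments []Segment, includeSpeakers bool) string {
	lines := make([]string, 0, len(segments))
	for _, s := range segments {
		if includeSpeakers && s.Speaker != "" {
			lines = append(lines, fmt.Sprintf("[%s] %s", s.Speaker, s.Text))
		} else {
			lines = append(lines, s.Text)
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	segs := []Segment{
		{Speaker: "SPEAKER_00", Text: "Hello there."},
		{Speaker: "SPEAKER_01", Text: "Hi."},
	}
	fmt.Println(formatTranscript(segs, true))
}
```

Segments without a speaker label fall back to plain text, so the same function also covers jobs that ran without diarization.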
Fran Fitzpatrick
850af1fb6e test: update PyAnnote test to reflect optional HF token
The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
Fran Fitzpatrick
ff12270419 feat: add HF_TOKEN environment variable fallback for diarization
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.

Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
Fran Fitzpatrick
f6df31b500 feat: add VAD segmentation thresholds for Pyannote diarization
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.

- Add --segmentation-onset and --segmentation-offset CLI args to
  pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
  selected (Whisper, Parakeet, Canary model families)

Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:20:31 -08:00
Peter Somlo
df5de714c4 fix: make transcription temp and output directories configurable
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
2026-01-07 12:18:26 -08:00
rishikanthc
73a82b9f6b fix auto device detection in voxtral 2025-12-31 15:47:19 -08:00
rishikanthc
ad3053cc9b fix: add Voxtral model selection and fix dependencies
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common

On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
2025-12-31 15:47:19 -08:00
rishikanthc
1485b01488 feat: add buffered transcription for Voxtral to handle long audio
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
2025-12-31 15:47:19 -08:00
rishikanthc
5a947e8739 fix: update Voxtral token limits based on 32k context window
- Default: 4096 → 8192 tokens
- Maximum: 8192 → 16384 tokens
- Minimum: 512 → 1024 tokens
- Voxtral has a 32k context window and handles 30-40 min audio
- Updated UI description to reflect capabilities
2025-12-31 15:47:19 -08:00
rishikanthc
95ecbf6d21 fix: increase Voxtral max_new_tokens to 4096 (max 8192)
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
2025-12-31 15:47:19 -08:00
rishikanthc
1ae7b2bf71 feat: add Voxtral-mini transcription support
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features

Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
2025-12-31 15:47:19 -08:00
rishikanthc
923b39e415 fix: ensure directories exist before writing adapter scripts
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
2025-12-31 15:47:19 -08:00
rishikanthc
2afd6a1ecf fixes #317 2025-12-29 21:11:47 -08:00
Paul Irish
9975e6fb02 fix duplicated openapi annotations pt 2 2025-12-29 21:10:13 -08:00
Paul Irish
a7aaf06bbb fix duplicated openapi annotations 2025-12-29 21:10:13 -08:00
Paul Irish
ab912a6b6e always copy scripts 2025-12-29 21:09:53 -08:00
Paul Irish
7471a2a1b6 Add test suite for python adapter scripts 2025-12-29 21:09:53 -08:00
Paul Irish
50dd4130ff Extract python adapter scripts to proper files 2025-12-29 21:09:53 -08:00
Fran Fitzpatrick
8f537548d4 feat: add RTX 5090 Blackwell GPU support (sm_120)
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.

Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
  PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
  hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with CUDA 12.9.1 base image,
  PYTORCH_CUDA_VERSION=cu128, and missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README

Fixes: rishikanthc/Scriberr#104
2025-12-24 14:46:44 -08:00
rishikanthc
913063eb49 refactor: Switch yt-dlp to standalone binary & cleanup UV config
- Dockerfiles: Install yt-dlp binary from GitHub releases to /usr/local/bin
- Go: Execute yt-dlp binary directly, removing uv python wrapper
- Config: Remove unused UVPath configuration and findUVPath function
- Entrypoint: Remove yt-dlp init logic (still initializes whisperx env if needed)
2025-12-16 19:07:29 -08:00
rishikanthc
f99087b2bd fix: Resolve mobile audio playback permission issues
- Change cookie SameSite policy from Strict to Lax (Strict blocks media subresources on mobile)
- Decouple Secure cookie flag from APP_ENV:
  - Add SECURE_COOKIES config (defaults to true in prod, but can be overridden)
  - Allows testing production builds over HTTP (home network)
- Increase gocyclo threshold to 25 to accommodate complex handlers
2025-12-16 18:21:36 -08:00
rishikanthc
11434b9f1b feat: Add production security configuration for CORS and cookie handling
- Fix refresh token cookie Secure flag bug (was hardcoded to false)
- Wire up AllowedOrigins config in CORS middleware (router, handlers, chat, SSE)
- Add APP_ENV=production to Dockerfile and Dockerfile.cuda
- Update all docker-compose files with APP_ENV and ALLOWED_ORIGINS examples
- CORS now validates origins in production, allows all in development
- Increase gocyclo threshold from 20 to 25 for complex handlers
2025-12-16 18:21:36 -08:00
rishikanthc
c44de7858b refactor: Complete repository pattern migration for all remaining files (Phases 5-7)
Phase 5: Refactor queue.go (10 DB calls removed)
- Added JobRepository to TaskQueue struct and constructor
- Added UpdateStatus, UpdateError, FindByStatus, CountByStatus methods to JobRepository
- Replaced all database.DB calls with repository methods

Phase 6: Refactor chat_handlers.go and summarize_handlers.go (6 DB calls removed)
- Added GetMessageCountsBySessionIDs and GetLastMessagesBySessionIDs to ChatRepository
- Added UpdateSummary to JobRepository
- Replaced batch queries and update calls with repository methods
- Removed database import from both files

Phase 7: Refactor quick_transcription.go (3 DB calls removed)
- Added JobRepository injection to QuickTranscriptionService
- Updated constructor and all callers

Summary: 46+ database.DB calls replaced with repository methods across 7 phases.
All tests pass, build succeeds.
2025-12-16 18:21:36 -08:00
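
The repository pattern referred to in c44de7858b can be illustrated with a small interface and an in-memory implementation. The method names UpdateStatus, FindByStatus, and CountByStatus come from the commit message; their signatures, the Job type, and memoryJobRepo are assumptions, and the real repository is backed by GORM rather than a map.

```go
package main

import (
	"errors"
	"fmt"
)

// Job is a hypothetical, reduced job record.
type Job struct {
	ID     string
	Status string
}

// JobRepository lists methods named in the commit; the signatures are assumed.
type JobRepository interface {
	UpdateStatus(id, status string) error
	FindByStatus(status string) []Job
	CountByStatus(status string) int
}

// memoryJobRepo is an in-memory stand-in used only for this sketch.
type memoryJobRepo struct {
	jobs map[string]*Job
}

func (r *memoryJobRepo) UpdateStatus(id, status string) error {
	j, ok := r.jobs[id]
	if !ok {
		return errors.New("job not found: " + id)
	}
	j.Status = status
	return nil
}

func (r *memoryJobRepo) FindByStatus(status string) []Job {
	var out []Job
	for _, j := range r.jobs {
		if j.Status == status {
			out = append(out, *j)
		}
	}
	return out
}

func (r *memoryJobRepo) CountByStatus(status string) int {
	return len(r.FindByStatus(status))
}

func main() {
	// Callers depend on the interface, not on a global database handle.
	var repo JobRepository = &memoryJobRepo{jobs: map[string]*Job{
		"a": {ID: "a", Status: "pending"},
	}}
	if err := repo.UpdateStatus("a", "processing"); err != nil {
		fmt.Println("update failed:", err)
	}
	fmt.Println(repo.CountByStatus("processing"))
}
```

Injecting the interface is also what lets tests swap in mock implementations, as the neighbouring commits mention.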
rishikanthc
86add0037d refactor: Replace direct database.DB calls with repository pattern in handlers, dropzone, and multitrack_processor
Phase 1: Define interfaces
- Created internal/interfaces/ package with AuthServiceInterface, TaskQueueInterface, JobProcessorInterface

Phase 2: Refactor handlers.go (21 DB calls removed)
- Replaced all database.DB calls with repository methods
- Added RefreshTokenRepository for token management
- Added new repository methods: Count, FindActiveTrackJobs, FindLatestCompletedExecution, FindByName

Phase 3: Refactor dropzone.go (3 DB calls removed)
- Added CountWithAutoTranscription to UserRepository
- Injected JobRepository and UserRepository into Service

Phase 4: Refactor multitrack_processor.go
- Changed constructor to accept *gorm.DB and JobRepository
- Updated Handler to inject MultiTrackProcessor

Updated all test files with new dependencies and mock implementations.
2025-12-16 18:21:36 -08:00
rishikanthc
7fc7619ee6 fix: tests for upstream changes
fix: new tests for chat and user management flows

fix: resolve lint errors

fix: configured lefthook to check entire project
2025-12-16 18:21:36 -08:00
rishikanthc
658a1a5c49 fix: lint errors in go code 2025-12-16 18:21:36 -08:00
rishikanthc
3bbcbcfd63 fix: responsive design 2025-12-15 13:36:12 -08:00
rishikanthc
7deff5b903 fix: change logo, fix chat 2025-12-14 19:09:52 -08:00
rishikanthc
201b3b787c fix: remove timer-based job scanner that caused duplicate transcriptions
The jobScanner was running every 10 seconds and re-enqueueing jobs that
were already in the queue but hadn't started processing yet. This caused
completed files to be re-transcribed when auto-transcribe was enabled.

Changes:
- Removed jobScanner goroutine (10-second polling loop)
- Removed scanPendingJobs function
- Added recoverPendingJobs that runs ONCE at startup to recover
  any pending jobs left from previous server runs
- Jobs are now only enqueued when explicitly requested:
  - Upload with auto-transcribe enabled
  - Manual transcription start
  - Server restart recovery (one-time)
2025-12-14 19:09:52 -08:00
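
The one-time recovery described in 201b3b787c can be sketched as a single pass over stored jobs at startup, in place of a polling loop. The Job and Queue types, the "pending" status string, and the function signature are assumptions; only the name recoverPendingJobs and the run-once behaviour come from the commit message.

```go
package main

import "fmt"

// Job and Queue are hypothetical, reduced types for this sketch.
type Job struct {
	ID     string
	Status string
}

type Queue struct {
	jobs chan string // IDs waiting for a worker
}

// recoverPendingJobs enqueues every job still marked "pending" and returns
// how many were recovered. It is meant to be called once at startup, so a
// job that is already queued is never enqueued a second time by a timer.
func recoverPendingJobs(stored []Job, q *Queue) int {
	recovered := 0
	for _, j := range stored {
		if j.Status == "pending" {
			q.jobs <- j.ID
			recovered++
		}
	}
	return recovered
}

func main() {
	stored := []Job{
		{ID: "a", Status: "pending"},
		{ID: "b", Status: "completed"},
		{ID: "c", Status: "pending"},
	}
	// The buffer is sized to hold every stored job so the sends cannot block.
	q := &Queue{jobs: make(chan string, len(stored))}
	fmt.Println("recovered:", recoverPendingJobs(stored, q))
}
```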