torchcodec>=0.6.0 (the upstream default) resolves to 0.10.0+, which
requires PyTorch 2.9. Scriberr ships PyTorch 2.8.x, so loading fails
with a C++ ABI symbol mismatch. Pin to ~=0.7.0, the last release
compatible with PyTorch 2.8.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The override-dependencies key was placed after [tool.uv.sources], causing
it to be parsed as tool.uv.sources.override-dependencies instead of
tool.uv.override-dependencies. uv would silently ignore it, meaning
torchcodec was never actually excluded on Linux aarch64.
https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
- Default sortformer output format to json; the RTTM path fails silently
  on NeMo annotation objects, producing zero diarization segments
- Exclude torchcodec on Linux aarch64 via uv platform marker; no
  wheels exist for any torchcodec version on manylinux aarch64, causing
  pyannote environment setup to fail entirely on ARM64 Docker
- Add diarization model selector to WhisperX config UI; Parakeet and
  Canary sections already had this but WhisperX was missing it, making
  it impossible to select nvidia_sortformer as the diarization backend
https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
All six test functions now have // TestFoo verifies... comments
matching the project's existing convention.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from raw t.Errorf to testify/assert for consistency with the
rest of the codebase. Use t.Setenv() instead of manual os.Setenv/defer
os.Unsetenv for automatic cleanup. Simplify table structs where min
and max are always equal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The QUEUE_WORKERS environment variable was defined and read in
getOptimalWorkerCount(), but NewTaskQueue() unconditionally overwrote
the result with the hardcoded legacyWorkers parameter (always 2).
This made QUEUE_WORKERS effectively dead code.
Now legacyWorkers is only used as a fallback when QUEUE_WORKERS is
not set, preserving the default of 2 workers while allowing users
to control concurrency via the environment variable.
Set QUEUE_WORKERS=1 to serialize all transcription jobs and prevent
system overload during bulk uploads.
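The fallback order described above can be sketched as follows. This is a minimal illustration, not the project's code: resolveWorkerCount is a hypothetical helper, and treating non-numeric or non-positive values as "not set" is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// resolveWorkerCount returns the worker count parsed from the raw
// QUEUE_WORKERS value when it is a positive integer, and legacyWorkers
// otherwise. Hypothetical helper: the real logic is split between
// getOptimalWorkerCount() and NewTaskQueue().
func resolveWorkerCount(raw string, legacyWorkers int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		// Unset, non-numeric, or non-positive: keep the legacy default.
		return legacyWorkers
	}
	return n
}

func main() {
	// 2 mirrors the legacy default mentioned in the commit message.
	workers := resolveWorkerCount(os.Getenv("QUEUE_WORKERS"), 2)
	fmt.Println("workers:", workers)
}
```

The helper takes the raw string rather than reading the environment itself, which keeps it testable without touching process state.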
Fixes: rishikanthc/Scriberr#379
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests verifying that getOptimalWorkerCount() respects the
QUEUE_WORKERS environment variable and that NewTaskQueue() lets
QUEUE_WORKERS override the hardcoded legacy worker count.
Includes a failing test (TestNewTaskQueue_EnvOverridesLegacy) that
reproduces the bug where QUEUE_WORKERS is always overridden by the
hardcoded legacyWorkers parameter.
Ref: rishikanthc/Scriberr#379
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add option to include speaker labels in summary prompts when diarization
is available. When enabled, transcripts are formatted as:
[SPEAKER_NAME] Text here...
The prompt also includes a hint to the LLM that speaker labels are present,
helping it produce summaries that attribute statements to specific speakers.
Changes:
- Add IncludeSpeakerInfo field to SummaryTemplate model
- Add toggle UI in summary template dialog
- Format transcript with speaker labels when generating summary
- Update prompt prefix to indicate speaker labels are present
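A rough sketch of the speaker-label formatting described above. The Segment type and formatWithSpeakers function are hypothetical names chosen for illustration; only the "[SPEAKER_NAME] Text" layout comes from this commit.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical, simplified transcript segment.
type Segment struct {
	Speaker string
	Text    string
}

// formatWithSpeakers renders one line per segment in the
// "[SPEAKER_NAME] Text" layout. Segments without a speaker are
// emitted as plain text (an assumption about the desired behavior).
func formatWithSpeakers(segments []Segment) string {
	var b strings.Builder
	for _, s := range segments {
		if s.Speaker != "" {
			fmt.Fprintf(&b, "[%s] ", s.Speaker)
		}
		b.WriteString(strings.TrimSpace(s.Text))
		b.WriteByte('\n')
	}
	return b.String()
}

func main() {
	fmt.Print(formatWithSpeakers([]Segment{
		{Speaker: "SPEAKER_00", Text: "Welcome, everyone."},
		{Speaker: "SPEAKER_01", Text: "Thanks for having me."},
	}))
}
```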
Closes #353
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.
Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options
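The fallback in the adapters could look roughly like this. resolveHFToken is a hypothetical helper, and trimming whitespace is an assumption; the commit only states that the UI token takes precedence and HF_TOKEN is the fallback.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveHFToken prefers the token entered in the UI and falls back to
// the value read from the HF_TOKEN environment variable.
func resolveHFToken(uiToken, envToken string) string {
	if t := strings.TrimSpace(uiToken); t != "" {
		return t
	}
	return strings.TrimSpace(envToken)
}

func main() {
	// An empty first argument simulates a job submitted without a UI token.
	token := resolveHFToken("", os.Getenv("HF_TOKEN"))
	if token == "" {
		fmt.Println("no Hugging Face token configured")
		return
	}
	fmt.Println("Hugging Face token found")
}
```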
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.
- Add --segmentation-onset and --segmentation-offset CLI args to
  pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
  selected (Whisper, Parakeet, Canary model families)
Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.
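As an illustration of the parameter mapping, the sketch below turns vad_onset/vad_offset into the two CLI flags named above. The function name and the map-based parameter shape are assumptions; the actual adapter may carry these values differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// segmentationArgs maps the existing vad_onset/vad_offset parameters to
// the --segmentation-onset/--segmentation-offset flags. Parameters that
// are absent from the map produce no flag.
func segmentationArgs(params map[string]float64) []string {
	var args []string
	if v, ok := params["vad_onset"]; ok {
		args = append(args, "--segmentation-onset", strconv.FormatFloat(v, 'f', -1, 64))
	}
	if v, ok := params["vad_offset"]; ok {
		args = append(args, "--segmentation-offset", strconv.FormatFloat(v, 'f', -1, 64))
	}
	return args
}

func main() {
	// A lower onset, as suggested above for quiet or distant speakers.
	fmt.Println(segmentationArgs(map[string]float64{"vad_onset": 0.35, "vad_offset": 0.3}))
}
```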
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
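A minimal sketch of how the directories might be derived from the environment. The Dirs struct, loadDirs, and the default database file name are hypothetical; the "data/temp" default is assumed to carry over from the previously hardcoded path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Dirs is a hypothetical subset of the config relevant to this change.
type Dirs struct {
	TempDir string // from TEMP_DIR
	DataDir string // directory part of DATABASE_PATH
}

// envOr returns the value for key, or fallback when it is unset or empty.
func envOr(lookup func(string) (string, bool), key, fallback string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return fallback
}

// loadDirs resolves directories through lookup, which has the same
// signature as os.LookupEnv so tests can substitute a fake.
func loadDirs(lookup func(string) (string, bool)) Dirs {
	// "data/scriberr.db" is an assumed default, not taken from the source.
	dbPath := envOr(lookup, "DATABASE_PATH", "data/scriberr.db")
	return Dirs{
		TempDir: envOr(lookup, "TEMP_DIR", "data/temp"),
		DataDir: filepath.Dir(dbPath),
	}
}

func main() {
	fmt.Printf("%+v\n", loadDirs(os.LookupEnv))
}
```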
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common
On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
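The chunking arithmetic can be sketched as below. The real logic lives in the Python script voxtral_transcribe_buffered.py; this Go version only illustrates the 30-minute threshold and 25-minute chunk size, and the space separator used when joining chunk texts is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	bufferedThresholdSec = 30 * 60 // audio longer than this uses buffered mode
	chunkSec             = 25 * 60 // length of each chunk in buffered mode
)

// chunk is a half-open [Start, End) range in seconds.
type chunk struct{ Start, End int }

// planChunks returns a single chunk for short audio and consecutive
// 25-minute chunks (the last one possibly shorter) for long audio.
func planChunks(totalSec int) []chunk {
	if totalSec <= bufferedThresholdSec {
		return []chunk{{0, totalSec}}
	}
	var chunks []chunk
	for start := 0; start < totalSec; start += chunkSec {
		end := start + chunkSec
		if end > totalSec {
			end = totalSec
		}
		chunks = append(chunks, chunk{start, end})
	}
	return chunks
}

// joinChunkTexts concatenates per-chunk transcripts; no timestamp
// adjustment is needed because the model output is text-only.
func joinChunkTexts(texts []string) string {
	return strings.Join(texts, " ")
}

func main() {
	fmt.Println(planChunks(70 * 60))
	fmt.Println(joinChunkTexts([]string{"first part.", "second part."}))
}
```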
- Increase max_new_tokens default from 500 to 4096 tokens
- Increase max_new_tokens maximum from 2000 to 8192 tokens
- Increase max_new_tokens minimum from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
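For illustration, the new limits could be enforced on the backend roughly as follows. This assumes the limits apply to max_new_tokens and that out-of-range values are clamped; the commit itself only describes the changed bounds, so normalizeMaxNewTokens is a hypothetical helper.

```go
package main

import "fmt"

const (
	defaultMaxNewTokens = 4096
	minMaxNewTokens     = 512
	maxMaxNewTokens     = 8192
)

// normalizeMaxNewTokens applies the default when no value is given
// (zero or negative) and clamps everything else into [512, 8192].
func normalizeMaxNewTokens(requested int) int {
	switch {
	case requested <= 0:
		return defaultMaxNewTokens
	case requested < minMaxNewTokens:
		return minMaxNewTokens
	case requested > maxMaxNewTokens:
		return maxMaxNewTokens
	default:
		return requested
	}
}

func main() {
	for _, v := range []int{0, 100, 2000, 20000} {
		fmt.Println(v, "->", normalizeMaxNewTokens(v))
	}
}
```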
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features
Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.
Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with CUDA 12.9.1 base image,
PYTORCH_CUDA_VERSION=cu128, and missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README
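A small sketch of deriving the wheel index URL from PYTORCH_CUDA_VERSION. The function name is hypothetical, and falling back to cu126 when the variable is unset is an assumption based on the "legacy" wording above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultCUDAVersion is assumed to match the previously hardcoded value.
const defaultCUDAVersion = "cu126"

// pytorchIndexURL builds the PyTorch wheel index URL for a CUDA tag
// such as "cu126" or "cu128".
func pytorchIndexURL(cudaVersion string) string {
	v := strings.TrimSpace(cudaVersion)
	if v == "" {
		v = defaultCUDAVersion
	}
	return "https://download.pytorch.org/whl/" + v
}

func main() {
	fmt.Println(pytorchIndexURL(os.Getenv("PYTORCH_CUDA_VERSION")))
}
```

With PYTORCH_CUDA_VERSION=cu128, as set in the Blackwell Dockerfile, the adapters would then install from the cu128 index instead of the cu126 one.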
Fixes: rishikanthc/Scriberr#104
- Change cookie SameSite policy from Strict to Lax (Strict blocks media subresources on mobile)
- Decouple Secure cookie flag from APP_ENV:
  - Add SECURE_COOKIES config (defaults to true in prod, but can be overridden)
  - Allows testing production builds over HTTP (home network)
- Increase gocyclo threshold to 25 to accommodate complex handlers
- Fix refresh token cookie Secure flag bug (was hardcoded to false)
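The cookie changes above can be sketched with the standard library as follows. The helper names and the cookie name are hypothetical, and parsing SECURE_COOKIES as a boolean string is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
)

// secureCookiesEnabled honors an explicit SECURE_COOKIES override and
// otherwise defaults to true only when APP_ENV is "production".
func secureCookiesEnabled(appEnv, override string) bool {
	if v, err := strconv.ParseBool(override); err == nil {
		return v
	}
	return appEnv == "production"
}

// newSessionCookie builds a cookie with SameSite=Lax and a Secure flag
// driven by configuration rather than a hardcoded value.
func newSessionCookie(name, value string, secure bool) *http.Cookie {
	return &http.Cookie{
		Name:     name,
		Value:    value,
		Path:     "/",
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	}
}

func main() {
	secure := secureCookiesEnabled(os.Getenv("APP_ENV"), os.Getenv("SECURE_COOKIES"))
	fmt.Println(newSessionCookie("refresh_token", "example", secure).String())
}
```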
- Wire up AllowedOrigins config in CORS middleware (router, handlers, chat, SSE)
- Add APP_ENV=production to Dockerfile and Dockerfile.cuda
- Update all docker-compose files with APP_ENV and ALLOWED_ORIGINS examples
- CORS now validates origins in production, allows all in development
- Increase gocyclo threshold from 20 to 25 for complex handlers
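The origin check described above might look like this in isolation. Both helpers are hypothetical, and the comma-separated ALLOWED_ORIGINS format is an assumption; the real middleware wiring is not shown.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseAllowedOrigins splits a comma-separated list and drops blanks.
func parseAllowedOrigins(raw string) []string {
	var origins []string
	for _, part := range strings.Split(raw, ",") {
		if o := strings.TrimSpace(part); o != "" {
			origins = append(origins, o)
		}
	}
	return origins
}

// originAllowed accepts every origin in development and only listed
// origins in production.
func originAllowed(origin string, allowed []string, production bool) bool {
	if !production {
		return true
	}
	for _, a := range allowed {
		if a == origin {
			return true
		}
	}
	return false
}

func main() {
	allowed := parseAllowedOrigins(os.Getenv("ALLOWED_ORIGINS"))
	production := os.Getenv("APP_ENV") == "production"
	fmt.Println(originAllowed("https://scriberr.example", allowed, production))
}
```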
Phase 5: Refactor queue.go (10 DB calls removed)
- Added JobRepository to TaskQueue struct and constructor
- Added UpdateStatus, UpdateError, FindByStatus, CountByStatus methods to JobRepository
- Replaced all database.DB calls with repository methods
Phase 6: Refactor chat_handlers.go and summarize_handlers.go (6 DB calls removed)
- Added GetMessageCountsBySessionIDs and GetLastMessagesBySessionIDs to ChatRepository
- Added UpdateSummary to JobRepository
- Replaced batch queries and update calls with repository methods
- Removed database import from both files
Phase 7: Refactor quick_transcription.go (3 DB calls removed)
- Added JobRepository injection to QuickTranscriptionService
- Updated constructor and all callers
Summary: 46+ database.DB calls replaced with repository methods across 7 phases.
All tests pass, build succeeds.
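To illustrate the shape of this refactor, the sketch below injects a repository into a queue instead of calling a global database handle. The method signatures are simplified assumptions (the real JobRepository is backed by GORM), and memoryJobRepo is an in-memory stand-in of the kind a test mock might use.

```go
package main

import (
	"errors"
	"fmt"
)

// JobRepository is a simplified, hypothetical version of the interface.
type JobRepository interface {
	UpdateStatus(id, status string) error
	CountByStatus(status string) int
}

// memoryJobRepo is an in-memory implementation for illustration.
type memoryJobRepo struct{ status map[string]string }

func newMemoryJobRepo() *memoryJobRepo {
	return &memoryJobRepo{status: map[string]string{}}
}

func (r *memoryJobRepo) Add(id, status string) { r.status[id] = status }

func (r *memoryJobRepo) UpdateStatus(id, status string) error {
	if _, ok := r.status[id]; !ok {
		return errors.New("job not found: " + id)
	}
	r.status[id] = status
	return nil
}

func (r *memoryJobRepo) CountByStatus(status string) int {
	n := 0
	for _, s := range r.status {
		if s == status {
			n++
		}
	}
	return n
}

// queue depends on the interface, not on a global database.DB.
type queue struct{ jobs JobRepository }

func newQueue(jobs JobRepository) *queue { return &queue{jobs: jobs} }

func (q *queue) markCompleted(id string) error {
	return q.jobs.UpdateStatus(id, "completed")
}

func main() {
	repo := newMemoryJobRepo()
	repo.Add("job-1", "pending")
	q := newQueue(repo)
	fmt.Println(q.markCompleted("job-1"), repo.CountByStatus("completed"))
}
```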
Phase 1: Define interfaces
- Created internal/interfaces/ package with AuthServiceInterface, TaskQueueInterface, JobProcessorInterface
Phase 2: Refactor handlers.go (21 DB calls removed)
- Replaced all database.DB calls with repository methods
- Added RefreshTokenRepository for token management
- Added new repository methods: Count, FindActiveTrackJobs, FindLatestCompletedExecution, FindByName
Phase 3: Refactor dropzone.go (3 DB calls removed)
- Added CountWithAutoTranscription to UserRepository
- Injected JobRepository and UserRepository into Service
Phase 4: Refactor multitrack_processor.go
- Changed constructor to accept *gorm.DB and JobRepository
- Updated Handler to inject MultiTrackProcessor
Updated all test files with new dependencies and mock implementations.
The jobScanner was running every 10 seconds and re-enqueueing jobs that
were already in the queue but hadn't started processing yet. This caused
completed files to be re-transcribed when auto-transcribe was enabled.
Changes:
- Removed jobScanner goroutine (10-second polling loop)
- Removed scanPendingJobs function
- Added recoverPendingJobs that runs ONCE at startup to recover
  any pending jobs left from previous server runs
- Jobs are now only enqueued when explicitly requested:
  - Upload with auto-transcribe enabled
  - Manual transcription start
  - Server restart recovery (one-time)
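A rough sketch of one-time recovery, assuming pending jobs are loaded once and handed to an enqueue callback. The job type and the function signature are simplified assumptions and do not mirror the real recoverPendingJobs.

```go
package main

import "fmt"

// job is a hypothetical, minimal job record.
type job struct{ ID, Status string }

// recoverPendingJobs enqueues every pending job exactly once and returns
// how many were recovered. It is meant to be called a single time at
// startup, not from a polling loop.
func recoverPendingJobs(jobs []job, enqueue func(id string)) int {
	recovered := 0
	for _, j := range jobs {
		if j.Status == "pending" {
			enqueue(j.ID)
			recovered++
		}
	}
	return recovered
}

func main() {
	jobs := []job{{"a", "pending"}, {"b", "completed"}, {"c", "pending"}}
	n := recoverPendingJobs(jobs, func(id string) { fmt.Println("enqueue", id) })
	fmt.Println("recovered:", n)
}
```

Because nothing re-scans the table afterwards, a job that is already queued but not yet started cannot be enqueued a second time.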