torchcodec>=0.6.0 (the upstream default) resolves to 0.10.0+, which
requires PyTorch 2.9. Scriberr ships PyTorch 2.8.x, causing a C++ ABI
symbol mismatch at load time. Pin to ~=0.7.0, the last release
compatible with PyTorch 2.8.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The override-dependencies key was placed after [tool.uv.sources], causing
it to be parsed as tool.uv.sources.override-dependencies instead of
tool.uv.override-dependencies. uv would silently ignore it, meaning
torchcodec was never actually excluded on Linux aarch64.
https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
- Default sortformer output format to json; RTTM path fails silently
  on NeMo annotation objects, producing zero diarization segments
- Exclude torchcodec on Linux aarch64 via uv platform marker; no
  wheels exist for any torchcodec version on manylinux aarch64, causing
  pyannote environment setup to fail entirely on ARM64 Docker
- Add diarization model selector to WhisperX config UI; Parakeet and
  Canary sections already had this but WhisperX was missing it, making
  it impossible to select nvidia_sortformer as the diarization backend
https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
All six test functions now have // TestFoo verifies... comments
matching the project's existing convention.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from raw t.Errorf to testify/assert for consistency with the
rest of the codebase. Use t.Setenv() instead of manual os.Setenv/defer
os.Unsetenv for automatic cleanup. Simplify table structs where min
and max are always equal.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The QUEUE_WORKERS environment variable was defined and read in
getOptimalWorkerCount(), but NewTaskQueue() unconditionally overwrote
the result with the hardcoded legacyWorkers parameter (always 2).
This made QUEUE_WORKERS effectively dead code.
Now legacyWorkers is only used as a fallback when QUEUE_WORKERS is
not set, preserving the default of 2 workers while allowing users
to control concurrency via the environment variable.
Set QUEUE_WORKERS=1 to serialize all transcription jobs and prevent
system overload during bulk uploads.
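The intended precedence can be sketched as follows. This is a Python illustration of the logic only, not the Go code from this change; resolve_worker_count is a hypothetical name, and treating empty, non-numeric, or non-positive values as "not set" is an assumption.

```python
import os


def resolve_worker_count(legacy_workers, env=None):
    """Prefer QUEUE_WORKERS; fall back to the legacy default otherwise."""
    env = os.environ if env is None else env
    raw = env.get("QUEUE_WORKERS", "").strip()
    if raw:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # assumption: unparsable input counts as unset
        if value > 0:
            return value
    # Unset, empty, invalid, or non-positive: keep the legacy default.
    return legacy_workers


print(resolve_worker_count(2, {}))                      # 2
print(resolve_worker_count(2, {"QUEUE_WORKERS": "1"}))  # 1
```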
Fixes: rishikanthc/Scriberr#379
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests verifying that getOptimalWorkerCount() respects the
QUEUE_WORKERS environment variable and that NewTaskQueue() should
allow QUEUE_WORKERS to override the hardcoded legacy worker count.
Includes a failing test (TestNewTaskQueue_EnvOverridesLegacy) that
reproduces the bug where QUEUE_WORKERS is always overridden by the
hardcoded legacyWorkers parameter.
Ref: rishikanthc/Scriberr#379
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
needed because I was adding a new SpeakerSettings component but
the useAuth hook triggered an infinite recursion bug because of the
window.fetch wrappings.
Add option to include speaker labels in summary prompts when diarization
is available. When enabled, transcripts are formatted as:
[SPEAKER_NAME] Text here...
The prompt also includes a hint to the LLM that speaker labels are present,
helping it produce summaries that attribute statements to specific speakers.
Changes:
- Add IncludeSpeakerInfo field to SummaryTemplate model
- Add toggle UI in summary template dialog
- Format transcript with speaker labels when generating summary
- Update prompt prefix to indicate speaker labels are present
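A minimal sketch of the formatting described above, written in Python for illustration rather than taken from the implementation. The function names, the segment dictionary shape, and the wording of the hint are all assumptions.

```python
# Assumed wording; the actual hint text in the prompt prefix may differ.
SPEAKER_HINT = (
    "Each transcript line starts with the speaker's name in square brackets."
)


def format_transcript(segments, include_speaker_info):
    """Render segments as lines, prefixing '[SPEAKER_NAME] ' when enabled."""
    lines = []
    for segment in segments:
        text = segment["text"].strip()
        speaker = segment.get("speaker")
        if include_speaker_info and speaker:
            lines.append(f"[{speaker}] {text}")
        else:
            # No diarization for this segment, or the option is off.
            lines.append(text)
    return "\n".join(lines)


def build_prompt(template_prompt, segments, include_speaker_info):
    """Add the hint only when labels will actually appear in the transcript."""
    has_labels = include_speaker_info and any(s.get("speaker") for s in segments)
    parts = [template_prompt]
    if has_labels:
        parts.append(SPEAKER_HINT)
    parts.append(format_transcript(segments, include_speaker_info))
    return "\n\n".join(parts)


segments = [
    {"speaker": "Alice", "text": "Let's start."},
    {"speaker": "Bob", "text": "Agreed."},
]
print(format_transcript(segments, True))
```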
Closes #353
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This fixes an issue where the frontend would spam the auth endpoints repeatedly when logged out or when a session expired.
1. Infinite Recursion on 401: The window.fetch wrapper would catch a 401, call tryRefresh(), which then called fetch() again, triggering the wrapper recursively if the refresh also failed. We now use the original fetch for refresh attempts and exclude auth endpoints from auto-refresh logic.
2. Multiple Wrapper Layers: Since useAuth is a hook used by many components, multiple instances were independently wrapping window.fetch. We now store the original fetch globally and ensure wrapping only happens once.
The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.
Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options
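The fallback order can be sketched like this. It is a Python illustration under assumptions, not the adapter code: resolve_hf_token is a hypothetical helper, and treating a whitespace-only UI value as "no token" is an assumption.

```python
import os


def resolve_hf_token(ui_token, env=None):
    """Return the UI-supplied token if present, else HF_TOKEN, else None."""
    env = os.environ if env is None else env
    token = (ui_token or "").strip()
    if token:
        return token
    # No usable UI token: fall back to the environment variable.
    return env.get("HF_TOKEN", "").strip() or None


print(resolve_hf_token("ui-token", {"HF_TOKEN": "env-token"}))  # ui-token
print(resolve_hf_token("", {"HF_TOKEN": "env-token"}))          # env-token
print(resolve_hf_token(None, {}))                               # None
```

Returning None when neither source provides a token leaves the "token required" decision to the caller at runtime, which matches validation no longer requiring it up front.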
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.
- Add --segmentation-onset and --segmentation-offset CLI args to
  pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
  selected (Whisper, Parakeet, Canary model families)
Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.
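For reference, the two flags could be declared with argparse roughly as below. This is a sketch, not the contents of pyannote_diarize.py: the [0, 1] range check, the None defaults (meaning "leave the pipeline's own setting untouched"), and the help text are assumptions.

```python
import argparse


def threshold(value):
    """argparse type: a float restricted to the assumed range [0, 1]."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is not in the range [0, 1]")
    return number


def build_parser():
    parser = argparse.ArgumentParser(description="Speaker diarization (sketch)")
    parser.add_argument(
        "--segmentation-onset", type=threshold, default=None,
        help="speech onset threshold; lower values pick up quieter speakers",
    )
    parser.add_argument(
        "--segmentation-offset", type=threshold, default=None,
        help="speech offset threshold; lower values extend speech endings",
    )
    return parser


args = build_parser().parse_args(["--segmentation-onset", "0.35"])
print(args.segmentation_onset, args.segmentation_offset)  # 0.35 None
```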
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The selection menu's "Listen" button wasn't working in Timeline View because
the character-to-timestamp mapping was incorrectly counting text from timestamp
and speaker name elements.
Changes:
- Add data-transcript-text attribute to transcript text containers
- Update TreeWalker in useSelectionMenu to only count text inside these marked elements
This fixes the character index calculation so word timestamps are correctly looked up.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Match the title/controls section styling to the audio player below
with glass-card, rounded corners, border, shadow, and padding.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Make title, chat button, and settings dropdown sticky so users can
toggle auto-scroll without pausing playback. Wraps both the title
section and audio player in a single sticky container.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The "Auto Scroll On" feature was broken because it relied on a word-level ref
that was never assigned. This fix implements segment-level auto-scroll for
Timeline View.
Changes:
- Enable autoScrollEnabled prop usage in TranscriptView
- Add activeSegmentIndex computation to track current playback position
- Add auto-scroll effect that scrolls to active segment on segment change
- Add subtle background highlight to indicate the currently playing segment
The auto-scroll only triggers when:
- Mode is 'expanded' (Timeline View)
- Auto-scroll is enabled
- Audio is playing
- The segment actually changes (debounced to prevent jitter)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
After renaming speakers in Timeline View, the changes now appear immediately
in both the transcript display and downloads (JSON, TXT, SRT).
Root cause: The onSpeakerMappingsUpdate callback was a no-op, so the React
Query cache wasn't being invalidated after saving speaker mappings.
Fix: Invalidate the speakerMappings cache when the dialog saves, triggering
an automatic refetch that updates all components using the hook.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.
Changes:
- Expand LANGUAGES array to 58 languages (all with WER <50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- All languages alphabetically sorted
Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
Updated make docs to generate swagger.json to both api-docs/ and
web/project-site/public/api/ to match CI workflow behavior.
This fixes CI failures where the project site swagger.json was out
of sync with code changes (max_new_tokens field for Voxtral).
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common
On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
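The chunk planning described in the list above can be sketched as follows. This is an illustration, not the code in voxtral_transcribe_buffered.py: the function names are hypothetical, and joining chunk texts with a single space is an assumption.

```python
CHUNK_SECONDS = 25 * 60               # 25-minute chunks
BUFFERED_THRESHOLD_SECONDS = 30 * 60  # buffered mode above 30 minutes


def plan_chunks(duration_seconds,
                chunk_seconds=CHUNK_SECONDS,
                threshold_seconds=BUFFERED_THRESHOLD_SECONDS):
    """Return (start, end) offsets in seconds covering the whole file."""
    if duration_seconds <= threshold_seconds:
        # Short audio is processed in a single pass.
        return [(0.0, float(duration_seconds))]
    chunks = []
    start = 0.0
    while start < duration_seconds:
        end = float(min(start + chunk_seconds, duration_seconds))
        chunks.append((start, end))
        start = end
    return chunks


def join_chunk_texts(texts):
    """Concatenate per-chunk text; no timestamps need adjusting."""
    return " ".join(t.strip() for t in texts if t.strip())


print(plan_chunks(60 * 60))
# [(0.0, 1500.0), (1500.0, 3000.0), (3000.0, 3600.0)]
```

Because the threshold (30 minutes) is larger than the chunk size (25 minutes), a file just over the threshold yields one full chunk plus a short tail.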
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
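The new limits can be expressed as a small normalization helper. This is a Python sketch only: normalize_max_new_tokens is a hypothetical name, and clamping out-of-range values (rather than rejecting them) is an assumption about the desired behavior.

```python
DEFAULT_MAX_NEW_TOKENS = 4096
MIN_MAX_NEW_TOKENS = 512
MAX_MAX_NEW_TOKENS = 8192


def normalize_max_new_tokens(value=None):
    """Apply the default when unset, otherwise clamp into [512, 8192]."""
    if value is None:
        return DEFAULT_MAX_NEW_TOKENS
    return max(MIN_MAX_NEW_TOKENS, min(MAX_MAX_NEW_TOKENS, int(value)))


print(normalize_max_new_tokens())       # 4096
print(normalize_max_new_tokens(100))    # 512
print(normalize_max_new_tokens(20000))  # 8192
```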
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features
Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
Use Chrome/Edge's 'remote-only' echo cancellation mode to allow
microphone input during local system audio playback while still
preventing acoustic echo from remote sources in video calls.
Removed Firefox/Safari support as only Chromium browsers (Chrome, Edge, Brave)
reliably support tab audio capture via getDisplayMedia API.
Changes:
- Added Chromium browser detection (Chrome, Edge, Brave, Chromium)
- Show compatibility error dialog for non-Chromium browsers
- Removed all Firefox-specific code and constraints
- Simplified UI instructions (tab selection only)
- Cleaner error messages focused on tab audio
Tested working on: Chrome, Edge, Brave
Not supported: Firefox, Safari, other browsers
Implements Screen Capture API based system audio recording for meeting recordings.
Works on Chrome/Edge with tab audio capture.
Features:
- Client-side audio mixing (system audio + microphone) using Web Audio API
- Real-time volume controls via GainNode
- Simple timer-based recording (no visualization complexity)
- Echo cancellation enabled for microphone to prevent feedback loops
- Browser compatibility checks
- Graceful error handling for permissions and stream interruptions
Technical details:
- Uses getDisplayMedia() for system audio capture (requires video=true, immediately stopped)
- getUserMedia() for microphone with echo cancellation
- MediaRecorder for direct recording without WaveSurfer dependency
- Cyan/blue themed UI to differentiate from regular microphone recording
Tested and working on Chrome. Firefox support needs investigation (v146.0.1).