The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.
Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options
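The fallback behavior can be sketched as follows. The real adapters are written in Go, so this Python `resolve_hf_token` helper is purely illustrative of the precedence rule (UI token wins, env var is the fallback):

```python
import os

def resolve_hf_token(ui_token):
    """Return the UI-supplied token if present, else fall back to HF_TOKEN.

    An explicit token always wins; an empty string from the UI is treated
    the same as no token at all, so the env var still applies.
    """
    if ui_token:
        return ui_token
    return os.environ.get("HF_TOKEN") or None
```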
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.
- Add --segmentation-onset and --segmentation-offset CLI args to
pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
selected (Whisper, Parakeet, Canary model families)
Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.
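A minimal argparse sketch of the two new flags for pyannote_diarize.py; the 0.5 defaults here are an assumption, and pyannote's own pipeline defaults may differ:

```python
import argparse

def build_arg_parser():
    """CLI flags matching the new segmentation-threshold options."""
    parser = argparse.ArgumentParser(description="Pyannote diarization")
    parser.add_argument("--segmentation-onset", type=float, default=0.5,
                        help="Speech onset threshold; lower values (0.3-0.4) "
                             "help detect quieter or distant speakers")
    parser.add_argument("--segmentation-offset", type=float, default=0.5,
                        help="Speech offset threshold; lower values improve "
                             "detection of speech endings")
    return parser
```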
The selection menu's "Listen" button wasn't working in Timeline View because
the character-to-timestamp mapping was incorrectly counting text from timestamp
and speaker name elements.
Changes:
- Add data-transcript-text attribute to transcript text containers
- Update TreeWalker in useSelectionMenu to only count text inside these marked elements
This fixes the character index calculation so word timestamps are correctly looked up.
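The corrected lookup can be sketched language-agnostically. The actual fix lives in the TypeScript useSelectionMenu hook; this `word_at_char_index` helper and its field names are hypothetical, but show why the index must be computed over transcript text only:

```python
def word_at_char_index(words, char_index):
    """Return the word whose text covers char_index, or None.

    words is a list of {"word": str, "start": float, "end": float} dicts
    whose text is joined with single spaces. This lookup is only correct
    when char_index counts transcript text alone -- the bug fixed here was
    that timestamp and speaker-label text inflated the index.
    """
    offset = 0
    for w in words:
        end = offset + len(w["word"])
        if offset <= char_index < end:
            return w
        offset = end + 1  # skip the joining space
    return None
```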
Match the title/controls section styling to the audio player below
with glass-card, rounded corners, border, shadow, and padding.
Make title, chat button, and settings dropdown sticky so users can
toggle auto-scroll without pausing playback. Wrap both the title
section and the audio player in a single sticky container.
The "Auto Scroll On" feature was broken because it relied on a word-level ref
that was never assigned. This fix implements segment-level auto-scroll for
Timeline View.
Changes:
- Enable autoScrollEnabled prop usage in TranscriptView
- Add activeSegmentIndex computation to track current playback position
- Add auto-scroll effect that scrolls to active segment on segment change
- Add subtle background highlight to indicate the currently playing segment
The auto-scroll only triggers when:
- Mode is 'expanded' (Timeline View)
- Auto-scroll is enabled
- Audio is playing
- The segment actually changes (debounced to prevent jitter)
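The trigger conditions above can be sketched as follows. The real logic is a React effect in TranscriptView; the function names here are illustrative, and the active-segment computation assumes segments sorted by start time:

```python
from bisect import bisect_right

def active_segment_index(segment_starts, current_time):
    """Index of the segment whose start is the latest one <= current_time.

    segment_starts must be sorted ascending; returns -1 before the first
    segment begins.
    """
    return bisect_right(segment_starts, current_time) - 1

def should_auto_scroll(mode, enabled, playing, prev_index, new_index):
    """All four commit conditions must hold before scrolling."""
    return (mode == "expanded" and enabled and playing
            and new_index != prev_index)
```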
After renaming speakers in Timeline View, the changes now appear immediately
in both the transcript display and downloads (JSON, TXT, SRT).
Root cause: The onSpeakerMappingsUpdate callback was a no-op, so the React
Query cache wasn't being invalidated after saving speaker mappings.
Fix: Invalidate the speakerMappings cache when the dialog saves, triggering
an automatic refetch that updates all components using the hook.
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
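A sketch of the env-var resolution with the old hardcoded paths as defaults. The actual config is Go; TRANSCRIPTS_DIR is an assumed variable name here, since the commit only names TEMP_DIR and DATABASE_PATH explicitly:

```python
import os

def load_dir_config(env=None):
    """Resolve directory settings from the environment, falling back to
    the previously hardcoded paths as defaults."""
    env = env if env is not None else os.environ
    return {
        "temp_dir": env.get("TEMP_DIR", "data/temp"),
        "transcripts_dir": env.get("TRANSCRIPTS_DIR", "data/transcripts"),
    }
```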
Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.
Changes:
- Expand LANGUAGES array to 58 languages (all with WER under 50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- All languages alphabetically sorted
Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
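The mapping above amounts to a simple dispatch. The arrays below are truncated placeholders, not the real 58/24/4-entry lists, and the `languages_for` helper is illustrative:

```python
# Truncated stand-ins for the real frontend arrays (kept sorted).
LANGUAGES = ["af", "cs", "da", "de", "en", "fr", "hu", "no", "ro", "th"]
VOXTRAL_LANGUAGES = ["de", "en", "fr"]            # original subset
CANARY_LANGUAGES = ["de", "en", "es", "fr"]       # NVIDIA Canary

def languages_for(model_family):
    """Pick the language list for a model family, per the mapping above."""
    if model_family == "voxtral":
        return VOXTRAL_LANGUAGES
    if model_family == "canary":
        return CANARY_LANGUAGES
    return LANGUAGES  # whisper, openai
```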
Update the make docs target to generate swagger.json in both
api-docs/ and web/project-site/public/api/ to match CI workflow behavior.
This fixes CI failures where the project site swagger.json was out
of sync with code changes (max_new_tokens field for Voxtral).
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
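The presence check can be sketched as below; the transcript field shapes are assumed from the description:

```python
def has_word_timestamps(transcript):
    """True only when the transcript carries a non-empty word_segments list.

    Models like Voxtral produce text without word-level timing, so this
    check drives whether the menu item is enabled or shown disabled with
    an explanation.
    """
    segments = transcript.get("word_segments")
    return isinstance(segments, list) and len(segments) > 0
```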
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common
On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
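The chunking math can be sketched as follows, assuming non-overlapping windows and the 30-minute threshold named above (the real script may differ in details such as chunk overlap):

```python
CHUNK_MINUTES = 25
BUFFER_THRESHOLD_MINUTES = 30

def plan_chunks(duration_sec):
    """Split audio into 25-minute (start, end) windows in seconds.

    Audio at or under 30 minutes stays as a single chunk, matching the
    buffered-mode threshold. No overlap is used because the text-only
    model needs no timestamp adjustment across chunk boundaries.
    """
    if duration_sec <= BUFFER_THRESHOLD_MINUTES * 60:
        return [(0.0, float(duration_sec))]
    chunk = CHUNK_MINUTES * 60.0
    chunks = []
    start = 0.0
    while start < duration_sec:
        chunks.append((start, min(start + chunk, float(duration_sec))))
        start += chunk
    return chunks
```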
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
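The new bounds amount to a clamp around a new default; `resolve_max_new_tokens` is an illustrative name, not the code's actual function:

```python
DEFAULT_MAX_NEW_TOKENS = 4096
MIN_MAX_NEW_TOKENS = 512
MAX_MAX_NEW_TOKENS = 8192

def resolve_max_new_tokens(requested):
    """Apply the new default and clamp requests into [512, 8192]."""
    if requested is None:
        return DEFAULT_MAX_NEW_TOKENS
    return max(MIN_MAX_NEW_TOKENS, min(MAX_MAX_NEW_TOKENS, requested))
```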
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features
Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
Use Chrome/Edge's 'remote-only' echo cancellation mode to allow
microphone input during local system audio playback while still
preventing acoustic echo from remote sources in video calls
Removed Firefox/Safari support as only Chromium browsers (Chrome, Edge, Brave)
reliably support tab audio capture via getDisplayMedia API.
Changes:
- Added Chromium browser detection (Chrome, Edge, Brave, Chromium)
- Show compatibility error dialog for non-Chromium browsers
- Removed all Firefox-specific code and constraints
- Simplified UI instructions (tab selection only)
- Cleaner error messages focused on tab audio
Tested working on: Chrome, Edge, Brave
Not supported: Firefox, Safari, other browsers
Implements Screen Capture API based system audio recording for meeting recordings.
Works on Chrome/Edge with tab audio capture.
Features:
- Client-side audio mixing (system audio + microphone) using Web Audio API
- Real-time volume controls via GainNode
- Simple timer-based recording (no visualization complexity)
- Echo cancellation enabled for microphone to prevent feedback loops
- Browser compatibility checks
- Graceful error handling for permissions and stream interruptions
Technical details:
- Uses getDisplayMedia() for system audio capture (requires video=true, immediately stopped)
- getUserMedia() for microphone with echo cancellation
- MediaRecorder for direct recording without WaveSurfer dependency
- Cyan/blue themed UI to differentiate from regular microphone recording
Tested and working on Chrome. Firefox support needs investigation (v146.0.1).
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.
Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with CUDA 12.9.1 base image,
PYTORCH_CUDA_VERSION=cu128, and missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README
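The runtime wheel selection can be sketched as below; cu126 as the default matches the previously hardcoded value, and the URL follows PyTorch's standard wheel index layout:

```python
import os

def pytorch_index_url(env=None):
    """Build the PyTorch wheel index URL from PYTORCH_CUDA_VERSION.

    Legacy GPUs keep the cu126 default; Blackwell (RTX 50-series)
    deployments set cu128 to get sm_120-capable wheels.
    """
    env = env if env is not None else os.environ
    cuda = env.get("PYTORCH_CUDA_VERSION", "cu126")
    return f"https://download.pytorch.org/whl/{cuda}"
```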
Fixes: rishikanthc/Scriberr#104