Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.
Changes:
- Expand LANGUAGES array to 58 languages (all with WER below 50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- Sort all languages alphabetically
Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
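The array-to-model mapping above can be sketched as a lookup plus a check of the two invariants the commit states (alphabetical order, Voxtral as a subset). This is a minimal Python sketch, not the frontend code. The family keys ("whisper", "openai", "voxtral", "canary") and the functions languages_for and check_arrays are hypothetical, and the short lists in the usage below are placeholders, not the real 58/24/4 arrays.

```python
from typing import Mapping, Sequence

# Which array each model family reads, as listed in the commit message.
# The family keys are hypothetical identifiers chosen for this sketch.
FAMILY_TO_ARRAY = {
    "whisper": "LANGUAGES",
    "openai": "LANGUAGES",
    "voxtral": "VOXTRAL_LANGUAGES",
    "canary": "CANARY_LANGUAGES",
}


def languages_for(family: str, arrays: Mapping[str, Sequence[str]]) -> Sequence[str]:
    """Return the language list a model family should offer."""
    array_name = FAMILY_TO_ARRAY.get(family)
    if array_name is None or array_name not in arrays:
        raise ValueError(f"no language array for model family {family!r}")
    return arrays[array_name]


def check_arrays(arrays: Mapping[str, Sequence[str]]) -> None:
    """Check the stated invariants: sorted lists, Voxtral a subset of LANGUAGES."""
    for name in ("LANGUAGES", "VOXTRAL_LANGUAGES"):
        langs = list(arrays[name])
        if langs != sorted(langs):
            raise ValueError(f"{name} is not alphabetically sorted")
    missing = set(arrays["VOXTRAL_LANGUAGES"]) - set(arrays["LANGUAGES"])
    if missing:
        raise ValueError(f"VOXTRAL_LANGUAGES not in LANGUAGES: {sorted(missing)}")
```

With placeholder arrays such as {"LANGUAGES": ["Afrikaans", "Czech", "English", "French"], "VOXTRAL_LANGUAGES": ["English", "French"], ...}, languages_for("voxtral", arrays) returns only the Voxtral subset, and check_arrays could run in a unit test so a future edit cannot silently break either invariant.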
Updated make docs to write swagger.json to both api-docs/ and
web/project-site/public/api/, matching the CI workflow behavior.
This fixes CI failures where the project site swagger.json was out
of sync with code changes (the new max_new_tokens field for Voxtral).
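The two-destination output can be sketched as a copy step plus a CI-style consistency check. This is a hypothetical Python illustration of the behavior, not the Makefile recipe; sync_swagger and swagger_in_sync are invented names, and only the two directory paths come from the commit.

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# Both output locations named in the commit message, relative to the repo root.
SWAGGER_DIRS = ("api-docs", "web/project-site/public/api")


def sync_swagger(generated: Path, repo_root: Path,
                 dirs: Iterable[str] = SWAGGER_DIRS) -> List[Path]:
    """Copy a freshly generated swagger.json into every output directory."""
    written = []
    for rel in dirs:
        target = repo_root / rel / "swagger.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        # Skip the copy when the generator already wrote to this location.
        if not (target.exists() and target.resolve() == generated.resolve()):
            shutil.copyfile(generated, target)
        written.append(target)
    return written


def swagger_in_sync(repo_root: Path, dirs: Iterable[str] = SWAGGER_DIRS) -> bool:
    """Hypothetical CI-style check: every copy exists and all are byte-identical."""
    paths = [repo_root / rel / "swagger.json" for rel in dirs]
    if not all(p.is_file() for p in paths):
        return False
    return len({p.read_bytes() for p in paths}) == 1
```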
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
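Two of the steps above are easy to get subtly wrong: finding Air after a go install that put it outside PATH, and keeping //go:embed from failing when the frontend has not been built yet (a pattern that matches no files is a compile error, and directory embeds skip names beginning with "." or "_"). The Python sketch below illustrates both; find_air, ensure_embed_placeholder and the index.html placeholder name are hypothetical, it assumes a Unix binary name without .exe, and it only checks the first GOPATH entry.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def find_air(env: Mapping[str, str]) -> Optional[Path]:
    """Locate air: PATH first, then GOBIN, then the first GOPATH entry's bin/."""
    on_path = shutil.which("air", path=env.get("PATH", ""))
    if on_path:
        return Path(on_path)
    candidates = []
    if env.get("GOBIN"):
        candidates.append(Path(env["GOBIN"]) / "air")
    # GOPATH defaults to ~/go when unset; this sketch only checks the first entry.
    gopath = env.get("GOPATH") or str(Path.home() / "go")
    candidates.append(Path(gopath.split(os.pathsep)[0]) / "bin" / "air")
    for candidate in candidates:
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    return None


def ensure_embed_placeholder(dist_dir: Path, name: str = "index.html") -> bool:
    """Create a stub so a //go:embed pattern over dist_dir matches a file.

    Files starting with '.' or '_' are skipped by directory embeds, so they
    do not count. Returns True if a placeholder was written.
    """
    dist_dir.mkdir(parents=True, exist_ok=True)
    visible = [p for p in dist_dir.iterdir()
               if p.is_file() and not p.name.startswith((".", "_"))]
    if visible:
        return False
    (dist_dir / name).write_text("<!-- dev-mode placeholder -->\n")
    return True
```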
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
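The availability check can be sketched as a predicate on the transcript JSON plus a menu-item descriptor. The real check lives in the frontend; this Python version only illustrates the logic. has_word_timestamps, timestamp_menu_item, the "reason" text and the sample label are hypothetical, and it assumes word_segments is a top-level list that is empty or absent when a model has no word timings.

```python
from typing import Any, Dict, Mapping


def has_word_timestamps(transcript: Mapping[str, Any]) -> bool:
    """True when the transcript carries a non-empty word_segments list."""
    segments = transcript.get("word_segments")
    return isinstance(segments, list) and len(segments) > 0


def timestamp_menu_item(transcript: Mapping[str, Any], label: str) -> Dict[str, Any]:
    """Describe a menu entry that needs word timings; disabled with a reason if absent."""
    enabled = has_word_timestamps(transcript)
    return {
        "label": label,
        "disabled": not enabled,
        "reason": None if enabled
        else "This model did not produce word-level timestamps.",
    }
```

Treating an empty list the same as a missing key keeps the menu disabled for models that emit the field but never fill it.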
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common
On the next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
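The "both must be present" readiness check can be sketched in Python as below. voxtral_env_ready is a hypothetical name, and how the adapter actually runs its check (for example, from Go via a Python one-liner) is not stated here. It relies on importlib.util.find_spec returning None for a missing top-level package, and it catches broadly because transformers resolves model classes lazily, so a missing backend can raise something other than AttributeError.

```python
import importlib
import importlib.util


def voxtral_env_ready() -> bool:
    """True only if mistral_common is installed AND transformers has the Voxtral class."""
    # find_spec returns None for a missing top-level package without importing it.
    if importlib.util.find_spec("mistral_common") is None:
        return False
    try:
        transformers = importlib.import_module("transformers")
        # Lazy attribute access may raise more than AttributeError
        # when an optional backend (e.g. torch) is missing.
        getattr(transformers, "VoxtralForConditionalGeneration")
    except Exception:
        return False
    return True
```

A False result is the signal to re-sync the environment, which is why an older install with transformers but no mistral-common gets repaired on restart.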
- Create voxtral_transcribe_buffered.py for audio longer than 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handle audio of any length via chunking
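The buffered mode described above reduces to a threshold test, a windowing function, and a text join. Below is a minimal Python sketch of that control flow, not the contents of voxtral_transcribe_buffered.py. The function names are hypothetical, transcribe_window stands in for whatever actually runs the model on one slice of audio, and joining chunk texts with a single space is an assumption.

```python
from typing import Callable, List, Tuple

BUFFERED_THRESHOLD_S = 30 * 60  # use buffered mode above 30 minutes
CHUNK_S = 25 * 60               # each chunk covers at most 25 minutes


def needs_buffered_mode(duration_s: float) -> bool:
    return duration_s > BUFFERED_THRESHOLD_S


def chunk_bounds(duration_s: float, chunk_s: float = CHUNK_S) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) windows of at most chunk_s."""
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


def transcribe_any_length(duration_s: float,
                          transcribe_window: Callable[[float, float], str]) -> str:
    """Transcribe short audio in one pass; otherwise chunk by chunk, joining the text."""
    if not needs_buffered_mode(duration_s):
        return transcribe_window(0.0, duration_s)
    texts = (transcribe_window(s, e).strip() for s, e in chunk_bounds(duration_s))
    # Output is text only, so there are no timestamps to shift between chunks.
    return " ".join(t for t in texts if t)
```

For a one-hour file, chunk_bounds(3600) yields three windows: 0 to 1500 s, 1500 to 3000 s, and a final 10-minute window from 3000 to 3600 s.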
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use the correct parameter (it was using max_line_width)
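The new limits, together with the language/max_new_tokens mapping mentioned for convertToVoxtralParams, can be sketched as follows. This is a hypothetical Python analogue, not the Go function. Clamping out-of-range values, rather than rejecting them, is an assumption of the sketch.

```python
from typing import Dict, Optional, Union

DEFAULT_MAX_NEW_TOKENS = 4096
MIN_MAX_NEW_TOKENS = 512
MAX_MAX_NEW_TOKENS = 8192


def resolve_max_new_tokens(value: Optional[int]) -> int:
    """Apply the default when unset and clamp into [512, 8192]."""
    if value is None:
        return DEFAULT_MAX_NEW_TOKENS
    return max(MIN_MAX_NEW_TOKENS, min(MAX_MAX_NEW_TOKENS, int(value)))


def to_voxtral_params(language: Optional[str],
                      max_new_tokens: Optional[int]) -> Dict[str, Union[str, int]]:
    """Hypothetical analogue of convertToVoxtralParams."""
    params: Dict[str, Union[str, int]] = {
        "max_new_tokens": resolve_max_new_tokens(max_new_tokens),
    }
    if language:  # only forward a language when one was selected
        params["language"] = language
    return params
```

Reading the field by its own name (max_new_tokens) rather than reusing another slider's value is exactly the UI bug the last bullet fixes.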
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script that uses apply_transcription_request()
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features
Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
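The fix amounts to creating the directory before the first write. Below is a minimal Python sketch of the pattern; the adapters' copy functions are presumably Go, where os.MkdirAll plays the same role. write_env_file is a hypothetical name.

```python
from pathlib import Path


def write_env_file(env_dir: Path, filename: str, contents: str) -> Path:
    """Write a script into an adapter's env directory, creating the directory first."""
    # Writing into a directory that does not exist yet raises FileNotFoundError;
    # exist_ok=True makes repeated initialization harmless.
    env_dir.mkdir(parents=True, exist_ok=True)
    target = env_dir / filename
    target.write_text(contents)
    return target
```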
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
Use Chrome/Edge's 'remote-only' echo cancellation mode to allow
microphone input during local system audio playback while still
preventing acoustic echo from remote sources in video calls.
Removed Firefox/Safari support as only Chromium browsers (Chrome, Edge, Brave)
reliably support tab audio capture via the getDisplayMedia API.
Changes:
- Added Chromium browser detection (Chrome, Edge, Brave, Chromium)
- Show compatibility error dialog for non-Chromium browsers
- Removed all Firefox-specific code and constraints
- Simplified UI instructions (tab selection only)
- Cleaner error messages focused on tab audio
Tested working on: Chrome, Edge, Brave
Not supported: Firefox, Safari, other browsers
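Feature detection alone does not separate these browsers, because Firefox also exposes getDisplayMedia but does not capture tab audio. A user-agent heuristic like the one below is one way to do it; it is a Python sketch of the idea, not the frontend's actual detection code, and is_chromium_user_agent is a hypothetical name. Chrome, Edge ("Edg/"), Brave and Chromium all carry a "Chrome/" or "Chromium/" token; Safari carries "Safari/" without "Chrome/"; Chrome on iOS reports "CriOS/" and is WebKit-based, so it is correctly rejected.

```python
def is_chromium_user_agent(ua: str) -> bool:
    """Heuristic: Chromium-based desktop browsers include a Chrome/ or Chromium/ token."""
    if "Firefox/" in ua:
        return False
    return "Chrome/" in ua or "Chromium/" in ua
```

Brave deliberately sends the same user agent as Chrome, so it passes this check without special-casing.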
Implements Screen Capture API based system audio recording for meeting recordings.
Works on Chrome/Edge with tab audio capture.
Features:
- Client-side audio mixing (system audio + microphone) using Web Audio API
- Real-time volume controls via GainNode
- Simple timer-based recording (no visualization complexity)
- Echo cancellation enabled for microphone to prevent feedback loops
- Browser compatibility checks
- Graceful error handling for permissions and stream interruptions
Technical details:
- Uses getDisplayMedia() for system audio capture (requires video: true; the video track is stopped immediately)
- getUserMedia() for microphone with echo cancellation
- MediaRecorder for direct recording without WaveSurfer dependency
- Cyan/blue themed UI to differentiate from regular microphone recording
Tested and working on Chrome. Firefox support needs investigation (v146.0.1).
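The client-side mix is, sample for sample, a weighted sum: each source presumably passes through its own GainNode into a shared destination whose stream feeds MediaRecorder, and the Web Audio API sums multiple connections into the same input. The Python sketch below shows only that arithmetic on plain sample buffers; mix is a hypothetical name, it pads the shorter buffer with silence, and it applies no clipping.

```python
from typing import List, Sequence


def mix(system: Sequence[float], mic: Sequence[float],
        system_gain: float, mic_gain: float) -> List[float]:
    """Scale each source by its gain, then sum sample by sample.

    The shorter buffer is treated as silence once it runs out.
    No clipping is applied in this sketch.
    """
    n = max(len(system), len(mic))
    out = []
    for i in range(n):
        s = system[i] if i < len(system) else 0.0
        m = mic[i] if i < len(mic) else 0.0
        out.append(system_gain * s + mic_gain * m)
    return out
```

Changing a gain value between buffers is what the real-time volume sliders amount to; the recording itself never needs to be re-encoded.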
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.
Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with a CUDA 12.9.1 base image and
PYTORCH_CUDA_VERSION=cu128, and add the missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README
Fixes: rishikanthc/Scriberr#104
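Selecting the wheel index at runtime can be sketched as reading PYTORCH_CUDA_VERSION and building a download.pytorch.org URL from it. This is a Python sketch, not the adapters' code. The function names are hypothetical, defaulting to cu126 when the variable is unset is an assumption, and accepting only the two tags named in the commit is a choice of this sketch.

```python
import os
from typing import Mapping, Optional

SUPPORTED_CUDA_TAGS = ("cu126", "cu128")  # legacy GPUs vs Blackwell (sm_120)
DEFAULT_CUDA_TAG = "cu126"                # assumed default when the variable is unset


def pytorch_cuda_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Read PYTORCH_CUDA_VERSION, falling back to the assumed default."""
    env = os.environ if env is None else env
    tag = env.get("PYTORCH_CUDA_VERSION", "").strip() or DEFAULT_CUDA_TAG
    if tag not in SUPPORTED_CUDA_TAGS:
        raise ValueError(
            f"unsupported PYTORCH_CUDA_VERSION {tag!r}; expected one of {SUPPORTED_CUDA_TAGS}")
    return tag


def pytorch_index_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Wheel index URL to hand to the package installer as an index URL."""
    return f"https://download.pytorch.org/whl/{pytorch_cuda_version(env)}"
```

Keeping the tag in one environment variable is what lets a single codebase serve both Dockerfiles: Dockerfile.cuda sets cu126, Dockerfile.cuda.12.9 sets cu128, and no adapter hardcodes either.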
- Update Docker Compose files to default PUID/PGID to 1000
- Add note about SECURE_COOKIES for non-SSL access in README and project site
- Create dedicated Troubleshooting page in documentation site
- Synchronize permissions documentation across all platforms