Commit Graph

800 Commits

Author SHA1 Message Date
Peter Somlo
93abf6eb21 feat: expand language support in the UI to 58 languages for Whisper and OpenAI models
Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.

Changes:
- Expand LANGUAGES array to 58 languages (all with WER <50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- All languages alphabetically sorted

Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
2026-01-01 12:47:41 -08:00
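
The array split described in commit 93abf6eb21 can be pictured as a lookup from model family to language list. This is a minimal Python sketch, not the project's UI code. The family keys, the `languages_for` helper, and the list contents are hypothetical placeholders; only the array names and their sizes (58, 24, 4) come from the commit message.

```python
# Placeholder entries only: per the commit, the real arrays hold
# 58, 24 and 4 languages respectively.
LANGUAGES = ["Afrikaans", "Armenian", "Czech", "Danish", "English", "Thai"]
VOXTRAL_LANGUAGES = ["English"]  # placeholder subset
CANARY_LANGUAGES = ["English"]   # placeholder subset

# Hypothetical model-family keys mapped to the array each one uses.
LANGUAGES_BY_FAMILY = {
    "whisper": LANGUAGES,
    "openai": LANGUAGES,
    "voxtral": VOXTRAL_LANGUAGES,
    "canary": CANARY_LANGUAGES,
}


def languages_for(family: str) -> list[str]:
    """Return the alphabetically sorted language list for a model family."""
    try:
        return sorted(LANGUAGES_BY_FAMILY[family])
    except KeyError:
        raise ValueError(f"unknown model family: {family!r}") from None
```

Keeping one list per model family means a model with narrower language coverage never offers a language it cannot transcribe.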
rishikanthc
b8fd360ca2 fix: streamline API docs generation to sync both locations
Updated make docs to generate swagger.json to both api-docs/ and
web/project-site/public/api/ to match CI workflow behavior.

This fixes CI failures where the project site swagger.json was out
of sync with code changes (max_new_tokens field for Voxtral).
2025-12-31 16:03:33 -08:00
rishikanthc
f9a58baa1e clean lint 2025-12-31 15:53:35 -08:00
rishikanthc
0248b01cbd update docs 2025-12-31 15:47:19 -08:00
rishikanthc
73a82b9f6b fix auto device detection in voxtral 2025-12-31 15:47:19 -08:00
rishikanthc
97eb45ea67 feat: add 'make dev' command to replace dev.sh script
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
2025-12-31 15:47:19 -08:00
rishikanthc
efff1a3a7c fix: use Literata font for all transcripts
Changed from font-inter to font-literata to ensure consistent
typography across all transcript views regardless of model used.
2025-12-31 15:47:19 -08:00
rishikanthc
f08504eaa3 fix: disable timeline view for transcripts without word-level timestamps
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
2025-12-31 15:47:19 -08:00
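
The check in commit f08504eaa3 amounts to a presence test on the transcript data. This is a rough Python sketch of that logic, assuming the transcript is a dictionary and that an empty `word_segments` list counts as "no timestamps". The helper names and the hint text are hypothetical, and the real check presumably lives in the frontend.

```python
def has_word_timestamps(transcript: dict) -> bool:
    """True when the transcript carries a non-empty word_segments list."""
    return bool(transcript.get("word_segments"))


def timeline_menu_item(transcript: dict) -> dict:
    """Describe the timeline menu entry: enabled, or disabled with a reason."""
    if has_word_timestamps(transcript):
        return {"label": "Timeline view", "enabled": True, "hint": ""}
    return {
        "label": "Timeline view",
        "enabled": False,
        # Hypothetical wording; the actual UI text is not in the commit.
        "hint": "Word-level timestamps are not available for this transcript.",
    }
```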
rishikanthc
ad3053cc9b fix: add Voxtral model selection and fix dependencies
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common

On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
2025-12-31 15:47:19 -08:00
rishikanthc
56c540da36 forgot to commit removal of old project site 2025-12-31 15:47:19 -08:00
rishikanthc
1485b01488 feat: add buffered transcription for Voxtral to handle long audio
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
2025-12-31 15:47:19 -08:00
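
The chunking in commit 1485b01488 can be sketched as pure boundary arithmetic. This is an illustrative Python sketch, not the contents of voxtral_transcribe_buffered.py. The function names are hypothetical; the 30-minute threshold and 25-minute chunk size come from the commit message, and joining chunk texts with a single space is an assumption.

```python
def plan_chunks(duration_s: float,
                chunk_s: float = 25 * 60,
                threshold_s: float = 30 * 60) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for each piece to transcribe.

    Audio at or below the threshold is processed in one pass; longer audio
    is split into consecutive chunks of at most chunk_s seconds.
    """
    if duration_s <= 0:
        return []
    if duration_s <= threshold_s:
        return [(0.0, float(duration_s))]
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, float(duration_s))
        chunks.append((start, end))
        start = end
    return chunks


def join_chunk_texts(texts: list[str]) -> str:
    """Concatenate per-chunk text; no timestamp adjustment is needed."""
    return " ".join(t.strip() for t in texts if t.strip())
```

For a one-hour file, `plan_chunks(3600)` yields three chunks: two of 25 minutes and a final one of 10 minutes.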
rishikanthc
5a947e8739 fix: update Voxtral token limits based on 32k context window
- Default: 4096 → 8192 tokens
- Maximum: 8192 → 16384 tokens
- Minimum: 512 → 1024 tokens
- Voxtral has a 32k context window and handles 30-40 min of audio
- Updated UI description to reflect capabilities
2025-12-31 15:47:19 -08:00
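
The limits from commit 5a947e8739 could be enforced with a simple clamp. This is a minimal Python sketch with hypothetical constant and function names. The three numeric values come from the commit message; the clamping behavior itself is an assumption, since the commit does not say whether out-of-range values are clamped or rejected.

```python
# Values taken from the commit message.
VOXTRAL_MIN_NEW_TOKENS = 1024
VOXTRAL_DEFAULT_NEW_TOKENS = 8192
VOXTRAL_MAX_NEW_TOKENS = 16384


def resolve_max_new_tokens(requested=None) -> int:
    """Return the default when unset, otherwise clamp into [min, max]."""
    if requested is None:
        return VOXTRAL_DEFAULT_NEW_TOKENS
    return max(VOXTRAL_MIN_NEW_TOKENS,
               min(int(requested), VOXTRAL_MAX_NEW_TOKENS))
```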
rishikanthc
95ecbf6d21 fix: increase Voxtral max_new_tokens to 4096 (max 8192)
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
2025-12-31 15:47:19 -08:00
rishikanthc
1ae7b2bf71 feat: add Voxtral-mini transcription support
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features

Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
2025-12-31 15:47:19 -08:00
rishikanthc
923b39e415 fix: ensure directories exist before writing adapter scripts
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
2025-12-31 15:47:19 -08:00
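
The fix in commit 923b39e415 follows a common pattern: create the target directory before writing into it. The project's copy functions are presumably Go, so this Python sketch only illustrates the idea; `write_adapter_script` and its parameters are hypothetical names.

```python
import os
import tempfile


def write_adapter_script(env_dir: str, name: str, source: str) -> str:
    """Write a script into env_dir, creating the directory first if needed."""
    # exist_ok=True makes the call safe when the directory already exists.
    os.makedirs(env_dir, exist_ok=True)
    path = os.path.join(env_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(source)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # The nested directory does not exist yet; the helper creates it.
        target = os.path.join(root, "envs", "parakeet")
        print(write_adapter_script(target, "transcribe.py", "print('ok')\n"))
```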
rishikanthc
5e5dc17a13 fix colors and styles 2025-12-29 21:11:47 -08:00
rishikanthc
c433db07b7 feat: add toggle for Automatic Gain Control
Allow users to enable/disable AGC before starting recording.
AGC automatically adjusts microphone volume for consistent levels.
2025-12-29 21:11:47 -08:00
rishikanthc
ca2ed2fd72 fix: use remote-only echo cancellation for microphone
Use Chrome/Edge's 'remote-only' echo cancellation mode to allow
microphone input during local system audio playback while still
preventing acoustic echo from remote sources in video calls.
2025-12-29 21:11:47 -08:00
rishikanthc
76d92e2055 style: apply brand gradient to Upload Recording button 2025-12-29 21:11:47 -08:00
rishikanthc
8ca2b5ba2b refactor: use consistent design system in SystemAudioRecorder
- Replace hardcoded colors with CSS variables
- Match button design with transcription settings
- Apply brand gradient to Start Recording button
2025-12-29 21:11:47 -08:00
rishikanthc
b13d1b360d refactor: use default button components in SystemAudioRecorder 2025-12-29 21:11:47 -08:00
rishikanthc
eb6192960f refactor: restrict system audio to Chromium browsers only
Removed Firefox/Safari support as only Chromium browsers (Chrome, Edge, Brave)
reliably support tab audio capture via getDisplayMedia API.

Changes:
- Added Chromium browser detection (Chrome, Edge, Brave, Chromium)
- Show compatibility error dialog for non-Chromium browsers
- Removed all Firefox-specific code and constraints
- Simplified UI instructions (tab selection only)
- Cleaner error messages focused on tab audio

Tested working on: Chrome, Edge, Brave
Not supported: Firefox, Safari, other browsers
2025-12-29 21:11:47 -08:00
rishikanthc
f5379464f6 feat: add system audio recording with microphone mixing
Implements Screen Capture API based system audio recording for meeting recordings.
Works on Chrome/Edge with tab audio capture.

Features:
- Client-side audio mixing (system audio + microphone) using Web Audio API
- Real-time volume controls via GainNode
- Simple timer-based recording (no visualization complexity)
- Echo cancellation enabled for microphone to prevent feedback loops
- Browser compatibility checks
- Graceful error handling for permissions and stream interruptions

Technical details:
- Uses getDisplayMedia() for system audio capture (requires video: true; the video track is stopped immediately)
- getUserMedia() for microphone with echo cancellation
- MediaRecorder for direct recording without WaveSurfer dependency
- Cyan/blue themed UI to differentiate from regular microphone recording

Tested and working on Chrome. Firefox support needs investigation (v146.0.1).
2025-12-29 21:11:47 -08:00
rishikanthc
2afd6a1ecf fixes #317 2025-12-29 21:11:47 -08:00
Paul Irish
0029078b8a project site 2025-12-29 21:10:13 -08:00
Paul Irish
9975e6fb02 fix duplicated openapi annotations pt 2 2025-12-29 21:10:13 -08:00
Paul Irish
a7aaf06bbb fix duplicated openapi annotations 2025-12-29 21:10:13 -08:00
Paul Irish
ab912a6b6e always copy scripts 2025-12-29 21:09:53 -08:00
Paul Irish
7471a2a1b6 Add test suite for python adapter scripts 2025-12-29 21:09:53 -08:00
Paul Irish
50dd4130ff Extract python adapter scripts to proper files 2025-12-29 21:09:53 -08:00
Paul Irish
edb65339b8 dont blank on vite startup 2025-12-29 21:09:53 -08:00
Paul Irish
d013fe288a build: adopt gotestsum for go test output formatting 2025-12-26 20:40:52 -08:00
Fran Fitzpatrick
8f537548d4 feat: add RTX 5090 Blackwell GPU support (sm_120)
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.

Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
  PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
  hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with CUDA 12.9.1 base image,
  PYTORCH_CUDA_VERSION=cu128, and missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README

Fixes: rishikanthc/Scriberr#104
2025-12-24 14:46:44 -08:00
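
The runtime switch described in commit 8f537548d4 can be sketched as building the PyTorch wheel index URL from an environment variable. This is a hedged Python sketch, not the adapters' actual code. `pytorch_index_url` is a hypothetical helper, and falling back to cu126 when the variable is unset is an assumption based on the commit calling cu126 the legacy value.

```python
import os

# Assumed fallback: the commit describes cu126 as the legacy wheel version.
DEFAULT_CUDA_VERSION = "cu126"


def pytorch_index_url(env=None) -> str:
    """Build the PyTorch wheel index URL from PYTORCH_CUDA_VERSION."""
    env = os.environ if env is None else env
    version = env.get("PYTORCH_CUDA_VERSION", "").strip() or DEFAULT_CUDA_VERSION
    return f"https://download.pytorch.org/whl/{version}"
```

With `PYTORCH_CUDA_VERSION=cu128` set, as in the Blackwell Dockerfile, the helper returns the cu128 index; with the variable unset it returns the cu126 one.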
Paul Irish
718cb74b70 simpler name of job 2025-12-21 08:40:49 -08:00
Paul Irish
64953f9dde only on main and PRs 2025-12-21 08:40:49 -08:00
Paul Irish
4f75db3856 cleaner 2025-12-21 08:40:49 -08:00
Paul Irish
57127b6ec6 Revert "fix lint in TOC"
This reverts commit 15c919e327.
2025-12-21 08:40:49 -08:00
Paul Irish
2db72409da fix lint in TOC 2025-12-21 08:40:49 -08:00
Paul Irish
5d11f318d5 any 2025-12-21 08:40:49 -08:00
Paul Irish
03c8f76a1d flesh it out 2025-12-21 08:40:49 -08:00
Paul Irish
007b344f60 ci: add basic build and test workflow 2025-12-21 08:40:49 -08:00
Paul Irish
ff41bd7dc6 drop the any 2025-12-21 08:38:42 -08:00
Paul Irish
410e6ea91b add speaker dialog to download menu 2025-12-21 08:38:42 -08:00
rishikanthc
9328215a2e m 2025-12-19 09:48:53 -08:00
rishikanthc
3ff2136d19 docs: add LLM disclosure section to README.md 2025-12-19 09:31:14 -08:00
rishikanthc
bab12bfe39 docs: update installation for PUID/PGID and add troubleshooting section
- Update Docker Compose files to default PUID/PGID to 1000
- Add note about SECURE_COOKIES for non-SSL access in README and project site
- Create dedicated Troubleshooting page in documentation site
- Synchronize permissions documentation across all platforms
2025-12-19 09:24:22 -08:00
rishikanthc
eac630e494 fix: add Features page to docs sidebar navigation 2025-12-17 13:24:17 -08:00
rishikanthc
becfd0ad0f fix compose v1.2.0 2025-12-17 11:41:07 -08:00
rishikanthc
38c8b69f3b feat: add elegant sponsor segment to homepage 2025-12-17 11:37:20 -08:00
rishikanthc
069cc7e0ce docs: update site routing and navigation to use Diarization page 2025-12-17 11:31:50 -08:00