809 Commits

Author SHA1 Message Date
Fran Fitzpatrick
850af1fb6e test: update PyAnnote test to reflect optional HF token
The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
Fran Fitzpatrick
ff12270419 feat: add HF_TOKEN environment variable fallback for diarization
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.

Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options
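
The fallback order described above can be sketched as follows. This is a minimal illustration in Python, not the project's Go adapter code; the function name `resolve_hf_token` and the injectable `env` mapping are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    ui_token: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the UI-supplied token if present, else the HF_TOKEN value.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # A non-empty token entered in the UI takes precedence.
    if ui_token and ui_token.strip():
        return ui_token.strip()
    # Otherwise fall back to the HF_TOKEN environment variable.
    env_token = env.get("HF_TOKEN", "").strip()
    # None signals that no token is available from either source.
    return env_token or None
```

Passing the environment as a parameter keeps the lookup easy to test without mutating `os.environ`.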

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
Fran Fitzpatrick
f6df31b500 feat: add VAD segmentation thresholds for Pyannote diarization
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.

- Add --segmentation-onset and --segmentation-offset CLI args to
  pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
  selected (Whisper, Parakeet, Canary model families)

Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.
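
A rough sketch of how the two CLI flags might be parsed. The flag names come from this commit; the `None` defaults, the `segmentation_overrides` helper, and the `onset`/`offset` dictionary keys are assumptions, and how the values reach the Pyannote pipeline is not shown here.

```python
import argparse
from typing import Dict


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two segmentation threshold flags."""
    parser = argparse.ArgumentParser(description="Pyannote diarization (sketch)")
    # Defaulting to None (an assumption) lets the script tell
    # "not provided" apart from an explicit value.
    parser.add_argument("--segmentation-onset", type=float, default=None,
                        help="VAD onset threshold; lower detects quieter speech")
    parser.add_argument("--segmentation-offset", type=float, default=None,
                        help="VAD offset threshold; lower extends speech endings")
    return parser


def segmentation_overrides(args: argparse.Namespace) -> Dict[str, float]:
    """Collect only the thresholds the caller actually set (hypothetical helper)."""
    overrides: Dict[str, float] = {}
    if args.segmentation_onset is not None:
        overrides["onset"] = args.segmentation_onset
    if args.segmentation_offset is not None:
        overrides["offset"] = args.segmentation_offset
    return overrides
```

With this shape, omitting both flags yields an empty dictionary, so the diarization defaults stay untouched.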

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:20:31 -08:00
Fran Fitzpatrick
f8c0c6759d fix: Listen button in selection menu now works in Timeline View
The selection menu's "Listen" button wasn't working in Timeline View because
the character-to-timestamp mapping was incorrectly counting text from timestamp
and speaker name elements.

Changes:
- Add data-transcript-text attribute to transcript text containers
- Update TreeWalker in useSelectionMenu to only count text inside these marked elements

This fixes the character index calculation so word timestamps are correctly looked up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:20:07 -08:00
Fran Fitzpatrick
8c3f345cee style: add glass-card styling to sticky title section
Match the title/controls section styling to the audio player below
with glass-card, rounded corners, border, shadow, and padding.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
Fran Fitzpatrick
6944e6719c fix: keep header controls visible during auto-scroll
Make title, chat button, and settings dropdown sticky so users can
toggle auto-scroll without pausing playback. Wraps both the title
section and audio player in a single sticky container.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
Fran Fitzpatrick
1dedde96a8 feat: fix auto-scroll and add active segment highlighting in Timeline View
The "Auto Scroll On" feature was broken because it relied on a word-level ref
that was never assigned. This fix implements segment-level auto-scroll for
Timeline View.

Changes:
- Enable autoScrollEnabled prop usage in TranscriptView
- Add activeSegmentIndex computation to track current playback position
- Add auto-scroll effect that scrolls to active segment on segment change
- Add subtle background highlight to indicate the currently playing segment

The auto-scroll only triggers when:
- Mode is 'expanded' (Timeline View)
- Auto-scroll is enabled
- Audio is playing
- The segment actually changes (debounced to prevent jitter)
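
The trigger conditions above can be illustrated with a small sketch. It is written in Python for brevity rather than in the component's own React code, the function names are hypothetical, segments are assumed to be sorted `(start, end)` pairs in seconds, and the debounce step is left out.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def active_segment_index(
    segments: List[Tuple[float, float]], current_time: float
) -> Optional[int]:
    """Return the index of the segment containing current_time, or None.

    Assumes segments are sorted by start time and do not overlap.
    """
    starts = [start for start, _ in segments]
    # Last segment whose start is <= current_time.
    i = bisect_right(starts, current_time) - 1
    if i >= 0 and current_time < segments[i][1]:
        return i
    return None  # playback position falls in a gap or outside the transcript


def should_auto_scroll(
    mode: str,
    auto_scroll_enabled: bool,
    is_playing: bool,
    previous_index: Optional[int],
    new_index: Optional[int],
) -> bool:
    """Scroll only in Timeline View, while playing, when the segment changes."""
    return (
        mode == "expanded"
        and auto_scroll_enabled
        and is_playing
        and new_index is not None
        and new_index != previous_index
    )
```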

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
Fran Fitzpatrick
0db419e5cd fix: speaker rename now updates in real-time without page reload
After renaming speakers in Timeline View, the changes now appear immediately
in both the transcript display and downloads (JSON, TXT, SRT).

Root cause: The onSpeakerMappingsUpdate callback was a no-op, so the React
Query cache wasn't being invalidated after saving speaker mappings.

Fix: Invalidate the speakerMappings cache when the dialog saves, triggering
an automatic refetch that updates all components using the hook.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:18:57 -08:00
Peter Somlo
df5de714c4 fix: make transcription temp and output directories configurable
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
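
For illustration, the same idea in a few lines of Python (the real change is in the Go config). The `Paths` dataclass and `load_paths` helper are hypothetical; `data/temp` is the previously hardcoded path named above, while the `data/scriberr.db` default is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Paths:
    temp_dir: str
    database_dir: str


def load_paths(env: Mapping[str, str] = os.environ) -> Paths:
    """Read directory settings from the environment instead of hardcoding them."""
    # TEMP_DIR overrides the old hardcoded "data/temp" location.
    temp_dir = env.get("TEMP_DIR", "data/temp")
    # The database directory is derived from DATABASE_PATH rather than
    # assumed to be "data/". The default file name here is an assumption.
    database_path = env.get("DATABASE_PATH", "data/scriberr.db")
    database_dir = os.path.dirname(database_path) or "."
    return Paths(temp_dir=temp_dir, database_dir=database_dir)
```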
2026-01-07 12:18:26 -08:00
Peter Somlo
93abf6eb21 feat: expand language support in the UI to 58 languages for Whisper and OpenAI models
Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.

Changes:
- Expand LANGUAGES array to 58 languages (all with WER <50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- All languages alphabetically sorted

Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
2026-01-01 12:47:41 -08:00
rishikanthc
b8fd360ca2 fix: streamline API docs generation to sync both locations
Updated make docs to generate swagger.json to both api-docs/ and
web/project-site/public/api/ to match CI workflow behavior.

This fixes CI failures where the project site swagger.json was out
of sync with code changes (max_new_tokens field for Voxtral).
2025-12-31 16:03:33 -08:00
rishikanthc
f9a58baa1e clean lint 2025-12-31 15:53:35 -08:00
rishikanthc
0248b01cbd update docs 2025-12-31 15:47:19 -08:00
rishikanthc
73a82b9f6b fix auto device detection in voxtral 2025-12-31 15:47:19 -08:00
rishikanthc
97eb45ea67 feat: add 'make dev' command to replace dev.sh script
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
2025-12-31 15:47:19 -08:00
rishikanthc
efff1a3a7c fix: use Literata font for all transcripts
Changed from font-inter to font-literata to ensure consistent
typography across all transcript views regardless of model used.
2025-12-31 15:47:19 -08:00
rishikanthc
f08504eaa3 fix: disable timeline view for transcripts without word-level timestamps
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
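
The availability check can be sketched as below. The `word_segments` key is taken from this commit; the helper name is hypothetical, and treating an empty list the same as a missing key is an assumption.

```python
from typing import Any, Mapping


def supports_timeline_view(transcript: Mapping[str, Any]) -> bool:
    """Timeline View needs word-level timestamps to be present."""
    # Missing key, None, and an empty list all count as "unavailable"
    # in this sketch.
    return bool(transcript.get("word_segments"))
```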
2025-12-31 15:47:19 -08:00
rishikanthc
ad3053cc9b fix: add Voxtral model selection and fix dependencies
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common

On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
2025-12-31 15:47:19 -08:00
rishikanthc
56c540da36 forgot to commit removal of old project site 2025-12-31 15:47:19 -08:00
rishikanthc
1485b01488 feat: add buffered transcription for Voxtral to handle long audio
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
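
A minimal sketch of the chunk planning described above, assuming durations in seconds. The 30-minute threshold and 25-minute chunk length come from this commit; the function names and the choice to join chunk texts with a single space are assumptions, not the contents of voxtral_transcribe_buffered.py.

```python
from typing import List, Tuple

LONG_AUDIO_THRESHOLD_S = 30 * 60  # audio longer than this uses buffered mode
CHUNK_LENGTH_S = 25 * 60          # each chunk covers at most 25 minutes


def plan_chunks(duration_s: float) -> List[Tuple[float, float]]:
    """Return (start, end) offsets in seconds for each chunk to transcribe."""
    if duration_s <= LONG_AUDIO_THRESHOLD_S:
        # Short audio is processed in a single pass.
        return [(0.0, duration_s)]
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + CHUNK_LENGTH_S, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def join_chunk_texts(texts: List[str]) -> str:
    """Concatenate per-chunk text; no timestamp adjustment is needed."""
    return " ".join(t.strip() for t in texts if t.strip())
```

For a 60-minute file this plans three chunks: two of 25 minutes and a final one of 10 minutes.
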
2025-12-31 15:47:19 -08:00
rishikanthc
5a947e8739 fix: update Voxtral token limits based on 32k context window
- Default: 4096 → 8192 tokens
- Maximum: 8192 → 16384 tokens
- Minimum: 512 → 1024 tokens
- Voxtral has 32k context window, handles 30-40 min audio
- Updated UI description to reflect capabilities
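
As an illustration of the new limits, a small helper that applies them. The three constants come from this commit; the helper name and the decision to clamp out-of-range values (rather than reject them) are assumptions.

```python
from typing import Optional

MIN_NEW_TOKENS = 1024
DEFAULT_NEW_TOKENS = 8192
MAX_NEW_TOKENS = 16384


def resolve_max_new_tokens(requested: Optional[int]) -> int:
    """Return a max_new_tokens value within the supported range."""
    if requested is None:
        return DEFAULT_NEW_TOKENS
    # Clamping is an assumption; the real code may validate instead.
    return max(MIN_NEW_TOKENS, min(MAX_NEW_TOKENS, requested))
```
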
2025-12-31 15:47:19 -08:00
rishikanthc
95ecbf6d21 fix: increase Voxtral max_new_tokens to 4096 (max 8192)
- Default increased from 500 to 4096 tokens
- Maximum increased from 2000 to 8192 tokens
- Minimum increased from 100 to 512 tokens
- Add max_new_tokens to TypeScript interface
- Fix UI to use correct parameter (was using max_line_width)
2025-12-31 15:47:19 -08:00
rishikanthc
1ae7b2bf71 feat: add Voxtral-mini transcription support
- Add VoxtralAdapter using transformers library with direct model loading
- Add Python transcription script with apply_transcription_request() method
- Register Voxtral adapter in main.go with dedicated environment
- Add UI configuration in TranscriptionConfigDialog with warning banner
- Support multilingual transcription without word-level timestamps
- Auto GPU/CPU detection, no device parameter needed
- Graceful degradation for missing timestamp features

Voxtral provides high-quality text-only transcription but does not
support word-level timestamps. UI warns users that synchronized
playback and seek features won't be available.
2025-12-31 15:47:19 -08:00
rishikanthc
923b39e415 fix: ensure directories exist before writing adapter scripts
- Create env directory in copy script functions before writing
- Fixes initialization errors for Parakeet, Canary, and Sortformer adapters
- Update Makefile to use web/project-site for website commands
- Add build target to Makefile for building Scriberr binary
2025-12-31 15:47:19 -08:00
rishikanthc
5e5dc17a13 fix colors and styles 2025-12-29 21:11:47 -08:00
rishikanthc
c433db07b7 feat: add toggle for Automatic Gain Control
Allow users to enable/disable AGC before starting recording.
AGC automatically adjusts microphone volume for consistent levels.
2025-12-29 21:11:47 -08:00
rishikanthc
ca2ed2fd72 fix: use remote-only echo cancellation for microphone
Use Chrome/Edge's 'remote-only' echo cancellation mode to allow
microphone input during local system audio playback while still
preventing acoustic echo from remote sources in video calls.
2025-12-29 21:11:47 -08:00
rishikanthc
76d92e2055 style: apply brand gradient to Upload Recording button 2025-12-29 21:11:47 -08:00
rishikanthc
8ca2b5ba2b refactor: use consistent design system in SystemAudioRecorder
- Replace hardcoded colors with CSS variables
- Match button design with transcription settings
- Apply brand gradient to Start Recording button
2025-12-29 21:11:47 -08:00
rishikanthc
b13d1b360d refactor: use default button components in SystemAudioRecorder 2025-12-29 21:11:47 -08:00
rishikanthc
eb6192960f refactor: restrict system audio to Chromium browsers only
Removed Firefox/Safari support as only Chromium browsers (Chrome, Edge, Brave)
reliably support tab audio capture via getDisplayMedia API.

Changes:
- Added Chromium browser detection (Chrome, Edge, Brave, Chromium)
- Show compatibility error dialog for non-Chromium browsers
- Removed all Firefox-specific code and constraints
- Simplified UI instructions (tab selection only)
- Cleaner error messages focused on tab audio

Tested working on: Chrome, Edge, Brave
Not supported: Firefox, Safari, other browsers
2025-12-29 21:11:47 -08:00
rishikanthc
f5379464f6 feat: add system audio recording with microphone mixing
Implements Screen Capture API based system audio recording for meeting recordings.
Works on Chrome/Edge with tab audio capture.

Features:
- Client-side audio mixing (system audio + microphone) using Web Audio API
- Real-time volume controls via GainNode
- Simple timer-based recording (no visualization complexity)
- Echo cancellation enabled for microphone to prevent feedback loops
- Browser compatibility checks
- Graceful error handling for permissions and stream interruptions

Technical details:
- Uses getDisplayMedia() for system audio capture (the call requires video=true, so the video track is stopped immediately)
- getUserMedia() for microphone with echo cancellation
- MediaRecorder for direct recording without WaveSurfer dependency
- Cyan/blue themed UI to differentiate from regular microphone recording

Tested and working on Chrome. Firefox support needs investigation (v146.0.1).
2025-12-29 21:11:47 -08:00
rishikanthc
2afd6a1ecf fixes #317 2025-12-29 21:11:47 -08:00
Paul Irish
0029078b8a project site 2025-12-29 21:10:13 -08:00
Paul Irish
9975e6fb02 fix duplicated openapi annotations pt 2 2025-12-29 21:10:13 -08:00
Paul Irish
a7aaf06bbb fix duplicated openapi annotations 2025-12-29 21:10:13 -08:00
Paul Irish
ab912a6b6e always copy scripts 2025-12-29 21:09:53 -08:00
Paul Irish
7471a2a1b6 Add test suite for python adapter scripts 2025-12-29 21:09:53 -08:00
Paul Irish
50dd4130ff Extract python adapter scripts to proper files 2025-12-29 21:09:53 -08:00
Paul Irish
edb65339b8 don't blank on vite startup 2025-12-29 21:09:53 -08:00
Paul Irish
d013fe288a build: adopt gotestsum for go test output formatting 2025-12-26 20:40:52 -08:00
Fran Fitzpatrick
8f537548d4 feat: add RTX 5090 Blackwell GPU support (sm_120)
Add support for NVIDIA RTX 50-series GPUs (Blackwell architecture) which
require CUDA 12.8+ and PyTorch cu128 wheels due to the new sm_120 compute
capability.

Changes:
- Add configurable PYTORCH_CUDA_VERSION environment variable to control
  PyTorch wheel version at runtime (cu126 for legacy, cu128 for Blackwell)
- Update all model adapters to use dynamic CUDA version instead of
  hardcoded cu126 URLs
- Update Dockerfile.cuda.12.9 for Blackwell with CUDA 12.9.1 base image,
  PYTORCH_CUDA_VERSION=cu128, and missing WHISPERX_ENV/yt-dlp
- Update Dockerfile.cuda with explicit PYTORCH_CUDA_VERSION=cu126
- Add docker-compose.blackwell.yml for pre-built Blackwell image
- Add docker-compose.build.blackwell.yml for local Blackwell builds
- Add GPU compatibility documentation to README
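
One way to picture the dynamic CUDA version is a helper that derives the PyTorch wheel index URL from the environment. This is a sketch, not the adapters' Go code: the function name is hypothetical, and falling back to `cu126` when the variable is unset is an assumption based on the previously hardcoded value.

```python
import os
from typing import Mapping

DEFAULT_CUDA_VERSION = "cu126"  # assumed default, matching the old hardcoded URLs


def pytorch_index_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the PyTorch wheel index URL for the configured CUDA version."""
    version = env.get("PYTORCH_CUDA_VERSION", "").strip() or DEFAULT_CUDA_VERSION
    return f"https://download.pytorch.org/whl/{version}"
```

Setting `PYTORCH_CUDA_VERSION=cu128` would then point installs at the cu128 wheels needed for Blackwell GPUs.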

Fixes: rishikanthc/Scriberr#104
2025-12-24 14:46:44 -08:00
Paul Irish
718cb74b70 simpler name of job 2025-12-21 08:40:49 -08:00
Paul Irish
64953f9dde only on main and PRs 2025-12-21 08:40:49 -08:00
Paul Irish
4f75db3856 cleaner 2025-12-21 08:40:49 -08:00
Paul Irish
57127b6ec6 Revert "fix lint in TOC"
This reverts commit 15c919e327.
2025-12-21 08:40:49 -08:00
Paul Irish
2db72409da fix lint in TOC 2025-12-21 08:40:49 -08:00
Paul Irish
5d11f318d5 any 2025-12-21 08:40:49 -08:00
Paul Irish
03c8f76a1d flesh it out 2025-12-21 08:40:49 -08:00
Paul Irish
007b344f60 ci: add basic build and test workflow 2025-12-21 08:40:49 -08:00