Commit Graph

839 Commits

Author SHA1 Message Date
rishikanthc
4dc6810015 Add partial index predicate regression tests 2026-04-23 12:32:36 -07:00
rishikanthc
5993f418d0 Fix partial index WHERE extraction 2026-04-23 12:30:40 -07:00
rishikanthc
42ff560afc Harden partial index predicate verification 2026-04-23 12:29:02 -07:00
rishikanthc
4960a2d528 Finalize database migration correctness fixes 2026-04-23 12:22:01 -07:00
rishikanthc
a5f88fb638 Fix database migration backfill regressions 2026-04-23 12:10:36 -07:00
rishikanthc
f926509ac9 Harden database schema detection and migration errors 2026-04-23 12:01:25 -07:00
rishikanthc
5d6a60d793 Refactor database migration and persistence layer 2026-04-23 11:38:25 -07:00
rishikanthc
c5f758cddd repo: make execution creation deterministic and add user-scoped APIs 2026-04-23 11:24:13 -07:00
rishikanthc
fc3e933104 db: enforce single default per user and migration normalization 2026-04-23 11:24:08 -07:00
rishikanthc
5067a40790 test database migration with real sqlite fixtures 2026-04-23 10:35:59 -07:00
rishikanthc
e3a7c48bd7 fix schema compatibility gaps after migration 2026-04-23 10:35:15 -07:00
rishikanthc
0be71a63a0 refactor database schema and legacy migration flow 2026-04-23 10:17:03 -07:00
Booth
bdb8838b8b fix: pin torchcodec to 0.7.x for compatibility with PyTorch 2.8.x
torchcodec>=0.6.0 (upstream default) resolves to 0.10.0+ which requires
PyTorch 2.9. Scriberr ships PyTorch 2.8.x, causing a C++ ABI symbol
mismatch at load time. Pin to ~=0.7.0, the last release compatible with
PyTorch 2.8.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 10:41:40 -07:00
Claude
9fd0943b92 fix: move override-dependencies to correct TOML scope [tool.uv]
The override-dependencies key was placed after [tool.uv.sources], causing
it to be parsed as tool.uv.sources.override-dependencies instead of
tool.uv.override-dependencies. uv would silently ignore it, meaning
torchcodec was never actually excluded on Linux aarch64.

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
Claude
6221843866 fix: restore diarization on Linux ARM64 and add WhisperX model selector
- Default sortformer output format to json; RTTM path fails silently
  on NeMo annotation objects, producing zero diarization segments
- Exclude torchcodec on Linux aarch64 via uv platform marker; no
  wheels exist for any torchcodec version on manylinux aarch64, causing
  pyannote environment setup to fail entirely on ARM64 Docker
- Add diarization model selector to WhisperX config UI; Parakeet and
  Canary sections already had this but WhisperX was missing it, making
  it impossible to select nvidia_sortformer as the diarization backend

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
Claude
00923e8898 refactor: use shared FormHelpers in SummaryTemplateDialog
Remove 4 duplicated CSS class constants and replace manual Select/Switch
blocks with SelectField and SwitchField from FormHelpers.

225 → 185 lines (-18%)

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
Claude
5cdc91cf48 refactor: deduplicate TranscriptionConfigDialog with shared form helpers
- Extract SelectField, SwitchField, SliderField, AdvancedAccordion to FormHelpers
- Move shared CSS class constants (inputClassName, etc.) to FormHelpers
- Extract DiarizationSection to eliminate 3x copy-pasted diarization blocks
- Replace 11 inline Select blocks, 5 Switch+label blocks, 2 Slider blocks,
  2 Accordion wrappers with single-line helper calls

TranscriptionConfigDialog: 1178 → 702 lines (-40%)
FormHelpers: 119 → 264 lines (reusable form infrastructure)

https://claude.ai/code/session_01YMyUwpk577EradV93tMMqS
2026-04-21 10:38:20 -07:00
scnerd
77c3365a4a Add TODO comment to remember that this fix is really about technical debt and we should just remove the legacy workers argument entirely. 2026-04-21 10:38:09 -07:00
scnerd
d0a1dcbd6c docs(queue): add comments to test functions for consistency
All six test functions now have // TestFoo verifies... comments
matching the project's existing convention.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
scnerd
8b3f25f801 refactor(queue): use testify assertions and t.Setenv in queue tests
Switch from raw t.Errorf to testify/assert for consistency with the
rest of the codebase. Use t.Setenv() instead of manual os.Setenv/defer
os.Unsetenv for automatic cleanup. Simplify table structs where min
and max are always equal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
scnerd
8d05a7cdd9 fix(queue): allow QUEUE_WORKERS env var to override hardcoded worker count
The QUEUE_WORKERS environment variable was defined and read in
getOptimalWorkerCount(), but NewTaskQueue() unconditionally overwrote
the result with the hardcoded legacyWorkers parameter (always 2).
This made QUEUE_WORKERS effectively dead code.

Now legacyWorkers is only used as a fallback when QUEUE_WORKERS is
not set, preserving the default of 2 workers while allowing users
to control concurrency via the environment variable.

Set QUEUE_WORKERS=1 to serialize all transcription jobs and prevent
system overload during bulk uploads.

Fixes: rishikanthc/Scriberr#379

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
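The precedence rule from the commit above (QUEUE_WORKERS wins when set, legacyWorkers is only a fallback) can be sketched as follows. The project's implementation is in Go; this Python version only illustrates the logic. The function name `resolve_worker_count` and the handling of invalid or non-positive values are assumptions.

```python
import os


def resolve_worker_count(legacy_workers: int, env=os.environ) -> int:
    """Prefer QUEUE_WORKERS when it is a positive integer, else fall back."""
    raw = env.get("QUEUE_WORKERS", "").strip()
    if raw:
        try:
            value = int(raw)
        except ValueError:
            value = 0  # assumption: unparsable values are ignored
        if value > 0:
            return value
    # Only reached when the variable is unset, empty, or unusable.
    return legacy_workers


print(resolve_worker_count(2, {}))
print(resolve_worker_count(2, {"QUEUE_WORKERS": "1"}))
```

Passing the environment as a parameter keeps the function testable without mutating the real process environment.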
scnerd
1ab08ddc11 test(queue): add unit tests for QUEUE_WORKERS env var behavior
Add tests verifying that getOptimalWorkerCount() respects the
QUEUE_WORKERS environment variable and that NewTaskQueue() should
allow QUEUE_WORKERS to override the hardcoded legacy worker count.

Includes a failing test (TestNewTaskQueue_EnvOverridesLegacy) that
reproduces the bug where QUEUE_WORKERS is always overridden by the
hardcoded legacyWorkers parameter.

Ref: rishikanthc/Scriberr#379

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-21 10:38:09 -07:00
Paul Irish
a378335ebd refactor(frontend): extract auth logic to helpers and interceptor
Needed because I was adding a new SpeakerSettings component, but
the useAuth hook triggered an infinite recursion bug because of the
window.fetch wrappings.
2026-03-22 12:10:24 -07:00
Rishikanth Chandrasekaran
bccf81d3e6 Update README with referral request and personal link
Added a request for AI/ML engineering referrals and linked personal website.
2026-03-19 10:38:46 -07:00
Rishikanth Chandrasekaran
abd6d62a60 Add project status update and collaboration invitation
Added an update on project status, including personal circumstances and a call for community contributions.
2026-03-19 10:07:09 -07:00
Peter Somlo
175832e8e7 fix: voxtral duration comparison and "auto" language handling
- use shared LANGUAGES constant in frontend config dialog
2026-02-28 10:59:07 -08:00
Fran Fitzpatrick
5aacc58bad docs: regenerate swagger docs for speaker identification toggle
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 10:58:49 -08:00
Fran Fitzpatrick
4e75295019 feat: add speaker identification toggle to summary templates
Add option to include speaker labels in summary prompts when diarization
is available. When enabled, transcripts are formatted as:
[SPEAKER_NAME] Text here...

The prompt also includes a hint to the LLM that speaker labels are present,
helping it produce summaries that attribute statements to specific speakers.

Changes:
- Add IncludeSpeakerInfo field to SummaryTemplate model
- Add toggle UI in summary template dialog
- Format transcript with speaker labels when generating summary
- Update prompt prefix to indicate speaker labels are present

Closes #353

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-28 10:58:49 -08:00
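The `[SPEAKER_NAME] Text here...` format described in the commit above might be produced along these lines. This is a sketch only: the segment shape (`speaker` and `text` keys), the `speaker_names` mapping, and the function name are assumptions, and the project's own implementation may differ.

```python
def format_transcript(segments, speaker_names=None, include_speaker_info=True):
    """Join segments into prompt text, optionally prefixing each with [SPEAKER]."""
    speaker_names = speaker_names or {}
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        if include_speaker_info and speaker:
            # Use a renamed label when one exists, else the raw speaker ID.
            label = speaker_names.get(speaker, speaker)
            lines.append(f"[{label}] {text}")
        else:
            lines.append(text)
    return "\n".join(lines)


segments = [
    {"speaker": "SPEAKER_00", "text": "Hello there."},
    {"speaker": "SPEAKER_01", "text": "Hi."},
]
print(format_transcript(segments, {"SPEAKER_00": "Alice"}))
```

Segments without a speaker fall back to plain text, so the same function serves transcripts with and without diarization.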
Paul Irish
71e7588c75 fix the 'any's 2026-02-28 10:58:17 -08:00
Paul Irish
6e707f363c fix(auth): prevent infinite fetch recursion and multiple wrapper layers
This fixes an issue where the frontend would spam the auth endpoints repeatedly when logged out or when a session expired.

1. Infinite Recursion on 401: The window.fetch wrapper would catch a 401, call tryRefresh(), which then called fetch() again, triggering the wrapper recursively if the refresh also failed. We now use the original fetch for refresh attempts and exclude auth endpoints from auto-refresh logic.
2. Multiple Wrapper Layers: Since useAuth is a hook used by many components, multiple instances were independently wrapping window.fetch. We now store the original fetch globally and ensure wrapping only happens once.
2026-02-28 10:58:17 -08:00
Fran Fitzpatrick
850af1fb6e test: update PyAnnote test to reflect optional HF token
The HF token parameter is now optional at validation time since
it can be provided via the HF_TOKEN environment variable at runtime.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
Fran Fitzpatrick
ff12270419 feat: add HF_TOKEN environment variable fallback for diarization
Previously, users had to enter their Hugging Face token in the UI
for every transcription job that used diarization. Now the token
can be set via the HF_TOKEN environment variable, which is
especially useful for Docker deployments.

Changes:
- Add HFToken to backend config (reads from HF_TOKEN env var)
- Update PyAnnote adapter to fall back to env var when no UI token
- Update WhisperX adapter to fall back to env var when no UI token
- Update documentation to clarify both configuration options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:23:27 -08:00
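The fallback order in the commit above (a token entered in the UI first, then the HF_TOKEN environment variable) can be sketched as below. The function name and the whitespace handling are assumptions; the project's adapters are not reproduced here.

```python
import os


def resolve_hf_token(ui_token, env=os.environ):
    """Return the UI-supplied token if present, else HF_TOKEN, else None."""
    token = (ui_token or "").strip()
    if token:
        return token
    # Empty or missing environment values are treated as "no token".
    return env.get("HF_TOKEN", "").strip() or None


print(resolve_hf_token("", {"HF_TOKEN": "hf_example"}))
```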
Fran Fitzpatrick
f6df31b500 feat: add VAD segmentation thresholds for Pyannote diarization
Add configurable voice activity detection thresholds to improve
speaker diarization accuracy for noisy or distant audio recordings.

- Add --segmentation-onset and --segmentation-offset CLI args to
  pyannote_diarize.py
- Pass segmentation thresholds from Go adapter to Python script
- Map existing vad_onset/vad_offset params to Pyannote segmentation
- Add VAD Onset/Offset inputs to UI when Pyannote diarization is
  selected (Whisper, Parakeet, Canary model families)

Lower onset values (0.3-0.4) help detect quieter/distant speakers.
Lower offset values improve detection of speech endings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:20:31 -08:00
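A script could accept the two flags named in the commit above roughly as follows. This is a sketch, not the contents of pyannote_diarize.py: the range check, the `None` defaults, and the `onset`/`offset` dictionary keys are assumptions made for illustration.

```python
import argparse


def threshold(value: str) -> float:
    """Parse a threshold; assumption: valid values lie in [0, 1]."""
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise argparse.ArgumentTypeError(f"{value} is outside [0, 1]")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Segmentation threshold flags (sketch)")
    parser.add_argument("--segmentation-onset", type=threshold, default=None)
    parser.add_argument("--segmentation-offset", type=threshold, default=None)
    return parser


def segmentation_overrides(args: argparse.Namespace) -> dict:
    """Collect only the thresholds that were explicitly provided."""
    overrides = {}
    if args.segmentation_onset is not None:
        overrides["onset"] = args.segmentation_onset
    if args.segmentation_offset is not None:
        overrides["offset"] = args.segmentation_offset
    return overrides


args = build_parser().parse_args(["--segmentation-onset", "0.35"])
print(segmentation_overrides(args))
```

Leaving the defaults as `None` lets the caller distinguish "not set" from an explicit value, so the model's own defaults stay in effect unless a flag is passed.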
Fran Fitzpatrick
f8c0c6759d fix: Listen button in selection menu now works in Timeline View
The selection menu's "Listen" button wasn't working in Timeline View because
the character-to-timestamp mapping was incorrectly counting text from timestamp
and speaker name elements.

Changes:
- Add data-transcript-text attribute to transcript text containers
- Update TreeWalker in useSelectionMenu to only count text inside these marked elements

This fixes the character index calculation so word timestamps are correctly looked up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:20:07 -08:00
Fran Fitzpatrick
8c3f345cee style: add glass-card styling to sticky title section
Match the title/controls section styling to the audio player below
with glass-card, rounded corners, border, shadow, and padding.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
Fran Fitzpatrick
6944e6719c fix: keep header controls visible during auto-scroll
Make title, chat button, and settings dropdown sticky so users can
toggle auto-scroll without pausing playback. Wraps both the title
section and audio player in a single sticky container.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
Fran Fitzpatrick
1dedde96a8 feat: fix auto-scroll and add active segment highlighting in Timeline View
The "Auto Scroll On" feature was broken because it relied on a word-level ref
that was never assigned. This fix implements segment-level auto-scroll for
Timeline View.

Changes:
- Enable autoScrollEnabled prop usage in TranscriptView
- Add activeSegmentIndex computation to track current playback position
- Add auto-scroll effect that scrolls to active segment on segment change
- Add subtle background highlight to indicate the currently playing segment

The auto-scroll only triggers when:
- Mode is 'expanded' (Timeline View)
- Auto-scroll is enabled
- Audio is playing
- The segment actually changes (debounced to prevent jitter)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:19:23 -08:00
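The activeSegmentIndex computation mentioned in the commit above can be illustrated with a binary search over segment start times. The frontend itself is not written in Python; this sketch only shows one way to do the lookup, and it assumes segments are sorted by start time, do not overlap, and carry `start` and `end` keys in seconds.

```python
from bisect import bisect_right


def active_segment_index(segments, current_time):
    """Return the index of the segment containing current_time, or -1."""
    starts = [seg["start"] for seg in segments]
    # Last segment whose start is at or before the playback position.
    i = bisect_right(starts, current_time) - 1
    if i >= 0 and current_time < segments[i]["end"]:
        return i
    return -1  # before the first segment or inside a gap


segments = [
    {"start": 0.0, "end": 2.0},
    {"start": 2.0, "end": 5.0},
    {"start": 6.0, "end": 8.0},
]
print(active_segment_index(segments, 2.0))
```

A caller would scroll only when the returned index differs from the previous one, which matches the "segment actually changes" condition in the commit message.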
Fran Fitzpatrick
0db419e5cd fix: speaker rename now updates in real-time without page reload
After renaming speakers in Timeline View, the changes now appear immediately
in both the transcript display and downloads (JSON, TXT, SRT).

Root cause: The onSpeakerMappingsUpdate callback was a no-op, so the React
Query cache wasn't being invalidated after saving speaker mappings.

Fix: Invalidate the speakerMappings cache when the dialog saves, triggering
an automatic refetch that updates all components using the hook.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-07 12:18:57 -08:00
Peter Somlo
df5de714c4 fix: make transcription temp and output directories configurable
- Add TempDir field to Config struct to read TEMP_DIR env var
- Update NewUnifiedTranscriptionService to accept tempDir and outputDir parameters
- Remove hardcoded "data/temp" and "data/transcripts" paths from unified service
- Update NewUnifiedJobProcessor to pass directory paths from config
- Update main.go to use cfg.TempDir and cfg.TranscriptsDir
- Update all test files to use new function signatures
- Fix database.go to use directory from DATABASE_PATH instead of hardcoded "data/"
2026-01-07 12:18:26 -08:00
Peter Somlo
93abf6eb21 feat: expand language support in the UI to 58 languages for Whisper and OpenAI models
Expands language selection from 24 to 58 languages for Whisper and OpenAI transcription profiles.

Changes:
- Expand LANGUAGES array to 58 languages (all with WER <50%)
- Add 34 new languages including Afrikaans, Armenian, Czech, Danish, Hungarian, Norwegian, Romanian, Serbian, Slovak, Thai, and many more
- Create VOXTRAL_LANGUAGES array with original 24-language subset for Voxtral
- Update VoxtralConfig to use VOXTRAL_LANGUAGES instead of LANGUAGES
- All languages alphabetically sorted

Language array usage:
- LANGUAGES (58) → Whisper and OpenAI models
- VOXTRAL_LANGUAGES (24) → Voxtral model
- CANARY_LANGUAGES (4) → NVIDIA Canary model
2026-01-01 12:47:41 -08:00
rishikanthc
b8fd360ca2 fix: streamline API docs generation to sync both locations
Updated make docs to generate swagger.json to both api-docs/ and
web/project-site/public/api/ to match CI workflow behavior.

This fixes CI failures where the project site swagger.json was out
of sync with code changes (max_new_tokens field for Voxtral).
2025-12-31 16:03:33 -08:00
rishikanthc
f9a58baa1e clean lint 2025-12-31 15:53:35 -08:00
rishikanthc
0248b01cbd update docs 2025-12-31 15:47:19 -08:00
rishikanthc
73a82b9f6b fix auto device detection in voxtral 2025-12-31 15:47:19 -08:00
rishikanthc
97eb45ea67 feat: add 'make dev' command to replace dev.sh script
- Auto-installs Air if not found (with GOPATH/bin PATH handling)
- Creates placeholder files for Go embed directive in dev mode
- Starts backend with Air live reload (or falls back to go run)
- Starts frontend with Vite HMR
- Handles cleanup on Ctrl+C/SIGTERM
- Removed dev.sh in favor of unified Makefile command
2025-12-31 15:47:19 -08:00
rishikanthc
efff1a3a7c fix: use Literata font for all transcripts
Changed from font-inter to font-literata to ensure consistent
typography across all transcript views regardless of model used.
2025-12-31 15:47:19 -08:00
rishikanthc
f08504eaa3 fix: disable timeline view for transcripts without word-level timestamps
- Check for presence of word_segments in transcript data
- Show disabled menu item with explanation when timestamps unavailable
- Applies to Voxtral and other models without word-level timestamps
2025-12-31 15:47:19 -08:00
rishikanthc
ad3053cc9b fix: add Voxtral model selection and fix dependencies
- Add FamilyMistralVoxtral and ModelVoxtral constants
- Add case for Voxtral in selectModels switch statement
- Add convertToVoxtralParams function for parameter conversion
- Add MaxNewTokens field to WhisperXParams model
- Map language and max_new_tokens parameters correctly
- Fix parameter name in buffered script (output_path -> output_file)
- Add mistral-common dependency to pyproject.toml
- Check for both VoxtralForConditionalGeneration AND mistral_common

On next server restart, the environment will be re-synced automatically
to install the missing mistral-common dependency.
2025-12-31 15:47:19 -08:00
rishikanthc
56c540da36 forgot to commit removal of old project site 2025-12-31 15:47:19 -08:00
rishikanthc
1485b01488 feat: add buffered transcription for Voxtral to handle long audio
- Create voxtral_transcribe_buffered.py for audio > 30 minutes
- Split audio into 25-minute chunks for processing
- Automatically detect long audio and use buffered mode
- Concatenate text results from all chunks
- No timestamp adjustment needed (text-only model)
- Handles unlimited audio length via chunking
2025-12-31 15:47:19 -08:00
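The chunking plan described in the commit above (25-minute chunks for audio longer than 30 minutes, text results concatenated) might be laid out as in this sketch. It is not the contents of voxtral_transcribe_buffered.py; the function names and the choice to join chunk texts with a single space are assumptions.

```python
CHUNK_SECONDS = 25 * 60               # chunk length from the commit message
BUFFERED_THRESHOLD_SECONDS = 30 * 60  # audio longer than this is chunked


def plan_chunks(duration_seconds, chunk_seconds=CHUNK_SECONDS,
                threshold_seconds=BUFFERED_THRESHOLD_SECONDS):
    """Return (start, end) windows in seconds covering the whole audio."""
    if duration_seconds <= 0:
        return []
    if duration_seconds <= threshold_seconds:
        return [(0.0, float(duration_seconds))]  # short audio: single pass
    chunks = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + chunk_seconds, duration_seconds)
        chunks.append((start, float(end)))
        start = end
    return chunks


def join_chunk_texts(texts):
    """Concatenate per-chunk text; no timestamp adjustment is involved."""
    return " ".join(t.strip() for t in texts if t.strip())


print(plan_chunks(60 * 60))
```

Because the model returns text only, the plan needs no overlap bookkeeping for timestamps; whether real chunks should overlap to avoid cutting words at boundaries is left open here.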