Simon/Scriberr

Fork 0

mirror of https://github.com/rishikanthc/Scriberr.git synced 2026-07-01 08:15:46 +00:00

Files

rishikanthc c94e779d4d docs: plan engine worker integration sprints

2026-04-26 14:40:14 -07:00

20 KiB

Raw Blame History

Sprint Run: Engine Worker Integration

Run ID: EWI

Status: planning only. Do not implement code from this document until the user explicitly starts an implementation sprint.

Scope

This sprint run implements devnotes/engine-worker-integration-spec.md end to end: local Go engine integration, durable transcription workers, canonical transcript JSON, executions/logs/models APIs, removal of legacy Python adapter startup paths, docs, and verification with fixture audio.

The work should stay backend-first. Frontend changes are out of scope unless an API contract change makes the current UI unable to compile or use the canonical endpoints.

Engineering Rules

Follow test-driven development: write the narrow failing tests first, implement, then refactor.
Keep commits small and intentional. Each sprint should usually produce 2-5 commits, grouped by behavior:
- tests that define the target behavior,
- implementation,
- cleanup/docs,
- verification fixes.
Do not mix unrelated cleanup into implementation commits.
Keep the API layer thin. Handlers validate, map requests/responses, and call service interfaces.
Keep SQLite as the source of truth. In-memory state may wake workers and cancel local jobs only.
Do not leak local file paths, model cache paths, tokens, raw command output, or env-specific internals through API responses, logs endpoints, or SSE events.
Preserve future multi-user scheduling hooks by carrying user_id through claim, execution, cancellation, and stats code.
Prefer fake providers/processors for fast tests. Real engine tests must be opt-in and skipped cleanly when runtime/model dependencies are unavailable.
Use test-audio/jfk.wav for fast real-path smoke tests. Use longer audio only for opt-in local performance checks.

Validation Baseline

Run these checks before and after each implementation sprint when possible:

GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware
GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware
git diff --check

Real engine validation is opt-in:

SCRIBERR_ENGINE_ITEST=1 SPEECH_ENGINE_AUTO_DOWNLOAD=true GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... -run 'Test.*RealEngine|Test.*JFK'

Performance-oriented manual validation should record:

audio file used,
selected provider and resolved provider,
model download time if any,
transcription wall time,
CPU/GPU mode,
resulting transcript word/segment counts.

EWI-Sprint 0: Integration Inventory and Commit Plan

Goal: lock down the exact files, dependencies, and deletion targets before changing runtime behavior.

Tasks:

Inventory current transcription API handlers, queue package usage, internal/transcription packages, server startup wiring, config fields, schema fields, docs, and Docker env examples.
Identify all imports of legacy adapter/registry/pipeline code and decide whether each file is removed, replaced, or kept as a compatibility wrapper.
Confirm references/engine compiles as a local replacement module and document any platform/runtime prerequisites.
Write a route/API impact matrix for create/submit/retry/cancel/models/logs/executions/events.
Create a commit checklist for the remaining sprints.

Acceptance criteria:

Deletion list for legacy Python adapter stack is explicit.
No implementation sprint starts with unknown server startup dependencies.
The intended commit grouping is documented.

Testing focus:

Compile-only discovery if Go dependencies are available.
No product behavior tests required.

EWI-Sprint 1: Config and Engine Module Wiring

Goal: add engine/worker configuration and module wiring without starting workers or downloading models.

Tasks:

Add require scriberr-engine v0.0.0 and replace scriberr-engine => ./references/engine.
Add EngineConfig and WorkerConfig to internal/config.
Parse and validate:
- SPEECH_ENGINE_CACHE_DIR
- SPEECH_ENGINE_PROVIDER
- SPEECH_ENGINE_THREADS
- SPEECH_ENGINE_MAX_LOADED
- SPEECH_ENGINE_AUTO_DOWNLOAD
- TRANSCRIPTION_WORKERS
- TRANSCRIPTION_QUEUE_POLL_INTERVAL
- TRANSCRIPTION_LEASE_TIMEOUT
Make invalid numeric, duration, boolean, and provider values fail startup with clear config errors.
Keep startup free of model downloads.
Start de-emphasizing WHISPERX_ENV in config without breaking existing callers until legacy startup code is removed.

Acceptance criteria:

Defaults match the spec exactly.
auto, cpu, and cuda provider values parse; other values fail.
Config errors are actionable and do not panic.

Testing focus:

Defaults.
Invalid provider.
Invalid integer/duration/boolean.
Auto-download default.
Worker default remains one worker.

Commit guidance:

Commit config tests first.
Commit config implementation and module wiring second.

EWI-Sprint 2: Engine Provider Abstraction

Goal: create a provider boundary that hides scriberr-engine from API, repository, and worker callers.

Tasks:

Add internal/transcription/engineprovider.
Define provider, registry, capability, request, result, transcript word/segment, and diarization segment types.
Implement an in-memory registry with default provider lookup.
Implement a fake provider for tests.
Implement local provider wrapping scriberr-engine/speech/engine.
Map Scriberr requests to engine requests and map engine results back to internal provider results.
Implement provider capabilities and installed state via engine model metadata and IsModelInstalled.
Sanitize provider errors before they can be returned to API clients.
Log detailed errors internally without public path leakage.

Acceptance criteria:

Only engineprovider imports scriberr-engine.
Local provider ID() is local.
Defaults are whisper-base and diarization-default.
Missing words return Words: [], never nil-dependent API behavior.
Provider registry supports future non-local providers.

Testing focus:

Request mapping.
Result mapping with words.
Empty words.
Diarization result mapping.
Capability listing and installed flags.
Error sanitization.

Commit guidance:

Commit interface/fake-provider tests first.
Commit local provider implementation separately.

EWI-Sprint 3: Queue Schema and Repository Methods

Goal: make transcription job state durable enough for workers, leases, progress, and executions.

Tasks:

Add schema fields to models.TranscriptionJob:
- queued_at
- started_at
- failed_at
- progress
- progress_stage
- claimed_by
- claim_expires_at
- engine_id
Ensure TranscriptionJobExecution stores provider, model name/family, started/completed/failed timestamps, sanitized error, output JSON path, and request/config JSON.
Add indexes:
- idx_transcriptions_queue_claim(status, queued_at)
- idx_transcriptions_claim_expires_at(claim_expires_at)
Add repository methods for enqueue, claim, renew, progress, complete, fail, cancel, execution listing, and startup recovery.
Use transactions for claim and terminal-state updates.
Keep claim policy isolated behind a FIFO scheduler policy.

Acceptance criteria:

Claim returns the oldest queued job by queued_at, created_at, and id.
Concurrent claims do not return the same job.
Lease renewal updates only the owning worker.
Startup recovery requeues processing rows regardless of stale process-local owner state.
Terminal updates keep job and latest execution consistent.

Testing focus:

Migration/schema fields and indexes.
Enqueue state transition.
FIFO claim.
Concurrent claim race.
Lease renewal owner mismatch.
Startup recovery.
Complete/fail/cancel terminal transactions.

Commit guidance:

Commit schema/repository tests first.
Commit schema/model updates and repository implementation second.

EWI-Sprint 4: Durable Worker Service

Goal: implement the worker loop, wake signals, cancellation, stats, leases, and shutdown behavior using fake processors.

Tasks:

Add internal/transcription/worker.
Define QueueService as specified.
Implement enqueue as durable DB update plus non-blocking wake signal.
Implement workers that poll, claim one job, renew lease, process with cancellable context, and write terminal state through repository/orchestrator results.
Track cancel funcs for currently running process-local jobs.
Implement cancel behavior for queued, local running, and orphaned processing jobs.
Implement queue stats by user.
Implement clean Start and Stop.

Acceptance criteria:

Enqueue moves jobs to queued and wakes workers.
Workers process jobs without duplicate claims.
Running jobs renew leases until terminal state.
Stop cancels local running jobs and waits within a bounded timeout.
Cancel returns conflict for completed/failed/canceled jobs.

Testing focus:

Enqueue/wake hot path.
Fake processor completes a job.
Fake processor observes cancellation.
Cancel queued.
Cancel running.
Claim lease renewal.
Worker shutdown.
Queue stats.

Commit guidance:

Commit queue-service tests first.
Commit worker implementation separately.

EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge

Goal: convert claimed jobs into completed canonical transcripts through provider calls.

Tasks:

Add internal/transcription/orchestrator.
Implement Processor with job repository, provider registry, event publisher, and job logger dependencies.
Resolve audio path, provider, transcription model, diarization model, language, task, and diarization options.
Create execution rows at processing start.
Publish progress stages:
- queued,
- preparing,
- transcribing,
- diarizing,
- merging,
- saving,
- completed/failed/canceled.
Persist canonical transcript JSON in transcriptions.transcript_text.
Write the same JSON to data/transcripts/{jobID}/transcript.json and store internal output path.
Generate fallback segments when needed.
Preserve words: [] when words are absent.
Merge diarization speakers into words and segments using overlap.
Sanitize failures and distinguish user cancellation from engine failure.

Acceptance criteria:

Fake provider can complete a transcription job end to end.
Canonical transcript JSON matches the spec.
No diarization leaves speaker fields absent.
Diarization assigns stable SPEAKER_00 style labels.
Failures update job and execution consistently.
Context cancellation marks canceled, not failed.

Testing focus:

Transcript mapper with words.
Transcript mapper without words.
Plain-text legacy fallback.
Older JSON without words.
Segment fallback.
Word and segment speaker overlap assignment.
Provider failure sanitization.
Cancellation path.

Commit guidance:

Commit mapper/merge tests first.
Commit orchestrator tests and implementation second.

EWI-Sprint 6: API Wiring for Real Queue Execution

Goal: replace transcription placeholders with queue-backed behavior while preserving canonical API contracts.

Tasks:

Inject queue service, provider registry/model service, execution service, and log reader into API handler construction.
Update create/submit to create queued rows and call QueueService.Enqueue.
Update retry to reset eligible terminal jobs and enqueue a new attempt.
Update cancel to call QueueService.Cancel.
Add progress fields to transcription get/list responses.
Implement transcript endpoint through canonical transcript parser.
Implement executions endpoint with sanitized metadata.
Implement logs endpoint as authenticated plain text with sanitization.
Implement models endpoint from provider capabilities.
Keep SSE payloads path-safe and progress-shaped.

Acceptance criteria:

Create/submit return 202 queued resources.
Queue shutdown produces 503 without losing durable job state.
Fake engine worker can complete a job, and transcript endpoint returns text, segments, and words.
Executions endpoint returns execution metadata.
Logs endpoint returns sanitized text.
Models endpoint returns local capabilities and installed/default flags.
No API response/event leaks upload path, temp path, model cache path, or raw internal error details.

Testing focus:

Create enqueues.
Submit upload creates file, job, and enqueue.
Retry valid and conflict states.
Cancel queued/running/conflict states.
Transcript canonical response.
Fake worker completion through API-visible state.
Events progress and completed payloads.
Executions/logs/models endpoints.
Path-leak regression tests.

Commit guidance:

Commit API tests first.
Commit dependency injection and service wiring separately.
Commit endpoint behavior updates last.

EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal

Goal: make the real engine worker the default runtime path and remove obsolete Python adapter bootstrapping.

Tasks:

Update cmd/server/main.go startup order to match the spec.
Initialize local engine provider registry after DB/repositories and before worker service.
Run DB recovery before starting workers.
Start workers before HTTP serving.
On shutdown, stop workers, cancel running jobs, close engine providers, then close DB.
Remove server startup dependencies on Python, WHISPERX_ENV, and legacy adapter registration.
Delete or stop compiling:
- internal/transcription/adapters/**
- legacy registry/pipeline abstractions that only serve Python adapters
- obsolete adapter tests
Keep compatibility wrappers only when needed to compile API or service boundaries.

Acceptance criteria:

Fresh server startup does not require Python env setup.
Startup logs include cache dir, requested provider, resolved provider when available, threads, max-loaded, and auto-download.
Shutdown releases worker and provider resources.
Legacy Python adapter code no longer participates in server runtime.

Testing focus:

Server construction with fake provider/worker where practical.
Startup recovery is invoked before worker start.
Shutdown order test or focused unit test around lifecycle coordinator.
Compile regression proving deleted adapter code is no longer referenced.

Commit guidance:

Commit lifecycle tests first.
Commit startup wiring.
Commit legacy deletion as its own reviewable commit.

EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke

Goal: prove the local engine path works with fixture audio without making CI slow or flaky.

Tasks:

Add opt-in real engine integration tests gated by SCRIBERR_ENGINE_ITEST=1.
Use test-audio/jfk.wav for fast transcription verification.
Skip cleanly with clear messages when ffmpeg, engine runtime, model downloads, CUDA runtime, or network access are unavailable.
Validate that first-run auto-download works when enabled.
Validate SPEECH_ENGINE_AUTO_DOWNLOAD=false fails jobs cleanly when models are missing.
Add a small benchmark or timed smoke test that reports wall time without setting brittle pass/fail thresholds.
Document manual performance commands for jfk.wav, sample.wav, and optional longer fixtures.

Acceptance criteria:

Default CI remains fake-provider only.
Opt-in real test can produce a completed transcript for test-audio/jfk.wav.
Real test asserts non-empty text and path-safe public outputs.
Auto-download disabled path fails with a sanitized model-unavailable error.

Testing focus:

Real local provider transcription with jfk.wav.
Optional diarization only if model/runtime availability makes it practical.
Auto-download enabled/disabled behavior.
No path leakage from real failures.

Commit guidance:

Commit gated integration tests separately from runtime implementation fixes.

EWI-Sprint 9: Docs, Docker, and Setup UX

Goal: make first-run local and Docker usage understandable and aligned with the new engine architecture.

Tasks:

Update README configuration table with SPEECH_ENGINE_* and TRANSCRIPTION_* variables.
Remove or de-emphasize WHISPERX_ENV.
Update Docker compose files to persist data/models or map SPEECH_ENGINE_CACHE_DIR.
Keep CUDA Docker configuration on provider auto and document NVIDIA runtime/container toolkit prerequisites.
Add troubleshooting notes for:
- missing ffmpeg,
- model download failure,
- forced CUDA unavailable,
- first job slow due to model download,
- auto-download disabled.
Document the opt-in real engine test command and fixture audio expectations.

Acceptance criteria:

README and Docker examples match implemented defaults.
CPU fresh install path is documented without pre-download steps.
CUDA behavior is explicit: auto falls back, cuda requires CUDA.

Testing focus:

Docs command snippets are syntax-checked where practical.
docker-compose YAML remains parseable if tooling is available.

Commit guidance:

Commit docs and Docker changes separately from code.

EWI-Sprint 10: Hardening, Cleanup, and Release Candidate

Goal: audit the full implementation for correctness, performance, privacy, and maintainability.

Tasks:

Run the full backend test/vet baseline.
Run focused race/concurrency tests for queue claiming if practical.
Run opt-in jfk.wav real engine smoke.
Audit public API responses and events for path/token leakage.
Audit handler packages for business logic that should move behind services.
Audit repository terminal updates for transaction boundaries.
Audit worker shutdown and cancellation behavior.
Remove dead code, obsolete TODOs, and stale docs from earlier sprints.
Update devnotes/engine-worker-sprint-tracker.md with final verification results.

Acceptance criteria:

Full fake-provider test suite passes.
Real jfk.wav smoke passes or has a documented external dependency blocker.
No known path leaks in API, events, logs endpoint, or tests.
Queue restart recovery is covered.
Commit history is organized by sprint and behavior.

Testing focus:

Full package tests and vet.
API regression tests.
Queue concurrency/recovery tests.
Transcript compatibility tests.
Real engine smoke.

Commit guidance:

Commit hardening fixes in narrow patches.
Final commit should be docs/tracker updates only.

Minimum Test Coverage Set

The target is not 100% coverage. The minimum set must cover the highest-risk paths:

Config parse defaults and invalid values.
Provider request/result mapping and sanitized errors.
Queue enqueue, claim, lease renew, recovery, cancel queued, cancel running, and no duplicate claim.
Orchestrator success, provider failure, cancellation, canonical JSON, words absent, and diarization merge.
API create/submit/retry/cancel/transcript/events/executions/logs/models hot paths.
Security/path-leak regression for API responses, events, logs, and provider errors.
Gated real engine transcription using test-audio/jfk.wav.

Completion Definition

This sprint run is complete when:

Fresh install starts without Python environment setup.
Missing models download automatically on first job when enabled.
A transcription moves from queued to processing to completed or failed.
Completed transcripts include text, segments, and word timestamps when available.
Missing word timestamps are represented as words: [].
Diarization assigns public-safe speaker labels when requested and available.
Events report progress and terminal states.
Executions and logs endpoints are implemented and sanitized.
Queue state survives restart and recovers orphaned processing jobs.
Legacy Python adapter bootstrap is gone from server startup.
Real jfk.wav smoke has been run or blocked by a documented external dependency.

20 KiB Raw Blame History

Sprint Run: Engine Worker Integration

Scope

Engineering Rules

Validation Baseline

EWI-Sprint 0: Integration Inventory and Commit Plan

EWI-Sprint 1: Config and Engine Module Wiring

EWI-Sprint 2: Engine Provider Abstraction

EWI-Sprint 3: Queue Schema and Repository Methods

EWI-Sprint 4: Durable Worker Service

EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge

EWI-Sprint 6: API Wiring for Real Queue Execution

EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal

EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke

EWI-Sprint 9: Docs, Docker, and Setup UX

EWI-Sprint 10: Hardening, Cleanup, and Release Candidate

Minimum Test Coverage Set

Completion Definition

20 KiB

Raw Blame History