# Sprint Run: Engine Worker Integration Run ID: `EWI` Status: planning only. Do not implement code from this document until the user explicitly starts an implementation sprint. ## Scope This sprint run implements `devnotes/engine-worker-integration-spec.md` end to end: local Go engine integration, durable transcription workers, canonical transcript JSON, executions/logs/models APIs, removal of legacy Python adapter startup paths, docs, and verification with fixture audio. The work should stay backend-first. Frontend changes are out of scope unless an API contract change makes the current UI unable to compile or use the canonical endpoints. ## Engineering Rules - Follow test-driven development: write the narrow failing tests first, implement, then refactor. - Keep commits small and intentional. Each sprint should usually produce 2-5 commits, grouped by behavior: - tests that define the target behavior, - implementation, - cleanup/docs, - verification fixes. - Do not mix unrelated cleanup into implementation commits. - Keep the API layer thin. Handlers validate, map requests/responses, and call service interfaces. - Keep SQLite as the source of truth. In-memory state may wake workers and cancel local jobs only. - Do not leak local file paths, model cache paths, tokens, raw command output, or env-specific internals through API responses, logs endpoints, or SSE events. - Preserve future multi-user scheduling hooks by carrying `user_id` through claim, execution, cancellation, and stats code. - Prefer fake providers/processors for fast tests. Real engine tests must be opt-in and skipped cleanly when runtime/model dependencies are unavailable. - Use `test-audio/jfk.wav` for fast real-path smoke tests. Use longer audio only for opt-in local performance checks. ## Validation Baseline Run these checks before and after each implementation sprint when possible: ```sh GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware git diff --check ``` Real engine validation is opt-in: ```sh SCRIBERR_ENGINE_ITEST=1 SPEECH_ENGINE_AUTO_DOWNLOAD=true GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... -run 'Test.*RealEngine|Test.*JFK' ``` Performance-oriented manual validation should record: - audio file used, - selected provider and resolved provider, - model download time if any, - transcription wall time, - CPU/GPU mode, - resulting transcript word/segment counts. ## EWI-Sprint 0: Integration Inventory and Commit Plan Goal: lock down the exact files, dependencies, and deletion targets before changing runtime behavior. Tasks: - Inventory current transcription API handlers, queue package usage, `internal/transcription` packages, server startup wiring, config fields, schema fields, docs, and Docker env examples. - Identify all imports of legacy adapter/registry/pipeline code and decide whether each file is removed, replaced, or kept as a compatibility wrapper. - Confirm `references/engine` compiles as a local replacement module and document any platform/runtime prerequisites. - Write a route/API impact matrix for create/submit/retry/cancel/models/logs/executions/events. - Create a commit checklist for the remaining sprints. Acceptance criteria: - Deletion list for legacy Python adapter stack is explicit. - No implementation sprint starts with unknown server startup dependencies. - The intended commit grouping is documented. Testing focus: - Compile-only discovery if Go dependencies are available. - No product behavior tests required. ## EWI-Sprint 1: Config and Engine Module Wiring Goal: add engine/worker configuration and module wiring without starting workers or downloading models. Tasks: - Add `require scriberr-engine v0.0.0` and `replace scriberr-engine => ./references/engine`. - Add `EngineConfig` and `WorkerConfig` to `internal/config`. - Parse and validate: - `SPEECH_ENGINE_CACHE_DIR` - `SPEECH_ENGINE_PROVIDER` - `SPEECH_ENGINE_THREADS` - `SPEECH_ENGINE_MAX_LOADED` - `SPEECH_ENGINE_AUTO_DOWNLOAD` - `TRANSCRIPTION_WORKERS` - `TRANSCRIPTION_QUEUE_POLL_INTERVAL` - `TRANSCRIPTION_LEASE_TIMEOUT` - Make invalid numeric, duration, boolean, and provider values fail startup with clear config errors. - Keep startup free of model downloads. - Start de-emphasizing `WHISPERX_ENV` in config without breaking existing callers until legacy startup code is removed. Acceptance criteria: - Defaults match the spec exactly. - `auto`, `cpu`, and `cuda` provider values parse; other values fail. - Config errors are actionable and do not panic. Testing focus: - Defaults. - Invalid provider. - Invalid integer/duration/boolean. - Auto-download default. - Worker default remains one worker. Commit guidance: - Commit config tests first. - Commit config implementation and module wiring second. ## EWI-Sprint 2: Engine Provider Abstraction Goal: create a provider boundary that hides `scriberr-engine` from API, repository, and worker callers. Tasks: - Add `internal/transcription/engineprovider`. - Define provider, registry, capability, request, result, transcript word/segment, and diarization segment types. - Implement an in-memory registry with default provider lookup. - Implement a fake provider for tests. - Implement local provider wrapping `scriberr-engine/speech/engine`. - Map Scriberr requests to engine requests and map engine results back to internal provider results. - Implement provider capabilities and installed state via engine model metadata and `IsModelInstalled`. - Sanitize provider errors before they can be returned to API clients. - Log detailed errors internally without public path leakage. Acceptance criteria: - Only `engineprovider` imports `scriberr-engine`. - Local provider `ID()` is `local`. - Defaults are `whisper-base` and `diarization-default`. - Missing words return `Words: []`, never nil-dependent API behavior. - Provider registry supports future non-local providers. Testing focus: - Request mapping. - Result mapping with words. - Empty words. - Diarization result mapping. - Capability listing and installed flags. - Error sanitization. Commit guidance: - Commit interface/fake-provider tests first. - Commit local provider implementation separately. ## EWI-Sprint 3: Queue Schema and Repository Methods Goal: make transcription job state durable enough for workers, leases, progress, and executions. Tasks: - Add schema fields to `models.TranscriptionJob`: - `queued_at` - `started_at` - `failed_at` - `progress` - `progress_stage` - `claimed_by` - `claim_expires_at` - `engine_id` - Ensure `TranscriptionJobExecution` stores provider, model name/family, started/completed/failed timestamps, sanitized error, output JSON path, and request/config JSON. - Add indexes: - `idx_transcriptions_queue_claim(status, queued_at)` - `idx_transcriptions_claim_expires_at(claim_expires_at)` - Add repository methods for enqueue, claim, renew, progress, complete, fail, cancel, execution listing, and startup recovery. - Use transactions for claim and terminal-state updates. - Keep claim policy isolated behind a FIFO scheduler policy. Acceptance criteria: - Claim returns the oldest queued job by `queued_at`, `created_at`, and `id`. - Concurrent claims do not return the same job. - Lease renewal updates only the owning worker. - Startup recovery requeues processing rows regardless of stale process-local owner state. - Terminal updates keep job and latest execution consistent. Testing focus: - Migration/schema fields and indexes. - Enqueue state transition. - FIFO claim. - Concurrent claim race. - Lease renewal owner mismatch. - Startup recovery. - Complete/fail/cancel terminal transactions. Commit guidance: - Commit schema/repository tests first. - Commit schema/model updates and repository implementation second. ## EWI-Sprint 4: Durable Worker Service Goal: implement the worker loop, wake signals, cancellation, stats, leases, and shutdown behavior using fake processors. Tasks: - Add `internal/transcription/worker`. - Define `QueueService` as specified. - Implement enqueue as durable DB update plus non-blocking wake signal. - Implement workers that poll, claim one job, renew lease, process with cancellable context, and write terminal state through repository/orchestrator results. - Track cancel funcs for currently running process-local jobs. - Implement cancel behavior for queued, local running, and orphaned processing jobs. - Implement queue stats by user. - Implement clean `Start` and `Stop`. Acceptance criteria: - `Enqueue` moves jobs to queued and wakes workers. - Workers process jobs without duplicate claims. - Running jobs renew leases until terminal state. - `Stop` cancels local running jobs and waits within a bounded timeout. - Cancel returns conflict for completed/failed/canceled jobs. Testing focus: - Enqueue/wake hot path. - Fake processor completes a job. - Fake processor observes cancellation. - Cancel queued. - Cancel running. - Claim lease renewal. - Worker shutdown. - Queue stats. Commit guidance: - Commit queue-service tests first. - Commit worker implementation separately. ## EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge Goal: convert claimed jobs into completed canonical transcripts through provider calls. Tasks: - Add `internal/transcription/orchestrator`. - Implement `Processor` with job repository, provider registry, event publisher, and job logger dependencies. - Resolve audio path, provider, transcription model, diarization model, language, task, and diarization options. - Create execution rows at processing start. - Publish progress stages: - queued, - preparing, - transcribing, - diarizing, - merging, - saving, - completed/failed/canceled. - Persist canonical transcript JSON in `transcriptions.transcript_text`. - Write the same JSON to `data/transcripts/{jobID}/transcript.json` and store internal output path. - Generate fallback segments when needed. - Preserve `words: []` when words are absent. - Merge diarization speakers into words and segments using overlap. - Sanitize failures and distinguish user cancellation from engine failure. Acceptance criteria: - Fake provider can complete a transcription job end to end. - Canonical transcript JSON matches the spec. - No diarization leaves speaker fields absent. - Diarization assigns stable `SPEAKER_00` style labels. - Failures update job and execution consistently. - Context cancellation marks canceled, not failed. Testing focus: - Transcript mapper with words. - Transcript mapper without words. - Plain-text legacy fallback. - Older JSON without `words`. - Segment fallback. - Word and segment speaker overlap assignment. - Provider failure sanitization. - Cancellation path. Commit guidance: - Commit mapper/merge tests first. - Commit orchestrator tests and implementation second. ## EWI-Sprint 6: API Wiring for Real Queue Execution Goal: replace transcription placeholders with queue-backed behavior while preserving canonical API contracts. Tasks: - Inject queue service, provider registry/model service, execution service, and log reader into API handler construction. - Update create/submit to create queued rows and call `QueueService.Enqueue`. - Update retry to reset eligible terminal jobs and enqueue a new attempt. - Update cancel to call `QueueService.Cancel`. - Add progress fields to transcription get/list responses. - Implement transcript endpoint through canonical transcript parser. - Implement executions endpoint with sanitized metadata. - Implement logs endpoint as authenticated plain text with sanitization. - Implement models endpoint from provider capabilities. - Keep SSE payloads path-safe and progress-shaped. Acceptance criteria: - Create/submit return `202` queued resources. - Queue shutdown produces `503` without losing durable job state. - Fake engine worker can complete a job, and transcript endpoint returns text, segments, and words. - Executions endpoint returns execution metadata. - Logs endpoint returns sanitized text. - Models endpoint returns local capabilities and installed/default flags. - No API response/event leaks upload path, temp path, model cache path, or raw internal error details. Testing focus: - Create enqueues. - Submit upload creates file, job, and enqueue. - Retry valid and conflict states. - Cancel queued/running/conflict states. - Transcript canonical response. - Fake worker completion through API-visible state. - Events progress and completed payloads. - Executions/logs/models endpoints. - Path-leak regression tests. Commit guidance: - Commit API tests first. - Commit dependency injection and service wiring separately. - Commit endpoint behavior updates last. ## EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal Goal: make the real engine worker the default runtime path and remove obsolete Python adapter bootstrapping. Tasks: - Update `cmd/server/main.go` startup order to match the spec. - Initialize local engine provider registry after DB/repositories and before worker service. - Run DB recovery before starting workers. - Start workers before HTTP serving. - On shutdown, stop workers, cancel running jobs, close engine providers, then close DB. - Remove server startup dependencies on Python, `WHISPERX_ENV`, and legacy adapter registration. - Delete or stop compiling: - `internal/transcription/adapters/**` - legacy registry/pipeline abstractions that only serve Python adapters - obsolete adapter tests - Keep compatibility wrappers only when needed to compile API or service boundaries. Acceptance criteria: - Fresh server startup does not require Python env setup. - Startup logs include cache dir, requested provider, resolved provider when available, threads, max-loaded, and auto-download. - Shutdown releases worker and provider resources. - Legacy Python adapter code no longer participates in server runtime. Testing focus: - Server construction with fake provider/worker where practical. - Startup recovery is invoked before worker start. - Shutdown order test or focused unit test around lifecycle coordinator. - Compile regression proving deleted adapter code is no longer referenced. Commit guidance: - Commit lifecycle tests first. - Commit startup wiring. - Commit legacy deletion as its own reviewable commit. ## EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke Goal: prove the local engine path works with fixture audio without making CI slow or flaky. Tasks: - Add opt-in real engine integration tests gated by `SCRIBERR_ENGINE_ITEST=1`. - Use `test-audio/jfk.wav` for fast transcription verification. - Skip cleanly with clear messages when ffmpeg, engine runtime, model downloads, CUDA runtime, or network access are unavailable. - Validate that first-run auto-download works when enabled. - Validate `SPEECH_ENGINE_AUTO_DOWNLOAD=false` fails jobs cleanly when models are missing. - Add a small benchmark or timed smoke test that reports wall time without setting brittle pass/fail thresholds. - Document manual performance commands for `jfk.wav`, `sample.wav`, and optional longer fixtures. Acceptance criteria: - Default CI remains fake-provider only. - Opt-in real test can produce a completed transcript for `test-audio/jfk.wav`. - Real test asserts non-empty text and path-safe public outputs. - Auto-download disabled path fails with a sanitized model-unavailable error. Testing focus: - Real local provider transcription with `jfk.wav`. - Optional diarization only if model/runtime availability makes it practical. - Auto-download enabled/disabled behavior. - No path leakage from real failures. Commit guidance: - Commit gated integration tests separately from runtime implementation fixes. ## EWI-Sprint 9: Hardening, Cleanup Goal: audit the full implementation for correctness, performance, privacy, and maintainability. Tasks: - Run the full backend test/vet baseline. - Run focused race/concurrency tests for queue claiming if practical. - Run opt-in `jfk.wav` real engine smoke. - Audit public API responses and events for path/token leakage. - Audit handler packages for business logic that should move behind services. - Audit repository terminal updates for transaction boundaries. - Audit worker shutdown and cancellation behavior. - Remove dead code, obsolete TODOs, and stale docs from earlier sprints. - Update `devnotes/engine-worker-sprint-tracker.md` with final verification results. Acceptance criteria: - Full fake-provider test suite passes. - Real `jfk.wav` smoke passes or has a documented external dependency blocker. - No known path leaks in API, events, logs endpoint, or tests. - Queue restart recovery is covered. - Commit history is organized by sprint and behavior. Testing focus: - Full package tests and vet. - API regression tests. - Queue concurrency/recovery tests. - Transcript compatibility tests. - Real engine smoke. Commit guidance: - Commit hardening fixes in narrow patches. - Final commit should be docs/tracker updates only. ## Minimum Test Coverage Set The target is not 100% coverage. The minimum set must cover the highest-risk paths: - Config parse defaults and invalid values. - Provider request/result mapping and sanitized errors. - Queue enqueue, claim, lease renew, recovery, cancel queued, cancel running, and no duplicate claim. - Orchestrator success, provider failure, cancellation, canonical JSON, words absent, and diarization merge. - API create/submit/retry/cancel/transcript/events/executions/logs/models hot paths. - Security/path-leak regression for API responses, events, logs, and provider errors. - Gated real engine transcription using `test-audio/jfk.wav`. ## Completion Definition This sprint run is complete when: - Fresh install starts without Python environment setup. - Missing models download automatically on first job when enabled. - A transcription moves from queued to processing to completed or failed. - Completed transcripts include text, segments, and word timestamps when available. - Missing word timestamps are represented as `words: []`. - Diarization assigns public-safe speaker labels when requested and available. - Events report progress and terminal states. - Executions and logs endpoints are implemented and sanitized. - Queue state survives restart and recovers orphaned processing jobs. - Legacy Python adapter bootstrap is gone from server startup. - Real `jfk.wav` smoke has been run or blocked by a documented external dependency.