20 KiB
Sprint Run: Engine Worker Integration
Run ID: EWI
Status: planning only. Do not implement code from this document until the user explicitly starts an implementation sprint.
Scope
This sprint run implements devnotes/engine-worker-integration-spec.md end to end: local Go engine integration, durable transcription workers, canonical transcript JSON, executions/logs/models APIs, removal of legacy Python adapter startup paths, docs, and verification with fixture audio.
The work should stay backend-first. Frontend changes are out of scope unless an API contract change makes the current UI unable to compile or use the canonical endpoints.
Engineering Rules
- Follow test-driven development: write the narrow failing tests first, implement, then refactor.
- Keep commits small and intentional. Each sprint should usually produce 2-5 commits, grouped by behavior:
- tests that define the target behavior,
- implementation,
- cleanup/docs,
- verification fixes.
- Do not mix unrelated cleanup into implementation commits.
- Keep the API layer thin. Handlers validate, map requests/responses, and call service interfaces.
- Keep SQLite as the source of truth. In-memory state may wake workers and cancel local jobs only.
- Do not leak local file paths, model cache paths, tokens, raw command output, or env-specific internals through API responses, logs endpoints, or SSE events.
- Preserve future multi-user scheduling hooks by carrying
user_idthrough claim, execution, cancellation, and stats code. - Prefer fake providers/processors for fast tests. Real engine tests must be opt-in and skipped cleanly when runtime/model dependencies are unavailable.
- Use
test-audio/jfk.wavfor fast real-path smoke tests. Use longer audio only for opt-in local performance checks.
Validation Baseline
Run these checks before and after each implementation sprint when possible:
GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware
GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware
git diff --check
Real engine validation is opt-in:
SCRIBERR_ENGINE_ITEST=1 SPEECH_ENGINE_AUTO_DOWNLOAD=true GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... -run 'Test.*RealEngine|Test.*JFK'
Performance-oriented manual validation should record:
- audio file used,
- selected provider and resolved provider,
- model download time if any,
- transcription wall time,
- CPU/GPU mode,
- resulting transcript word/segment counts.
EWI-Sprint 0: Integration Inventory and Commit Plan
Goal: lock down the exact files, dependencies, and deletion targets before changing runtime behavior.
Tasks:
- Inventory current transcription API handlers, queue package usage,
internal/transcriptionpackages, server startup wiring, config fields, schema fields, docs, and Docker env examples. - Identify all imports of legacy adapter/registry/pipeline code and decide whether each file is removed, replaced, or kept as a compatibility wrapper.
- Confirm
references/enginecompiles as a local replacement module and document any platform/runtime prerequisites. - Write a route/API impact matrix for create/submit/retry/cancel/models/logs/executions/events.
- Create a commit checklist for the remaining sprints.
Acceptance criteria:
- Deletion list for legacy Python adapter stack is explicit.
- No implementation sprint starts with unknown server startup dependencies.
- The intended commit grouping is documented.
Testing focus:
- Compile-only discovery if Go dependencies are available.
- No product behavior tests required.
EWI-Sprint 1: Config and Engine Module Wiring
Goal: add engine/worker configuration and module wiring without starting workers or downloading models.
Tasks:
- Add
require scriberr-engine v0.0.0andreplace scriberr-engine => ./references/engine. - Add
EngineConfigandWorkerConfigtointernal/config. - Parse and validate:
SPEECH_ENGINE_CACHE_DIRSPEECH_ENGINE_PROVIDERSPEECH_ENGINE_THREADSSPEECH_ENGINE_MAX_LOADEDSPEECH_ENGINE_AUTO_DOWNLOADTRANSCRIPTION_WORKERSTRANSCRIPTION_QUEUE_POLL_INTERVALTRANSCRIPTION_LEASE_TIMEOUT
- Make invalid numeric, duration, boolean, and provider values fail startup with clear config errors.
- Keep startup free of model downloads.
- Start de-emphasizing
WHISPERX_ENVin config without breaking existing callers until legacy startup code is removed.
Acceptance criteria:
- Defaults match the spec exactly.
auto,cpu, andcudaprovider values parse; other values fail.- Config errors are actionable and do not panic.
Testing focus:
- Defaults.
- Invalid provider.
- Invalid integer/duration/boolean.
- Auto-download default.
- Worker default remains one worker.
Commit guidance:
- Commit config tests first.
- Commit config implementation and module wiring second.
EWI-Sprint 2: Engine Provider Abstraction
Goal: create a provider boundary that hides scriberr-engine from API, repository, and worker callers.
Tasks:
- Add
internal/transcription/engineprovider. - Define provider, registry, capability, request, result, transcript word/segment, and diarization segment types.
- Implement an in-memory registry with default provider lookup.
- Implement a fake provider for tests.
- Implement local provider wrapping
scriberr-engine/speech/engine. - Map Scriberr requests to engine requests and map engine results back to internal provider results.
- Implement provider capabilities and installed state via engine model metadata and
IsModelInstalled. - Sanitize provider errors before they can be returned to API clients.
- Log detailed errors internally without public path leakage.
Acceptance criteria:
- Only
engineproviderimportsscriberr-engine. - Local provider
ID()islocal. - Defaults are
whisper-baseanddiarization-default. - Missing words return
Words: [], never nil-dependent API behavior. - Provider registry supports future non-local providers.
Testing focus:
- Request mapping.
- Result mapping with words.
- Empty words.
- Diarization result mapping.
- Capability listing and installed flags.
- Error sanitization.
Commit guidance:
- Commit interface/fake-provider tests first.
- Commit local provider implementation separately.
EWI-Sprint 3: Queue Schema and Repository Methods
Goal: make transcription job state durable enough for workers, leases, progress, and executions.
Tasks:
- Add schema fields to
models.TranscriptionJob:queued_atstarted_atfailed_atprogressprogress_stageclaimed_byclaim_expires_atengine_id
- Ensure
TranscriptionJobExecutionstores provider, model name/family, started/completed/failed timestamps, sanitized error, output JSON path, and request/config JSON. - Add indexes:
idx_transcriptions_queue_claim(status, queued_at)idx_transcriptions_claim_expires_at(claim_expires_at)
- Add repository methods for enqueue, claim, renew, progress, complete, fail, cancel, execution listing, and startup recovery.
- Use transactions for claim and terminal-state updates.
- Keep claim policy isolated behind a FIFO scheduler policy.
Acceptance criteria:
- Claim returns the oldest queued job by
queued_at,created_at, andid. - Concurrent claims do not return the same job.
- Lease renewal updates only the owning worker.
- Startup recovery requeues processing rows regardless of stale process-local owner state.
- Terminal updates keep job and latest execution consistent.
Testing focus:
- Migration/schema fields and indexes.
- Enqueue state transition.
- FIFO claim.
- Concurrent claim race.
- Lease renewal owner mismatch.
- Startup recovery.
- Complete/fail/cancel terminal transactions.
Commit guidance:
- Commit schema/repository tests first.
- Commit schema/model updates and repository implementation second.
EWI-Sprint 4: Durable Worker Service
Goal: implement the worker loop, wake signals, cancellation, stats, leases, and shutdown behavior using fake processors.
Tasks:
- Add
internal/transcription/worker. - Define
QueueServiceas specified. - Implement enqueue as durable DB update plus non-blocking wake signal.
- Implement workers that poll, claim one job, renew lease, process with cancellable context, and write terminal state through repository/orchestrator results.
- Track cancel funcs for currently running process-local jobs.
- Implement cancel behavior for queued, local running, and orphaned processing jobs.
- Implement queue stats by user.
- Implement clean
StartandStop.
Acceptance criteria:
Enqueuemoves jobs to queued and wakes workers.- Workers process jobs without duplicate claims.
- Running jobs renew leases until terminal state.
Stopcancels local running jobs and waits within a bounded timeout.- Cancel returns conflict for completed/failed/canceled jobs.
Testing focus:
- Enqueue/wake hot path.
- Fake processor completes a job.
- Fake processor observes cancellation.
- Cancel queued.
- Cancel running.
- Claim lease renewal.
- Worker shutdown.
- Queue stats.
Commit guidance:
- Commit queue-service tests first.
- Commit worker implementation separately.
EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge
Goal: convert claimed jobs into completed canonical transcripts through provider calls.
Tasks:
- Add
internal/transcription/orchestrator. - Implement
Processorwith job repository, provider registry, event publisher, and job logger dependencies. - Resolve audio path, provider, transcription model, diarization model, language, task, and diarization options.
- Create execution rows at processing start.
- Publish progress stages:
- queued,
- preparing,
- transcribing,
- diarizing,
- merging,
- saving,
- completed/failed/canceled.
- Persist canonical transcript JSON in
transcriptions.transcript_text. - Write the same JSON to
data/transcripts/{jobID}/transcript.jsonand store internal output path. - Generate fallback segments when needed.
- Preserve
words: []when words are absent. - Merge diarization speakers into words and segments using overlap.
- Sanitize failures and distinguish user cancellation from engine failure.
Acceptance criteria:
- Fake provider can complete a transcription job end to end.
- Canonical transcript JSON matches the spec.
- No diarization leaves speaker fields absent.
- Diarization assigns stable
SPEAKER_00style labels. - Failures update job and execution consistently.
- Context cancellation marks canceled, not failed.
Testing focus:
- Transcript mapper with words.
- Transcript mapper without words.
- Plain-text legacy fallback.
- Older JSON without
words. - Segment fallback.
- Word and segment speaker overlap assignment.
- Provider failure sanitization.
- Cancellation path.
Commit guidance:
- Commit mapper/merge tests first.
- Commit orchestrator tests and implementation second.
EWI-Sprint 6: API Wiring for Real Queue Execution
Goal: replace transcription placeholders with queue-backed behavior while preserving canonical API contracts.
Tasks:
- Inject queue service, provider registry/model service, execution service, and log reader into API handler construction.
- Update create/submit to create queued rows and call
QueueService.Enqueue. - Update retry to reset eligible terminal jobs and enqueue a new attempt.
- Update cancel to call
QueueService.Cancel. - Add progress fields to transcription get/list responses.
- Implement transcript endpoint through canonical transcript parser.
- Implement executions endpoint with sanitized metadata.
- Implement logs endpoint as authenticated plain text with sanitization.
- Implement models endpoint from provider capabilities.
- Keep SSE payloads path-safe and progress-shaped.
Acceptance criteria:
- Create/submit return
202queued resources. - Queue shutdown produces
503without losing durable job state. - Fake engine worker can complete a job, and transcript endpoint returns text, segments, and words.
- Executions endpoint returns execution metadata.
- Logs endpoint returns sanitized text.
- Models endpoint returns local capabilities and installed/default flags.
- No API response/event leaks upload path, temp path, model cache path, or raw internal error details.
Testing focus:
- Create enqueues.
- Submit upload creates file, job, and enqueue.
- Retry valid and conflict states.
- Cancel queued/running/conflict states.
- Transcript canonical response.
- Fake worker completion through API-visible state.
- Events progress and completed payloads.
- Executions/logs/models endpoints.
- Path-leak regression tests.
Commit guidance:
- Commit API tests first.
- Commit dependency injection and service wiring separately.
- Commit endpoint behavior updates last.
EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal
Goal: make the real engine worker the default runtime path and remove obsolete Python adapter bootstrapping.
Tasks:
- Update
cmd/server/main.gostartup order to match the spec. - Initialize local engine provider registry after DB/repositories and before worker service.
- Run DB recovery before starting workers.
- Start workers before HTTP serving.
- On shutdown, stop workers, cancel running jobs, close engine providers, then close DB.
- Remove server startup dependencies on Python,
WHISPERX_ENV, and legacy adapter registration. - Delete or stop compiling:
internal/transcription/adapters/**- legacy registry/pipeline abstractions that only serve Python adapters
- obsolete adapter tests
- Keep compatibility wrappers only when needed to compile API or service boundaries.
Acceptance criteria:
- Fresh server startup does not require Python env setup.
- Startup logs include cache dir, requested provider, resolved provider when available, threads, max-loaded, and auto-download.
- Shutdown releases worker and provider resources.
- Legacy Python adapter code no longer participates in server runtime.
Testing focus:
- Server construction with fake provider/worker where practical.
- Startup recovery is invoked before worker start.
- Shutdown order test or focused unit test around lifecycle coordinator.
- Compile regression proving deleted adapter code is no longer referenced.
Commit guidance:
- Commit lifecycle tests first.
- Commit startup wiring.
- Commit legacy deletion as its own reviewable commit.
EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke
Goal: prove the local engine path works with fixture audio without making CI slow or flaky.
Tasks:
- Add opt-in real engine integration tests gated by
SCRIBERR_ENGINE_ITEST=1. - Use
test-audio/jfk.wavfor fast transcription verification. - Skip cleanly with clear messages when ffmpeg, engine runtime, model downloads, CUDA runtime, or network access are unavailable.
- Validate that first-run auto-download works when enabled.
- Validate
SPEECH_ENGINE_AUTO_DOWNLOAD=falsefails jobs cleanly when models are missing. - Add a small benchmark or timed smoke test that reports wall time without setting brittle pass/fail thresholds.
- Document manual performance commands for
jfk.wav,sample.wav, and optional longer fixtures.
Acceptance criteria:
- Default CI remains fake-provider only.
- Opt-in real test can produce a completed transcript for
test-audio/jfk.wav. - Real test asserts non-empty text and path-safe public outputs.
- Auto-download disabled path fails with a sanitized model-unavailable error.
Testing focus:
- Real local provider transcription with
jfk.wav. - Optional diarization only if model/runtime availability makes it practical.
- Auto-download enabled/disabled behavior.
- No path leakage from real failures.
Commit guidance:
- Commit gated integration tests separately from runtime implementation fixes.
EWI-Sprint 9: Docs, Docker, and Setup UX
Goal: make first-run local and Docker usage understandable and aligned with the new engine architecture.
Tasks:
- Update README configuration table with
SPEECH_ENGINE_*andTRANSCRIPTION_*variables. - Remove or de-emphasize
WHISPERX_ENV. - Update Docker compose files to persist
data/modelsor mapSPEECH_ENGINE_CACHE_DIR. - Keep CUDA Docker configuration on provider
autoand document NVIDIA runtime/container toolkit prerequisites. - Add troubleshooting notes for:
- missing
ffmpeg, - model download failure,
- forced CUDA unavailable,
- first job slow due to model download,
- auto-download disabled.
- missing
- Document the opt-in real engine test command and fixture audio expectations.
Acceptance criteria:
- README and Docker examples match implemented defaults.
- CPU fresh install path is documented without pre-download steps.
- CUDA behavior is explicit:
autofalls back,cudarequires CUDA.
Testing focus:
- Docs command snippets are syntax-checked where practical.
docker-composeYAML remains parseable if tooling is available.
Commit guidance:
- Commit docs and Docker changes separately from code.
EWI-Sprint 10: Hardening, Cleanup, and Release Candidate
Goal: audit the full implementation for correctness, performance, privacy, and maintainability.
Tasks:
- Run the full backend test/vet baseline.
- Run focused race/concurrency tests for queue claiming if practical.
- Run opt-in
jfk.wavreal engine smoke. - Audit public API responses and events for path/token leakage.
- Audit handler packages for business logic that should move behind services.
- Audit repository terminal updates for transaction boundaries.
- Audit worker shutdown and cancellation behavior.
- Remove dead code, obsolete TODOs, and stale docs from earlier sprints.
- Update
devnotes/engine-worker-sprint-tracker.mdwith final verification results.
Acceptance criteria:
- Full fake-provider test suite passes.
- Real
jfk.wavsmoke passes or has a documented external dependency blocker. - No known path leaks in API, events, logs endpoint, or tests.
- Queue restart recovery is covered.
- Commit history is organized by sprint and behavior.
Testing focus:
- Full package tests and vet.
- API regression tests.
- Queue concurrency/recovery tests.
- Transcript compatibility tests.
- Real engine smoke.
Commit guidance:
- Commit hardening fixes in narrow patches.
- Final commit should be docs/tracker updates only.
Minimum Test Coverage Set
The target is not 100% coverage. The minimum set must cover the highest-risk paths:
- Config parse defaults and invalid values.
- Provider request/result mapping and sanitized errors.
- Queue enqueue, claim, lease renew, recovery, cancel queued, cancel running, and no duplicate claim.
- Orchestrator success, provider failure, cancellation, canonical JSON, words absent, and diarization merge.
- API create/submit/retry/cancel/transcript/events/executions/logs/models hot paths.
- Security/path-leak regression for API responses, events, logs, and provider errors.
- Gated real engine transcription using
test-audio/jfk.wav.
Completion Definition
This sprint run is complete when:
- Fresh install starts without Python environment setup.
- Missing models download automatically on first job when enabled.
- A transcription moves from queued to processing to completed or failed.
- Completed transcripts include text, segments, and word timestamps when available.
- Missing word timestamps are represented as
words: []. - Diarization assigns public-safe speaker labels when requested and available.
- Events report progress and terminal states.
- Executions and logs endpoints are implemented and sanitized.
- Queue state survives restart and recovers orphaned processing jobs.
- Legacy Python adapter bootstrap is gone from server startup.
- Real
jfk.wavsmoke has been run or blocked by a documented external dependency.