Sprint Run Tracker: Engine Worker Integration
Run ID: EWI
Status: completed through EWI-Sprint 10. Docker/deployment packaging updates remain deferred by request.
This tracker belongs to devnotes/engine-worker-sprints.md and the implementation spec in devnotes/engine-worker-integration-spec.md.
EWI-Sprint 0: Integration Inventory and Commit Plan
Status: completed
Completed tasks:
- Inventoried server startup, config, schema, repository, queue, transcription stack, API placeholders, docs, Docker, and test fixtures.
- Documented the legacy adapter deletion targets.
- Documented API/service seams for create, submit, retry, cancel, transcript, events, logs, executions, models, and queue stats.
- Added structured logging requirements for config, provider, worker, queue, orchestration, and terminal states.
- Added a sprint-by-sprint commit plan for EWI-Sprints 1-10.
Artifacts:
- `devnotes/engine-worker-sprint-0-inventory.md`
Verification:
- Inventory-only sprint. No runtime code changed.
- Focused repository inspection completed with `rg`, `find`, and targeted source reads.
EWI-Sprint 1: Config and Engine Module Wiring
Status: completed
Completed tasks:
- Added local engine module wiring with `require scriberr-engine v0.0.0` and `replace scriberr-engine => ./references/engine`.
- Added `config.EngineConfig` and `config.WorkerConfig`.
- Added `config.LoadWithError()` for startup-failing validation while retaining `config.Load()` for compatibility.
- Parsed and validated all `SPEECH_ENGINE_*` and `TRANSCRIPTION_*` env vars from the spec.
- Updated server startup to fail clearly on invalid config.
- Added structured startup logging for engine and worker configuration.
- Added focused config tests before implementation.
Artifacts:
- `go.mod`
- `cmd/server/main.go`
- `internal/config/config.go`
- `internal/config/config_test.go`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/config` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.
EWI-Sprint 2: Engine Provider Abstraction
Status: completed
Completed tasks:
- Added `internal/transcription/engineprovider` provider and registry interfaces.
- Added internal provider request/result/capability types so `scriberr-engine` types do not leak outside the provider boundary.
- Added static provider registry with deterministic capability aggregation.
- Added local provider wrapper for `scriberr-engine/speech/engine`.
- Mapped Scriberr transcription and diarization requests to local engine requests.
- Forced token timestamps for local transcription requests.
- Mapped engine words and diarization segments to public-safe internal result structs.
- Added model capability discovery from the engine model specs with install state through `IsModelInstalled`.
- Added provider error sanitization for paths and token-like values.
- Added focused fake-engine tests for mapping, empty words, capabilities, diarization speakers, close behavior, and sanitized errors.
- Updated the main module to `go 1.26` because the local `scriberr-engine` module declares `go 1.26`.
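
The error sanitization item above can be illustrated with a minimal sketch. It is not the contents of `sanitize.go`: the regular expressions, the `SanitizeError` name, and the `[path]`/`[redacted]` placeholders are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns; the real sanitizer may use different rules.
var (
	// Unix-style paths with at least two segments, e.g. /var/lib/models/base.bin.
	// A single-segment path such as /tmp is deliberately not matched here.
	pathPattern = regexp.MustCompile(`(?:/[\w.-]+){2,}`)
	// key=value or key: value pairs whose key name suggests a secret.
	tokenPattern = regexp.MustCompile(`(?i)\b(token|api[_-]?key|secret|password)\b(\s*[=:]\s*)\S+`)
)

// SanitizeError redacts token-like values first, then filesystem paths,
// so a secret that happens to look like a path is still fully removed.
func SanitizeError(msg string) string {
	msg = tokenPattern.ReplaceAllString(msg, "${1}${2}[redacted]")
	msg = pathPattern.ReplaceAllString(msg, "[path]")
	return msg
}

func main() {
	fmt.Println(SanitizeError("open /var/lib/scriberr/models/base.bin: no such file"))
	fmt.Println(SanitizeError("download failed: token=abc123 rejected"))
}
```

Redacting at the provider boundary means callers such as the API layer only ever see messages that are already safe to return.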
Artifacts:
- `internal/transcription/engineprovider/types.go`
- `internal/transcription/engineprovider/registry.go`
- `internal/transcription/engineprovider/local_provider.go`
- `internal/transcription/engineprovider/sanitize.go`
- `internal/transcription/engineprovider/*_test.go`
- `go.mod`
- `go.sum`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed with escalation because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.
- Verified no non-provider Go package imports `scriberr-engine`.
EWI-Sprint 3: Queue Schema and Repository Methods
Status: completed
Completed tasks:
- Added durable queue/lease/progress fields to `models.TranscriptionJob`.
- Added queue claim and claim-expiry indexes to the target schema.
- Extended `JobRepository` with durable worker methods for enqueue, FIFO claim, lease renewal, startup recovery, progress, completion, failure, cancellation, and execution listing.
- Implemented transactional terminal updates that keep the job row and latest execution row consistent.
- Added focused repository tests for schema/indexes, enqueue, FIFO claim, concurrent claim deduplication, owner-only lease renewal, orphan recovery, progress updates, terminal transitions, and execution listing.
- Updated existing legacy transcription test mocks to satisfy the expanded repository interface until the legacy stack is removed in later sprints.
Artifacts:
- `internal/models/transcription.go`
- `internal/database/schema.go`
- `internal/repository/implementations.go`
- `internal/repository/job_queue_test.go`
- `internal/transcription/adapters_test.go`
- `tests/test_helpers.go`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/repository -run 'TestJobRepository'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed with escalation because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.
EWI-Sprint 4: Durable Worker Service
Status: completed
Completed tasks:
- Added `internal/transcription/worker` with the public queue service interface from the sprint plan.
- Implemented durable enqueue plus non-blocking worker wake signaling.
- Implemented worker startup recovery through `RecoverOrphanedProcessing`.
- Implemented polling/claim loop with configurable worker count, poll interval, lease timeout, renew interval, and stop timeout.
- Implemented lease renewal while processors are running.
- Implemented process-local cancel tracking for running jobs.
- Implemented cancel behavior for queued jobs, process-local running jobs, orphaned processing jobs, and terminal-state conflicts.
- Implemented user-scoped queue stats with process-local running counts.
- Added structured lifecycle, enqueue, worker, lease-renewal, cancellation, and shutdown logs.
- Added focused worker tests with fake processors for enqueue/wake/complete, cancel queued, cancel running, lease renewal, stop cancellation, stats, and cancel conflicts.
- Added repository status-count support needed by worker stats.
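
Non-blocking wake signaling is commonly built on a one-slot buffered channel. The sketch below shows that pattern; the `waker` type and its method names are hypothetical and may not match how `service.go` does it.

```go
package main

import "fmt"

// waker is a hypothetical helper: a one-slot channel that coalesces wake-ups.
type waker struct {
	ch chan struct{}
}

func newWaker() *waker {
	return &waker{ch: make(chan struct{}, 1)}
}

// Wake never blocks. It reports true if a signal was stored and false if one
// was already pending, in which case the new signal is dropped.
func (w *waker) Wake() bool {
	select {
	case w.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// C is the channel a worker selects on alongside its poll timer.
func (w *waker) C() <-chan struct{} {
	return w.ch
}

func main() {
	w := newWaker()
	fmt.Println(w.Wake()) // stored
	fmt.Println(w.Wake()) // dropped: a wake-up is already pending
	<-w.C()               // a worker consumes the pending signal
	fmt.Println(w.Wake()) // stored again
}
```

Dropping extra signals is safe in this design because the job row is already durable: a woken worker claims from the queue, and the poll interval covers any missed wake-up.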
Artifacts:
- `internal/transcription/worker/service.go`
- `internal/transcription/worker/service_test.go`
- `internal/repository/implementations.go`
- `internal/transcription/adapters_test.go`
- `tests/test_helpers.go`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/worker` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed with escalation because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.
EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge
Status: completed
Completed tasks:
- Added `internal/transcription/orchestrator` with a worker-compatible processor.
- Added canonical transcript structs, JSON parsing, mapper, fallback segment generation, and legacy plain-text/older-JSON fallback parsing.
- Implemented overlap-based speaker assignment for words and segments with stable public `SPEAKER_00` labels.
- Implemented provider/model/language/task/diarization request resolution.
- Created execution rows at processor start with sanitized request/config metadata.
- Published progress stages for preparing, transcribing, diarizing, merging, saving, completed, failed, and canceled paths.
- Wrote canonical transcript JSON to the configured transcript output directory and returned the internal output path for worker completion.
- Preserved `words: []` when token timestamps are absent.
- Sanitized provider failures to redact paths and token-like values.
- Distinguished context cancellation from provider failure.
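
Overlap-based speaker assignment can be sketched as follows. The types, the tie-breaking rule (the earliest turn wins), and the choice to number public labels in order of first appearance are assumptions for illustration; the orchestrator may resolve these cases differently.

```go
package main

import "fmt"

// Span is a time interval in seconds.
type Span struct{ Start, End float64 }

// Word is a hypothetical transcript word; Speaker is set by AssignSpeakers.
type Word struct {
	Span
	Text    string
	Speaker string
}

// Turn is a hypothetical diarization segment carrying the engine's own label.
type Turn struct {
	Span
	RawLabel string
}

// overlap returns how many seconds two spans share, or 0 if they are disjoint.
func overlap(a, b Span) float64 {
	start, end := a.Start, a.End
	if b.Start > start {
		start = b.Start
	}
	if b.End < end {
		end = b.End
	}
	if end > start {
		return end - start
	}
	return 0
}

// AssignSpeakers labels each word with the turn it overlaps most. Raw engine
// labels are mapped to SPEAKER_00, SPEAKER_01, ... in order of first use.
// Words that overlap no turn keep an empty Speaker.
func AssignSpeakers(words []Word, turns []Turn) {
	public := map[string]string{}
	for i := range words {
		best, bestOverlap := -1, 0.0
		for t := range turns {
			if o := overlap(words[i].Span, turns[t].Span); o > bestOverlap {
				best, bestOverlap = t, o
			}
		}
		if best < 0 {
			continue
		}
		raw := turns[best].RawLabel
		if _, seen := public[raw]; !seen {
			public[raw] = fmt.Sprintf("SPEAKER_%02d", len(public))
		}
		words[i].Speaker = public[raw]
	}
}

func main() {
	words := []Word{
		{Span: Span{0.0, 0.5}, Text: "hello"},
		{Span: Span{1.2, 1.6}, Text: "hi"},
	}
	turns := []Turn{
		{Span: Span{0.0, 1.1}, RawLabel: "spk_b"},
		{Span: Span{1.1, 2.0}, RawLabel: "spk_a"},
	}
	AssignSpeakers(words, turns)
	for _, w := range words {
		fmt.Println(w.Text, w.Speaker)
	}
}
```

Mapping raw labels to public ones keeps engine-specific identifiers out of the transcript while giving the same speaker the same label throughout a job.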
Artifacts:
- `internal/transcription/orchestrator/processor.go`
- `internal/transcription/orchestrator/transcript.go`
- `internal/transcription/orchestrator/processor_test.go`
- `internal/transcription/orchestrator/transcript_test.go`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.
EWI-Sprint 6: API Wiring for Real Queue Execution
Status: completed
Completed tasks:
- Added API handler injection for durable queue service and engine provider registry.
- Wired create, submit, and retry to enqueue through the queue service.
- Mapped queue shutdown to `503 SERVICE_UNAVAILABLE` without deleting durable job rows.
- Wired cancel to queue service cancellation and mapped terminal-state conflicts to `409`.
- Added progress fields to transcription get/list responses.
- Implemented canonical transcript endpoint parsing for JSON, legacy text, and older JSON without `words`.
- Implemented executions endpoint with sanitized execution metadata and processing duration.
- Implemented logs endpoint as authenticated plain text derived from execution metadata/log files with path/token redaction.
- Implemented model listing from provider capabilities with installed/default flags.
- Updated queue stats to use queue service stats when injected, including canceled/running counts.
- Added an API event publisher adapter for orchestrator progress events with path-safe payloads.
- Added focused API tests for queue-backed create/retry/cancel, queue unavailable errors, transcript/execution/log/model/stats responses, and leak-safe errors.
Artifacts:
- `internal/api/router.go`
- `internal/api/transcription_handlers.go`
- `internal/api/admin_handlers.go`
- `internal/api/response_models.go`
- `internal/api/engine_worker_api_test.go`
- API test helper updates
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.
EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal
Status: completed
Completed tasks:
- Replaced server startup wiring with the local engine provider registry, orchestrator processor, and durable worker service.
- Removed server startup dependencies on legacy `internal/queue`, Python adapter registration, unified processor, quick transcription, and embedded Python environment bootstrap.
- Started durable transcription workers after database/repository/provider/API construction so worker startup recovery runs before claims.
- Wired API handler to the queue service and provider registry from real server startup.
- Updated shutdown to stop HTTP serving, stop workers, close the local provider, and close the database.
- Added server regression coverage proving `cmd/server/main.go` no longer references legacy Python startup symbols.
- Added worker coverage for recovering orphaned processing jobs before workers claim work.
- Stopped compiling the legacy Python adapter stack by placing adapters, registry, pipeline, unified service, quick transcription, and obsolete adapter/webhook tests behind a `legacy_python` build tag.
- Added package stubs for legacy-tagged packages so normal package discovery stays clean.
Artifacts:
- `cmd/server/main.go`
- `cmd/server/main_test.go`
- `internal/transcription/worker/service_test.go`
- `internal/transcription/doc.go`
- `internal/transcription/adapters/doc.go`
- `internal/transcription/registry/doc.go`
- `internal/transcription/pipeline/doc.go`
- legacy Python adapter files tagged with `legacy_python`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$'` passed.
- `git diff --check` passed.
EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke
Status: completed
Completed tasks:
- Added opt-in real local engine integration tests gated by `SCRIBERR_ENGINE_ITEST=1`.
- Added `test-audio/jfk.wav` real transcription smoke coverage with non-empty text, non-nil words, provider identity, timing logs, and path-leak assertions.
- Added auto-download-disabled missing-model coverage that uses an isolated empty cache and asserts sanitized model-unavailable behavior without downloads.
- Added a one-iteration-friendly benchmark for local JFK timing without brittle pass/fail thresholds.
- Added clean skip handling for disabled opt-in flag, missing `ffmpeg`, missing fixture audio, unavailable runtime libraries, CUDA/runtime issues, and network/download failures.
- Added concise smoke notes with commands for JFK transcription, optional cache override, and benchmark/manual performance recording.
Artifacts:
- `internal/transcription/engineprovider/real_engine_integration_test.go`
- `devnotes/engine-worker-sprint-8-smoke.md`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineAutoDownloadDisabledMissingModelIsSanitized -v` passed.
- `git diff --check` passed.
EWI-Sprint 9: Hardening, Cleanup
Status: completed
Completed tasks:
- Ran the full backend test/vet baseline.
- Ran focused race checks for repository queue claiming and worker recovery/cancellation paths.
- Ran the opt-in `jfk.wav` real engine smoke path; the test passed with a documented external DNS/model-download skip.
- Removed stale transcription package architecture docs and replaced them with the active engine provider, orchestrator, and worker flow.
- Deferred Docker, compose, and deployment documentation updates by request.
Artifacts:
- `internal/transcription/README.md`
- `devnotes/engine-worker-sprint-tracker.md`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)'` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v` passed with a skip because external model download DNS was unavailable.
- `git diff --check` passed.
EWI-Sprint 10: Hardening, Cleanup, and Release Candidate
Status: completed
Completed tasks:
- Reviewed the branch implementation against the engine worker integration spec and sprint acceptance criteria.
- Reconciled enqueue failure behavior with the durable queue contract: `503 SERVICE_UNAVAILABLE` responses keep the queued job durable for later recovery.
- Applied default and selected transcription profiles to create/submit jobs, with request options overriding profile values.
- Removed duplicate global SSE progress events while preserving job-specific and global delivery through the broker.
- Sanitized `last_error` in public transcription responses, matching executions/logs redaction.
- Preserved log endpoint line breaks while redacting paths and token-like values.
- Hardened the local provider against nil engine transcription/diarization results.
- Confirmed `scriberr-engine` imports remain isolated to the provider package and opt-in real-engine tests.
- Confirmed Docker/deployment work is intentionally not part of this release-candidate pass.
Artifacts:
- `internal/api/admin_handlers.go`
- `internal/api/engine_worker_api_test.go`
- `internal/api/events_test.go`
- `internal/api/response_models.go`
- `internal/api/router.go`
- `internal/api/transcription_handlers.go`
- `internal/api/transcriptions_test.go`
- `internal/api/types.go`
- `internal/transcription/engineprovider/local_provider.go`
- `internal/transcription/engineprovider/local_provider_test.go`
- `devnotes/engine-worker-sprint-tracker.md`
Verification:
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api -run 'Test(CreateReturnsServiceUnavailableWhenQueueStopped|RetryPreservesNewJobWhenQueueStopped|TranscriptionCreateAppliesDefaultAndSelectedProfiles|GlobalSSEReceivesTranscriptionProgressOnce)'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run 'TestLocalProvider(TranscribeRejectsNilEngineResult|DiarizeRejectsNilEngineResult)'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)'` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v` passed with a skip because external model download DNS was unavailable.
- `git diff --check` passed.
Additional sprint assessment:
- No additional runtime implementation sprint is required for the current engine worker integration spec.
- A separate deployment/packaging sprint is still needed before changing Dockerfiles, compose files, release packaging, or deployment docs.