Scriberr/devnotes/engine-worker-sprint-tracker.md
2026-04-26 19:16:23 -07:00

Sprint Run Tracker: Engine Worker Integration

Run ID: EWI

Status: completed through EWI-Sprint 10. Docker/deployment packaging updates remain deferred by request.

This tracker belongs to devnotes/engine-worker-sprints.md and the implementation spec in devnotes/engine-worker-integration-spec.md.

EWI-Sprint 0: Integration Inventory and Commit Plan

Status: completed

Completed tasks:

  • Inventoried server startup, config, schema, repository, queue, transcription stack, API placeholders, docs, Docker, and test fixtures.
  • Documented the legacy adapter deletion targets.
  • Documented API/service seams for create, submit, retry, cancel, transcript, events, logs, executions, models, and queue stats.
  • Added structured logging requirements for config, provider, worker, queue, orchestration, and terminal states.
  • Added a sprint-by-sprint commit plan for EWI-Sprints 1-10.

Artifacts:

  • devnotes/engine-worker-sprint-0-inventory.md

Verification:

  • Inventory-only sprint. No runtime code changed.
  • Focused repository inspection completed with rg, find, and targeted source reads.

EWI-Sprint 1: Config and Engine Module Wiring

Status: completed

Completed tasks:

  • Added local engine module wiring with require scriberr-engine v0.0.0 and replace scriberr-engine => ./references/engine.
  • Added config.EngineConfig and config.WorkerConfig.
  • Added config.LoadWithError() for startup-failing validation while retaining config.Load() for compatibility.
  • Parsed and validated all SPEECH_ENGINE_* and TRANSCRIPTION_* env vars from the spec.
  • Updated server startup to fail clearly on invalid config.
  • Added structured startup logging for engine and worker configuration.
  • Added focused config tests before implementation.
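
The startup-failing validation can be sketched as follows. This is a minimal illustration rather than the actual config.LoadWithError() code: the WorkerConfig fields, the defaults, and the two environment variable names are assumptions chosen only to fit the TRANSCRIPTION_* pattern.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// WorkerConfig stands in for config.WorkerConfig; the fields are hypothetical.
type WorkerConfig struct {
	Workers      int
	PollInterval time.Duration
}

// loadWorkerConfig reads values through getenv and returns an error when a
// value is present but invalid, instead of silently falling back to a default.
func loadWorkerConfig(getenv func(string) string) (WorkerConfig, error) {
	cfg := WorkerConfig{Workers: 1, PollInterval: 2 * time.Second} // assumed defaults

	// TRANSCRIPTION_WORKERS is a hypothetical variable name.
	if raw := getenv("TRANSCRIPTION_WORKERS"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 1 {
			return WorkerConfig{}, fmt.Errorf("TRANSCRIPTION_WORKERS must be a positive integer, got %q", raw)
		}
		cfg.Workers = n
	}

	// TRANSCRIPTION_POLL_INTERVAL is a hypothetical variable name.
	if raw := getenv("TRANSCRIPTION_POLL_INTERVAL"); raw != "" {
		d, err := time.ParseDuration(raw)
		if err != nil || d <= 0 {
			return WorkerConfig{}, fmt.Errorf("TRANSCRIPTION_POLL_INTERVAL must be a positive duration, got %q", raw)
		}
		cfg.PollInterval = d
	}
	return cfg, nil
}

func main() {
	env := map[string]string{"TRANSCRIPTION_WORKERS": "0"}
	_, err := loadWorkerConfig(func(k string) string { return env[k] })
	// Server startup would log this error and exit instead of continuing.
	fmt.Println("startup error:", err)
}
```

Passing getenv as a function keeps the sketch testable without touching the process environment; a real loader could pass os.Getenv.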

Artifacts:

  • go.mod
  • cmd/server/main.go
  • internal/config/config.go
  • internal/config/config_test.go

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/config passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • git diff --check passed.

EWI-Sprint 2: Engine Provider Abstraction

Status: completed

Completed tasks:

  • Added internal/transcription/engineprovider provider and registry interfaces.
  • Added internal provider request/result/capability types so scriberr-engine types do not leak outside the provider boundary.
  • Added static provider registry with deterministic capability aggregation.
  • Added local provider wrapper for scriberr-engine/speech/engine.
  • Mapped Scriberr transcription and diarization requests to local engine requests.
  • Forced token timestamps for local transcription requests.
  • Mapped engine words and diarization segments to public-safe internal result structs.
  • Added model capability discovery from the engine model specs with install state through IsModelInstalled.
  • Added provider error sanitization for paths and token-like values.
  • Added focused fake-engine tests for mapping, empty words, capabilities, diarization speakers, close behavior, and sanitized errors.
  • Updated the main module to go 1.26 because the local scriberr-engine module declares go 1.26.
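
A static registry with deterministic capability aggregation might look roughly like the sketch below. The Provider interface, the ModelCapability fields, and the sort key are assumptions for illustration; the real engineprovider types may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// ModelCapability is a hypothetical internal type. It stands in for the
// provider-boundary structs that keep engine types out of other packages.
type ModelCapability struct {
	Provider  string
	Model     string
	Installed bool
}

// Provider is an assumed minimal interface for this sketch.
type Provider interface {
	ID() string
	Capabilities() []ModelCapability
}

// StaticRegistry holds a fixed set of providers keyed by ID.
type StaticRegistry struct {
	providers map[string]Provider
}

func NewStaticRegistry(ps ...Provider) *StaticRegistry {
	r := &StaticRegistry{providers: make(map[string]Provider, len(ps))}
	for _, p := range ps {
		r.providers[p.ID()] = p
	}
	return r
}

// Capabilities aggregates across providers and sorts by (provider, model), so
// the result does not depend on Go's randomized map iteration order.
func (r *StaticRegistry) Capabilities() []ModelCapability {
	var out []ModelCapability
	for _, p := range r.providers {
		out = append(out, p.Capabilities()...)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Provider != out[j].Provider {
			return out[i].Provider < out[j].Provider
		}
		return out[i].Model < out[j].Model
	})
	return out
}

// fakeProvider is a test double, similar in spirit to a fake engine.
type fakeProvider struct {
	id     string
	models []string
}

func (f fakeProvider) ID() string { return f.id }

func (f fakeProvider) Capabilities() []ModelCapability {
	caps := make([]ModelCapability, 0, len(f.models))
	for _, m := range f.models {
		caps = append(caps, ModelCapability{Provider: f.id, Model: m})
	}
	return caps
}

func main() {
	reg := NewStaticRegistry(
		fakeProvider{id: "local", models: []string{"small", "base"}},
		fakeProvider{id: "alt", models: []string{"tiny"}},
	)
	for _, c := range reg.Capabilities() {
		fmt.Println(c.Provider, c.Model)
	}
}
```

Sorting the aggregated slice is one simple way to make API responses and test assertions stable across runs.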

Artifacts:

  • internal/transcription/engineprovider/types.go
  • internal/transcription/engineprovider/registry.go
  • internal/transcription/engineprovider/local_provider.go
  • internal/transcription/engineprovider/sanitize.go
  • internal/transcription/engineprovider/*_test.go
  • go.mod
  • go.sum

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed with escalation because an existing webhook integration test opens a local httptest listener.
  • git diff --check passed.
  • Verified no non-provider Go package imports scriberr-engine.

EWI-Sprint 3: Queue Schema and Repository Methods

Status: completed

Completed tasks:

  • Added durable queue/lease/progress fields to models.TranscriptionJob.
  • Added queue claim and claim-expiry indexes to the target schema.
  • Extended JobRepository with durable worker methods for enqueue, FIFO claim, lease renewal, startup recovery, progress, completion, failure, cancellation, and execution listing.
  • Implemented transactional terminal updates that keep the job row and latest execution row consistent.
  • Added focused repository tests for schema/indexes, enqueue, FIFO claim, concurrent claim deduplication, owner-only lease renewal, orphan recovery, progress updates, terminal transitions, and execution listing.
  • Updated existing legacy transcription test mocks to satisfy the expanded repository interface until the legacy stack is removed in later sprints.
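
The claim, lease, and recovery semantics listed above can be illustrated with an in-memory stand-in. This is only a sketch: the real JobRepository works against the database, and the type names, the integer logical clock, and the choice to re-queue expired jobs are assumptions made to keep the example self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotOwner = errors.New("lease is not held by this worker")

type job struct {
	id, status, owner string
	leaseUntil        int64 // logical clock value; a real schema would store a timestamp
}

// memQueue is a hypothetical in-memory stand-in for the durable job table.
type memQueue struct {
	mu   sync.Mutex
	jobs []*job // append-only, so slice order is enqueue (FIFO) order
}

func (q *memQueue) Enqueue(id string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.jobs = append(q.jobs, &job{id: id, status: "queued"})
}

// Claim hands the oldest queued job to exactly one worker. The mutex plays the
// role that a database transaction would play in a real repository.
func (q *memQueue) Claim(worker string, now, lease int64) (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, j := range q.jobs {
		if j.status == "queued" {
			j.status, j.owner, j.leaseUntil = "processing", worker, now+lease
			return j.id, true
		}
	}
	return "", false
}

// Renew extends the lease only for the worker that currently owns the job.
func (q *memQueue) Renew(id, worker string, now, lease int64) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	for _, j := range q.jobs {
		if j.id == id && j.status == "processing" && j.owner == worker {
			j.leaseUntil = now + lease
			return nil
		}
	}
	return errNotOwner
}

// RecoverExpired re-queues processing jobs whose lease has lapsed and returns
// how many were recovered (re-queueing is an assumption of this sketch).
func (q *memQueue) RecoverExpired(now int64) int {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := 0
	for _, j := range q.jobs {
		if j.status == "processing" && j.leaseUntil <= now {
			j.status, j.owner = "queued", ""
			n++
		}
	}
	return n
}

func main() {
	q := &memQueue{}
	q.Enqueue("job-1")
	q.Enqueue("job-2")

	// Three workers race for two jobs; each job is handed out at most once.
	var wg sync.WaitGroup
	claimed := make(chan string, 3)
	for _, w := range []string{"w1", "w2", "w3"} {
		wg.Add(1)
		go func(worker string) {
			defer wg.Done()
			if id, ok := q.Claim(worker, 0, 30); ok {
				claimed <- id
			}
		}(w)
	}
	wg.Wait()
	close(claimed)
	for id := range claimed {
		fmt.Println("claimed", id)
	}
}
```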

Artifacts:

  • internal/models/transcription.go
  • internal/database/schema.go
  • internal/repository/implementations.go
  • internal/repository/job_queue_test.go
  • internal/transcription/adapters_test.go
  • tests/test_helpers.go

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/repository -run 'TestJobRepository' passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed with escalation because an existing webhook integration test opens a local httptest listener.
  • git diff --check passed.

EWI-Sprint 4: Durable Worker Service

Status: completed

Completed tasks:

  • Added internal/transcription/worker with the public queue service interface from the sprint plan.
  • Implemented durable enqueue plus non-blocking worker wake signaling.
  • Implemented worker startup recovery through RecoverOrphanedProcessing.
  • Implemented polling/claim loop with configurable worker count, poll interval, lease timeout, renew interval, and stop timeout.
  • Implemented lease renewal while processors are running.
  • Implemented process-local cancel tracking for running jobs.
  • Implemented cancel behavior for queued jobs, process-local running jobs, orphaned processing jobs, and terminal-state conflicts.
  • Implemented user-scoped queue stats with process-local running counts.
  • Added structured lifecycle, enqueue, worker, lease-renewal, cancellation, and shutdown logs.
  • Added focused worker tests with fake processors for enqueue/wake/complete, cancel queued, cancel running, lease renewal, stop cancellation, stats, and cancel conflicts.
  • Added repository status-count support needed by worker stats.
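
Non-blocking wake signaling is commonly built on a buffered channel of size one. The sketch below shows that general pattern under the assumption that the worker service uses something similar; the waker type and its method names are hypothetical.

```go
package main

import "fmt"

// waker coalesces wake-ups: one buffered slot means "there is work to look
// at", no matter how many enqueues happened since the last wake.
type waker struct {
	ch chan struct{}
}

func newWaker() *waker {
	return &waker{ch: make(chan struct{}, 1)}
}

// Signal never blocks the enqueue path. It reports whether a new signal was
// stored (false means a wake-up was already pending).
func (w *waker) Signal() bool {
	select {
	case w.ch <- struct{}{}:
		return true
	default:
		return false
	}
}

// C is the channel a worker loop would select on alongside its poll ticker.
func (w *waker) C() <-chan struct{} {
	return w.ch
}

func main() {
	w := newWaker()
	fmt.Println(w.Signal()) // first signal is stored
	fmt.Println(w.Signal()) // second signal is coalesced
	<-w.C()                 // a worker consumes the pending wake-up
	fmt.Println(w.Signal()) // the slot is free again
}
```

Because the durable row is written before signaling, a dropped or coalesced signal costs at most one poll interval rather than a lost job.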

Artifacts:

  • internal/transcription/worker/service.go
  • internal/transcription/worker/service_test.go
  • internal/repository/implementations.go
  • internal/transcription/adapters_test.go
  • tests/test_helpers.go

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/worker passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$' passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed with escalation because an existing webhook integration test opens a local httptest listener.
  • git diff --check passed.

EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge

Status: completed

Completed tasks:

  • Added internal/transcription/orchestrator with a worker-compatible processor.
  • Added canonical transcript structs, JSON parsing, mapper, fallback segment generation, and legacy plain-text/older-JSON fallback parsing.
  • Implemented overlap-based speaker assignment for words and segments with stable public SPEAKER_00 labels.
  • Implemented provider/model/language/task/diarization request resolution.
  • Created execution rows at processor start with sanitized request/config metadata.
  • Published progress stages for preparing, transcribing, diarizing, merging, saving, completed, failed, and canceled paths.
  • Wrote canonical transcript JSON to the configured transcript output directory and returned the internal output path for worker completion.
  • Preserved words: [] when token timestamps are absent.
  • Sanitized provider failures to redact paths and token-like values.
  • Distinguished context cancellation from provider failure.
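
Overlap-based speaker assignment can be sketched as below. The Word and Turn types, the "largest overlap wins" rule, and the choice to number labels by first appearance in the diarization turns are assumptions for illustration; only the SPEAKER_00 label format comes from the notes above.

```go
package main

import "fmt"

// Word is a hypothetical transcript word with timestamps in seconds.
type Word struct {
	Text       string
	Start, End float64
	Speaker    string
}

// Turn is a hypothetical diarization segment with an engine-specific label.
type Turn struct {
	RawSpeaker string
	Start, End float64
}

// overlap returns the length of the intersection of two intervals, or 0.
func overlap(aStart, aEnd, bStart, bEnd float64) float64 {
	start, end := aStart, aEnd
	if bStart > start {
		start = bStart
	}
	if bEnd < end {
		end = bEnd
	}
	if end > start {
		return end - start
	}
	return 0
}

// assignSpeakers labels each word with the turn it overlaps most. Public
// labels are allocated in order of first appearance in turns, so the same
// input always yields the same SPEAKER_NN labels.
func assignSpeakers(words []Word, turns []Turn) []Word {
	labels := map[string]string{}
	for _, t := range turns {
		if _, ok := labels[t.RawSpeaker]; !ok {
			labels[t.RawSpeaker] = fmt.Sprintf("SPEAKER_%02d", len(labels))
		}
	}
	out := make([]Word, len(words))
	copy(out, words)
	for i := range out {
		best := 0.0
		for _, t := range turns {
			if ov := overlap(out[i].Start, out[i].End, t.Start, t.End); ov > best {
				best = ov
				out[i].Speaker = labels[t.RawSpeaker]
			}
		}
	}
	return out
}

func main() {
	words := []Word{
		{Text: "hello", Start: 0.0, End: 0.5},
		{Text: "there", Start: 0.9, End: 1.4},
	}
	turns := []Turn{
		{RawSpeaker: "spk_b", Start: 0.0, End: 1.0},
		{RawSpeaker: "spk_a", Start: 1.0, End: 2.0},
	}
	for _, w := range assignSpeakers(words, turns) {
		fmt.Println(w.Text, w.Speaker)
	}
}
```

Words that overlap no turn keep an empty speaker in this sketch; the same overlap rule can be applied to segments by substituting segment start and end times.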

Artifacts:

  • internal/transcription/orchestrator/processor.go
  • internal/transcription/orchestrator/transcript.go
  • internal/transcription/orchestrator/processor_test.go
  • internal/transcription/orchestrator/transcript_test.go

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • git diff --check passed.

EWI-Sprint 6: API Wiring for Real Queue Execution

Status: completed

Completed tasks:

  • Added API handler injection for durable queue service and engine provider registry.
  • Wired create, submit, and retry to enqueue through the queue service.
  • Mapped queue shutdown to 503 SERVICE_UNAVAILABLE without deleting durable job rows.
  • Wired cancel to queue service cancellation and mapped terminal-state conflicts to 409.
  • Added progress fields to transcription get/list responses.
  • Implemented canonical transcript endpoint parsing for JSON, legacy text, and older JSON without words.
  • Implemented executions endpoint with sanitized execution metadata and processing duration.
  • Implemented logs endpoint as authenticated plain text derived from execution metadata/log files with path/token redaction.
  • Implemented model listing from provider capabilities with installed/default flags.
  • Updated queue stats to use queue service stats when injected, including canceled/running counts.
  • Added an API event publisher adapter for orchestrator progress events with path-safe payloads.
  • Added focused API tests for queue-backed create/retry/cancel, queue unavailable errors, transcript/execution/log/model/stats responses, and leak-safe errors.
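
The queue-error mapping described above (503 for a stopped queue, 409 for terminal-state conflicts) can be sketched with sentinel errors and errors.Is. The sentinel names, the CONFLICT and INTERNAL_ERROR codes, and the 200 success status are assumptions; only 503 SERVICE_UNAVAILABLE and 409 come from the notes.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Hypothetical sentinel errors that a queue service might return.
var (
	ErrQueueStopped  = errors.New("queue stopped")
	ErrTerminalState = errors.New("job already in a terminal state")
)

// statusFor maps a queue error to an HTTP status and a public error code.
// errors.Is is used so wrapped errors still map correctly.
func statusFor(err error) (int, string) {
	switch {
	case err == nil:
		return http.StatusOK, "" // assumed success status
	case errors.Is(err, ErrQueueStopped):
		return http.StatusServiceUnavailable, "SERVICE_UNAVAILABLE"
	case errors.Is(err, ErrTerminalState):
		return http.StatusConflict, "CONFLICT" // hypothetical code name
	default:
		// Unknown errors get a generic code so internals do not leak.
		return http.StatusInternalServerError, "INTERNAL_ERROR" // hypothetical code name
	}
}

func main() {
	wrapped := fmt.Errorf("enqueue job 42: %w", ErrQueueStopped)
	code, name := statusFor(wrapped)
	fmt.Println(code, name)

	code, name = statusFor(ErrTerminalState)
	fmt.Println(code, name)
}
```

In this sketch the handler returns 503 without touching the job row, which matches the note that durable rows are not deleted when the queue is unavailable.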

Artifacts:

  • internal/api/router.go
  • internal/api/transcription_handlers.go
  • internal/api/admin_handlers.go
  • internal/api/response_models.go
  • internal/api/engine_worker_api_test.go
  • API test helper updates

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • git diff --check passed.

EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal

Status: completed

Completed tasks:

  • Replaced server startup wiring with the local engine provider registry, orchestrator processor, and durable worker service.
  • Removed server startup dependencies on legacy internal/queue, Python adapter registration, unified processor, quick transcription, and embedded Python environment bootstrap.
  • Started durable transcription workers after database/repository/provider/API construction so worker startup recovery runs before claims.
  • Wired API handler to the queue service and provider registry from real server startup.
  • Updated shutdown to stop HTTP serving, stop workers, close the local provider, and close the database.
  • Added server regression coverage proving cmd/server/main.go no longer references legacy Python startup symbols.
  • Added worker coverage for recovering orphaned processing jobs before workers claim work.
  • Stopped compiling the legacy Python adapter stack by placing adapters, registry, pipeline, unified service, quick transcription, and obsolete adapter/webhook tests behind a legacy_python build tag.
  • Added package stubs for legacy-tagged packages so normal package discovery stays clean.
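
The shutdown ordering (HTTP, workers, provider, database) can be expressed as an ordered list of stop steps. This is a sketch under assumptions: the step type, the errStopTimeout sentinel, and the decision to continue after a failed step are illustrative and not taken from cmd/server/main.go.

```go
package main

import (
	"errors"
	"fmt"
)

// errStopTimeout is a hypothetical error used only to demonstrate a failure.
var errStopTimeout = errors.New("stop timeout exceeded")

// step pairs a resource name with the function that stops or closes it.
type step struct {
	name string
	stop func() error
}

// shutdown runs the steps in the given order. It keeps going after a failure
// so later resources are still released, and joins all errors at the end.
func shutdown(steps []step) ([]string, error) {
	var order []string
	var errs []error
	for _, s := range steps {
		order = append(order, s.name)
		if err := s.stop(); err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
		}
	}
	// errors.Join returns nil when there are no errors to join.
	return order, errors.Join(errs...)
}

func main() {
	noop := func() error { return nil }
	order, err := shutdown([]step{
		{name: "http", stop: noop},
		{name: "workers", stop: func() error { return errStopTimeout }},
		{name: "provider", stop: noop},
		{name: "database", stop: noop},
	})
	fmt.Println(order)
	fmt.Println(err)
}
```

Stopping HTTP first prevents new enqueues, and closing the database last lets in-flight workers record their terminal state before the connection goes away.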

Artifacts:

  • cmd/server/main.go
  • cmd/server/main_test.go
  • internal/transcription/worker/service_test.go
  • internal/transcription/doc.go
  • internal/transcription/adapters/doc.go
  • internal/transcription/registry/doc.go
  • internal/transcription/pipeline/doc.go
  • legacy Python adapter files tagged with legacy_python

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$' passed.
  • git diff --check passed.

EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke

Status: completed

Completed tasks:

  • Added opt-in real local engine integration tests gated by SCRIBERR_ENGINE_ITEST=1.
  • Added test-audio/jfk.wav real transcription smoke coverage with non-empty text, non-nil words, provider identity, timing logs, and path-leak assertions.
  • Added auto-download-disabled missing-model coverage that uses an isolated empty cache and asserts sanitized model-unavailable behavior without downloads.
  • Added a one-iteration-friendly benchmark for local JFK timing without brittle pass/fail thresholds.
  • Added clean skip handling for disabled opt-in flag, missing ffmpeg, missing fixture audio, unavailable runtime libraries, CUDA/runtime issues, and network/download failures.
  • Added concise smoke notes with commands for JFK transcription, optional cache override, and benchmark/manual performance recording.
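
The opt-in gating and clean skip handling can be sketched as a small prerequisite check. The skipReason helper and its messages are hypothetical; SCRIBERR_ENGINE_ITEST, ffmpeg, and test-audio/jfk.wav come from the notes above. A real test would pass the reason to t.Skip instead of printing it.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// skipReason returns a non-empty reason when the real-engine test should be
// skipped, or "" when the checked prerequisites are present.
func skipReason(flag string, haveFFmpeg, haveFixture bool) string {
	switch {
	case flag != "1":
		return "set SCRIBERR_ENGINE_ITEST=1 to run real engine tests"
	case !haveFFmpeg:
		return "ffmpeg not found on PATH"
	case !haveFixture:
		return "fixture audio test-audio/jfk.wav not found"
	default:
		return ""
	}
}

func main() {
	// Probe the environment once, then decide with a pure function.
	_, ffmpegErr := exec.LookPath("ffmpeg")
	_, fixtureErr := os.Stat("test-audio/jfk.wav")

	reason := skipReason(os.Getenv("SCRIBERR_ENGINE_ITEST"), ffmpegErr == nil, fixtureErr == nil)
	if reason != "" {
		fmt.Println("skip:", reason) // a test would call t.Skip(reason) here
		return
	}
	fmt.Println("prerequisites present; the real engine test would run")
}
```

Runtime library, CUDA, and download failures can only be detected once the engine starts, so those skips would have to happen later in the test body rather than in an upfront check like this one.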

Artifacts:

  • internal/transcription/engineprovider/real_engine_integration_test.go
  • devnotes/engine-worker-sprint-8-smoke.md

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineAutoDownloadDisabledMissingModelIsSanitized -v passed.
  • git diff --check passed.

EWI-Sprint 9: Hardening, Cleanup

Status: completed

Completed tasks:

  • Ran the full backend test/vet baseline.
  • Ran focused race checks for repository queue claiming and worker recovery/cancellation paths.
  • Ran the opt-in jfk.wav real engine smoke path; the run reported success, but the test itself was skipped (a documented skip) because external DNS for the model download was unavailable.
  • Removed stale transcription package architecture docs and replaced them with the active engine provider, orchestrator, and worker flow.
  • Deferred Docker, compose, and deployment documentation updates by request.

Artifacts:

  • internal/transcription/README.md
  • devnotes/engine-worker-sprint-tracker.md

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/... passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs passed.
  • GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)' passed.
  • SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v passed with a skip because external model download DNS was unavailable.
  • git diff --check passed.

EWI-Sprint 10: Hardening, Cleanup, and Release Candidate

Status: completed

Completed tasks:

  • Reviewed the branch implementation against the engine worker integration spec and sprint acceptance criteria.
  • Reconciled enqueue failure behavior with the durable queue contract: 503 SERVICE_UNAVAILABLE responses keep the queued job durable for later recovery.
  • Applied default and selected transcription profiles to create/submit jobs, with request options overriding profile values.
  • Removed duplicate global SSE progress events while preserving job-specific and global delivery through the broker.
  • Sanitized last_error in public transcription responses, matching executions/logs redaction.
  • Preserved log endpoint line breaks while redacting paths and token-like values.
  • Hardened the local provider against nil engine transcription/diarization results.
  • Confirmed scriberr-engine imports remain isolated to the provider package and opt-in real-engine tests.
  • Confirmed Docker/deployment work is intentionally not part of this release-candidate pass.
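
Redacting paths and token-like values while preserving line breaks can be sketched as a per-line rewrite. The two regular expressions, the 32-character token threshold, and the placeholder strings are assumptions for illustration and are likely simpler than the real sanitizer.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Unix-style paths with at least two segments. This deliberately
	// over-matches (for example, the tail of a URL), erring toward redaction.
	pathRe = regexp.MustCompile(`(?:/[^\s/]+){2,}`)
	// Long runs of token-like characters; 32 is an assumed threshold.
	tokenRe = regexp.MustCompile(`[A-Za-z0-9_-]{32,}`)
)

// redactLog rewrites each line independently, so the number and order of
// lines in the output always match the input.
func redactLog(text string) string {
	lines := strings.Split(text, "\n")
	for i, line := range lines {
		line = pathRe.ReplaceAllString(line, "[path]")
		lines[i] = tokenRe.ReplaceAllString(line, "[redacted]")
	}
	return strings.Join(lines, "\n")
}

func main() {
	in := "open /var/lib/scriberr/jobs/1.wav failed\n" +
		"token=abcdefghijklmnopqrstuvwxyz0123456789ABCD ok"
	fmt.Println(redactLog(in))
}
```

Applying the same function to last_error, execution metadata, and log output would keep the redaction behavior consistent across endpoints.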

Artifacts:

  • internal/api/admin_handlers.go
  • internal/api/engine_worker_api_test.go
  • internal/api/events_test.go
  • internal/api/response_models.go
  • internal/api/router.go
  • internal/api/transcription_handlers.go
  • internal/api/transcriptions_test.go
  • internal/api/types.go
  • internal/transcription/engineprovider/local_provider.go
  • internal/transcription/engineprovider/local_provider_test.go
  • devnotes/engine-worker-sprint-tracker.md

Verification:

  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api -run 'Test(CreateReturnsServiceUnavailableWhenQueueStopped|RetryPreservesNewJobWhenQueueStopped|TranscriptionCreateAppliesDefaultAndSelectedProfiles|GlobalSSEReceivesTranscriptionProgressOnce)' passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run 'TestLocalProvider(TranscribeRejectsNilEngineResult|DiarizeRejectsNilEngineResult)' passed.
  • GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware passed.
  • GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs passed.
  • GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)' passed.
  • SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v passed with a skip because external model download DNS was unavailable.
  • git diff --check passed.

Additional sprint assessment:

  • No additional runtime implementation sprint is required for the current engine worker integration spec.
  • A separate deployment/packaging sprint is still needed before changing Dockerfiles, compose files, release packaging, or deployment docs.