# Sprint Run Tracker: Engine Worker Integration

Run ID: `EWI`

Status: completed through EWI-Sprint 10. Docker/deployment packaging updates remain deferred by request.

This tracker accompanies the sprint plan in `devnotes/engine-worker-sprints.md` and the implementation spec in `devnotes/engine-worker-integration-spec.md`.

## EWI-Sprint 0: Integration Inventory and Commit Plan

Status: completed

Completed tasks:

- Inventoried server startup, config, schema, repository, queue, transcription stack, API placeholders, docs, Docker, and test fixtures.
- Documented the legacy adapter deletion targets.
- Documented API/service seams for create, submit, retry, cancel, transcript, events, logs, executions, models, and queue stats.
- Added structured logging requirements for config, provider, worker, queue, orchestration, and terminal states.
- Added a sprint-by-sprint commit plan for EWI-Sprints 1-10.

Artifacts:

- `devnotes/engine-worker-sprint-0-inventory.md`

Verification:

- Inventory-only sprint. No runtime code changed.
- Focused repository inspection completed with `rg`, `find`, and targeted source reads.

## EWI-Sprint 1: Config and Engine Module Wiring

Status: completed

Completed tasks:

- Added local engine module wiring with `require scriberr-engine v0.0.0` and `replace scriberr-engine => ./references/engine`.
- Added `config.EngineConfig` and `config.WorkerConfig`.
- Added `config.LoadWithError()` so invalid config fails startup, while retaining `config.Load()` for compatibility.
- Parsed and validated all `SPEECH_ENGINE_*` and `TRANSCRIPTION_*` env vars from the spec.
- Updated server startup to fail clearly on invalid config.
- Added structured startup logging for engine and worker configuration.
- Added focused config tests before implementation.

Artifacts:

- `go.mod`
- `cmd/server/main.go`
- `internal/config/config.go`
- `internal/config/config_test.go`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/config` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.

## EWI-Sprint 2: Engine Provider Abstraction

Status: completed

Completed tasks:

- Added `internal/transcription/engineprovider` provider and registry interfaces.
- Added internal provider request/result/capability types so `scriberr-engine` types do not leak outside the provider boundary.
- Added static provider registry with deterministic capability aggregation.
- Added local provider wrapper for `scriberr-engine/speech/engine`.
- Mapped Scriberr transcription and diarization requests to local engine requests.
- Forced token timestamps for local transcription requests.
- Mapped engine words and diarization segments to public-safe internal result structs.
- Added model capability discovery from the engine model specs, with install state reported through `IsModelInstalled`.
- Added provider error sanitization for paths and token-like values.
- Added focused fake-engine tests for mapping, empty words, capabilities, diarization speakers, close behavior, and sanitized errors.
- Updated the main module to `go 1.26` because the local `scriberr-engine` module declares `go 1.26`.
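
A minimal sketch of path and token redaction, assuming two regular-expression heuristics: absolute Unix-style paths with at least two segments, and runs of 24 or more token-like characters. These patterns are illustrative assumptions, not the ones in `sanitize.go`, and they will over-redact some long identifiers.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumed patterns: absolute Unix-style paths with two or more segments, and
// runs of 24+ token-like characters. Neither class matches whitespace.
var (
	pathPattern  = regexp.MustCompile(`(?:/[A-Za-z0-9._-]+){2,}`)
	tokenPattern = regexp.MustCompile(`[A-Za-z0-9_-]{24,}`)
)

// sanitize redacts paths first, then token-like runs, in the remaining text.
func sanitize(msg string) string {
	msg = pathPattern.ReplaceAllString(msg, "[path]")
	return tokenPattern.ReplaceAllString(msg, "[redacted]")
}

func main() {
	fmt.Println(sanitize("open /home/alice/models/whisper.bin: no such file"))
	fmt.Println(sanitize("auth failed for token abcdefghijklmnopqrstuvwxyz0123"))
}
```

Because neither pattern matches whitespace, the same function leaves line breaks intact when applied to multi-line text.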

Artifacts:

- `internal/transcription/engineprovider/types.go`
- `internal/transcription/engineprovider/registry.go`
- `internal/transcription/engineprovider/local_provider.go`
- `internal/transcription/engineprovider/sanitize.go`
- `internal/transcription/engineprovider/*_test.go`
- `go.mod`
- `go.sum`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed; the run needed escalated permissions because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.
- Verified that no Go package outside the provider imports `scriberr-engine`.

## EWI-Sprint 3: Queue Schema and Repository Methods

Status: completed

Completed tasks:

- Added durable queue/lease/progress fields to `models.TranscriptionJob`.
- Added queue claim and claim-expiry indexes to the target schema.
- Extended `JobRepository` with durable worker methods for enqueue, FIFO claim, lease renewal, startup recovery, progress, completion, failure, cancellation, and execution listing.
- Implemented transactional terminal updates that keep the job row and latest execution row consistent.
- Added focused repository tests for schema/indexes, enqueue, FIFO claim, concurrent claim deduplication, owner-only lease renewal, orphan recovery, progress updates, terminal transitions, and execution listing.
- Updated existing legacy transcription test mocks to satisfy the expanded repository interface until the legacy stack is removed in later sprints.

Artifacts:

- `internal/models/transcription.go`
- `internal/database/schema.go`
- `internal/repository/implementations.go`
- `internal/repository/job_queue_test.go`
- `internal/transcription/adapters_test.go`
- `tests/test_helpers.go`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/repository -run 'TestJobRepository'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed; the run needed escalated permissions because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.

## EWI-Sprint 4: Durable Worker Service

Status: completed

Completed tasks:

- Added `internal/transcription/worker` with the public queue service interface from the sprint plan.
- Implemented durable enqueue plus non-blocking worker wake signaling.
- Implemented worker startup recovery through `RecoverOrphanedProcessing`.
- Implemented the polling/claim loop with configurable worker count, poll interval, lease timeout, renew interval, and stop timeout.
- Implemented lease renewal while processors are running.
- Implemented process-local cancel tracking for running jobs.
- Implemented cancel behavior for queued jobs, process-local running jobs, orphaned processing jobs, and terminal-state conflicts.
- Implemented user-scoped queue stats with process-local running counts.
- Added structured lifecycle, enqueue, worker, lease-renewal, cancellation, and shutdown logs.
- Added focused worker tests with fake processors for enqueue/wake/complete, cancel queued, cancel running, lease renewal, stop cancellation, stats, and cancel conflicts.
- Added repository status-count support needed by worker stats.

Artifacts:

- `internal/transcription/worker/service.go`
- `internal/transcription/worker/service_test.go`
- `internal/repository/implementations.go`
- `internal/transcription/adapters_test.go`
- `tests/test_helpers.go`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/worker` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed; the run needed escalated permissions because an existing webhook integration test opens a local `httptest` listener.
- `git diff --check` passed.

## EWI-Sprint 5: Orchestrator, Transcript Mapping, and Speaker Merge

Status: completed

Completed tasks:

- Added `internal/transcription/orchestrator` with a worker-compatible processor.
- Added canonical transcript structs, JSON parsing, a mapper, fallback segment generation, and fallback parsing for legacy plain-text and older JSON transcripts.
- Implemented overlap-based speaker assignment for words and segments with stable public `SPEAKER_00` labels.
- Implemented provider/model/language/task/diarization request resolution.
- Created execution rows at processor start with sanitized request/config metadata.
- Published progress stages for preparing, transcribing, diarizing, merging, saving, completed, failed, and canceled paths.
- Wrote canonical transcript JSON to the configured transcript output directory and returned the internal output path for worker completion.
- Preserved `words: []` when token timestamps are absent.
- Sanitized provider failures to redact paths and token-like values.
- Distinguished context cancellation from provider failure.
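
Overlap-based assignment can be sketched as: for each word, pick the diarization turn with the largest time overlap, then rename raw diarizer labels to `SPEAKER_00`, `SPEAKER_01`, and so on in order of first use. The `word`/`turn` types are hypothetical, the sketch handles words only (segments would follow the same rule), and ties go to the earlier turn.

```go
package main

import "fmt"

// Hypothetical shapes; times are in seconds.
type word struct {
	Text       string
	Start, End float64
	Speaker    string
}

type turn struct {
	Start, End float64
	Label      string // raw diarizer label, e.g. "spk_7"
}

// overlap returns the length of the shared interval, or 0 if disjoint.
func overlap(aStart, aEnd, bStart, bEnd float64) float64 {
	return max(0, min(aEnd, bEnd)-max(aStart, bStart))
}

// assignSpeakers gives each word the speaker whose turn overlaps it most and
// maps raw labels to SPEAKER_NN in order of first use, so public labels do
// not depend on the diarizer's internal IDs.
func assignSpeakers(words []word, turns []turn) {
	public := map[string]string{}
	for i := range words {
		best, bestOverlap := "", 0.0
		for _, t := range turns {
			if o := overlap(words[i].Start, words[i].End, t.Start, t.End); o > bestOverlap {
				best, bestOverlap = t.Label, o
			}
		}
		if best == "" {
			continue // no overlapping turn: leave the word unassigned
		}
		if _, ok := public[best]; !ok {
			public[best] = fmt.Sprintf("SPEAKER_%02d", len(public))
		}
		words[i].Speaker = public[best]
	}
}

func main() {
	words := []word{{Text: "hello", Start: 0.0, End: 0.4}, {Text: "there", Start: 0.5, End: 0.9}, {Text: "hi", Start: 1.2, End: 1.5}}
	turns := []turn{{Start: 0.0, End: 0.8, Label: "spk_7"}, {Start: 0.8, End: 2.0, Label: "spk_2"}}
	assignSpeakers(words, turns)
	for _, w := range words {
		fmt.Println(w.Text, w.Speaker)
	}
}
```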

Artifacts:

- `internal/transcription/orchestrator/processor.go`
- `internal/transcription/orchestrator/transcript.go`
- `internal/transcription/orchestrator/processor_test.go`
- `internal/transcription/orchestrator/transcript_test.go`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.

## EWI-Sprint 6: API Wiring for Real Queue Execution

Status: completed

Completed tasks:

- Added API handler injection for the durable queue service and the engine provider registry.
- Wired create, submit, and retry to enqueue through the queue service.
- Mapped queue shutdown to `503 SERVICE_UNAVAILABLE` without deleting durable job rows.
- Wired cancel to queue service cancellation and mapped terminal-state conflicts to `409`.
- Added progress fields to transcription get/list responses.
- Implemented canonical transcript endpoint parsing for JSON, legacy text, and older JSON without `words`.
- Implemented the executions endpoint with sanitized execution metadata and processing duration.
- Implemented the logs endpoint to return authenticated plain text derived from execution metadata/log files, with path/token redaction.
- Implemented model listing from provider capabilities with installed/default flags.
- Updated queue stats to use queue service stats when injected, including canceled/running counts.
- Added an API event publisher adapter for orchestrator progress events with path-safe payloads.
- Added focused API tests for queue-backed create/retry/cancel, queue unavailable errors, transcript/execution/log/model/stats responses, and leak-safe errors.

Artifacts:

- `internal/api/router.go`
- `internal/api/transcription_handlers.go`
- `internal/api/admin_handlers.go`
- `internal/api/response_models.go`
- `internal/api/engine_worker_api_test.go`
- API test helper updates

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `git diff --check` passed.

## EWI-Sprint 7: Server Startup, Shutdown, and Legacy Adapter Removal

Status: completed

Completed tasks:

- Replaced server startup wiring with the local engine provider registry, orchestrator processor, and durable worker service.
- Removed server startup dependencies on legacy `internal/queue`, Python adapter registration, unified processor, quick transcription, and embedded Python environment bootstrap.
- Started durable transcription workers after database/repository/provider/API construction so worker startup recovery runs before claims.
- Wired the API handler to the queue service and provider registry from real server startup.
- Updated shutdown to stop HTTP serving, stop workers, close the local provider, and close the database.
- Added server regression coverage proving `cmd/server/main.go` no longer references legacy Python startup symbols.
- Added worker coverage for recovering orphaned processing jobs before workers claim work.
- Stopped compiling the legacy Python adapter stack by placing adapters, registry, pipeline, unified service, quick transcription, and obsolete adapter/webhook tests behind a `legacy_python` build tag.
- Added package stubs for legacy-tagged packages so normal package discovery stays clean.

Artifacts:

- `cmd/server/main.go`
- `cmd/server/main_test.go`
- `internal/transcription/worker/service_test.go`
- `internal/transcription/doc.go`
- `internal/transcription/adapters/doc.go`
- `internal/transcription/registry/doc.go`
- `internal/transcription/pipeline/doc.go`
- legacy Python adapter files tagged with `legacy_python`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./tests -run '^$'` passed.
- `git diff --check` passed.

## EWI-Sprint 8: Real Engine Integration Tests and Performance Smoke

Status: completed

Completed tasks:

- Added opt-in real local engine integration tests gated by `SCRIBERR_ENGINE_ITEST=1`.
- Added `test-audio/jfk.wav` real transcription smoke coverage with non-empty text, non-nil words, provider identity, timing logs, and path-leak assertions.
- Added missing-model coverage with auto-download disabled: it uses an isolated empty cache and asserts sanitized model-unavailable behavior without triggering downloads.
- Added a one-iteration-friendly benchmark for local JFK timing without brittle pass/fail thresholds.
- Added clean skip handling for disabled opt-in flag, missing `ffmpeg`, missing fixture audio, unavailable runtime libraries, CUDA/runtime issues, and network/download failures.
- Added concise smoke notes with commands for JFK transcription, optional cache override, and benchmark/manual performance recording.

Artifacts:

- `internal/transcription/engineprovider/real_engine_integration_test.go`
- `devnotes/engine-worker-sprint-8-smoke.md`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineAutoDownloadDisabledMissingModelIsSanitized -v` passed.
- `git diff --check` passed.

## EWI-Sprint 9: Hardening and Cleanup

Status: completed

Completed tasks:

- Ran the full backend test/vet baseline.
- Ran focused race checks for repository queue claiming and worker recovery/cancellation paths.
- Ran the opt-in `jfk.wav` real engine smoke path; the test passed with a documented skip because external DNS for the model download was unavailable.
- Removed stale transcription package architecture docs and replaced them with documentation of the active engine provider, orchestrator, and worker flow.
- Deferred Docker, compose, and deployment documentation updates by request.

Artifacts:

- `internal/transcription/README.md`
- `devnotes/engine-worker-sprint-tracker.md`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/...` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)'` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v` passed with a skip because external model download DNS was unavailable.
- `git diff --check` passed.

## EWI-Sprint 10: Hardening, Cleanup, and Release Candidate

Status: completed

Completed tasks:

- Reviewed the branch implementation against the engine worker integration spec and sprint acceptance criteria.
- Reconciled enqueue failure behavior with the durable queue contract: `503 SERVICE_UNAVAILABLE` responses keep the queued job durable for later recovery.
- Applied default and selected transcription profiles to create/submit jobs, with request options overriding profile values.
- Removed duplicate global SSE progress events while preserving job-specific and global delivery through the broker.
- Sanitized `last_error` in public transcription responses, matching executions/logs redaction.
- Preserved log endpoint line breaks while redacting paths and token-like values.
- Hardened the local provider against nil engine transcription/diarization results.
- Confirmed `scriberr-engine` imports remain isolated to the provider package and opt-in real-engine tests.
- Confirmed Docker/deployment work is intentionally not part of this release-candidate pass.
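
Profile precedence can be sketched as layered merging in which later non-nil fields win. This assumes the selected profile is layered on top of the default profile, with request options applied last; the `options` fields and the `merge`/`ptr` helpers are hypothetical. Pointer fields distinguish "not set" from explicit zero values such as disabling diarization.

```go
package main

import "fmt"

// options uses pointer fields so "unset" differs from a zero value such as
// Diarize=false. Field names are hypothetical.
type options struct {
	Model    *string
	Language *string
	Diarize  *bool
}

// merge layers sources left to right; later non-nil fields win, so calling
// merge(defaultProfile, selectedProfile, request) gives the request the
// final say.
func merge(layers ...options) options {
	var out options
	for _, l := range layers {
		if l.Model != nil {
			out.Model = l.Model
		}
		if l.Language != nil {
			out.Language = l.Language
		}
		if l.Diarize != nil {
			out.Diarize = l.Diarize
		}
	}
	return out
}

// ptr returns a pointer to a copy of v, for building option literals.
func ptr[T any](v T) *T { return &v }

func main() {
	def := options{Model: ptr("small"), Language: ptr("en"), Diarize: ptr(true)}
	selected := options{Model: ptr("large")}
	request := options{Diarize: ptr(false)}
	got := merge(def, selected, request)
	fmt.Println(*got.Model, *got.Language, *got.Diarize)
}
```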

Artifacts:

- `internal/api/admin_handlers.go`
- `internal/api/engine_worker_api_test.go`
- `internal/api/events_test.go`
- `internal/api/response_models.go`
- `internal/api/router.go`
- `internal/api/transcription_handlers.go`
- `internal/api/transcriptions_test.go`
- `internal/api/types.go`
- `internal/transcription/engineprovider/local_provider.go`
- `internal/transcription/engineprovider/local_provider_test.go`
- `devnotes/engine-worker-sprint-tracker.md`

Verification:

- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api -run 'Test(CreateReturnsServiceUnavailableWhenQueueStopped|RetryPreservesNewJobWhenQueueStopped|TranscriptionCreateAppliesDefaultAndSelectedProfiles|GlobalSSEReceivesTranscriptionProgressOnce)'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run 'TestLocalProvider(TranscribeRejectsNilEngineResult|DiarizeRejectsNilEngineResult)'` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go vet ./internal/api ./internal/config ./internal/database ./internal/repository ./internal/transcription/... ./cmd/server ./pkg/logger ./pkg/middleware` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/repository -run TestJobRepositoryConcurrentClaimsDoNotDuplicateJobs` passed.
- `GOCACHE=/tmp/scriberr-go-cache go test -race ./internal/transcription/worker -run 'TestService(EnqueueWakeAndComplete|StartRecoversOrphanedProcessingBeforeWorkersClaim|CancelRunning)'` passed.
- `SCRIBERR_ENGINE_ITEST=1 GOCACHE=/tmp/scriberr-go-cache go test ./internal/transcription/engineprovider -run TestRealEngineJFKTranscription -v` passed with a skip because external model download DNS was unavailable.
- `git diff --check` passed.

Additional sprint assessment:

- No additional runtime implementation sprint is required for the current engine worker integration spec.
- A separate deployment/packaging sprint is still needed before changing Dockerfiles, compose files, release packaging, or deployment docs.