docs: add app startup troubleshooting section

2026-06-28 06:46:25 +00:00 · 2025-12-17 11:20:50 -08:00
parent 9925e12d26
commit a066a73c7b
1 changed files with 89 additions and 5 deletions
--- a/web/project-site/src/docs/Installation.mdx
+++ b/web/project-site/src/docs/Installation.mdx
@@ -23,7 +23,7 @@ Open [http://localhost:8080](http://localhost:8080) in your browser.

 Scriberr works out of the box. However, for Homebrew or manual installations, you can customize the application behavior using environment variables or a `.env` file placed in the same directory as the binary (or where you run the command from).

-> **Docker Users:** You can ignore this section if you are using `docker-compose.yml`, as these values are set in the `environment` section there.
+> **Docker Users:** You can ignore this section if you are using `docker-compose.yml`, as these values are already configured with sane defaults.

 ### Environment Variables

@@ -73,11 +73,18 @@ services:
    ports:
      - "8080:8080"
    volumes:
-      - scriberr_data:/app/data
+      - scriberr_data:/app/data # volume for data
+      - env_data:/app/whisperx-env # volume for models and python envs
+    environment:
+      - APP_ENV=production # DO NOT CHANGE THIS
+      # CORS: comma-separated list of allowed origins for production
+      # - ALLOWED_ORIGINS=https://your-domain.com
+      # - SECURE_COOKIES=false # Uncomment this ONLY if you are not using SSL
    restart: unless-stopped

 volumes:
-  scriberr_data:
+  scriberr_data: {}
+  env_data: {}
 ```

 2.  Run the container:
@@ -94,14 +101,14 @@ If you have a compatible NVIDIA GPU, this configuration enables hardware acceler
 2.  Create a file named `docker-compose.cuda.yml`:

 ```yaml
-version: "3.9"
 services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:v1.0.4-cuda
    ports:
      - "8080:8080"
    volumes:
-      - scriberr_data:/app/data
+      - scriberr_data:/app/data # volume for data
+      - env_data:/app/whisperx-env # volume for models and python envs
    restart: unless-stopped
    deploy:
      resources:
@@ -114,9 +121,14 @@ services:
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
+      - APP_ENV=production # DO NOT CHANGE THIS
+      # CORS: comma-separated list of allowed origins for production
+      # - ALLOWED_ORIGINS=https://your-domain.com
+      # - SECURE_COOKIES=false # Uncomment this ONLY if you are not using SSL

 volumes:
  scriberr_data: {}
+  env_data: {}
 ```

 3.  Run the container with the CUDA configuration:
@@ -124,3 +136,75 @@ volumes:
 ```bash
 docker compose -f docker-compose.cuda.yml up -d
 ```
+
+## App Startup
+
+When you run Scriberr for the first time, it may take several minutes to start. This is normal!
+
+The application needs to initialize the Python environments used by WhisperX, PyAnnote, and the NVIDIA models, and download the NVIDIA model files (Sortformer, Canary, and Parakeet; about 8.7 GB combined in the example log below).
+
+**Subsequent runs will be much faster** because the models and environments are persisted in the `env_data` volume (or in the host folder you mapped to `/app/whisperx-env`).
+
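+You will know the application is ready when the log shows the line `msg="Scriberr is ready" url=http://0.0.0.0:8080`. If you started the container in detached mode (`-d`), follow the log with `docker compose logs -f scriberr` (or `docker compose -f docker-compose.cuda.yml logs -f scriberr` for the CUDA setup).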
+
+**Example Startup Log:**
+
+```text
+scriberr  | === Scriberr Container Setup ===
+scriberr  | Requested UID: 10001, GID: 10001
+scriberr  | Setting up custom user with UID=10001, GID=10001...
+scriberr  | Group with GID 10001 already exists, using it
+scriberr  | usermod: no changes
+scriberr  | Setting up data directories...
+scriberr  | === Setup Complete ===
+scriberr  | Switching to user appuser (UID=10001, GID=10001) and starting application...
+scriberr  | time=02:50:36 level="INFO " msg="Starting Scriberr" version=dev
+scriberr  | [+] Loading configuration
+scriberr  | time=02:50:36 level="INFO " msg="Registering adapters with environment path" whisperx_env=/app/whisperx-env
+scriberr  | time=02:50:36 level="INFO " msg="Adapter registration complete"
+scriberr  | [+] Connecting to database
+scriberr  | [+] Setting up authentication
+scriberr  | [+] Initializing SSE broadcaster
+scriberr  | [+] Initializing repositories
+scriberr  | [+] Initializing services
+scriberr  | [+] Initializing transcription service
+scriberr  | [+] Initializing transcription service
+scriberr  | [+] Preparing Python environment
+scriberr  | time=02:50:36 level="INFO " msg="Initializing unified transcription service"
+scriberr  | time=02:50:36 level="INFO " msg="Initializing registered models in parallel..."
+scriberr  | time=02:50:36 level="INFO " msg="Preparing NVIDIA Sortformer environment" env_path=/app/whisperx-env/parakeet
+scriberr  | time=02:50:36 level="INFO " msg="transcription model initialized" model_id=openai_whisper
+scriberr  | time=02:50:36 level="INFO " msg="Preparing NVIDIA Canary environment" env_path=/app/whisperx-env/parakeet
+scriberr  | time=02:50:36 level="INFO " msg="Preparing PyAnnote environment" env_path=/app/whisperx-env/pyannote
+scriberr  | time=02:50:36 level="INFO " msg="Preparing NVIDIA Parakeet environment" env_path=/app/whisperx-env/parakeet
+scriberr  | time=02:50:36 level="INFO " msg="Preparing WhisperX environment" env_path=/app/whisperx-env
+scriberr  | time=02:50:36 level="INFO " msg="Installing PyAnnote dependencies"
+scriberr  | time=02:50:36 level="INFO " msg="Parakeet environment not ready, setting up"
+scriberr  | time=02:50:36 level="INFO " msg="Installing Canary dependencies"
+scriberr  | time=02:50:36 level="INFO " msg="Installing Parakeet dependencies"
+scriberr  | time=02:50:36 level="INFO " msg="Downloading Sortformer model" path=/app/whisperx-env/parakeet/diar_streaming_sortformer_4spk-v2.nemo
+Downloading diar_streaming_sortformer_4spk-v2.nemo: 100% (449.5 MB / 449.5 MB)
+scriberr  | time=02:50:53 level="INFO " msg="Successfully downloaded Sortformer model" size=471367680
+scriberr  | time=02:50:53 level="INFO " msg="Sortformer environment prepared successfully"
+scriberr  | time=02:50:53 level="INFO " msg="diarization model initialized" model_id=sortformer
+scriberr  | time=02:53:11 level="INFO " msg="WhisperX environment prepared successfully"
+scriberr  | time=02:53:11 level="INFO " msg="transcription model initialized" model_id=whisperx
+scriberr  | time=02:53:14 level="INFO " msg="PyAnnote environment prepared successfully"
+scriberr  | time=02:53:14 level="INFO " msg="diarization model initialized" model_id=pyannote
+scriberr  | time=02:53:28 level="INFO " msg="Downloading Canary model" path=/app/whisperx-env/parakeet/canary-1b-v2.nemo
+scriberr  | time=02:53:28 level="INFO " msg="Downloading Parakeet model" path=/app/whisperx-env/parakeet/parakeet-tdt-0.6b-v3.nemo
+Downloading parakeet-tdt-0.6b-v3.nemo: 100% (2.3 GB / 2.3 GB)
+scriberr  | time=02:54:37 level="INFO " msg="Successfully downloaded Parakeet model" size=2509332480
+scriberr  | time=02:54:37 level="INFO " msg="Created buffered transcription script" path=/app/whisperx-env/parakeet/transcribe_buffered.py
+scriberr  | time=02:54:37 level="INFO " msg="Parakeet environment prepared successfully"
+scriberr  | time=02:54:37 level="INFO " msg="transcription model initialized" model_id=parakeet
+Downloading canary-1b-v2.nemo: 100% (5.9 GB / 5.9 GB)
+scriberr  | time=02:55:54 level="INFO " msg="Successfully downloaded Canary model" size=6358958080
+scriberr  | time=02:55:54 level="INFO " msg="Canary environment prepared successfully"
+scriberr  | time=02:55:54 level="INFO " msg="transcription model initialized" model_id=canary
+scriberr  | time=02:55:54 level="INFO " msg="Model initialization completed"
+scriberr  | time=02:55:54 level="INFO " msg="Unified transcription service initialized successfully"
+scriberr  | [+] Initializing quick transcription service
+scriberr  | [+] Starting background processing
+scriberr  | time=02:55:54 level="INFO " msg="Scriberr is ready" url=http://0.0.0.0:8080
+```