Scriberr/web/project-site/src/docs/Installation.mdx
2025-12-17 11:20:50 -08:00

# Installation
Get Scriberr running on your system in a few minutes.
## Install with Homebrew (macOS & Linux)
The easiest way to install Scriberr is with Homebrew. If you don't have Homebrew installed, [get it here first](https://brew.sh/).
```bash
# Add the Scriberr tap
brew tap rishikanthc/scriberr
# Install Scriberr (automatically installs UV dependency)
brew install scriberr
# Start the server
scriberr
```
Open [http://localhost:8080](http://localhost:8080) in your browser.
## Configuration
Scriberr works out of the box. For Homebrew or manual installations, you can customize its behavior with environment variables or with a `.env` file placed in the same directory as the binary (or in the directory you run the command from).
> **Docker Users:** You can ignore this section if you are using `docker-compose.yml`, as these values are already configured with sane defaults.
### Environment Variables
| Variable | Description | Default |
| :--- | :--- | :--- |
| `PORT` | The port the server listens on. | `8080` |
| `HOST` | The interface to bind to. | `0.0.0.0` |
| `APP_ENV` | Application environment (`development` or `production`). | `development` |
| `ALLOWED_ORIGINS` | CORS allowed origins (comma-separated). | `http://localhost:5173,http://localhost:8080` |
| `DATABASE_PATH` | Path to the SQLite database file. | `data/scriberr.db` |
| `UPLOAD_DIR` | Directory for storing uploaded files. | `data/uploads` |
| `TRANSCRIPTS_DIR` | Directory for storing transcripts. | `data/transcripts` |
| `WHISPERX_ENV` | Path to the managed Python environment for models. | `data/whisperx-env` |
| `OPENAI_API_KEY` | API key for OpenAI (optional). | `""` |
| `JWT_SECRET` | Secret for signing JWTs. Auto-generated if not set. | Auto-generated |
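For a one-off override without a `.env` file, variables can also be set inline for a single invocation. The sketch below assumes Scriberr reads `HOST` and `PORT` from the process environment as the table describes. `scriberr_url` is a hypothetical helper, not part of Scriberr; it only prints the address you would expect the server to listen on, using the table's defaults when a variable is unset.

```bash
# Hypothetical helper (not part of Scriberr): print the expected listen URL,
# falling back to the documented defaults when HOST or PORT is unset or empty.
scriberr_url() {
  printf 'http://%s:%s\n' "${HOST:-0.0.0.0}" "${PORT:-8080}"
}

# Usage: set variables for a single run without exporting them to the shell.
#   HOST=127.0.0.1 PORT=9090 scriberr
```

Inline assignments apply only to that one command, which makes them convenient for trying a different port before committing the value to a `.env` file.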
**Example `.env` file:**
```bash
# Server settings
HOST=localhost
PORT=8080
APP_ENV=production
# Paths
DATABASE_PATH=/var/lib/scriberr/data/scriberr.db
UPLOAD_DIR=/var/lib/scriberr/data/uploads
# Security
JWT_SECRET=your-super-secret-key-change-this
```
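The `JWT_SECRET` value in the example is a placeholder and should be replaced with a long random string. One way to produce one, as a sketch, is to read 32 bytes from `/dev/urandom` and hex-encode them. `generate_secret` is a hypothetical helper name; any cryptographically secure generator would do.

```bash
# Hypothetical helper: print 32 random bytes as 64 hexadecimal characters.
generate_secret() {
  # od prints the bytes as space-separated hex; tr strips the whitespace.
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \t\n'
  printf '\n'
}

# Print a line that can be pasted into the .env file.
printf 'JWT_SECRET=%s\n' "$(generate_secret)"
```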
## Docker Deployment
For a containerized setup, you can use Docker. We provide two configurations: one for standard CPU usage and one optimized for NVIDIA GPUs (CUDA).
### Standard Deployment (CPU)
Use this configuration for running Scriberr on any machine without a dedicated NVIDIA GPU.
1. Create a file named `docker-compose.yml`:
```yaml
services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:latest
    ports:
      - "8080:8080"
    volumes:
      - scriberr_data:/app/data # volume for data
      - env_data:/app/whisperx-env # volume for models and python envs
    environment:
      - APP_ENV=production # DO NOT CHANGE THIS
      # CORS: comma-separated list of allowed origins for production
      # - ALLOWED_ORIGINS=https://your-domain.com
      # - SECURE_COOKIES=false # Uncomment this ONLY if you are not using SSL
    restart: unless-stopped
volumes:
  scriberr_data: {}
  env_data: {}
```
2. Run the container:
```bash
docker compose up -d
```
### NVIDIA GPU Deployment (CUDA)
If you have a compatible NVIDIA GPU, this configuration enables hardware acceleration for significantly faster transcription.
1. Ensure you have the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed.
2. Create a file named `docker-compose.cuda.yml`:
```yaml
services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:v1.0.4-cuda
    ports:
      - "8080:8080"
    volumes:
      - scriberr_data:/app/data # volume for data
      - env_data:/app/whisperx-env # volume for models and python envs
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities:
                - gpu
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - APP_ENV=production # DO NOT CHANGE THIS
      # CORS: comma-separated list of allowed origins for production
      # - ALLOWED_ORIGINS=https://your-domain.com
      # - SECURE_COOKIES=false # Uncomment this ONLY if you are not using SSL
volumes:
  scriberr_data: {}
  env_data: {}
```
3. Run the container with the CUDA configuration:
```bash
docker compose -f docker-compose.cuda.yml up -d
```
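If the same setup script has to run on both CPU-only and GPU hosts, the choice between the two compose files can be automated. The sketch below uses the presence of `nvidia-smi` on the host as a heuristic. That is an assumption, and it does not confirm that the NVIDIA Container Toolkit is installed. `pick_compose_file` is a hypothetical helper, and the file names are the ones used in the steps above.

```bash
# Hypothetical helper: choose a compose file based on whether the host
# appears to have NVIDIA drivers (heuristic: nvidia-smi is on the PATH).
pick_compose_file() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    printf '%s\n' docker-compose.cuda.yml
  else
    printf '%s\n' docker-compose.yml
  fi
}

# Usage: docker compose -f "$(pick_compose_file)" up -d
```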
## App Startup
When you run Scriberr for the first time, it may take several minutes to start. This is normal!
On first run, the application sets up its Python environments (WhisperX, PyAnnote, and the NVIDIA models) and downloads the model files for NVIDIA Sortformer, NVIDIA Canary, and NVIDIA Parakeet. In the example log below, these downloads total about 8.7 GB and startup takes a little over five minutes.
**Subsequent runs will be much faster** because all models and environments are persisted to the `env_data` volume (or your locally mapped folders).
You will know the application is ready when you see the line: `msg="Scriberr is ready" url=http://0.0.0.0:8080`.
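To make a script block until that line appears, the log stream can be piped through a small filter. This is a minimal sketch: `wait_for_ready` is a hypothetical helper that reads log lines from standard input and returns success at the first line containing the ready message. It assumes the message text matches the line quoted above.

```bash
# Hypothetical helper: read log lines from stdin and succeed as soon as the
# ready message appears. Returns 1 if the input ends without it.
wait_for_ready() {
  while IFS= read -r line; do
    case "$line" in
      *'msg="Scriberr is ready"'*)
        printf '%s\n' "$line"
        return 0
        ;;
    esac
  done
  return 1
}

# Usage: docker compose logs -f scriberr | wait_for_ready
```

After the function returns, the upstream `docker compose logs -f` process may linger until it tries to write its next line, so interrupt it manually if the shell does not return promptly.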
**Example Startup Log:**
```text
scriberr | === Scriberr Container Setup ===
scriberr | Requested UID: 10001, GID: 10001
scriberr | Setting up custom user with UID=10001, GID=10001...
scriberr | Group with GID 10001 already exists, using it
scriberr | usermod: no changes
scriberr | Setting up data directories...
scriberr | === Setup Complete ===
scriberr | Switching to user appuser (UID=10001, GID=10001) and starting application...
scriberr | time=02:50:36 level="INFO " msg="Starting Scriberr" version=dev
scriberr | [+] Loading configuration
scriberr | time=02:50:36 level="INFO " msg="Registering adapters with environment path" whisperx_env=/app/whisperx-env
scriberr | time=02:50:36 level="INFO " msg="Adapter registration complete"
scriberr | [+] Connecting to database
scriberr | [+] Setting up authentication
scriberr | [+] Initializing SSE broadcaster
scriberr | [+] Initializing repositories
scriberr | [+] Initializing services
scriberr | [+] Initializing transcription service
scriberr | [+] Initializing transcription service
scriberr | [+] Preparing Python environment
scriberr | time=02:50:36 level="INFO " msg="Initializing unified transcription service"
scriberr | time=02:50:36 level="INFO " msg="Initializing registered models in parallel..."
scriberr | time=02:50:36 level="INFO " msg="Preparing NVIDIA Sortformer environment" env_path=/app/whisperx-env/parakeet
scriberr | time=02:50:36 level="INFO " msg="transcription model initialized" model_id=openai_whisper
scriberr | time=02:50:36 level="INFO " msg="Preparing NVIDIA Canary environment" env_path=/app/whisperx-env/parakeet
scriberr | time=02:50:36 level="INFO " msg="Preparing PyAnnote environment" env_path=/app/whisperx-env/pyannote
scriberr | time=02:50:36 level="INFO " msg="Preparing NVIDIA Parakeet environment" env_path=/app/whisperx-env/parakeet
scriberr | time=02:50:36 level="INFO " msg="Preparing WhisperX environment" env_path=/app/whisperx-env
scriberr | time=02:50:36 level="INFO " msg="Installing PyAnnote dependencies"
scriberr | time=02:50:36 level="INFO " msg="Parakeet environment not ready, setting up"
scriberr | time=02:50:36 level="INFO " msg="Installing Canary dependencies"
scriberr | time=02:50:36 level="INFO " msg="Installing Parakeet dependencies"
scriberr | time=02:50:36 level="INFO " msg="Downloading Sortformer model" path=/app/whisperx-env/parakeet/diar_streaming_sortformer_4spk-v2.nemo
Downloading diar_streaming_sortformer_4spk-v2.nemo: 100% (449.5 MB / 449.5 MB)
scriberr | time=02:50:53 level="INFO " msg="Successfully downloaded Sortformer model" size=471367680
scriberr | time=02:50:53 level="INFO " msg="Sortformer environment prepared successfully"
scriberr | time=02:50:53 level="INFO " msg="diarization model initialized" model_id=sortformer
scriberr | time=02:53:11 level="INFO " msg="WhisperX environment prepared successfully"
scriberr | time=02:53:11 level="INFO " msg="transcription model initialized" model_id=whisperx
scriberr | time=02:53:14 level="INFO " msg="PyAnnote environment prepared successfully"
scriberr | time=02:53:14 level="INFO " msg="diarization model initialized" model_id=pyannote
scriberr | time=02:53:28 level="INFO " msg="Downloading Canary model" path=/app/whisperx-env/parakeet/canary-1b-v2.nemo
scriberr | time=02:53:28 level="INFO " msg="Downloading Parakeet model" path=/app/whisperx-env/parakeet/parakeet-tdt-0.6b-v3.nemo
Downloading parakeet-tdt-0.6b-v3.nemo: 100% (2.3 GB / 2.3 GB)
scriberr | time=02:54:37 level="INFO " msg="Successfully downloaded Parakeet model" size=2509332480
scriberr | time=02:54:37 level="INFO " msg="Created buffered transcription script" path=/app/whisperx-env/parakeet/transcribe_buffered.py
scriberr | time=02:54:37 level="INFO " msg="Parakeet environment prepared successfully"
scriberr | time=02:54:37 level="INFO " msg="transcription model initialized" model_id=parakeet
Downloading canary-1b-v2.nemo: 100% (5.9 GB / 5.9 GB)
scriberr | time=02:55:54 level="INFO " msg="Successfully downloaded Canary model" size=6358958080
scriberr | time=02:55:54 level="INFO " msg="Canary environment prepared successfully"
scriberr | time=02:55:54 level="INFO " msg="transcription model initialized" model_id=canary
scriberr | time=02:55:54 level="INFO " msg="Model initialization completed"
scriberr | time=02:55:54 level="INFO " msg="Unified transcription service initialized successfully"
scriberr | [+] Initializing quick transcription service
scriberr | [+] Starting background processing
scriberr | time=02:55:54 level="INFO " msg="Scriberr is ready" url=http://0.0.0.0:8080
```
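The `size=` fields in the log are byte counts, so the disk space taken by the model downloads can be estimated by summing them. The sketch below does this with `awk`. `sum_model_downloads` is a hypothetical helper, and the figures depend on which model versions your release downloads.

```bash
# Hypothetical helper: sum the size= byte counts of "Successfully downloaded"
# log lines read from stdin and print the total in bytes and GiB.
sum_model_downloads() {
  awk '
    /Successfully downloaded/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^size=[0-9]+$/) {
          sub(/^size=/, "", $i)
          total += $i
        }
      }
    }
    END { printf "%.0f bytes (%.1f GiB)\n", total, total / (1024 * 1024 * 1024) }
  '
}

# Usage: docker compose logs scriberr | sum_model_downloads
```

For the example log above, the three downloads add up to 9339658240 bytes, about 8.7 GiB, which is the minimum free space the `env_data` volume needs for the models alone.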