
# Configuration

The Whisper Server is configured via environment variables.

## Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| `WHISPER_HOST` | `0.0.0.0` | Host address to bind |
| `WHISPER_PORT` | `9002` | Port to listen on |
| `WHISPER_MODEL` | `base` | Whisper model to use |
| `WHISPER_DEVICE` | `auto` | Device: `auto`, `cpu`, or `cuda` |
| `WHISPER_COMPUTE_TYPE` | `auto` | Compute type for quantization |
| `WHISPER_LANGUAGE` | (none) | Force a specific language (e.g., `en`) |
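As a sketch of how these variables fit together, startup configuration might be resolved like this (hypothetical code, not the server's actual implementation; names and defaults mirror the table above):

```python
import os

# Hypothetical sketch of how the server could resolve its settings.
# Variable names and defaults mirror the table above.
def load_config(env=os.environ):
    return {
        "host": env.get("WHISPER_HOST", "0.0.0.0"),
        "port": int(env.get("WHISPER_PORT", "9002")),
        "model": env.get("WHISPER_MODEL", "base"),
        "device": env.get("WHISPER_DEVICE", "auto"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "auto"),
        # Unset means auto-detect, represented here as None.
        "language": env.get("WHISPER_LANGUAGE") or None,
    }
```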

## Whisper Models

Available models, from smallest to largest:

| Model | Parameters | Speed | Accuracy |
| --- | --- | --- | --- |
| `tiny` | 39M | Fastest | Basic |
| `base` | 74M | Fast | Good |
| `small` | 244M | Medium | Better |
| `medium` | 769M | Slow | Great |
| `large-v3` | 1.5B | Slowest | Best |

### Recommendation

Start with `base` for a balance of speed and accuracy. Use `small` or `medium` if you need better results and have the hardware to run them.

## Compute Types

For GPU inference, you can specify the quantization:

| Type | Description |
| --- | --- |
| `auto` | Automatic selection |
| `float16` | Half precision (GPU) |
| `float32` | Full precision |
| `int8` | 8-bit quantization |
| `int8_float16` | Mixed precision (`int8` weights, `float16` computation) |

Lower precision gives faster inference and a smaller memory footprint, at the cost of slightly reduced accuracy.
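One plausible policy for resolving `auto` is to pick half precision on GPU and quantized weights on CPU. The helper below is an illustrative sketch of that policy, not the server's documented behavior:

```python
# Hypothetical helper: pick a compute type when WHISPER_COMPUTE_TYPE=auto.
# Illustrative policy only, not the server's actual logic.
def resolve_compute_type(requested: str, device: str) -> str:
    if requested != "auto":
        return requested  # an explicit choice always wins
    if device == "cuda":
        return "float16"  # half precision is a common GPU default
    return "int8"         # quantized weights keep CPU inference fast
```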

## Example Configurations

### Development (CPU)

```bash
export WHISPER_HOST=127.0.0.1
export WHISPER_PORT=9002
export WHISPER_MODEL=base
export WHISPER_DEVICE=cpu
```

### Production (GPU)

```bash
export WHISPER_HOST=0.0.0.0
export WHISPER_PORT=9002
export WHISPER_MODEL=medium
export WHISPER_DEVICE=cuda
export WHISPER_COMPUTE_TYPE=float16
```

### Specific Language

```bash
export WHISPER_LANGUAGE=en  # Force English
# or
export WHISPER_LANGUAGE=it  # Force Italian
```

When `WHISPER_LANGUAGE` is not set, the model auto-detects the language.
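A small sketch of how the unset-means-auto-detect convention could be handled (hypothetical helper, not server code):

```python
def language_from_env(value):
    """Map WHISPER_LANGUAGE to a transcription language argument.

    Unset or blank means auto-detect (None); otherwise normalize the
    code, e.g. " EN " -> "en". Illustrative helper, not server code.
    """
    if value is None or not value.strip():
        return None
    return value.strip().lower()
```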

## Using with Just

The `justfile` accepts port and model arguments:

```bash
# Default (port 9002, model base)
just run

# Custom port
just run 8080

# Custom port and model
just run 9002 medium

# Docker with custom settings
just start-docker 9002 small
```

## Docker Environment

Pass environment variables to Docker with `-e`:

```bash
docker run -d --rm \
    --name whisper-server \
    -p 9002:9002 \
    -e WHISPER_MODEL=medium \
    -e WHISPER_DEVICE=cpu \
    -e WHISPER_LANGUAGE=en \
    ghcr.io/paolino/whisper-server:latest
```

## NixOS Configuration

In your NixOS configuration:

```nix
{
  systemd.services.whisper-server = {
    environment = {
      WHISPER_HOST = "0.0.0.0";
      WHISPER_PORT = "9002";
      WHISPER_MODEL = "medium";
      WHISPER_DEVICE = "cuda";
      WHISPER_COMPUTE_TYPE = "float16";
    };
    # ... rest of service config
  };
}
```

## Performance Tuning

### Memory Usage

Larger models require more RAM/VRAM:

| Model | RAM (CPU) | VRAM (GPU) |
| --- | --- | --- |
| `tiny` | ~1 GB | ~1 GB |
| `base` | ~1 GB | ~1 GB |
| `small` | ~2 GB | ~2 GB |
| `medium` | ~5 GB | ~5 GB |
| `large-v3` | ~10 GB | ~10 GB |
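Using the footprints above, a hypothetical helper (not part of the server) could pick the largest model that fits a given memory budget:

```python
# Documented approximate footprints in GB, smallest model first.
MODEL_FOOTPRINT_GB = {
    "tiny": 1, "base": 1, "small": 2, "medium": 5, "large-v3": 10,
}

def largest_model_for(budget_gb: float) -> str:
    """Return the largest model whose footprint fits the budget.

    Illustrative helper: falls back to "tiny" if nothing fits.
    """
    fits = [m for m, gb in MODEL_FOOTPRINT_GB.items() if gb <= budget_gb]
    # Dict order runs smallest to largest, so the last fit is the biggest.
    return fits[-1] if fits else "tiny"
```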

### First Request Latency

The server preloads the model when it starts rather than lazily on the first request, so startup takes longer with larger models, but the first transcription is not delayed by model loading.

### Concurrent Requests

The current implementation processes one request at a time. For multiple concurrent users, consider running multiple server instances behind a load balancer.
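As a minimal illustration of the load-balancing idea, a client could rotate requests across instances itself (hypothetical instance URLs; in practice a reverse proxy such as nginx would do this):

```python
import itertools

# Client-side round-robin over several server instances, e.g. started
# on ports 9002-9003. Illustrative sketch only.
def round_robin(base_urls):
    cycle = itertools.cycle(base_urls)
    return lambda: next(cycle)  # each call yields the next backend

pick = round_robin([
    "http://127.0.0.1:9002",
    "http://127.0.0.1:9003",
])
```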