ITH-Backend

What does the ITH-Backend do?

  • Provides the API for ITH sessions
  • Handles transcription and speech generation
  • Computes mouth movements and facial expressions
  • Serves all necessary files for the client

Hosting ITH-Backend with Docker

1. Install Docker

If Docker is not already installed, install it from docker.com.

  • Linux
  • Windows, Mac

Install Docker Engine.

2. Create docker-compose.yaml

services:
  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
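    # With host networking Caddy binds host ports 80 and 443 directly;
    # a "ports" mapping must not be combined with network_mode: host.
    network_mode: host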
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
  ith-backend:
    build:
      context: .
    ports:
      - "8000:8000"
    volumes:
      - ./assets:/app/assets
      - ./configs/.docker.env:/app/configs/.env
      - ./static:/app/static
      - ./templates:/app/templates
    env_file:
      - ./configs/.secret_keys.env
  redis:
    image: redis:latest
    container_name: redis
    ports:
      - "6379:6379"
    command: [ "redis-server", "--requirepass", "123456" ] # example password: change it and keep REDIS_PASSWORD in sync
  influxdb:
    image: influxdb:2.7
    container_name: influxdb
    ports:
      - "8086:8086"
    volumes:
      - influxdb-data:/var/lib/influxdb2
      - influxdb-config:/etc/influxdb2
    environment:
      - DOCKER_INFLUXDB_INIT_MODE=setup
      - DOCKER_INFLUXDB_INIT_USERNAME=admin
      - DOCKER_INFLUXDB_INIT_PASSWORD=adminpassword
      - DOCKER_INFLUXDB_INIT_ORG=avaluma
      - DOCKER_INFLUXDB_INIT_BUCKET=ith-v1
      - DOCKER_INFLUXDB_INIT_ADMIN_TOKEN=secret-influx-token
      - DOCKER_INFLUXDB_INIT_RETENTION=14d
  whisper:
    image: onerahmet/openai-whisper-asr-webservice
    container_name: whisper
    environment:
      ASR_ENGINE: openai_whisper # [openai_whisper, faster_whisper, whisperx]
      ASR_MODEL: base # (tiny, base, small, medium, large-v3, etc.)
      ASR_DEVICE: cpu
    ports:
      - "9000:9000"
    volumes:
      - $PWD/.cache/whisper:/root/.cache/ # reduce container startup time

volumes:
  influxdb-data:
  influxdb-config:

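To use the OpenAI-hosted Whisper API (or another OpenAI-compatible endpoint) instead of the self-hosted whisper service:

  • Remove the whisper service from docker-compose.yaml
  • Set the following environment variables for ith-backend: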
      WHISPER_ENDPOINT="https://api.openai.com/v1/audio/transcriptions" # or use openai compatible endpoint
      WHISPER_MODEL="whisper-1"
      WHISPER_REQUEST_BODY_AUDIO_FIELD_NAME="file"
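
As a rough illustration of how these three variables fit together, the sketch below builds a multipart/form-data upload in which the name of the audio field comes from WHISPER_REQUEST_BODY_AUDIO_FIELD_NAME. The helper names whisper_settings and build_multipart are hypothetical and not part of ITH-Backend; how the backend actually assembles its request is an assumption.

```python
import os
import uuid


def whisper_settings(env=os.environ):
    """Read the three WHISPER_* variables, falling back to the values listed above."""
    return {
        "endpoint": env.get(
            "WHISPER_ENDPOINT", "https://api.openai.com/v1/audio/transcriptions"
        ),
        "model": env.get("WHISPER_MODEL", "whisper-1"),
        "audio_field": env.get("WHISPER_REQUEST_BODY_AUDIO_FIELD_NAME", "file"),
    }


def build_multipart(settings, filename, audio_bytes):
    """Return (content_type, body) for a multipart/form-data upload.

    Hypothetical helper: it only shows where the configured field name ends up.
    """
    boundary = uuid.uuid4().hex
    field = settings["audio_field"]
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{settings['model']}\r\n"
    ).encode()
    audio_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    body = model_part + audio_header + audio_bytes + closing
    return f"multipart/form-data; boundary={boundary}", body
```

The returned body could then be sent to settings["endpoint"] with the returned Content-Type header. For the OpenAI-hosted endpoint the request also needs an Authorization: Bearer header carrying OPENAI_API_KEY.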
    

Edit Env Variables

ITH-Backend

| Variable | Description | Default Value | Possible Values |
|---|---|---|---|
| SERVE_DEMO_PAGE | If TRUE, you can test the speaking and transcribing functionality on localhost:8000/demo. Should be FALSE in production. | FALSE | TRUE, FALSE |
| TTS_SYSTEM | Defines the default TTS provider. More providers coming soon. | elevenlabs | elevenlabs, cartesia |
| REDIS_HOST | Redis server | redis | URL of the Redis server |
| REDIS_PORT | Redis server port | 6379 | port number |
| REDIS_DB | Redis database | 0 | 0-15 |
| REDIS_PASSWORD | Redis password | 123456 | strong password |
| OPENAI_API_KEY | Necessary for using OpenAI-hosted Whisper | - | sk-…. |
| WHISPER_ENDPOINT | OpenAI-compatible transcription endpoint (OpenAI-hosted or self-hosted Whisper) | https://api.openai.com/v1/audio/transcriptions | "https://api.openai.com/v1/audio/transcriptions", "http://whisper:9000/asr?encode=true&task=transcribe&output=json" |
| WHISPER_REQUEST_BODY_AUDIO_FIELD_NAME | Use "file" for OpenAI-hosted Whisper or "audio_file" for self-hosted Whisper | file | file, audio_file |
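
Before starting the containers it can help to sanity-check the values against the table. The sketch below is a hypothetical pre-flight helper, not something ITH-Backend ships; the allowed values are copied from the table, and treating them as case-sensitive is an assumption.

```python
# Allowed values copied from the table above; check_env is a hypothetical helper.
ALLOWED = {
    "SERVE_DEMO_PAGE": {"TRUE", "FALSE"},
    "TTS_SYSTEM": {"elevenlabs", "cartesia"},
    "WHISPER_REQUEST_BODY_AUDIO_FIELD_NAME": {"file", "audio_file"},
}

# Integer variables with their inclusive ranges.
INT_RANGES = (("REDIS_PORT", 1, 65535), ("REDIS_DB", 0, 15))


def check_env(env):
    """Return a list of problems found in env; an empty list means none were found."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = env.get(name)
        # Unset variables are skipped because the backend falls back to defaults.
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    for name, low, high in INT_RANGES:
        value = env.get(name)
        if value is not None and not (value.isdigit() and low <= int(value) <= high):
            problems.append(f"{name}={value!r} must be an integer from {low} to {high}")
    if env.get("REDIS_PASSWORD") == "123456":
        problems.append("REDIS_PASSWORD still has the example value 123456")
    return problems
```

Calling check_env(os.environ), or passing a dict parsed from the .env file, returns the list of warnings to print or log.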

Start server

docker-compose up
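
Once the containers are running, a quick way to confirm that the published ports accept connections is a plain TCP check. This is a minimal sketch, assuming the host ports from the docker-compose.yaml above; port_is_open and report are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports published in the docker-compose.yaml above.
SERVICE_PORTS = {"ith-backend": 8000, "redis": 6379, "influxdb": 8086, "whisper": 9000}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host="localhost"):
    """Map each service name to whether its published port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Running print(report()) on the Docker host lists each service with True or False. The whisper entry is expected to be False if that service was removed from the compose file.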

3. Set Up Reverse Proxy for HTTPS

Info

Modern browsers only allow microphone access over secure contexts (https) or localhost.

Using Caddy as a reverse proxy is a quick and easy way to provide SSL certificates for your own domain or for the server's IP address.

  1. Create a Caddyfile. Replace YOUR-IP with your server's IP address.
YOUR-IP:443 {
    reverse_proxy YOUR-IP:8000
    tls internal
}
  2. Add the Caddy service to docker-compose.yaml
# services:
  caddy:
    image: caddy:latest
    container_name: caddy
    restart: unless-stopped
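    # With host networking Caddy binds host ports 80 and 443 directly;
    # a "ports" mapping must not be combined with network_mode: host.
    network_mode: host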
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
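
The YOUR-IP substitution in the Caddyfile can also be scripted. The following is a minimal sketch, assuming an IPv4 address; render_caddyfile is a hypothetical helper, and the template simply mirrors the Caddyfile shown in this section.

```python
import ipaddress

# Same content as the Caddyfile above, with YOUR-IP turned into a placeholder.
# Doubled braces are literal braces in str.format templates.
CADDYFILE_TEMPLATE = """{ip}:443 {{
    reverse_proxy {ip}:8000
    tls internal
}}
"""


def render_caddyfile(ip):
    """Validate the IPv4 address and substitute it for YOUR-IP."""
    ipaddress.IPv4Address(ip)  # raises ValueError if ip is not a valid IPv4 address
    return CADDYFILE_TEMPLATE.format(ip=ip)
```

For example, pathlib.Path("Caddyfile").write_text(render_caddyfile("203.0.113.10")) writes the file next to docker-compose.yaml; 203.0.113.10 is a documentation-only address used here as a stand-in for the real server IP.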

Run docker-compose up

Done!