Usually, when you run the reference PDS as found from here, it's the simplest configuration of it, which makes perfect sense and covers most use cases. But once you start reading the source and between the lines, there are a TON of other things you can set up. Like you can store blobs in an S3 bucket, or you can run more than one instance of the PDS container and have the two work together to serve requests for the overall "PDS".

The PDS uses SQLite for its databases, so scaling only works if every container can access the folders holding those DBs, which usually limits you to multiple containers on the same machine/VPS. I'm sure the craftier of y'all out there can get it working with something like Kubernetes, but I'm not as familiar with it and won't cover that here.

Scaling on the same machine may sound a bit funny, but it does help with performance and lets you use more of the local resources, especially during long-running processes like repo imports. There was some work done to make imports async and non-blocking, but it still helps to have two PDS containers if you have a lot of traffic and want to use the whole CPU you pay for.

TLDR; you need a few things

  • Need Redis for shared state between the PDS containers

  • Edits to /pds/compose.yaml to add Redis and a new PDS instance

  • Caddyfile edit to allow load balancing

  • Edits to /pds/pds.env. Most importantly, a PDS_DPOP_SECRET

Need Redis installed

The PDS uses Redis to share state between the PDS instances, like DPoP nonces (yay, OAuth) and rate limits. It's important that you have this, or the PDS will error since each instance won't have that critical shared state. For this, I added a new entry to my /pds/compose.yaml for Redis. I also set a password, but it's important to also block Redis's public port on the server, which is usually 6379. The password is the bare minimum of security and can be brute-forced. I did this by blocking the port in DigitalOcean's firewall. There are other ways in Docker Compose to make sure it's only accessible to the PDS containers, but I went for the easier solution.

My config entry for redis

  redis:
    image: 'redis:alpine'
    command: redis-server --requirepass {a secure password}
    restart: unless-stopped
    ports:
      - '${FORWARD_REDIS_PORT:-6379}:6379'
    volumes:
      - 'redis-data:/data'
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      retries: 3
      timeout: 5s

#This goes at the very end of the compose for the volume
volumes:
  redis-data:
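
If you'd rather not rely on the firewall alone, Docker can publish the port on the loopback interface only. This is a sketch of an alternative ports: line, not what I ran; it works here because the PDS containers use network_mode: host and reach Redis via localhost:

```yaml
# Alternative: bind Redis to loopback only so it's never reachable
# from outside the machine. The host-networked PDS containers can
# still connect to localhost:6379.
ports:
  - '127.0.0.1:${FORWARD_REDIS_PORT:-6379}:6379'
```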

Then in your /pds/pds.env you set these

#Redis
PDS_REDIS_SCRATCH_ADDRESS=localhost:6379
PDS_REDIS_SCRATCH_PASSWORD={your redis password}
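
Once the stack is up, you can sanity-check that the password works with redis-cli inside the container. A quick check, assuming the service is named redis as in the compose entry above:

```shell
# Should print PONG if auth is configured correctly
docker compose exec redis redis-cli -a '{your redis password}' ping
```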

New PDS compose entry

The star of the show: adding a second entry for another PDS. It's pretty much the same as the first. It needs access to the same /pds folder for the DBs, and mostly the same env vars. You do need to set PDS_PORT on one so that your second PDS instance is on a different port; I went with 3001. You'll also need to edit the depends_on: entries on things like Caddy to reflect the new container names. Don't worry, Compose will let you know if you missed one on compose up. Mine look about like this

  pds-one:
    container_name: pds-one
    image: ghcr.io/bluesky-social/pds:0.4
    network_mode: host
    restart: unless-stopped
    volumes:
      - type: bind
        source: /pds
        target: /pds
    env_file:
      - /pds/pds.env
  pds-two:
    container_name: pds-two
    image: ghcr.io/bluesky-social/pds:0.4
    network_mode: host
    restart: unless-stopped
    volumes:
      - type: bind
        source: /pds
        target: /pds
    env_file:
      - /pds/pds.env
    environment:
      PDS_PORT: 3001
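
With both containers up, you can check that each instance answers on its own port before wiring up Caddy. The reference PDS exposes a health endpoint at /xrpc/_health; a quick smoke test might look like:

```shell
# Each should return a small JSON blob with the PDS version
curl -s http://localhost:3000/xrpc/_health
curl -s http://localhost:3001/xrpc/_health
```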

Add load balancing to Caddy

Your Caddyfile is usually found at /pds/caddy/etc/caddy/Caddyfile. You can add a few different types of load balancing, but I went with round_robin; I like to start simple and go from there. Round robin alternates between the two PDS instances: request 1 to pds-one, request 2 to pds-two, request 3 to pds-one, etc. You can see the other options here. Depending on how you set it up, your Caddyfile would look a bit like below. The main thing is to find the reverse_proxy http://localhost:3000 entry, extend it with the new PDS instance's URL, and open a { block to set up load balancing.

reverse_proxy http://localhost:3000 http://localhost:3001 {
    lb_policy round_robin
}

New env variables

In addition to the env vars we added for Redis, we also have to add PDS_DPOP_SECRET. This is very important because if you don't, OAuth (yay) requests will fail every other request, since each instance generates nonces from a different secret. The secret also has to be exactly 32 bytes, which is 64 hex characters. You can generate one with openssl rand -hex 32. Set, it will look like this

#This example secret is deliberately one character short (63 instead of 64) to make sure you don't copy it. It will error
PDS_DPOP_SECRET=f89e1f48ff6aa5e813d5c99b85051a4f650665e451c8aab03afe7c58955dcbd
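
To avoid exactly that mistake, you can generate the secret and check its length in one go. A small sketch (the variable name is just for illustration):

```shell
# 32 random bytes, hex-encoded -> exactly 64 characters
SECRET=$(openssl rand -hex 32)
echo "${#SECRET}"    # prints 64
echo "PDS_DPOP_SECRET=${SECRET}"
```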

Mixing bowl

That's pretty much it! At the end, you want to run docker compose down in your /pds directory. You may get an error like "PDS not found" from us changing the container names; if you do, run docker container stop pds to stop that one, then do docker compose down and docker compose up -d. Logs are a bit harder to read between two containers, but I use this to give me a nice jq readout of them. It needs to be run in /pds

docker compose logs --no-log-prefix -f | jq -R ". as \$line | try (fromjson) catch \$line"

Also worth noting, you don't have to stop at two PDS instances. Scale to your needs. Use all of those CPU cores you're paying for.