Usually, when you run the reference PDS as found here, it's the simplest configuration of it, which makes perfect sense and covers most use cases. But once you start reading the source, and between the lines, there are a TON of other things you can set up. For example, you can store blobs in an S3 bucket, or run more than one instance of the PDS container and have the two work together to serve requests for the overall "PDS".
The PDS uses SQLite for its databases, so scaling only works if the containers can all access the folders holding those DBs, which usually limits you to running multiple containers on the same machine/VPS. I'm sure the craftier of y'all out there can get it working with something like Kubernetes, but I'm not as familiar with it and won't cover that here.
Scaling on the same machine may sound a bit funny, but it does help a bit with performance and lets you use more of the local resources, especially during long-running processes like repo imports. There was some work done to make imports async and non-blocking, but it still helps to have two PDS containers if you have a lot of traffic and want to use the whole CPU you pay for.
TLDR; you need a few things:

- Redis, for shared state between the PDS containers
- Edits to /pds/compose.yaml to add Redis and a new PDS instance
- A Caddyfile edit to allow load balancing
- Edits to /pds/pds.env; most importantly, a PDS_DPOP_SECRET
Need Redis installed
The PDS uses Redis to share various bits of state between the PDS instances, like DPoP nonces (yay, OAuth) and rate limits. It's important that you have this, or the PDS will error, since each instance will be missing the critical shared state. For this, I added a new entry to my /pds/compose.yaml for Redis. I also set a password, but it's important to also block the Redis public port on the server, which is usually 6379. The password is the bare minimum of security and can be brute-forced. I did the blocking with DigitalOcean's firewall. There are other ways in Docker Compose to make sure it's accessible only to the PDS containers, but I went for the easier solution.
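One of those "other ways in Docker Compose" is to publish the Redis port on loopback only, so it's never reachable from outside the machine, regardless of firewall state. Since the PDS containers run with network_mode: host, they can still reach Redis at localhost:6379. A sketch of the ports entry, not tested against this exact compose file:

```yaml
    ports:
      # Prefixing 127.0.0.1 binds the published port to loopback only,
      # so only processes on this machine can reach Redis
      - '127.0.0.1:${FORWARD_REDIS_PORT:-6379}:6379'
```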
My config entry for Redis:

```yaml
  redis:
    image: 'redis:alpine'
    command: redis-server --requirepass {a secure password}
    restart: unless-stopped
    ports:
      - '${FORWARD_REDIS_PORT:-6379}:6379'
    volumes:
      - 'redis-data:/data'
    healthcheck:
      test: [ "CMD", "redis-cli", "ping" ]
      retries: 3
      timeout: 5s

# This goes at the very end of the compose file, for the volume
volumes:
  redis-data:
```

Then for your /pds/pds.env you set these:
```
# Redis
PDS_REDIS_SCRATCH_ADDRESS=localhost:6379
PDS_REDIS_SCRATCH_PASSWORD={your redis password}
```

New PDS compose entry
The star of the show: adding a second entry for another PDS. It's pretty much the same as the first. It needs access to the same /pds folder for the DBs, and mostly the same env vars. You do need to set PDS_PORT on one so that your second PDS instance listens on a different port; I went with 3001. You'll also need to edit the depends_on: entries on things like Caddy to reflect the new container names. Don't worry, compose will let you know if you missed one on compose up. Mine look about like this:
```yaml
  pds-one:
    container_name: pds-one
    image: ghcr.io/bluesky-social/pds:0.4
    network_mode: host
    restart: unless-stopped
    volumes:
      - type: bind
        source: /pds
        target: /pds
    env_file:
      - /pds/pds.env

  pds-two:
    container_name: pds-two
    image: ghcr.io/bluesky-social/pds:0.4
    network_mode: host
    restart: unless-stopped
    volumes:
      - type: bind
        source: /pds
        target: /pds
    env_file:
      - /pds/pds.env
    environment:
      PDS_PORT: 3001
```

Add load balancing to Caddy
Your Caddyfile is usually found at /pds/caddy/etc/caddy/Caddyfile. You can add a few different types of load balancing, but I went with round_robin; I like to start simple and go from there. Round robin alternates between the two PDS instances: request 1 to pds-one, request 2 to pds-two, request 3 to pds-one, etc. You can see other options here. Depending on how you set it up, your Caddyfile would look a bit like below. The main thing is to find the reverse_proxy http://localhost:3000 entry, extend it with the new PDS instance URL, and add a { block to set up load balancing.
```
reverse_proxy http://localhost:3000 http://localhost:3001 {
    lb_policy round_robin
}
```

New env variables
In addition to the env vars we added for Redis, we also have to add one for PDS_DPOP_SECRET. This is very important, because if you don't, OAuth (yay) requests will fail every other request, since each instance is generating nonces from a different secret. The secret also has to be exactly 32 bytes, i.e. 64 hex characters. You can generate one with openssl rand -hex 32. Set, it will look like this:
```
# This example secret is deliberately one hex character short (63 instead of 64) so you can't copy it; it will error
PDS_DPOP_SECRET=f89e1f48ff6aa5e813d5c99b85051a4f650665e451c8aab03afe7c58955dcbd
```

Mixing bowl
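Before restarting, it's worth a quick sanity check on the secret you generated. A short shell snippet (assuming openssl and coreutils are available; the SECRET variable name is just for illustration):

```shell
# Generate a fresh 32-byte secret, printed as 64 hex characters
SECRET=$(openssl rand -hex 32)

# Confirm the length before pasting it into pds.env
printf '%s' "$SECRET" | wc -c   # prints 64
```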
That's pretty much it! At the end, you want to run docker compose down in your /pds directory. You may get an error about "PDS not found" from us changing the container names; if you do, run docker container stop pds to stop that one, then do docker compose down and docker compose up -d. Logs are a bit harder to read between two containers, but I use this to get a nice jq readout of them. It needs to be run in /pds:
```shell
docker compose logs --no-log-prefix -f | jq -R ". as \$line | try (fromjson) catch \$line"
```

Also worth noting: you don't have to stop at 2 PDS instances. Scale to your needs. Use all of those CPU cores you're paying for.
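A third instance, for example, is just another copy of the same compose entry with its own port. A sketch (the name pds-three and port 3002 are my choices for illustration, not from the setup above):

```yaml
  pds-three:
    container_name: pds-three
    image: ghcr.io/bluesky-social/pds:0.4
    network_mode: host
    restart: unless-stopped
    volumes:
      - type: bind
        source: /pds
        target: /pds
    env_file:
      - /pds/pds.env
    environment:
      PDS_PORT: 3002
```

Remember to add http://localhost:3002 to the reverse_proxy line in your Caddyfile, too.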