Ever had a project where you wanted to make sure you were not writing the same record to the user's repo twice? Maybe you are doing a listRecords, or keeping a record locally of what's been created on the user's repo? But how can you simplify that check and scale it to multiple instances to ensure you are not writing multiples of the same record? Well, you can do something called upserting.

Upserting is a term that combines the words insert and update, used mostly in database terminology. The idea is that it's a statement that checks to see if a record/row exists. If it does, it updates the record/row. If not, it creates a new one. This is usually done by matching on some unique data. Here's an example in PostgreSQL:

INSERT INTO inventory (id, name, price, quantity)
VALUES (1, 'A', 16.99, 120)
ON CONFLICT(id)
DO UPDATE SET
  price = EXCLUDED.price,
  quantity = EXCLUDED.quantity;

The secret to doing an upsert with ATProto records is two things: using a TID record key, and doing a com.atproto.repo.putRecord instead of a com.atproto.repo.createRecord.

I usually use Bluesky's TypeScript libraries for code examples. But today I am going to highlight a community-made TypeScript library, atcute, and show the examples in it.

TID

TID stands for Timestamp Identifier. In other words, it's an ID built from a timestamp in a way that is mostly guaranteed to be unique across systems. You can read about the details here, but the important bit is to know the TID is made from two pieces of information: a clock ID from 0 to 1023 and a timestamp in microseconds. Why the clock ID? Well, it's to help with the randomness. If you have two PDSs churning out records, there's a chance they may each make the same TID. If each has its own clock ID, that won't happen. And because it uses microseconds, a collision becomes even more unlikely.
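
To make that layout a bit more concrete, here is a minimal sketch of how a TID could be put together by hand, going by the atproto TID spec as I understand it: the microsecond timestamp and the 10-bit clock ID, written out as 13 characters of sortable base32. The names `encodeTid` and `decodeTid` are made up for this post, and this is not a claim about how `@atcute/tid` works internally. In real code, reach for a library.

```typescript
// Sortable base32 alphabet used by TIDs.
const S32_ALPHABET = '234567abcdefghijklmnopqrstuvwxyz';

// Encode a non-negative integer as exactly `width` base32 characters.
function s32Encode(value: number, width: number): string {
  let out = '';
  let rest = value;
  for (let i = 0; i < width; i++) {
    out = S32_ALPHABET.charAt(rest % 32) + out;
    rest = Math.floor(rest / 32);
  }
  return out;
}

// Decode base32 characters back into an integer.
function s32Decode(text: string): number {
  let value = 0;
  for (let i = 0; i < text.length; i++) {
    value = value * 32 + S32_ALPHABET.indexOf(text.charAt(i));
  }
  return value;
}

// Hypothetical helper: 11 characters of timestamp followed by 2 characters of clock ID.
function encodeTid(timestampMicros: number, clockId: number): string {
  if (!Number.isSafeInteger(timestampMicros) || timestampMicros < 0) {
    throw new Error('timestamp must be a non-negative safe integer');
  }
  if (!Number.isInteger(clockId) || clockId < 0 || clockId > 1023) {
    throw new Error('clock ID must be an integer from 0 to 1023');
  }
  return s32Encode(timestampMicros, 11) + s32Encode(clockId, 2);
}

// Hypothetical helper: split a TID back into its two pieces.
function decodeTid(tid: string): { timestamp: number; clockId: number } {
  return {
    timestamp: s32Decode(tid.slice(0, 11)),
    clockId: s32Decode(tid.slice(11)),
  };
}

// Midnight UTC on 2025-01-01, converted from milliseconds to microseconds.
const exampleMicros = Date.UTC(2025, 0, 1) * 1000;
const exampleTid = encodeTid(exampleMicros, 23);
console.log(exampleTid, decodeTid(exampleTid));
```

The same timestamp and clock ID always produce the same 13 characters, and that is the property the rest of this post leans on.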

Using @atcute/tid you can see we take the current timestamp, convert it to a TID, and then convert it back to the same exact timestamp. Remember that; it will be important later.

import * as TID from '@atcute/tid';

const rightMeow = new Date();
console.log(`It's ${rightMeow.toLocaleString()} or ${rightMeow.getTime()}`);

//TID timestamps are in microseconds. Padding it a bit since we don't need that precision.
const rightNowMicroSeconds = rightMeow.getTime() * 1000;
//Every TID needs a clock id, can be your favorite number even.
//But make sure you use the same one if you want the same TIDs from the same timestamps
const clockId = 23;

const rightMeowTid = TID.create(rightNowMicroSeconds, clockId);
console.log(`TID: ${rightMeowTid}`);
const { timestamp } = TID.parse(rightMeowTid);

//remove the padding
const backToMilliSeconds = timestamp / 1000;
//Get a readable timestamp for demo
const rightNowConvertedBack = new Date(backToMilliSeconds);
console.log(`Converted back: ${rightNowConvertedBack.toLocaleString()} or ${backToMilliSeconds}`);

You may have also noticed we are multiplying by 1000 when creating the TID and dividing by 1000 when converting back. That's us adding padding to the timestamp that was in milliseconds to get it to microseconds. We're not really worried about colliding IDs and don't need the precision of microseconds.

putRecord

Usually, when you create a record in a user's repo you use com.atproto.repo.createRecord, but you can actually use com.atproto.repo.putRecord. With createRecord, if the PDS finds a record with the same record key, it will error, but with putRecord it will actually upsert. Probably seeing where this is going now? In other words, say you're making a record in the collection com.example.something with a record key of self. If you use createRecord and that record already exists, you get an error. With putRecord, it replaces the record there with what you uploaded. If the record isn't there, it creates a new one.
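
The difference is easy to see with a toy, in-memory stand-in for a repo. Nothing below talks to a PDS, and `FakeRepo` with its `createRecord` and `putRecord` methods is invented for illustration. It only mimics the behavior described above.

```typescript
type RepoRecord = { [key: string]: unknown };

// Toy stand-in for a user's repo, keyed by "collection/rkey". Not a real PDS client.
class FakeRepo {
  private records = new Map<string, RepoRecord>();

  // Mimics createRecord: refuses to overwrite an existing record key.
  createRecord(collection: string, rkey: string, record: RepoRecord): void {
    const key = `${collection}/${rkey}`;
    if (this.records.has(key)) {
      throw new Error(`record already exists: ${key}`);
    }
    this.records.set(key, record);
  }

  // Mimics putRecord: creates the record, or replaces whatever is already there.
  putRecord(collection: string, rkey: string, record: RepoRecord): void {
    this.records.set(`${collection}/${rkey}`, record);
  }

  getRecord(collection: string, rkey: string): RepoRecord | undefined {
    return this.records.get(`${collection}/${rkey}`);
  }

  count(): number {
    return this.records.size;
  }
}

const fakeRepo = new FakeRepo();
fakeRepo.putRecord('com.example.something', 'self', { note: 'first' });
// Same key again: the record is replaced, not duplicated.
fakeRepo.putRecord('com.example.something', 'self', { note: 'second' });

let createFailed = false;
try {
  fakeRepo.createRecord('com.example.something', 'self', { note: 'third' });
} catch (err) {
  createFailed = true; // the key is already taken
}

console.log(fakeRepo.count(), fakeRepo.getRecord('com.example.something', 'self'), createFailed);
```

After both puts there is still only one record under self, and it holds the second payload.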

All Together

So now you know how to create a TID, and how you can use putRecord to create a record even if one already exists with the same key. So, what's the use case here? Well, let's say you have a data set locally. It has a created date that never changes, and you want to sync that data set to the user's repo. Maybe something like workouts on your phone for that day, or the last 10 songs listened to. Each of those usually has a created date that never changes, and you can make a TID from it. That gives each item a unique ID for when you do a putRecord, ensuring that you are never creating multiples of the same record. The example ended up being a bit long to put here, but you can check out the full one on this tangled repo to see how this looks in code, as well as one for the TID code above.

A short version of it using the workout/activity idea looks like this:

//`rpc` is assumed to be an authenticated @atcute/client client, and `ok` unwraps its responses.
//Setup is left out here, see the full example for it.
const handle = 'baileytownsend.dev';
const collection = 'social.pace.feed.activity';

//You want to make sure this is always the same for the applications you are generating upsert TIDs for
//If you use the same timestamp but a different clock ID, you will get different TIDs
const CLOCK_ID = 23;

//A list of activities that may be gotten from your phone or wherever, but you get the whole list every time
const activities = [];

//I just finished a run, it's saved to my phone, now uploading it to the PDS
activities.push({
    $type: collection,
    type: 'run',
    startTime: new Date()
});

// We go through and upsert all activities
for (const activity of activities) {

    //Creates that unique key from the startTime of the activity so we don't have duplicates
    const rKey = TID.create(activity.startTime.getTime() * 1000, CLOCK_ID);

    await ok(rpc.post('com.atproto.repo.putRecord', {
        input: {
            repo: handle,
            collection,
            rkey: rKey,
            record: activity,
        }
    }));
    console.log(`Uploaded activity with rkey: ${rKey}`);
}

//Same startTime and same clock ID, so this is the same rkey we uploaded with
const rkey = TID.create(activities[0].startTime.getTime() * 1000, CLOCK_ID);

const activityFromPDS = await ok(rpc.get('com.atproto.repo.getRecord', {
    params: {
        repo: handle,
        collection,
        rkey,
    }
}));

//startTime comes back from the PDS as an ISO string, so turn it back into a Date
console.log(`The PDS shows you went on a ${activityFromPDS.value.type} at ${new Date(activityFromPDS.value.startTime).toLocaleString()}.`);

//Then this would be doing the same thing as above but
//adding a new workout to activities and resyncing to the PDS again

Now, this is not always the best thing. You can't really use it against a large local dataset and go through 100s of putRecords. I mean, it would work and would not create duplicates, but that's not really economical. This works great though if you are getting the workouts you've done that day and don't want to create any twice. Or maybe you're an atproto music stamping service like teal.fm and don't want to stamp the same song twice if the user logs in to two different music stamper services. Another one is all those Twitter to Bluesky account importers. Use the posted date of the tweet and you won't have duplicates if you need to restart the import. Or the same if you are backlogging your last.fm to atproto: use the played date.

We touched on the clock ID a bit earlier, but it is important that you use the same clock ID if you want to generate the same TID from a timestamp. If the clock ID is different, you will get different TIDs even when the timestamp is the same. I usually use 23, it's a lucky number for me. It also has to be between 0 and 1023.
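
Here's a small sketch of the two-stamper situation, with a plain `Map` standing in for the repo and a hand-rolled encoder (the hypothetical `tidFrom`, following the TID layout as I understand it) instead of a real library. The service names and the `stampPlay` function are made up, and it assumes every service reports the exact same played time. Two services that agree on the clock ID land on the same record key, and one that picks a different clock ID does not.

```typescript
const TID_ALPHABET = '234567abcdefghijklmnopqrstuvwxyz';

// Encode a non-negative integer as exactly `width` sortable base32 characters.
function toS32(value: number, width: number): string {
  let out = '';
  let rest = value;
  for (let i = 0; i < width; i++) {
    out = TID_ALPHABET.charAt(rest % 32) + out;
    rest = Math.floor(rest / 32);
  }
  return out;
}

// Hypothetical encoder: 11 characters of microsecond timestamp plus 2 of clock ID.
function tidFrom(when: Date, clockId: number): string {
  if (!Number.isInteger(clockId) || clockId < 0 || clockId > 1023) {
    throw new Error('clock ID must be an integer from 0 to 1023');
  }
  return toS32(when.getTime() * 1000, 11) + toS32(clockId, 2);
}

// Stand-in for the user's repo collection, keyed by rkey.
const plays = new Map<string, { track: string; stampedBy: string }>();

// Behaves like putRecord: the last write for a given rkey wins.
function stampPlay(service: string, track: string, when: Date, clockId: number): string {
  const rkey = tidFrom(when, clockId);
  plays.set(rkey, { track, stampedBy: service });
  return rkey;
}

const songPlayedAt = new Date('2025-01-01T12:00:00.000Z');

// Two services, same played date, same clock ID: one record.
const keyA = stampPlay('stamper-a', 'Some Song', songPlayedAt, 23);
const keyB = stampPlay('stamper-b', 'Some Song', songPlayedAt, 23);
const countWithSharedClockId = plays.size;

// A third service with a different clock ID: a second record for the same play.
const keyC = stampPlay('stamper-c', 'Some Song', songPlayedAt, 7);

console.log(keyA === keyB, countWithSharedClockId, keyA === keyC, plays.size);
// prints: true 1 false 2
```

So if you want several apps or services to dedupe against each other, they all have to agree on both the timestamp they key on and the clock ID.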

TID libraries

TIDs are pretty core atproto things. So usually if your programming language of choice has an atproto library, it can probably make TIDs from timestamps.

Some I know of

Fin

Thanks for reading! Happy hacking!