Synchronize updates and deletions of posts and media attachments #6
Current issue: the archive scrapers operate on a last-seen-id basis.
They pull any new content of a user since a given last-seen post ID or attachment ID.
This means that updates or deletions made to content submitted before that last-seen ID are never reprocessed, so changed or deleted content is not synchronized.
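The blind spot can be reproduced with a small in-memory simulation. This is only a sketch: `Post`, `fetch_since`, and the sample data are hypothetical stand-ins for the scrapers and the remote timeline, not the archive's actual code.

```python
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    content: str


def fetch_since(remote: list[Post], last_seen_id: int) -> list[Post]:
    """Return only the posts whose ID is greater than last_seen_id."""
    return [p for p in remote if p.id > last_seen_id]


# First polling round: the archive copies everything it sees.
remote = [Post(1, "hello"), Post(2, "world")]
archive = {p.id: p.content for p in fetch_since(remote, last_seen_id=0)}
last_seen_id = max(archive)

# The author edits post 1, deletes post 2, and publishes post 3.
remote = [Post(1, "hello (edited)"), Post(3, "new post")]

# Second polling round: only IDs above last_seen_id are fetched.
for p in fetch_since(remote, last_seen_id):
    archive[p.id] = p.content

# The archive still holds the old text of post 1 and the deleted post 2.
```

Only post 3 is picked up in the second round; the edit and the deletion happen below the last-seen ID and are never looked at again.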
Unfortunately, this problem is not easy to solve because of the limitations of Mastodon's API and the nature of the archive. The approaches considered below are very hard to implement and come with many trade-offs.
Full reconciliation
On every round of post fetches, the archive needs to pull all the posts for a given user (since the first one) and synchronize anything that has been either updated or deleted.
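The comparison step of such a reconciliation could look roughly like the sketch below. It assumes each snapshot is a mapping from post ID to an "edit marker" (for example a status's `edited_at` value, where the API provides one); `reconcile` and both snapshots are hypothetical, and building the remote snapshot still requires paging through the user's whole history.

```python
from typing import Optional


def reconcile(
    stored: dict[str, Optional[str]],
    remote: dict[str, Optional[str]],
) -> tuple[list[str], list[str], list[str]]:
    """Compare two snapshots mapping post ID -> edit marker.

    Returns (created, updated, deleted) lists of post IDs.
    """
    created = sorted(remote.keys() - stored.keys())
    deleted = sorted(stored.keys() - remote.keys())
    updated = sorted(
        post_id
        for post_id in stored.keys() & remote.keys()
        if stored[post_id] != remote[post_id]
    )
    return created, updated, deleted


# Hypothetical snapshots: post 1 was edited, post 2 deleted, post 4 is new.
stored = {"1": None, "2": None, "3": None}
remote = {"1": "2024-01-02T00:00:00Z", "3": None, "4": None}

created, updated, deleted = reconcile(stored, remote)
```

The diff itself is cheap; the cost lies entirely in fetching the full remote snapshot on every round.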
Why it's unfeasible
This approach would require an enormous number of queries on every round of polling. This means:
Streaming API
Hook into the events WebSocket API (used, for example, to power notifications on the frontend).
Why it's unfeasible
Potentially feasible approach (but very high implementation cost):
Mock a Mastodon server
Use a library such as Pubby to implement a minimal ActivityPub server on the archive. The "instance" can expose a single user that follows all the verified accounts. This enables notifications to be delivered to that user whenever an account posts or modifies an activity. Each event can then be intercepted by the rest of the archive machinery, parsed, and stored.
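On the receiving side, the interception step might reduce to a dispatcher over the ActivityStreams activity type. The sketch below is not Pubby's API: `handle_activity`, `object_id`, the in-memory `archive` dict, and the `POST_TYPES` filter are all assumptions made for illustration, and signature verification and follow handling are left out entirely.

```python
from typing import Any

# Assumption: posts arrive as Note (or Question, for polls) objects.
POST_TYPES = {"Note", "Question"}


def object_id(obj: Any) -> str:
    """An activity's object may be embedded (a dict) or referenced by ID (a string)."""
    return obj["id"] if isinstance(obj, dict) else obj


def handle_activity(activity: dict, archive: dict[str, dict]) -> None:
    """Apply one inbox activity to a hypothetical in-memory archive keyed by object ID."""
    kind = activity.get("type")
    obj = activity.get("object")
    if kind in ("Create", "Update"):
        # Store or overwrite the post; skip non-post objects such as profile updates.
        if isinstance(obj, dict) and obj.get("type") in POST_TYPES:
            archive[obj["id"]] = obj
    elif kind == "Delete" and obj is not None:
        # Deleting an ID the archive never stored is a no-op.
        archive.pop(object_id(obj), None)
    # Any other activity type (Follow, Like, Announce, ...) is ignored here.


# Usage with made-up activities:
archive: dict[str, dict] = {}
note_id = "https://example.org/notes/1"
handle_activity({"type": "Create", "object": {"id": note_id, "type": "Note", "content": "a"}}, archive)
handle_activity({"type": "Update", "object": {"id": note_id, "type": "Note", "content": "b"}}, archive)
```

A later `Delete` activity whose object carries the same ID would remove the entry again, which is the synchronization the last-seen-ID scrapers cannot provide.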