Fork of Scribe - an alternative Medium frontend

Edward Loveall 7518a035b1 Proxy GitHub gists with rate limiting		2022-01-23 15:05:46 -05:00

Previously, GitHub gists were embedded. The gist URL would be detected in a paragraph and the page would render a script like:

```html
<script src="https://gist.github.com/user/gist_id.js"></script>
```

The script would then embed the gist on the page. However, gists can contain multiple files. It's technically possible to embed a single file in the same way by appending a `file` query param:

```html
<script src="https://gist.github.com/user/gist_id.js?file=foo.txt"></script>
```

I wanted to try and tackle proxying gists instead.

Overview
--------

At a high level, the `PageConverter` kicks off the work of fetching and storing the gist content, then sends that content down to the `ParagraphConverter`. When a paragraph comes up that contains a gist embed, it retrieves the previously fetched content. This allows all the necessary content to be fetched up front so the minimum number of requests need to be made.

Fetching Gists
--------------

There is now a `GithubClient` class that gets gist content from GitHub's REST API. The gist API response looks something like this (non-relevant keys removed):

```json
{
  "files": {
    "file-one.txt": {
      "filename": "file-one.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-one.txt",
      "content": "..."
    },
    "file-two.txt": {
      "filename": "file-two.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-two.txt",
      "content": "..."
    }
  }
}
```

That response gets turned into a bunch of `GistFile` objects that are then stored in a request-level `GistStore`.

Crystal's JSON parsing does not make it easy to parse JSON with arbitrary keys into objects. This is because each key corresponds to an object property, like `property name : String`. If Crystal doesn't know the keys ahead of time, there's no way to know what methods to create. That's a problem here because the key for each gist file is the unique filename. Fortunately, the keys for each _file_ follow the same pattern and are easy to parse into a `GistFile` object.

To turn gist file JSON into Crystal objects, the `GithubClient` turns the whole response into a `JSON::Any`, which is like a Hash. Then it extracts just the file data objects and parses those into `GistFile` objects.

Those `GistFile` objects are then cached in a `GistStore` that is shared for the page, which means one gist cache per request/article. `GistFile` objects can be fetched out of the store by file, or if no file is specified, it returns all files in the gist.

The `GistFile` is rendered as a link of the file's name to the file in the gist on GitHub, and then a code block of the contents of the file.

In summary, the `PageConverter`:

* Scans the paragraphs for GitHub gists using `GistScanner`
* Requests their data from GitHub using the `GithubClient`
* Parses the response into `GistFile`s and populates the `GistStore`
* Passes that `GistStore` to the `ParagraphConverter` to use when constructing the page nodes

Caching
-------

GitHub limits API requests to 5000/hour with a valid API token and 60/hour without. 60 is pretty tight for the usage that scribe.rip gets, but 5000 is reasonable most of the time. Not every article has an embedded gist, but some articles have multiple gists.

A viral article (of which Scribe has seen two at the time of this commit) might receive a little over 127k hits/day, which is an average of over 5300/hour. If that article had a gist, Scribe would reach the API limit during parts of the day with high traffic. If it had multiple gists, it would hit it even more. However, average traffic is around 30k visits/day, which would be well under the limit, assuming average load.

To help not hit that limit, a `GistStore` holds all the `GistFile` objects per gist. The logic in `GistScanner` is smart enough to only return unique gist URLs, so each gist is only requested once even if multiple files from one gist exist in an article. This limits the number of times Scribe hits the GitHub API.

If Scribe is rate-limited, instead of populating a `GistStore` the `PageConverter` will create a `RateLimitedGistStore`. This is an object that acts like the `GistStore` but returns `RateLimitedGistFile` objects instead of `GistFile` objects. This allows Scribe to gracefully degrade in the event of reaching the rate limit.

If rate-limiting becomes a regular problem, Scribe could also be reworked to fall back to the embedded gists again.

API Credentials
---------------

API credentials are in the form of a GitHub username and a personal access token attached to that username. To get a token, visit https://github.com/settings/tokens and create a new token. The only permission it needs is `gist`.

This token is set via the `GITHUB_PERSONAL_ACCESS_TOKEN` environment variable. The username also needs to be set via `GITHUB_USERNAME`. When developing locally, these can both be set in the .env file. Authentication is probably not necessary locally, but it's there if you want to test. If either value is missing, unauthenticated requests are made.

Rendering
---------

The node tree itself holds a `GithubGist` object. It has a reference to the `GistStore` and the original gist URL. When it renders, the page requests the gist's `files`. The gist ID and optional file are detected, and then used to request the file(s) from the `GistStore`. Gists render as a list of each file's contents and a link to the file on GitHub.

If the requests were rate-limited, the store is a `RateLimitedGistStore` and the files are `RateLimitedGistFile`s. These rate-limited objects are rendered with a link to the gist on GitHub and text saying that Scribe has been rate-limited.

If somehow the file requested doesn't exist in the store, it displays similarly to the rate-limited file but with "file missing" text instead of "rate limited" text.

GitHub API docs: https://docs.github.com/en/rest/reference/gists
Rate Limiting docs: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
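The rate-limit arithmetic in the Caching section of the commit message can be checked with a few lines of shell. This is only a rough sketch: the function names are made up for illustration, 127500 stands in for "a little over 127k" hits/day, and it assumes the worst case where every hit triggers exactly one GitHub API request.

```shell
#!/bin/sh
# GitHub's hourly API limits, as quoted in the commit message.
AUTHENTICATED_LIMIT=5000
UNAUTHENTICATED_LIMIT=60

# Average requests per hour for a given number of hits per day (integer division).
hourly_average() {
  echo $(( $1 / 24 ))
}

# Report whether one API request per hit would stay within the hourly limit.
# Arguments: label, hits per day, hourly limit. (Hypothetical helper.)
check_traffic() {
  label="$1"
  hits_per_day="$2"
  limit="$3"
  per_hour=$(hourly_average "$hits_per_day")
  if [ "$per_hour" -gt "$limit" ]; then
    echo "$label: ~$per_hour/hour exceeds $limit/hour"
  else
    echo "$label: ~$per_hour/hour is within $limit/hour"
  fi
}

# 127500 is an assumed stand-in for "a little over 127k" hits/day.
check_traffic "viral article" 127500 "$AUTHENTICATED_LIMIT"
check_traffic "average day" 30000 "$AUTHENTICATED_LIMIT"
check_traffic "average day, no token" 30000 "$UNAUTHENTICATED_LIMIT"
```

Put another way, 5000 requests/hour is a budget of 120,000 requests/day, so a single-gist article only becomes a problem once daily traffic passes that mark. Without a token, the budget is 1,440 requests/day, which even average traffic would exceed.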
.github/workflows	Initial app	2021-05-01 17:03:38 -04:00
config	Add instructions for Lucky config variables	2022-01-15 16:29:46 -05:00
db/migrations	Initial app	2021-05-01 17:03:38 -04:00
docs	Add scribe.bus-hit.me instance	2022-01-16 22:05:31 -05:00
public	Initial app	2021-05-01 17:03:38 -04:00
script	Ensure that scr/version is up-to-date when building	2022-01-15 16:31:02 -05:00
spec	Proxy GitHub gists with rate limiting	2022-01-23 15:05:46 -05:00
src	Proxy GitHub gists with rate limiting	2022-01-23 15:05:46 -05:00
tasks	Initial app	2021-05-01 17:03:38 -04:00
.crystal-version	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00
.dockerignore	Add Dockerfile	2021-10-16 10:56:15 -04:00
.editorconfig	Initial app	2021-05-01 17:03:38 -04:00
.gitignore	Add instance docs	2021-11-11 11:33:22 -05:00
.tool-versions	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00
bs-config.js	Initial app	2021-05-01 17:03:38 -04:00
Dockerfile	update crystal version in Dockerfile	2021-12-15 21:29:41 -05:00
flake.lock	Add support for development with Nix	2021-10-15 08:56:15 -04:00
flake.nix	Add support for development with Nix	2021-10-15 08:56:15 -04:00
LICENSE	Add License	2021-09-12 17:34:48 -04:00
package.json	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00
Procfile	Initial app	2021-05-01 17:03:38 -04:00
Procfile.dev	Initial app	2021-05-01 17:03:38 -04:00
README.md	Add instructions for Lucky config variables	2022-01-15 16:29:46 -05:00
shard.lock	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00
shard.yml	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00
shell.nix	Add support for development with Nix	2021-10-15 08:56:15 -04:00
tasks.cr	Initial app	2021-05-01 17:03:38 -04:00
webpack.mix.js	Add tufte.css	2021-08-29 15:19:40 -04:00
yarn.lock	Upgrade Crystal to 1.2.1 and Lucky to 0.29.0	2021-12-12 12:01:55 -05:00

README.md

Scribe - An Alternative Medium Frontend

This is a project written using Lucky. Its main website is scribe.rip.

Deploying Your Own

I'd love it if you deployed your own version of this app! A few others have already. Doing so currently takes some knowledge of how a web server runs. This app is built with the Lucky framework, and there are several ways to deploy it. The main instance runs on Ubuntu, but there are also directions for Heroku or Dokku.

One thing to note is that this app doesn't currently use a database. Any instructions around postgres can be safely ignored. However, Lucky (and its dependency Avram) does require a DATABASE_URL formatted for postgres. It doesn't need to be the URL of an actual database server though. Here's mine: DATABASE_URL=postgres://does@not/matter
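As a rough illustration, a deploy script could set the placeholder and sanity-check its shape before starting the app. The `is_postgres_url` helper below is hypothetical and only checks the `postgres://` prefix; it is not the validation that Lucky or Avram actually performs.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the value starts with
# "postgres://" followed by at least one more character.
is_postgres_url() {
  case "$1" in
    postgres://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder value; no database server needs to exist at this address.
DATABASE_URL="postgres://does@not/matter"
export DATABASE_URL

if is_postgres_url "$DATABASE_URL"; then
  echo "DATABASE_URL looks like a postgres URL"
else
  echo "DATABASE_URL is missing or not a postgres URL" >&2
fi
```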

Hopefully a more comprehensive guide will be written at some point, but for now feel free to reach out to the mailing list if you have any questions.

Docker (Unsupported)

A Dockerfile is included to build and run your own OCI images. I don't use Docker personally, so this is all community-created and community-supported. If it breaks, please write to the mailing list.

To build:

$ docker build [--build-arg PUID=1000] [--build-arg PGID=1000] -t scribe:latest -f ./Dockerfile .

To run (generating a base config from environment variables):

$ docker run -it --rm -p 8080:8080 -e SCRIBE_PORT=8080 -e SCRIBE_HOST=0.0.0.0 -e SCRIBE_DB=postgres://does@not/matter scribe:latest

To run with mounted config from local fs:

$ docker run -it --rm -v `pwd`/config/watch.yml:/app/config/watch.yml -p 8080:8080 scribe:latest

Configuration

To allow your domain to show up on the homepage, the APP_DOMAIN environment variable must be set. Note that this only takes effect if the LUCKY_ENV environment variable is also set to production.

See the route_helper config for the code that powers this feature.

Other configuration needed when in production mode:

- PORT: The port Scribe should run on
- SECRET_KEY_BASE: A 32-byte string. Can be generated with lucky gen.secret_key
- DATABASE_URL: May be any valid postgres URL since Scribe doesn't use a database
  - Example: postgres://does@not/matter
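
The production settings above can be collected in one place. The sketch below is only an illustration: `scribe_production_env` is a made-up helper, the domain and port are example values, and the secret is a placeholder to be replaced with the output of lucky gen.secret_key.

```shell
#!/bin/sh
# Hypothetical helper: print the production settings as KEY=value lines.
# Arguments: domain, port, secret key (e.g. the output of `lucky gen.secret_key`).
scribe_production_env() {
  domain="$1"
  port="$2"
  secret="$3"
  # APP_DOMAIN only takes effect when LUCKY_ENV is production.
  printf 'LUCKY_ENV=production\n'
  printf 'APP_DOMAIN=%s\n' "$domain"
  printf 'PORT=%s\n' "$port"
  printf 'SECRET_KEY_BASE=%s\n' "$secret"
  # Scribe has no database, so any well-formed postgres URL works here.
  printf 'DATABASE_URL=%s\n' 'postgres://does@not/matter'
}

# Example values only; replace them with your own.
scribe_production_env "scribe.example.com" 8080 "replace-with-generated-key"
```

The output can be redirected to a file and loaded by whatever process manager runs the app. How that file is named and loaded depends on your setup.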

Project goals

I believe that Medium is a bad actor on the web. They offer a bad reading experience. Writing there benefits Medium more than the author. Counter to their promise of a wider reach, they offer worse SEO. They use extortionist business tactics. Finally, they want to centralize the currently decentralized world of blogging.

Since Scribe uses Medium content, I don't want to help people engage with it more than they must. My goal here is not to make a nicer Medium to engage with, but to make a less bad experience when people are forced to engage with it. I want Scribe to be a tool, not a platform.

It's intentional that there is no way to browse content from a user, see popular posts, consume via an RSS feed, or further engage with an article via comments or "claps". I want to spend my time encouraging writers to move to worthy platforms, not making a bad platform worthy.

Contributing

1. Install required dependencies (see sub-sections below)
2. Run script/setup
3. Run lucky dev to start the app
4. Send a patch to ~edwardloveall/Scribe@lists.sr.ht (it may not look like it at first, but that's an email address).

Installing dependencies

General instructions for installing Lucky and its dependencies can be found at https://luckyframework.org/guides/getting-started/installing#install-required-dependencies.

Installing dependencies with Nix

If you are using the Nix package manager, you can get a shell with all dependencies with the following command(s):

nix-shell

# Or if you are using the (still experimental) Nix Flakes feature
nix flake update # Update dependencies (optional)
nix develop

Learning Lucky

Lucky uses the Crystal programming language. You can learn about Lucky from the Lucky Guides.