Commit Graph

18 Commits

Author SHA1 Message Date
Edward Loveall d1ecb76cdc
Update to lucky 1.0.0-rc1 2023-05-06 10:53:31 -04:00
Edward Loveall cef1bc256d
Add unique ID to headings
The `name` field on the `paragraph` type contains a unique ID for the
paragraph. It's not guaranteed to be there, on images for example like
in the `fd8d091ab8ef` post, but it's there for everything else I can
find.

This enables deep linking. There's no way to get to the deep link other
than opening up the web console. I wanted to link every heading, but
you can actually have links in part of a heading so that's not tenable.
Maybe a "permalink" link next to every heading?
2023-03-25 11:20:14 -04:00
Edward Loveall 815f5c19f0
Update to nodejs 16.18.0
It was pretty old, but also it wasn't installing correctly on an Apple
Silicon machine.
2022-11-06 17:33:20 -05:00
Edward Loveall bf31305617
Version 2022-10-30 2022-11-04 18:25:14 -04:00
Edward Loveall d7ea1174ff
Updates to pre/code config
This ensures that code blocks look good at all screen sizes.
2022-10-11 20:33:18 -04:00
Edward Loveall 7e927469dc
Replace Redirector extension with LibRedirect
Since Scribe launched, the Redirector extension config has needed
occasional attention. Using regular expressions to cover all edge cases
is difficult. After finding out that Scribe's current config can hang
websites, I decided that [LibRedirect] is likely a more robust
solution. It can rely on more than regular expressions, and is less
work to set up.

[LibRedirect]: https://libredirect.github.io/
2022-09-24 15:50:38 -04:00
Edward Loveall 4097aa20df
Fix Redirector config escaped strings
When printing out the configuration JSON, the Redirector extension
expects regex escapes to be escaped, themselves. So `\` becomes `\\`.
However, Crystal treats these as escaped character also, and each `\`
must additionally be escaped, so a single slash becomes `\\\\`
2022-07-19 16:28:23 -04:00
Edward Loveall 740230d451
Fix source code link
Capitalize the `S` in `Scribe`. I don't have record of this ever
needing to be capitalized before, but it clearly does not work.
2022-07-17 11:30:03 -04:00
Edward Loveall f05a12a880
Add support for missing posts
Posts, like 8661f4724aa9, can go missing if the account or post was
removed. In this case, the API returns data like this:

```json
{
  "data": {
    "post": null
  }
}
```

When this happens, we can detect it because the parsed response now has
a nil value: `response.data.post == nil` and construct an `EmptyPage`
instead of a `Page`. The `Articles::Show` action can then render
conditionally based on if the response from `PageConverter` is a `Page`
or an `EmptyPage`.
2022-06-17 16:00:01 -04:00
Michael Herold 098f7fe0f9
Remove the need for a DATABASE_URL
Since the application does not use a database, it's confusing to have to
set a bogus database URL environment variable. This change follows [the
Lucky guide][1] suggestion for disabling the need for database
configuration. That makes the setup a little easier.

[1]:
https://www.luckyframework.org/guides/database/intro-to-avram-and-orms
2022-05-21 11:34:28 -04:00
Edward Loveall defec9319e
Handle gists with file extensions
Somehow, in my Gist Proxy code 7518a035b1 I never accounted for gist
ids with file extensions. For example: `def123.js` instead of plain
`def123`. This is now fixed and articles with those kinds of gists in
them work now.

Reference article:
https://medium.com/neat-tips-tricks/ocaml-continuation-explained-3b73839
b679f
2022-04-04 20:32:42 -04:00
Edward Loveall 80b6b51804
Fix redirection pattern
Commit 6ea0586423 improved redirection
instructions, but regressed in one way. The "Redirect to" pattern
specified a slash which was accounted for in the main pattern, which
resulted in a double slash:

https://medium.com/@user/post-123456abcdef

would redirect to

https://scribe.rip//@user/post-123456abcdef

This removes the extra slash
2022-03-12 12:03:23 -05:00
Edward Loveall 3f5a5580e0
Release version 2022-02-13 2022-02-13 10:14:22 -05:00
Edward Loveall 7d0bc37efd
Fix markup errors caused by UTF-16/8 differences
Medium uses UTF-16 character offsets (likely to make it easier to parse
in JavaScript) but Crystal uses UTF-8. Converting strings to UTF-16 to
do offset calculation then back to UFT-8 fixes some markup bugs.

---

Medium calculates markup offsets using UTF-16 encoding. Some characters
like Emoji are count as multiple bytes which affects those offsets. For
example in UTF-16 💸 is worth two bytes, but Crystal strings only count
it as one. This is a problem for markup generation because it can
offset the markup and even cause out-of-range errors.

Take the following example:

💸💸!

Imagine that `!` was bold but the emoji isn't. For Crystal, this starts
at char index 2, end at char index 3. Medium's markup will say markup
goes from character 4 to 5. In a 3 character string like this, trying
to access character range 4...5 is an error because 5 is already out of
bounds.

My theory is that this is meant to be compatible with JavaScript's
string length calculations, as Medium is primarily a platform built for
the web:

```js
"a".length // 1
"💸".length // 2
"👩‍❤️‍💋‍👩".length // 11
```

To get these same numbers in Crystal strings must be converted to
UTF-16:

```crystal
"a".to_utf16.size # 1
"💸".to_utf16.size # 2
"👩‍❤️‍💋‍👩".to_utf16.size # 11
```

The MarkupConverter now converts text into UFT-16 byte arrays on
initialization. Once it's figured out the range of bytes needed for
each piece of markup, it converts it back into UTF-8 strings.
2022-01-30 11:53:22 -05:00
Edward Loveall 648a933b24
Provide a list of instances as JSON
This is for extensions or other tools that wish to have a list of
instances. It can be accessed by visiting the raw file on sourcehut:

https://git.sr.ht/~edwardloveall/scribe/blob/main/docs/instances.json
2022-01-29 12:58:08 -05:00
Edward Loveall 7518a035b1
Proxy GitHub gists with rate limiting
Previously, GitHub gists were embedded. The gist url would be detected
in a paragraph and the page would render a script like:

```html
<script src="https://gist.github.com/user/gist_id.js"></script>
```

The script would then embed the gist on the page. However, gists contain
multiple files. It's technically possible to embed a single file in the
same way by appending a `file` query param:

```html
<script
src="https://gist.github.com/user/gist_id.js?file=foo.txt"></script>
```

I wanted to try and tackle proxying gists instead.

Overview
--------

At a high level the PageConverter kicks off the work of fetching and
storing the gist content, then sends that content down to the
`ParagraphConverter`. When a paragraph comes up that contains a gist
embed, it retrieves the previously fetched content. This allows all the
necessary content to be fetched up front so the minimum number of
requests need to be made.

Fetching Gists
--------------

There is now a `GithubClient` class that gets gist content from GitHub's
ReST API. The gist API response looks something like this (non-relevant
keys removed):

```json
{
  "files": {
    "file-one.txt": {
      "filename": "file-one.txt",
      "raw_url":
"https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-o
ne.txt",
      "content": "..."
    },
    "file-two.txt": {
      "filename": "file-two.txt",
      "raw_url":
"https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-t
wo.txt",
      "content": "..."
    }
  }
}
```

That response gets turned into a bunch of `GistFile` objects that are
then stored in a request-level `GistStore`. Crystal's JSON parsing does
not make it easy to parse json with arbitrary keys into objects. This is
because each key corresponds to an object property, like `property name
: String`. If Crystal doesn't know the keys ahead of time, there's no
way to know what methods to create.

That's a problem here because the key for each gist file is the unique
filename. Fortunately, the keys for each _file_ follows the same pattern
and are easy to parse into a `GistFile` object. To turn gist file JSON
into Crystal objects, the `GithubClient` turns the whole response into a
`JSON::Any` which is like a Hash. Then it extracts just the file data
objects and parses those into `GistFile` objects.

Those `GistFile` objects are then cached in a `GistStore` that is shared
for the page, which means one gist cache per request/article. `GistFile`
objects can be fetched out of the store by file, or if no file is
specified, it returns all files in the gist.

The GistFile is rendered as a link of the file's name to the file in
the gist on GitHub, and then a code block of the contents of the file.

In summary, the `PageConverter`:

* Scans the paragraphs for GitHub gists using `GistScanner`
* Requests their data from GitHub using the `GithubClient`
* Parses the response into `GistFile`s and populates the `GistStore`
* Passes that `GistStore` to the `ParagraphConverter` to use when
  constructing the page nodes

Caching
-------

GitHub limits API requests to 5000/hour with a valid api token and
60/hour without. 60 is pretty tight for the usage that scribe.rip gets,
but 5000 is reasonable most of the time. Not every article has an
embedded gist, but some articles have multiple gists. A viral article
(of which Scribe has seen two at the time of this commit) might receive
a little over 127k hits/day, which is an average of over 5300/hour. If
that article had a gist, Scribe would reach the API limit during parts
of the day with high traffic. If it had multiple gists, it would hit it
even more. However, average traffic is around 30k visits/day which would
be well under the limit, assuming average load.

To help not hit that limit, a `GistStore` holds all the `GistFile`
objects per gist. The logic in `GistScanner` is smart enough to only
return unique gist URLs so each gist is only requested once even if
multiple files from one gist exist in an article. This limits the number
of times Scribe hits the GitHub API.

If Scribe is rate-limited, instead of populating a `GistCache` the
`PageConverter` will create a `RateLimitedGistStore`. This is an object
that acts like the `GistStore` but returns `RateLimitedGistFile` objects
instead of `GistFile` objects. This allows Scribe to gracefully degrade
in the event of reaching the rate limit.

If rate-limiting becomes a regular problem, Scribe could also be
reworked to fallback to the embedded gists again.

API Credentials
---------------

API credentials are in the form of a GitHub username and a personal
access token attached to that username. To get a token, visit
https://github.com/settings/tokens and create a new token. The only
permission it needs is `gist`.

This token is set via the `GITHUB_PERSONAL_ACCESS_TOKEN` environment
variable. The username also needs to be set via `GITHUB_USERNAME`. When
developing locally, these can both be set in the .env file.
Authentication is probably not necessary locally, but it's there if you
want to test. If either token is missing, unauthenticated requests are
made.

Rendering
---------

The node tree itself holds a `GithubGist` object. It has a reference to
the `GistStore` and the original gist URL. When it renders the page
requests the gist's `files`. The gist ID and optional file are detected,
and then used to request the file(s) from the `GistStore`. Gists render
as a list of each files contents and a link to the file on GitHub.

If the requests were rate limited, the store is a
`RateLimitedGistStore` and the files are `RateLimitedGistFile`s. These
rate-limited objects rendered with a link to the gist on GitHub and text
saying that Scribe has been rate-limited.

If somehow the file requested doesn't exist in the store, it displays
similarly to the rate-limited file but with "file missing" text instead
of "rate limited" text.

GitHub API docs: https://docs.github.com/en/rest/reference/gists
Rate Limiting docs:
https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-
limiting
2022-01-23 15:05:46 -05:00
Edward Loveall 46d87930b8
Use FAQ entry to explain custom domains 2022-01-08 20:15:46 -05:00
Edward Loveall 037bc7cd0f
Add visible version
This is to be able to track which instances (including the main one)
have which fixes
2022-01-04 21:26:53 -05:00