scribe

Author	SHA1	Message	Date
Fabio Manganiello	4491c6dba1	Added carbon dependency to shard.yml	2023-05-11 15:47:39 +02:00
Edward Loveall	6a38a1cebc	Update CHANGELOG	2023-05-06 13:18:13 -04:00
Edward Loveall	467f3c3a63	Change crystal version to 1.8.1	2023-05-06 13:05:58 -04:00
Edward Loveall	853e9ad50d	Add captions to embedded media	2023-05-06 12:10:46 -04:00
Edward Loveall	27faf59549	Upgrade to Lucky 1.0.0	2023-05-06 10:56:02 -04:00
Edward Loveall	d1ecb76cdc	Update to lucky 1.0.0-rc1	2023-05-06 10:53:31 -04:00
Edward Loveall	e86108e18f	Rearrange article id parsing to be more reliable The article ID parser looks for a string at the end of a URL path with a bunch of hex digits. But it also has to handle user, tag, and search URLs. * /@ba5eba11 * /tag/0ddba11 * /search?q=ba5eba11 Some URLs are encoded as params. The parser used to look at the result of the path first, then the params. But paths that ended in `global-identity-2` messed that up because `2` is a hex digit at the end of the path. This changes the logic to parse params first and paths second which gets around this.	2023-03-25 16:32:37 -04:00
Edward Loveall	cef1bc256d	Add unique ID to headings The `name` field on the `paragraph` type contains a unique ID for the paragraph. It's not guaranteed to be there, on images for example like in the `fd8d091ab8ef` post, but it's there for everything else I can find. This enables deep linking. There's no way to get to the deep link other than opening up the web console. I wanted to link every heading, but you can actually have links in part of a heading so that's not tenable. Maybe a "permalink" link next to every heading?	2023-03-25 11:20:14 -04:00
PrivacyDev	761e4ef170	Add scribe.g4c3eya4clenolymqbpgwz3q3tawoxw56yhzk4vugqrl6dtu3ejvhjid.onion instance	2022-12-11 13:33:09 -05:00
Edward Loveall	815f5c19f0	Update to nodejs 16.18.0 It was pretty old, but also it wasn't installing correctly on an Apple Silicon machine.	2022-11-06 17:33:20 -05:00
Edward Loveall	bf31305617	Version 2022-10-30	2022-11-04 18:25:14 -04:00
blankie	e1c70b9db0	Fix viewing articles if the URL has a trailing slash	2022-11-04 18:20:00 -04:00
Edward Loveall	d7ea1174ff	Updates to pre/code config This ensures that code blocks look good at all screen sizes.	2022-10-11 20:33:18 -04:00
Pedro Lucas Porcellis	eca9eb7f13	Avoid clipping gist code's content	2022-10-11 19:57:31 -04:00
Edward Loveall	48204b039b	Remove downloadable Redirector config	2022-09-24 15:59:37 -04:00
Edward Loveall	7e927469dc	Replace Redirector extension with LibRedirect Since Scribe launched, the Redirector extension config has needed occasional attention. Using regular expressions to cover all edge cases is difficult. After finding out that Scribe's current config can hang websites, I decided that [LibRedirect] is likely a more robust solution. It can rely on more than regular expressions, and is less work to set up. [LibRedirect]: https://libredirect.github.io/	2022-09-24 15:50:38 -04:00
Edward Loveall	b69fa2f2b1	Update tor instance	2022-09-15 19:03:14 -04:00
Arya Kiran	8240f40719	Add new instance sc.vern.cc Signed-off-by: Arya Kiran <aryak@vern.cc>	2022-08-20 10:25:19 -04:00
technonerd	98de1d24d6	Add new instance: scribe.rawbit.ninja	2022-08-20 10:19:35 -04:00
PrivacyDev	ef8ddb9025	Add scribe.privacydev.net instance	2022-08-16 08:45:12 -04:00
Edward Loveall	931636ebea	Add Tor instance	2022-08-05 08:36:25 -04:00
Edward Loveall	3c6c4770d0	Add scribe.esmailelbob.xyz instance	2022-07-29 08:17:31 -04:00
Edward Loveall	4097aa20df	Fix Redirector config escaped strings When printing out the configuration JSON, the Redirector extension expects regex escapes to be escaped themselves, so `\` becomes `\\`. However, Crystal also treats these as escape characters, and each `\` must additionally be escaped, so a single backslash becomes `\\\\`.	2022-07-19 16:28:23 -04:00
Edward Loveall	449ece843a	Provide a configuration file for the Redirector extension Instead of providing long detailed instructions for how to configure the Redirector extension, this provides a single JSON file that users can import. I started by making a single file stored in the `public/assets` directory, but then realized this was a regression since the instructions were customized to each domain. Instead I can use Lucky's [data] response to dynamically build the JSON config. [data]: https://luckyframework.org/guides/http-and-routing/request-and-response#handling-responses	2022-07-17 15:00:03 -04:00
Edward Loveall	269ccc1bef	Scroll long code blocks This sets the width of code blocks to be the width of the page, and adds a scrollbar for long blocks. Article `c146e768bb41` has some examples. I could have also wrapped the codeblocks, but as pointed out by [~kaki87] this often reduces readability. Hence: scrollbars. [~kaki87]: https://todo.sr.ht/~edwardloveall/Scribe/6#event-188395	2022-07-17 13:23:03 -04:00
Edward Loveall	5b20d3f6d1	Upgrade to Crystal 1.5.0	2022-07-17 12:39:48 -04:00
Edward Loveall	35b72ada37	Upgrade to Lucky 0.30.1 Upgrading to 0.31.0 should be very easy. It's just running `shards update` in the root of the project. That should be all.	2022-07-17 11:55:51 -04:00
Edward Loveall	740230d451	Fix source code link Capitalize the `S` in `Scribe`. I don't have a record of this ever needing to be capitalized before, but the lowercase link clearly does not work.	2022-07-17 11:30:03 -04:00
Edward Loveall	f05a12a880	Add support for missing posts Posts, like 8661f4724aa9, can go missing if the account or post was removed. In this case, the API returns data like this: ```json { "data": { "post": null } } ``` When this happens, we can detect it because the parsed response now has a nil value: `response.data.post == nil` and construct an `EmptyPage` instead of a `Page`. The `Articles::Show` action can then render conditionally based on if the response from `PageConverter` is a `Page` or an `EmptyPage`.	2022-06-17 16:00:01 -04:00
Edward Loveall	1dcded9153	Update changelog to mention no DATABASE_URL	2022-05-21 15:02:06 -04:00
Michael Herold	098f7fe0f9	Remove the need for a DATABASE_URL Since the application does not use a database, it's confusing to have to set a bogus database URL environment variable. This change follows [the Lucky guide][1] suggestion for disabling the need for database configuration. That makes the setup a little easier. [1]: https://www.luckyframework.org/guides/database/intro-to-avram-and-orms	2022-05-21 11:34:28 -04:00
Edward Loveall	93f5cb2d9e	Update CHANGELOG	2022-04-04 20:41:00 -04:00
Edward Loveall	defec9319e	Handle gists with file extensions Somehow, in my Gist Proxy code `7518a035b1` I never accounted for gist ids with file extensions. For example: `def123.js` instead of plain `def123`. This is now fixed and articles with those kinds of gists in them work now. Reference article: https://medium.com/neat-tips-tricks/ocaml-continuation-explained-3b73839b679f	2022-04-04 20:32:42 -04:00
Sam Therapy	89e5c7209f	Instance list: add scribe.froth.zone	2022-03-28 17:54:32 -04:00
Edward Loveall	80b6b51804	Fix redirection pattern Commit `6ea0586423` improved redirection instructions, but regressed in one way. The "Redirect to" pattern specified a slash which was accounted for in the main pattern, which resulted in a double slash: https://medium.com/@user/post-123456abcdef would redirect to https://scribe.rip//@user/post-123456abcdef This removes the extra slash	2022-03-12 12:03:23 -05:00
Edward Loveall	fb51270f87	Fix article ID parsing bug Since the article ID regular expression wasn't anchored to the end of the URL, it would grab characters after a / or - that were hex characters. For example /@user/bacon-123abc would just grab `bac`. Not great. This anchors the ID at the end of the string so that it will be more likely to catch IDs.	2022-02-13 21:07:50 -05:00
Edward Loveall	3f5a5580e0	Release version 2022-02-13	2022-02-13 10:14:22 -05:00
Edward Loveall	1f517f9031	Link to full Medium URL on error page Previously the link on the error page was only linking to the path component of the url, e.g. `/search` but ignoring any query params e.g. `/search?q=hello`. This uses the HTTP::Request `resource` method which appears to capture both.	2022-02-13 10:13:24 -05:00
Edward Loveall	24d3ab9ab3	Better article ID parsing A new ArticleIdParser class takes in an HTTP::Request object and parses the article ID from it. It intentionally fails on tag, user, and search pages and attempts to only catch articles.	2022-02-13 10:10:46 -05:00
Edward Loveall	f056a0b68a	Better error pages Instead of showing the default Lucky error page, the styles now match Scribe. In addition, if a URL can't be parsed, Scribe gives some information as to why this might be (that Scribe can only deal with article pages).	2022-02-12 17:56:36 -05:00
Edward Loveall	7d0bc37efd	Fix markup errors caused by UTF-16/8 differences

Medium uses UTF-16 character offsets (likely to make it easier to parse in JavaScript) but Crystal uses UTF-8. Converting strings to UTF-16 to do offset calculation then back to UTF-8 fixes some markup bugs.

---

Medium calculates markup offsets using UTF-16 encoding. Some characters like Emoji count as multiple UTF-16 code units, which affects those offsets. For example in UTF-16 💸 is worth two code units, but Crystal strings only count it as one character. This is a problem for markup generation because it can offset the markup and even cause out-of-range errors.

Take the following example:

💸💸!

Imagine that `!` was bold but the emoji isn't. For Crystal, this starts at char index 2 and ends at char index 3. Medium's markup will say markup goes from character 4 to 5. In a 3 character string like this, trying to access character range 4...5 is an error because 5 is already out of bounds.

My theory is that this is meant to be compatible with JavaScript's string length calculations, as Medium is primarily a platform built for the web:

```js
"a".length // 1
"💸".length // 2
"👩‍❤️‍💋‍👩".length // 11
```

To get these same numbers in Crystal, strings must be converted to UTF-16:

```crystal
"a".to_utf16.size # 1
"💸".to_utf16.size # 2
"👩‍❤️‍💋‍👩".to_utf16.size # 11
```

The MarkupConverter now converts text into UTF-16 code unit arrays on initialization. Once it's figured out the range of code units needed for each piece of markup, it converts it back into UTF-8 strings.	2022-01-30 11:53:22 -05:00
Edward Loveall	648a933b24	Provide a list of instances as JSON This is for extensions or other tools that wish to have a list of instances. It can be accessed by visiting the raw file on sourcehut: https://git.sr.ht/~edwardloveall/scribe/blob/main/docs/instances.json	2022-01-29 12:58:08 -05:00
Edward Loveall	08f38a4d25	Add GitHub Gist authentication instructions	2022-01-23 16:08:23 -05:00
Edward Loveall	3a8ad82252	Add CHANGELOG	2022-01-23 15:06:01 -05:00
Edward Loveall	7518a035b1	Proxy GitHub gists with rate limiting

Previously, GitHub gists were embedded. The gist URL would be detected in a paragraph and the page would render a script like:

```html
<script src="https://gist.github.com/user/gist_id.js"></script>
```

The script would then embed the gist on the page. However, gists can contain multiple files. It's technically possible to embed a single file in the same way by appending a `file` query param:

```html
<script src="https://gist.github.com/user/gist_id.js?file=foo.txt"></script>
```

I wanted to try and tackle proxying gists instead.

Overview
--------

At a high level the `PageConverter` kicks off the work of fetching and storing the gist content, then sends that content down to the `ParagraphConverter`. When a paragraph comes up that contains a gist embed, it retrieves the previously fetched content. This allows all the necessary content to be fetched up front so the minimum number of requests need to be made.

Fetching Gists
--------------

There is now a `GithubClient` class that gets gist content from GitHub's REST API. The gist API response looks something like this (non-relevant keys removed):

```json
{
  "files": {
    "file-one.txt": {
      "filename": "file-one.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-one.txt",
      "content": "..."
    },
    "file-two.txt": {
      "filename": "file-two.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-two.txt",
      "content": "..."
    }
  }
}
```

That response gets turned into a bunch of `GistFile` objects that are then stored in a request-level `GistStore`. Crystal's JSON parsing does not make it easy to parse JSON with arbitrary keys into objects. This is because each key corresponds to an object property, like `property name : String`. If Crystal doesn't know the keys ahead of time, there's no way to know what methods to create. That's a problem here because the key for each gist file is the unique filename.

Fortunately, the keys for each _file_ follow the same pattern and are easy to parse into a `GistFile` object. To turn gist file JSON into Crystal objects, the `GithubClient` turns the whole response into a `JSON::Any` which is like a Hash. Then it extracts just the file data objects and parses those into `GistFile` objects.

Those `GistFile` objects are then cached in a `GistStore` that is shared for the page, which means one gist cache per request/article. `GistFile` objects can be fetched out of the store by file, or if no file is specified, it returns all files in the gist.

The `GistFile` is rendered as a link of the file's name to the file in the gist on GitHub, and then a code block of the contents of the file.

In summary, the `PageConverter`:

* Scans the paragraphs for GitHub gists using `GistScanner`
* Requests their data from GitHub using the `GithubClient`
* Parses the response into `GistFile`s and populates the `GistStore`
* Passes that `GistStore` to the `ParagraphConverter` to use when constructing the page nodes

Caching
-------

GitHub limits API requests to 5000/hour with a valid API token and 60/hour without. 60 is pretty tight for the usage that scribe.rip gets, but 5000 is reasonable most of the time. Not every article has an embedded gist, but some articles have multiple gists.

A viral article (of which Scribe has seen two at the time of this commit) might receive a little over 127k hits/day, which is an average of over 5300/hour. If that article had a gist, Scribe would reach the API limit during parts of the day with high traffic. If it had multiple gists, it would hit it even more. However, average traffic is around 30k visits/day which would be well under the limit, assuming average load.

To help avoid hitting that limit, a `GistStore` holds all the `GistFile` objects per gist. The logic in `GistScanner` is smart enough to only return unique gist URLs so each gist is only requested once even if multiple files from one gist exist in an article.

This limits the number of times Scribe hits the GitHub API. If Scribe is rate-limited, instead of populating a `GistStore` the `PageConverter` will create a `RateLimitedGistStore`. This is an object that acts like the `GistStore` but returns `RateLimitedGistFile` objects instead of `GistFile` objects. This allows Scribe to gracefully degrade in the event of reaching the rate limit.

If rate-limiting becomes a regular problem, Scribe could also be reworked to fall back to the embedded gists again.

API Credentials
---------------

API credentials are in the form of a GitHub username and a personal access token attached to that username. To get a token, visit https://github.com/settings/tokens and create a new token. The only permission it needs is `gist`.

This token is set via the `GITHUB_PERSONAL_ACCESS_TOKEN` environment variable. The username also needs to be set via `GITHUB_USERNAME`. When developing locally, these can both be set in the .env file. Authentication is probably not necessary locally, but it's there if you want to test. If either value is missing, unauthenticated requests are made.

Rendering
---------

The node tree itself holds a `GithubGist` object. It has a reference to the `GistStore` and the original gist URL. When it renders, the page requests the gist's `files`. The gist ID and optional file are detected, and then used to request the file(s) from the `GistStore`. Gists render as a list of each file's contents and a link to the file on GitHub.

If the requests were rate limited, the store is a `RateLimitedGistStore` and the files are `RateLimitedGistFile`s. These rate-limited objects are rendered with a link to the gist on GitHub and text saying that Scribe has been rate-limited.

If somehow the file requested doesn't exist in the store, it displays similarly to the rate-limited file but with "file missing" text instead of "rate limited" text.

GitHub API docs: https://docs.github.com/en/rest/reference/gists
Rate Limiting docs: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting	2022-01-23 15:05:46 -05:00
Edward Loveall	8737ca7897	Add scribe.bus-hit.me instance	2022-01-16 22:05:31 -05:00
Edward Loveall	27234bd32a	Ensure that src/version is up-to-date when building This is an experiment to see if it forces me to actually have updated the version before I build. The idea is that I need to actually commit the version which will make it more likely that all instances can pull down the code and display the correct version if I've done it myself. It uses `git show` to grab the committed contents of src/version then checks to see if it matches today's date.	2022-01-15 16:31:02 -05:00
Edward Loveall	c775072b3d	Add instructions for Lucky config variables The most common is "How do I set my custom domain" (answer: APP_DOMAIN) but this also requires setting LUCKY_ENV=production which requires SECRET_KEY_BASE, DATABASE_URL, and PORT	2022-01-15 16:29:46 -05:00
Edward Loveall	46d87930b8	Use FAQ entry to explain custom domains	2022-01-08 20:15:46 -05:00
Edward Loveall	037bc7cd0f	Add visible version This is to be able to track which instances (including the main one) have which fixes	2022-01-04 21:26:53 -05:00
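
The UTF-16 offset handling described in commit 7d0bc37efd can be illustrated with a small sketch. This is Python rather than Scribe's Crystal code, and the helper names `utf16_length` and `slice_utf16` are hypothetical. It only assumes, as the commit message states, that markup offsets are counted in UTF-16 code units.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, which is what JavaScript's String.length reports."""
    # utf-16-le emits exactly two bytes per code unit and no byte order mark.
    return len(text.encode("utf-16-le")) // 2


def slice_utf16(text: str, start: int, end: int) -> str:
    """Return the text covered by the UTF-16 code unit range [start, end)."""
    units = text.encode("utf-16-le")
    # Each code unit is two bytes, so the offsets are doubled before slicing.
    return units[start * 2 : end * 2].decode("utf-16-le")


text = "💸💸!"
print(len(text))                # 3 characters (code points)
print(utf16_length(text))       # 5 UTF-16 code units
print(slice_utf16(text, 4, 5))  # the bold "!" from the commit's example
```

Indexing the 3-character string directly with the range 4 to 5 would miss the `!` entirely, which is the out-of-range problem the commit describes. Slicing in code units and decoding afterwards recovers it.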

120 commits