Previously the link on the error page was only linking to the path
component of the url, e.g. `/search` but ignoring any query params e.g.
`/search?q=hello`. This uses the HTTP::Request `resource` method which
appears to capture both.
A new ArticleIdParser class takes in an HTTP::Request object and parses
the article ID from it. It intentinoally fails on tag, user, and search
pages and attempts to only catch articles.
Instead of showing the default Lucky error page, the styles now match
Scribe. In addition, if a URL can't be parsed, Scribe gives some
information as to why this might be (that Scribe can only deal with an
article pages)
When a post has a gi= query param, Medium makes a global_identifier
"query". This redirects via a 307 temporary redirect to a url that
looks like this:
https://medium.com/m/global-identity?redirectUrl=https%3A%2F%2Fexample.c
om%2Fmy-post-000000000000
Previously, scribe looked for the Medium post id in the url's path, not
it's query params since query params can include other garbage like
medium_utm (not related to medium.com). Now it looks first for the post
id in the path, then looks to the redirectUrl as a fallback.
Medium guides each post to have a Title and Subtitle. They are rendered
as the first two paragraphs: H3 and H4 respectively. If they exist, a
new PageConverter class extracts them and sets them on the page.
However, they aren't required. If the first two paragraphs aren't H3
and H4, the PageConverter falls back to using the first paragraph as
the title, and setting the subtitle to blank.
The remaining paragraphs are passed into the ParagraphConverter as
normal.
The API responds with a bunch of paragraphs which the client converts
into Paragraph objects.
This turns the paragraphs in a PostResponse's Paragraph objects into the
form needed to render them on a page. This includes converting flat list
elements into list elements nested by a UL. And adding a limited markups
along the way.
The array of paragraphs is passed to a recursive function. The function
takes the first paragraph and either wraps the (marked up) contents in a
container tag (like Paragraph or Heading3), and then moves onto the next
tag. If it finds a list, it starts parsing the next paragraphs as a list
instead.
Originally, this was implemented like so:
```crystal
paragraph = paragraphs.shift
if list?
convert_list([paragraph] + paragraphs)
end
```
However, passing the `paragraphs` after adding it to the already shifted
`paragraph` creates a new object. This means `paragraphs` won't be
mutated and once the list is parsed, it starts with the next element of
the list. Instead, the element is `shift`ed inside each converter.
```crystal
if paragraphs.first == list?
convert_list(paragraphs)
end
def convert_list(paragraphs)
paragraph = paragraphs.shift
# ...
end
```
When rendering, there is an Empty and Container object. These represent
a kind of "null object" for both leafs and parent objects respectively.
They should never actually render. Emptys are filtered out, and
Containers are never created explicitly but this will make the types
pass.
IFrames are a bit of a special case. Each IFrame has custom data on it
that this system would need to be aware of. For now, instead of trying
to parse the seemingly large number of iframe variations and dealing
with embedded iframe problems, this will just keep track of the source
page URL and send the user there with a link.
The basic idea here is to fetch the post with the medium API, parse the
JSON into types, and then re-display the content. We also have to fetch
each media object as a REST call to get things like embeded iframes.