scribe/src/models/nodes.cr

263 lines
4.5 KiB
Crystal

First step rendering a page

The API responds with a bunch of paragraphs, which the client converts into Paragraph objects.

This turns the Paragraph objects in a PostResponse into the form needed to render them on a page. This includes converting flat list elements into list elements nested in a UL, and adding limited markup along the way.

The array of paragraphs is passed to a recursive function. The function takes the first paragraph, wraps the (marked-up) contents in a container tag (like Paragraph or Heading3), and then moves on to the next paragraph. If it finds a list, it starts parsing the following paragraphs as a list instead.

Originally, this was implemented like so:

```crystal
paragraph = paragraphs.shift
if list?
  convert_list([paragraph] + paragraphs)
end
```

However, `[paragraph] + paragraphs` creates a new array. This means the original `paragraphs` isn't mutated by the converter, so once the list is parsed, the caller continues with the next element of the list instead of with whatever follows the list. Instead, the element is `shift`ed inside each converter.

```crystal
if paragraphs.first == list?
  convert_list(paragraphs)
end

def convert_list(paragraphs)
  paragraph = paragraphs.shift
  # ...
end
```

When rendering, there is an Empty and a Container object. These represent a kind of "null object" for leaves and parent objects respectively. They should never actually render. Emptys are filtered out, and Containers are never created explicitly, but this will make the types pass.

IFrames are a bit of a special case. Each IFrame has custom data on it that this system would need to be aware of. For now, instead of trying to parse the seemingly large number of iframe variations and dealing with embedded iframe problems, this will just keep track of the source page URL and send the user there with a link.
2021-05-16 20:14:25 +02:00
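The mutation problem described in the commit message above can be reproduced in isolation. This is only a sketch under simplifying assumptions: plain strings stand in for the real Paragraph objects, and `consume_list` is a hypothetical helper, not the project's converter.

```crystal
# Hypothetical stand-in data: strings instead of PostResponse paragraphs.
paragraphs = ["li one", "li two", "plain text"]

# Original approach: shift first, then build a new array for the converter.
first = paragraphs.shift
copy = [first] + paragraphs # Array#+ allocates a new array
copy.shift                  # consuming items from the copy...
copy.shift
remaining_after_copy = paragraphs.size # ...does not touch the caller's array

# Revised approach: pass the same array along and shift inside the converter.
def consume_list(items : Array(String)) : Array(String)
  consumed = [] of String
  # Keep shifting while the next item still looks like a list item.
  while (item = items.first?) && item.starts_with?("li")
    consumed << items.shift
  end
  consumed
end

paragraphs = ["li one", "li two", "plain text"]
list_items = consume_list(paragraphs)
# The caller's array now starts at the first non-list item.
```

After the first half, `paragraphs` still contains the second list item, which is why the caller would resume in the middle of the list. After the second half, it only contains what follows the list.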
module Nodes
Render embedded content

PostResponse::Paragraphs of type IFRAME have extra data in the iframe attribute to specify what's in the iframe. Not all data is the same, however. I've identified three types and am using the new EmbeddedConverter class to convert them:

* EmbeddedContent, the full iframe experience
* GithubGist, because medium or github treat embeds differently for whatever reason
* EmbeddedLink, the old style, just a link to the content. Effectively a fallback

The size of the original iframe is also specified as an attribute. This code resizes it. The resizing is determined by figuring out the width/height ratio and setting the width to 800.

EmbeddedContent can be displayed if we have an embed.ly url, which most iframe response data has. GitHub gists are a notable exception. Gists instead can be embedded simply by taking the gist URL and attaching .js to the end. That becomes the iframe's src attribute.

The PostResponse::Paragraph's iframe attribute is nillable. Previous code used lots of if-statements with variable bindings to work with the possible nil values:

```crystal
if foo = obj.nillable_value
  # obj.nillable_value was not nil and foo contains the value
else
  # obj.nillable_value was nil so do something else
end
```

See https://crystal-lang.org/reference/syntax_and_semantics/if_var.html for more info

In the EmbeddedConverter the monads library has been introduced to get rid of at least one level of nillability. This wraps values in Maybe, which allows for a cleaner interface:

```crystal
Monads::Try(Value).new(->{ obj.nillable_value })
  .to_maybe
  .fmap(->(value : Value) {
    # do something with value
  })
  .value_or(# value was nil, do something else)
```

This worked to get the iframe attribute from a Paragraph:

```crystal
Monads::Try(PostResponse::IFrame).new(->{ paragraph.iframe })
  .to_maybe
  .fmap(->(iframe : PostResponse::IFrame) {
    # iframe is not nil!
  })
  .fmap(#and so on)
  .value_or(Empty.new)
```

iframe only has one attribute: mediaResource, which contains the iframe data. That was used to determine one of the three types above.

Finally, Tufte.css has options for iframes. They mostly look good except for tweets, which are too small and sit weirdly in the center of the page in a way that actually looks off-center. That's for another day though.
2021-09-13 19:27:52 +02:00
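As a worked example of the resizing rule in the commit message above, here is a minimal sketch. It assumes the width is capped at 800 and the height is scaled by the same ratio; `scaled_size` is a hypothetical helper written for illustration and does not exist in the project.

```crystal
MAX_WIDTH = 800

# Hypothetical helper: returns {width, height}, scaled so that the width
# never exceeds MAX_WIDTH while the width/height ratio is preserved.
def scaled_size(original_width : Int32, original_height : Int32) : Tuple(Int32, Int32)
  # Content that already fits is left alone.
  return {original_width, original_height} if original_width <= MAX_WIDTH

  # Int32 / Int32 yields a Float64 in Crystal, so no explicit cast is needed.
  ratio = MAX_WIDTH / original_width
  {MAX_WIDTH, (original_height * ratio).round.to_i}
end

wide = scaled_size(1600, 900)  # => {800, 450}
small = scaled_size(400, 300)  # => {400, 300}
```

A 1600x900 iframe is twice the maximum width, so both dimensions are halved; a 400x300 iframe is returned unchanged.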
  alias Embedded = EmbeddedLink | EmbeddedContent | GithubGist
  alias Leaf = Text | Image | Embedded
  alias Child = Container | Leaf | Empty
  alias Children = Array(Child)

  class Container
    getter children : Children

    def initialize(@children : Children)
    end

    def ==(other : Container)
      other.children == children
    end

    def empty?
      # `each` returns nil, so use `all?` to check that every child is empty.
      children.empty? || children.all?(&.empty?)
    end
  end

  class Empty
    def empty?
      true
    end
  end

  class BlockQuote < Container
  end

  class Code < Container
  end

  class Emphasis < Container
  end

  class Figure < Container
  end

  class FigureCaption < Container
  end

  class Heading1 < Container
    getter identifier : String

    def initialize(@children : Children, @identifier : String)
    end
  end

  class Heading2 < Container
    getter identifier : String

    def initialize(@children : Children, @identifier : String)
    end
  end

  class Heading3 < Container
    getter identifier : String

    def initialize(@children : Children, @identifier : String)
    end
  end

  class ListItem < Container
  end
2021-09-08 03:13:28 +02:00
  class MixtapeEmbed < Container
  end
  class OrderedList < Container
  end

  class Paragraph < Container
  end

  class Preformatted < Container
  end

  class Strong < Container
  end

  class UnorderedList < Container
  end

  class Text
    getter content : String

    def initialize(@content : String)
    end

    def ==(other : Text)
      other.content == content
    end

    def empty?
      content.empty?
    end
  end

  class Image
2021-11-06 18:22:03 +01:00
    IMAGE_HOST      = "https://cdn-images-1.medium.com/fit/c"
    MAX_WIDTH       = 800
    FALLBACK_HEIGHT = 600
    getter originalHeight : Int32
    getter originalWidth : Int32
    def initialize(
      @src : String,
      originalWidth : Int32?,
      originalHeight : Int32?
    )
      @originalWidth = originalWidth || MAX_WIDTH
      @originalHeight = originalHeight || FALLBACK_HEIGHT
    end

    def ==(other : Image)
      other.src == src
    end

    def src
      [IMAGE_HOST, width, height, @src].join("/")
    end

    def width
      [originalWidth, MAX_WIDTH].min.to_s
    end

    def height
      if originalWidth > MAX_WIDTH
        (originalHeight * ratio).round.to_i.to_s
      else
        originalHeight.to_s
      end
    end

    private def ratio
      MAX_WIDTH / originalWidth
    end
    def empty?
      false
    end
  end
  class EmbeddedContent
    MAX_WIDTH = 800

    getter src : String
2023-05-06 18:10:46 +02:00
    getter caption : FigureCaption?
    def initialize(
      @src : String,
      @originalWidth : Int32,
      @originalHeight : Int32,
      @caption : FigureCaption? = nil
    )
    end

    def width
      [@originalWidth, MAX_WIDTH].min.to_s
    end

    def height
      if @originalWidth > MAX_WIDTH
        (@originalHeight * ratio).round.to_i.to_s
      else
        @originalHeight.to_s
      end
    end

    private def ratio
      MAX_WIDTH / @originalWidth
    end

    def ==(other : EmbeddedContent)
      other.src == src &&
        other.width == width &&
        other.height == height &&
        other.caption == caption
    end

    def empty?
      false
    end
  end

  class EmbeddedLink
getter href : String
def initialize(@href : String)
end
def domain
URI.parse(href).host
end
def ==(other : EmbeddedLink)
other.href == href
end
def empty?
false
end
end
class Anchor < Container
getter href : String
def initialize(@children : Children, @href : String)
end
def ==(other : Anchor)
other.children == children && other.href == href
end
def empty?
false
end
end
2021-07-04 23:37:45 +02:00
class UserAnchor < Container
USER_BASE_URL = "https://medium.com/u/"
getter href : String
def initialize(@children : Children, user_id : String)
@href = USER_BASE_URL + user_id
end
def ==(other : UserAnchor)
other.children == children && other.href == href
end
def empty?
false
end
end
class GithubGist
Proxy GitHub gists with rate limiting

Previously, GitHub gists were embedded. The gist URL would be detected in a paragraph and the page would render a script like:

```html
<script src="https://gist.github.com/user/gist_id.js"></script>
```

The script would then embed the gist on the page. However, gists can contain multiple files. It's technically possible to embed a single file in the same way by appending a `file` query param:

```html
<script src="https://gist.github.com/user/gist_id.js?file=foo.txt"></script>
```

I wanted to try and tackle proxying gists instead.

Overview
--------

At a high level, the `PageConverter` kicks off the work of fetching and storing the gist content, then sends that content down to the `ParagraphConverter`. When a paragraph comes up that contains a gist embed, it retrieves the previously fetched content. This allows all the necessary content to be fetched up front so that the minimum number of requests is made.

Fetching Gists
--------------

There is now a `GithubClient` class that gets gist content from GitHub's REST API. The gist API response looks something like this (non-relevant keys removed):

```json
{
  "files": {
    "file-one.txt": {
      "filename": "file-one.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-one.txt",
      "content": "..."
    },
    "file-two.txt": {
      "filename": "file-two.txt",
      "raw_url": "https://gist.githubusercontent.com/<username>/<id>/raw/<file_id>/file-two.txt",
      "content": "..."
    }
  }
}
```

That response gets turned into a bunch of `GistFile` objects that are then stored in a request-level `GistStore`. Crystal's JSON parsing does not make it easy to parse JSON with arbitrary keys into objects. This is because each key corresponds to an object property, like `property name : String`. If Crystal doesn't know the keys ahead of time, there's no way to know what methods to create. That's a problem here because the key for each gist file is the unique filename.

Fortunately, the keys for each _file_ follow the same pattern and are easy to parse into a `GistFile` object. To turn gist file JSON into Crystal objects, the `GithubClient` turns the whole response into a `JSON::Any`, which is like a Hash. Then it extracts just the file data objects and parses those into `GistFile` objects.

Those `GistFile` objects are then cached in a `GistStore` that is shared for the page, which means one gist cache per request/article. `GistFile` objects can be fetched out of the store by file, or if no file is specified, it returns all files in the gist.

The `GistFile` is rendered as a link, labeled with the file's name, to the file in the gist on GitHub, followed by a code block of the contents of the file.

In summary, the `PageConverter`:

* Scans the paragraphs for GitHub gists using `GistScanner`
* Requests their data from GitHub using the `GithubClient`
* Parses the response into `GistFile`s and populates the `GistStore`
* Passes that `GistStore` to the `ParagraphConverter` to use when constructing the page nodes

Caching
-------

GitHub limits API requests to 5000/hour with a valid API token and 60/hour without. 60 is pretty tight for the usage that scribe.rip gets, but 5000 is reasonable most of the time. Not every article has an embedded gist, but some articles have multiple gists. A viral article (of which Scribe has seen two at the time of this commit) might receive a little over 127k hits/day, which is an average of about 5300/hour. If that article had a gist, Scribe would reach the API limit during parts of the day with high traffic. If it had multiple gists, it would hit the limit even sooner. However, average traffic is around 30k visits/day, which would be well under the limit, assuming average load.

To help not hit that limit, a `GistStore` holds all the `GistFile` objects per gist. The logic in `GistScanner` is smart enough to only return unique gist URLs, so each gist is only requested once even if multiple files from one gist exist in an article. This limits the number of times Scribe hits the GitHub API.

If Scribe is rate-limited, instead of populating a `GistStore` the `PageConverter` will create a `RateLimitedGistStore`. This is an object that acts like the `GistStore` but returns `RateLimitedGistFile` objects instead of `GistFile` objects. This allows Scribe to gracefully degrade in the event of reaching the rate limit.

If rate-limiting becomes a regular problem, Scribe could also be reworked to fall back to the embedded gists again.

API Credentials
---------------

API credentials are in the form of a GitHub username and a personal access token attached to that username. To get a token, visit https://github.com/settings/tokens and create a new token. The only permission it needs is `gist`.

This token is set via the `GITHUB_PERSONAL_ACCESS_TOKEN` environment variable. The username also needs to be set via `GITHUB_USERNAME`. When developing locally, these can both be set in the .env file. Authentication is probably not necessary locally, but it's there if you want to test. If either value is missing, unauthenticated requests are made.

Rendering
---------

The node tree itself holds a `GithubGist` object. It has a reference to the `GistStore` and the original gist URL. When it renders, the page requests the gist's `files`. The gist ID and optional file are detected, and then used to request the file(s) from the `GistStore`. Gists render as a list of each file's contents and a link to the file on GitHub.

If the requests were rate limited, the store is a `RateLimitedGistStore` and the files are `RateLimitedGistFile`s. These rate-limited objects render with a link to the gist on GitHub and text saying that Scribe has been rate-limited.

If somehow the file requested doesn't exist in the store, it displays similarly to the rate-limited file but with "file missing" text instead of "rate limited" text.

GitHub API docs: https://docs.github.com/en/rest/reference/gists
Rate Limiting docs: https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
2022-01-23 21:05:46 +01:00
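The `JSON::Any` step described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not Scribe's actual `GithubClient`: the `GistFile` class here is a hypothetical stand-in whose fields simply mirror the keys in the sample response, `parse_gist_files` is an invented helper name, and the URL segments in the sample body are placeholders.

```crystal
require "json"

# Hypothetical stand-in for Scribe's GistFile (assumption): the fields
# mirror the keys of each file object in the sample API response.
class GistFile
  include JSON::Serializable

  getter filename : String
  getter raw_url : String
  getter content : String
end

# Hypothetical helper (assumption): parse the whole body loosely as
# JSON::Any, because the keys under "files" are arbitrary filenames,
# then parse each file object strictly into a GistFile.
def parse_gist_files(body : String) : Array(GistFile)
  files = JSON.parse(body)["files"].as_h
  files.values.map { |file_json| GistFile.from_json(file_json.to_json) }
end

# Sample body shaped like the response above; URL parts are placeholders.
body = <<-BODY
  {
    "files": {
      "file-one.txt": {
        "filename": "file-one.txt",
        "raw_url": "https://gist.githubusercontent.com/user/id/raw/abc/file-one.txt",
        "content": "first"
      },
      "file-two.txt": {
        "filename": "file-two.txt",
        "raw_url": "https://gist.githubusercontent.com/user/id/raw/def/file-two.txt",
        "content": "second"
      }
    }
  }
  BODY

files = parse_gist_files(body)
files.each { |file| puts "#{file.filename}: #{file.content}" }
```

Only the outer layer is handled dynamically here. Each inner file object has fixed keys, so it can still go through `JSON::Serializable` and keep compile-time types.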
getter gist_store : GistStore | RateLimitedGistStore
def initialize(@href : String, @gist_store : GistStore | RateLimitedGistStore)
end
def files : Array(GistFile) | Array(MissingGistFile) | Array(RateLimitedGistFile)
gist_store.get_gist_files(params.id, params.filename)
end
private def params
GistParams.extract_from_url(@href)
end
def ==(other : GithubGist)
other.gist_store == gist_store
end
def empty?
false
end
end
end