blacklight/scribe
scribe/src/version.cr @ 1f517f9031 (4 lines, 43 B, Crystal)
Add visible version

This is to be able to track which instances (including the main one) have which fixes
2022-01-05 03:26:23 +01:00
module Scribe
Fix markup errors caused by UTF-16/8 differences

Medium uses UTF-16 character offsets (likely to make it easier to parse in JavaScript) but Crystal uses UTF-8. Converting strings to UTF-16 to do the offset calculation, then back to UTF-8, fixes some markup bugs.

---

Medium calculates markup offsets using UTF-16 encoding. Some characters, like emoji, are counted as multiple code units, which affects those offsets. For example, in UTF-16 💸 takes two code units, but Crystal strings count it as one character.

This is a problem for markup generation because it can shift the markup and even cause out-of-range errors. Take the following example:

💸💸!

Imagine that `!` is bold but the emoji aren't. For Crystal, the bold range starts at char index 2 and ends at char index 3. Medium's markup will say the range goes from offset 4 to 5. In a 3-character string like this, trying to access character range 4...5 is an error because 5 is already out of bounds.

My theory is that this is meant to be compatible with JavaScript's string length calculations, as Medium is primarily a platform built for the web:

```js
"a".length // 1
"💸".length // 2
"👩❤️💋👩".length // 11
```

To get these same numbers in Crystal, strings must be converted to UTF-16:

```crystal
"a".to_utf16.size # 1
"💸".to_utf16.size # 2
"👩❤️💋👩".to_utf16.size # 11
```

The MarkupConverter now converts text into UTF-16 code unit arrays on initialization. Once it has figured out the range of code units needed for each piece of markup, it converts that range back into a UTF-8 string.
2022-01-30 17:47:08 +01:00
  VERSION = "2022-01-30"
end
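
The offset conversion described in the "Fix markup errors caused by UTF-16/8 differences" commit message can be sketched roughly as follows. This is a minimal illustration, not Scribe's actual MarkupConverter: the helper name `utf16_slice` and its parameters are hypothetical, and it assumes the offsets are UTF-16 code unit positions with an exclusive end.

```crystal
# Hypothetical helper (not part of Scribe): returns the substring covered by
# the UTF-16 code unit offsets start...finish (finish is exclusive).
def utf16_slice(text : String, start : Int32, finish : Int32) : String
  # to_utf16 yields a Slice(UInt16) with one entry per UTF-16 code unit
  units = text.to_utf16
  # Take the requested code units and decode them back into a UTF-8 String
  String.from_utf16(units[start, finish - start])
end

text = "💸💸!"
puts text.size               # character count as Crystal sees it
puts text.to_utf16.size      # code unit count as JavaScript would report it
puts utf16_slice(text, 4, 5) # the range Medium would report for the bold "!"
```

Slicing the UTF-16 representation keeps Medium's offsets valid even when the text contains characters outside the Basic Multilingual Plane, which is where indexing by Crystal characters goes out of range.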