Mastodon bot RSS plugin: Additional RSS entry elements #236

Closed
opened 2022-12-13 20:29:01 +01:00 by custompyramidfellow · 1 comment

Hi,

thank you for your [excellent post on Mastodon](https://blog.platypush.tech/article/Create-a-Mastodon-bot-to-forward-Twitter-and-RSS-feeds-to-your-timeline), the related work, and the guide to running a bot via Platypush and its RSS plugin. (Side question: if I may ask, is the example code of your blog post also under an MIT license?) I adapted your Mastodon bot code, extended it to fit my needs, and it works fine.

However, I have a feature request, which has also been raised in [issue #233](https://git.platypush.tech/platypush/platypush/issues/233). I want to add further elements from an RSS feed entry to my Mastodon toots, especially the 'author' and 'tags' elements. I already got this working in a standalone Python script using the feedparser library, but ultimately could not get it working by modifying the platypush pip package: it always ended up with "'NewFeedEntryEvent' object has no attribute 'author'".
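For reference, a minimal sketch of the kind of extraction involved (not the exact standalone script mentioned above). It assumes feedparser-style entries, where `author` is a string and `tags` is a list of dict-like objects with a `term` key. `extract_author_and_tags` is a hypothetical helper, and the entry below is a hand-built stand-in so that the snippet runs without feedparser installed.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple


def extract_author_and_tags(entry: Any) -> Tuple[Optional[str], List[str]]:
    """Return (author, tag terms), tolerating entries that lack either field."""
    author = getattr(entry, 'author', None)
    tags = [
        tag.get('term')
        for tag in getattr(entry, 'tags', None) or []
        if tag.get('term')
    ]
    return author, tags


# Hand-built stand-in for a parsed entry (an assumption for illustration);
# with feedparser one would iterate over feedparser.parse(url).entries.
entry = SimpleNamespace(
    title='Example post',
    author='Jane Doe',
    tags=[{'term': 'fediverse'}, {'term': 'rss'}, {'term': None}],
)

print(extract_author_and_tags(entry))
print(extract_author_and_tags(SimpleNamespace(title='No metadata')))
```

Using `getattr` with a default keeps the extraction from failing on feeds that omit either element.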

I would appreciate it if you could add these two elements officially, as you stated that you could add this via a PR quite quickly.

(NO NEED TO READ FURTHER:) Below is just to document my failed attempt to add parsing of the 'author' element to the Platypush RSS plugin:

Installed Platypush in editable mode (to edit the RSS plugin in place).

git clone https://git.platypush.tech/platypush/platypush.git
cd platypush
pip install -e '.[rss]'

Edited platypush/platypush/plugins/rss/__init__.py to add the 'author' element.

        return RssFeedEntrySchema().dump(
            sorted(
                [
                    {
                        'feed_url': url,
                        'feed_title': getattr(feed.feed, 'title', None),
                        'id': getattr(entry, 'id', None),
                        'url': entry.link,
                        'published': datetime.datetime.fromtimestamp(
                            time.mktime(entry.published_parsed)
                        ),
                        'title': entry.title,
                        'summary': getattr(entry, 'summary', None),
                        'content': self._parse_content(entry),
                        'author': getattr(entry, 'author', None),
                    }
                    for entry in feed.entries
                    if getattr(entry, 'published_parsed', None)
                ],
                key=lambda e: e['published'],
            ),
            many=True,
        )

Edited platypush/platypush/backend/http/request/rss/__init__.py to add the 'author' element.

                    e = {
                        'entry_id': entry.id,
                        'title': entry.title,
                        'link': entry.link,
                        'summary': entry.summary,
                        'content': entry.content,
                        'source_id': source_record.id,
                        'published': entry_timestamp,
                        'author': entry.author,
                    }

class FeedEntry(Base):
    """ Models the FeedEntry table, which contains RSS entries """

    __tablename__ = 'FeedEntry'
    __table_args__ = ({'sqlite_autoincrement': True})

    id = Column(Integer, primary_key=True)
    entry_id = Column(String)
    source_id = Column(Integer, ForeignKey('FeedSource.id'), nullable=False)
    title = Column(String)
    link = Column(String)
    summary = Column(String)
    content = Column(String)
    published = Column(DateTime)
    author = Column(String)

So, unfortunately my changes didn't work, but as you see I tried ;)

Forgot to mention: I really enjoyed reading your take on Fediverse history in that blog post also, really nice essay, learned a lot from it :)

Owner

> If I may ask, is the example code of your blog post also under an MIT license?

All the samples are released under the MIT license as well. In hindsight I would have preferred to release both Platypush and any sample code under GPLv3, but since it's hard to change licensing once things have started, I'd rather stay consistent.

About the `author` and `tags` fields, they are quite straightforward to add (your solution was close, you just missed adding them to the `NewFeedEntryEvent` object as well).

~~The only problem is that adding columns to an existing SQLAlchemy model without asking people to remove and re-create their database requires migration scripts - i.e. Alembic in the Python world. So far I've tried to kick that can down the road as long as I could, but since migration scripts will also be useful for the new large feature I'm working on (support for the persistence of generic entities) I'd rather pull this tooth now rather than later.~~ If I have a bit of time I should be able to merge a PR over this weekend.

P.S. Never mind the part about Alembic and database migrations, we won't need it (at least not for this case). The code you found for the `FeedEntry` table is actually only used in `backend.http.request.rss`, a deprecated backend largely replaced by the `rss` plugin (one of the reasons is exactly that it stored every single feed entry in a table for no good reason, and that table could have been hard to extend in the future). So adding the fields just to the event is more than enough to get things to work, since the `rss` plugin, unlike the previous implementation, is stateless. That should considerably speed up the implementation.
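To illustrate the point about the event object: the error `'NewFeedEntryEvent' object has no attribute 'author'` is what one would expect when the parsed entry carries the new fields but the event class neither accepts nor stores them. The following is a minimal, self-contained sketch and not Platypush's actual code. `FeedEntryEventSketch` and `entry_to_event` are hypothetical names, and the real `NewFeedEntryEvent` constructor may look different.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FeedEntryEventSketch:
    """Hypothetical stand-in for an event such as NewFeedEntryEvent."""

    feed_url: str
    url: str
    title: Optional[str] = None
    summary: Optional[str] = None
    # New optional fields: without these, hooks that read event.author
    # or event.tags would raise AttributeError.
    author: Optional[str] = None
    tags: List[str] = field(default_factory=list)


def entry_to_event(entry: Dict[str, Any]) -> FeedEntryEventSketch:
    """Build an event from a parsed entry dict, ignoring unknown keys."""
    known = FeedEntryEventSketch.__dataclass_fields__
    return FeedEntryEventSketch(**{k: v for k, v in entry.items() if k in known})


event = entry_to_event(
    {
        'feed_url': 'https://example.org/feed.xml',
        'url': 'https://example.org/posts/1',
        'title': 'Example post',
        'author': 'Jane Doe',
        'tags': ['fediverse', 'rss'],
        'content': 'ignored: not a field of the sketch',
    }
)
print(event.author, event.tags)
```

Because nothing is persisted in this sketch, adding a field only touches the event definition and the code that fills it in, which is the advantage of a stateless design mentioned above.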

Reference: platypush/platypush#236