wip (almost done)
This commit is contained in:
parent
0b01a34ed1
commit
9a7e720e7e
2 changed files with 957 additions and 0 deletions
BIN
img/music-automation.png
Normal file
BIN
img/music-automation.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 189 KiB |
957
markdown/Automate-your-music-collection.md
Normal file
957
markdown/Automate-your-music-collection.md
Normal file
|
@ -0,0 +1,957 @@
|
|||
[//]: # (title: Automate your music collection)
|
||||
[//]: # (description: Use Platypush to manage your music activity, discovery playlists and be on top of new releases.)
|
||||
[//]: # (image: /img/music-automation.png)
|
||||
[//]: # (author: Fabio Manganiello <fabio@platypush.tech>)
|
||||
[//]: # (published: 2022-09-18)
|
||||
|
||||
I have been an enthusiastic user of mpd and mopidy for nearly two decades. I
|
||||
have already [written an
|
||||
article](https://blog.platypush.tech/article/Build-your-open-source-multi-room-and-multi-provider-sound-server-with-Platypush-Mopidy-and-Snapcast)
|
||||
on how to leverage mopidy (with its tons of integrations, including Spotify,
|
||||
Tidal, YouTube, Bandcamp, Plex, TuneIn, SoundCloud etc.), Snapcast (with its
|
||||
multi-room listening experience out of the box) and Platypush (with its
|
||||
automation hooks that allow you to easily create if-this-then-that rules for
|
||||
your music events) to take your listening experience to the next level, while
|
||||
using open protocols and easily extensible open-source software.
|
||||
|
||||
There is a feature that I haven't yet covered in my previous articles, and
|
||||
that's the automation of your music collection.
|
||||
|
||||
Spotify, Tidal and other music streaming services offer you features such as a
|
||||
_Discovery Weekly_ or _Release Radar_ playlists, respectively filled with
|
||||
tracks that you may like, or newly released tracks that you may be interested
|
||||
in.
|
||||
|
||||
The problem is that these services come with heavy trade-offs:
|
||||
|
||||
1. Their algorithms are closed. You don't know how Spotify figures out which
|
||||
songs should be picked in your smart playlists. In the past months, Spotify
|
||||
would often suggest me tracks from the same artists that I had already
|
||||
listened to or skipped in the past, and there's no easy way to tell the
|
||||
algorithm "hey, actually I'd like you to suggest me more this kind of music -
|
||||
or maybe calculate suggestions only based on the music I've listened to in
|
||||
this time range".
|
||||
|
||||
2. Those features are tightly coupled with the service you use. If you cancel
|
||||
your Spotify subscription, you lose those smart features as well.
|
||||
Companies like Spotify use such features as a lock-in mechanism -
|
||||
"you can check out any time you like, but if you do then nobody else will
|
||||
provide you with our clever suggestions".
|
||||
|
||||
After migrating from Spotify to Tidal in the past couple of months (TL;DR:
|
||||
Spotify f*cked up their developer experience multiple times over the past
|
||||
decade, and their killing of libspotify without providing any alternatives was
|
||||
the last nail in the coffin for me) I felt like missing their smart mixes,
|
||||
discovery and new releases playlists - and, on the other hand, Tidal took a
|
||||
while to learn my listening habits, and even when it did it often generated
|
||||
smart playlists that were an inch below Spotify's. I asked myself why on earth
|
||||
my music discovery experience should be so tightly coupled to one single cloud
|
||||
service. And I decided that the time had come for me to automatically generate
|
||||
my service-agnostic music suggestions: it's not rocket science anymore, there's
|
||||
plenty of services that you can piggyback on to get artist or tracks similar to
|
||||
some music given as input, and there's just no excuses to feel locked in by
|
||||
Spotify, Google, Tidal or some other cloud music provider.
|
||||
|
||||
In this article we'll cover how to:
|
||||
|
||||
1. Use Platypush to automatically keep track of the music you listen to from
|
||||
any of your devices;
|
||||
2. Calculate the suggested tracks that may be similar to the music you've
|
||||
recently listen to by using the Last.FM API;
|
||||
3. Generate a _Discover Weekly_ playlist similar to Spotify's without relying
|
||||
on Spotify;
|
||||
4. Get the newly released albums and single by subscribing to an RSS feed;
|
||||
5. Generate a weekly playlist with the new releases by filtering those from
|
||||
artists that you've listened to at least once.
|
||||
|
||||
## Ingredients
|
||||
|
||||
We will use Platypush to handle the following features:
|
||||
|
||||
1. Store our listening history to a local database, or synchronize it with a
|
||||
scrobbling service like [last.fm](https://last.fm).
|
||||
2. Periodically inspect our newly listened tracks, and use the last.fm API to
|
||||
retrieve similar tracks.
|
||||
3. Generate a discover weekly playlist based on a simple score that ranks
|
||||
suggestions by match score against the tracks listened on a certain period
|
||||
of time, and increases the weight of suggestions that occur multiple times.
|
||||
4. Monitor new releases from the newalbumreleases.net RSS feed, and create a
|
||||
weekly _Release Radar_ playlist containing the items from artists that we
|
||||
have listened to at least once.
|
||||
|
||||
This tutorial will require:
|
||||
|
||||
1. A database to store your listening history and suggestions. The database
|
||||
initialization script has been tested against Postgres, but it should be
|
||||
easy to adapt it to MySQL or SQLite with some minimal modifications.
|
||||
2. A machine (it can be a RaspberryPi, a home server, a VPS, an unused tablet
|
||||
etc.) to run the Platypush automation.
|
||||
3. A Spotify or Tidal account. The reported examples will generate the
|
||||
playlists on a Tidal account by using the `music.tidal` Platypush plugin,
|
||||
but it should be straightforward to adapt them to Spotify by using the
|
||||
`music.spotify` plugin, or even to YouTube by using the YouTube API, or even
|
||||
to local M3U playlists.
|
||||
|
||||
## Setting up the software
|
||||
|
||||
Start by installing Platypush with the
|
||||
[Tidal](https://docs.platypush.tech/platypush/plugins/music.tidal.html),
|
||||
[RSS](https://docs.platypush.tech/platypush/plugins/rss.html) and
|
||||
[Last.fm](https://docs.platypush.tech/platypush/plugins/lastfm.html)
|
||||
integrations:
|
||||
|
||||
```
|
||||
[sudo] pip install 'platypush[tidal,rss,lastfm]'
|
||||
```
|
||||
|
||||
If you want to use Spotify instead of Tidal then just remove `tidal` from the
|
||||
list of extra dependencies - no extra dependencies are required for the
|
||||
[Spotify
|
||||
plugin](https://docs.platypush.tech/platypush/plugins/music.spotify.html).
|
||||
|
||||
If you are planning to listen to music through mpd/mopidy, then you may also
|
||||
want to include `mpd` in the list of extra dependencies, so Platypush can
|
||||
directly monitor your listening activity over the MPD protocol.
|
||||
|
||||
Let's then configure a simple configuration under `~/.config/platypush/config.yaml`:
|
||||
|
||||
```yaml
|
||||
music.tidal:
|
||||
# No configuration required
|
||||
|
||||
# Or, if you use Spotify, create an app at https://developer.spotify.com and
|
||||
# add its credentials here
|
||||
# music.spotify:
|
||||
# client_id: client_id
|
||||
# client_secret: client_secret
|
||||
|
||||
lastfm:
|
||||
api_key: your_api_key
|
||||
api_secret: your_api_secret
|
||||
username: your_user
|
||||
password: your_password
|
||||
|
||||
# Subscribe to updates from newalbumreleases.net
|
||||
rss:
|
||||
subscriptions:
|
||||
- https://newalbumreleases.net/category/cat/feed/
|
||||
|
||||
# Optional, used to send notifications about generation issues to your
|
||||
# mobile/browser. You can also use Pushbullet, an email plugin or a chatbot if
|
||||
# you prefer.
|
||||
ntfy:
|
||||
# No configuration required if you want to use the default server at
|
||||
# https://ntfy.sh
|
||||
|
||||
# Include the mpd plugin and backend if you are listening to music over
|
||||
# mpd/mopidy
|
||||
music.mpd:
|
||||
host: localhost
|
||||
port: 6600
|
||||
|
||||
backend.music.mopidy:
|
||||
host: localhost
|
||||
port: 6600
|
||||
```
|
||||
|
||||
Start Platypush by running the `platypush` command. The first time it should
|
||||
prompt you with a tidal.com link required to authenticate your user. Open it in
|
||||
your browser and authorize the app - the next runs should no longer ask you to
|
||||
authenticate.
|
||||
|
||||
Once the Platypush dependencies are in place, let's move to configure the
|
||||
database.
|
||||
|
||||
## Database configuration
|
||||
|
||||
I'll assume that you have a Postgres database running somewhere, but the script
|
||||
below can be easily adapted also to other DBMS's.
|
||||
|
||||
Database initialization script (download/pastebin link is
|
||||
[here](https://paste.fabiomanganiello.com/blacklight/music.sql)):
|
||||
|
||||
```sql
|
||||
-- New listened tracks will be pushed to the tmp_music table, and normalized by
|
||||
-- a trigger.
|
||||
drop table if exists tmp_music cascade;
|
||||
create table tmp_music(
|
||||
id serial not null,
|
||||
artist varchar(255) not null,
|
||||
title varchar(255) not null,
|
||||
album varchar(255),
|
||||
created_at timestamp with time zone default CURRENT_TIMESTAMP,
|
||||
primary key(id)
|
||||
);
|
||||
|
||||
-- This table will store the tracks' info
|
||||
drop table if exists music_track cascade;
|
||||
create table music_track(
|
||||
id serial not null,
|
||||
artist varchar(255) not null,
|
||||
title varchar(255) not null,
|
||||
album varchar(255),
|
||||
created_at timestamp with time zone default CURRENT_TIMESTAMP,
|
||||
primary key(id),
|
||||
unique(artist, title)
|
||||
);
|
||||
|
||||
-- Create an index on (artist, title), and ensure that the (artist, title) pair
|
||||
-- is unique
|
||||
create unique index track_artist_title_idx on music_track(lower(artist), lower(title));
|
||||
create index track_artist_idx on music_track(lower(artist));
|
||||
|
||||
-- music_activity holds the listen history
|
||||
drop table if exists music_activity cascade;
|
||||
create table music_activity(
|
||||
id serial not null,
|
||||
track_id int not null,
|
||||
created_at timestamp with time zone default CURRENT_TIMESTAMP,
|
||||
primary key(id)
|
||||
);
|
||||
|
||||
-- music_similar keeps track of the similar tracks
|
||||
drop table if exists music_similar cascade;
|
||||
create table music_similar(
|
||||
source_track_id int not null,
|
||||
target_track_id int not null,
|
||||
match_score float not null,
|
||||
primary key(source_track_id, target_track_id),
|
||||
foreign key(source_track_id) references music_track(id),
|
||||
foreign key(target_track_id) references music_track(id)
|
||||
);
|
||||
|
||||
-- music_discovery_playlist keeps track of the generated discovery playlists
|
||||
drop table if exists music_discovery_playlist cascade;
|
||||
create table music_discovery_playlist(
|
||||
id serial not null,
|
||||
name varchar(255),
|
||||
created_at timestamp with time zone default CURRENT_TIMESTAMP,
|
||||
primary key(id)
|
||||
);
|
||||
|
||||
-- This table contains the track included in each discovery playlist
|
||||
drop table if exists music_discovery_playlist_track cascade;
|
||||
create table music_discovery_playlist_track(
|
||||
id serial not null,
|
||||
playlist_id int not null,
|
||||
track_id int not null,
|
||||
primary key(id),
|
||||
unique(playlist_id, track_id),
|
||||
foreign key(playlist_id) references music_discovery_playlist(id),
|
||||
foreign key(track_id) references music_track(id)
|
||||
);
|
||||
|
||||
-- This trigger normalizes the tracks inserted into tmp_track
|
||||
create or replace function sync_music_data()
|
||||
returns trigger as
|
||||
$$
|
||||
declare
|
||||
track_id int;
|
||||
begin
|
||||
insert into music_track(artist, title, album)
|
||||
values(new.artist, new.title, new.album)
|
||||
on conflict(artist, title) do update
|
||||
set album = coalesce(excluded.album, old.album)
|
||||
returning id into track_id;
|
||||
|
||||
insert into music_activity(track_id, created_at)
|
||||
values (track_id, new.created_at);
|
||||
|
||||
delete from tmp_music where id = new.id;
|
||||
return new;
|
||||
end;
|
||||
$$
|
||||
language 'plpgsql';
|
||||
|
||||
drop trigger if exists on_sync_music on tmp_music;
|
||||
create trigger on_sync_music
|
||||
after insert on tmp_music
|
||||
for each row
|
||||
execute procedure sync_music_data();
|
||||
|
||||
-- (Optional) accessory view to easily peek the listened tracks
|
||||
drop view if exists vmusic;
|
||||
create view vmusic as
|
||||
select t.id as track_id
|
||||
, t.artist
|
||||
, t.title
|
||||
, t.album
|
||||
, a.created_at
|
||||
from music_track t
|
||||
join music_activity a
|
||||
on t.id = a.track_id;
|
||||
```
|
||||
|
||||
Run the script on your database - if everything went smooth then all the tables
|
||||
should be successfully created.
|
||||
|
||||
## Synchronizing your music activity
|
||||
|
||||
Now that all the dependencies are in place, it's time to configure the logic to
|
||||
store your music activity to your database.
|
||||
|
||||
If most of your music activity happens through mpd/mopidy, then storing your
|
||||
activity to the database is as simple as creating a hook on a
|
||||
[`NewPlayingTrackEvent`](https://docs.platypush.tech/platypush/events/music.html)
|
||||
that inserts any newly playing track on `tmp_music`. Paste the following
|
||||
content to a new Platypush user script (e.g.
|
||||
`~/.config/platypush/scripts/music/sync.py`):
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/sync.py
|
||||
|
||||
from logging import getLogger
|
||||
|
||||
from platypush.context import get_plugin
|
||||
from platypush.event.hook import hook
|
||||
from platypush.message.event.music import NewPlayingTrackEvent
|
||||
|
||||
logger = getLogger('music_sync')
|
||||
|
||||
# SQLAlchemy connection string that points to your database
|
||||
music_db_engine = 'postgresql+pg8000://dbuser:dbpass@dbhost/dbname'
|
||||
|
||||
|
||||
# Hook that react to NewPlayingTrackEvent events
|
||||
@hook(NewPlayingTrackEvent)
|
||||
def on_new_track_playing(event, **_):
|
||||
track = event.track
|
||||
|
||||
# Skip if the track has no artist/title specified
|
||||
if not (track.get('artist') and track.get('title')):
|
||||
return
|
||||
|
||||
logger.info(
|
||||
'Inserting track: %s - %s',
|
||||
track['artist'], track['title']
|
||||
)
|
||||
|
||||
db = get_plugin('db')
|
||||
db.insert(
|
||||
engine=music_db_engine,
|
||||
table='tmp_music',
|
||||
records=[
|
||||
{
|
||||
'artist': track['artist'],
|
||||
'title': track['title'],
|
||||
'album': track.get('album'),
|
||||
}
|
||||
for track in tracks
|
||||
]
|
||||
)
|
||||
```
|
||||
|
||||
Alternatively, if you also want to sync music activities that happens through
|
||||
other clients (such as the Spotify/Tidal app or web view, or over mobile
|
||||
devices), you may consider leveraging Last.fm. Last.fm (or its open alternative
|
||||
Libre.fm) are _scrobbling_ websites that are compatible with most of the music
|
||||
players out there. Both Spotify and Tidal support scrobbling, the [Android
|
||||
app](https://apkpure.com/last-fm/fm.last.android) can grab any music activity
|
||||
on your phone and scrobble, and there are even [browser
|
||||
extensions](https://chrome.google.com/webstore/detail/web-scrobbler/hhinaapppaileiechjoiifaancjggfjm?hl=en)
|
||||
that allow you to record any music activity from any browser tab.
|
||||
|
||||
So an alternative approach may be to send both your mpd/mopidy music activity,
|
||||
as well as your in-browser or mobile music activity, to last.fm / libre.fm. The
|
||||
corresponding hook would be:
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/sync.py
|
||||
|
||||
from logging import getLogger
|
||||
|
||||
from platypush.context import get_plugin
|
||||
from platypush.event.hook import hook
|
||||
from platypush.message.event.music import NewPlayingTrackEvent
|
||||
|
||||
logger = getLogger('music_sync')
|
||||
|
||||
|
||||
# Hook that react to NewPlayingTrackEvent events
|
||||
@hook(NewPlayingTrackEvent)
|
||||
def on_new_track_playing(event, **_):
|
||||
track = event.track
|
||||
|
||||
# Skip if the track has no artist/title specified
|
||||
if not (track.get('artist') and track.get('title')):
|
||||
return
|
||||
|
||||
lastfm = get_plugin('lastfm')
|
||||
logger.info(
|
||||
'Scrobbling track: %s - %s',
|
||||
track['artist'], track['title']
|
||||
)
|
||||
|
||||
lastfm.scrobble(
|
||||
artist=track['artist'],
|
||||
title=track['title'],
|
||||
album=track.get('album'),
|
||||
)
|
||||
```
|
||||
|
||||
If you go for the scrobbling way, then you may want to periodically synchronize
|
||||
your scrobble history to your local database - for example, through a cron that
|
||||
runs every 30 seconds:
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/scrobble2db.py
|
||||
|
||||
import logging
|
||||
|
||||
from datetime import datetime
|
||||
|
||||
from platypush.context import get_plugin, Variable
|
||||
from platypush.cron import cron
|
||||
|
||||
logger = logging.getLogger('music_sync')
|
||||
music_db_engine = 'postgresql+pg8000://dbuser:dbpass@dbhost/dbname'
|
||||
|
||||
# Use this stored variable to keep track of the time of the latest
|
||||
# synchronized scrobble
|
||||
last_timestamp_var = Variable('LAST_SCROBBLED_TIMESTAMP')
|
||||
|
||||
|
||||
# This cron executes every 30 seconds
|
||||
@cron('* * * * * */30')
|
||||
def sync_scrobbled_tracks(**_):
|
||||
db = get_plugin('db')
|
||||
lastfm = get_plugin('lastfm')
|
||||
|
||||
# Use the last.fm plugin to retrieve all the new tracks scrobbled since
|
||||
# the last check
|
||||
last_timestamp = int(last_timestamp_var.get() or 0)
|
||||
tracks = [
|
||||
track for track in lastfm.get_recent_tracks().output
|
||||
if track.get('timestamp', 0) > last_timestamp
|
||||
]
|
||||
|
||||
# Exit if we have no new music activity
|
||||
if not tracks:
|
||||
return
|
||||
|
||||
# Insert the new tracks on the database
|
||||
db.insert(
|
||||
engine=music_db_engine,
|
||||
table='tmp_music',
|
||||
records=[
|
||||
{
|
||||
'artist': track.get('artist'),
|
||||
'title': track.get('title'),
|
||||
'album': track.get('album'),
|
||||
'created_at': (
|
||||
datetime.fromtimestamp(track['timestamp'])
|
||||
if track.get('timestamp') else None
|
||||
),
|
||||
}
|
||||
for track in tracks
|
||||
]
|
||||
)
|
||||
|
||||
# Update the LAST_SCROBBLED_TIMESTAMP variable with the timestamp of the
|
||||
# most recent played track
|
||||
last_timestamp_var.set(max(
|
||||
int(t.get('timestamp', 0))
|
||||
for t in tracks
|
||||
))
|
||||
|
||||
logger.info('Stored %d new scrobbled track(s)', len(tracks))
|
||||
```
|
||||
|
||||
This cron will basically synchronize your scrobbling history to your local
|
||||
database, so we can use the local database as the source of truth for the next
|
||||
steps - no matter where the music was played from.
|
||||
|
||||
To test the logic, simply restart Platypush, play some music from your
|
||||
favourite player(s), and check that everything gets inserted on the database -
|
||||
even if we are inserting tracks on the `tmp_music` table, the listening history
|
||||
should be automatically normalized on the appropriate tables by the triggered
|
||||
that we created at initialization time.
|
||||
|
||||
## Updating the suggestions
|
||||
|
||||
Now that all the plumbing to get all of your listening history in one data
|
||||
source is in place, let's move to the logic that recalculates the suggestions
|
||||
based on your listening history.
|
||||
|
||||
We will again use the last.fm API to get tracks that are similar to those we
|
||||
listened to recently - I personally find last.fm suggestions often more
|
||||
relevant than those of Spotify's.
|
||||
|
||||
For sake of simplicity, let's map the database tables to some SQLAlchemy ORM
|
||||
classes, so the upcoming SQL interactions can be notably simplified. The ORM
|
||||
model can be stored under e.g. `~/.config/platypush/music/db.py`:
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/db.py
|
||||
|
||||
from sqlalchemy import create_engine
|
||||
from sqlalchemy.ext.automap import automap_base
|
||||
from sqlalchemy.orm import sessionmaker, scoped_session
|
||||
|
||||
music_db_engine = 'postgresql+pg8000://dbuser:dbpass@dbhost/dbname'
|
||||
engine = create_engine(music_db_engine)
|
||||
|
||||
Base = automap_base()
|
||||
Base.prepare(engine, reflect=True)
|
||||
Track = Base.classes.music_track
|
||||
TrackActivity = Base.classes.music_activity
|
||||
TrackSimilar = Base.classes.music_similar
|
||||
DiscoveryPlaylist = Base.classes.music_discovery_playlist
|
||||
DiscoveryPlaylistTrack = Base.classes.music_discovery_playlist_track
|
||||
|
||||
|
||||
def get_db_session():
|
||||
session = scoped_session(sessionmaker(expire_on_commit=False))
|
||||
session.configure(bind=engine)
|
||||
return session()
|
||||
```
|
||||
|
||||
Then create a new user script under e.g.
|
||||
`~/.config/platypush/scripts/music/suggestions.py` with the following content:
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/suggestions.py
|
||||
|
||||
import logging
|
||||
|
||||
from sqlalchemy import tuple_
|
||||
from sqlalchemy.dialects.postgresql import insert
|
||||
from sqlalchemy.sql.expression import bindparam
|
||||
|
||||
from platypush.context import get_plugin, Variable
|
||||
from platypush.cron import cron
|
||||
|
||||
from scripts.music.db import (
|
||||
get_db_session, Track, TrackActivity, TrackSimilar
|
||||
)
|
||||
|
||||
|
||||
logger = logging.getLogger('music_suggestions')
|
||||
|
||||
# This stored variable will keep track of the latest activity ID for which the
|
||||
# suggestions were calculated
|
||||
last_activity_id_var = Variable('LAST_PROCESSED_ACTIVITY_ID')
|
||||
|
||||
|
||||
# A cronjob that runs every 5 minutes and updates the suggestions
|
||||
@cron('*/5 * * * *')
|
||||
def refresh_similar_tracks(**_):
|
||||
last_activity_id = int(last_activity_id_var.get() or 0)
|
||||
|
||||
# Retrieve all the tracks played since the latest synchronized activity ID
|
||||
# that don't have any similar tracks being calculated yet
|
||||
with get_db_session() as session:
|
||||
recent_tracks_without_similars = \
|
||||
_get_recent_tracks_without_similars(last_activity_id)
|
||||
|
||||
try:
|
||||
if not recent_tracks_without_similars:
|
||||
raise StopIteration(
|
||||
'All the recent tracks have processed suggestions')
|
||||
|
||||
# Get the last activity_id
|
||||
batch_size = 10
|
||||
last_activity_id = (
|
||||
recent_tracks_without_similars[:batch_size][-1]['activity_id'])
|
||||
|
||||
logger.info(
|
||||
'Processing suggestions for %d/%d tracks',
|
||||
min(batch_size, len(recent_tracks_without_similars)),
|
||||
len(recent_tracks_without_similars))
|
||||
|
||||
# Build the track_id -> [similar_tracks] map
|
||||
similars_by_track = {
|
||||
track['track_id']: _get_similar_tracks(track['artist'], track['title'])
|
||||
for track in recent_tracks_without_similars[:batch_size]
|
||||
}
|
||||
|
||||
# Map all the similar tracks in an (artist, title) -> info data structure
|
||||
similar_tracks_by_artist_and_title = \
|
||||
_get_similar_tracks_by_artist_and_title(similars_by_track)
|
||||
|
||||
if not similar_tracks_by_artist_and_title:
|
||||
raise StopIteration('No new suggestions to process')
|
||||
|
||||
# Sync all the new similar tracks to the database
|
||||
similar_tracks = \
|
||||
_sync_missing_similar_tracks(similar_tracks_by_artist_and_title)
|
||||
|
||||
# Link listened tracks to similar tracks
|
||||
with get_db_session() as session:
|
||||
stmt = insert(TrackSimilar).values({
|
||||
'source_track_id': bindparam('source_track_id'),
|
||||
'target_track_id': bindparam('target_track_id'),
|
||||
'match_score': bindparam('match_score'),
|
||||
}).on_conflict_do_nothing()
|
||||
|
||||
session.execute(
|
||||
stmt, [
|
||||
{
|
||||
'source_track_id': track_id,
|
||||
'target_track_id': similar_tracks[(similar['artist'], similar['title'])].id,
|
||||
'match_score': similar['score'],
|
||||
}
|
||||
for track_id, similars in similars_by_track.items()
|
||||
for similar in (similars or [])
|
||||
if (similar['artist'], similar['title'])
|
||||
in similar_tracks
|
||||
]
|
||||
)
|
||||
|
||||
session.flush()
|
||||
session.commit()
|
||||
except StopIteration as e:
|
||||
logger.info(e)
|
||||
|
||||
last_activity_id_var.set(last_activity_id)
|
||||
logger.info('Suggestions updated')
|
||||
|
||||
|
||||
def _get_similar_tracks(artist, title):
|
||||
"""
|
||||
Use the last.fm API to retrieve the tracks similar to a given
|
||||
artist/title pair
|
||||
"""
|
||||
import pylast
|
||||
lastfm = get_plugin('lastfm')
|
||||
|
||||
try:
|
||||
return lastfm.get_similar_tracks(
|
||||
artist=artist,
|
||||
title=title,
|
||||
limit=10,
|
||||
)
|
||||
except pylast.PyLastError as e:
|
||||
logger.warning(
|
||||
'Could not find tracks similar to %s - %s: %s',
|
||||
artist, title, e
|
||||
)
|
||||
|
||||
|
||||
def _get_recent_tracks_without_similars(last_activity_id):
|
||||
"""
|
||||
Get all the tracks played after a certain activity ID that don't have
|
||||
any suggestions yet.
|
||||
"""
|
||||
with get_db_session() as session:
|
||||
return [
|
||||
{
|
||||
'track_id': t[0],
|
||||
'artist': t[1],
|
||||
'title': t[2],
|
||||
'activity_id': t[3],
|
||||
}
|
||||
for t in session.query(
|
||||
Track.id.label('track_id'),
|
||||
Track.artist,
|
||||
Track.title,
|
||||
TrackActivity.id.label('activity_id'),
|
||||
)
|
||||
.select_from(
|
||||
Track.__table__
|
||||
.join(
|
||||
TrackSimilar,
|
||||
Track.id == TrackSimilar.source_track_id,
|
||||
isouter=True
|
||||
)
|
||||
.join(
|
||||
TrackActivity,
|
||||
Track.id == TrackActivity.track_id
|
||||
)
|
||||
)
|
||||
.filter(
|
||||
TrackSimilar.source_track_id.is_(None),
|
||||
TrackActivity.id > last_activity_id
|
||||
)
|
||||
.order_by(TrackActivity.id)
|
||||
.all()
|
||||
]
|
||||
|
||||
|
||||
def _get_similar_tracks_by_artist_and_title(similars_by_track):
|
||||
"""
|
||||
Map similar tracks into an (artist, title) -> track dictionary
|
||||
"""
|
||||
similar_tracks_by_artist_and_title = {}
|
||||
for similar in similars_by_track.values():
|
||||
for track in (similar or []):
|
||||
similar_tracks_by_artist_and_title[
|
||||
(track['artist'], track['title'])
|
||||
] = track
|
||||
|
||||
return similar_tracks_by_artist_and_title
|
||||
|
||||
|
||||
def _sync_missing_similar_tracks(similar_tracks_by_artist_and_title):
|
||||
"""
|
||||
Flush newly calculated similar tracks to the database.
|
||||
"""
|
||||
logger.info('Syncing missing similar tracks')
|
||||
with get_db_session() as session:
|
||||
stmt = insert(Track).values({
|
||||
'artist': bindparam('artist'),
|
||||
'title': bindparam('title'),
|
||||
}).on_conflict_do_nothing()
|
||||
|
||||
session.execute(stmt, list(similar_tracks_by_artist_and_title.values()))
|
||||
session.flush()
|
||||
session.commit()
|
||||
|
||||
tracks = session.query(Track).filter(
|
||||
tuple_(Track.artist, Track.title).in_(
|
||||
similar_tracks_by_artist_and_title
|
||||
)
|
||||
).all()
|
||||
|
||||
return {
|
||||
(track.artist, track.title): track
|
||||
for track in tracks
|
||||
}
|
||||
```
|
||||
|
||||
Restart Platypush and let it run for a bit. The cron will operate in batches of
|
||||
10 items each (it can be easily customized), so after a few minutes your
|
||||
`music_suggestions` table should start getting populated.
|
||||
|
||||
## Generating the discovery playlist
|
||||
|
||||
So far we have achieved the following targets:
|
||||
|
||||
- We have a piece of logic that synchronizes all of our listening history to a
|
||||
local database.
|
||||
- We have a way to synchronize last.fm / libre.fm scrobbles to the same
|
||||
database as well.
|
||||
- We have a cronjob that periodically scans our listening history and fetches
|
||||
the suggestions through the last.fm API.
|
||||
|
||||
Now let's put it all together with a cron that runs every week (or daily, or at
|
||||
whatever interval we like) that does the following:
|
||||
|
||||
- It retrieves our listening history over the specified period.
|
||||
- It retrieves the suggested tracks associated to our listening history.
|
||||
- It excludes the tracks that we've already listened to, or that have already
|
||||
been included in previous discovery playlists.
|
||||
- It generates a new discovery playlist with those tracks, ranked according to
|
||||
a simple score:
|
||||
|
||||
$$
|
||||
\rho_i = \sum_{j \in L_i} m_{ij}
|
||||
$$
|
||||
|
||||
Where \( \rho_i \) is the ranking of the suggested _i_-th suggested track, \(
|
||||
L_i \) is the set of listened tracks that have the _i_-th track among its
|
||||
similarities, and \( m_{ij} \) is the match score between _i_ and _j_ as
|
||||
reported by the last.fm API.
|
||||
|
||||
Let's put all these pieces together in a cron defined in e.g.
|
||||
`~/.config/platypush/scripts/music/discovery.py`:
|
||||
|
||||
```python
|
||||
# ~/.config/platypush/scripts/music/discovery.py
|
||||
|
||||
import logging
|
||||
from datetime import date, timedelta
|
||||
|
||||
from platypush.context import get_plugin
|
||||
from platypush.cron import cron
|
||||
|
||||
from scripts.music.db import (
|
||||
get_db_session, Track, TrackActivity, TrackSimilar,
|
||||
DiscoveryPlaylist, DiscoveryPlaylistTrack
|
||||
)
|
||||
|
||||
logger = logging.getLogger('music_discovery')
|
||||
|
||||
|
||||
def get_suggested_tracks(days=7, limit=25):
|
||||
"""
|
||||
Retrieve the suggested tracks from the database.
|
||||
|
||||
:param days: Look back at the listen history for the past <n> days
|
||||
(default: 7).
|
||||
:param limit: Maximum number of track in the discovery playlist
|
||||
(default: 25).
|
||||
"""
|
||||
from sqlalchemy import func
|
||||
|
||||
listened_activity = TrackActivity.__table__.alias('listened_activity')
|
||||
suggested_activity = TrackActivity.__table__.alias('suggested_activity')
|
||||
|
||||
with get_db_session() as session:
|
||||
return [
|
||||
{
|
||||
'track_id': t[0],
|
||||
'artist': t[1],
|
||||
'title': t[2],
|
||||
'score': t[3],
|
||||
}
|
||||
for t in session.query(
|
||||
Track.id,
|
||||
func.min(Track.artist),
|
||||
func.min(Track.title),
|
||||
func.sum(TrackSimilar.match_score).label('score'),
|
||||
)
|
||||
.select_from(
|
||||
Track.__table__
|
||||
.join(
|
||||
TrackSimilar.__table__,
|
||||
Track.id == TrackSimilar.target_track_id
|
||||
)
|
||||
.join(
|
||||
listened_activity,
|
||||
listened_activity.c.track_id == TrackSimilar.source_track_id,
|
||||
)
|
||||
.join(
|
||||
suggested_activity,
|
||||
suggested_activity.c.track_id == TrackSimilar.target_track_id,
|
||||
isouter=True
|
||||
)
|
||||
.join(
|
||||
DiscoveryPlaylistTrack,
|
||||
Track.id == DiscoveryPlaylistTrack.track_id,
|
||||
isouter=True
|
||||
)
|
||||
)
|
||||
.filter(
|
||||
# The track has not been listened
|
||||
suggested_activity.c.track_id.is_(None),
|
||||
# The track has not been suggested already
|
||||
DiscoveryPlaylistTrack.track_id.is_(None),
|
||||
# Filter by recent activity
|
||||
listened_activity.c.created_at >= date.today() - timedelta(days=days)
|
||||
)
|
||||
.group_by(Track.id)
|
||||
# Sort by aggregate match score
|
||||
.order_by(func.sum(TrackSimilar.match_score).desc())
|
||||
.limit(limit)
|
||||
.all()
|
||||
]
|
||||
|
||||
|
||||
def search_remote_tracks(tracks):
|
||||
"""
|
||||
Search for Tidal tracks given a list of suggested tracks.
|
||||
"""
|
||||
# If you use Spotify instead of Tidal, simply replacing `music.tidal`
|
||||
# with `music.spotify` here should suffice.
|
||||
tidal = get_plugin('music.tidal')
|
||||
found_tracks = []
|
||||
|
||||
for track in tracks:
|
||||
query = track['artist'] + ' ' + track['title']
|
||||
logger.info('Searching "%s"', query)
|
||||
results = (
|
||||
tidal.search(query, type='track', limit=1).output.get('tracks', [])
|
||||
)
|
||||
|
||||
if results:
|
||||
track['remote_track_id'] = results[0]['id']
|
||||
found_tracks.append(track)
|
||||
else:
|
||||
logger.warning('Could not find "%s" on TIDAL', query)
|
||||
|
||||
return found_tracks
|
||||
|
||||
|
||||
def refresh_discover_weekly():
|
||||
# If you use Spotify instead of Tidal, simply replacing `music.tidal`
|
||||
# with `music.spotify` here should suffice.
|
||||
tidal = get_plugin('music.tidal')
|
||||
|
||||
# Get the latest suggested tracks
|
||||
suggestions = search_remote_tracks(get_suggested_tracks())
|
||||
if not suggestions:
|
||||
logger.info('No suggestions available')
|
||||
return
|
||||
|
||||
# Retrieve the existing discovery playlists
|
||||
# Our naming convention is that discovery playlist names start with
|
||||
# "Discover Weekly" - feel free to change it
|
||||
playlists = tidal.get_playlists().output
|
||||
discover_playlists = sorted(
|
||||
[
|
||||
pl for pl in playlists
|
||||
if pl['name'].lower().startswith('discover weekly')
|
||||
],
|
||||
key=lambda pl: pl.get('created_at', 0)
|
||||
)
|
||||
|
||||
# Delete all the existing discovery playlists
|
||||
# (except the latest one). We basically keep two discovery playlists at the
|
||||
# time in our collection, so you have two weeks to listen to them before they
|
||||
# get deleted. Feel free to change this logic by modifying the -1 parameter
|
||||
# with e.g. -2, -3 etc. if you want to store more discovery playlists.
|
||||
for playlist in discover_playlists[:-1]:
|
||||
logger.info('Deleting playlist "%s"', playlist['name'])
|
||||
tidal.delete_playlist(playlist['id'])
|
||||
|
||||
# Create a new discovery playlist
|
||||
playlist_name = f'Discover Weekly [{date.today().isoformat()}]'
|
||||
pl = tidal.create_playlist(playlist_name).output
|
||||
playlist_id = pl['id']
|
||||
|
||||
tidal.add_to_playlist(
|
||||
playlist_id,
|
||||
[t['remote_track_id'] for t in suggestions],
|
||||
)
|
||||
|
||||
# Add the playlist to the database
|
||||
with get_db_session() as session:
|
||||
pl = DiscoveryPlaylist(name=playlist_name)
|
||||
session.add(pl)
|
||||
session.flush()
|
||||
session.commit()
|
||||
|
||||
# Add the playlist entries to the database
|
||||
with get_db_session() as session:
|
||||
for track in suggestions:
|
||||
session.add(
|
||||
DiscoveryPlaylistTrack(
|
||||
playlist_id=pl.id,
|
||||
track_id=track['track_id'],
|
||||
)
|
||||
)
|
||||
|
||||
session.commit()
|
||||
|
||||
logger.info('Discover Weekly playlist updated')
|
||||
|
||||
|
||||
@cron('0 6 * * 1')
|
||||
def refresh_discover_weekly_cron(**_):
|
||||
"""
|
||||
This cronjob runs every Monday at 6 AM.
|
||||
"""
|
||||
try:
|
||||
refresh_discover_weekly()
|
||||
except Exception as e:
|
||||
logger.exception(e)
|
||||
|
||||
# (Optional) If anything went wrong with the playlist generation, send
|
||||
# a notification over ntfy
|
||||
ntfy = get_plugin('ntfy')
|
||||
ntfy.send_message(
|
||||
topic='mirrored-notifications-topic',
|
||||
title='Discover Weekly playlist generation failed',
|
||||
message=str(e),
|
||||
priority=4,
|
||||
)
|
||||
```
|
||||
|
||||
You can test the cronjob without having to wait for the next Monday through
|
||||
your Python interpreter:
|
||||
|
||||
```python
|
||||
>>> import os
|
||||
>>>
|
||||
>>> # Move to the Platypush config directory
|
||||
>>> path = os.path.join(os.path.expanduser('~'), '.config', 'platypush')
|
||||
>>> os.chdir(path)
|
||||
>>>
|
||||
>>> # Import and run the cron function
|
||||
>>> from scripts.music.discovery import refresh_discover_weekly_cron
|
||||
>>> refresh_discover_weekly_cron()
|
||||
```
|
||||
|
||||
If everything went well, you should soon see a new playlist in your collection
|
||||
named _Discover Weekly [date]_. Congratulations!
|
Loading…
Add table
Reference in a new issue