Sounds pretty nice. They even plan to open-source the code: "Building feeddirectory.org: We're live, kinda". I really like the fact that they use hashing, so they do not need to store the email address.
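Just to illustrate what I mean by that (a minimal sketch in Kotlin, assuming a plain SHA-256 over the normalised address; I don't know the exact scheme feeddirectory.org uses):

```kotlin
import java.security.MessageDigest

// Minimal sketch of hashing an e-mail address instead of storing it.
// SHA-256 over the trimmed, lower-cased address is an assumption here;
// the actual scheme used by feeddirectory.org may differ.
fun hashEmail(email: String): String {
    val normalised = email.trim().lowercase()
    val digest = MessageDigest.getInstance("SHA-256")
        .digest(normalised.toByteArray(Charsets.UTF_8))
    return digest.joinToString("") { "%02x".format(it) }
}

fun main() {
    // Same hash regardless of case or surrounding whitespace
    println(hashEmail(" Someone@Example.org "))
}
```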
A few things I am missing before we can integrate:
1. Get the state of multiple episodes at once using the API
2. Get all episode/subscription changes after a specific date (otherwise, we need to request every single episode of every single subscription on each sync); see the sketch after this list
3. Private subscriptions (I think the API can currently query who is subscribed to which feed, which I consider a privacy issue)
4. Clarification on how they deal with URL redirects (do they keep the old URL or do they switch? What happens to episode states if they switch?)
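To make the first two points a bit more concrete, here is roughly the kind of call I have in mind. Everything below (base URL, paths, parameter names) is made up for illustration and not part of any existing FD API:

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.time.Instant

// Hypothetical request builders for points 1 and 2 above. The base URL,
// endpoint paths and parameter names are all invented for illustration;
// they are not part of any existing FeedDirectory API.
fun batchEpisodeStateUri(base: String, guids: List<String>): URI {
    // Point 1: fetch the state of many episodes in a single round trip
    val joined = guids.joinToString(",") { URLEncoder.encode(it, "UTF-8") }
    return URI.create("$base/episodes/state?guids=$joined")
}

fun changesSinceUri(base: String, since: Instant): URI =
    // Point 2: only fetch what changed after a given point in time
    URI.create("$base/changes?since=${since.epochSecond}")

fun main() {
    val base = "https://feeddirectory.example/api/v1" // placeholder, not the real base URL
    println(batchEpisodeStateUri(base, listOf("guid-1", "guid-2")))
    println(changesSinceUri(base, Instant.parse("2024-01-01T00:00:00Z")))
}
```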
All of these shouldn't be an issue… give me a week or so and I'll have a solution for at least the first 3 points.
With regards to the "URL redirects" - feeddirectory.org doesn't perform any processing or fetching from the URLs themselves, the URLs are used to identify unique feeds so I'm not sure how this is relevant… maybe I'm misunderstanding the requirement though.
Okay, so FeedDirectory (unlike gpodder) is just a database that does not do any feed parsing by itself. Then the fourth point is no longer relevant.
To clarify my message above: The feature "get state of multiple episodes" would be used for the initial sync - so basically downloading the complete information stored for one user.
@stucoates First of all thanks for working on this - it's a great initiative. I have two questions:
Since you don't do any processing or fetching URLs, I'm wondering: do you "aggregate" data in a way? As in: are you able to provide/develop a list of popular feeds (average for the whole userbase)?
And a related question, are the two datasets ("the things you publish" and "the things you consume") separate or linked somehow?
I'm asking because of the following scenario:
I as an AntennaPod user follow xyz.com/rss, and the feed changes address (to abc.com/rss) with a nice 301/308 redirect on the server. Because of the redirect AntennaPod updates the feed it follows. However, the person owning/hosting the feed forgot to update their entry on feeddirectory.org, which thus stays xyz.com/rss. My AntennaPod installation and the feeddirectory.org entry are then "conflicting".
(Also, but you might label this "out of scope", it would be cool to get data on which feeds are popular, so that relational recommendations ("those subscribed to x are subscribed to y") could be presented for example directly in AntennaPod for those that sync with your service.)
There's no API at the moment to do any "top 20" type of thing but it's pretty trivial to do. I'd have to think if there are any possible privacy issues relating to this, but I suppose if it just uses public subscriptions and ignores private ones then there wouldn't be an issue.
The published feeds and subscriptions are held separately but are joined by the account, so "those subscribed to x are subscribed to y" is perfectly possible, but as before, I'll have to think about any possible privacy issues.
The automatic upgrading of links in feeddirectory.org is something I'm thinking about when I start building the "health check" backend… if a 301/308 is returned when the crawler visits a link, it's pretty easy to update everything. Whether this is the correct thing to do will require some thought.
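Very roughly, that health-check step could look something like this; a sketch only, nothing here reflects code that exists yet:

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Rough sketch of the "health check" idea: if a feed URL answers with a
// permanent redirect (301/308), the stored URL could be upgraded to the target.
// Whether FD should actually do this automatically is still an open question;
// nothing here reflects existing FD code.
fun resolvePermanentRedirect(feedUrl: String): String {
    val conn = URL(feedUrl).openConnection() as HttpURLConnection
    conn.instanceFollowRedirects = false // inspect the redirect instead of silently following it
    conn.requestMethod = "HEAD"
    return try {
        when (conn.responseCode) {
            301, 308 -> conn.getHeaderField("Location") ?: feedUrl // upgrade to the new address
            else -> feedUrl                                        // keep the old address
        }
    } finally {
        conn.disconnect()
    }
}
```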
Here's a preview of the APIs that'll be rolling out this week to "fix" the things mentioned in this thread: FeedDirectory.org
I still need to do some testing prior to putting the code live, but thought I'd give you an early look into my thinking about how things should be working with FD.
Hope this all makes sense - please let me know if there's anything too confusing.
Do I correctly understand that while you note there's a link between the two:
there's no risk associated with the "conflict" between client and server?
Nice that a top x and relational recommendations are technically possible; understandably privacy is a concern. From a user perspective I would keep my subscriptions private, but I wouldn't mind making them usable for "the algorithm" to determine relational recommendations (provided that there's no way to de-anonymise my data, e.g. if no results are returned until there are 25 or 50 users subscribed to a given feed).
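To illustrate what I mean by that threshold (a purely hypothetical sketch, data shapes invented):

```kotlin
// Purely illustrative: leave a feed out of any aggregated / recommendation output
// until enough distinct users subscribe to it, so individual users cannot be
// singled out. The data shape is invented; 25 is just the example value from above.
data class FeedStats(val feedUrl: String, val subscriberCount: Int)

fun aggregatable(stats: List<FeedStats>, minSubscribers: Int = 25): List<FeedStats> =
    stats.filter { it.subscriberCount >= minSubscribers }

fun main() {
    val stats = listOf(
        FeedStats("https://xyz.example/rss", 3),    // too few subscribers, never surfaces
        FeedStats("https://abc.example/rss", 120)   // enough subscribers, can be recommended
    )
    println(aggregatable(stats).map { it.feedUrl })
}
```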
About the privacy/permissions: I understand from the docs that subscription privacy has two switches: private and follower-only. However, they seem mutually exclusive to me, so I would rather expect one switch with three possible values: public, followers only, private (as you suggested in your reply on Mastodon).
Also, as a user I probably wouldn't set this on a per-feed level, but rather for my account as a whole. Would it make sense to add POST/GET calls for the account level, so that this account-level value is taken as the default for all newly added feeds?
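Something like this is what I have in mind, as an illustration only; the names are invented, not taken from the FD docs:

```kotlin
// Hypothetical model for the account-wide default suggested above; none of these
// names come from the FD API, they only illustrate "new feeds inherit the account value".
enum class Visibility { PUBLIC, FOLLOWERS_ONLY, PRIVATE }

data class Account(val id: String, val defaultVisibility: Visibility)
data class Subscription(val feedUrl: String, val visibility: Visibility)

fun subscribe(account: Account, feedUrl: String, perFeedVisibility: Visibility? = null) =
    Subscription(feedUrl, perFeedVisibility ?: account.defaultVisibility) // per-feed override still possible

fun main() {
    val me = Account("acct-1", Visibility.PRIVATE)
    println(subscribe(me, "https://xyz.example/rss")) // PRIVATE, inherited from the account default
}
```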
Thanks, I just had a quick look at it. Wouldn't specifying a timestamp be more precise than lookback-seconds? What if the request somehow takes a few seconds to be processed and returned? Also, both you and the clients need to do date calculations (for example when using paged loading), whereas with a simple timestamp, we can just pass the most recent timestamp +1 to get the next page.
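For illustration, paging with a timestamp could be as simple as this (a sketch; the types and parameter names are invented):

```kotlin
import java.time.Instant

// Sketch of the paging idea described above: the client feeds the newest timestamp
// it has seen, plus one second, back into the next request. The Change type and the
// request shape are invented for illustration.
data class Change(val feedUrl: String, val timestamp: Instant)

fun nextPageSince(lastPage: List<Change>, previousSince: Instant): Instant =
    lastPage.maxOfOrNull { it.timestamp }?.plusSeconds(1) ?: previousSince

fun main() {
    val page = listOf(Change("https://xyz.example/rss", Instant.parse("2024-03-01T10:15:30Z")))
    // Value to pass as ?since= on the next call: 2024-03-01T10:15:31Z
    println(nextPageSince(page, Instant.EPOCH))
}
```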
Sure, but if in AP there is a "global" setting that determines the behaviour of individual requests, the behaviour would be different on the client (AP) and the server's web interface (or another client) - or I would have to change that setting on every client (incl. web). If these other clients offer such a setting, anyway. Doesn't feel very logical.
I'm not sure I understand the concern around a possible conflict between published items and subscriptions. They are held separately but against the same account - different APIs also.
I'd also thought about having thresholds around inclusion in any data aggregation algorithms… will have a think about what the best approach is here once I have some real life data to look at.
Completely understand the possible confusion here. The private and follower-only switches are kind of mutually exclusive in this case - the "most private" case will win (e.g. private overrules follower-only). There's a reason why I've done the implementation like this - it makes adding extra switches a lot easier at a later date. App developers can decide how they surface the settings and flip the appropriate bits on the FD side. It's possible to have a higher-level API that flips multiple switches in one shot, but there's a danger there of API bloat if too many of those exist.
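As a sketch of how I think about that resolution (names invented, not the actual implementation):

```kotlin
// Sketch of the "most private wins" resolution over the two current switches.
// Field and type names are invented; the point is just that private overrules
// follower-only, and that more switches can be added later without breaking this.
data class SubscriptionFlags(val isPrivate: Boolean, val isFollowerOnly: Boolean)

enum class EffectiveVisibility { PUBLIC, FOLLOWERS_ONLY, PRIVATE }

fun effectiveVisibility(flags: SubscriptionFlags): EffectiveVisibility = when {
    flags.isPrivate      -> EffectiveVisibility.PRIVATE        // most private always wins
    flags.isFollowerOnly -> EffectiveVisibility.FOLLOWERS_ONLY
    else                 -> EffectiveVisibility.PUBLIC
}

fun main() {
    // Both switches set: private overrules follower-only
    println(effectiveVisibility(SubscriptionFlags(isPrivate = true, isFollowerOnly = true)))
}
```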
There will (soon) be switches at the account level for defaulting the subscription/publish settings so that, for example, all new subscriptions are private (or follower-only) by default.
The reason I've gone for an offset from current time is that I've had so many issues in the past (far too many years to mention) with people ("developers") not understanding the concept of timezones let alone the ability to specify a correctly formatted timestamp, so for the sake of my own sanity I've removed that from the equation completely. The FD service will work at UTC, and the apps can work with whatever they want, as long as they remain consistent.
If there's enough demand I'll add the option to supply a timestamp instead, but maybe with the caveat that if you F it up you're on your own.
Yeah, I already suspected timezones to be the problem. How about returning the timestamp in the server responses? Then a client can just take that same timestamp when synchronizing next time.
Yep, good idea - I could return the server timestamp in all API responses. The only problem I can see here is that because a lot can happen at the same time, if the client relies on the timestamp being the exact point that data is read, it's possible for a parallel write to have a timestamp before the read but not yet be visible to the read due to the way RDBMS transaction isolation works… only a matter of milliseconds, but still possible.
So, whatever I do, it's probably safer for clients to assume that they may receive duplicate items if they're being retrieved based on time. A "safer" way would be to provide a serial event number where the client says "last event I received is 12345, send me everything from event 12346", but this means a greater amount of smarts on the client, a more complex API, and more complexity on the server to assign sequential event IDs.
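On the client side, tolerating duplicates could be as simple as merging by a stable id; a sketch, with invented types:

```kotlin
import java.time.Instant

// Sketch of the "clients should tolerate duplicates" point: merge incoming items by
// a stable id, so receiving the same item twice because of the timestamp race is a
// no-op. The Event type and its fields are invented for illustration.
data class Event(val id: String, val timestamp: Instant, val payload: String)

fun mergeEvents(known: Map<String, Event>, incoming: List<Event>): Map<String, Event> =
    known + incoming.associateBy { it.id } // a duplicate simply overwrites itself

fun main() {
    val first = Event("ep-42", Instant.parse("2024-03-01T10:15:30Z"), "position=120")
    val store = mergeEvents(emptyMap(), listOf(first))
    println(mergeEvents(store, listOf(first)).size) // still 1: the duplicate did no harm
}
```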
I'll go ahead with the timestamp in API responses and the ability to send a timestamp instead of "lookback-seconds" and see how it goes.
I think I've covered all of the features that have been requested here and would really appreciate any further feedback. Oh, and yes @ByteHamster, it now supports timestamps.
Sadly, I don't own an Android device to enable me to test any AP integration, but will assist where I can with getting things up and running.
Oops, forgot to mention… as part of each API response there's now a "server-time" component - this is the database transaction time - which should help with any subsequent "modified-since" requests. Use it with the caveat I mentioned earlier about updates that happen in parallel with a read not yet being visible to that read.
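For a client, using it could boil down to something like this (a sketch; only the "server-time" field itself is real, the wrapper types are invented):

```kotlin
import java.time.Instant

// Sketch of how a client could use the new "server-time" component: remember it after
// each sync and send it back as the modified-since value next time. Only "server-time"
// comes from the post above; the wrapper types here are invented.
data class SyncResponse(val serverTime: Instant, val items: List<String>)

class SyncState(var lastServerTime: Instant = Instant.EPOCH)

fun afterSync(state: SyncState, response: SyncResponse) {
    state.lastServerTime = response.serverTime // next request: modified-since = lastServerTime
}

fun main() {
    val state = SyncState()
    afterSync(state, SyncResponse(Instant.parse("2024-03-01T10:15:30Z"), emptyList()))
    println(state.lastServerTime)
}
```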
I guess it's time to create an issue on GitHub for the technical implementation. I'm wondering though, @stucoates, do you plan to make a web interface for people to create & manage accounts?
I can create mock-ups for how that could work in AntennaPod, but I feel that account creation is out of scope (for gpodder.net we also only offer "Login").
Hold off doing any design/implementation for a few days. I've had a discussion with another app developer about how these things can work and it may be that FD issues unique tokens for an app/user combination - this way a secret reset will not result in the need to re-authenticate everything that's connected to an account.
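Conceptually it would be something like this; nothing is final, and all names here are invented:

```kotlin
import java.util.UUID

// Sketch of the "one token per app/user combination" idea: each connected app gets its
// own token, so resetting the account secret (or revoking one app) does not force every
// other app to re-authenticate. This is an assumption about a design that is not built yet.
data class AppToken(val account: String, val appId: String, val token: String)

fun issueToken(account: String, appId: String): AppToken =
    AppToken(account, appId, UUID.randomUUID().toString())

fun main() {
    println(issueToken("acct-1", "antennapod")) // independent of any other app's token
}
```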
For this, yes I will be building a simple account management UI and associated new APIs.
I'll get back to you once this is done - should be a few days before I've made the decision and built the APIs; the UI will probably be done at some point next week.