Hello, I would like to highlight an issue that seems to be at least partly responsible for the problem discussed here.
On line 45 of core/src/main/java/de/danoeh/antennapod/core/storage/FeedItemDuplicateGuesser.java:
DateFormat dateFormat = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US); // MM/DD/YY
Using DateFormat.SHORT causes false positives when a podcast publishes several episodes a day with the same title. The following feed is a case in point:
It is a republished, thrice-daily radio news programme that is always scheduled to be about 10 minutes long, and its title tag is always the same (as is the MIME type, obviously). As you can imagine, this is a perfect recipe for a whole lot of false positives.
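To make the collision concrete, here is a small, self-contained Java sketch. It reuses the formatter call from the quoted line; the three timestamps are made up (06:00, 12:00 and 18:00 UTC on one day), and pinning the formatter to UTC is an assumption added only so the result does not depend on the machine's time zone. All three bulletins produce the same date string, so any comparison based on that string presumably cannot tell them apart.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class ShortDateCollision {

    // Date-only formatter mirroring the quoted line: SHORT style, US locale.
    // The UTC time zone is an assumption added so the demo is reproducible.
    static String shortDateKey(Date pubDate) {
        DateFormat dateFormat = DateFormat.getDateInstance(DateFormat.SHORT, Locale.US);
        dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
        return dateFormat.format(pubDate);
    }

    // Hypothetical pubDates: 06:00, 12:00 and 18:00 UTC on 2 March 2020.
    static Date[] thriceDailyBulletins() {
        long dayStart = 1583107200000L; // 2020-03-02T00:00:00Z in epoch milliseconds
        long hour = 60L * 60L * 1000L;
        return new Date[] {
            new Date(dayStart + 6 * hour),
            new Date(dayStart + 12 * hour),
            new Date(dayStart + 18 * hour),
        };
    }

    public static void main(String[] args) {
        for (Date pubDate : thriceDailyBulletins()) {
            // Every line shows the same date string: the time of day is discarded.
            System.out.println(pubDate.getTime() + " -> " + shortDateKey(pubDate));
        }
    }
}
```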
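If you check the linked XML feed, its pubDate tags actually contain the precise publishing date and time, so making the comparison include the time would probably solve one part of the problem. Note that simply changing DateFormat.SHORT to DateFormat.LONG in the snippet above would not be enough, because getDateInstance() formats only the date at every style; a date-time format such as getDateTimeInstance() would be needed.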
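The difference between a date-only key and a date-time key can be sketched as follows. DateFormat.getDateInstance() yields a date-only string at every style, so DateFormat.LONG still collides for two bulletins on the same day, whereas DateFormat.getDateTimeInstance() keeps the time to the minute. The class name, timestamps and UTC zone are assumptions for illustration, not AntennaPod code.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateTimeKey {

    // Date-only, LONG style: spells out the month, but still has no time of day.
    static String longDateKey(Date pubDate) {
        DateFormat format = DateFormat.getDateInstance(DateFormat.LONG, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed zone, for reproducibility
        return format.format(pubDate);
    }

    // Date plus time (to the minute), so items published hours apart get different keys.
    static String dateTimeKey(Date pubDate) {
        DateFormat format = DateFormat.getDateTimeInstance(DateFormat.SHORT, DateFormat.SHORT, Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC")); // assumed zone, for reproducibility
        return format.format(pubDate);
    }

    public static void main(String[] args) {
        long dayStart = 1583107200000L; // 2020-03-02T00:00:00Z
        long hour = 60L * 60L * 1000L;
        Date morning = new Date(dayStart + 6 * hour);
        Date evening = new Date(dayStart + 18 * hour);
        System.out.println(longDateKey(morning).equals(longDateKey(evening))); // true: LONG still collides
        System.out.println(dateTimeKey(morning).equals(dateTimeKey(evening))); // false: the time differs
    }
}
```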
(I moved your post to a new topic, since the topic you posted it in is about the podcast host actually duplicating episodes.)
The duplicate detection deliberately uses only the date, not the time. Publishers regularly try to edit an episode but end up changing the download URL and breaking the ID as well. These duplicates then have a different time but usually the same date. The duplicate detection is therefore necessary, and it has to ignore the time to catch them.
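The reasoning above can be illustrated with a hypothetical date-only check. Item and isLikelyDuplicate are invented names, not AntennaPod's actual FeedItemDuplicateGuesser, and the criteria (title, MIME type, same UTC day) are a simplification that leaves out duration. Ignoring the time catches an edited, re-published episode, and for exactly the same reason it also merges the thrice-daily bulletins.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DateOnlyGuess {

    // Minimal hypothetical stand-in for a feed item; not AntennaPod's FeedItem class.
    static final class Item {
        final String title;
        final String mimeType;
        final Instant pubDate;

        Item(String title, String mimeType, Instant pubDate) {
            this.title = title;
            this.mimeType = mimeType;
            this.pubDate = pubDate;
        }
    }

    // Calendar day in UTC (the time zone is an assumption of this sketch).
    static LocalDate day(Instant instant) {
        return instant.atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Same title, same MIME type, same day => probably a duplicate.
    // The time of day is ignored on purpose, so a copy with a new URL/ID and a new time still matches.
    static boolean isLikelyDuplicate(Item a, Item b) {
        return a.title.equals(b.title)
                && a.mimeType.equals(b.mimeType)
                && day(a.pubDate).equals(day(b.pubDate));
    }

    public static void main(String[] args) {
        Item original = new Item("Episode 12", "audio/mpeg", Instant.parse("2020-03-02T08:00:00Z"));
        Item republished = new Item("Episode 12", "audio/mpeg", Instant.parse("2020-03-02T09:30:00Z"));
        Item noon = new Item("News", "audio/mpeg", Instant.parse("2020-03-02T12:00:00Z"));
        Item evening = new Item("News", "audio/mpeg", Instant.parse("2020-03-02T18:00:00Z"));

        System.out.println(isLikelyDuplicate(original, republished)); // true: caught as intended
        System.out.println(isLikelyDuplicate(noon, evening));         // true: the false positive
    }
}
```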
Publishing three episodes with the same title on the same day is a practice I have never seen before. I am not sure we can make your feed work without breaking other, more common feeds that do not use duplicate titles.
What about also comparing the link tags? Although feeds don't use them uniformly, in some cases they point to a unique URL. So, as a last resort: if the links are identical, treat the items as duplicates, so be it; if they differ, don't. That might reduce the number of false-positive duplicates without introducing any false negatives. Would that cause a problem? I can't think of any other solution.
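The link-tag proposal could be layered on top of a date-only check roughly as follows. This is a hypothetical sketch: the class, field and method names are invented, the date-only part is simplified to title, MIME type and UTC day, and it assumes that an edited, re-published episode keeps its link value (if a publisher changed the link as well, this version would no longer flag the copy). Feeds that put the same show URL in every link, or none at all, simply fall back to the date-only answer.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class LinkTieBreaker {

    // Hypothetical minimal item; "link" holds the item's <link> value and may be null.
    static final class Item {
        final String title;
        final String mimeType;
        final Instant pubDate;
        final String link;

        Item(String title, String mimeType, Instant pubDate, String link) {
            this.title = title;
            this.mimeType = mimeType;
            this.pubDate = pubDate;
            this.link = link;
        }
    }

    static boolean sameDayUtc(Instant a, Instant b) {
        return a.atZone(ZoneOffset.UTC).toLocalDate().equals(b.atZone(ZoneOffset.UTC).toLocalDate());
    }

    static boolean hasLink(Item item) {
        return item.link != null && !item.link.isEmpty();
    }

    // Date-only guess, vetoed when both items carry a link and the links differ.
    static boolean isLikelyDuplicate(Item a, Item b) {
        boolean dateOnlyMatch = a.title.equals(b.title)
                && a.mimeType.equals(b.mimeType)
                && sameDayUtc(a.pubDate, b.pubDate);
        if (!dateOnlyMatch) {
            return false;
        }
        if (hasLink(a) && hasLink(b)) {
            // Distinct links are taken as evidence of distinct episodes.
            return a.link.equals(b.link);
        }
        // No usable links to compare: keep the date-only answer.
        return true;
    }

    public static void main(String[] args) {
        Instant six = Instant.parse("2020-03-02T06:00:00Z");
        Instant noon = Instant.parse("2020-03-02T12:00:00Z");
        Item a = new Item("News", "audio/mpeg", six, "https://example.com/news/0600");
        Item b = new Item("News", "audio/mpeg", noon, "https://example.com/news/1200");
        Item c = new Item("News", "audio/mpeg", noon, null);
        System.out.println(isLikelyDuplicate(a, b)); // false: different links veto the date-only match
        System.out.println(isLikelyDuplicate(a, c)); // true: no link to compare, date-only answer stands
    }
}
```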
Thanks for the reply.