
A (Mostly) IndieWeb-Compatible RSS Reader

What started out as a fork of Aperture and Monocle turned into an almost entirely new feed reader. Prior to switching to (a forked) Aperture + Monocle (+ Indigenous), I’d been using Miniflux, and, before that, Feedly. The new reader is inspired by all of these!

TL;DR

I built a traditional feed reader with IndieAuth, h-feed, and partial Microsub support. (It supports Micropub, too, but I’m not sure I really need it. I currently have it switched off.) The whole thing’s a dead-simple, PHP-based responsive web app. Works wonders.


Anyhow, here’s what I wanted from a modern feed reader:

  • Microformats (i.e., h-feed) support
  • Ability to manage and consume feeds from the same (responsive) web app
  • Built-in polling mechanism
  • Ability to self-host nearly anywhere
  • Consistent entry markup and styles
  • Clear error messages if feeds go 404 or time out, etc.

Less (or not at all) important:

  • Multiple categories per feed
  • Advanced filters
  • Mute or block feeds or authors
  • A strict server-client separation

Nice to have:

  • Full entries, even for summary-only feeds
  • (Partial) Microsub compatibility
  • Ability to “manually” push notifications to a certain “feed”
  • Cursor-based entry pagination
  • Micropub (to post reactions to my own site)
  • WebSub compatibility
  • Custom CSS

Additional constraints, design decisions:

  • PHP & Laravel
  • Vanilla CSS and JavaScript
  • IE compatibility

And here are a couple of notes on some of the decisions I had to make. (I’ll almost certainly update this post a few times in the near future.)

On Polling

The first thing I worked on was the polling mechanism. (I haven’t added WebSub support yet.) Scheduling tasks in Laravel is extremely easy, but it relies on the scheduler being called by a cron job exactly once a minute. I wanted a more flexible approach, and found inspiration in WordPress’s cron system, which executes all overdue tasks whenever it is called (rather than just the tasks scheduled for that very minute).
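The “run everything that’s overdue” idea fits in a few lines. What follows is a minimal, language-agnostic sketch in Python (the reader itself is PHP/Laravel); `Task` and `run_overdue()` are hypothetical names, not part of Laravel or WordPress. Whatever triggers the runner, be it an irregular cron job or an incoming request, simply executes every task whose slot has passed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    next_run: float              # Unix timestamp of the next scheduled run
    interval: float              # seconds between runs
    action: Callable[[], None]   # the actual work, e.g. polling one feed


def run_overdue(tasks: List[Task], now: float) -> List[str]:
    """Run every task that is due or overdue, WordPress-cron style.

    Unlike a minute-exact scheduler, a task whose slot was missed (because
    nothing triggered the runner at that moment) still runs next time.
    """
    ran = []
    for task in tasks:
        if task.next_run <= now:
            task.action()
            ran.append(task.name)
            # Reschedule from "now", so a long gap yields a single catch-up
            # run rather than a burst of them.
            task.next_run = now + task.interval
    return ran


if __name__ == "__main__":
    tasks = [
        Task("poll-feed-1", next_run=100, interval=3600, action=lambda: None),
        Task("poll-feed-2", next_run=500, interval=3600, action=lambda: None),
    ]
    # Triggered at t=300: only the first task is overdue.
    print(run_overdue(tasks, now=300))
```

Rescheduling relative to the current time (rather than to the missed slot) is a design choice: after downtime, each feed is polled once instead of once per missed interval.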

There are also polling tiers, inspired by the Yarns plugin for WordPress, so that oft-updated feeds are polled hourly, and rarely updated feeds no more than once a day. And, lastly, I added a bit of randomness, so that feeds get spread out a bit rather than all getting polled exactly on the hour.
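Tiers plus randomness could look roughly like this. It’s a sketch, not the reader’s actual code: the `TIERS` thresholds (including the six-hour middle tier), the 10% jitter, and the function names are all assumptions. Only “hourly for busy feeds, at most daily for quiet ones” comes from the post.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical tiers: if a feed's newest entry is at most `max_age` old,
# poll it every `interval`. The thresholds are assumptions.
TIERS = [
    (timedelta(days=1), timedelta(hours=1)),   # busy feeds: hourly
    (timedelta(days=7), timedelta(hours=6)),   # moderately active feeds
]
SLOWEST = timedelta(days=1)                    # everything else: daily


def polling_interval(last_entry_at: datetime, now: datetime) -> timedelta:
    age = now - last_entry_at
    for max_age, interval in TIERS:
        if age <= max_age:
            return interval
    return SLOWEST


def next_poll(last_entry_at: datetime, now: datetime, rng: random.Random) -> datetime:
    interval = polling_interval(last_entry_at, now)
    # Add up to 10% random jitter so feeds don't all get polled on the hour.
    # Jitter only ever delays a poll, so "no more than once a day" still holds.
    jitter = timedelta(seconds=rng.uniform(0, 0.1 * interval.total_seconds()))
    return now + interval + jitter


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(next_poll(now - timedelta(hours=2), now, random.Random(42)))
```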

Of note: Aperture uses pivot tables to connect feeds and entries to channels (and users). This means fewer duplicate table rows when multiple users follow the same feeds. It also means that you’d have to store per-user feed or entry data in the pivot table itself, and it complicates the use of, e.g., global scopes. Since I knew from the start I wanted to allow users, i.e., myself, to (1) also scrape and filter (web) entries, and (2) modify things like feed URLs, I went with an explicit user_id column on the categories, feeds, and entries tables.

The downside: possible duplicate feed and entry rows. The upside: flexibility. In Aperture, if the URL of a feed you’re following changes, you have to remove and re-add that feed. Here, I can just update the URL (and other properties) without affecting other users on my instance. (I think that’s how Miniflux does it, too.)

Now, I of course don’t want to go and download the same feed over and over just because it’s got multiple followers. That’s why I cache feeds for just under an hour (the top “polling tier”). (Additional note: I don’t cache parsed feeds but the raw HTML, XML, or whatever, precisely because different users might have different parsing preferences.)
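Here’s the caching idea as a sketch, assuming a simple in-memory store keyed by URL; in Laravel, the `Cache` facade (`Cache::remember()`) would be the more natural fit. `RawFeedCache`, the injected `fetch` callable, and the 55-minute TTL are illustrative assumptions.

```python
import time
from typing import Callable, Dict, Tuple

# "Just under an hour", i.e. shorter than the fastest polling tier.
# The exact value is an assumption.
CACHE_TTL_SECONDS = 55 * 60


class RawFeedCache:
    """Caches raw response bodies (HTML, XML, JSON) per URL, not parsed feeds.

    Every user following the same URL shares the cached body; each user's
    own parsing and filtering preferences are applied afterwards.
    """

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch    # performs the actual HTTP request
        self._clock = clock    # injectable for testing
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        cached = self._store.get(url)
        if cached is not None and self._clock() - cached[0] < CACHE_TTL_SECONDS:
            return cached[1]
        body = self._fetch(url)
        self._store[url] = (self._clock(), body)
        return body
```

Two users polling the same URL within the TTL trigger a single download; parsing then happens per user, on the same cached body.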

On Entry Markup, and Image Proxies

Like Aperture, I went with X-Ray and PicoFeed. PicoFeed itself strips and sanitizes HTML, and X-Ray then does more of the same. This sometimes leads to broken (image) links and tables, and to perfectly harmless (and semantic) HTML being stripped away. At the same time, some inline styles are left intact. Luckily, none of this is very hard to “correct.”

I have also added oddly specific regexes, which, e.g., prevent images from appearing twice. (Some pages that use JavaScript-based lazy image loading actually include image sources twice, the second time inside a noscript tag.)
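The actual regexes aren’t in this post, so here’s a hypothetical one in the same spirit: drop a `<noscript>`-wrapped image if its URL already appears elsewhere in the entry, and otherwise just unwrap it. The pattern names and `dedupe_noscript_images()` are made up for illustration, and regexes on HTML are inherently approximate.

```python
import re

# An <img> wrapped in <noscript>, as emitted by many lazy-loading scripts.
NOSCRIPT_IMG = re.compile(r"<noscript>\s*(<img\b[^>]*>)\s*</noscript>", re.IGNORECASE)
# A plain src attribute (the lookbehind skips data-src and similar).
SRC_ATTR = re.compile(r"""(?<![\w-])src\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def dedupe_noscript_images(html: str) -> str:
    """Drop <noscript> image copies whose URL already appears elsewhere."""

    def replace(match: re.Match) -> str:
        img = match.group(1)
        src = SRC_ATTR.search(img)
        if src is None:
            return img
        url = src.group(1)
        rest = html[: match.start()] + html[match.end():]
        # Duplicate: another attribute (src, data-src, ...) already has this URL.
        if f'"{url}"' in rest or f"'{url}'" in rest:
            return ""
        # Only copy: keep the image, just unwrap it.
        return img

    return NOSCRIPT_IMG.sub(replace, html)
```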

Speaking of images, I cooked up (or rather, gathered from diverse sources) a relatively simple “image proxy” to prevent “insecure content” warnings over HTTPS.
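One way such a proxy could work: rewrite plain-HTTP image URLs to point at an endpoint on the reader’s own HTTPS origin, which then fetches and streams the image. This sketch only covers the URL side. The `/imageproxy` route, the secret, and the HMAC signature (a common precaution against the endpoint being abused as an open proxy) are my assumptions, not details from the post.

```python
import hashlib
import hmac
from urllib.parse import urlencode

PROXY_PATH = "/imageproxy"            # hypothetical route on the reader itself
SECRET_KEY = b"replace-with-app-key"  # hypothetical; keep this private


def sign(url: str) -> str:
    return hmac.new(SECRET_KEY, url.encode("utf-8"), hashlib.sha256).hexdigest()


def proxy_url(url: str) -> str:
    """Rewrite plain-HTTP image URLs so they're served over the reader's HTTPS origin."""
    if not url.startswith("http://"):
        return url  # HTTPS (and relative) URLs don't cause insecure-content warnings
    return PROXY_PATH + "?" + urlencode({"url": url, "hash": sign(url)})


def is_valid_request(url: str, given_hash: str) -> bool:
    """The proxy endpoint refuses to fetch URLs it didn't sign itself."""
    return hmac.compare_digest(sign(url), given_hash)
```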

The final step is to run everything through WordPress’s “auto-paragrapher” (wpautop). This makes it a whole lot easier to consistently style, e.g., text inside a blockquote inside an outer blockquote.

On Timeline Chronology

I order entries by publish date rather than by the date they were added to the database, and correct erroneous dates (those in the future or the distant past) when inserting new entries. All “publish” dates are stored in UTC, and only get converted to the instance’s timezone when ultimately rendered to HTML.
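A sketch of that date handling, under two stated assumptions: a suspicious date falls back to the time the entry was fetched, and “distant past” means anything before 2000. Both, along with the function names, are hypothetical choices, not necessarily what the reader does.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional

# Anything older than this is treated as a parsing error. The cutoff is an assumption.
EARLIEST_PLAUSIBLE = datetime(2000, 1, 1, tzinfo=timezone.utc)


def normalize_published(published: Optional[datetime], fetched_at: datetime) -> datetime:
    """Return a UTC publish date, falling back to the fetch time if it looks wrong.

    `fetched_at` must be timezone-aware (UTC).
    """
    if published is None:
        return fetched_at
    if published.tzinfo is None:
        # Assumption: naive timestamps are already UTC.
        published = published.replace(tzinfo=timezone.utc)
    published = published.astimezone(timezone.utc)
    if published > fetched_at or published < EARLIEST_PLAUSIBLE:
        return fetched_at
    return published


def render(published_utc: datetime, instance_tz: tzinfo) -> str:
    """Convert to the instance's timezone only at render time."""
    return published_utc.astimezone(instance_tz).strftime("%Y-%m-%d %H:%M")
```

In practice, `instance_tz` would be a named zone, e.g. `zoneinfo.ZoneInfo("Europe/Brussels")` on Python 3.9+.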

On Microsub

It’s really simple. I map categories to “channels,” and feeds to “sources.” Read statuses sync okay; other methods aren’t quite supported yet. One thing I still need to improve is the way items get saved to the database.
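Roughly, that mapping yields responses shaped like the following, based on my reading of the Microsub spec’s `action=channels` and `action=follow` (GET) responses. The `Category` and `Feed` classes stand in for the reader’s database rows, and using the category ID as the channel `uid` is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Category:  # hypothetical stand-in for a row in the categories table
    id: int
    name: str
    unread: int


@dataclass
class Feed:      # hypothetical stand-in for a row in the feeds table
    url: str
    category_id: int


def channels_response(categories: List[Category]) -> Dict:
    """`GET ?action=channels`: one Microsub channel per category."""
    return {"channels": [
        {"uid": str(c.id), "name": c.name, "unread": c.unread} for c in categories
    ]}


def follow_response(feeds: List[Feed], channel_uid: str) -> Dict:
    """`GET ?action=follow&channel=...`: the feeds followed in one channel."""
    return {"items": [
        {"type": "feed", "url": f.url} for f in feeds if str(f.category_id) == channel_uid
    ]}
```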

Current Annoyances, Aka “To Do”

  • Whatever is sent “over” Microsub is still what happens to be in the JSON column, and not the “normalized” HTML and other attributes (see the notes on entry markup and Microsub above).
  • source tag support is still missing.
  • The u-video microformats property isn’t always used consistently, it seems. Somehow ensure “videos” really are video files, and scrap them if not.
  • Some relative fragment links in RSS feeds still don’t work.
  • “Reply by email” links are stripped away, and probably shouldn’t be.
  • i tags are stripped away, and definitely shouldn’t be.