feed discovery, the blogosphere, and Sage

Feed discovery is becoming a hot topic with Sage as we move toward our next release. This feels like uncharted territory, and I want to try to highlight some of the options and their tradeoffs.

Discovery is an important component of the aggregator usability equation and a significant opportunity for Sage, as its browser integration gives us the ability to do this well. The goal here is to allow users to find and subscribe to new feeds in the most effortless manner possible. I'm going to suggest that there are two complementary approaches to feed discovery: a lightweight technique and a heavyweight one.

At this point in the game, doing feed discovery is something of a black art. There is an accepted method for explicitly associating feeds with HTML documents, but it has yet to spread to the far reaches of the blogosphere and there are a good number of sites that don't make use of it. This is where discovery gets tricky. Without an explicit feed declaration, the only option is to scan through any links present in the HTML document trying to determine which of those might be pointing to a valid feed. There is no feed URL naming convention, so these links can look like anything. Sometimes they're obvious, sometimes they're indistinguishable from other links in the document.

The heavyweight approach is to cover all the bases: check for explicit feed declarations, and also scan the document for feed links to cover the case where no feeds are declared explicitly. This is what happens in Sage when you click the 'Discover Feeds' icon. All potential links are probed, metadata is collected for those that turn out to be valid feeds, and the results are ranked by relevance and displayed to the user. The benefit of this approach is the quality of the output: if there's a feed to be found, we'll probably get it. The downside is that it's expensive. It takes network bandwidth and CPU cycles to probe the links in a document, which means you won't get instant results. It also means it would be difficult to perform automatically in the background as the user browses.
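To make the cost concrete, here is a rough sketch of the heavyweight pass in Python. It is not Sage's actual code. The names `LinkCollector`, `probe`, and `heavyweight_discover` are made up for illustration, the `fetch` argument is a caller-supplied function that returns a document's text for a URL, and the "same host first" ranking is only a stand-in assumption for whatever relevance ranking is really used.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every href found on <a> and <link> elements, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)
                if url not in self.links:  # probe each URL only once
                    self.links.append(url)


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1].lower()


def probe(url, fetch):
    """Return {'url', 'title'} if the document at url looks like a feed, else None."""
    try:
        root = ET.fromstring(fetch(url))
    except Exception:
        return None  # unreachable, or not well-formed XML
    if local_name(root.tag) not in ("rss", "feed", "rdf"):
        return None  # well-formed XML, but not a feed root element
    title = ""
    for element in root.iter():
        if local_name(element.tag) == "title" and element.text:
            title = element.text.strip()
            break
    return {"url": url, "title": title}


def heavyweight_discover(html, base_url, fetch):
    """Probe every link in the page and return the ones that are feeds."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    found = [f for f in (probe(u, fetch) for u in collector.links) if f]
    # Assumed ranking: feeds on the page's own host come first.
    page_host = urlparse(base_url).netloc
    found.sort(key=lambda f: urlparse(f["url"]).netloc != page_host)
    return found
```

Every link costs one `fetch` call plus an XML parse, which is exactly why this pass is too slow to run on every page load.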

The lightweight approach doesn't scan the document body for links to probe, but looks only in the <head> section for explicit feed declarations. If these are found, great: you've got some URLs that probably lead to valid feeds and enough metadata for the user to choose between them. This method is inexpensive and can quickly be done at the user's request or in the background as they browse from page to page, a la Firefox 1.0 Livemarks and Safari RSS. The downside is that it will come up empty-handed on any site that doesn't explicitly list its feeds.
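As a rough sketch of the lightweight pass, assuming the explicit declaration is the familiar `<link rel="alternate">` element with an RSS or Atom media type, something like the following would do. The names `HeadFeedParser` and `discover_declared_feeds` are invented for this example and are not part of Sage or Firefox.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Media types treated as feeds in this sketch (an assumption, not a complete list).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class HeadFeedParser(HTMLParser):
    """Collects feed declarations from <link> elements inside <head>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_head = False
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = {name: (value or "") for name, value in attrs}
            rels = attributes.get("rel", "").lower().split()
            media_type = attributes.get("type", "").strip().lower()
            href = attributes.get("href", "")
            if "alternate" in rels and media_type in FEED_TYPES and href:
                self.feeds.append({
                    "url": urljoin(self.base_url, href),  # resolve relative hrefs
                    "title": attributes.get("title", ""),
                    "type": media_type,
                })

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def discover_declared_feeds(html, base_url):
    """Return the feeds a page declares in its <head>, without any network access."""
    parser = HeadFeedParser(base_url)
    parser.feed(html)
    return parser.feeds
```

No links are fetched here, so the whole pass is a single parse of a page the browser already has in hand. A page with no declarations simply yields an empty list.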

For now, I think the answer may be to make use of both methods: a lightweight discovery process running in the background during regular browsing, and a heavyweight catch-all mechanism available at the user's request. Lucky us, it appears that we're being given that lightweight process with the new Livemarks functionality in Firefox 1.0. In the interim, we should be able to piggyback on this feature, giving users access to automatic feed discovery.

September 15, 2004 @ 7:27 PM | Category: Technology

