Damn you, Slate. I'm really starting to get annoyed with various sites' implementation of advertising in feeds. Yes, I'm looking at you too, Boing Boing.
The one upside to the ads in Boing Boing's feed, so far anyway, is that at least they don't change from poll to poll. So, although I get a face full of ads repeated down my aggregator page, at least I see each of them only once.
With Slate's feed, on the other hand, entries I've already fetched change on every poll, because the feed rotates in randomly chosen ads every time I request it. So, in effect, nearly every entry in the feed ends up looking "updated" on each request.
This sucks. Please stop. Stop now. Feed entries are not the same as web pages.
Of course, part of this has to do with FeedSpool's approach to tracking whether an entry is new or has changed. That is, it takes an MD5 hash of the entry's XML source and uses that as a unique ID. Yes, this is lazy. But it's worked for me for a few years now in my other aggregators.
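For the curious, here's a minimal sketch of that MD5 trick. It's not FeedSpool's actual code: the `entry_hash` name and the two sample entries are made up for illustration, and the "[ad NN]" text just stands in for whatever ad markup gets rotated in.

```python
import hashlib


def entry_hash(entry_xml: str) -> str:
    """Return the MD5 hex digest of an entry's raw XML source."""
    return hashlib.md5(entry_xml.encode("utf-8")).hexdigest()


# Hypothetical entry as fetched on two successive polls.
# Only the rotating ad differs between them.
first_poll = "<item><title>Story</title><description>Text. [ad 17]</description></item>"
second_poll = "<item><title>Story</title><description>Text. [ad 42]</description></item>"

# Hashes of entries seen so far.
seen = {entry_hash(first_poll)}

# The same story hashes differently once the ad changes,
# so it gets treated as a brand new entry.
looks_new = entry_hash(second_poll) not in seen
print(looks_new)
```

Any byte that changes in the entry, ad or otherwise, produces a different hash, which is exactly why rotating ads trip this up.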
What I really should be doing in miniagg is watching the unique IDs on entries to determine freshness, when they're available. Which, thankfully, they are in Slate's feed. In other feeds, well, I'm not always so lucky, which is how the MD5 technique came about in the first place.
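Something like the following is roughly what I have in mind: use the entry's ID when there is one, and fall back to the hash when there isn't. This is only a sketch, not miniagg code. The `entry_key` name is hypothetical, and it assumes RSS 2.0 style items with a `guid` element (Atom's `id` would need namespace handling I've left out).

```python
import hashlib
import xml.etree.ElementTree as ET


def entry_key(item_xml: str) -> str:
    """Prefer the entry's own guid; fall back to an MD5 of its XML source."""
    item = ET.fromstring(item_xml)
    guid = item.findtext("guid")  # None if the item has no <guid> child
    if guid and guid.strip():
        return "id:" + guid.strip()
    return "md5:" + hashlib.md5(item_xml.encode("utf-8")).hexdigest()


# Hypothetical items: same guid, different rotating ad.
with_guid_a = "<item><guid>http://example.com/1</guid><description>Text. [ad 17]</description></item>"
with_guid_b = "<item><guid>http://example.com/1</guid><description>Text. [ad 42]</description></item>"

# Hypothetical items with no guid at all.
no_guid_a = "<item><description>Text. [ad 17]</description></item>"
no_guid_b = "<item><description>Text. [ad 42]</description></item>"

print(entry_key(with_guid_a) == entry_key(with_guid_b))  # stable despite the ad
print(entry_key(no_guid_a) == entry_key(no_guid_b))      # hash fallback still changes
```

The prefixes keep the two kinds of key from ever colliding. Feeds without IDs are no better off than before, but they're no worse off either.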
Nothing's ever nice & easy, is it? :)
As I'm sure you know, hashing the content is the industry-standard approach to tracking changes in feeds, because a huge percentage of feeds don't deliver valid unique IDs and updated times. Hashing is the only thing that works.
I suppose you could support a suite of algorithms, and choose between them on a feed-by-feed basis, but ick.
Bloglines has a per-subscription boolean option that lets you choose whether changed entries should be marked as new. The default is on, but some feeds benefit greatly from having the choice, such as Alterslash and, I imagine, Slate.
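A rough sketch of how a per-subscription flag like that might work. This is not Bloglines' code, which I haven't seen; the `Subscription` class, its field names, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    url: str
    # When False, entries whose content changed are not resurfaced as new.
    mark_changed_as_new: bool = True
    # Maps entry ID -> content hash from the last poll.
    seen: dict = field(default_factory=dict)

    def should_show(self, entry_id: str, content_hash: str) -> bool:
        """Decide whether an entry should show up as new on this poll."""
        previous = self.seen.get(entry_id)
        self.seen[entry_id] = content_hash
        if previous is None:
            return True   # never seen before: always new
        if previous == content_hash:
            return False  # unchanged since the last poll
        return self.mark_changed_as_new  # changed: defer to the flag


# A feed with rotating ads, with the option switched off.
noisy = Subscription("http://example.com/feed", mark_changed_as_new=False)
print(noisy.should_show("entry-1", "hash-a"))  # first sighting
print(noisy.should_show("entry-1", "hash-b"))  # content changed, flag is off
```

Note that this only helps when the feed supplies usable entry IDs; without them there is nothing to tell a changed entry from a new one.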
True: feeds are supposed to be feeds of information, not mixed with ads. But it's their sandbox. If you pick up someone's feed, you have to either play by their rules or play somewhere else. For people who read feeds in an aggregation client, this is regrettable, because they are using RSS feeds as a replacement for news sites and traditional media. When RSS feeds are used to fuel the content of other sites, it's a case of Scraper Beware. Before, site creators would have to manually pull together links from other sites and add that content to their own site. RSS feeds allow them to save a lot of time and effort. They will have to adopt a WSE tool to load in feeds and keep them in a holding pattern, then purposefully allow or bar content, creating a release lag. Maybe the objectionability of ads will spur feed users to be more selective instead of bombarding us with everything that someone else saw fit to add to an RSS feed.