Hmm... now that I've finally stopped babbling and read the docs, I just noticed that the Google APIs have methods to access their cache.

Sounds like I need to write a personal HTTP proxy that includes "404 Correction": whenever it encounters a 404, it consults Google's cache instead. Could be a new project, too, since someone I was talking to wanted a searchable personal web browsing history, and I think a personal HTTP proxy could help with that.
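
Here is a rough sketch of what the core of that "404 Correction" logic might look like. It is only the decision step, not a working proxy. Everything named here is hypothetical: `fetch_live` and `fetch_cached` are stand-ins you would have to supply (the second one would wrap whatever cache lookup the Google APIs actually offer, which this sketch does not call), and the `history` list is just a placeholder for the browsing-history idea.

```python
from typing import Callable, List, Optional, Tuple

# A response is modeled as (status code, body). This is an assumption made
# for the sketch; a real proxy would also carry headers.
Response = Tuple[int, bytes]


def fetch_with_404_correction(
    url: str,
    fetch_live: Callable[[str], Response],
    fetch_cached: Callable[[str], Optional[bytes]],
    history: Optional[List[str]] = None,
) -> Response:
    """Fetch a URL, falling back to a cached copy if the live site says 404.

    fetch_live and fetch_cached are hypothetical callables supplied by the
    caller; fetch_cached returns None when no cached copy exists.
    """
    # Record every requested URL so the history could be searched later.
    if history is not None:
        history.append(url)

    status, body = fetch_live(url)
    if status != 404:
        # Anything other than a 404 is passed through untouched.
        return status, body

    cached = fetch_cached(url)
    if cached is None:
        # No cached copy either, so return the original 404.
        return status, body

    # Serve the cached copy in place of the missing page.
    return 200, cached
```

Passing the two fetchers in as arguments keeps the fallback rule separate from any particular cache, so a different archive could be swapped in without touching the proxy logic.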


Archived Comments

  • The problem is that it wouldn't help the most common case, which is where someone like Time magazine or the NYTimes moves their archives into a paid category. I never find those things in Google. (But I haven't looked that carefully, so I could be wrong...) (Hmm, I wonder whether the Wayback Machine holds such things? A quick check shows the NYTimes blocks robots from its content.) (Hmm, I wonder if the Wayback Machine has an API?)