Scraping with web services: Success
Okay, so I took another shot at scraping HTML with web services, this time using a site that passes the HTML Tidy step. Luckily, it's a site I already scrape with my own tool, so I have XPath expressions already cooked up to dig out the info for RSS items. So, here are the vitals:
- Site: http://www.jlist.com
- XSL: http://www.decafbad.com/jlist.xsl
- Tidy URL: http://cgi.w3.org/cgi-bin/tidy?
- Final URL: http://www.w3.org/2000/06/webdata/xslt?
xslfile=http%3A%2F%2Fwww.decafbad.com%2Fjlist.xsl&
xmlfile=http%3A%2F%2Fcgi.w3.org%2Fcgi-bin%2Ftidy%3F
docAddr%3Dhttp%253A%252F%252Fwww.jlist.com%252FUPDATES%252FPG%252F365%252F&
transform=Submit
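
The Final URL above is just the Tidy URL nested inside the XSLT service URL, which is why the page address ends up percent-encoded twice (`%253A` rather than `%3A`). Here's a minimal sketch of how that nesting could be built in Python. The function name `build_scrape_url` is my own invention for illustration, and the parameter names (`docAddr`, `xslfile`, `xmlfile`, `transform`) are simply the ones visible in the URL above. It only builds the string and doesn't fetch anything.

```python
from urllib.parse import urlencode

# Service endpoints as listed in the vitals above.
TIDY_URL = "http://cgi.w3.org/cgi-bin/tidy?"
XSLT_URL = "http://www.w3.org/2000/06/webdata/xslt?"


def build_scrape_url(page_url, xsl_url):
    """Chain Tidy and XSLT into one URL (hypothetical helper).

    The page address is encoded once as Tidy's docAddr parameter,
    and then the whole Tidy URL is encoded again as the XSLT
    service's xmlfile parameter, hence the double encoding.
    """
    # Step 1: ask Tidy to clean up the page (first round of encoding).
    tidy = TIDY_URL + urlencode({"docAddr": page_url})

    # Step 2: hand the Tidy URL to the XSLT service (second round).
    # A list of pairs keeps the parameters in the same order as above.
    params = [
        ("xslfile", xsl_url),
        ("xmlfile", tidy),
        ("transform", "Submit"),
    ]
    return XSLT_URL + urlencode(params)


if __name__ == "__main__":
    print(build_scrape_url(
        "http://www.jlist.com/UPDATES/PG/365/",
        "http://www.decafbad.com/jlist.xsl",
    ))
```

Run as a script, this should print the same Final URL as above, joined onto one line.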
Unfortunately, although the feed looks okay to me, it doesn't validate yet. I'm still poking around with it to get things straight. Feel free to help me out! :)