"There's lots of useful data on the internet - crime statistics, government spending, missing kittens. But getting at it isn't always easy. There's a table here, a report there, web pages, PDFs, spreadsheets... And it can be scattered over thousands of different places on the web, making it hard to see the whole picture and the story behind it...ScraperWiki is an online tool to make that process simpler and more collaborative. Anyone can write a screen scraper using the online editor, and the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code. And, if you're not a programmer yourself, you can request a scraper or ask the ScraperWiki team to write one for you."
"Imagine the sensory overload of a walk in the park. Every path shimmers with the flow of humanity. Every person drips with the scent of information: experience, opinion, karma, contacts. Every tree has a story: taxonomies and ontologies form bright lattices of logic. Desire lines flicker with unthinkable complexity in this consensual hallucination of space and non-space, a delicious yet overwhelming sociosemantic experience."
Webstemmer is a web crawler and HTML layout analyzer. It extracts articles from news sites as plain text and automatically removes banners, ads and navigation links. You only need to supply the URL of a site's top page; from there it runs almost fully automatically, with little human intervention.
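To give a feel for what "removing navigation links" can involve, here is a minimal sketch in Python. It is not Webstemmer's algorithm, which is not described here. It swaps in a much cruder link-density heuristic: split the page into blocks, then drop blocks that are very short or consist mostly of link text. The names `BlockCollector` and `extract_main_text`, the thresholds, and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption; real pages vary widely).
BLOCK_TAGS = {"p", "div", "li", "td", "tr", "table", "ul", "ol",
              "h1", "h2", "h3", "article", "section"}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style"}


class BlockCollector(HTMLParser):
    """Collects (text, link_chars) pairs, one per block of the page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._parts = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Normalise whitespace and store the block if it has any text.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, max_link_density=0.5, min_chars=25):
    """Return blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return [text for text, link_chars in parser.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]


# Hypothetical page: a navigation list, one article paragraph, one ad link.
SAMPLE_HTML = """
<html><body>
<ul><li><a href="/">Home</a></li><li><a href="/world">World news</a></li></ul>
<div><p>The city council voted on Tuesday to expand the park, citing strong public support.</p></div>
<p><a href="/ads">Click here for a great deal on shoes today only</a></p>
<script>var x = "tracking code that should never appear";</script>
</body></html>
"""

if __name__ == "__main__":
    for block in extract_main_text(SAMPLE_HTML):
        print(block)
```

On the sample page, only the article paragraph survives: the navigation items are too short, the ad is entirely link text, and the script contents are skipped. A heuristic like this needs per-site tuning of its thresholds, which is the kind of manual effort a layout analyzer aims to avoid.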