HTML-to-text conversion written in C with a Python wrapper; a component of a full-text search application being written on top of the Xappy Python interface to Xapian.
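For a sense of what HTML-to-text conversion involves, here is a minimal pure-Python sketch using only the standard library's html.parser. It is not the C implementation described above; the names TextExtractor and html_to_text are made up for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with a space keeps adjacent block elements from running together, at the cost of splitting a word that is broken up by inline tags. A real indexer would probably want to handle block and inline elements differently.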
Webstemmer is a web crawler and HTML layout analyzer. It extracts articles from news sites as plain text and automatically removes banners, ads and navigation links. Given only the URL of a site's top page, it runs almost fully automatically, with little human intervention.
Based on PyGreSQL but feels more solid. Includes a bulkload() capability that likely uses PostgreSQL's COPY command, which is very fast for large initial imports.
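Whether bulkload() really uses COPY is only a guess. If it does, the rows have to be serialized into the text format that COPY ... FROM STDIN reads by default: tab-separated columns, one row per line, \N for NULL, and backslash escapes for special characters. A rough sketch of that serialization, with hypothetical helper names copy_escape and rows_to_copy_text (this is not the driver's code):

```python
def copy_escape(value):
    """Render one value in PostgreSQL's default COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Escape the backslash first so later escapes are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_text(rows):
    """Serialize an iterable of row tuples: tab-separated, newline-terminated."""
    return "".join(
        "\t".join(copy_escape(v) for v in row) + "\n" for row in rows
    )
```

The speed advantage comes from sending all rows through one COPY statement rather than issuing one INSERT per row.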
Supports prepared statements, unlike psycopg2; all in all pretty solid despite being a new offshoot.