SoupScraper alpha

intro

This is a scraper web service written in Python for Google App Engine, using BeautifulSoup and soupselect.py.

url scheme

scheme:
[scrape-requests]:
[scrape-request]:
[key]:

full example url:
/scrape/{items%5B%5D:table.sortable%20tr%20td%20a{title:~title,href:~href}}?url=http://de.wikipedia.org/wiki/Hamburg&callback=jsonpCallback
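
A request path like the full example url can be assembled programmatically. The following is a minimal sketch, not part of the service itself: the helper name `build_scrape_path` is hypothetical, and the choice of which characters to leave unescaped is an assumption read off the example url (braces, colons, commas and `~` stay literal, while `[`, `]` and spaces are percent-encoded). The target url is appended unescaped because the example url does the same.

```python
from urllib.parse import quote


def build_scrape_path(request, target_url, callback=None):
    """Build a /scrape/ path for a scrape request (hypothetical helper).

    Assumption: only characters such as '[', ']' and spaces need
    percent-encoding, as in the full example url of this README.
    """
    # Keep the request syntax characters readable; encode everything else.
    encoded = quote(request, safe="{}:,~.#")
    # The target url is appended as-is, mirroring the README example.
    path = "/scrape/" + encoded + "?url=" + target_url
    if callback:
        # Optional JSONP callback name, as in the README example.
        path += "&callback=" + callback
    return path


print(build_scrape_path(
    "{items[]:table.sortable tr td a{title:~title,href:~href}}",
    "http://de.wikipedia.org/wiki/Hamburg",
    callback="jsonpCallback",
))
```

A target url that carries its own query string would probably need to be percent-encoded as well; that case is not covered by the example url, so it is left out here.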

examples

get a list of links from a Rolling Stone article
/scrape/{
	links[]: div.brief-post-text a { 
		title: {TEXT},
		href: ~href
	}
}?url=http://www.rollingstone.com/rockdaily/index.php/2008/12/08/remembering-dimebag-darrell-abbott-on-the-anniversary-of-his-death/

a lot about Madonna
/scrape/{
	artist: div#view div#content {
		title: h1 {TEXT},
		bio: div#artist_bio {HTML},
		image: div.portrait img{
			src: ~src,
			width: ~width,
			height: ~height
		}
	},
	similars[]: div.rgt div.tpbox li {
		title: span.title {TEXT}
	}
}?url=http://uk.real.com/music/artist/Madonna/

get similar artists
/scrape/{
	links[]: div.tpbox ul li span.title a {
		title: {TEXT},
		href:~href
	}
}?url=http://uk.real.com/music/artist/Madonna/

get Hamburg's Bezirke (boroughs) from Wikipedia
/scrape/{
	bezirke[]: table.sortable tr td a {
		title:~title
	}
}?url=http://de.wikipedia.org/wiki/Hamburg
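
To make the request syntax more concrete, here is a rough sketch of what a list request such as `bezirke[]: table.sortable tr td a { title: ~title }` appears to ask for. It is an illustration under assumptions, not the service's actual code: it uses the `select()` method of the current `bs4` package as a stand-in for soupselect.py, it reads `~name` as "attribute `name`" and `{TEXT}` as "text content" based on the examples above, the function name `scrape_list` is hypothetical, and the sample HTML is invented.

```python
from bs4 import BeautifulSoup

# Invented sample markup, loosely modelled on the Wikipedia example.
SAMPLE_HTML = """
<table class="sortable">
  <tr><td><a href="/wiki/Hamburg-Mitte" title="Hamburg-Mitte">Mitte</a></td></tr>
  <tr><td><a href="/wiki/Altona" title="Altona">Altona</a></td></tr>
</table>
"""


def scrape_list(html, selector, fields):
    """Collect one dict per element matching the CSS selector.

    fields maps an output key to either '{TEXT}' (element text) or
    '~attr' (value of the attribute 'attr'). This mapping is an
    assumption inferred from the README examples.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for element in soup.select(selector):
        item = {}
        for key, spec in fields.items():
            if spec == "{TEXT}":
                item[key] = element.get_text()
            elif spec.startswith("~"):
                # Missing attributes yield None.
                item[key] = element.get(spec[1:])
        results.append(item)
    return results


bezirke = scrape_list(SAMPLE_HTML, "table.sortable tr td a", {"title": "~title"})
print(bezirke)
```

Nested requests such as the `artist` example, where a key has its own selector and sub-keys, are not handled by this sketch.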

todo

used libs

author

(c) 2008 by Mathias Leppich
muhqu.de