I don't know why you want to do this, but if it's a regular thing where you want to extract some information from a webpage at regular intervals, then a better choice might be Python coupled with something like Beautiful Soup. It's pretty easy to use even if you don't know Python (it took me about 15 minutes to parse a webpage in exactly the way I wanted, and I'd never used Python before). You can tell it (copied from the website) "Find all the links", or "Find all the links of class externalLink", or "Find all the links whose urls match 'foo.com'", or "Find the table heading that's got bold text, then give me that text." I find it perfect for screen scraping. It also doesn't choke on invalid XML.
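To give a rough idea of what those four queries might look like, here is a minimal sketch. It assumes the current `bs4` package (installed with `pip install beautifulsoup4`) and Python's built-in `html.parser` backend; the sample page, its URLs, and the variable names are all made up for illustration, so adapt them to whatever page you are actually scraping.

```python
import re

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Made-up sample page (an assumption for this sketch).
# The <p> tag is deliberately left unclosed to mimic sloppy markup.
HTML = """
<html><body>
<p>Some links:
<a href="http://foo.com/page">Foo</a>
<a class="externalLink" href="http://example.org/">Example</a>
<a href="/local">Local</a>
<table><tr><th>Plain</th><th><b>Bold heading</b></th></tr></table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "Find all the links"
all_links = [a["href"] for a in soup.find_all("a")]

# "Find all the links of class externalLink"
external = [a["href"] for a in soup.find_all("a", class_="externalLink")]

# "Find all the links whose urls match foo.com"
foo_links = [a["href"] for a in soup.find_all("a", href=re.compile(r"foo\.com"))]

# "Find the table heading that's got bold text, then give me that text"
bold_th = next(th for th in soup.find_all("th") if th.find("b"))
bold_heading = bold_th.b.get_text()

print(all_links)
print(external)
print(foo_links)
print(bold_heading)
```

For a real page you would fetch the HTML first (for example with `urllib.request` from the standard library) and pass the downloaded text to `BeautifulSoup` in place of the `HTML` string.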



