Semantic data extractor (HTML webpage data miner from W3.org)

Started by Darren Dirt, August 29, 2006, 10:41:18 AM


Darren Dirt

http://www.w3.org/2003/12/semantic-extractor

Quote
Semantic data extractor
This tool, implemented using an XSLT stylesheet, tries to extract some information from a semantically rich HTML document. It only uses information available through good use of the semantics defined in HTML.

The aim is to show that semantically rich HTML gives much more value to your code: it allows better use of CSS and makes your HTML intelligible to a wider range of user agents (especially search engine bots).

As an aside, it can give clues to user agent developers about hooks that could be interesting to add to their products.
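
To get a feel for what "extracting information from the semantics" can mean, here is a rough sketch of the idea. It is not the W3C tool (that one is an XSLT stylesheet); this swaps in Python's standard `html.parser` and only looks at a few hooks I picked myself: `<title>`, `<h1>`-`<h6>`, `<meta name=...>` and `<link rel=...>`. The class name `SemanticSketch`, the `extract` helper and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class SemanticSketch(HTMLParser):
    """Hypothetical helper: collects title, heading outline, meta and link data."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.outline = []      # list of (heading level, heading text)
        self.meta = {}         # meta name -> content
        self.links = []        # list of (rel, href)
        self._capture = None   # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in self.HEADINGS:
            # Start collecting the text inside this element.
            self._capture = tag
            self._buffer = []
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "link" and attrs.get("rel") and attrs.get("href"):
            self.links.append((attrs["rel"], attrs["href"]))

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capture:
            return  # ignore inline tags (e.g. <em>) nested in a heading
        # Collapse runs of whitespace in the collected text.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        else:
            self.outline.append((int(tag[1]), text))
        self._capture = None


def extract(html):
    """Parse an HTML string and return the populated SemanticSketch."""
    parser = SemanticSketch()
    parser.feed(html)
    parser.close()
    return parser


# Made-up sample page, just to exercise the sketch.
sample = """<html><head><title>Demo page</title>
<meta name="description" content="A tiny example">
<link rel="next" href="page2.html"></head>
<body><h1>Main <em>topic</em></h1><h2>Details</h2></body></html>"""

info = extract(sample)
print(info.title)
for level, text in info.outline:
    print("  " * (level - 1) + text)
```

The point is the same as in the quote: a page that uses real headings, a title and meta/link elements hands this kind of information over for free, while a page built from anonymous `<div>` and `<font>` tags gives a tool like this nothing to work with.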

Examples I tried:

itself

Mozilla.com

a typical page on "Ropin' The Web"

"http://en.wikipedia.org/wiki/Semantic_web" (strangely, it seems that Wikipedia uses "HTML tidy service" when serving out ANY page!?)

_____________________

Strive for progress. Not perfection.
_____________________