Looks like the Semantic Web hurts Russell Beattie's branes. Hurts mine too. But, I tried explaining what I think I understand in a comment on his blog and I figure it's worth reposting here for ridicule and correction:

Did you happen to catch Tim Berners-Lee on NPR Science Friday today? Not sure if you get the broadcast there, or listen to the stream. He was expounding on the Semantic Web a bit.

Maybe I'll take a shot at explaining, since I think I understand the idea. Likely I'll fail miserably, but here goes.

First simple thing: Look at your weblog page. What would it take to extract the list of people from your blogroll, just given your URL? What about the titles of all the weblog posts on that page?

You, personally, can extract that information very easily since you, as a learned human, grasp the semantics of the page quite quickly. (The semantics are, basically, what's what and what's where and what does it all describe.)

Imagine a document containing exactly the same info your weblog page presents - only with the data completely, easily accessible to a robot, in a universal, easily handled format.
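To make that concrete, here's a sketch of what such a machine-readable version of the page might look like. The real Semantic Web formats are RDF-shaped; I'm faking the spirit of it here with JSON in Python, and all the field names and entries are made up:

```python
import json

# A hypothetical machine-readable twin of the weblog page.
# Field names ("blogroll", "posts") are illustrative, not a real spec.
page_data = json.loads("""
{
  "title": "Example Weblog",
  "blogroll": ["Alice", "Bob", "Carol"],
  "posts": [
    {"title": "First post", "date": "2003-05-01"},
    {"title": "Second post", "date": "2003-05-03"}
  ]
}
""")

# A robot can now pull out the blogroll and the post titles trivially -
# no scraping, no regex, no guessing at the page's visual layout.
blogroll = page_data["blogroll"]
post_titles = [post["title"] for post in page_data["posts"]]
print(blogroll)
print(post_titles)
```

The point isn't the particular syntax; it's that the robot gets the same what's-what and what's-where that you get by looking at the rendered page.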

Furthermore, imagine that the schema describing the data to be found on your page is in that same format. And then, imagine that the document describing the construction of schemas is in that same format. And then imagine that the decomposition continues, all the way down to base data types and relationships. Eventually, the whole thing rests on the back of a turtle - er I mean a sort of universal client.
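Here's a toy version of that turtle stack, again in Python with made-up conventions: a schema that is itself plain data, and a client that only has a couple of base types hard-wired in - everything above that it learns by chasing the schema:

```python
# The only things this toy client understands natively: two base types.
# Everything richer is described by schemas written as ordinary data.
BASE_TYPES = {
    "string": lambda v: isinstance(v, str),
    "date": lambda v: isinstance(v, str) and len(v.split("-")) == 3,
}

def conforms(value, schema):
    """Walk a value and a schema (both plain data) in parallel."""
    if isinstance(schema, str):      # a base type the client knows natively
        return BASE_TYPES[schema](value)
    if isinstance(schema, list):     # a list whose items all match one shape
        return isinstance(value, list) and all(conforms(v, schema[0]) for v in value)
    if isinstance(schema, dict):     # a record with named, typed fields
        return isinstance(value, dict) and all(
            k in value and conforms(value[k], s) for k, s in schema.items()
        )
    return False

# The weblog page's schema, expressed in the same format as the data it describes.
page_schema = {"blogroll": ["string"], "posts": [{"title": "string", "date": "date"}]}
page = {"blogroll": ["Alice"], "posts": [{"title": "Hi", "date": "2003-05-01"}]}
print(conforms(page, page_schema))
```

The decomposition bottoms out at the `BASE_TYPES` table - that's the "level of understanding implemented in the client."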

Now, what if every single page on the web were available in this manner? No scraping, no regex, no tricks. I could use the entire web as a database and execute queries that draw on data from a myriad of disparate URLs. My client can figure out what to do with whatever it finds at a URL by chasing descriptions and meta-descriptions until it reaches the level of understanding implemented in the client.

Going out on a limb here, but imagine a practical example: "Hello computer, find me 2 roundtrip tickets for 7 days anytime in the next 10 weeks, for under US$300 each, to a vacation spot where the weather this time of year is usually warm and sunny, the exchange rate is better than 3 to 1 US dollar, and is rated as better than average by Ann Arbor, MI bloggers."

Assume my semantic web client knows some URLs to airlines, to international weather services, to exchange rates, and to vacation spot reviews in weblogs in Ann Arbor, MI. Assume that there are schema available for the things these URLs describe. Assume that my semantic web client can parse my natural language query.

So, it takes my request, goes out and snags the URLs appropriate to the various topics involved. Once it has all it needs to process the data at each URL, it can find me the answer to my query, based on data pulled from all over the place.
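Once all those sources speak the same format, the query itself is almost boring. Here's the travel request reduced to a sketch, with the "web" mocked up as local Python data - every destination, fare, and rating below is invented for illustration:

```python
# Mock results the client would really fetch from airline, weather,
# exchange-rate, and weblog-review URLs. All numbers are made up.
fares = {"Cancun": 280, "Reykjavik": 450, "Lisbon": 295}       # roundtrip, USD
weather = {"Cancun": "warm and sunny", "Reykjavik": "cold",
           "Lisbon": "warm and sunny"}                          # typical, this time of year
exchange_rate = {"Cancun": 10.5, "Reykjavik": 0.9, "Lisbon": 0.9}  # local units per USD
blogger_rating = {"Cancun": 4.2, "Reykjavik": 3.8, "Lisbon": 3.1}  # Ann Arbor bloggers, /5

average_rating = 3.5
matches = [
    spot for spot in fares
    if fares[spot] < 300
    and weather[spot] == "warm and sunny"
    and exchange_rate[spot] > 3
    and blogger_rating[spot] > average_rating
]
print(matches)
```

The hard part isn't this join - it's getting four unrelated sites to publish their data with schemas a client can chase.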

Now, get nuttier and bring in some intelligence with robots that can do some inference and reasoning. Say I throw out some facts: Mammals breathe oxygen. Men are mammals. Joe is a man. With the right client, the query "Give me all oxygen breathers" will include Joe in its results.
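That kind of reasoning is just repeated rule application. A minimal sketch of the Joe example, using subject-predicate-object triples and forward chaining (the vocabulary here is ad hoc, not any real ontology language):

```python
# One starting fact, as a (subject, predicate, object) triple.
facts = {("Joe", "is_a", "man")}

# Rules of the form: if X has (predicate1, object1), infer (predicate2, object2).
rules = [
    (("is_a", "man"), ("is_a", "mammal")),         # men are mammals
    (("is_a", "mammal"), ("breathes", "oxygen")),  # mammals breathe oxygen
]

# Forward-chain: keep applying rules until no new facts appear.
changed = True
while changed:
    changed = False
    for (p1, o1), (p2, o2) in rules:
        for s, p, o in list(facts):
            if p == p1 and o == o1 and (s, p2, o2) not in facts:
                facts.add((s, p2, o2))
                changed = True

oxygen_breathers = [s for s, p, o in facts if p == "breathes" and o == "oxygen"]
print(oxygen_breathers)
```

Two rule firings later, Joe breathes oxygen, and the "all oxygen breathers" query finds him without anyone ever having stated that fact directly.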

Whew. There. That's what I think I understand about the Semantic Web.


Archived Comments

  • Yope! This explanation should make the concept pretty clear for everyone... But there's one little thing I wonder about: you assume so much about what your semantic web client can do, that you could actually go on assuming it can grasp the semantics of any web page without any further tagging, couldn't you? just kidding :-))
  • For an extension of the semantic web to word meanings you may want to look at: http://jorl.com/inventions/num/index.htm Probably the best, first use of this idea would be to add a duplicate metatag in webpages, for keywords, giving the "disambiguating" "number-words" corresponding to the keywords, and helping everybody get the search results they want. No more finding out about Mercury the car, planet, and God when you wanted to know whether your teeth were poisoning you...