Position Paper - Hendler

From Atoms to OWLs -- the new ecosystem of the Web

Jim Hendler

It is becoming clear to me that at this workshop we will end up spending some time (maybe too much) exploring the issues of the Semantic Web and (or vs.) folksonomies, blogs, Flickr and the like. The problem is that these things are being put in opposition to each other, which is ridiculous and misleading. The differences between the microformats or tech.no.rati and the Semantic Web "folksie" stuff (like FOAF) have more to do with syntax than semantics, and it is clear that keyword-based approaches to text documents won't get us a whole lot further than Google searches. So I see these as non-competing approaches that complement each other well. However, I think there is a MUCH more important aspect of the future of the Web that is ignored in this discussion, and which I hope we will spend more time on -- how do we bring the rest of human knowledge, the stuff not yet on the Web, into the Web world? (In fact, I believe that Tim, Ora and I were really trying to push towards this when we wrote the widely cited "Semantic Web" article.)

Tim B-L is often cited as saying (in various online sources and in an article published in Japan) that for the data in our lives we are still pre-Web, and he is right. For example, there's an estimate that the majority of all business knowledge in the world is encoded in Excel spreadsheets, and another huge amount is in structured databases. These can be seen on the Web, as tables, but try searching for the information in them, or for which database has the info you are looking for, and you get almost no help. Even worse, try to cut and paste the info from one database into another, the way you can cut and paste (and link) text on the Web -- it can't be done. So Tim is right: for data we are still pre-Web.

In fact, this is also true of the content in images, videos and other multimedia sources. One of my PhD students is working on the following challenge problem: being able to search and query video on the Web to answer queries like "find the scene in the James Bond film where the guy throws his hat at the statue" or, even harder, "find a clip showing the Japanese response to the 9/11 attacks in the US." Without content-based metadata (and given that a picture is worth a thousand words, it would be nice to do as much as possible without typing those thousands of words) I contend this cannot easily be done, and with it, it can. For example, compare looking for NASA photos at the Semspace Semantic Web demo, or for the group photos on my group's home page, with the work at Flickr. We have work to do on tools (lots!) to get them in front of users, but we're working on getting this stuff into the real content-creation world (yay Adobe and their RDF support) and there's a lot of room for hope.

So let me reply to Hal Abelson and Craig Knoblock in advance. Craig argues for more use of statistics and text-type work. I say great, but let's see it for databases and multimedia and more -- there's a lot more stuff that is NOT on the Web than stuff that is. To Hal, who argues it's 80s AI all over again, I'll argue that not only did we learn the lessons from the mistakes of the 80s, but far more importantly we have learned the lessons from the architecture of the Web.

So, in conclusion, my belief is that if we are, as a computing agenda, going to look at the future of the Web, the key is to spend a lot more time thinking about what is not on the Web (but could be) than about the mere few billion pages on the current Web.