Toolman in the Toyshop of the World: The Semantic Web

Have you ever stopped to ask yourself, "What is the Semantic Web?" Here I quote the introduction from the W3C working group:

The vision of the Semantic Web is to extend principles of the Web from documents to data. This extension will allow to fulfill more of the Web’s potential, in that it will allow data to be shared effectively by wider communities, and to be processed automatically by tools as well as manually.

The Semantic Web allows two things.

It allows data to be surfaced in the form of real data, so that a program doesn’t have to strip the formatting and pictures and ads off a Web page and guess where the data on it is.
it allows people to write (or generate) files which explain—to a machine—the relationship between different sets of data. For example, one is able to make a “semantic link” between a database with a “zip-code” column and a form with a “zip” field that they actually mean the same – they are the same abstract concept. This allows machines to follow links and hence automatically integrate data from many different sources.

Semantic Web technologies can be used in a variety of application areas; for example: in data integration, whereby data in various locations and various formats can be integrated in one, seamless application; in resource discovery and classification to provide better, domain specific search engine capabilities; in cataloging for describing the content and content relationships available at a particular Web site, page, or digital library; by intelligent software agents to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical “document”; for describing intellectual property rights of Web pages (see, eg, the Creative Commons), and in many others.

So what we are talking about here is richer data, and ubiquitous or at least pervasive use of common or translatable markups. Technology is being used to collate data from different sources (data integration), to index for searches within specific domains (resource discovery, classification), and so on.
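As a rough illustration of the "zip-code" versus "zip" example from the quote, here is a minimal Python sketch of what a semantic link buys you. The source names, field names and the example.org concept URI are all made up for illustration; real systems would express the mapping in RDF rather than in a Python dictionary.

```python
# Hypothetical shared concept URI; example.org is a placeholder namespace.
ZIP_CONCEPT = "http://example.org/terms/postalCode"

# Each (made-up) source declares which of its own field names means the shared concept.
FIELD_MAPPINGS = {
    "customer_db": {"zip-code": ZIP_CONCEPT},
    "signup_form": {"zip": ZIP_CONCEPT},
}


def normalise(source, record):
    """Rename a record's fields to shared concept URIs where a mapping exists."""
    mapping = FIELD_MAPPINGS.get(source, {})
    return {mapping.get(field, field): value for field, value in record.items()}


db_row = normalise("customer_db", {"name": "Ann", "zip-code": "90210"})
form_row = normalise("signup_form", {"email": "ann@example.org", "zip": "90210"})

# Both records now expose the postcode under the same key, so they can be compared or joined.
same_postcode = db_row[ZIP_CONCEPT] == form_row[ZIP_CONCEPT]
print(same_postcode)
```

The point is only that once two differently named fields are declared to be the same concept, a program can integrate the records without guessing.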

RDF defines a technique for describing this kind of data, so it is fundamental to the Semantic Web as it exists now. It is a W3C standard and is in use. RDF data can be represented in many formats, the most common being XML and RDF triples. The different formats suit different data types, and tools exist to translate between them. This makes disparate data collatable in a common, interchangeable format.

RDF defines a triplet relationship for every association: subject, predicate, object. This means: this subject is related to this object by this predicate. Links are similar to those in HTML, but they are labeled, and they define relationships instead of jump links. There is no "current page"! Objects can be URIs or literals; subjects and predicates must be URIs.
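To make the subject, predicate, object idea concrete, here is a small plain-Python sketch. It is not a real RDF library: the Literal and Triple types, the helper names and the example.org person URIs are my own inventions, and the URI check is deliberately crude. The FOAF predicate URIs are real vocabulary terms, and the output lines follow the one-triple-per-line style with URIs in angle brackets and literals in quotes.

```python
from typing import NamedTuple, Union
from urllib.parse import urlparse


class Literal(NamedTuple):
    """A plain literal value, as opposed to a URI."""
    value: str


class Triple(NamedTuple):
    subject: str                # must be a URI
    predicate: str              # must be a URI
    obj: Union[str, Literal]    # a URI or a literal


def is_uri(term) -> bool:
    """Rough check only: a string with a scheme and something after it."""
    if not isinstance(term, str):
        return False
    parts = urlparse(term)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def make_triple(subject, predicate, obj) -> Triple:
    """Build a triple, enforcing the rules described in the text above."""
    if not is_uri(subject):
        raise ValueError("subject must be a URI")
    if not is_uri(predicate):
        raise ValueError("predicate must be a URI")
    if not (is_uri(obj) or isinstance(obj, Literal)):
        raise ValueError("object must be a URI or a Literal")
    return Triple(subject, predicate, obj)


def to_line(triple: Triple) -> str:
    """Write one triple as a single line of text."""
    if isinstance(triple.obj, Literal):
        escaped = triple.obj.value.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


ann = "http://example.org/people/ann"   # hypothetical URIs for two people
bob = "http://example.org/people/bob"

graph = [
    make_triple(ann, "http://xmlns.com/foaf/0.1/name", Literal("Ann")),
    make_triple(ann, "http://xmlns.com/foaf/0.1/knows", bob),
]

for triple in graph:
    print(to_line(triple))
```

Note that the second triple is a labeled link from one resource to another, which is exactly what an HTML link cannot express.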

RDF can be referenced from an HTML page, and in some senses it is like meta tags, but it is more structured because it is a shared resource with a shared structure. The whole idea is for RDF to be shared by lots of systems by being publicly referenced.

The Semantic Web is an extension of the current Web and not its replacement. Islands of RDF and possibly related ontologies can be developed incrementally. Major application areas (like Health Care and Life Sciences) may choose to “locally” adopt Semantic Web technologies, and this can then spread over the Web in general. In other words, one should not think in terms of “rebuilding” the Web.

This kind of data structure is different from XML Schemas, which are specific to a business transaction, protocol or similar. RDF is a knowledge representation, not a message format. Because the RDF is shared, it is reusable for reasoning by other systems, which gives them knowledge outside of their own ontological domain. Folksonomies or "tags" are not quite the same. They are unstructured and do not completely define the triplet relationship defined earlier.

Microformats, embedded in HTML, are very small data structures in comparison and so have not used RDF. There are tools to bridge them to RDF (GRDDL). To be involved in the Semantic Web you only need to provide RDF data access to your structures. This can be done "on the fly" and probably often is; most data is not stored in RDF!
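Here is a hedged sketch of what "on the fly" could look like: ordinary rows (as they might come out of a database) are turned into triples only when someone asks for them. The example.org namespace, the URI pattern and the "books" table are assumptions I made up for the example, not a standard mapping.

```python
BASE = "http://example.org/"  # assumed namespace, purely illustrative


def row_to_triples(table, key_column, row):
    """Yield (subject, predicate, literal) tuples for one ordinary data row."""
    # The row's key becomes part of the subject URI.
    subject = f"{BASE}{table}/{row[key_column]}"
    for column, value in row.items():
        if column == key_column:
            continue
        # Each remaining column name becomes a predicate URI.
        predicate = f"{BASE}{table}#{column}"
        yield (subject, predicate, str(value))


# Hypothetical rows, stored in whatever non-RDF form the system already uses.
rows = [
    {"id": 1, "title": "Dune", "year": 1965},
    {"id": 2, "title": "Neuromancer", "year": 1984},
]

triples = [t for row in rows for t in row_to_triples("books", "id", row)]

for subject, predicate, literal in triples:
    print(subject, predicate, repr(literal))
```

The stored data never changes shape; only the view offered to the outside world is RDF.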

Web2.0 has the spirit of the Semantic Web: the sharing of disparate data sources, and web mashups. Not all are implemented cleanly, but RDF can be, and is, used in some (mainly larger, more structured) mashups.

There are catalog websites listing millions of live RDF services.

OWL is a language that sits on top of RDF and gives more information about the ontology of the RDF items. It is a richer language that can specify things such as cardinality and other characteristics of relationships. It is very useful for knowledge representation and for searching over multiple sources. It is rich enough to have classes and instances (individuals), and it maps their relationships.
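The classes-and-individuals idea can be sketched in a few lines of Python. This is not OWL itself and nowhere near a real reasoner (it ignores cardinality and every other property characteristic); it only shows how an individual typed as a subclass can be found when asking for a superclass. The rdf:type and rdfs:subClassOf URIs are the real ones; the zoo vocabulary is invented.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/zoo#"  # hypothetical vocabulary for the example

triples = {
    (EX + "Dog", RDFS_SUBCLASS_OF, EX + "Mammal"),
    (EX + "Mammal", RDFS_SUBCLASS_OF, EX + "Animal"),
    (EX + "rex", RDF_TYPE, EX + "Dog"),        # an individual
    (EX + "tweety", RDF_TYPE, EX + "Animal"),  # another individual
}


def superclasses(cls, triples):
    """Return every class reachable from cls by following subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def instances_of(cls, triples):
    """Return individuals typed as cls directly or via one of its subclasses."""
    result = set()
    for s, p, o in triples:
        if p == RDF_TYPE and (o == cls or cls in superclasses(o, triples)):
            result.add(s)
    return result


animals = instances_of(EX + "Animal", triples)
print(sorted(animals))
```

Nothing in the data says rex is an Animal; that fact is derived from the class relationships, which is the kind of reasoning a shared ontology makes possible.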

Those of you who read and understood some of it might be wondering why I'm writing about the Semantic Web. I accidentally put the words "Semantic web evangelist" on my CV when I really meant "Web standards evangelist" - oops! I do love the semantic web ideals, but haven't followed it rigorously or had a chance to build anything with it. Also, it was really late when I finished my CV...

XHTML, Web2.0 and microformats are the thin edge of the semantic wedge, but they are in use and start the migration towards a richer web. I needed to refresh my mind as to some of the more technical aspects of the Semantic Web, so I thought I'd do a writeup so I had something to refer to. Some nerd out there might find it interesting (looking at you, Sam).

I might point it out at interviews.

Sunday, 3 February 2008
