June 9, 2009
I stumbled upon a short video on the BBC of Tim Berners-Lee trying to explain the importance of the data web, aka semantic web, again. He himself says that he can’t say where it will lead us, as it is paradigm changing. True – but I can think of a few applications that anyone interested in the EU should know about.
I can’t embed the video here so you’ll just have to visit it.
I’ve seen him and others try to explain this many times, live and on video, but I’ve yet to see anyone really capture and convey the semantic web in a few short sentences. This is a problem, because the semantic web’s potential impact on today’s web is much greater than the impact today’s web has had so far.
Why is it so hard to explain? In the video, he says that the web as we know it today would have been difficult to explain in the 1980s, but I disagree. The paper publishing metaphor would have worked as an on-ramp for almost everybody (“it’s like publishing a book, but it’s free, and online, and you can link books together”). Many people would also have been able to grok the significance of online databases (which already existed), search engines (“kinda like being able to search all the card catalogues at your local library from your PC, but searching a lot more”) and so on.
But there are no available metaphors for the semantic web, so it’s harder to ‘get’ without being a techie.
So what is the Semantic Web?
With the semantic web, the computing infrastructure – your PC, the websites you use – ‘understands’ the data it’s processing, and so can take data from across the web and process it for you, extracting meaning you couldn’t have gotten yourself without a hell of a lot of work.
This data has to be ‘semantically encoded’, following now well-developed standards. If everything were published according to these standards, it would essentially turn the entire web into one sophisticated database, rather than the collection of pages, stand-alone databases and so on that it is now.
I put ‘understand’ in quotes because we’re not talking artificial intelligence. Instead, information is encoded according to the publisher’s ontologies (like a category list); the ontologies themselves are published on the web; and they can be linked together, including across linguistic frontiers.
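For the techies, here is a minimal sketch of that idea in Python. It is not how any real semantic web toolkit works: the facts are plain (subject, predicate, object) tuples, and every identifier, place name and figure in it is made up for illustration. It only shows two publishers using their own ontologies, a published mapping linking those ontologies across a language boundary, and a single query running over both sources.

```python
# Each fact is a (subject, predicate, object) triple.
# All names and numbers are invented purely for illustration.
site_a = [
    ("ex:Alphaville", "a:type", "a:City"),
    ("ex:Alphaville", "a:population", 120_000),
]
site_b = [
    ("ex:Betaville", "b:type", "b:Ville"),
    ("ex:Betaville", "b:habitants", 45_000),
]

# A hypothetical published mapping that links publisher B's
# (French-language) ontology to publisher A's.
same_as = {
    "b:type": "a:type",
    "b:Ville": "a:City",
    "b:habitants": "a:population",
}


def normalise(triples, mapping):
    """Rewrite every term through the mapping; unmapped terms pass through."""
    return [tuple(mapping.get(term, term) for term in triple) for triple in triples]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Treat both sites as one database and ask one question of it.
web = normalise(site_a + site_b, same_as)
cities = sorted(s for s, _, _ in match(web, p="a:type", o="a:City"))
populations = {s: o for s, _, o in match(web, p="a:population")}
```

Neither publisher had to change how it categorises its own data; only the mapping between the two ontologies had to be published.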
So what? Can you give me an example?
Well, look at it this way:
Q: When you look for information to answer your question today, what do you do?
A: I type keywords relevant to my question into Google.
Q: What do you get back?
A: A list of websites that might have the answer(s) to my question, because they feature the keywords.
Q: Then what?
A: I visit the sites and try to find the answer. I usually find myself copying numbers into an excel file if I really need to process the information, because it’s all just text.
With the semantic web, you ask your question, and the Google-equivalent ‘reads’ the semantic web, compiles and compares the data from various sources, and gives you an answer.
If this looks like a gross simplification, that’s because it is. The information processing involved is quite complex – but then so is the Google algorithm. Given the data, the applications would follow. But the data won’t come without the applications to process it, giving us a classic chicken-and-egg problem.
The SW and the EU
I saw TBL try to explain the potential of the semantic web to various people in the EU institutions a few years ago, but no one in his audience was an IT engineer, so I don’t think many grasped what this could mean for most EU policies.
Can you imagine what it could mean for policymakers if they could quickly find out who was doing what across the EU in research, environmental protection, social policy, and a hundred other fields, and then process and query this information as easily as they use Google?
Currently, this sort of information is painfully and slowly extracted out of national and regional bodies by armies of consultants, brigades of steering committees and armoured divisions of task forces. Attempts to standardise data formats – e.g., “all EU countries must publish their research data following these categories on this website” – consistently fail, because each country categorises its information its own way.
But if national data were published semantically, countries could still publish it as they see fit, and the semantic web could still bring it together as quickly as Google collects pages for you to read. More time could be spent on analysis, and less on collection. The structuring effect on everything from European research to the single market could be profound.
The European Commission, moreover, both has the most to gain and is (theoretically) in the best position to prime the pump and overcome the chicken-and-egg problem.
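To make that concrete, here is a rough Python sketch under some loud assumptions: the three national datasets, the project identifiers, the category labels and the "shared:" concepts are all hypothetical, and real semantic publishing would use proper standards rather than Python dictionaries. The point is only that each country keeps its own categories, publishes a link from each category to a shared concept, and the "who is doing what" overview then falls out automatically.

```python
# Hypothetical national research listings, each in its own categories.
published = {
    "FR": [("fr-001", "énergie solaire"), ("fr-002", "santé publique")],
    "DE": [("de-001", "Solarenergie"), ("de-002", "Solarenergie")],
    "PL": [("pl-001", "zdrowie publiczne")],
}

# Hypothetical links from each national category to a shared concept.
# In the semantic web these links would themselves be published online.
ontology_links = {
    ("FR", "énergie solaire"): "shared:solar-energy",
    ("FR", "santé publique"): "shared:public-health",
    ("DE", "Solarenergie"): "shared:solar-energy",
    ("PL", "zdrowie publiczne"): "shared:public-health",
}


def who_does_what(published, links):
    """Group (country, project) pairs under the shared concept they map to."""
    overview = {}
    for country, records in published.items():
        for project, local_category in records:
            concept = links.get((country, local_category))
            if concept is None:
                continue  # no published link yet, so this record stays unmatched
            overview.setdefault(concept, []).append((country, project))
    return overview


overview = who_does_what(published, ontology_links)
solar_countries = sorted({country for country, _ in overview["shared:solar-energy"]})
```

Nobody was forced into a common data format here; the only shared artefact is the set of links between categories.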
Further reading: some semantic web bookmarks
Mathew Lowry