May 13, 2014
Recent conversations on how Apache Stanbol, an EC research project, could both support the BloggingPortal reboot and use it to expand its language coverage led me to conTEXT, which
“… allows to semantically analyze text corpora (such as blogs, RSS/Atom feeds, Facebook, G+, Twitter or SlideWiki.org decks) and provides novel ways for browsing and visualizing the results….”
Sounds interesting, right? While there’s zero information (apart from a dreadful video) on the site, you can get an idea by giving it a whirl, which is exactly what I did with BloggingPortal’s ‘best of’ feed (before the entire site died over the weekend). Explore the result yourself here, or watch this short slidecast:
As you can see, conTEXT will automatically tag the content you give it, and provide faceted search and all sorts of nice visualisations.
But it's not a 'plug and play' solution to improving content discovery in the EU online public sphere:
- conTEXT will only grab the last items in a feed - we need to process 317,000-and-counting posts just to handle the legacy;
- it only processes the content aggregated by BloggingPortal (BP) - a script would be required to follow each item's link and semantically process the source content;
- there's no mention of an API;
- nor of multilingualism.
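To give an idea of what the script mentioned in the second point might involve, here is a minimal Python sketch. It assumes a plain RSS 2.0 feed; the function names, the sample feed and the example.org URLs are all made up for illustration, and nothing here comes from conTEXT or BloggingPortal. It only pulls out each item's link and downloads the page behind it; the semantic processing itself would still have to happen elsewhere.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def extract_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document (hypothetical helper)."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def fetch_source(url, timeout=10.0):
    """Download the original post, so the full text rather than the feed excerpt can be analysed."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Made-up feed, standing in for an aggregated 'best of' feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample</title>
<item><title>Post one</title><link>http://example.org/post-1</link></item>
<item><title>Post two</title><link> http://example.org/post-2 </link></item>
<item><title>No link</title></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE_FEED):
        # In a real run, fetch_source(url) would be called here and its
        # output handed to whatever tool does the semantic analysis.
        print(url)
```

Even a toy like this shows where the real work lies: handling hundreds of thousands of legacy posts, dead links and many languages is a different matter from following a couple of URLs.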
In fact, there's absolutely zero documentation on the conTEXT site, except for an academic paper (pdf) hidden inside a popup. It's research for researchers.
Along with my last post about Stanbol, this shows that the technologies are there ... but they're stuck in the lab, with no one seemingly interested in applying them to the real world. The EC and national governments seem happy to finance research, but then seem to lose interest when it comes to applying the results to benefit society.
A shame. Tools like this could become building blocks for genuinely useful public spaces and online communities, but before that happens someone will need to provide something non-technical people can use.
Mathew Lowry