Mathew Lowry

Recent conversations on how Apache Stanbol, an EC research project, could both support the BloggingPortal reboot and use it to expand its language coverage led me to conTEXT, which

“… allows to semantically analyze text corpora (such as blogs, RSS/Atom feeds, Facebook, G+, Twitter or SlideWiki.org decks) and provides novel ways for browsing and visualizing the results….”

Sounds interesting, right? While there’s zero information (apart from a dreadful video) on the site, you can get an idea by giving it a whirl, which is exactly what I did with BloggingPortal’s ‘best of’ feed (before the entire site died over the weekend). Explore the result yourself here, or watch this short slidecast:

As you can see, conTEXT will automatically tag the content you give it, provide faceted search and all sorts of nice visualisations.

But it's not a 'plug and play' solution to improving content discovery in the EU online public sphere:

  • conTEXT will only grab the most recent items in a feed - we need to process 317,000-and-counting posts just to handle the legacy;
  • it only processes the content aggregated by BP - a script would be required to follow the link and semantically process the source content;
  • there's no mention of an API;
  • nor of multilingualism.
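To give a rough idea of what the "follow the link" script in the second point might involve, here is a minimal sketch in Python, using only the standard library. It assumes the feed is plain RSS 2.0 with one `<link>` per `<item>`. The function names (`item_links`, `html_to_text`, `fetch_html`, `source_texts`) are made up for illustration and have nothing to do with conTEXT or BloggingPortal code.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to a single string of visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def item_links(rss_xml):
    """Return the <link> of every <item> in an RSS 2.0 document (assumed format)."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link and link.strip():
            links.append(link.strip())
    return links


def fetch_html(url, timeout=10.0):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def source_texts(rss_xml, fetch=fetch_html):
    """Yield (url, full_text) for the source article behind each feed item."""
    for url in item_links(rss_xml):
        yield url, html_to_text(fetch(url))
```

The text yielded by `source_texts` is what would then need to be handed to a semantic analysis tool, instead of the first few lines that the aggregator stores. A real script would also need error handling, rate limiting and better boilerplate removal than this.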

In fact, there's absolutely zero documentation on the conTEXT site, excepting an academic paper (pdf) hidden inside a popup. It's research for researchers.

Along with my last post about Stanbol, this shows that the technologies are there ... but they're stuck in the lab, with no one seemingly interested in applying them to the real world. The EC and national governments seem happy to finance research, but then seem to lose interest when it comes to applying the results to benefit society.

A shame. Tools like this could become building blocks for genuinely useful public spaces and online communities, but before that happens someone will need to provide something non-technical people can use.


Comments

  1. Hi Mathew,
    Thanks for your interest in conTEXT. Actually, we are still working on conTEXT to improve its scalability and performance, and to make it more flexible (e.g. supporting APIs for third parties who are interested in using conTEXT as a service). Indeed, BloggingPortal as you described it could be a nice demonstrator for conTEXT, and we are interested in collaborating on that.
    As a starting point, I would suggest that we import a sample of posts (e.g. <2000 posts to ensure scalability) from BloggingPortal into conTEXT, analyze it and see the results. If the resulting insights are of interest to both of us, then we can think about the next steps.

    With regard to multilinguality, we are using DBpedia Spotlight (http://dbpedia-spotlight.github.io/demo/) as our NLP engine. Spotlight currently supports 9 EU languages. If we detect the blog language correctly, we can then forward it to the right DBpedia API for analysis.

    P.S. I just have to add: there have been many people from both academia and industry who were interested in applying conTEXT, and we are already collaborating with them. So it is not only research for researchers! It is basically in-use research…

  2. Hi Ali, many thanks for your comment and suggestion!

    One of my fellow editors just got http://bloggingportal.eu back onto its feet. Can you grab the BloggingPortal posts direct from this RSS feed, or do you need a database dump?

    And will you be grabbing each curated resource – i.e., each URL that each BP post points to – or are you just processing the BP posts themselves?
    As I mentioned, the posts on BP only contain the first few lines of each curated resource. An analysis will probably only make sense if your engine follows each link and processes the resources which each BP post points to.

    Looking forward to exploring this with you. And I hope to learn more about other applications in academia and industry. My point really only applied to the lack of information on the site itself 😉
