Project KHRESMOI: Challenges in ehealth search

Published Sep 13, 2010 by Katja Mayer, Henning Müller

Started on 1 September 2010, the Khresmoi project aims to develop a multilingual, multimodal search and access system for biomedical information and documents. We met with Henning Müller, the project coordinator (University of Applied Sciences, Western Switzerland), to talk about the objectives and challenges of the project.


Project PLuTO: making the lives of patent users easier

Published Aug 31, 2010 by Katja Mayer, Aalt Van De Kuilen

The EU research project PLuTO (EU-PSP-ICT) aims to overcome language barriers in the patent domain by providing an integrated online translation tool. With it, human experts (technical, legal, consultants) can draw on existing web content and state-of-the-art, data-driven machine translation and information retrieval tools to collaboratively retrieve and translate patents. Patent information specialist Aalt van de Kuilen talks about the participation of the WON user group (NL) in such a large-scale project and his expectations of the involvement of the European Patent Office.


Scoring and Ranking Techniques - tf-idf term weighting and cosine similarity

Published Mar 31, 2010 by Michael Dittenbach

What mechanisms determine which documents are retrieved, and how is the relevance score that ultimately determines the ranking calculated?
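As a rough illustration of the two techniques named in the title, the sketch below weights terms by tf-idf and ranks documents by cosine similarity to a query. It is a minimal sketch under stated assumptions, not the method described in the article: the function names (`build_idf`, `weight`, `cosine`, `rank`), the raw-count tf, the unsmoothed `log(N / df)` idf, and the pre-tokenized toy documents are all chosen here for illustration.

```python
import math
from collections import Counter


def build_idf(docs):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def weight(tokens, idf):
    """Build a sparse tf-idf vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return (score, doc_index) pairs, highest score first."""
    idf = build_idf(docs)
    q_vec = weight(query, idf)
    scored = [(cosine(q_vec, weight(doc, idf)), i) for i, doc in enumerate(docs)]
    return sorted(scored, reverse=True)


# Hypothetical toy collection, already tokenized.
docs = [
    ["patent", "search", "engine"],
    ["patent", "translation"],
    ["cloud", "computing"],
]
ranking = rank(["patent", "search"], docs)
```

Here the first document shares both query terms and ranks highest, the second shares only the common term "patent", and the third shares none and scores zero.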


Hot Topic: Cloud computing

Published Mar 12, 2010 by Michael Kohl

"Nature is a mutable cloud which is always and never the same." - Ralph Waldo Emerson. Like nature in Emerson's quote, information technology too is an ever-changing topic. To deal with the constant flux of demands and challenges, IT experts over the years came up with various strategies. In the process many holy grails have been found, only to be discarded again shortly thereafter. But no matter how resistant to these fads your company usually is, chances are that cloud computing will be on your agenda in some form in 2010.


Anatomy of the ASPIRE'10 Collection

Published Mar 9, 2010 by Veronika Zenz

In this article we present an analysis of the anatomy of the ASPIRE patent corpus. We report the number of unique terms in this corpus, the average term frequency, the distribution of unique terms over document frequencies, and an analysis of different term types. These statistics are then compared to the New York Times Annotated Corpus, a collection of newspaper articles, and the differences between the two corpora are highlighted.
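The kind of statistics listed above can be sketched in a few lines. This is only an illustration under assumptions: the function name `corpus_stats` is hypothetical, documents are assumed to be pre-tokenized, and "average term frequency" is taken here to mean total token count divided by the number of unique terms, which may differ from the definition used in the article.

```python
from collections import Counter


def corpus_stats(docs):
    """Return (unique terms, average term frequency, document-frequency histogram)."""
    # Collection frequency: total occurrences of each term across all documents.
    cf = Counter(term for doc in docs for term in doc)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    unique_terms = len(cf)
    avg_term_freq = sum(cf.values()) / unique_terms if unique_terms else 0.0
    # Histogram: how many unique terms occur in exactly k documents.
    df_histogram = dict(Counter(df.values()))
    return unique_terms, avg_term_freq, df_histogram


# Hypothetical toy corpus of two tokenized documents.
unique_terms, avg_term_freq, df_histogram = corpus_stats(
    [["a", "b", "a"], ["a", "c"]]
)
```

Running the same function over two corpora gives directly comparable figures, which is the sort of side-by-side comparison the abstract describes.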


Machine translation – 3 buzzwords, Mr. Spock and the Babel fish

Published Mar 4, 2010 by Andreas Tuerk

Machine translation is one of many technologies that are heavily used in sci-fi stories. While we are still waiting for some of these fictional technologies to come to fruition, machine translation has made steady progress over the last decades. In many cases the performance of today's machine translation might pale in comparison to the Babel fish or the Universal Translator in Star Trek, but it can still be of great value in many situations.


Free Patent Information ... Sustainable Sources for Information Professionals?

Published Jan 28, 2010 by Thomas E Wolff

The allure of copyright-free patent data has resulted in the development of countless free patent databases on the World Wide Web. What are the business models for the commercial patent information providers?


Beyond the document

Published Jan 11, 2010 by Erik Graf

What is a document? Ordinarily the word 'document' describes a textual record, a piece of writing that conveys information. In this sense we quite naturally think of and perceive documents first and foremost as sheets of paper covered with sequences of characters: static, self-contained entities that we encounter in daily life in the form of scientific papers, tax records, blog entries, and shopping lists. Based on an exploration of research in Information Retrieval, this article sheds light on a steadily advancing trend that has blurred the boundaries of documents, is slowly redefining how they are perceived, and keeps information scientists busy trying to clarify the very question raised at the beginning of this article: What is a document?