Discover how well-established information retrieval methods can optimize the management of large amounts of data within your organization.
Common IR techniques used by universities and libraries can be implemented in a company environment where huge quantities of unstructured information accumulate every day. Various studies show that, on average, an employee spends about one hour a day looking for the right information in emails, internal memos, presentations, texts, calculations or separate databases. This unproductive time can be reduced by indexing, annotating and/or classifying the documents.
Domain-specific IR applications have been developed to retrieve complex documents, such as patent and legal documentation, scientific and technical documents, or chemical structures. Current research projects in the fields of image retrieval, machine translation and information visualization address sophisticated retrieval challenges and form the basis for the development of customized solutions.
In most cases, implementing existing academic tools combined with well-established open source components will deliver a quick but sustainable increase in the efficiency of your IR processes. More specific retrieval challenges will require the setup of a dedicated R&D project. In both cases, the IRF will assist you with the necessary IR skills and resources to make the right decisions.
The introduction of appropriate IR techniques can also contribute substantially to speeding up your R&D, increasing the efficiency of your HR processes by introducing new ways to identify internal experts, or improving your sales and marketing processes with opinion mining. Please contact us to find out more!
The purpose of creating an index is to optimize the speed and the performance of finding relevant documents. Without an index, the search engine scans every document in the corpus, which requires considerable time and computing power. While an index of 10,000 documents, for example, can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours.
By creating appropriate indices, we improve the accessibility of different kinds of data within your entire organization, which increases the productivity of your staff.
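The difference between an indexed lookup and a sequential scan can be illustrated with a minimal sketch of an inverted index. The document ids, the sample texts and the function names below are hypothetical and chosen for illustration only; a production index would add ranking, stemming and persistent storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def scan(docs, query):
    """Baseline without an index: re-read every document for every query."""
    terms = set(tokenize(query))
    if not terms:
        return set()
    return {doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))}


# Hypothetical sample corpus.
docs = {
    "memo-1": "Quarterly sales report for the chemical division",
    "mail-7": "Patent filing deadline for the new chemical compound",
    "deck-3": "Sales presentation: patent portfolio overview",
}
index = build_index(docs)
print(sorted(search(index, "chemical patent")))  # prints ['mail-7']
```

Both functions return the same documents. The index is built once, so each query only touches the entries for its own terms, whereas the scan re-reads the whole corpus every time.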
Annotation, or tagging, means attaching names, attributes, comments, descriptions etc. to a document or to a selected part of a text. It provides additional information (metadata) about an existing piece of data. Compared to tagging, which speeds up searching and helps you find relevant and precise information, Semantic Annotation goes one level deeper: it enriches the unstructured or semi-structured data with a context that is further linked to the structured knowledge of a domain, and it allows a search to return results that are not explicitly related to the original query.
The implementation of automatic and semi-automatic annotation techniques will raise the data quality for the whole organization, give more meaning to existing documents and allow generating new connections between them.
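As a rough illustration of the idea, the sketch below annotates text spans with concepts from a tiny knowledge base and then retrieves documents through a broader concept that never appears in the text itself. The knowledge base, the `chem:` concept identifiers and the sample documents are invented for this example and do not describe any IRF tool.

```python
import re
from dataclasses import dataclass

# Hypothetical mini knowledge base: surface form -> (concept id, broader concept).
KNOWLEDGE_BASE = {
    "aspirin": ("chem:Aspirin", "chem:Analgesic"),
    "ibuprofen": ("chem:Ibuprofen", "chem:Analgesic"),
    "ethanol": ("chem:Ethanol", "chem:Solvent"),
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the annotated span begins
    end: int      # character offset just past the annotated span
    concept: str  # concept the span refers to
    broader: str  # broader concept it is linked to in the knowledge base


def annotate(text):
    """Attach a concept annotation to every word found in the knowledge base."""
    annotations = []
    for match in re.finditer(r"[A-Za-z]+", text):
        entry = KNOWLEDGE_BASE.get(match.group().lower())
        if entry:
            annotations.append(Annotation(match.start(), match.end(), *entry))
    return annotations


def find_by_concept(docs, concept):
    """Return sorted ids of documents annotated with the concept or a narrower one."""
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(concept in (a.concept, a.broader) for a in annotate(text))
    )


# Hypothetical sample documents.
docs = {
    "report-1": "The tablet contains Aspirin dissolved in ethanol.",
    "report-2": "Ibuprofen was administered twice daily.",
    "report-3": "The meeting is postponed to Friday.",
}
print(find_by_concept(docs, "chem:Analgesic"))  # prints ['report-1', 'report-2']
```

Neither report mentions the word "analgesic", yet both are found, because the annotations link their content to the domain knowledge.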
Document classification (or categorization) consists of assigning an electronic document to one or more categories, based on its contents. We are developing a novel categorization method that combines supervised machine learning with ontological reasoning. Categorized documents are easier to process further and to find in future searches.
The implementation of classification techniques will not only improve the accessibility of data within your organization, but will also reduce the overhead costs of information management by automating content access.
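To give an impression of the supervised learning part only, here is a minimal sketch of a multinomial Naive Bayes text classifier. It is a standard textbook technique swapped in for illustration, not the categorization method under development, and it includes no ontological reasoning. The categories and training texts are hypothetical, and for brevity the sketch assigns exactly one category per document.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_docs):
    """Count words and documents per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def classify(model, text):
    """Return the category with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        # Prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log((word_counts[category][word] + 1)
                                  / (total_words + len(vocabulary)))
        scores[category] = score
    return max(scores, key=scores.get)


# Hypothetical training data.
training_data = [
    ("patent claim for a novel chemical compound", "legal"),
    ("license agreement and patent infringement notice", "legal"),
    ("quarterly sales figures and revenue forecast", "finance"),
    ("invoice for annual budget and revenue report", "finance"),
]
model = train(training_data)
print(classify(model, "notice of patent infringement"))
print(classify(model, "revenue forecast for the quarter"))
```

With this training data, the first query is assigned to "legal" and the second to "finance". A real deployment would need far more training documents per category and an evaluation on held-out data.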
Automated information extraction enables search without language barriers, extracts relevant information from unstructured text and makes it available in the form of a database that best suits your needs.
Image Mining is one of the most challenging research areas of Information Retrieval, both from an academic point of view and because of its economic implications: manufacturing, chemical and pharmaceutical industries have to manage enormous quantities of images, from chemical formulae to technical drawings, and depend heavily on how rapidly they are able to find the relevant images, for example in a patent filing process.
The IRF is developing a technique that enables efficient image searches (even when using only part of an image) for a variety of technical drawings, such as flow charts, block diagrams, time charts and graph plots.
Statistical Machine Translation (SMT) engines generate translations on the basis of statistical models derived from the analysis of bilingual text corpora. Although far from replacing human translation, SMT offers a means to locate and identify potentially strategic information in foreign-language documents.
The IRF Chinese-to-English translation engine has been trained with more than 4 million aligned bilingual sentences obtained from human-translated patents. It can be extended to other domains and trained with other languages.
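How a statistical model can be derived from aligned bilingual sentences may be sketched with IBM Model 1, a classic word-based translation model trained with expectation-maximization. This is a textbook illustration under simplifying assumptions (no NULL word, a three-sentence German-English toy corpus invented for the example); it is not the IRF engine, and full SMT systems add phrase tables, language models and a decoder.

```python
from collections import defaultdict


def train_ibm_model1(sentence_pairs, iterations=10):
    """Estimate t[(target_word, source_word)] = P(target | source) with EM."""
    target_vocab = {tw for _, tgt in sentence_pairs for tw in tgt}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-translation counts
        total = defaultdict(float)  # expected counts per source word
        for src, tgt in sentence_pairs:
            for tw in tgt:
                # E-step: spread each target word over the source words
                # of its sentence in proportion to the current estimates.
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = t[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {pair: c / total[pair[1]]
                                for pair, c in count.items()})
    return t


def best_translation(t, source_word, target_vocab):
    """Return the target word with the highest probability for a source word."""
    return max(target_vocab, key=lambda tw: t[(tw, source_word)])


# Hypothetical toy corpus of tokenized, sentence-aligned pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
english_vocab = {tw for _, tgt in corpus for tw in tgt}
t = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(t, word, english_vocab))
```

Although no word-level dictionary is given, the model learns from sentence-level alignment alone that "das" pairs with "the", "haus" with "house", "buch" with "book" and "ein" with "a".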
One of our projects aims at translating extensive information search tasks that rely partly on complicated search algorithms into an easy-to-use graphical interface based on the workflow paradigm. One of the benefits is that less experienced professionals can reuse complicated search patterns.
How to work with IR Experts
The IR techniques described here can be adapted to your environment and implemented within your organization by leading IR scientists under the guidance of the IRF as an independent project coordinator: See R&D Projects for more details.