Text Mining for Intellectual Property
Overview
Patents are long, complex technical texts that are very difficult for computers to analyse thoroughly.
Text Mining for Intellectual Property (TM4IP) aims to provide better means of modelling complex dependencies in patent texts and of searching patents using these dependencies.
Goals
The goal of this project is to generate linguistic resources for accurate dependency parsing of patent documents and to apply these resources in a new kind of search engine that uses dependency triples as index terms. The resulting system will allow for sophisticated searching of patents, using both thesaurus information and feedback from the index to achieve high precision and recall. Although the project focuses on the development of concrete tools and resources, it will also contribute to the state of the art in natural language processing and information retrieval through research and publications.
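To illustrate the idea of using dependency triples as search terms, here is a minimal Python sketch. It is not the project's implementation: the triple format (head, relation, dependent), the relation labels, the toy documents, and the function names build_index, expand and search are all assumptions made for this example. The triples are written by hand and stand in for the output of a dependency parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical triple format: (head, relation, dependent).
Triple = Tuple[str, str, str]

# Toy, hand-written triples standing in for parser output (assumed data).
DOCS: Dict[str, List[Triple]] = {
    "doc1": [("valve", "SUBJ", "regulate"), ("regulate", "OBJ", "flow")],
    "doc2": [("pump", "SUBJ", "regulate"), ("regulate", "OBJ", "pressure")],
    "doc3": [("valve", "ATTR", "steel")],
}


def build_index(docs: Dict[str, List[Triple]]) -> Dict[Triple, Set[str]]:
    """Build an inverted index that maps each triple to the documents containing it."""
    index: Dict[Triple, Set[str]] = defaultdict(set)
    for doc_id, triples in docs.items():
        for triple in triples:
            index[triple].add(doc_id)
    return index


def expand(triple: Triple, thesaurus: Dict[str, Set[str]]) -> Set[Triple]:
    """Return the triple plus variants with head or dependent replaced by related terms."""
    head, relation, dependent = triple
    heads = {head} | thesaurus.get(head, set())
    dependents = {dependent} | thesaurus.get(dependent, set())
    return {(h, relation, d) for h in heads for d in dependents}


def search(
    index: Dict[Triple, Set[str]],
    query: Iterable[Triple],
    thesaurus: Optional[Dict[str, Set[str]]] = None,
) -> Set[str]:
    """Return documents that match every query triple, after thesaurus expansion."""
    thesaurus = thesaurus or {}
    result: Optional[Set[str]] = None
    for triple in query:
        matches: Set[str] = set()
        for variant in expand(triple, thesaurus):
            matches |= index.get(variant, set())
        result = matches if result is None else result & matches
    return result or set()


index = build_index(DOCS)
exact = search(index, [("regulate", "OBJ", "flow")])
broadened = search(index, [("regulate", "OBJ", "flow")], {"flow": {"pressure"}})
```

In this sketch, the exact query matches only the document containing that triple, while adding a thesaurus entry broadens the match to documents with a related dependent. A plain keyword index could not distinguish "regulate flow" as an object relation from a document that merely mentions both words.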
Expected outcome for IP experts
- An IP search engine based on deep linguistic techniques and suitable for professional search in patent documents.
- Accurate, reusable linguistic resources (parsers and lexica) for the IP domain.
Timeline
The project started in 2008 and will run for three years:
- End of 2009: parser and search engine prototypes
- End of 2010: beta versions
- End of 2011: final versions
Project Partners
- Radboud University Nijmegen, NL (Research & Development)
- Matrixware Information Services GmbH, AT (showcase, funding)
- Information Retrieval Facility, AT (data, infrastructure)
Links
Matrixware.net/Text Mining (for more information about methods and findings, as well as publications and related work)
Contact
Text mining can be applied to various aspects of information management and retrieval. The IRF can provide more details about how text mining can help address your specific needs. Please send your inquiry to: science@ir-facility.org.

