Overview
Evaluating Information Retrieval techniques in the Intellectual Property domain.
The CLEF-IP track was launched in 2009 to investigate IR techniques for patent retrieval. It is part of the CLEF 2009 evaluation campaign.
The track utilizes a data collection of more than 1M patent documents derived from EPO sources, covering English, French, and German patents with at least 100,000 documents in each language.
There are two kinds of tasks in the track:
- The main task is to find patent documents that constitute prior art to a given patent.
- Three optional subtasks use parallel monolingual queries in English, German, and French. Their goal is to evaluate the impact of language on retrieval effectiveness.
Co-ordinators:
- John Tait, Information Retrieval Facility
- Giovanna Roda, Matrixware
Advisory board:
- Gianni Amati (Fondazione Ugo Bordoni - IT)
- Atsushi Fujii (University of Tsukuba - JP)
- Makoto Iwayama (Tokyo Institute of Technology - JP)
- Kalervo Jarvelin (University of Tampere - FI)
- Noriko Kando – honorary advisor - (Keio University - JP)
- Mark Sanderson (University of Sheffield - UK)
- Henk Thomas (IP services - NL)
- Christa Womser-Hacker (University of Hildesheim - DE)
Staff
- Florina Piroi, Information Retrieval Facility
- Giovanna Roda, Matrixware
- Veronika Zenz, Matrixware
Schedule for the CLEF-IP 2009 track
| Event | Planned Date |
|---|---|
| Data Release (together with a small | 02.03.2009 |
| Topics Release | 28.04.2009 (was 30.03.2009) |
| Submission of Runs by Participants | 15.06.2009 (was 01.06.2009) |
| Release of Topics for Patent Experts | 22.06.2009 (was 02.06.2009) |
| Evaluation Results available | 13.07.2009 |
| Submission of manual relevance assessment | 23.08.2009 |
| Submission of Paper for Working Notes | 23.08.2009 |
| Workshop | 30.09. - 02.10.2009 |
Relevant Material
CLEFIP @ CiteULike - a comprehensive collaborative collection of literature related to CLEF-IP
Knowledge Base - You can find here a tutorial on "IR 4 IP" and corresponding glossaries for IP and IR
CLEF - Campaign - Main Website of the Cross Language Evaluation Forum
Selected Resources
J. Michel. Considerations, challenges and methodologies for implementing best practices in patent office and like patent information departments. World Patent Information 28:132-135, 2006.
Sougata Mukherjea, Bhuvan Bamba. BioPatentMiner: an information retrieval system for biomedical patents. In VLDB '04: Proceedings of the Thirtieth international conference on Very large data bases pp. 1066-1077, 2004.
Jae-Ho Kim, Key-Sun Choi. Patent document categorization based on semantic structural information. Information Processing & Management 43, 2007.
T. Takaki, A. Fujii, T. Ishikawa. Associative Document Retrieval by Query Subtopic Analysis and its Application to Invalidity Patent Search. In Proc. of CIKM, 2004.
H. Mase, T. Matsubayashi, Y. Ogawa, M. Iwayama, T. Oshio. Proposal for Two-Stage Patent Retrieval Method Considering the Claim Structure. ACM Transactions on Asian Language Information Processing 4(2), 2005.
Generation of a test collection based on citations
The data resulting from the prior art searches performed during a patenting process are available in the EPO or USPTO databases as:
- citations in patent applications
- citations in search reports
- citations in opposition’s legal files
While the citations provided by patent applicants are sometimes incomplete and may include irrelevant items, the citations in the European search report attempt to provide a complete picture of prior art. In addition, European search reports categorize cited documents as "highly relevant" (X), "relevant in conjunction with some other document" (Y), "prior art" (A), etc.
Finally, opposition procedures provide an additional source of prior art. Whenever an opposition against a granted patent is filed on the basis of some newly found prior art, the documents that invalidate that patent are cited in the EPO legal documents’ database. These additional documents constitute the missing pieces in the prior art search.
We plan to construct a test collection that uses patents as topics. Given a topic/patent, the task will be to identify its prior art. Relevance assessments will be formed by merging the five lists of citations extracted from
- the patent application
- the European search report
- documents mentioned by the applicant in the invention disclosure form
- documents found by the USPTO
- the opposition’s legal files
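The merging step described above can be sketched as follows. This is a minimal illustration, not the track's actual tooling: the source labels, the patent identifiers, and the function names `merge_citations` and `to_qrels_lines` are hypothetical, and the TREC-style qrels output with binary relevance is an assumption about how the assessments might be stored.

```python
from collections import defaultdict


def merge_citations(citation_lists):
    """Merge per-source citation lists into one set of relevant documents per topic.

    citation_lists maps a (hypothetical) source label to
    {topic_patent_id: [cited_doc_id, ...]}.
    Returns {topic_patent_id: {cited_doc_id: set_of_source_labels}}.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for source, per_topic in citation_lists.items():
        for topic, cited_docs in per_topic.items():
            for doc in cited_docs:
                # A document cited by several sources is kept once,
                # with every source that cited it recorded.
                merged[topic][doc].add(source)
    return {topic: dict(docs) for topic, docs in merged.items()}


def to_qrels_lines(merged):
    """Render merged assessments as 'topic iteration doc relevance' lines.

    Assumption: binary relevance (1) for every cited document.
    """
    lines = []
    for topic in sorted(merged):
        for doc in sorted(merged[topic]):
            lines.append(f"{topic} 0 {doc} 1")
    return lines


# Made-up identifiers, for illustration only.
example = {
    "application": {"EP-0000001": ["EP-0000010", "EP-0000011"]},
    "search_report": {"EP-0000001": ["EP-0000011", "EP-0000012"]},
    "opposition": {"EP-0000001": ["EP-0000013"]},
}

merged = merge_citations(example)
qrels = to_qrels_lines(merged)
for line in qrels:
    print(line)
```

Recording which sources cited each document keeps the provenance available, so a later step could weight, for example, search-report citations differently from applicant citations if that were desired.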
In order to maximize the number of relevance assessments for each topic, we will select as topics exactly those patents that have been opposed. These patents have high business value and are therefore most likely to be equipped with thorough lists of prior art citations.
Opposition files can be searched in the publicly available database of the European Patent Office: http://www.epoline.org/
The methodology for constructing such a test collection is described by Erik Graf (Univ. Glasgow) and, in greater detail, in the paper "A methodology for building a patent test collection for prior art search" by E. Graf and L. Azzopardi, published at the EVIA-2008 Workshop, NTCIR-7.

