Overview

Evaluating Information Retrieval techniques in the Intellectual Property domain.

The CLEF-IP track was launched in 2009 to investigate IR techniques for patent retrieval. It is part of the CLEF 2009 evaluation campaign.

The track uses a collection of more than 1 million patent documents derived from EPO sources, covering English, French, and German patents, with at least 100,000 documents in each language.

There are two kinds of tasks in the track:

  • The main task is to find patent documents that constitute prior art for a given patent.
  • Three optional subtasks use parallel monolingual queries in English, German, and French. Their goal is to evaluate the impact of language on retrieval effectiveness.

 


 

 

 

Co-ordinators:

  • John Tait, Information Retrieval Facility
  • Giovanna Roda, Matrixware

 

Advisory board:

  • Gianni Amati (Fondazione Ugo Bordoni - IT)
  • Atsushi Fujii (University of Tsukuba - JP)
  • Makoto Iwayama (Tokyo Institute of Technology - JP)
  • Kalervo Järvelin (University of Tampere - FI)
  • Noriko Kando, honorary advisor (Keio University - JP)
  • Mark Sanderson (University of Sheffield - UK)
  • Henk Thomas (IP services - NL)
  • Christa Womser-Hacker (University of Hildesheim - DE)

 

Staff

  • Florina Piroi, Information Retrieval Facility
  • Giovanna Roda, Matrixware
  • Veronika Zenz, Matrixware


 

Schedule for the CLEF-IP 2009 track

 

Events and planned dates:

  • Data release (together with a small training set of topics): 02.03.2009
  • Topics release: 28.04.2009 (originally 30.03.2009)
  • Submission of runs by participants: 15.06.2009 (originally 01.06.2009)
  • Release of topics for patent experts: 22.06.2009 (originally 02.06.2009)
  • Evaluation results available: 13.07.2009
  • Submission of manual relevance assessments by patent experts: 23.08.2009
  • Submission of papers for the Working Notes: 23.08.2009
  • Workshop: 30.09.2009 - 02.10.2009


 

Relevant Material

 

CLEF-IP @ CiteULike - a comprehensive collaborative collection of literature related to CLEF-IP
Knowledge Base - a tutorial on "IR 4 IP" with corresponding glossaries for IP and IR
CLEF Campaign - main website of the Cross-Language Evaluation Forum

 

Selected Resources

 

J. Michel. Considerations, challenges and methodologies for implementing best practices in patent office and like patent information departments. World Patent Information 28:132-135, 2006.

S. Mukherjea, B. Bamba. BioPatentMiner: an information retrieval system for biomedical patents. In VLDB '04: Proceedings of the Thirtieth International Conference on Very Large Data Bases, pp. 1066-1077, 2004.

J.-H. Kim, K.-S. Choi. Patent document categorization based on semantic structural information. Information Processing & Management 43, 2007.

T. Takaki, A. Fujii, T. Ishikawa. Associative Document Retrieval by Query Subtopic Analysis and its Application to Invalidity Patent Search. In Proc. of CIKM, 2004.

H. Mase, T. Matsubayashi, Y. Ogawa, M. Iwayama, T. Oshio. Proposal for Two-Stage Patent Retrieval Method Considering the Claim Structure. ACM Transactions on Asian Language Information Processing 4(2), 2005.

 


 

Generation of a test collection based on citations

 

The data resulting from the prior art searches performed during a patenting process are available in the EPO or USPTO databases as:

  • citations in patent applications
  • citations in search reports
  • citations in the legal files of opposition proceedings

While the citations provided by patent applicants are sometimes incomplete and may include irrelevant items, the citations in the European search report aim to give a complete picture of the prior art. In addition, European search reports categorize cited documents as "highly relevant" (X), "relevant in conjunction with some other document" (Y), "prior art" (A), etc.
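The search report categories can be read as coarse relevance grades. The following is a minimal sketch, assuming a purely hypothetical numeric scheme (X = 3, Y = 2, A = 1, any other category = 0) that is not part of the CLEF-IP track definition.

```python
from enum import Enum


class SearchReportCategory(Enum):
    """The European search report categories named in the text (only a subset of all codes)."""
    X = "highly relevant"
    Y = "relevant in conjunction with some other document"
    A = "prior art"


# Hypothetical grading scheme: these numbers are an illustrative assumption,
# not something the track specifies.
HYPOTHETICAL_GRADES = {
    SearchReportCategory.X: 3,
    SearchReportCategory.Y: 2,
    SearchReportCategory.A: 1,
}


def grade_citation(category_code: str) -> int:
    """Return a relevance grade for a category code such as 'X' or 'y'.

    Codes outside the subset above (the text's "etc.") map to 0.
    """
    try:
        category = SearchReportCategory[category_code.strip().upper()]
    except KeyError:
        return 0
    return HYPOTHETICAL_GRADES[category]


print(grade_citation("X"))  # 3
print(grade_citation("y"))  # 2
print(grade_citation("P"))  # 0
```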

Finally, opposition procedures provide an additional source of prior art. Whenever an opposition against a granted patent is filed on the basis of newly found prior art, the documents put forward to invalidate that patent are cited in the EPO legal documents database. These additional documents fill gaps left by the original prior art search.

We plan to construct a test collection that uses patents as topics. Given a topic patent, the task will be to identify its prior art. Relevance assessments will be formed by merging the five lists of citations extracted from:

  1. the patent application
  2. the European search report
  3. documents mentioned by the applicant in the invention disclosure form
  4. documents found by the USPTO
  5. the legal files of the opposition
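The merging step can be sketched as a per-topic union of the five citation lists. This sketch assumes each source has already been reduced to a list of cited document identifiers, and it records which sources cited each document. The source labels, document identifiers, and the TREC-style qrels output (topic, iteration, document, relevance) are illustrative assumptions, not the track's actual data format.

```python
from collections import defaultdict

# Hypothetical labels for the five citation sources listed above.
SOURCES = (
    "application",
    "search_report",
    "disclosure_form",
    "uspto",
    "opposition",
)


def merge_citations(citations_by_source: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge per-source citation lists into one deduplicated relevance set.

    Returns a mapping from each cited document ID to the set of sources that
    cited it. Keys not in SOURCES are ignored.
    """
    merged: dict[str, set[str]] = defaultdict(set)
    for source in SOURCES:
        for doc_id in citations_by_source.get(source, []):
            merged[doc_id.strip()].add(source)
    return dict(merged)


def to_qrels(topic_id: str, merged: dict[str, set[str]]) -> list[str]:
    """Format merged citations as TREC-style qrels lines: topic, 0, docno, relevance.

    Every merged citation is treated as relevant (1). No grading is applied.
    """
    return [f"{topic_id} 0 {doc_id} 1" for doc_id in sorted(merged)]


# Usage with made-up identifiers:
citations = {
    "application": ["EP-0001", "EP-0002"],
    "search_report": ["EP-0002", "EP-0003"],
    "opposition": ["EP-0004"],
}
merged = merge_citations(citations)
for line in to_qrels("EP-1000000", merged):
    print(line)
```

Keeping the provenance of each citation makes it possible to filter or weight citations by source later, for example to compare applicant citations with search report citations.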

 

To maximize the number of relevance assessments per topic, we will select as topics only those patents that have been opposed. Such patents have high business value and are therefore likely to come with thorough lists of prior art citations.
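The topic selection rule amounts to a simple filter. This sketch assumes a hypothetical `PatentRecord` with an `opposed` flag and a set of merged citations. Ordering candidates by citation count and the `min_citations` threshold are added illustrations of "maximizing the number of relevance assessments", not documented steps.

```python
from dataclasses import dataclass, field


@dataclass
class PatentRecord:
    """Hypothetical minimal record; field names are illustrative assumptions."""
    pub_number: str
    opposed: bool
    citations: set[str] = field(default_factory=set)


def select_topics(patents: list[PatentRecord], min_citations: int = 1) -> list[PatentRecord]:
    """Keep only opposed patents, ordered by number of citations (most first).

    min_citations is a hypothetical extra filter, not taken from the text.
    """
    opposed = [p for p in patents if p.opposed and len(p.citations) >= min_citations]
    return sorted(opposed, key=lambda p: len(p.citations), reverse=True)


# Usage with made-up records:
patents = [
    PatentRecord("EP-A", opposed=True, citations={"d1", "d2", "d3"}),
    PatentRecord("EP-B", opposed=False, citations={"d1", "d2", "d3", "d4"}),
    PatentRecord("EP-C", opposed=True, citations={"d5"}),
]
print([p.pub_number for p in select_topics(patents)])  # ['EP-A', 'EP-C']
```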

Opposition files can be searched in the publicly available database of the European Patent Office: http://www.epoline.org/

The test collection methodology outlined above was proposed by Erik Graf (University of Glasgow). It is described in greater detail in the paper "A methodology for building a patent test collection for prior art search" by E. Graf and L. Azzopardi, published at the EVIA 2008 Workshop (NTCIR-7).

 


 
