Generation of a test collection based on citations

The data resulting from the prior art searches performed during a patenting process are available in the EPO or USPTO databases as:

  • citations in patent applications
  • citations in search reports
  • citations in opposition’s legal files

While the citations provided by patent applicants are sometimes incomplete and may include irrelevant items, the citations in the European search report attempt to provide a complete picture of the prior art. In addition, European search reports categorize cited documents as "highly relevant" (X), "relevant in conjunction with some other document" (Y), "prior art" (A), etc.

Finally, opposition procedures provide an additional source of prior art. Whenever an opposition against a granted patent is filed on the basis of some newly found prior art, the documents that invalidate that patent are cited in the EPO legal documents’ database. These additional documents constitute the missing pieces in the prior art search.

We plan to construct a test collection that uses patents as topics. Given a topic/patent, the task will be to identify its prior art. Relevance assessments will be formed by merging the five lists of citations extracted from:

  • the patent application
  • the European search report
  • documents mentioned by the applicant in the invention disclosure form
  • documents found by the USPTO
  • the opposition’s legal files

In order to maximize the number of relevance assessments for each topic, we will select as topics exactly those patents that have been opposed. These patents have high business value and are therefore most likely to be equipped with thorough lists of prior art citations.
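
The merging and topic selection steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual tooling: the source labels, the input layout (a dictionary of citation lists per patent), the function names, and the document identifiers in the usage note are all hypothetical. In particular, treating "has at least one opposition citation" as the test for "has been opposed" is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical labels for the five citation lists named in the text.
SOURCES = (
    "application",
    "search_report",
    "disclosure_form",
    "uspto",
    "opposition",
)


def merge_citations(citations_by_source):
    """Merge per-source citation lists for one patent.

    Returns a dict mapping each cited document to the set of
    sources that cited it, so provenance is kept after merging.
    """
    merged = defaultdict(set)
    for source in SOURCES:
        for doc_id in citations_by_source.get(source, []):
            merged[doc_id].add(source)
    return dict(merged)


def select_topics(patents):
    """Keep only patents with a non-empty opposition citation list.

    Assumption: opposition citations are used as a proxy for
    "this patent has been opposed".
    """
    return {
        patent_id: citations
        for patent_id, citations in patents.items()
        if citations.get("opposition")
    }
```

With invented placeholder identifiers, `select_topics` would drop a patent that has only applicant citations, and `merge_citations` would report a document cited by both the applicant and the search report once, with both sources attached. Keeping the source set per document leaves room for weighting assessments by origin later, if that turns out to be useful.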

Opposition files can be searched in the publicly available database of the European Patent Office: http://www.epoline.org/

The methodology for constructing a test collection is described by Erik Graf (Univ. Glasgow) and in greater detail in the paper "A methodology for building a patent test collection for prior art search" by E. Graf and L. Azzopardi, published at the EVIA-2008 Workshop, NTCIR-7.

Methodology