Prior Art Candidate Search Task

The Prior Art Candidate Search Task (PAC) consists of finding patent documents in the target collection that may invalidate a given patent application. In intellectual property terms: find documents that may constitute prior art for a topic patent. Each topic document is a patent application document, of kind A1 or A2, from which the citation information has been removed. This year's topics are distributed more evenly across the different languages, with more than 1000 topics each for English, French and German. A PAC topic file contains a concatenation of topics. The structure of one topic is as follows:

   <narr>Find all patents in the collection that
         potentially invalidate patent application

where <num> contains the unique topic identifier, built from the patent number: a country code (always EP in this data set), a seven-digit number, and the kind code (A1 or A2). The <file> tag contains the name of the topic's XML file.
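The Python sketch below suggests one way a topic file of this shape might be read. It rests on several assumptions that the description above does not confirm. Each topic is taken to be wrapped in a `<topic>` element, a hypothetical tag that is not shown above. `<num>` is taken to hold the identifier in the hyphenated form used in the training-set example below, such as `EP-0000001-A1`. The file is assumed to carry no XML declaration. `Topic`, `parse_topic_id` and `parse_topic_file` are illustrative names and are not part of any CLEF-IP tooling.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Assumed identifier layout: country code, seven-digit number and kind code (A1/A2)
# joined by hyphens, as in the example topic number EP-0000001-A1.
TOPIC_ID = re.compile(r"^(?P<country>[A-Z]{2})-(?P<number>\d{7})-(?P<kind>A[12])$")


@dataclass
class Topic:
    num: str
    country: str
    number: str
    kind: str
    file: str


def parse_topic_id(topic_id: str) -> tuple[str, str, str]:
    """Split a topic identifier into (country code, seven-digit number, kind code)."""
    match = TOPIC_ID.match(topic_id.strip())
    if match is None:
        raise ValueError(f"unexpected topic identifier: {topic_id!r}")
    return match["country"], match["number"], match["kind"]


def parse_topic_file(text: str) -> list[Topic]:
    """Parse a concatenation of <topic> elements (hypothetical wrapper tag)."""
    # Concatenated elements have no single root element, so wrap them before parsing.
    root = ET.fromstring(f"<topics>{text}</topics>")
    topics = []
    for elem in root.iter("topic"):
        num = elem.findtext("num", default="").strip()
        country, number, kind = parse_topic_id(num)
        file_name = elem.findtext("file", default="").strip()
        topics.append(Topic(num, country, number, kind, file_name))
    return topics
```

Validating the kind code against A1/A2 in the pattern makes a malformed topic fail loudly. Without that check, it could silently be skipped during a run.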

Training Set

As last year, we have released a set of topics together with their relevance assessments for training purposes. Participants can use this set to train and tune their systems. The CLEF-IP 2011 training set contains documents and relevance assessments for 300 topics similar to those of the PAC task. The relevance assessments are extracted automatically from patent citation information and have the following format:

EP-0000001-A1 EP-7654321 1
EP-0000001-A1 EP-7654322 2
EP-0000001-A1 EP-7654323 1

where the first column contains the topic number, the second column contains the relevant patent, identified by its country code and patent number, and the last column is the relevance degree (2 being more relevant than 1). We did not compile a training set for the classification task, as participants can use the whole target data of the PAC task to train their classifiers.
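Here is a minimal Python sketch for loading relevance assessments in this three-column format into a per-topic mapping. It assumes whitespace-separated columns and integer relevance degrees. `load_qrels` and `relevant_at_least` are hypothetical helper names, not part of the released training data.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical helpers: each non-blank line is "topic patent degree", whitespace-separated.
Qrels = dict[str, dict[str, int]]


def load_qrels(lines: Iterable[str]) -> Qrels:
    """Map each topic to {relevant patent: relevance degree}."""
    qrels: defaultdict[str, dict[str, int]] = defaultdict(dict)
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(fields)}")
        topic, patent, degree = fields
        qrels[topic][patent] = int(degree)
    return dict(qrels)


def relevant_at_least(qrels: Qrels, topic: str, min_degree: int = 1) -> list[str]:
    """Sorted patents for `topic` whose degree is >= min_degree (2 is more relevant than 1)."""
    return sorted(p for p, d in qrels.get(topic, {}).items() if d >= min_degree)
```

The sketch keeps the graded degree rather than a binary flag. That way a system can be tuned against either all cited patents or only the more relevant ones. For the three example lines above, `relevant_at_least(load_qrels(lines), "EP-0000001-A1", 2)` returns `["EP-7654322"]`.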

