Dataset
A 400,000-document subset of the MAREC dataset is available for download. The documents can be accessed after registering with the MATRIXWARE.NET community (registration is free).
The MAREC 400,000 collection consists of 100,000 randomly selected patents from each of the four sub-collections of the MAREC dataset (EPO, JPO, USPTO, WIPO). It is intended for people submitting papers to the AsPIRe'10 workshop at ECIR. Participants are encouraged to apply the techniques they develop to this dataset where possible, so that results obtained with different techniques can be compared more easily. The collection also allows initial patent processing experiments to be run on a representative dataset of reasonable size before scaling up to the 19 million patents of the full MAREC collection.

