Data Collections


The IRF provides a number of test data collections developed by the IRF itself, by one of its members, or by third parties.


Access to the IRF data collections

IRF members can access the IRF data collections for scientific experimentation and evaluation.



ClueWeb09 is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009.


MAREC consists of 19 million patent documents in different languages, normalized to a highly specific XML format.