
Data Collections

The IRF provides a number of test data collections developed by the IRF itself, by one of its members, or by third parties. These collections may be used freely for scientific experiments.


 

ClueWeb09

ClueWeb09 is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009.

MAtrixware REsearch Collection (MAREC)

MAREC consists of 19 million patent documents in different languages, normalised to a highly specific XML format developed by Matrixware for the IRF.