
Image-based Classification Task

Images are an essential component of patents, as they illustrate key aspects of the invention. There are many different types of image in patents, including technical drawings, photos, flow charts, and graphs.

However, although many applications need to restrict an analysis to a specific type of image, the annotation of image type in patents is generally either non-existent or poor, with many errors.

The aim of this task is to automatically classify patent images by type, based on their visual content. Manually classified and checked data are provided for training. The long-term aim is to use these training data to make it possible to reliably classify the millions of images in patents.

The classification is into 9 classes:

  • abstract drawing
  • graph
  • flow chart
  • gene sequence
  • program listing
  • symbol
  • chemical structure
  • table
  • mathematics

Between 300 and 6,000 training images are provided for each of these classes (see description below). Only these data may be used to train image type classification techniques.

At a later stage, we will publish a test database of 1,000 images. For each of these images, participating groups are required to determine the type of image.

Training data

To obtain the training data, it is necessary to register and fill out the MAREC agreement. See the main CLEF-IP page for information on doing this. Access to the training data is provided once this is done.

The training data are organised into 9 directories, one for each class, and contain the following number of training images per class:

Class                  Class Number   Abbreviation   # Training Images
drawing                1              ad             5566
chemical structures    2              cf             5958
program listing        3              cp             5574
gene sequence (dna)    4              dn             5983
flow chart             5              ff             311
graph                  6              gr             1664
math                   7              mf             5950
table                  8              tb             5502
character (symbol)     9              tx             1579
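
The class sizes above are strongly imbalanced (311 flow charts against nearly 6,000 gene sequences), which may matter when training a classifier. The following is a minimal sketch of how the table could be held in code and turned into inverse-frequency class weights. It assumes, without confirmation from the task description, that each class directory is named after its abbreviation; the names `CLASSES`, `label_for_path` and `class_weights`, and the example file path, are hypothetical.

```python
from pathlib import Path

# Class table from the task description:
# abbreviation -> (class number, class name, number of training images).
CLASSES = {
    "ad": (1, "drawing", 5566),
    "cf": (2, "chemical structures", 5958),
    "cp": (3, "program listing", 5574),
    "dn": (4, "gene sequence (dna)", 5983),
    "ff": (5, "flow chart", 311),
    "gr": (6, "graph", 1664),
    "mf": (7, "math", 5950),
    "tb": (8, "table", 5502),
    "tx": (9, "character (symbol)", 1579),
}


def label_for_path(image_path):
    """Return the class name for an image, read from its parent directory.

    ASSUMPTION: the directory is named after the class abbreviation
    (e.g. ".../ff/some_image"); the task description does not state this.
    """
    abbreviation = Path(image_path).parent.name
    return CLASSES[abbreviation][1]


def class_weights(classes=CLASSES):
    """Inverse-frequency weights: total / (number of classes * class count)."""
    total = sum(count for _, _, count in classes.values())
    n_classes = len(classes)
    return {
        abbr: total / (n_classes * count)
        for abbr, (_, _, count) in classes.items()
    }
```

With these weights, rare classes such as flow charts receive a larger weight than frequent ones; whether weighting helps at all depends on the classifier used.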

 

Test data

Test data will consist of 1,000 unclassified images. It will be released later.

Evaluation

Please note that no additional data may be used for training or setting up the systems. If you need test data for system tuning, split the available training data into a training set and a validation set. The performance of the individual runs will be evaluated using equal error rate.
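
One way to split the training data for tuning is a stratified split, which holds out the same fraction of every class so that small classes still appear in the validation set. The sketch below is only an illustration and not a prescribed procedure; the function name `stratified_split`, the 20% validation fraction and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=0):
    """Split (path, label) pairs into training and validation lists.

    Each class contributes roughly `val_fraction` of its images (at least
    one) to the validation set. The 0.2 default is an arbitrary choice.
    """
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val
```

Because only the provided training images are involved, a split of this kind stays within the rule that no additional data may be used.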

We will make the evaluation script available soon.
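
Until the official script is available, equal error rate can be approximated on a validation set. The sketch below treats a single class as a binary detection problem (this class versus all others) and returns the error rate at the threshold where the false positive rate and the false negative rate are closest. How the official evaluation combines the 9 classes is not stated here, so this one-versus-rest reading is an assumption, and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the equal error rate for one class.

    scores: classifier confidence that each image belongs to the class
            (higher means more confident).
    labels: truthy if the image really belongs to the class, else falsy.

    ASSUMPTION: one-vs-rest evaluation per class; the official script
    may compute the measure differently.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_gap, eer = None, None
    # Try every observed score as a threshold, plus one above all scores.
    for threshold in sorted(set(scores)) + [float("inf")]:
        # An image is predicted positive when its score >= threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
        fpr, fnr = fp / n_neg, fn / n_pos
        gap = abs(fpr - fnr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer
```

The loop is quadratic in the number of images, which is acceptable for a validation set of a few thousand images but could be replaced by a sorted sweep for larger sets.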
