TWINC - To WIN Chemathlon

TWINC (To WIN Chemathlon) is a user-friendly interactive web-based platform dedicated to patent search in the domain of chemistry and life sciences. It has been developed by the BiTeM Group (Bibliomics and Text Mining) within the context of the PatOlympics.

TWINC is based on a pipeline of search engines developed during several international patent and information retrieval competitions (TREC Genomics, CLEF Intellectual Property, TREC Chemistry, NTCIR Patent Retrieval), enhanced with several features:


  • Automatic IPC categorization with up to 8 digits (IPC subgroups; Teodoro et al. 2010)
  • Chemical, biological and medicinal named-entity recognition and normalization (see Ruch 2006)
  • Thesaurus-based query expansion (PubChem, ChEMBL, MeSH…)
  • Rocchio-based query refinement
  • Patent-related search (ranked #1 at TREC 2009; Lupu et al. 2010)
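
As an illustration of the thesaurus-based expansion listed above, the following minimal sketch adds known synonyms to each query term. It is not TWINC's implementation: the `TOY_THESAURUS` dictionary and the `expand_query` function are hypothetical names introduced here, and the in-memory dictionary merely stands in for resources such as PubChem, ChEMBL or MeSH.

```python
# Hypothetical in-memory thesaurus standing in for PubChem, ChEMBL or MeSH.
# The entries are illustrative and not taken from any of those resources.
TOY_THESAURUS = {
    "aspirin": ["acetylsalicylic acid", "2-acetoxybenzoic acid"],
    "paracetamol": ["acetaminophen"],
}


def expand_query(terms, thesaurus):
    """Return the lowercased terms, each followed by its synonyms, without duplicates."""
    expanded = []
    seen = set()
    for term in terms:
        key = term.lower()
        # Keep the original term first, then append any synonyms found for it.
        for candidate in [key] + thesaurus.get(key, []):
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


print(expand_query(["Aspirin", "synthesis"], TOY_THESAURUS))
# → ['aspirin', 'acetylsalicylic acid', '2-acetoxybenzoic acid', 'synthesis']
```

Terms without a thesaurus entry, such as "synthesis" here, pass through unchanged, so expansion never removes anything from the user's query.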

Each TWINC service can be called separately using XML/HTTP protocols. The Flex-based GUI offers advanced query-authoring and user interaction capabilities. The user can iteratively expand the query using on-the-fly chemical entity normalization and synonym expansion services. The user can also refine the scope of the search via an advanced IPC categorizer able to encode four levels of classification. Retrieved patents are then displayed in a navigation frame along with their claims, abstracts and descriptions, enriched with automatically extracted descriptors. The user can then iteratively refine the results by selecting the most relevant patents, from which additional features can be suggested by the engine using relevance feedback services.
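
The relevance feedback step can be sketched with the classic Rocchio formula, which moves the query vector toward the patents marked relevant and away from the others. This is a generic textbook sketch, not TWINC's actual code: the `rocchio` function, the dictionary-based term vectors, and the weights (alpha = 1.0, beta = 0.75, gamma = 0.15, which are common textbook defaults) are all assumptions made for illustration.

```python
from collections import Counter


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined term-weight vector using the Rocchio formula.

    query is a dict mapping term -> weight; relevant and non_relevant are
    lists of such dicts, one per document judged by the user.
    The default weights are textbook values, not TWINC's settings.
    """
    refined = Counter()
    for term, weight in query.items():
        refined[term] += alpha * weight
    # Add the centroid of the relevant documents.
    for doc in relevant:
        for term, weight in doc.items():
            refined[term] += beta * weight / len(relevant)
    # Subtract the centroid of the non-relevant documents.
    for doc in non_relevant:
        for term, weight in doc.items():
            refined[term] -= gamma * weight / len(non_relevant)
    # Drop terms whose weight is zero or negative, a common simplification.
    return {term: weight for term, weight in refined.items() if weight > 0}


refined = rocchio(
    query={"kinase": 1.0},
    relevant=[{"kinase": 1.0, "inhibitor": 1.0}],
    non_relevant=[{"receptor": 1.0}],
)
print(sorted(refined))  # → ['inhibitor', 'kinase']
```

In this toy run, "inhibitor" enters the query because it occurs in a patent marked relevant, which mirrors how additional features can be suggested from the user's selection.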

The web-based TWINC demonstrator is freely available for non-commercial use. The platform can be customized for corporate search environments to process domain- and company-specific vocabularies, including non-English literature and patent reports (e.g. Chinese, Japanese).



