Sophisticated semantic algorithms yield much better results in many information retrieval (IR) applications than simple text indexing. Applying these methods to large corpora requires substantial computing capacity. In fact, experimental handling of data on a terabyte scale requires a supercomputing infrastructure.
The IRF hardware infrastructure is one of the most powerful systems worldwide dedicated to the semantic processing of text. It comprises the following three elements:
Large Data Collider (LDC)
- 320 Gbytes of main memory
- 80 Itanium CPUs running at 1.4 GHz
- 1 SGI InfiniteStorage system
- 1 mptsas SCSI controller
- 12 mptfc fibre channel controllers
- 4 SGI RASC RC100 FPGA
- 2 Broadcom BCM5704 Gigabit Ethernet interfaces
To fully exploit the potential of the LDC, parallel C, C++ or FORTRAN code is recommended. The Itanium processors are designed to make optimal use of the large shared memory in parallel computing, but they perform considerably worse on serial or Java applications.
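As an illustration of the kind of shared-memory parallel code meant here, the following is a minimal sketch in C using OpenMP. It is not taken from IRF software: the function name count_byte and the choice of OpenMP are assumptions made for this example only, and whether OpenMP is the preferred programming model on the LDC should be confirmed with the IRF.

```c
#include <stddef.h>

/*
 * Hypothetical example: count how often the byte `needle` occurs in a
 * text buffer held in shared memory.
 *
 * The loop iterations are divided among the available threads, and the
 * per-thread partial counts are combined by the OpenMP reduction clause.
 * Build with the compiler's OpenMP option (for GCC: -fopenmp). Without
 * that option the pragma is ignored and the loop runs serially, giving
 * the same result.
 */
size_t count_byte(const char *text, size_t len, char needle)
{
    size_t total = 0;
    long i; /* signed loop index, accepted by all OpenMP versions */

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < (long)len; i++) {
        if (text[i] == needle) {
            total++;
        }
    }
    return total;
}
```

Called as count_byte(buffer, length, ' '), the function would return the number of space characters in the buffer. All threads read the same buffer in place, which is the access pattern a large shared memory is meant to serve.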
Medium Data Collider (MDC)
- 2 IBM x3950
- 32 cores (4 quad-core Intel Xeon CPUs at 2.93 GHz per node)
- 256 Gbytes of main memory
- 600 Gbytes storage on internal disks
- Production cluster for Java software and serial code.
Storage, SAN
Storage and SAN capacity will be provided based on the needs of a given research project, after approval by the Scientific Board.
If you are interested in accessing the IRF supercomputing infrastructure, please contact membership@ir-facility.org.