In many IR applications, sophisticated semantic algorithms yield much better results than simple text indexing. Applying these methods to large corpora, however, requires substantial computing capacity: experimental processing of Terabyte-scale data calls for a supercomputing infrastructure.
The IRF hardware infrastructure is one of the most powerful systems worldwide dedicated to the semantic processing of text. It consists of the following 3 elements:
Large Data Collider (LDC)
- 320 Gbytes of main memory
- 80 Itanium CPUs running at 1.4 GHz
- 1 SGI InfiniteStorage system
- 12 mptfc Fibre Channel controllers
- 4 SGI RASC RC100 FPGA blades
- 2 Broadcom BCM5704 Gigabit Ethernet interfaces
To exploit the full potential of the LDC, parallel code written in C, C++ or FORTRAN is recommended. The Itanium processors are designed to make optimal use of the large shared memory in parallel computing, but perform considerably worse on serial code and Java applications.
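As an illustration of the kind of shared-memory parallel C code that suits the LDC, the sketch below counts how often a term occurs in a token stream held in main memory. It assumes OpenMP as the parallel programming model (the text above does not name one), and the function `count_term` and its parameters are hypothetical, chosen only for this example. Each thread scans a slice of the shared array, and the per-thread counts are combined by the `reduction` clause.

```c
#include <stddef.h>

/* Hypothetical example: count occurrences of term_id in a token array
 * of length n that lives in shared memory. Assumes OpenMP: the loop
 * iterations are split statically across threads, and each thread keeps
 * a private partial count that the reduction clause sums at the end. */
long count_term(const int *tokens, long n, int term_id)
{
    long count = 0;

    #pragma omp parallel for reduction(+:count) schedule(static)
    for (long i = 0; i < n; i++) {
        if (tokens[i] == term_id)
            count++;
    }
    return count;
}
```

With GCC this would be compiled with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially with the same result, which makes the sketch easy to test on a workstation before running it on the full machine.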
Medium Data Collider (MDC)
- 2 IBM x3950 M2 nodes
- 32 cores (4 quad-core Intel Xeon CPUs at 2.93 GHz per node)
- 256 Gbytes of main memory
- 600 Gbytes of storage on internal disks
- Production cluster for Java software and serial code
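The MDC core count follows directly from its configuration. Assuming, as a simplification for comparison only, that main memory is divided evenly, the per-core share on the MDC and the per-CPU share on the LDC work out as:

```latex
\begin{aligned}
2\ \text{nodes} \times 4\ \tfrac{\text{CPUs}}{\text{node}} \times 4\ \tfrac{\text{cores}}{\text{CPU}} &= 32\ \text{cores} \\
\text{MDC:}\quad \frac{256\ \text{Gbytes}}{32\ \text{cores}} &= 8\ \text{Gbytes per core} \\
\text{LDC:}\quad \frac{320\ \text{Gbytes}}{80\ \text{CPUs}} &= 4\ \text{Gbytes per CPU}
\end{aligned}
```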