January 30, 2004

Colloquium Speaker: Dr. Srinidhi Varadarajan


Dr. Srinidhi Varadarajan is the Director of the Terascale Computing Facility and Assistant Professor of Computer Science at Virginia Polytechnic Institute and State University in Blacksburg, Virginia. He received his Ph.D. in Computer Science from the State University of New York, Stony Brook in 2000. Dr. Varadarajan is the recipient of a CAREER award from the National Science Foundation, the Egg Factory Technology Innovation award, and a Faculty Fellow award from the College of Engineering, Virginia Tech. His research covers transparent fault tolerance for massively parallel supercomputers, scalable network emulation, compiler-directed strategies for flexible data sharing models, and routing algorithms for backbone IP networks. His work involves developing incremental and automatic checkpointing, recovery, and code migration algorithms, dynamic load balancing, and 3D environments for network traffic visualization. Part of this research focuses on building a distributed system that can scale to emulate hundreds of thousands of virtual nodes. He is also exploring AI techniques such as reinforcement learning within a probabilistic framework for multi-path routing protocols. Dr. Varadarajan is the architect of System X, the third fastest supercomputer in the world, located at the Terascale Computing Facility at Virginia Tech.


Colloquium Topic: System X: Building the Virginia Tech Supercomputer

System X was conceived in March 2003 and designed in July 2003, and by October it had achieved a sustained performance of 10.28 Teraflops, making it the third fastest supercomputer in the world today. System X has several novel features. First, it is based on the Apple G5 platform with the new 64-bit IBM PowerPC 970 CPUs. Second, it uses a high-performance switched communications fabric called InfiniBand. Finally, System X is cooled by a hybrid liquid-air cooling system. This talk will present the motivation for System X, its architecture, and the challenges faced in building, deploying, and maintaining a large-scale supercomputer.