
Computer Infrastructure

Introduction

Despite remarkable progress over the last five years, designing, building, and maintaining large computational resources remains challenging and time-consuming. The three main components of a computational system are the network, the storage, and the CPUs. The most sophisticated and expensive strategy is to build supercomputers in which all CPUs have access to the same memory and the same file system. These systems require very sophisticated network technology, and it is challenging to write software that fully utilizes the resources. Cheaper clusters (Beowulf clusters) do not have shared memory, which limits the types of computation that can be performed on them. Their shared file system constrains how far apart the components can be placed and sets an upper limit on how large such systems can grow. Clouds and grids are the cheapest to build and maintain, and they can grow larger than either of the other two. These systems are built from completely independent computers with no shared file system and no shared memory. They are also often geographically distributed and connected by a very heterogeneous network, which puts severe constraints on what can be computed on them; a sketch of the kind of workload that does fit is shown below.
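The following is a minimal sketch of the only style of workload that tolerates these constraints: embarrassingly parallel tasks where every unit of work carries its own inputs and returns its own outputs, so no shared memory or shared file system is needed. The score_sequence task is hypothetical, and ProcessPoolExecutor merely stands in for the independent machines of a grid or cloud.

    # Sketch: self-contained tasks, scattered to share-nothing workers,
    # results gathered by the coordinator. Nothing here relies on shared
    # memory or a shared file system.
    from concurrent.futures import ProcessPoolExecutor

    def score_sequence(task):
        """Hypothetical unit of work: all inputs arrive with the task,
        all outputs are returned to the coordinator."""
        task_id, sequence = task
        # placeholder computation standing in for a real per-sequence analysis
        score = sum(ord(c) for c in sequence) % 100
        return task_id, score

    if __name__ == "__main__":
        # the coordinator splits the problem into independent tasks ...
        tasks = [(i, "ACDEFGHIKLMNPQRSTVWY" * (i + 1)) for i in range(8)]
        # ... distributes them to workers that share nothing, and gathers results
        with ProcessPoolExecutor() as pool:
            for task_id, score in pool.map(score_sequence, tasks):
                print(f"task {task_id}: score {score}")

Anything that requires fine-grained communication between tasks, or concurrent access to one large shared data set, falls outside this pattern and is better suited to a supercomputer or a Beowulf cluster.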

Research

Traditionally, Rosetta has been run on small- to medium-sized Beowulf clusters. I helped Keith Laidig (Formix, LLC) build the first of these in 2001. Called the YRC cluster, it had 80 PIII CPUs at 1 GHz with 1 GB of RAM per CPU. The file system was a 2.2 TB RAID-5 file server, the network was a single 100 Mbit switch, and the entire system was housed in a single rack. The second system, Apollo, was built in collaboration with Greg Taylor: a 100-node, 800-core 64-bit Xeon cluster with 2 GB of RAM per core (16 GB per node). The shared file system was a 22 TB PolyServe SAN served over a 1 Gbit network (10 GbE between switches), and the system was housed in 5 racks. Because Rosetta is highly CPU-intensive, the large genome-annotation projects were run on the World Community Grid, which is operated by IBM. This system is based on volunteers donating CPU power when their desktops and laptops are idle. We have used over 100,000 CPU years so far.

References

No. Reference
4. Bauch, Angela; Adamczyk, Izabela; Buczek, Piotr; Elmer, Franz-Josef; Enimanev, Kaloyan; Glyzewski, Pawel; Kohler, Manuel; Pylak, Tomasz; Quandt, Andreas; Ramakrishnan, Chandrasekhar; Beisel, Christian; Malmstrom, Lars; Aebersold, Ruedi; Rinn, Bernd; openBIS: a flexible framework for managing and analyzing complex data in biology research. BMC Bioinformatics (2011), 12: 468.
3. Kunszt, Peter; Malmstrom, Lars; Fantini, Nicola; Subholt, Wibke; Lautenschlager, Marcel; Reifler, Roland; Ruckstuhl, Stefan; Accelerating 3D Protein Modeling Using Cloud Computing. Seventh IEEE International Conference on e-Science Workshops (2011), --: 166-169.
2. Drew, Kevin; Winters, Patrick; Butterfoss, Glenn; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David; Davis, Trisha; Shasha, Dennis; Malmstrom, Lars; Bonneau, Richard; The proteome folding project: Proteome-scale prediction of structure and function. Genome Res (2011), 21: 1981-94.
1. Malmstrom, Lars; Riffle, Michael; Strauss, Charlie; Chivian, Dylan; Davis, Trisha; Bonneau, Richard; Baker, David; Superfamily Assignments for the Yeast Proteome through Integration of Structure Prediction with the Gene Ontology. PLoS Biol (2007), 5: e76.