Facilities |
ICM Computing Resources:
• 250 node, 2000 core, IBM iDataPlex cluster attached to a 1 Petabyte Storage Area Network
• IBM TotalStorage 3584 UltraScalable tape library with 4 LTO4 drives and 250 slots
Center for Computational Biology and Medicine Computing Resources:
• 63 node, 126 core, IBM BladeCenter cluster
• Two 16 core Dell R900 application servers attached to 10 Terabytes of dedicated storage
• 8 dual core IBM and Dell application servers
Trayanova Lab Computing Resources:
• 120 node, 500 core, Penguin Computing cluster attached to 30 Terabytes of dedicated storage
• 2 dual core file servers
Karchin Lab Computing Resources:
• 20 node, 80 core, Dell cluster
• 3 dual core Dell application servers attached to 25 Terabytes of dedicated storage
• Dell ML6000 tape library with 2 LTO3 drives and 28 slots
CSEB 213 machine room:
ICM computing resources are housed in a 1000 square foot facility in the Computational Science and Engineering Building on the Homewood Campus. The facility has dedicated power, cooling, and network infrastructure. Facility network connectivity is via the building’s 10 Gigabit Ethernet backbone to the Homewood Campus core.
Databases |
MAGE-DB2
The MicroArray Gene Expression (MAGE) object model (OM) is an emerging standard for the representation of microarray data (Spellman et al.). MAGE-OM provides the data types and relationships needed to store complete descriptions of microarray experiments and the resulting data, including: a) descriptions of RNA isolation, hybridization, and scanning protocols; b) descriptions of chip architecture; c) annotation data relating to chip probes; d) raw and processed data at the lowest feature level of the arrays. A MAGE-ML toolkit is available for exporting MAGE data to XML format for data distribution. We have implemented a database called MAGE-DB2 that is a full relational mapping of the MAGE-OM, optimized to run on IBM’s scalable, parallel database DB2. The database is supplemented with: a) data loaders for importing chip description files for cDNA, long-oligonucleotide (Agilent), and Affymetrix arrays, as well as annotation data and primary microarray data; b) a web interface that enables users to design, save, and re-use descriptions of all protocols; c) a natural language query builder that enables non-expert users to query the database; d) data export tools that output data stored in MAGE-DB2 in MAGE-ML format.
Protein-DB2
Protein-DB2 is a relational database for primary proteomics experimental data. Its schema is an extension of the Proteomics Experiment Data Repository (PEDRo) object model (Taylor et al.), now referred to as the Proteomics Standards Initiative (PSI) object model. Protein-DB2 is designed to store complete descriptions of 2-D gel experimental analyses output by the analysis tool Progenesis (Nonlinear Dynamics Inc.), as well as the results of subsequent MS and MS/MS analysis. The following supporting software has been developed: a suite of data translators that convert binary files output by multiple mass spectrometry instruments to either PSI-ML or mzXML format; software for loading PSI-ML or mzXML files into the database and associating these data with particular spots; software for linking MS and MS/MS data from spot analysis to gels stored within the database; and software for loading the results of Mascot and Sequest MS data analysis and linking them with MS and spot data. Source code for versions 1 and 2 of the database is available at the CCBM website (www.ccbm.jhu.edu).
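As a rough illustration of what reading an mzXML peak list might involve, the sketch below decodes a base64-encoded peak string into (m/z, intensity) pairs. It is not the Protein-DB2 loader. It assumes 32-bit floats in network (big-endian) byte order with m/z and intensity interleaved, which is a common mzXML encoding; real files may declare 64-bit precision or compression, which this sketch does not handle. The function names `decode_peaks` and `encode_peaks` are hypothetical.

```python
import base64
import struct


def decode_peaks(b64_text):
    """Decode a base64 peak string into a list of (m/z, intensity) pairs.

    Assumption: 32-bit big-endian floats, interleaved as m/z, intensity.
    """
    raw = base64.b64decode(b64_text)
    if len(raw) % 8 != 0:
        # Each peak needs two 4-byte floats.
        raise ValueError("peak data is not a whole number of (m/z, intensity) pairs")
    count = len(raw) // 4
    values = struct.unpack(f">{count}f", raw)
    return list(zip(values[0::2], values[1::2]))


def encode_peaks(pairs):
    """Inverse of decode_peaks, used here only to build test input."""
    flat = [value for pair in pairs for value in pair]
    packed = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(packed).decode("ascii")


# Round trip with values that are exactly representable as 32-bit floats.
peaks = [(100.5, 1000.0), (200.25, 500.0)]
encoded = encode_peaks(peaks)
decoded = decode_peaks(encoded)
```

A loader along these lines would then insert each decoded pair into the database and associate it with the relevant gel spot.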
The Cardiac Anatomic Database System (CADS)
CADS is an object model and relational database designed to store finite-element models of imaged hearts along with MR imaging data measured at each image voxel. It is currently used to store diffusion tensor MR data describing fiber and laminar sheet organization of imaged hearts. CADS is unique in that it incorporates a visually driven query system: users may visualize imaged, segmented, and rendered hearts, graphically select voxel data within sub-regions for analysis, and specify arbitrary SQL queries to be executed on the data within the selected sub-region. It is a powerful tool for voxel-by-voxel comparative analysis of cardiac imaging data. CADS is deployed and operational, and is available for download at www.ccbm.jhu.edu.
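The idea of running a user-supplied SQL condition on voxels inside a selected sub-region can be sketched as follows. This is a minimal illustration, not the CADS schema or query system: the `voxel` table, its columns, the `query_subregion` helper, and the use of SQLite with an axis-aligned bounding box are all assumptions made for the example.

```python
import sqlite3


def query_subregion(conn, bounds, where="1=1"):
    """Return voxels inside an axis-aligned box that also satisfy `where`.

    `bounds` is ((x0, x1), (y0, y1), (z0, z1)), inclusive.
    `where` is a user-supplied SQL condition; it is inserted into the
    statement as-is, so it must come from a trusted source.
    """
    (x0, x1), (y0, y1), (z0, z1) = bounds
    sql = (
        "SELECT x, y, z, value FROM voxel "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? AND z BETWEEN ? AND ? "
        f"AND ({where}) ORDER BY x, y, z"
    )
    return conn.execute(sql, (x0, x1, y0, y1, z0, z1)).fetchall()


# Hypothetical table: one row per voxel with a single measured value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voxel (x INTEGER, y INTEGER, z INTEGER, value INTEGER)")
rows = [
    (x, y, z, 10 * (x + y + z))
    for x in range(3)
    for y in range(3)
    for z in range(3)
]
conn.executemany("INSERT INTO voxel VALUES (?, ?, ?, ?)", rows)

# Select the 2 x 2 x 1 box at the origin, keeping voxels with value > 5.
selected = query_subregion(conn, ((0, 1), (0, 1), (0, 0)), where="value > 5")
```

Here the bounding box stands in for the graphically selected sub-region, and the `where` string stands in for the arbitrary query the user specifies.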
Software |
The Center for Cardiovascular Bioinformatics & Modeling - Model Source Code