

Biomedical research is being revolutionized by new technologies for generating high-throughput data. For example, the mRNA abundance measurements obtained from gene expression microarrays provide a global view of cellular activity by simultaneously recording the expression levels of thousands of genes. Similarly, new methods for measuring protein expression in cells and tissues and for mapping protein-protein interactions are providing rich sources of information about disease mechanisms. Research in bioinformatics at the ICM is currently focused on representing and analyzing such data.

Biomedical Data Management:

The ICM is developing relational databases for managing multi-scale biomedical data. The MAGE-DB2 Project is developing a full relational mapping of the MicroArray Gene Expression (MAGE) object model (OM), optimized to run on DB2, IBM’s scalable, parallel database. The Protein-DB Project is developing a relational implementation of the Proteomics Standards Initiative (PSI) object model for storing complete descriptions of 2-D gel, mass spectrometry (MS) and tandem mass spectrometry (MS/MS) experimental analyses. The Cardiac Anatomic Database System (CADS) is an object model and relational database designed to store finite-element models of imaged hearts along with the magnetic resonance (MR) imaging data measured at each image voxel. It is currently used to store diffusion tensor MR data describing the fiber and laminar sheet organization of imaged hearts.

Mathematical Bioinformatics:

Researchers in the ICM are designing new tools, and in some cases new foundations, in machine learning, statistical inference and stochastic modeling, and are applying these methods to develop new approaches for assessing disease risk and treatment for the individual patient. This research addresses two fundamental problems confronted when analyzing biomedical data.

First, from the point of view of classical learning and inference, significant technical problems arise from the small number of observations (patients, tissue samples, etc.) relative to the large number of variables (genes, proteins, etc.) that may be measured using modern high-throughput assay technologies. This so-called “small-sample dilemma” is one of the main challenges of modern computational biology. As an example, DNA microarrays provide a powerful tool in cancer research, and several studies have used this technology to identify genes that could serve as candidate markers to discriminate cancer from normal conditions as well as to distinguish among cancer types. Accurate decision rules are essential for diagnostic purposes, as the treatment options, responses to therapy, and prognoses vary depending on the type, staging and grouping of tumors. However, learning such rules from microarray data poses many challenges to traditional machine learning methods because of the small-sample problem. An important research focus of the ICM is developing statistical learning methods that are appropriate for use in this small-sample regime.

Second, when complex machine learning methods are utilized, the resulting classification rules, however accurate, may be largely incomprehensible to a biologist or physician. Hence, research at the ICM is also focused on methods for discovering and explicitly characterizing the important interactions among biological variables. As an example, we are exploring new methods for elucidating the logic, topology and statistical dependency structure of gene regulatory, protein-protein and metabolic networks, utilizing advances in areas such as stochastic simulation, graphical modeling and Bayesian inference.
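
To make the small-sample dilemma described above concrete, the following is a minimal illustrative sketch and not an ICM method. It assumes 20 hypothetical samples with random class labels and 5000 hypothetical "genes" that are pure Gaussian noise, then selects the single gene whose sign best matches the labels. All names and parameter values (`selection_bias_demo`, `sign_rule_accuracy`, the sample and feature counts) are assumptions chosen for illustration.

```python
import random


def sign_rule_accuracy(values, labels, flip):
    """Accuracy of the rule 'predict 1 when value > 0' (inverted if flip)."""
    hits = sum(int((v > 0) != flip) == y for v, y in zip(values, labels))
    return hits / len(labels)


def selection_bias_demo(n_samples=20, n_features=5000, n_test=2000, seed=0):
    """Return (training accuracy, fresh-sample accuracy) of the best noise feature."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]

    best_acc, best_flip = 0.0, False
    for _ in range(n_features):
        # Each hypothetical "gene" is pure Gaussian noise, unrelated to the labels.
        values = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for flip in (False, True):
            acc = sign_rule_accuracy(values, labels, flip)
            if acc > best_acc:
                best_acc, best_flip = acc, flip

    # Fresh measurements of the selected feature are again independent noise,
    # so the selected rule is evaluated on newly drawn values and labels.
    test_labels = [rng.randint(0, 1) for _ in range(n_test)]
    test_values = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    test_acc = sign_rule_accuracy(test_values, test_labels, best_flip)
    return best_acc, test_acc


if __name__ == "__main__":
    train_acc, test_acc = selection_bias_demo()
    print(f"apparent accuracy on 20 samples: {train_acc:.2f}")
    print(f"accuracy on fresh samples:       {test_acc:.2f}")
```

Because thousands of candidate variables are screened against only a handful of observations, some noise variable will match the labels well by chance, while its accuracy on fresh samples stays near 50%. This is why apparent accuracy measured on the same few samples used for marker selection can be misleading in this regime.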

See also the Center for Cardiovascular Bioinformatics and Modeling and the Center for Imaging Science.