Bioinformatics and Evolutionary Genomics of Cancer
c/o IFOM-IEO Campus
Via Adamello, 16 - 20139 Milan, Italy
Tel: +39-02-574303053 - Fax: +39-02-574303231
Our research activity focuses on the role of genomic instability and somatic mutations in the development of human cancer. We tackle this issue using a combination of experimental and computational methods to sequence and analyze genomic data. Three main lines of investigation are currently pursued in the lab:
1. Measurement of genomic instability using Next Generation Sequencing (NGS).
[Figure 1: panels A, B, C]
Cancer-associated genomic instability is usually detected by quantifying clonal mutations, i.e. modifications that recur in the majority of cells. As a consequence, genomic instability can be identified only in already established neoplastic tissues, not at pre-tumoral stages. Using NGS, we developed a procedure for quantifying genomic instability that does not require clonal mutations. Our method is based on the parallel sequencing of thousands of single DNA molecules that bona fide derive from different cells. We reach a depth of coverage that allows the detection of high-frequency mutations as well as of random modifications that occur in a tiny fraction of cells, even prior to the establishment of the tumoral clone. To correct for random errors, we sequenced a portion of the human genome that includes an ultraconserved region (UCR) (Figure 1A). UCRs are genomic elements that are 100% identical between human, mouse, and rat, and significantly depleted in SNPs and copy number variants within the human population. We exploited the ‘frozen’ status of the UCRs, i.e. their inability to accumulate mutations, to control for experimental errors and to quantify random mutations occurring in the non-frozen portions.
For our screening, we amplified and sequenced the selected region in individuals with hereditary non-polyposis colorectal cancer (HNPCC), an autosomal dominant condition associated with heterozygous mutations in mismatch repair (MMR) genes (Figure 1B). Using the 454 platform, we sequenced more than 45,000 distinct DNA molecules from three different tissues of patients affected by HNPCC (neoplastic colon mucosa, nonneoplastic colon mucosa, and peripheral blood). As a negative control, we used the peripheral blood of nine healthy donors. When we compared the mutability outside and inside the UCR, the UCR was significantly depleted in mutations in both neoplastic and nonneoplastic samples of HNPCC patients. No difference between the two regions was detectable in cells from healthy donors, indicating that nonneoplastic HNPCC tissues also have a constitutional mutation rate higher than that of healthy genomes (Figure 1C). To the best of our knowledge, this is the first direct evidence of an intrinsic genomic instability in individuals with heterozygous mutations in MMR genes, suggesting that these individuals are predisposed to acquire the second hit that starts tumorigenesis. The study also constitutes the proof of principle for the development of a more sensitive molecular assay of genomic instability (De Grassi et al., 2010).
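The comparison of mutability inside and outside the UCR can be illustrated with a minimal sketch. The statistical test used in the study is not stated here, so the one-sided Fisher exact test below is an assumption chosen for illustration. The function names and all counts are hypothetical.

```python
from math import exp, lgamma


def mutation_rate(mutations: int, bases: int) -> float:
    """Observed mutations per sequenced base."""
    return mutations / bases


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def depletion_p_value(mut_in: int, bases_in: int,
                      mut_out: int, bases_out: int) -> float:
    """One-sided Fisher exact test (an assumed choice of test).

    Returns the probability of observing mut_in or fewer mutations
    inside the UCR if mutations were spread evenly over all sequenced
    bases (hypergeometric null model).
    """
    total_mut = mut_in + mut_out
    total_bases = bases_in + bases_out
    log_denominator = log_comb(total_bases, total_mut)
    lowest = max(0, total_mut - bases_out)
    p = 0.0
    for k in range(lowest, mut_in + 1):
        p += exp(log_comb(bases_in, k)
                 + log_comb(bases_out, total_mut - k)
                 - log_denominator)
    return min(p, 1.0)


# Hypothetical counts: patient-like sample (UCR depleted) vs. control-like sample.
patient_p = depletion_p_value(mut_in=2, bases_in=10_000, mut_out=30, bases_out=10_000)
control_p = depletion_p_value(mut_in=15, bases_in=10_000, mut_out=15, bases_out=10_000)
print(f"patient-like p = {patient_p:.2e}, control-like p = {control_p:.2f}")
```

A small p-value in the patient-like sample and a large one in the control-like sample mirrors the pattern described above: depletion inside the UCR is detectable only when the flanking region carries excess random mutations.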
2. Identification of the systems-level properties of cancer genes.
[Figure 2: panels A, B, C]
Massive cancer genome re-sequencing projects have so far led to the identification of more than 1000 genes with putative cancer driver mutations, and the list is likely to grow. The majority of these genes are mutated in only a tiny fraction of samples, while recurrent mutations are overall very few. This unexpectedly high heterogeneity between and within tumor types suggests that the genetic routes to tumorigenesis may in fact be much more intricate and involve many more genes than previously foreseen (Ciccarelli, 2010).
Over the past years, our lab has undertaken a systematic study of the systems-level properties of cancer genes in an attempt to find recurrent features that could explain their involvement in cancer. As a first analysis, we sought to understand the relationship between the duplicability of genes mutated in cancer and the network properties of the encoded proteins, because connectivity and duplicability are usually indicative of gene fragility towards perturbations. We developed a method to count the duplications of each gene directly on the sequence of the human genome (Figure 2A). This procedure preferentially identifies recent and/or highly conserved duplicates, thus measuring the general propensity of each gene to retain duplications (i.e. gene duplicability). We showed that cancer genes overall avoid duplications when compared to the rest of human genes, even when only genes with the same functional properties are considered (Figure 2B). In addition, cancer genes tend to encode central hubs, i.e. they preferentially produce singleton proteins that engage in several connections and occupy central positions at the crossroads of multiple biological processes (Figure 2C) (Rambaldi et al., 2008). These properties are uncommon within the human gene repertoire and help in interpreting the effect of somatic mutations as a sign of a broader fragility of cancer genes towards perturbations. Deletions, mutations, and amplifications of highly interconnected genes are likely to be deleterious because they can simultaneously affect several aspects of cell life. Interestingly, these properties are not limited to well-known cancer genes but are also shared by genes whose modifications have been identified through large-scale mutational screenings (Syed et al., 2010). Again, this shows that there are common features of cancer genes, not immediately apparent from their individual function, that can explain their role in tumor development.
Besides increasing our knowledge of the mechanisms of cancer genetics, we also used the recurrent properties of cancer genes to predict potential candidates. In collaboration with the zebrafish unit in our Institute, we confirmed that the expression of orthologs of these candidate cancer genes is overall deregulated in a zebrafish model of human cancer (Anelli et al., 2010).
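The two recurrent properties described above, low duplicability and high connectivity, can be combined in a minimal sketch that flags "singleton hubs" in a toy protein-protein interaction network. This is an illustration, not the lab's pipeline: the gene names, duplicate counts, and degree threshold are hypothetical, and degree is used as a simple stand-in for the centrality measures mentioned in the text.

```python
from collections import defaultdict


def degrees(interactions):
    """Count distinct interaction partners per protein (undirected network)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {protein: len(linked) for protein, linked in partners.items()}


def singleton_hubs(interactions, duplicate_counts, min_degree):
    """Return genes with no retained duplicates and at least min_degree partners.

    min_degree is an assumed, user-chosen cutoff for calling a hub.
    """
    degree = degrees(interactions)
    return sorted(
        gene for gene, d in degree.items()
        if d >= min_degree and duplicate_counts.get(gene, 0) == 0
    )


# Hypothetical toy network and duplicate counts.
toy_interactions = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("B", "C"), ("E", "D"), ("E", "B"), ("E", "C"),
]
toy_duplicates = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 2}

print(singleton_hubs(toy_interactions, toy_duplicates, min_degree=3))
```

In this toy example only gene A is both a singleton and a hub: E is equally connected but has duplicates, and D is a singleton with too few partners.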
3. Analysis of cancer-related genes and mutations.
[Figure 3: panels A, B, C, D]
We apply comparative genomics approaches to reconstruct the evolution of genes and genomic determinants relevant for cancer. For example, we conducted a detailed analysis of the PRDM family of tumor suppressors that linked the vertebrate-specific expansion of these genes to their progressive functional specialization. Coupling comparative genomics and experimental validations, we were able to prove that (1) the molecular evolution of PRDM paralogs correlates with their expression pattern; (2) PRDM diversification is obtained through rearrangements in the gene structure; and (3) splicing modifications contribute to the functional specialization of PRDM genes (Fumasoni et al., 2007). Starting from the anecdotal observation of structural rearrangements in PRDM genes, we wondered whether this could be a recurrent mechanism of paralog diversification. We therefore performed a global survey of the rearrangements occurring in the structure of all human genes hosted within primate-specific segmental duplications (Figure 3). We specifically explored modifications caused by internal tandem repetitions occurring either inside exons or at exon-intron boundaries. We found that this type of modification affects as many as 10% of primate-specific human genes that duplicated recently and that are still in the process of fixation within the human population. When the repetitions reside within exons, they encode variable amino acid repeats that are often involved in mediating the binding to other proteins. When located at exon-intron boundaries, they can generate alternative splicing isoforms through the formation of novel introns. In both cases, the resulting effect is the production of a variety of primate-specific proteins, which mostly differ in the number and sequence of amino acid repeats. Interestingly, these genes are often under positive selection and enriched in alternative transcripts expressed in cancer (De Grassi et al., 2009).
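As an illustration of what an internal tandem repetition looks like at the sequence level, the following minimal sketch scans a DNA string for exact, back-to-back copies of a short unit. It is a deliberately naive example: the survey's actual detection method is not described here, real repeats usually tolerate mismatches and need dedicated tools, and the function name, parameters, and sequences are hypothetical.

```python
import re


def find_tandem_repeats(sequence, min_unit=3, min_copies=2):
    """Find exact tandem repeats in a DNA sequence.

    Returns a list of (start, unit, copies) tuples, where start is the
    0-based position of the repeat, unit is the repeated motif, and
    copies is the number of consecutive copies. Matches do not overlap.
    """
    # A lazily matched unit followed by at least (min_copies - 1) more copies.
    pattern = re.compile("([ACGT]{%d,}?)\\1{%d,}" % (min_unit, min_copies - 1))
    repeats = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((match.start(), unit, copies))
    return repeats


# Hypothetical exon fragment containing three consecutive copies of a 3-base unit.
print(find_tandem_repeats("TTGCAGCAGCAGTT"))
```

Because the scan reports the leftmost match, the unit may be a rotation of the motif one would pick by eye (here GCA rather than CAG); the repeat length and copy number are unaffected.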
Web Servers and Public Databases
Network of Cancer Genes (NCG) http://bio.ifom-ieo-campus.it/ncg This public resource collects and integrates data on systems-level properties of cancer genes. It provides information on duplicability, orthology, evolutionary appearance and topological properties of the encoded protein in a comprehensive version of the human protein-protein interaction network. NCG also stores information on all primary interactors of cancer proteins, thus providing a complete overview of 5357 proteins that constitute direct and indirect determinants of human cancer (Syed et al., 2010).
FancyGene http://bio.ifom-ieo-campus.it/fancygene FancyGene is a web-based interactive tool for producing representations of one or more genes directly on the corresponding genomic locus. It is highly flexible and allows the user to change the resulting image dynamically, to modify colors and shapes, and to add or remove objects. FancyGene is useful for drawing figures for scientific publications and presentations (Rambaldi et al., 2009).