Many aspects of genome evolution are best captured by numerical quantities. Examples include the density of introns in eukaryotic genes, the number of genes in the same family across related organisms, and genome size. I will describe computational methods tailored to the analysis of such numerical characters evolving along a known evolutionary tree.
First, I will discuss a parsimony approach: given the feature values at the terminal nodes of the tree, ancestral values at the inner nodes are inferred by minimizing the total penalty for changes on the edges, using continuous-valued change-penalty measures. In particular, I will consider an asymmetric generalization of Wagner parsimony, in which changes on the edges are penalized linearly, but with different factors applying to increases (gains) and decreases (losses).
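The asymmetric linear penalty can be illustrated with a small dynamic program over the tree. This is only a minimal sketch under stated assumptions, not the method presented in the talk: the tree is given as nested tuples with integer counts at the leaves, ancestral values are restricted to integers between 0 and the largest observed value, and the names `edge_cost`, `subtree_costs`, and `asymmetric_wagner` are hypothetical.

```python
def edge_cost(parent, child, gain, loss):
    """Linear penalty with separate factors for increases and decreases."""
    diff = child - parent
    return gain * diff if diff >= 0 else loss * (-diff)


def leaf_values(node):
    """Collect the observed values at the leaves of a nested-tuple tree."""
    if isinstance(node, int):
        return [node]
    return [v for child in node for v in leaf_values(child)]


def subtree_costs(node, max_state, gain, loss):
    """Return costs[x]: minimal penalty below this node if it has value x."""
    states = range(max_state + 1)
    if isinstance(node, int):
        # A leaf is fixed to its observed value (assumption: integer counts).
        return [0.0 if x == node else float("inf") for x in states]
    total = [0.0] * (max_state + 1)
    for child in node:
        child_costs = subtree_costs(child, max_state, gain, loss)
        for x in states:
            total[x] += min(
                edge_cost(x, y, gain, loss) + child_costs[y] for y in states
            )
    return total


def asymmetric_wagner(tree, gain=1.0, loss=1.0):
    """Return (minimal total penalty, one optimal value at the root)."""
    max_state = max(leaf_values(tree))
    costs = subtree_costs(tree, max_state, gain, loss)
    best = min(costs)
    return best, costs.index(best)


# Hypothetical example: three leaves with counts 1, 3 and 0.
tree = ((1, 3), 0)
print(asymmetric_wagner(tree, gain=1, loss=1))
print(asymmetric_wagner(tree, gain=3, loss=1))
```

With equal factors the sketch reduces to ordinary Wagner parsimony; making gains more expensive than losses pushes the inferred ancestral values upward, since a high ancestor followed by losses becomes cheaper than a low ancestor followed by gains.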
Second, I will discuss probabilistic models for the evolution of intron density and gene family size. Recently completed genome sequences have been used for comprehensive analyses of exon-intron organization in orthologous genes of diverse organisms. I propose a method for estimating the number of introns lost or unobserved in all extant organisms, and show how to compute counts of intron gains and losses along the branches by using posterior probabilities. The analysis shows a dynamic history with frequent intron losses and gains, and fairly intron-rich ancestral organisms.
I will also talk about my work on the evolution of a gene family along an evolutionary tree. More precisely, we model the evolution of the number of homologs within the family, without sequence-level information. This work represents the first tractable probabilistic model that simultaneously handles the three main mechanisms that shape gene content: horizontal gene transfer, gene duplication, and gene loss.
From a mathematical viewpoint, the models give rise to interesting computational problems concerning branching birth-death processes. Intron evolution is modeled by a 0-1 process, whereas gene content evolution is modeled by a linear birth-death-immigration process.
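To make the linear birth-death-immigration process concrete, the following sketch simulates the size of one gene family along a single edge of the tree. It rests on assumptions not stated above: duplication is treated as birth and loss as death, both at rates proportional to the current family size, and horizontal transfer is treated as immigration at a constant rate. The function name `simulate_bdi` and its parameters are hypothetical.

```python
import random


def simulate_bdi(n0, t, dup_rate, loss_rate, transfer_rate, rng):
    """Simulate family size after time t, starting from n0 copies.

    Assumed rates in state n: duplication n * dup_rate, loss n * loss_rate,
    gain by transfer transfer_rate (independent of n).
    """
    n, elapsed = n0, 0.0
    while True:
        up = n * dup_rate + transfer_rate   # total rate of n -> n + 1
        down = n * loss_rate                # total rate of n -> n - 1
        total = up + down
        if total == 0:
            return n                        # no event can occur any more
        elapsed += rng.expovariate(total)   # waiting time to the next event
        if elapsed > t:
            return n
        if rng.random() * total < up:
            n += 1
        else:
            n -= 1


rng = random.Random(0)
sizes = [simulate_bdi(1, 1.0, 0.5, 0.8, 0.2, rng) for _ in range(1000)]
print(sum(sizes) / len(sizes))  # empirical mean family size after time 1.0
```

Without transfer, a family that reaches size 0 stays extinct in this sketch; a positive transfer rate allows it to reappear, which is the role immigration plays in the process.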
M. Csuros, I. B. Rogozin and E. V. Koonin. "Extremely intron-rich genes in the alveolate ancestors inferred with a flexible maximum likelihood approach." Molecular Biology and Evolution, 25:903-911, 2008. DOI: http://dx.doi.org/10.1093/molbev/msn039 [open access]
M. Csuros, J. A. Holey and I. B. Rogozin. "In search of lost introns." ISMB/ECCB 2007 (Intelligent Systems for Molecular Biology + European Conf. on Computational Biology). DOI: http://dx.doi.org/10.1093/bioinformatics/btm190 [open access]
M. Csuros and I. Miklos. "A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer." RECOMB 2006 (Conf. Research in Computational Molecular Biology). DOI: http://dx.doi.org/10.1007/11732990_18 Preprint: http://www.iro.umontreal.ca/~csuros/papers/gld.pdf