High Throughput Unsupervised Genetic Sequence Analysis

Speaker:  Daniel Boley - University of Minnesota
  Wednesday, May 14, 2014 at 4:30 PM (refreshments at 4:15 PM; seminar starts at 4:30 PM)

 The rapid growth of genome sequence data in recent years has opened a new dimension in big data visualization and interpretation. This talk presents an application paradigm for visualizing the evolution of the influenza virus, using an unsupervised machine learning approach based on Principal Component Analysis and applied to non-numeric genetic sequence data. Two influenza virus cases are presented: (1) the evolution history of human A/H3N2 vs avian H5, and (2) North American swine influenza virus since the swine H1N1 pandemic of 2009. The results in the first case suggest the hypothesis that vaccination could be one of the driving forces in the evolution of the human A/H3N2 influenza virus. The second case shows a strong correlation between the diversification of the North American swine influenza virus and mutations at two specific sites in the hemagglutinin protein. By using unsupervised methods, we minimize the need to make assumptions about the relationships among the viruses.
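
 As a rough illustration of how Principal Component Analysis can be applied to non-numeric sequence data, the sketch below one-hot encodes a few aligned sequences and projects them onto two principal components. It is not the speaker's method: the one-hot encoding, the nucleotide alphabet, the function names one_hot_encode and pca_project, and the toy sequences are all assumptions made for this example.

```python
import numpy as np

# Assumed alphabet: nucleotides only. The talk may work with amino acids instead.
ALPHABET = "ACGT"


def one_hot_encode(seqs):
    """Turn equal-length aligned sequences into a binary matrix, one row per sequence."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    X = np.zeros((len(seqs), length * len(ALPHABET)))
    for row, s in enumerate(seqs):
        for pos, ch in enumerate(s):
            # Each position owns a block of len(ALPHABET) columns.
            X[row, pos * len(ALPHABET) + index[ch]] = 1.0
    return X


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components via an SVD."""
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T


# Hypothetical toy sequences, not real influenza data.
toy_sequences = ["ACGTAC", "ACGTAT", "ACGAAC", "TTGTCC", "TTGTCA"]
coords = pca_project(one_hot_encode(toy_sequences))
print(coords)
```

 In this toy input the first three sequences differ from the last two at several positions, so the two groups fall on opposite sides of the first component. No labels or assumed relationships among the sequences are used, which is the sense in which the approach is unsupervised.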


Ca' Vignal 2, 1st Floor, Lecture Hall L

Programme Director
Alessandro Farinelli

Publication date
May 2, 2014