This course focuses on applying R programming to the analysis of genetic data, particularly “big data” sets with multiple measurements. The primary data sets considered will contain RNA-seq and/or other expression data for many or all genes in a given set of individuals. The course is intended for junior or senior students who are considering careers at the intersection of the life sciences, statistics, and/or computer science, particularly students majoring in Genetics. The course fulfills the laboratory requirement for the Genetics major. Students will learn how to acquire such data, format it for R, plot it, and perform statistical analyses. In addition, students will learn how to simulate data under different hypotheses and how to perform power and sample size calculations for different statistical methods applied to real or simulated data. Each class consists of a mixture of lecture and computer-based demos and/or exercises, as well as time for students to work on assignments. Guest investigators will frequently give short presentations (in person or by Skype) illustrating how programming and informatics are critical to their research. The course provides the introductory skills needed to conduct basic computational research in the life sciences, including many aspects of computer programming and data analysis.

Credits: 3


Students must have previously completed Genetic Analysis I (01:447:384) or Genetics (01:447:380).