BIOE 598SZ Computational Techniques for Analysis of Biological Data
Course objective:
Introduce/review mathematical and computational methodologies frequently used in
the analysis of biological data. Introduce/review computational environments and
software tools that are prominent in Bioengineering/Bioinformatics applications
and especially in analyzing large scale genomics data and modeling biological
systems. Practice programming and computation on biological objects.
Textbook:
Logistics:
Meeting Time: Fall 2007, 9:00am - 10:50am, Tue Thur
Meeting place: 3211 Digital Computer Lab
Credits: 4 graduate hours. Required for all bioengineering PhDs.
Course Reference number: CRN 47185
Instructor: Sheng Zhong (szhong AT uiuc DOT edu)
Enrollment: 20
Prerequisites: BIOE598MI, Analytical Methods for Biological System Modeling (can be taken in parallel); or consent of instructor.
Evaluation:
Course grade is based on homework (50%), in-class presentation (25%), and final project (25%).
Guides to class project:
The subject of the proposed project can be, but is NOT limited to, the project
discussed in class. Alternatively, you may propose other bio-related subjects for which:
1) you have access to the data;
2) useful insights can be obtained with computational inference.
The project outline should read like an extended abstract of a real paper. The
following information should be included:
1) background of the proposed study
2) proposed analytical methods
3) (if appropriate) proposed simulation studies
4) proposed analysis procedure of real data
5) expected results
6) (if appropriate) limitations and alternative strategies
Datasets for course project: Gene expression data, Gene annotation data
Contents:
I. (4 hrs) Overview of recent technology developments & large scale measurements of biological data
II. (9 hrs) Fundamentals of probability and statistics
a) Set theory
b) Independence, conditional probabilities and Bayes' rule
c) Random variables
d) Expectation and moments
e) Discrete distributions: Binomial, Geometric, Multinomial
f) Continuous distributions: Normal, Exponential
g) Case study: Modeling DNA motif with product-multinomial distribution
III. (9 hrs) Parameter estimation & Expectation-Maximization method
a) Likelihood maximization
b) EM algorithm: overview
c) EM Recursions and error analysis
d) Case study: Identification of protein-DNA interaction sites
IV. (9 hrs) Clustering analysis
a) Hierarchical clustering
b) K-means clustering
c) Initialization and convergence
d) Visualization
e) Case study: Identification of co-expressed genes
V. (6 hrs) Statistical tests
a) The idea: a coin example
b) Parametric and non-parametric tests
c) Case study: Detecting differentially expressed genes
VI. (9 hrs) Markov chains
a) Transition probability and state transition graph
b) Time evolution of probability distributions of states
c) Classification of states: persistent, transient & periodic states
d) Stationary distribution
e) Case study: modeling genome sequence with a Markov chain
VII. (4 hrs) Markov Chain Monte Carlo (MCMC) methods
a) Metropolis-Hastings
b) Simulated Annealing