Bias correction and Bayesian analysis of aggregate counts in SAGE libraries
Abstract
Background: Tag-based techniques, such as SAGE, are commonly used to sample the mRNA
pool of an organism's transcriptome. Incomplete digestion during the tag formation
process may allow multiple tags to be generated from a given mRNA transcript.
The probability of forming a tag varies with its relative location within the transcript. As a result, the
observed tag counts represent a biased sample of the actual transcript pool. In SAGE,
this bias can be avoided by ignoring all but the 3'-most tag, but doing so discards a large
fraction of the observed data. Taking this bias into account should allow more of
the available data to be used, leading to increased statistical power.
Results: Three new hierarchical models, which directly embed a model for the variation in tag formation
probability, are proposed, and their associated Bayesian inference algorithms are developed.
These models may be applied to libraries at both the tag and aggregate level. Simulation
experiments and analysis of real data are used to contrast the accuracy of the various
methods. The consequences of tag formation bias are discussed in the context of testing
for differential expression, and a description is given of how these algorithms can be
applied in that context.
Conclusions: Several Bayesian inference algorithms that account
for tag formation effects are compared with the DPB algorithm, and the comparison provides clear evidence
of their superior performance. The accuracy of inferences when using a particular non-informative
prior is found to depend on the expression level of a given gene. The multivariate
nature of the approach easily allows both univariate and joint tests of differential
expression. Calculations demonstrate the potential for false positive and false negative
findings due to variation in tag formation probabilities across samples when testing
for differential expression.
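A minimal sketch can illustrate the tag formation bias and why pooling tags helps. It assumes a simplification of our own, not necessarily the authors' model. Each anchoring-enzyme site on a transcript is cleaved independently with a common probability p, and the tag comes from the 3'-most cleaved site. Under that assumption, site k (k = 1 being the 3'-most) yields the tag with probability p(1 - p)^(k-1).

Suppose per-site counts are modeled as Poisson(λ·π_k) and the abundance λ has a conjugate Gamma prior. Then the aggregate count is sufficient for λ. Using all tags therefore typically gives a tighter posterior than keeping only the 3'-most tag. The function names, the Jeffreys-type Gamma(0.5, 0) default prior, the known value of p and the parameter values are all hypothetical. This is not the paper's hierarchical models or the DPB algorithm.

```python
import random
from dataclasses import dataclass


def formation_probs(p: float, n_sites: int) -> list[float]:
    """Assumed geometric model: site k (k = 1 is the 3'-most anchoring-enzyme
    site, stored at index 0) yields the tag if it is cleaved and every site
    3' of it is not, i.e. with probability p * (1 - p)**(k - 1)."""
    return [p * (1.0 - p) ** (k - 1) for k in range(1, n_sites + 1)]


def simulate_tag_counts(n_transcripts: int, p: float, n_sites: int,
                        rng: random.Random) -> list[int]:
    """Simulate incomplete digestion: for each transcript, scan sites from the
    3' end and form a tag at the first site that is cleaved (if any).
    Counts are multinomial given n_transcripts, so the Poisson model used
    below is an approximation."""
    counts = [0] * n_sites
    for _ in range(n_transcripts):
        for k in range(n_sites):
            if rng.random() < p:
                counts[k] += 1
                break
    return counts


@dataclass
class GammaPosterior:
    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def sd(self) -> float:
        return self.shape ** 0.5 / self.rate


def posterior_abundance(counts: list[int], probs: list[float],
                        a: float = 0.5, b: float = 0.0) -> GammaPosterior:
    """Gamma(a, b) prior (shape, rate) on abundance lambda, with
    n_k ~ Poisson(lambda * pi_k). The likelihood depends on the counts only
    through their sum N, so the posterior is Gamma(a + N, b + sum(pi_k))."""
    return GammaPosterior(a + sum(counts), b + sum(probs))


if __name__ == "__main__":
    rng = random.Random(1)
    p, n_sites, true_abundance = 0.7, 4, 200
    probs = formation_probs(p, n_sites)
    counts = simulate_tag_counts(true_abundance, p, n_sites, rng)
    three_prime = posterior_abundance(counts[:1], probs[:1])  # 3'-most tag only
    aggregate = posterior_abundance(counts, probs)            # all tags, corrected
    print("counts by site:", counts)
    print(f"3'-most tag only: mean={three_prime.mean:.1f}, sd={three_prime.sd:.1f}")
    print(f"all tags:         mean={aggregate.mean:.1f}, sd={aggregate.sd:.1f}")
```

With these settings the all-tags posterior should have the smaller standard deviation, reflecting the data that the 3'-most-only approach discards. In practice p is unknown and would itself have to be estimated.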
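The risk of false positive and false negative findings can be shown with a small expected-value calculation. It uses a simple assumed model, which is our illustration and not necessarily the paper's. Each anchoring-enzyme site is cleaved independently with a library-specific probability p, and the tag comes from the 3'-most cleaved site.

The sketch computes expected per-site counts for two libraries and the fold change that a naive per-tag comparison would report. The helper names and numbers are hypothetical and chosen only for illustration.

```python
def expected_tag_counts(abundance: float, p: float, n_sites: int) -> list[float]:
    """Expected count at sites k = 1..n_sites (k = 1 is the 3'-most, stored at
    index 0) under the assumed model: abundance * p * (1 - p)**(k - 1)."""
    return [abundance * p * (1.0 - p) ** k for k in range(n_sites)]


def apparent_fold_changes(p_a: float, p_b: float, n_sites: int,
                          true_fold: float = 1.0) -> list[float]:
    """Per-site expected count ratio A/B when A's abundance is true_fold
    times B's but the two libraries have different cleavage probabilities."""
    counts_a = expected_tag_counts(true_fold, p_a, n_sites)
    counts_b = expected_tag_counts(1.0, p_b, n_sites)
    return [a / b for a, b in zip(counts_a, counts_b)]


def corrected_fold_change(counts_a: list[float], counts_b: list[float],
                          p_a: float, p_b: float) -> float:
    """Divide each library's aggregate count by its total tag formation
    probability 1 - (1 - p)**n_sites before taking the ratio."""
    total_a = 1.0 - (1.0 - p_a) ** len(counts_a)
    total_b = 1.0 - (1.0 - p_b) ** len(counts_b)
    return (sum(counts_a) / total_a) / (sum(counts_b) / total_b)


if __name__ == "__main__":
    # Equal abundance, different cleavage efficiency: spurious differences.
    print([round(f, 3) for f in apparent_fold_changes(0.9, 0.6, n_sites=3)])
    # A true 1.5-fold increase is masked at the 3'-most tag.
    print([round(f, 3) for f in apparent_fold_changes(0.6, 0.9, 3, true_fold=1.5)])
    # Correcting the aggregate counts recovers the true fold change.
    fc = corrected_fold_change(expected_tag_counts(150, 0.6, 3),
                               expected_tag_counts(100, 0.9, 3), 0.6, 0.9)
    print(round(fc, 3))
```

Suppose one library has p = 0.9, the other has p = 0.6, and abundances are equal. The 3'-most tag then shows an apparent 1.5-fold increase and the next site upstream an apparent decrease, a potential false positive. Conversely, a true 1.5-fold difference can vanish entirely at the 3'-most tag, a potential false negative. Dividing by each library's formation probability removes the distortion only when p is known. Otherwise p must be estimated along with expression.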
Type
Other article
Subject
gene-expression
differential expression
serial analysis
model
supersage
biochemical research methods
biotechnology & applied microbiology
mathematical & computational biology
Permalink
https://hdl.handle.net/10161/4340
Published Version (Please cite this version)
10.1186/1471-2105-11-72
Citation
Zaretzki, Russell L.; Gilchrist, Michael A.; Briggs, William M.; Armagan, Artin. 2010. Bias
correction and Bayesian analysis of aggregate counts in SAGE libraries. BMC Bioinformatics
11: 72.
Articles written by Duke faculty are made available through the campus open access policy. For more information see: Duke Open Access Policy
Rights for Collection: Scholarly Articles
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.