SNPpy--database management for SNP data from genome wide association studies.
Abstract
BACKGROUND: We describe SNPpy, a hybrid script database system using the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and to merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software.

RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses.

CONCLUSIONS: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results in standardized GWAS file formats. It performs flexible, low-level data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.
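
The relational approach summarized in the abstract (merging studies, filtering on phenotype and SNP values, validating patient data, exporting to a standard file format) can be illustrated with a minimal sketch. This is not SNPpy's code or schema: it uses Python's built-in sqlite3 module as a stand-in for SQLAlchemy with PostgreSQL so that it runs without a database server. Every table, column, and function name is hypothetical, and the PLINK-style PED line is only one familiar example of a GWAS text format, since the abstract does not name the formats SNPpy writes.

```python
import sqlite3
from itertools import groupby

# Hypothetical three-table schema; SNPpy's actual schema is not described here.
SCHEMA = """
CREATE TABLE patient (
    id        INTEGER PRIMARY KEY,
    study     TEXT    NOT NULL,
    sex       INTEGER NOT NULL CHECK (sex IN (1, 2)),
    phenotype INTEGER NOT NULL CHECK (phenotype IN (1, 2))
);
CREATE TABLE snp (
    id    INTEGER PRIMARY KEY,
    rsid  TEXT    NOT NULL UNIQUE,
    chrom TEXT    NOT NULL,
    pos   INTEGER NOT NULL
);
CREATE TABLE genotype (
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    snp_id     INTEGER NOT NULL REFERENCES snp(id),
    allele1    TEXT    NOT NULL,
    allele2    TEXT    NOT NULL,
    PRIMARY KEY (patient_id, snp_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding made-up data from two studies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?)",
        [(1, "studyA", 1, 2), (2, "studyA", 2, 1), (3, "studyB", 2, 2)],
    )
    conn.executemany(
        "INSERT INTO snp VALUES (?, ?, ?, ?)",
        [(1, "rs_demo1", "1", 1000), (2, "rs_demo2", "1", 2000),
         (3, "rs_demo3", "2", 500)],
    )
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?, ?)",
        [(1, 1, "A", "G"), (1, 2, "G", "G"), (1, 3, "C", "T"),
         (2, 1, "A", "A"), (2, 2, "G", "T"), (2, 3, "C", "C"),
         (3, 1, "G", "G"), (3, 2, "T", "T"), (3, 3, "T", "T")],
    )
    return conn


def export_ped_lines(conn, phenotype, chrom):
    """Filter patients by phenotype and SNPs by chromosome, across all studies,
    and return one PED-style line per patient (study name used as family ID)."""
    rows = conn.execute(
        """
        SELECT p.study, p.id, p.sex, p.phenotype, g.allele1, g.allele2
        FROM genotype AS g
        JOIN patient AS p ON p.id = g.patient_id
        JOIN snp     AS s ON s.id = g.snp_id
        WHERE p.phenotype = ? AND s.chrom = ?
        ORDER BY p.id, s.pos
        """,
        (phenotype, chrom),
    ).fetchall()
    lines = []
    # Rows arrive sorted by patient, so groupby collects each patient's SNPs.
    for (study, pid, sex, pheno), group in groupby(rows, key=lambda r: r[:4]):
        alleles = " ".join(f"{a1} {a2}" for *_, a1, a2 in group)
        # Parental IDs are written as 0 (unknown) in this sketch.
        lines.append(f"{study} {pid} 0 0 {sex} {pheno} {alleles}")
    return lines
```

Calling `export_ped_lines(build_demo_db(), phenotype=2, chrom="1")` selects the matching patients from both demo studies in a single query, which is the kind of cross-study merge and marker selection the abstract describes. The `CHECK` constraints stand in for low-level validation of patient data: an insert with an out-of-range sex or phenotype code is rejected by the database itself.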
Type
Journal article
Permalink
https://hdl.handle.net/10161/15366
Published Version (Please cite this version)
10.1371/journal.pone.0024982
Publication Info
Mitha, Faheem; Herodotou, Herodotos; Borisov, Nedyalko; Jiang, Chen; Yoder, Josh; & Owzar, Kouros (2011). SNPpy--database management for SNP data from genome wide association studies. PLoS One, 6(10). pp. e24982. 10.1371/journal.pone.0024982. Retrieved from https://hdl.handle.net/10161/15366.
This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.
Scholars@Duke
Kouros Owzar
Professor of Biostatistics & Bioinformatics
cancer pharmacogenomics; drug induced neuropathy, neutropenia and hypertension; statistical genetics; statistical methods for high-dimensional data; copulas; survival analysis; statistical computing
