SNPpy--database management for SNP data from genome wide association studies.

dc.contributor.author

Mitha, Faheem

dc.contributor.author

Herodotou, Herodotos

dc.contributor.author

Borisov, Nedyalko

dc.contributor.author

Jiang, Chen

dc.contributor.author

Yoder, Josh

dc.contributor.author

Owzar, Kouros

dc.coverage.spatial

United States

dc.date.accessioned

2017-08-28T15:24:34Z

dc.date.available

2017-08-28T15:24:34Z

dc.date.issued

2011

dc.description.abstract

BACKGROUND: We describe SNPpy, a hybrid script-database system that uses the Python SQLAlchemy library coupled with the PostgreSQL database to manage genotype data from Genome-Wide Association Studies (GWAS). This system makes it possible to merge study data with HapMap data and to merge across studies for meta-analyses, including data filtering based on the values of phenotype and Single-Nucleotide Polymorphism (SNP) data. SNPpy and its dependencies are open source software. RESULTS: The current version of SNPpy offers utility functions to import genotype and annotation data from two commercial platforms. We use these to import data from two GWAS and the HapMap Project. We then export these individual datasets to standard data format files that can be imported into statistical software for downstream analyses. CONCLUSIONS: By leveraging the power of relational databases, SNPpy offers integrated management and manipulation of genotype and phenotype data from GWAS. The analysis of these studies requires merging across GWAS datasets as well as patient and marker selection. To this end, SNPpy enables the user to filter the data and output the results in standardized GWAS file formats. It performs low-level, flexible data validation, including validation of patient data. SNPpy is a practical and extensible solution for investigators who seek to deploy central management of their GWAS data.

dc.identifier

https://www.ncbi.nlm.nih.gov/pubmed/22039405

dc.identifier

PONE-D-11-09289

dc.identifier.eissn

1932-6203

dc.identifier.uri

https://hdl.handle.net/10161/15366

dc.language

eng

dc.publisher

Public Library of Science (PLoS)

dc.relation.ispartof

PLoS One

dc.relation.isversionof

10.1371/journal.pone.0024982

dc.subject

Databases, Genetic

dc.subject

Genome-Wide Association Study

dc.subject

Humans

dc.subject

Polymorphism, Single Nucleotide

dc.subject

Software

dc.title

SNPpy--database management for SNP data from genome wide association studies.

dc.type

Journal article

pubs.author-url

https://www.ncbi.nlm.nih.gov/pubmed/22039405

pubs.begin-page

e24982

pubs.issue

10

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Duke

pubs.organisational-group

Duke Cancer Institute

pubs.organisational-group

Institutes and Centers

pubs.organisational-group

School of Medicine

pubs.publication-status

Published

pubs.volume

6

Files

Original bundle

Name:
SNPpy--database management for SNP data from genome wide association studies.pdf
Size:
879.96 KB
Format:
Adobe Portable Document Format
Description:
Accepted version