The Genomic Next-generation Universal MAPper (GNUMAP) is a program designed to
accurately map sequence data obtained from next-generation sequencing machines
(specifically those from Solexa/Illumina) back to a genome of any size.
With the emergence of high-throughput next-generation sequencing machines, an
enormous amount of data is being produced at a very high rate, and mapping this
data back to the genome is a major challenge. One significant problem with many
genomic mapping programs is how they deal with duplicate regions in genomic DNA.
Since it is impossible to know exactly where a sequence from a duplicated region
should be mapped, many programs simply throw out these sequences. This often
results in a loss of nearly 40% of the data.
This project develops GNUMAP, a program capable of handling such repetitive
regions. By using the posterior probability of mapping a given read to a specific
genomic location, we are able to account for these repetitive reads by distributing
them across several regions in the genome. In addition, the program's output is
formatted so that it can be easily viewed with other free and readily available
programs. Several benchmark data sets were created with spiked-in duplicate
regions, and GNUMAP was able to account for these duplicate regions more accurately.
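As a rough illustration of distributing repetitive reads, here is a minimal Python sketch. It assumes each read comes with a likelihood P(read | location) for each candidate genomic location and a uniform prior over those locations; the names `posterior_weights` and `distribute_reads` are hypothetical, and GNUMAP's own scoring and data structures are not shown here. Each read contributes a total weight of 1, split across its candidate locations in proportion to the posterior, so a read that matches two identical repeats equally well adds 0.5 to each instead of being thrown out.

```python
from collections import defaultdict


def posterior_weights(likelihoods):
    """Normalize per-location likelihoods P(read | loc) into posteriors.

    Assumption: a uniform prior over the candidate locations, so the
    posterior is just each likelihood divided by their sum.
    """
    total = sum(likelihoods.values())
    if total == 0:
        return {}  # read matches nowhere; contributes nothing
    return {loc: lik / total for loc, lik in likelihoods.items()}


def distribute_reads(reads, read_length):
    """Spread each read's unit weight over every base it covers at each
    candidate start position, weighted by the posterior, instead of
    discarding reads that map to more than one place."""
    coverage = defaultdict(float)
    for likelihoods in reads:
        for start, weight in posterior_weights(likelihoods).items():
            for pos in range(start, start + read_length):
                coverage[pos] += weight
    return dict(coverage)


# Hypothetical example: one unique read, one read from an exact repeat.
reads = [
    {100: 0.5},               # only one candidate location
    {100: 0.25, 5000: 0.25},  # equally likely at two locations
]
cov = distribute_reads(reads, read_length=4)
# cov[100] == 1.5 and cov[5000] == 0.5
```

Normalizing per read keeps the total signal equal to the number of reads, so repeats are neither discarded nor double-counted.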
GNUMAP Users Google Group
- The GNUMAP bisulfite paper (GNUMAP-BS) has been submitted and is under review.
- GNUMAP can now be used with MPI (version 2.*). If several machines are available, the work can
be spread evenly across them, greatly decreasing processing time. For an example submission, see the
GNUMAP Users Google Group.
- To decide whether a putative SNP is real or just background noise, GNUMAP now uses a pairwise Hidden Markov Model
in addition to calculating a posterior probability for each candidate SNP location. In preliminary results, this
approach outperforms existing programs.
- GNUMAP can now save and read in a binary genome file! The latest version is available here.
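This page does not specify GNUMAP's pairwise Hidden Markov Model, so the following is only an illustration of the general technique. It is a minimal Python sketch of the textbook three-state pair HMM (Match, insertion, deletion) and its forward algorithm, which sums the probability of every alignment of a read to a haplotype. It then combines reference and alternate likelihoods with a prior into a posterior probability for the alternate allele. The parameter values (`delta`, `epsilon`, `p_match`, `prior_alt`), the single-site haploid model, uniform base frequencies, and the function names are all assumptions, not GNUMAP's actual model.

```python
import math


def pair_hmm_forward(read, hap, delta=0.01, epsilon=0.1, p_match=0.99, q=0.25):
    """Forward algorithm for a simple pair HMM (states M, X, Y).

    Assumed parameters: delta = gap-open, epsilon = gap-extend,
    p_match = probability that aligned bases agree, q = background base
    frequency. No end-state term is used, so results are only comparable
    between calls that share the same parameters.
    """
    n, m = len(read), len(hap)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]  # read base vs gap
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]  # gap vs haplotype base
    fM[0][0] = 1.0  # the begin state behaves like Match
    mismatch = (1.0 - p_match) / 3.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:
                # Joint emission of an aligned pair (sums to 1 over all 16 pairs).
                emit = q * (p_match if read[i - 1] == hap[j - 1] else mismatch)
                fM[i][j] = emit * ((1 - 2 * delta) * fM[i - 1][j - 1]
                                   + (1 - epsilon) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])
    return fM[n][m] + fX[n][m] + fY[n][m]


def snp_posterior(reads, ref_hap, alt_hap, prior_alt=0.001):
    """Posterior probability of the alternate allele at one site, assuming
    independent reads and a haploid site. Works in log space to avoid
    underflow when many reads are multiplied together."""
    log_ref = sum(math.log(pair_hmm_forward(r, ref_hap)) for r in reads)
    log_alt = sum(math.log(pair_hmm_forward(r, alt_hap)) for r in reads)
    log_odds = (math.log(prior_alt) + log_alt) - (math.log(1 - prior_alt) + log_ref)
    # Numerically stable logistic function of the log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Hypothetical site: reference A replaced by T at the fifth position.
ref_hap = "ACGTACGTAC"
alt_hap = "ACGTTCGTAC"
p_alt = snp_posterior([alt_hap] * 5, ref_hap, alt_hap)    # close to 1
p_noise = snp_posterior([ref_hap] * 5, ref_hap, alt_hap)  # close to 0
```

Summing over all alignments, rather than keeping only the best one, lets a read that is ambiguous near an indel still contribute evidence in proportion to how well it fits each haplotype.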
Presentations and Publications
- May 2012: HiCOMB (in conjunction with IPDPS) in Shanghai, China.
The workshop paper can be found here,
and the presentation is available in PowerPoint format.
- May 2011: HiCOMB (in conjunction with IPDPS) in Anchorage, Alaska. The workshop paper can be found here,
and the presentation is available in Keynote or PDF format.
- The GNUMAP algorithm was recently published in Bioinformatics as
"The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing."
Here is the link.
- June 2009: ISMB in Stockholm, Sweden.
The presentation can be downloaded as a pptx or pdf.
- New features are added to GNUMAP regularly. Recently, a pairwise Hidden Markov Model was added to increase accuracy for SNP detection.
- For questions or usage directions, please email Nathan at nathanlclement (at) gmail.com or gnumap-users (at) googlegroups.com.
This page last modified Wednesday May 28, 2014