Genomic compression
-
Here, we introduce simplitigs as a compact, efficient, and scalable representation, and ProphAsm, a fast algorithm for their computation. For the example of assemblies of model organisms and two bacterial pan-genomes, we compare simplitigs to unitigs, the best existing representation, and demonstrate that simplitigs provide a substantial improvement in the cumulative sequence length and their number.
24p viarchimedes 26-01-2022 7 0 Download
-
Unsupervised compression algorithms applied to gene expression data extract latent or hidden signals representing technical and biological sources of variation. However, these algorithms require a user to select a biologically appropriate latent space dimensionality.
27p viarchimedes 26-01-2022 7 0 Download
-
As a photoperiod-sensitive and self-pollinated species, the growth periods traits play important roles in the adaptability and yield of soybean. GWAS was conducted by a compressed mixed linear model (CMLM) involving in both relative kinship and population structure.
13p viansan2711 30-07-2021 18 1 Download
-
The rapid development of Next-Generation Sequencing technologies enables sequencing genomes with low cost. The dramatically increasing amount of sequencing data raised crucial needs for efficient compression algorithms.
9p vijeeni2711 24-07-2021 15 0 Download
-
Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms.
8p viwyoming2711 16-12-2020 14 0 Download
-
Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion.
21p vikentucky2711 26-11-2020 9 0 Download
-
Metagenomics is a genomics research discipline devoted to the study of microbial communities in environmental samples and human and animal organs and tissues. Sequenced metagenomic samples usually comprise reads from a large number of different bacterial communities and hence tend to result in large file sizes, typically ranging between 1–10 GB.
13p vioklahoma2711 19-11-2020 13 2 Download
-
The rapid progress of high-throughput DNA sequencing techniques has dramatically reduced the costs of whole genome sequencing, which leads to revolutionary advances in gene industry. The explosively increasing volume of raw data outpaces the decreasing disk cost and the storage of huge sequencing data has become a bottleneck of downstream analyses.
8p vioklahoma2711 19-11-2020 11 0 Download
-
With the reduction of gene sequencing cost and demand for emerging technologies such as precision medical treatment and deep learning in genome, it is an era of gene data outbreaks today. How to store, transmit and analyze these data has become a hotspot in the current research. Now the compression algorithm based on reference is widely used due to its high compression ratio.
12p viconnecticut2711 28-10-2020 10 1 Download
-
Recent advancements in high-throughput sequencing technologies have generated an unprecedented amount of genomic data that must be stored, processed, and transmitted over the network for sharing.
15p vicolorado2711 22-10-2020 16 0 Download
-
Advanced sequencing machines dramatically speed up the generation of genomic data, which makes the demand of efficient compression of sequencing data extremely urgent and significant. As the most difficult part of the standard sequencing data format FASTQ, compression of the quality score has become a conundrum in the development of FASTQ compression.
12p vicolorado2711 22-10-2020 13 0 Download
-
The development of next-generation sequencing technologies has helped sequence large genomes easily, producing a huge number of short-reads - small fragments of DNA. Despite the existence of many developed alignment tools, mapping short-read datasets to the reference genome, a crucial step of genome analysis, still remains a challenge. In this study, we develop a short-read alignment program, BWTaligner, based on the Burrows-Wheeler transform compression - exact and inexact matching.
5p caygaocaolon1 13-11-2019 15 0 Download
-
Hydrogen peroxide and catalyst solutions needed for Fenton's Reagent are usually added to the treatment area by pressure injection into one or more designated chemical oxidation injection wells, or gravity injection into one or more monitoring or other wells. In pressure injection, compressed air is used to sparge the ferrous iron catalyst and relatively large volumes of peroxide solution (e.g., hundreds to thousands of gallons) into the contaminated soil and groundwater over a short period of time (e.g., days).
32p cao_can 29-12-2012 65 3 Download