Topic 140

bioinformatics alignment algorithm sequence efficient introduce sequences alignments mers fast memory reads query computationally faster methods applications mer method problem accuracy large based allows graph search computational pairwise approaches compute read sequencing index reference present compression accurate queries approach data aligning available com structures implementation minimizers set github exact usage distance algorithms bruijn size software aligner can graphs hashing reduce called tools sets speed information use minimizer perform genomes scale resultswe structure availabilityhttps computation propose step running intensive time hash improve length able compact homology aligned pair order implemented programming solutions more improves finding prediction comparable scalability art so simulated

81 items. Top items listed below.

Phylogenetic placement of short reads without sequence alignment 140 25 4

Embedding the de Bruijn graph, and applications to metagenomics 140 26 4

Predicting Alignment Distances via Continuous Sequence Matching 140 26 4

Compact and evenly distributed k-mer binning for genomic sequences 140 4

Global, Highly Specific and Fast Filtering of Alignment Seeds 140 4

A faster implementation of association mapping from k-mers 140 25 4

Spectral Jaccard Similarity: A new approach to estimating pairwise sequence alignments 140 13 4

Detecting High Scoring Local Alignments in Pangenome Graphs 140 4

BLight: Efficient exact associative structure for k-mers 140 13 4

Strain Level Microbial Detection and Quantification with Applications to Single Cell Metagenomics 140 13 4

Aligning biological sequences by exploiting residue conservation and coevolution 140 4

Simplitigs as an efficient and scalable representation of de Bruijn graphs 140 4

Puffaligner: An Efficient and Accurate Aligner Based on the Pufferfish Index 140 13 4

ComPotts: Optimal alignment of coevolutionary models for protein sequences 140 105 4

Remote homology search with hidden Potts models 140 4

Cuttlefish: Fast, parallel, and low-memory compaction of de Bruijn graphs from large-scale genome collections 162 13 4

Kmer2SNP: reference-free SNP calling from raw reads based on matching 13 4

Representation of k-mer sets using spectrum-preserving string sets 140 13 4

QueryFuse Is A Sensitive Algorithm For Detection Of Gene-Specific Fusions 140 4

RNA structure prediction using positive and negative evolutionary information 140 105 4

Real-time structural motif searching in proteins using an inverted index strategy 140 4

MetaGraph: Indexing and Analysing Nucleotide Archives at Petabase-scale 140 4

An Algorithm to Build a Multi-genome Reference 140 13 4

Identifying Taxonomic Units in Metagenomic DNA Streams 140 4

Stochastic Sampling of Structural Contexts Improves the Scalability and Accuracy of RNA 3D Module Identification 140 105 4

Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors 140 4

Improved design and analysis of practical minimizers 140 13 4

Fold recognition by scoring protein map similarities using the congruence coefficient 140 105 4

Real time structural search of the Protein Data Bank 140 111 4

BURST enables mathematically optimal short-read alignment for big data 140 25 4

Choosing representative proteins based on splicing structure similarity improves the accuracy of gene tree reconstruction 162 13 4

Frequent subgraph mining for biologically meaningful structural motifs 140 4

AStarix: Fast and Optimal Sequence-to-Graph Alignment 140 13 4

S-conLSH: Alignment-free gapped mapping of noisy long reads 13 4

Weighted minimizer sampling improves long read mapping 140 13 4

Reducing reference bias using multiple population reference genomes 140 25 4

Boundary-Forest Clustering: Large-Scale Consensus Clustering of Biological Sequences 145 4

Consistent Consideration of RNA Structural Alignments Improves Prediction Accuracy of RNA Secondary Structures 140 4

Fast lightweight accurate xenograft sorting 140 13 4

SCRAPP: A tool to assess the diversity of microbial samples from phylogenetic placements 162 13 4

A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets 140 4

Long-read error correction: a survey and qualitative comparison 25 4

Single Individual Haplotype Reconstruction Using Fuzzy C-Means Clustering With Minimum Error Correction 145 4

Tumor Phylogeny Topology Inference via Deep Learning 140 26 4

Data structures based on k-mers for querying large collections of sequencing datasets 140 4

Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper 25 13 4

A consensus-based ensemble approach to improve de novo transcriptome assembly 25 4

mbkmeans: fast clustering for single cell data using mini-batch k-means 147 43 13 4

QAlign: Aligning nanopore reads accurately using current-level modeling 25 13 4

kASA: Taxonomic Analysis of Metagenomic Data on a Notebook 140 62 4

Sapling: Accelerating Suffix Array Queries with Learned Data Models 13 4

PhISCS-BnB: A Fast Branch and Bound Algorithm for the Perfect Tumor Phylogeny Reconstruction Problem 140 13 4

CONSENT: Scalable long read self-correction and assembly polishing with multiple sequence alignment 25 13 4

Improving protein alignment algorithms using amino-acid hydrophobicities - Applications of TMATCH, A new algorithm 140 4

GPU accelerated partial order multiple sequence alignment for long reads self-correction 140 25 4

PHERI - Phage Host Exploration tool. 140 121 4

AERON: Transcript quantification and gene-fusion detection using long reads 25 4

Genetic Distance Calculation Based on Locality Sensitive Hashing 140 4

dagLogo: an R/Bioconductor package for identifying and visualizing differential amino acid group usage in proteomics data 140 62 4

OMAmer: tree-driven and alignment-free protein assignment to subfamilies outperforms closest sequence approaches 162 13 4

BATMAN: fast and accurate integration of single-cell RNA-Seq datasets via minimum-weight matching 147 43 13 4

Accurate and Efficient Gene Function Prediction using a Multi-Bacterial Network 140 4

Probabilistic Approach to Understand Errors in Sequencing and its based Applications 145 4

Balrog: A universal protein model for prokaryotic gene prediction 162 13 4

BrumiR: A toolkit for de novo discovery of microRNAs from sRNA-seq data. 13 4

Metalign: Efficient alignment-based metagenomic profiling via containment min hash 13 4

A Novel Encoding Algorithm for Textual Data Compression 140 4

Third generation indexing for third generation sequencing 25 4

On the automatic annotation of gene functions using observational data and phylogenetic trees 162 13 4

Storing and analyzing a genome on a blockchain 140 4

VariantStore: A Large-Scale Genomic Variant Search Index 140 4

HELLO: A hybrid variant calling approach 25 4

Annotating Gene Ontology terms for protein sequences with the Transformer model 105 26 4

Accel-Align: A Fast Sequence Mapper and Aligner based on the Seed-Embed-Extend Method 140 4

Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs 105 26 4

Minimally-overlapping words for sequence similarity search 140 4

IGD: high-performance search for large-scale genomic interval datasets 13 4

Strain-aware assembly of genomes from mixed samples using flow variation graphs 25 4

A composite method to infer drug resistance with mixed genomic data 26 4

CoCoNet: Boosting RNA contact prediction by convolutional neural networks 105 26 13 4

An improved mode of running PASTA 140 83 4

Deep Homology-Based Protein Contact-Map Prediction 105 26 4

ALPACA: a fast and accurate approach for automated landmarking of three-dimensional biological structures 140 4

Approximate k-nearest neighbors graph for single-cell Hi-C dimensional reduction with MinHash 140 4

Sequence representations and their utility for predicting protein-protein interactions 105 26 4

Structure-Based Function Prediction using Graph Convolutional Networks 105 26 4

Bipartite Tight Spectral Clustering (BiTSC) Algorithm for Identifying Conserved Gene Co-clusters in Two Species 140 4

CHEER: hierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning 162 13 4

Detection of pre-microRNA with Convolutional Neural Networks 105 26 4

Reconstruction Algorithms for DNA-Storage Systems 140 4

Scalable Classification of Organisms into a Taxonomy Using Hierarchical Supervised Learners 26 4

Reconstructing tumor evolutionary histories and clone trees in polynomial-time with SubMARine 162 13 4

Profiling variable-number tandem repeat variation across populations using repeat-pangenome graphs 25 19 9 4

Template-based prediction of protein structure with deep learning 105 26 13 4

Deep learning approaches predict non-coding RNA functions from only raw sequence data 162 105 13 4

DisCovER: distance-based covariational threading for weakly homologous proteins 162 140 105 13 4

SVJedi: Genotyping structural variations with long reads 25 13 4

Alignment-free machine learning approaches for the lethality prediction of potential novel human-adapted coronavirus using genomic nucleotide 37 26 1

A comparative study of supervised machine learning algorithms for the prediction of long-range chromatin interactions 105 26 4

Finding recurrent RNA structural networks with fast maximal common subgraphs of edge-colored graphs 140 4