High-dimensional data sets

Showing 1-20 of 29 results for High-dimensional data sets

Prognosis of lasso-like penalized Cox models with tumor profiling improves prediction over clinical data alone and benefits from bi-dimensional pre-screening

Prediction of patient survival from tumor molecular ‘-omics’ data is a key step toward personalized medicine. Cox models fitted to RNA profiling datasets are popular for predicting clinical outcomes. However, these models are applied in a “high-dimensional” setting, as the number p of covariates (gene expressions) greatly exceeds the number n of patients and the number e of events.

16p vialfrednobel 23-12-2023

Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification

Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which the small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset.

8p visteverogers 24-06-2023

Lecture Data mining: Lesson 20

Lecture Data mining: Lesson 20. The main topics covered in this lesson include: dimensionality reduction; high-dimensional datasets; multi-dimensional scaling; pseudo-projections; Monte Carlo algorithm; ... Please refer to the document for the full content.

22p tieuvulinhhoa 22-09-2022
A comparative analysis of filter-based feature selection methods for software fault prediction

The rapid growth of data has become a huge challenge for software systems. The quality of a fault prediction model depends on the quality of the software dataset. High-dimensional data is the major problem affecting the performance of fault prediction models. To deal with the dimensionality problem, various researchers have proposed feature selection.

7p viplato 05-04-2022
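As a rough illustration of the filter approach named in the entry above, the sketch below scores every feature independently against a binary fault label and keeps the top k, without training any classifier during selection. The scoring function (absolute Pearson correlation) and the names `pearson` and `filter_select` are assumptions chosen for illustration; they are not taken from the paper, which compares several filter methods.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_select(rows, labels, k):
    """Score each feature column on its own against the labels and return
    the indices of the k highest-scoring features (ties broken by index)."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]

# Usage: 4 modules, 3 metrics, binary fault label (hypothetical data).
rows = [[1, 5, 0.3], [2, 5, 0.1], [3, 5, 0.4], [4, 5, 0.2]]
labels = [0, 0, 1, 1]
selected = filter_select(rows, labels, k=2)
```

Because each feature is scored in isolation, a filter is cheap on high-dimensional data, but it cannot detect features that are only informative in combination.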
MIA-Sig: Multiplex chromatin interaction analysis by signal processing and statistical algorithms

The single-molecule multiplex chromatin interaction data are generated by emerging 3D genome mapping technologies such as GAM, SPRITE, and ChIA-Drop. These datasets provide insights into high-dimensional chromatin organization, yet introduce new computational challenges.

13p vielonmusk 30-01-2022
EnrichedHeatmap: An R/Bioconductor package for comprehensive visualization of genomic signal associations

High-throughput sequencing data are increasing dramatically in volume. Thus, there is an urgent need for efficient tools to perform fast and integrative analysis of multiple data types. An enriched heatmap is a specific form of heatmap that visualizes how genomic signals are enriched over specific target regions. It is commonly used and is effective at revealing enrichment patterns, especially for high-dimensional genomic and epigenomic datasets.

7p vibeauty 23-10-2021
A general index for linear and nonlinear correlations for high dimensional genomic data

With the advance of high-throughput sequencing, high-dimensional data are being generated. Detecting dependence/correlation between these datasets is becoming one of the most important issues in multi-dimensional data integration and co-expression network construction.

14p vijeeni2711 30-06-2021
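A minimal example of why the entry above asks for an index covering nonlinear as well as linear correlation: in the sketch below, `ys` is an exact function of `xs`, yet the Pearson (linear) correlation is zero. This only motivates the problem; it does not implement the index proposed in the paper, and the helper name `pearson` is hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]   # perfectly dependent, but not linearly
r = pearson(xs, ys)
print(r)  # prints 0.0
```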
A balanced iterative random forest for gene selection from microarray data

The wealth of gene expression values being generated by high-throughput microarray technologies leads to complex high-dimensional datasets. Moreover, many cohorts have the problem of imbalanced classes where the number of patients belonging to each class is not the same.

10p viwyoming2711 16-12-2020
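The class-imbalance problem mentioned in the entry above is often handled by drawing a class-balanced subset of samples before training each model. The sketch below shows plain random undersampling to the size of the smallest class; it is an assumed, generic technique, not the balanced iterative random forest procedure itself, and `balanced_subsample` is a hypothetical name.

```python
import random
from collections import defaultdict

def balanced_subsample(labels, seed=0):
    """Return sorted sample indices with the same number of samples per
    class, obtained by randomly undersampling every class down to the
    size of the smallest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)  # class label -> sample indices
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_min))
    return sorted(chosen)

# Usage: 8 tumor samples vs 2 normal samples (hypothetical cohort).
labels = ["tumor"] * 8 + ["normal"] * 2
subset = balanced_subsample(labels, seed=1)
```

In an iterative scheme, calling this with a different `seed` per iteration lets every model see a different balanced view of the majority class.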
A multivariate approach to the integration of multi-omics datasets

To leverage the potential of multi-omics studies, exploratory data analysis methods that provide systematic integration and comparison of multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high dimensional datasets.

13p vikentucky2711 26-11-2020
A framework for generalized subspace pattern mining in high-dimensional datasets

A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients.

14p vikentucky2711 26-11-2020
NMF-mGPU: Non-negative matrix factorization on multi-GPU systems

In the last few years, the Non-negative Matrix Factorization (NMF) technique has gained great interest in the Bioinformatics community, since it is able to extract interpretable parts from high-dimensional datasets.

12p vikentucky2711 26-11-2020
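To make the NMF idea in the entry above concrete, the sketch below factorizes a small non-negative matrix V into non-negative factors W and H (V ≈ WH) using the classic Lee and Seung multiplicative updates for the Frobenius-norm objective, in pure Python. It is a single-threaded illustration under that assumption, not the multi-GPU implementation described in NMF-mGPU; the function names are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, seed=0, eps=1e-9):
    """Factorize a non-negative n x m matrix V into W (n x rank) and
    H (rank x m) with multiplicative updates minimizing ||V - WH||_F."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random start; multiplicative updates keep entries >= 0.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise; eps avoids division by zero.
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H

def frobenius_error(V, W, H):
    """Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

# Usage: a small non-negative matrix of rank 2 (hypothetical data).
V = [[2.0, 1.0, 0.5], [0.5, 1.0, 2.0], [2.5, 2.0, 2.5]]
W, H = nmf(V, rank=2)
```

Because each update multiplies the current entries by a non-negative ratio, factors that start positive never become negative, which is what keeps the learned parts additive and interpretable.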
Multiobjective triclustering of time-series transcriptome data reveals key genes of biological processes

Exploratory analysis of multi-dimensional high-throughput datasets, such as microarray gene expression time series, may be instrumental in understanding the genetic programs underlying numerous biological processes.

19p vikentucky2711 24-11-2020
Variable selection for binary classification using error rate p-values applied to metabolomics data

Metabolomics datasets are often high-dimensional though only a limited number of variables are expected to be informative given a specific research question. The important task of selecting informative variables can therefore become complex. In this paper we look at discriminating between two groups.

12p vioklahoma2711 19-11-2020
Combining location-and-scale batch effect adjustment with data cleaning by latent factor adjustment

In the context of high-throughput molecular data analysis it is common that the observations included in a dataset form distinct groups; for example, measured at different times, under different conditions or even in different labs. These groups are generally denoted as batches.

19p vioklahoma2711 19-11-2020
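The location-and-scale idea in the entry above can be sketched for a single feature: each batch is centered and scaled separately, then mapped back to the pooled mean and standard deviation, so that batches no longer differ in level or spread. This is a simplified, assumed version (no empirical Bayes shrinkage as in ComBat, and none of the paper's latent factor adjustment), and `location_scale_adjust` is a hypothetical name.

```python
import statistics
from collections import defaultdict

def location_scale_adjust(values, batches):
    """Adjust one feature for batch effects: standardize within each batch,
    then rescale to the pooled mean and (population) standard deviation."""
    pooled_mean = statistics.fmean(values)
    pooled_sd = statistics.pstdev(values)
    members = defaultdict(list)  # batch label -> sample indices
    for idx, batch in enumerate(batches):
        members[batch].append(idx)
    adjusted = [0.0] * len(values)
    for idxs in members.values():
        batch_values = [values[i] for i in idxs]
        mean = statistics.fmean(batch_values)
        sd = statistics.pstdev(batch_values)
        for i in idxs:
            # A constant batch carries no within-batch spread; map it to the pooled mean.
            z = (values[i] - mean) / sd if sd > 0 else 0.0
            adjusted[i] = pooled_mean + pooled_sd * z
    return adjusted

# Usage: the same three samples measured in two labs with a shift of +10.
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["lab1"] * 3 + ["lab2"] * 3
adjusted = location_scale_adjust(values, batches)
```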
Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients

For clinical genomic studies with high-dimensional datasets, tree-based ensemble methods offer a powerful solution for variable selection and prediction taking into account the complex interrelationships between explanatory variables.

21p vioklahoma2711 19-11-2020
Spectral consensus strategy for accurate reconstruction of large biological networks

The last decades have witnessed an explosion of large-scale biological datasets whose analyses require the continuous development of innovative algorithms. Many of these high-dimensional datasets are related to large biological networks with few or no experimentally proven interactions.

13p vioklahoma2711 19-11-2020
Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies

High throughput metabolomics makes it possible to measure the relative abundances of numerous metabolites in biological samples, which is useful to many areas of biomedical research. However, missing values (MVs) in metabolomics datasets are common and can arise due to both technical and biological reasons.

13p vioklahoma2711 19-11-2020
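Nearest neighbor imputation, in its basic form, fills a missing value with the average of that variable over the most similar samples. The sketch below implements that plain k-nearest-neighbor variant, using root-mean-square distance over the variables both samples have observed; it does not model truncation or the distribution-based approach of the paper named in the entry above, and `knn_impute` is a hypothetical name.

```python
import math

def knn_impute(data, k=2):
    """Return a copy of `data` (list of rows, None = missing) in which each
    missing entry is replaced by the mean of that column over the k nearest
    rows that have the column observed."""
    imputed = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                # Compare only on columns observed in both rows.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                # Root-mean-square distance, so rows sharing fewer columns stay comparable.
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, r))
            candidates.sort()
            neighbours = [data[r][j] for _, r in candidates[:k]]
            if neighbours:  # leave None if no donor row exists
                imputed[i][j] = sum(neighbours) / len(neighbours)
    return imputed

# Usage: 4 samples x 3 metabolites, one missing value (hypothetical data).
data = [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1], [9.0, 9.0, 9.0], [1.0, 2.0, None]]
filled = knn_impute(data, k=2)
```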
Bayesian Unidimensional Scaling for visualizing uncertainty in high dimensional datasets with latent ordering of observations

Detecting patterns in high-dimensional multivariate datasets is non-trivial. Clustering and dimensionality reduction techniques often help in discerning inherent structures. In biological datasets such as microbial community composition or gene expression data, observations can be generated from a continuous process, often unknown.

15p viflorida2711 30-10-2020
Priority-Lasso: A simple hierarchical approach to the prediction of clinical outcome using multi-omics data

The inclusion of high-dimensional omics data in prediction models has become a well-studied topic in the last decades. Although most of these methods do not account for possibly different types of variables in the set of covariates available in the same dataset, there are many such scenarios where the variables can be structured in blocks of different types, e.g., clinical, transcriptomic, and methylation data.

14p viconnecticut2711 28-10-2020
MS-Helios: A Circos wrapper to visualize multi-omic datasets

Advances in high-resolution mass spectrometry facilitate the identification of hundreds of metabolites, thousands of proteins and their post-translational modifications. This remarkable progress poses a challenge to data analysis and visualization, requiring methods to reduce dimensionality and represent the data in a compact way.

4p vicoachella2711 27-10-2020