This Provisional PDF corresponds to the article as it appeared upon acceptance. Copyedited and fully formatted PDF and full text (HTML) versions will be made available soon.

Integrating diverse genomic data using gene sets

Genome Biology 2011, 12:R105 doi:10.1186/gb-2011-12-10-r105

Svitlana Tyekucheva (svitlana@jimmy.harvard.edu)

Luigi Marchionni (marchion@jhu.edu)

Rachel Karchin (karchin@jhu.edu)

Giovanni Parmigiani (gp@jimmy.harvard.edu)

ISSN: 1465-6906

Article type: Method

Submission date: 6 May 2011

Acceptance date: 21 October 2011

Publication date: 21 October 2011

Article URL: http://genomebiology.com/2011/12/10/R105

This peer-reviewed article was published immediately upon acceptance. It can be downloaded, printed and distributed freely for any purposes (see copyright notice below).

Articles in Genome Biology are listed in PubMed and archived at PubMed Central.

For information about publishing your research in Genome Biology go to http://genomebiology.com/authors/instructions/

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Svitlana Tyekucheva (1,2), Luigi Marchionni (3), Rachel Karchin (4), and Giovanni Parmigiani (1,2,#)

(1) Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02115, USA

(2) Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA

(3) Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University, 1550 Orleans Street, Baltimore, MD, 21231, USA

(4) Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD, 21218, USA

(#) Corresponding author: gp@jimmy.harvard.edu

Abstract

We introduce and evaluate data analysis methods to interpret simultaneous measurements of multiple genomic features made on the same biological samples. Our tools use gene sets to provide an interpretable common scale for diverse genomic information. We show that we can detect genetic effects even when they act through different mechanisms in different samples, and that we can discover and validate important disease-related gene sets that would not be discovered by analyzing each data type individually.

Background

The increasing affordability of high-throughput genome-wide assays is enabling the simultaneous measurement of several genomic features on the same biological samples. Cancer genome projects have been at the forefront of this trend, and have faced the challenge of integrating these diverse data types[1, 2], including RNA transcriptional levels, genotype variation, DNA copy number variation, and epigenetic marks. Annotated collections of gene sets, capturing established knowledge about biological processes and pathways, have proven an essential tool for integration. Examples of these sets include chromosomal locations, signaling and metabolic pathways, transcriptional programs, and targets of specific transcription factors. Because one can make inferences about the importance of a given gene set using several different genomic data types, gene set analysis provides a direct and biologically motivated approach to analyzing these data types in an integrated way. A widely used public collection of gene sets is the Molecular Signatures Database (MSigDB[3]). A comprehensive list of conventional tools for gene set analysis of a single data type is given in Ackermann et al.[4]. Many of these approaches are implemented in the extensively used statistical computing environment R/Bioconductor[5].

The gene set perspective makes sense both biologically and statistically. First, small differences in the function of multiple genes in the same set may not be detectable at the single gene level, but can add up to produce larger differences at the gene set level. This increases the power for detecting real biological differences. Second, a single hit on a given pathway may be sufficient to generate a phenotypic difference. If this hit can occur in any of several components in the pathway, individuals with the same phenotype may show variability in the specific genes that are hit, but show a more consistent pattern at the pathway or gene set level[1, 6]. Importantly, even when a difference at the single gene level can be detected, its biological importance may depend on the states of other interacting genes and gene products.
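To make the first, statistical argument concrete, the following minimal Python sketch pools per-gene z-scores into one set-level score. It is an illustration only, not the method evaluated in this article: the function names `set_z` and `upper_tail_p` are hypothetical, the pooling rule (sum divided by the square root of the set size) is a standard Stouffer-style combination chosen here for simplicity, and the z-scores are made up and assumed to be independent, which genes in a real pathway usually are not.

```python
import math

def set_z(gene_z):
    """Stouffer-style pooled z-score: sum of per-gene z-scores over sqrt(set size)."""
    return sum(gene_z) / math.sqrt(len(gene_z))

def upper_tail_p(z):
    """One-sided p-value P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical set of 25 genes, each shifted slightly in the same direction.
gene_z = [0.8] * 25

single_gene_p = upper_tail_p(gene_z[0])  # weak evidence for any one gene
pooled_z = set_z(gene_z)                 # 0.8 * 25 / 5, i.e. about 4
pooled_p = upper_tail_p(pooled_z)        # far smaller than single_gene_p

print(f"single gene: z = {gene_z[0]:.2f}, p = {single_gene_p:.3f}")
print(f"gene set:    z = {pooled_z:.2f}, p = {pooled_p:.2e}")
```

Under the independence assumption, the pooled signal grows with the square root of the set size, so 25 individually unremarkable genes yield a clearly significant set. Positive correlation among the genes would shrink this gain.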

Cancer genomes contain point mutations, insertions, deletions, translocations, methylation abnormalities, and copy number and expression changes not seen in normal tissues. In some cancers, such as glioblastoma multiforme (GBM), pathways involving the TP53, PI3K, and RB1 genes are altered in different genes in different patients and, importantly, via different alteration mechanisms[1], such as point mutations and copy number changes. Therefore, taking multiple data types into account should improve our ability to detect gene sets associated with a phenotype.

In recent large-scale cancer genome studies[1, 6, 7], preliminary integration approaches have been successfully applied. However, these approaches are tailored to their specific context. A general, scalable, and rigorous statistical framework has not yet been developed. In this article, our goal is to fill this gap. To this end, we introduce, compare, and systematically evaluate two alternative set-based data integration approaches. The first approach computes model-based gene-to-phenotype association scores for each gene using all data types together, and then performs gene set analysis on these scores. We term this the integrative approach. The second approach performs a separate conventional gene set analysis for each data type, and then derives a consensus significance score using a meta-analytic approach.
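The contrast between the two strategies can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not the models used in this article: all function names are hypothetical, the per-gene integrated score is simply the sum of squared z-scores across data types, the set-level statistic is a difference of means, and Fisher's method stands in for the consensus step. All numbers are invented.

```python
import math

def integrated_gene_scores(z_by_type):
    """Toy integrative score: per gene, sum squared z-scores over all data types."""
    genes = next(iter(z_by_type.values())).keys()
    return {g: sum(z[g] ** 2 for z in z_by_type.values()) for g in genes}

def set_enrichment(scores, gene_set):
    """Mean score of genes inside the set minus mean score of genes outside it."""
    inside = [s for g, s in scores.items() if g in gene_set]
    outside = [s for g, s in scores.items() if g not in gene_set]
    return sum(inside) / len(inside) - sum(outside) / len(outside)

def fisher_combine(p_values):
    """Fisher's method: -2 * sum(log p) is chi-square with 2k df under the null."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of Fisher's statistic
    # Closed-form chi-square survival function for even degrees of freedom.
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

# Hypothetical z-scores: gene A is altered in expression only, gene B in copy number only.
z_by_type = {
    "expression":  {"A": 2.5, "B": 0.0, "C": 0.1, "D": -0.2},
    "copy_number": {"A": 0.0, "B": 2.5, "C": -0.2, "D": 0.1},
}
gene_set = {"A", "B"}

# Strategy 1 (integrative): score genes across data types first, then analyze the set.
scores = integrated_gene_scores(z_by_type)
enrichment = set_enrichment(scores, gene_set)

# Strategy 2 (meta-analytic): combine per-data-type gene set p-values (made up here).
consensus_p = fisher_combine([0.04, 0.03])

print(f"integrative set enrichment: {enrichment:.2f}")
print(f"meta-analytic consensus p:  {consensus_p:.4f}")
```

In the toy data, genes A and B receive the same integrated score although each is altered through a different mechanism, which is the situation the integrative strategy is meant to handle.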

Results

Overview

We present both novel data analyses and controlled simulations. First, we jointly examine gene expression and copy number variation data on glioblastoma multiforme tumors from The Cancer Genome Atlas (TCGA[2]), and detect differences in the Wnt, glycolysis, and stress pathways that appear relevant to differences between short- and long-term survivors. We also validate these findings using independent samples from the NCI REpository for Molecular BRAin Neoplasia DaTa (Rembrandt[8]). To provide a rigorous counterpart to these results, we perform extensive simulations. These show that the integrative approach does enable the discovery of disease-related gene sets that would not be discovered when each data type is analyzed individually using current approaches. Discoveries remain reliable even when several features are highly noisy.
