TCGA on AWS
The Cancer Genome Atlas (TCGA) is a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) to accelerate our understanding of the molecular basis of cancer. TCGA-funded researchers across the United States have produced a corpus of raw and processed genomic, transcriptomic, and epigenomic data from thousands of cancer patients.
These data are freely available on AWS, through the National Cancer Institute Cancer Genomics Cloud pilot, to credentialed researchers, subject to NIH data sharing policies. As the NIH Trusted Partner for this project, Seven Bridges Genomics is responsible for authorizing access to the data.
The Cancer Genome Atlas is one of the world’s largest collections of cancer genome data. Making the data available on a cloud platform greatly lowers the barrier to entry for researchers who want to use these data to build better models of disease and, ultimately, to develop new treatments for cancer. Qualified researchers can use the data on demand without worrying about download time or storage costs.
For more information, please visit http://www.cancergenomicscloud.org/. If you have any questions, please email firstname.lastname@example.org.
Accessing the Data
While the data are hosted
For more information on gaining access to these data, visit: http://www.cancergenomicscloud.org/controlled-access-data or http://docs.cancergenomicscloud.org/.
Tools and Tutorials
The Cancer Genomics Cloud provides visual and programmatic methods of querying, analyzing, and securely collaborating with TCGA data. A semantic
Hundreds of Common Workflow Language-compliant tools and workflows are available, enabling you to immediately run the most common cancer genomics analyses. Additionally, a software development kit allows you to easily deploy your own tools in a reproducible and portable manner.
Tutorials (coming soon):
- Visually querying and accessing TCGA data.
- Programmatically querying and accessing TCGA data.
- Building and executing a computational workflow.
The Cancer Genome Atlas, a joint effort of the National Cancer Institute (NCI) and the National Human Genome Research Institute
Genomics, Life Sciences
Typical genomics data formats are used throughout. These vary based on the type of analysis performed and include everything from raw files to delimited summarizations and metadata
Data use is subject to the access and publication
Amazon S3 in the US East region (N. Virginia)
The Cancer Genomics Cloud Pilot, operated by Seven Bridges Genomics, has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.