The Prize datasets

Stage 1

Data available on DNAnexus

During Stage 1 of the Longitude Prize on ALS, successful teams will get access to harmonised multi-omics data through a Trusted Research Environment: a secure, cloud-based platform that enables analysis without requiring data downloads or transfers. The environment is supported by DNAnexus and AWS.

[Figure: ALS dual donut charts]

Whole Genome Sequence (WGS)

Participants will have access to harmonised WGS datasets containing more than 9,600 ALS cases and over 3,600 controls, sourced from leading ALS research initiatives, including Project MinE, ALS Compute, New York Genome Center (NYGC), ALS Therapy Development Institute (TDI), and Answer ALS.

Key data formats: CRAM, VCF, population VCF, PLINK2
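
As a rough illustration of what working with the VCF format involves, the sketch below reads one tab-delimited VCF record and counts alternate alleles per sample using only the Python standard library. The sample IDs, coordinates, and helper names (`parse_samples`, `parse_record`, `alt_allele_count`) are made up for this example and are not part of the Prize platform; real analyses would more likely use an established library such as pysam or cyvcf2.

```python
from typing import Dict, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: List[str]
    genotypes: Dict[str, str]  # sample ID -> GT string, e.g. "0/1"


def parse_samples(header_line: str) -> List[str]:
    """Return sample IDs from the '#CHROM ...' header line (column 10 onwards)."""
    return header_line.rstrip("\n").split("\t")[9:]


def parse_record(line: str, samples: List[str]) -> Variant:
    """Parse one VCF data line, keeping only the GT field for each sample."""
    fields = line.rstrip("\n").split("\t")
    # Column 9 (FORMAT) lists the per-sample keys; find where GT sits.
    gt_index = fields[8].split(":").index("GT")
    genotypes = {
        sample: fields[9 + i].split(":")[gt_index]
        for i, sample in enumerate(samples)
    }
    return Variant(fields[0], int(fields[1]), fields[3],
                   fields[4].split(","), genotypes)


def alt_allele_count(gt: str) -> int:
    """Count non-reference, non-missing alleles in a GT string."""
    alleles = gt.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele not in ("0", "."))


# Hypothetical two-sample example (not real Prize data).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2"
record = "chr1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t1|1:28"

samples = parse_samples(header)
variant = parse_record(record, samples)
carriers = {s: alt_allele_count(gt) for s, gt in variant.genotypes.items()}
print(carriers)  # {'S1': 1, 'S2': 2}
```

This only covers uncompressed, single-record parsing; CRAM and PLINK2 files are binary formats that need dedicated tools.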

[Charts: ALS Cases, Controls]

Multi-Omics

Beyond genome sequencing, participants will have access to multi-omics data spanning epigenomics, transcriptomics, and proteomics. These datasets will enable participants to explore novel molecular insights into ALS mechanisms, paving the way for deeper understanding and innovative target discovery.

Key data formats:
Epigenomics: BAM, Peaks, Matrix of Consensus Peaks, Differential Peaks (DiffBind)
Transcriptomics: BAM, Counts, Matrix of Counts, Differential Genes (DESeq)
Proteomics: WIFF, mzML, Matrix of Intensities, Differential Proteins
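
To give a sense of how a matrix of counts might be used, the sketch below applies counts-per-million (CPM) scaling so that samples with different sequencing depths can be compared. This is a simple, generic normalisation chosen for illustration; it is not the method used by DESeq, which relies on its own size-factor estimation. The gene names, counts, and the `counts_per_million` helper are hypothetical.

```python
from typing import Dict, List


def counts_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale raw counts so that each sample (column) sums to one million.

    `counts` maps a gene ID to its raw counts, one value per sample,
    with samples in the same order for every gene.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size = total counts observed in each sample.
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [
            row[j] / library_sizes[j] * 1_000_000 if library_sizes[j] else 0.0
            for j in range(n_samples)
        ]
        for gene, row in counts.items()
    }


# Hypothetical matrix: two genes, two samples (not real Prize data).
raw = {"geneA": [10, 0], "geneB": [30, 50]}
cpm = counts_per_million(raw)
print(cpm["geneA"])  # [250000.0, 0.0]
```

For real matrices with thousands of genes, a dataframe library would be a more practical choice than plain dictionaries.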

[Charts: Epigenomics, Transcriptomics, Proteomics]

Clinical Data

Clinical information will accompany the omic datasets, offering critical context for the biological data. This integration will empower participants to make meaningful correlations between genetic, molecular, and disease traits.

Category: Examples
Demographics: Age, sex, ancestry
Disease History: Age of onset, site of onset (bulbar/limb)
Progression: ALSFRS-R scores, survival time
Cognitive Status: ECAS scores, CBS
Genetics: C9ORF72 status, family history
Treatments: Medications, ventilation use
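
As one example of turning clinical fields into an analysis variable, the sketch below estimates a progression rate from an ALSFRS-R total score: the points lost from the scale maximum of 48, divided by the months since symptom onset. This is a commonly used summary measure, shown here under the assumption that a total score and a time since onset are available for each participant; the function and argument names are hypothetical and do not reflect the Prize's actual data schema.

```python
ALSFRS_R_MAX = 48  # maximum total score on the ALSFRS-R scale


def progression_rate(alsfrs_r_total: int, months_since_onset: float) -> float:
    """Return ALSFRS-R points lost per month since symptom onset."""
    if not 0 <= alsfrs_r_total <= ALSFRS_R_MAX:
        raise ValueError("ALSFRS-R total must be between 0 and 48")
    if months_since_onset <= 0:
        raise ValueError("months_since_onset must be positive")
    return (ALSFRS_R_MAX - alsfrs_r_total) / months_since_onset


# Hypothetical participant: score of 36, assessed 24 months after onset.
rate = progression_rate(36, 24)
print(rate)  # 0.5
```

A higher value indicates faster functional decline; a variable like this could then be compared against genetic or molecular features.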

WGS data available on AnVIL

In addition to the Prize platform, participants can access the full ALS Compute collection through the AnVIL platform. The ALS Compute Project brings together whole genome sequencing (WGS) data from six major initiatives: Answer ALS, the CReATe Consortium, GTAC, the New York Genome Center ALS Consortium, the National Institutes of Health, and Project MinE USA. The current release includes harmonised WGS data from 6,952 ALS cases and 2,785 control individuals, with additional releases expected. The collection also features WGS data from individuals with related neurodegenerative conditions, including Lewy Body Dementia (LBD, 2,888 individuals) and Frontotemporal Dementia (FTD, 2,242 individuals).

Data showcase webinar – Coming soon

Data holders and leading voices in ALS will come together to showcase the datasets and tools available through the Longitude Prize on ALS. Join us for an upcoming Data Showcase webinar to explore how these resources can support your team’s discovery efforts.

Coming end of July (TBC) – stay tuned for the confirmed date and registration details.

Support for Participants

Technical Support

The Prize will offer technical training and support in collaboration with our data and technology partners. This includes guidance on navigating, analysing, and integrating datasets; tutorials and workshops on using cloud-based platforms for multi-omics data analysis; and direct support for troubleshooting and data interpretation.

Cloud Credits

In collaboration with AWS, the Prize will provide cloud computing credits to help offset storage and compute costs, ensuring that financial limitations do not hinder meaningful research.