Release note 08.28.17

SUPPORT FOR AMAZON WEB SERVICES SPOT INSTANCES

Seven Bridges has introduced support for Spot instances on the Amazon Web Services (AWS) deployment of Cavatica. Spot instances can be selected as the default for a project and as an option for each task execution. Selecting a Spot instance can dramatically reduce execution costs: our testing indicates cost savings of over 75% on common workflows.

Because of how AWS manages Spot instances, they can be interrupted while tasks are running. If a Spot instance is interrupted, Seven Bridges’ job retry functionality automatically restarts the interrupted job and any remaining unfinished jobs on an On-Demand instance to prevent further interruptions. Such an interruption may reduce the cost savings from using a Spot instance and can result in a longer overall runtime, but the reliability of task execution is unaffected.
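As a rough illustration of how an interruption can reduce the savings, the sketch below estimates the cost of a task that starts on a Spot instance and finishes on an On-Demand instance. The hourly prices, the function names, the amount of redone work, and the simple hours-times-price billing model are all assumptions made for illustration; they are not Cavatica or AWS figures.

```python
def estimate_task_cost(spot_hours, on_demand_hours, spot_price, on_demand_price):
    """Estimate the cost of a task split across Spot and On-Demand time.

    Assumes simple hours-times-price billing, which is a simplification.
    """
    return spot_hours * spot_price + on_demand_hours * on_demand_price


def savings_fraction(actual_cost, on_demand_only_cost):
    """Return the fraction saved relative to running fully On-Demand."""
    return 1.0 - actual_cost / on_demand_only_cost


# Hypothetical prices in USD per hour (not real AWS rates).
ON_DEMAND_PRICE = 1.00
SPOT_PRICE = 0.20

# Baseline: a 10-hour task run entirely On-Demand.
baseline = estimate_task_cost(0, 10, SPOT_PRICE, ON_DEMAND_PRICE)

# The same task run entirely on Spot without interruption.
uninterrupted = estimate_task_cost(10, 0, SPOT_PRICE, ON_DEMAND_PRICE)

# Interrupted after 4 Spot hours; we assume 1 hour of work is repeated,
# so the remaining 7 hours run On-Demand.
interrupted = estimate_task_cost(4, 7, SPOT_PRICE, ON_DEMAND_PRICE)

print(f"Uninterrupted savings: {savings_fraction(uninterrupted, baseline):.0%}")
print(f"Interrupted savings:   {savings_fraction(interrupted, baseline):.0%}")
```

Under these assumed numbers the interrupted run still costs less than the all On-Demand baseline, but the margin is much smaller and the total runtime is longer.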

For more information, see the blog post and the feature documentation.

SBFS [BETA RELEASE]

SBFS is a command line tool which enables interaction with Cavatica project files that are mounted as a local file system.

Use SBFS to make project files available on a local file system, where they are as accessible as any other local file. This eliminates the need to download complete files to a local machine, which is especially useful when working with files that exceed the size of a local disk. With SBFS, parts of a file are accessible without a complete file download, and users can perform interactive analyses on a local machine (or server instance) without needing to bring their tools to Cavatica.
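Because a mounted project behaves like a local file system, ordinary file I/O should be enough to read part of a file. The sketch below reads a byte range with standard Python calls. The mount path and file name are hypothetical placeholders, and `read_range` is an illustrative helper, not part of SBFS.

```python
import os


def read_range(path, offset, length):
    """Read up to `length` bytes starting at byte `offset` using ordinary file I/O."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(length)


# Hypothetical mount point and file name; substitute your own.
MOUNTED_FILE = os.path.join("/path/to/sbfs/mount", "sample.bam")

if os.path.exists(MOUNTED_FILE):
    # Read only the first 4 bytes rather than the whole file.
    print(read_range(MOUNTED_FILE, 0, 4))
```

The same pattern applies to any tool that opens files by path, since the tool does not need to know the file is served through a mount.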

SBFS is available for Linux and macOS, and the beta version can be downloaded from the new Data tools page.

Learn more from the SBFS documentation.

DATASET IMPROVEMENTS

TARGET GRCh38 Dataset

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) dataset provides genomic, transcriptomic, and epigenomic data from patients representing several childhood cancers and serves as a valuable complement to the existing genomic and multi-omic datasets available on Seven Bridges Cavatica via the CGC. The complete TARGET GRCh38 dataset is now available on Cavatica. It includes both Open Data, accessible to all researchers, and Controlled Data, to which access is regulated by the Database of Genotypes and Phenotypes (dbGaP). The dataset can be queried using the Data Browser to generate custom cohorts from within this dataset, as well as cohorts derived from multiple similarly aligned datasets such as the TARGET GRCh38 and TCGA GRCh38 datasets.

Learn more about the TARGET GRCh38 dataset and its metadata.

CPTAC Dataset

Cavatica now provides access to mass spectrometry data that were generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) as part of the TCGA initiative to characterize and quantify the proteome of cancer samples. This dataset represents 335 samples from patients with Breast Invasive Carcinoma, Colon Adenocarcinoma, Ovarian Serous Cystadenocarcinoma, and Rectum Adenocarcinoma for whom matched genomic data are available. The dataset can be queried using the Data Browser to generate custom cohorts from within this dataset as well as multi-omic cohorts across the TCGA GRCh38 genomic and CPTAC proteomic datasets.

Learn more about the CPTAC data and CPTAC metadata.

Multiple-dataset Querying

To maximize the accessibility and value of the multi-omic datasets available on Cavatica, the Data Browser now enables cross-dataset queries for datasets with harmonized metadata. This allows researchers to use the Data Browser to identify cohorts of interest across multiple genomic datasets such as the GRCh38 alignments of TCGA and TARGET.
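Conceptually, harmonized metadata means the same field names and values can be used to filter cases from more than one dataset at once. The sketch below illustrates that idea with small in-memory records. The records, field names, and the `query_cohort` helper are invented for illustration and do not reflect the Data Browser's actual schema or any Cavatica API.

```python
# Hypothetical stand-ins for two datasets that share harmonized field names.
TCGA_CASES = [
    {"case_id": "tcga-example-1", "dataset": "TCGA GRCh38", "gender": "female"},
    {"case_id": "tcga-example-2", "dataset": "TCGA GRCh38", "gender": "male"},
]
TARGET_CASES = [
    {"case_id": "target-example-1", "dataset": "TARGET GRCh38", "gender": "female"},
    {"case_id": "target-example-2", "dataset": "TARGET GRCh38", "gender": "male"},
]


def query_cohort(datasets, **criteria):
    """Return cases from all given datasets whose fields match every criterion."""
    return [
        case
        for dataset in datasets
        for case in dataset
        if all(case.get(field) == value for field, value in criteria.items())
    ]


# One filter applied across both datasets, possible only because the
# "gender" field is named and encoded the same way in each.
cohort = query_cohort([TCGA_CASES, TARGET_CASES], gender="female")
for case in cohort:
    print(case["dataset"], case["case_id"])
```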

Learn more about cross-dataset queries through a sample query.