Scaling Clinical Trial Screening at MSKCC with Snorkel Flow

About The Customer
Memorial Sloan Kettering Cancer Center (MSKCC) is the world’s oldest and largest private cancer center. It provides care to improve the quality of life of more than 150,000 cancer patients annually. In service of this, it uses AI to speed the discovery of more effective strategies to prevent, control, and ultimately cure cancer. The data science team at MSKCC was tasked with using AI/ML to classify patient records based on the presence of HER-2, a protein common to many cancers.
The Challenge
Memorial Sloan Kettering Cancer Center (MSKCC) needed to identify patients as candidates for clinical trials by classifying their records according to the presence of a relevant protein, HER-2. Reviewing patient records for HER-2 was laborious and time-consuming because clinicians and researchers had to sift through complex, variable patient data. The data science team wanted to use AI/ML to classify the records, but the lack of labeled training data was a significant bottleneck. Labeling data, especially complex patient records, required clinician and researcher expertise and was prohibitively slow and expensive. Even when experts manually annotated training data, their labels were at times inconsistent, which limited the performance a model could reach.
The Solution
MSKCC used Snorkel Flow to build an AI application that classifies patient records into five classes describing the presence of HER-2. The application feeds a downstream clinical trial screening system that identifies potential trial participants. The team started from 3,200 data points they had previously labeled outside the platform, ingested them, and split them into training, validation, and test sets. The lead Bioinformatics Engineer on the project wrote just eight noisy, imperfect labeling functions, which Snorkel Flow combined to auto-label a training dataset. The team used this dataset to train an XGBoost model within the platform. Using the platform's error analysis tools, they examined where the model was confused and how to correct it. After a few rapid iterations, the team achieved an overall accuracy of 93% and an average F1 of 87% across all classes.
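To make the idea of labeling functions concrete, here is a minimal sketch in plain Python. It is not Snorkel Flow's API and not the rules the MSKCC team wrote: the class names, keyword rules, and the `majority_vote` helper are all hypothetical, and a simple majority vote stands in for whatever method the platform uses to combine noisy votes.

```python
from collections import Counter

ABSTAIN = -1
# Hypothetical class ids. The case study mentions five classes but does not
# name them, so only three illustrative ones are used here.
POSITIVE, NEGATIVE, EQUIVOCAL = 0, 1, 2


def lf_positive_phrase(text):
    """Vote POSITIVE when the note states a positive HER-2 result."""
    t = text.lower()
    return POSITIVE if "her2 positive" in t or "her-2 positive" in t else ABSTAIN


def lf_negative_phrase(text):
    """Vote NEGATIVE when the note states a negative HER-2 result."""
    t = text.lower()
    return NEGATIVE if "her2 negative" in t or "her-2 negative" in t else ABSTAIN


def lf_ihc_3plus(text):
    """Vote POSITIVE on an IHC 3+ score (illustrative keyword rule)."""
    return POSITIVE if "ihc 3+" in text.lower() else ABSTAIN


def lf_equivocal(text):
    """Vote EQUIVOCAL when the note uses the word 'equivocal'."""
    return EQUIVOCAL if "equivocal" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_positive_phrase, lf_negative_phrase, lf_ihc_3plus, lf_equivocal]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain when nothing fires or votes tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave the record unlabeled
    return ranked[0][0]


if __name__ == "__main__":
    notes = [
        "Pathology: HER2 positive by IHC 3+.",
        "HER-2 negative on repeat biopsy.",
        "No receptor testing documented.",
    ]
    for note in notes:
        print(majority_vote(note), note)
```

Each function encodes one inspectable heuristic and may abstain, so records with no or conflicting evidence stay unlabeled rather than receiving a guess. The labeled subset would then serve as training data for a downstream classifier.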
Operational Impact
  • The document classification AI application built by the team now powers a downstream clinical trial screening system. This system allows MSKCC to identify HER-2 in patient records without relying on human experts to review each record. By labeling programmatically, Snorkel Flow significantly reduced the time needed to label complex, domain-specific text documents as training data. It also increased explainability, because the labeling rationale for each training data point is encoded in labeling functions that can be inspected like code. The team used model-guided error analysis to identify data quality issues and iterate rapidly to improve the model.
Quantitative Benefit
  • Achieved an overall accuracy of 93% and an average F1 of 87% across all classes
  • Auto-labeled thousands of patient records
  • Reduced time to build a document classification application from months to weeks
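
The two headline metrics measure different things, which is why they can differ. The sketch below computes overall accuracy and a class-averaged F1 on made-up labels. It assumes the "average F1" is a macro average (the unweighted mean of per-class F1); the case study does not say which averaging was used, and the function names and toy data are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 for each class, computed as 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels for three classes; not data from the case study.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 0, 1, 2, 2, 2]
    print("accuracy:", round(accuracy(y_true, y_pred), 3))
    print("per-class F1:", per_class_f1(y_true, y_pred))
    print("macro F1:", round(macro_f1(y_true, y_pred), 3))
```

Because a macro average weights every class equally, weak performance on a small class lowers it more than it lowers accuracy, which is one plausible reason a reported average F1 can sit below overall accuracy.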
