Introducing OMG-Octo Atypical: A New Dataset for Atypical Mitosis Classification
We're excited to announce the release of OMG-Octo Atypical, a significant expansion of our original OMG-Octo database that now incorporates atypical mitotic figures. This new resource addresses a critical need in cancer prognostication and represents our continued commitment to advancing automated mitosis detection in computational pathology.
Why Atypical Mitoses Matter
Quantifying mitotic activity is fundamental to grading numerous cancer types, including breast cancer, sarcomas, neuroendocrine tumors, and melanoma. While our previous work focused on detecting conventional mitotic figures, distinguishing between typical and atypical mitoses provides additional prognostic value that pathologists rely on for cancer grading decisions.
The Challenge We're Addressing
Manual identification of mitotic figures remains laborious and subjective, with high inter-observer variability. Differentiating atypical from typical mitotic figures is harder still: the distinction carries important clinical implications but requires specialized expertise and consistent criteria.
Our Approach: Data-Driven Development
Following the "Bitter Lesson" principle that emphasizes data scale over algorithmic novelty, we've focused on creating comprehensive, high-quality datasets. The OMG-Octo Atypical database builds upon our existing foundation by adding carefully annotated atypical mitotic figures to enable robust machine learning model development.
Dataset Composition
For our atypical mitosis classification work, we combined multiple data sources:
- OMG-Octo Atypical (our new in-house dataset)
- AMi-Br dataset
- MIDOG 2025 Atypical Training Set
- LUNG-MITO dataset
- GBM-TCGA dataset
Together, these resources comprise 17,664 typical mitotic figures and 7,973 atypical mitotic figures, providing the scale and diversity needed for effective model training.
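To make the class balance concrete, here is a small sketch that works only from the two counts quoted above. The inverse-frequency weighting is a common way to compensate for imbalance and is shown as an illustration; this post does not state how (or whether) class imbalance was handled during training, so treat the function name and the weighting scheme as assumptions.

```python
# Counts quoted in the text above.
N_TYPICAL = 17_664
N_ATYPICAL = 7_973


def inverse_frequency_weights(counts):
    """Return per-class weights n_total / (n_classes * n_class).

    Hypothetical helper for illustration; not taken from the released code.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


counts = {"typical": N_TYPICAL, "atypical": N_ATYPICAL}
total = sum(counts.values())
atypical_fraction = N_ATYPICAL / total
weights = inverse_frequency_weights(counts)

print(f"total figures: {total}")
print(f"atypical fraction: {atypical_fraction:.3f}")
print(f"class weights: {weights}")
```

With these weights, each class contributes the same total weight to a weighted loss, so the minority (atypical) class is not drowned out by the roughly two-to-one majority.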
Technical Implementation
Our classification approach leverages modern deep learning architectures:
Model Architecture
We evaluated several state-of-the-art architectures, including ConvNeXt, EfficientNet variants, and UNI (a vision-transformer-based foundation model for pathology). Our best results came from a ConvNeXt-Tiny architecture trained from scratch, suggesting that, for this task, domain-specific training of an appropriately sized model can outperform foundation-model approaches.
Data Augmentation Strategy
To enhance model robustness across different tissue preparation methods and scanning protocols, we implemented:
- Random horizontal and vertical flipping
- RandAugment with carefully tuned parameters
- Histology-specific color augmentation for H&E-stained images, including stain deconvolution and concentration perturbation
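The stain-based color augmentation in the last item can be sketched as follows. This is a minimal NumPy illustration of the general technique (deconvolve RGB into stain concentrations, perturb them, recompose), not the exact pipeline used for this work. The stain matrix is the commonly quoted Ruifrok-Johnston set, and the function name `hed_jitter`, the `sigma` parameter, and the uniform perturbation scheme are assumptions made for the example.

```python
import numpy as np

# Commonly quoted Ruifrok-Johnston stain vectors (assumption: the actual
# pipeline may estimate or use different vectors). Rows: hematoxylin, eosin,
# and a third channel (DAB in the original reference) acting as a residual.
RGB_FROM_HED = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)


def hed_jitter(image, rng, sigma=0.05):
    """Perturb stain concentrations of an RGB image.

    image: float array in [0, 1] with shape (H, W, 3).
    rng:   numpy.random.Generator.
    sigma: perturbation strength (hypothetical default).
    """
    eps = 1e-6
    # Beer-Lambert: convert transmitted intensity to optical density.
    od = -np.log(np.clip(image, eps, 1.0))
    # Stain deconvolution: per-pixel stain concentrations.
    conc = od.reshape(-1, 3) @ HED_FROM_RGB
    # Concentration perturbation: one random scale and shift per stain channel.
    alpha = rng.uniform(1.0 - sigma, 1.0 + sigma, size=3)
    beta = rng.uniform(-sigma, sigma, size=3)
    conc = conc * alpha + beta
    # Recompose optical density and map back to RGB intensities.
    od_aug = conc @ RGB_FROM_HED
    out = np.exp(-od_aug).reshape(image.shape)
    return np.clip(out, 0.0, 1.0)


rng = np.random.default_rng(0)
patch = rng.uniform(0.2, 1.0, size=(8, 8, 3))  # stand-in for an H&E patch
augmented = hed_jitter(patch, rng, sigma=0.05)
```

Perturbing in stain space rather than RGB space keeps the changes close to what different staining protocols and scanners produce, which is the variation the augmentation is meant to cover.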
Ensemble Methods
Our final predictions combine:
- Test-Time Augmentation (TTA) across multiple image transformations
- Ensemble voting across the five best-performing models
- Optimal threshold selection using Youden's J statistic (sensitivity + specificity - 1), which is equivalent to maximizing balanced accuracy
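The three steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the released implementation: models are represented as plain callables returning an atypical probability, the TTA views are limited to flips, and the ensemble is combined by averaging probabilities (soft voting), whereas the exact voting rule is not specified here. All function names are hypothetical.

```python
import numpy as np


def tta_views(image):
    """Flip-based views of an (H, W, C) image: identity, h-flip, v-flip, both."""
    return [image, image[:, ::-1], image[::-1], image[::-1, ::-1]]


def ensemble_tta_score(models, image):
    """Average the atypical probability over all models and all TTA views."""
    scores = [model(view) for model in models for view in tta_views(image)]
    return float(np.mean(scores))


def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    best_t, best_j = None, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        sensitivity = np.mean(pred[y_true])
        specificity = np.mean(~pred[~y_true])
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    return float((np.mean(y_pred[y_true]) + np.mean(~y_pred[~y_true])) / 2.0)


# Toy validation scores (made up for the example; 1 = atypical).
labels = [0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9]
threshold, j_stat = youden_threshold(labels, scores)
bal_acc = balanced_accuracy(labels, np.asarray(scores) >= threshold)
print(threshold, j_stat, bal_acc)
```

Because balanced accuracy equals (J + 1) / 2, the threshold that maximizes Youden's J on a validation set is also the one that maximizes balanced accuracy on that set.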
Performance Results
On the MIDOG++ test set, our approach achieved a balanced accuracy of 0.9107 for atypical mitotic figure classification. This result suggests that atypical mitotic figures have distinguishable features that deep learning models can learn to identify, which is a promising finding for clinical translation.
Real-World Impact
The high accuracy we've achieved suggests that automated systems can effectively assist pathologists in identifying atypical mitoses, potentially:
- Reducing inter-observer variability in cancer grading
- Accelerating diagnostic workflows
- Improving consistency in prognostic assessments
- Supporting pathologists in handling increasing caseloads
Looking Forward
The OMG-Octo Atypical database is now publicly available, continuing our commitment to open science and collaborative advancement in computational pathology. We believe that by sharing these resources, we can accelerate progress across the field and ultimately improve patient outcomes.
This work was developed as part of our submission to the MIDOG 2025 challenge, which targets the problem of domain generalization: ensuring that algorithms work reliably across laboratories with differing staining protocols, scanning equipment, and tissue preparation methods.
Get Involved
We encourage researchers, pathologists, and data scientists to explore the OMG-Octo Atypical database and contribute to advancing mitotic figure detection and classification. Together, we can build more robust, generalizable tools that bring real value to clinical practice.
For technical details, dataset access, and implementation code, visit our resources page or contact us at c.fekete@ucl.ac.uk
Team: Zhuoyan Shen, Maria Hawkins, Esther Bär, Konstantin Bräutigam, and Charles-Antoine Collins-Fekete

