CLASSIFICATION OF MYELOPROLIFERATIVE NEOPLASMS BASED ON DEEP LEARNING ALGORITHMS AND MOLECULAR GENETIC MARKERS SUPPORT DISTINCTION OF CML, PV, ET, AND PMF
Author(s): Manja Meggendorfer, Wencke Walter, Claudia Haferlach, Wolfgang Kern, Torsten Haferlach
Affiliations: MLL Munich Leukemia Laboratory, Munich, Germany (all authors)
EHA Library. Meggendorfer M. Jun 15, 2019; 267471; S886
Abstract
This abstract is embargoed until Saturday, June 15, 08:30 local time.

Abstract: S886

Type: Oral Presentation

Presentation during EHA24: On Saturday, June 15, 2019 from 16:00 - 16:15

Location: Elicium 1

Background
The 2016 WHO classification defines myeloproliferative neoplasms (MPN) by cytomorphology, bone marrow biopsy, grading of fibrosis, blood counts, and several molecular markers. It separates chronic myeloid leukemia (CML) and the BCR-ABL1-negative MPNs polycythemia vera (PV), primary myelofibrosis (PMF), and essential thrombocythemia (ET) from each other. However, overlaps, borderline findings, and transitions into another MPN subtype occur. Furthermore, in some cases the clinical data are insufficient or incomplete, which makes the diagnosis difficult.

Aims

To overcome these obstacles with deep learning algorithms that separate and stratify the MPN entities CML, PV, PMF, and ET, preferably using molecular markers only.

Methods
As a training cohort we used 372 well-characterized samples diagnosed as CML, PMF, PV, or ET based on morphology and JAK2, MPL, and CALR mutation status following WHO criteria: 107 CML, 107 PMF, 79 PV, and 79 ET. Whole genome sequencing was performed for all samples, and the mutation status of 73 genes recurrently mutated in myeloid malignancies was assessed. Cytogenetics was available in 334/372 cases, and the 12 most abundant cytogenetic aberrations in MPNs were included in the analysis. In addition, blood counts in terms of leukocytosis, erythrocytosis, and thrombocytosis were available for 326/372 patients.

Results

Based on these 372 cases we trained an algorithm to separate CML, PV, ET, and PMF from each other using molecular markers only. We used support vector machines with class probability output. We built 500 models with 10-fold cross-validation to stratify the patients; their accuracy ranged from 27% to 100% with a median of 72% (Fig. A). The models differed slightly in their composition of selected features. We therefore chose the models with the highest accuracy (top 5%), assessed their selected features (= genes), and kept the genes that occurred in more than 60% of these models for the next round of training. With those genes, expanded by the binary encoded mutational load of MPL (<35/>35%), CALR (<35/>35%), and JAK2 (<35/>35%; <60/>60%), the algorithm was trained again, 500 models were built, and the accuracy was estimated. The accuracy distribution shifted upward, increasing not only the median accuracy to 79% but also the number of models with an accuracy >90%. For the next round of training and selection, we used the selected features of the models with the highest accuracy (top 5%) and included the 12 most abundant cytogenetic aberrations. In this scenario we did not observe an increase in accuracy; therefore, the final model contained only 12 molecular markers (Fig. B). Using this final model to stratify patients we achieved an accuracy of 98.3%, with only three cases showing severely discordant clinical information (Fig. C). When blood counts were considered as well, the accuracy increased slightly (98.9%) with a smaller set of four contributing factors.
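The selection step between training rounds can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the flag names such as `JAK2_gt35`, the handling of values exactly at a threshold, and the rounding of the "top 5%" cut are all assumptions. The sketch takes the per-model accuracies and selected features as given and does not train the support vector machines themselves.

```python
import math
from collections import Counter

# Thresholds (in percent) read from the "<35/>35%" and "<60/>60%" notation.
# Treating a value exactly at the threshold as "not above" is an assumption.
LOAD_THRESHOLDS = {"MPL": (35,), "CALR": (35,), "JAK2": (35, 60)}


def encode_mutational_load(gene, load_percent):
    """Binary-encode a mutational load, e.g. {'JAK2_gt35': 1, 'JAK2_gt60': 0}.

    The flag names are hypothetical and chosen for this example only.
    """
    return {
        f"{gene}_gt{threshold}": int(load_percent > threshold)
        for threshold in LOAD_THRESHOLDS[gene]
    }


def select_recurrent_features(models, top_fraction=0.05, min_share=0.60):
    """Keep features found in more than `min_share` of the best models.

    `models` is a list of (accuracy, selected_features) pairs, one per
    trained model. The best `top_fraction` of models by accuracy is used;
    rounding the cut up and breaking ties by input order are assumptions.
    """
    n_top = max(1, math.ceil(round(len(models) * top_fraction, 9)))
    top_models = sorted(models, key=lambda m: m[0], reverse=True)[:n_top]
    # Count each feature at most once per model.
    counts = Counter(f for _, features in top_models for f in set(features))
    return sorted(f for f, c in counts.items() if c / n_top > min_share)


if __name__ == "__main__":
    # Toy input: 40 hypothetical models, so the top 5% are the best 2.
    toy_models = [(0.95, {"JAK2", "CALR", "ASXL1"}), (0.93, {"JAK2", "CALR"})]
    toy_models += [(0.50, {"TET2"})] * 38
    print(select_recurrent_features(toy_models))
    print(encode_mutational_load("JAK2", 48.0))
```

With 500 models per round, as in the abstract, the default `top_fraction` selects the 25 most accurate models, and a gene is carried forward only if it appears in more than 15 of them.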

Conclusion
Following WHO criteria, the separation of CML, PV, ET, and PMF is based on clinical, morphologic, and molecular markers. However, clinical data are often insufficient. Deep learning algorithms can overcome this limitation by supporting decision making with blood counts and molecular markers (n=4) at a high accuracy of 99%. Basing the model on molecular markers only yields a comparably high accuracy with only slightly more genetic markers (n=12). Therefore, application of such algorithms might assist diagnostic procedures in the near future.

Session topic: 15. Myeloproliferative neoplasms - Biology & Translational Research

Keyword(s): Diagnosis, Molecular markers
