Presented at TELEMEDICON (National) on 4-Nov-2023

Demystify: AI-Powered Diagnostic Reports Digitisation Engine

INTRODUCTION

Diagnostic reports play a crucial role in monitoring patient health, treatment planning, and diagnosis.

These reports often contain sensitive Personally Identifiable Information (PII) and are created and stored in diverse formats, including images and PDFs with scanned or embedded text, which necessitates diligent masking to protect data privacy.

While these formats improve portability, they hinder access to and use of the data.

To provide user-friendly ways of accessing this information, the data contained in the reports needs to be digitized.

To address this issue, we have developed a lab report digitization engine that combines computer vision techniques such as Optical Character Recognition (OCR) with Natural Language Processing techniques such as Named Entity Recognition (NER) to extract the standardized value, unit, and reference range for each subtest present in a lab report.

OBJECTIVE

To create a diagnostic report digitization engine that can extract semantically and medically accurate information from scanned diagnostic reports.

METHODOLOGY

Each PII-masked document is pre-processed to determine whether it is scanned or has embedded text.

If embedded text is present, it is extracted using standard libraries, while scanned documents undergo Optical Character Recognition (OCR).
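
The routing step described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation: the character threshold, the `PageText` type, and the `run_ocr` callback are all hypothetical names introduced here, and the embedded text is assumed to have been extracted already by whichever PDF library is in use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical threshold: a page with fewer extractable characters than this
# is assumed to be a scanned image and is sent to OCR.
MIN_EMBEDDED_CHARS = 20


@dataclass
class PageText:
    page_number: int
    text: str
    source: str  # "embedded" or "ocr"


def route_pages(
    embedded_texts: List[str],
    run_ocr: Callable[[int], str],
    min_chars: int = MIN_EMBEDDED_CHARS,
) -> List[PageText]:
    """Use embedded text where a page has enough of it, otherwise fall back to OCR.

    embedded_texts holds the text already extracted from each page (empty for
    scanned pages); run_ocr is an assumed callback that returns OCR text for a
    given page index.
    """
    results = []
    for page_number, embedded in enumerate(embedded_texts):
        cleaned = embedded.strip()
        if len(cleaned) >= min_chars:
            results.append(PageText(page_number, cleaned, "embedded"))
        else:
            results.append(PageText(page_number, run_ocr(page_number).strip(), "ocr"))
    return results


if __name__ == "__main__":
    # A stand-in OCR function so the sketch runs without any OCR dependency.
    pages = route_pages(
        ["Hemoglobin 13.5 g/dL 13.0 - 17.0", "  "],
        run_ocr=lambda i: f"ocr text for page {i}",
    )
    for page in pages:
        print(page.page_number, page.source, page.text)
```

Deciding per page rather than per document is a design choice of this sketch; it lets a single PDF mix embedded and scanned pages.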

Subsequently, a custom Named Entity Recognition (NER) algorithm is applied to parsed text to identify and extract critical information such as components, methods, values, units, and reference ranges.
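
To make the target output concrete, the sketch below pulls a component, value, unit, and reference range out of a single report row. It uses a simple regular expression as a stand-in and is not the custom NER algorithm described above; the row layout (`name value unit low - high`), the pattern, and the function name `parse_row` are assumptions made for illustration only.

```python
import re
from typing import Dict, Optional

# Assumed row layout: "<component> <value> <unit> <low> - <high>".
# Real reports vary far more than this pattern allows.
ROW_PATTERN = re.compile(
    r"^(?P<component>[A-Za-z][A-Za-z0-9 ()/%-]*?)\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s+"
    r"(?P<unit>\S+)\s+"
    r"(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)$"
)


def parse_row(line: str) -> Optional[Dict[str, object]]:
    """Return the extracted entities for one row, or None if the row does not match."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "component": match["component"].strip(),
        "value": float(match["value"]),
        "unit": match["unit"],
        "reference_range": (float(match["low"]), float(match["high"])),
    }


if __name__ == "__main__":
    for row in [
        "Hemoglobin 13.5 g/dL 13.0 - 17.0",
        "Total Cholesterol 182 mg/dL 125 - 200",
        "Sample collected at 09:30",
    ]:
        print(parse_row(row))
```

A fixed pattern like this breaks as soon as a lab changes its column order or range notation, which is the kind of variation a model built from many labs' reports is meant to absorb.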

The NER model is based on a dataset compiled from 87,575 reports gathered from 652 lab partners across India, ensuring adaptability to diverse representations of entities.

Continuous improvement is implemented through a dashboard where trained annotators correct the engine's outputs based on their expertise. Additionally, a daily verification process is implemented due to the sensitive nature of the operation.

RESULTS

To test the capability of our system, we evaluated a random sample of 800 diagnostic reports that had not been included in the training phase.

We measured true positives, false positives, true negatives, and false negatives across 28,992 component rows.

We found that the engine had an accuracy of 94.69% and a precision of 0.9967.

We also found the recall to be 0.9461 and the F1 score to be 0.9707.
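
For reference, the standard definitions of these metrics can be written out as below. The function names and the confusion-matrix counts in the usage example are illustrative only and are not the study's counts; the final line simply checks that the reported F1 score follows from the reported precision and recall.

```python
from typing import Dict


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


if __name__ == "__main__":
    # Illustrative counts only, not taken from the evaluation.
    print(classification_metrics(tp=8, fp=2, tn=9, fn=1))

    # Consistency check on the reported figures: precision 0.9967, recall 0.9461.
    print(round(f1_from(0.9967, 0.9461), 4))
```

With precision well above recall, most of the engine's errors on this sample are missed rows (false negatives) rather than wrongly extracted ones.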

[Figure: Performance Metrics]

[Figure: Confusion Matrix]

CONCLUSION

Similar studies on extracting information from lab reports in other countries have been conducted previously, and various Indian healthcare companies have also ventured into this area.

However, to the best of our knowledge, this study presents the first attempt to introduce a method for digitizing lab reports at this scale of data in India, as our dataset stands to be the largest of its kind in the country.

The strength of the study is in the dataset curation, which included diverse reports from multiple sources and locations, covering diverse patient profiles.

While this approach is useful for digitizing past records, unless there is widespread uptake of interoperability and adherence to reporting standards, highly accurate and reliable reporting will remain a challenge.

Nevertheless, we hope that learning algorithms can keep up with these challenges through access to varied reports spanning a spectrum of formats, with our datasets tracking the fast-changing reporting specifications that result from developments in technology.

REFERENCES

Kang YS, Kayaalp M. Extracting laboratory test information from biomedical text. J Pathol Inform. 2013;4:23.

Hao T, Liu H, Weng C. Valx: A system for extracting and structuring numeric lab test comparison statements from text. 2017.

Contact

Please feel free to drop a mail for collaboration - research@medibuddy.in

© 2024 Phasorz Technologies Pvt.Ltd
