Converging Semantic Knowledge and Deep Learning for Medical Coding

Nuria Garcia-Santa, Beatriz San Miguel, Takanori Ugai
DOI: 10.4018/IJPHIM.2019070103

Abstract

Medical coding assigns codes from medical classifications, such as the International Classification of Diseases (ICD), to clinical notes, which are reports about patients' conditions written by healthcare professionals in natural language. These texts may include medical terms that describe diagnoses, symptoms, drugs, treatments, etc., and their spontaneous language makes automatic processing challenging. Medical coding is usually performed manually by human medical coders, which makes it time-consuming and prone to errors. This research aims to develop new approaches that combine deep learning elements with traditional technologies. It presents a semantic-based proposal supported by a proprietary knowledge graph (KG), several neural network implementations, and an ensemble model for medical coding. A comparative discussion analyses the advantages and disadvantages of each proposal. The approaches are evaluated on two main corpora: MIMIC-III and private de-identified clinical notes.

Introduction

Electronic Health Records (EHRs) store a variety of clinical texts written by healthcare professionals, such as discharge summaries, doctor’s notes, and laboratory and radiology reports. These texts describe the health status of patients in natural language, covering symptoms, conditions, diagnoses, treatments and observations, and they contain real, empirical data drawn from clinicians’ experience of observing a patient. This valuable information, combined with scientific literature, clinical guidelines, clinical trials, medical classifications and other theoretical documentation, enables numerous Artificial Intelligence (AI) applications, such as clinical decision support systems, patients’ data analytics, patient cohort analysis, health status monitoring devices, drug discovery tools, or automatic encoding solutions (Hood & Flores, 2012), which can take medicine a step further.

In clinical practice, the classification of diagnoses and procedures according to medical classification standards is crucial and is widely used for reporting, diagnostic, billing, reimbursement and research purposes. Traditionally, predefined codes are assigned manually or by systems that rely on concept-based or rule-based methods (Denecke & van Harmelen, 2018). Automatic clinical encoding is a research topic focused on providing professionals with optimized recommendations about the closest-matching conditions and diseases for a patient. This supports clinicians in their daily activities and contributes to improving the quality of patients’ treatments. For this task, medical reports written in natural language are analysed automatically in order to extract valuable knowledge and link those reports directly to medical classifications such as the International Classification of Diseases (ICD). ICD is the international standard for reporting diseases and health conditions. In recent years the official versions used in medical institutions have been the 9th and 10th revisions (ICD-9 and ICD-10). The 11th revision (ICD-11) was released on 18 June 2018, with 55,000 codes compared with the 14,400 codes in ICD-10, and with 31 countries involved in its development and testing.

Nowadays, many EHRs collect patients’ information without tagging or matching the data to medical standards, because accurate manual annotation is difficult for clinicians. This limits how far the data can be exploited by AI techniques such as supervised machine learning. In addition, medical reports are usually unstructured texts written in natural language with informal grammar, which makes information extraction harder. Medical coding approaches need to deal with these challenges in order to link reports automatically to medical classifications.

Based on the aforementioned challenges, we have investigated the following questions:

  • How important are data quantity and data quality in medical coding problems?

  • What kinds of applicability limitations can these approaches present?

  • What are the most suitable and reliable approaches for this kind of classification problem?

  • Is it realistic to talk about current solutions outside the research context? That is, how reliable would it be to deploy these solutions in real hospital environments, where the quality of results is crucial for the healthcare system to work properly?

In this paper we describe our medical coding system, which improves on the results of current solutions for automatically classifying medical reports (e.g. discharge summaries, clinical notes, etc.) into the clinical codes of medical classifications.

We introduce a semantic-based approach and an ensemble model approach that surpass state-of-the-art proposals for automatic medical coding. Our approach does not need pre-annotated data in the workflow to provide a classification. Moreover, we implement several neural network architectures for medical coding and provide an exhaustive comparative discussion of these architectures and our original approaches. We include an evaluation of our solutions on two different corpora.
