Collaborative Computing-Based K-Nearest Neighbour Algorithm and Mutual Information to Classify Gene Expressions for Type 2 Diabetes


Sura Zaki Al Rashid
Copyright: © 2022 | Pages: 12
DOI: 10.4018/IJeC.304044

Abstract

Gene expression data from human umbilical vein endothelial cells are classified to shed light on the regulation of insulin, using dynamic gene expression data for two classes: control and exposed to insulin. The mutual information statistical feature selection method is applied to all available datasets to select the significant genes. The reduced data are divided into training and testing sets and then supplied to the KNN classifier for diabetes classification. The results show that, with mutual information selecting the 10,000 highest-ranked genes for KNN, the test classification accuracy reaches 100%. Pathway analysis and gene ontology enrichment are used to evaluate the targeted genes. The results clearly show the importance of finding the most informative genes in the database with a statistical gene selection technique, which reduces time and cost and increases the efficiency of the classifier. The method yields significant results and can be applied to other data and diseases.
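The gene selection step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the equal-width discretisation, the function names (`discretize`, `mutual_information`, `rank_genes`), and the toy expression values are all assumptions made for illustration. Each gene is scored by its mutual information with the class label, and the highest-scoring genes are kept.

```python
import math
from collections import Counter

def discretize(values, bins=2):
    """Equal-width binning of one gene's expression values.
    The binning scheme is an assumption; the source does not specify one."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_genes(expression, labels, top_k, bins=2):
    """Return the indices of the top_k genes by mutual information with the label.
    expression is a list of samples, each a list of per-gene values."""
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((mutual_information(discretize(column, bins), labels), g))
    # Highest score first; ties broken by gene index for reproducibility.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scores[:top_k]]

# Hypothetical toy data: 4 samples x 3 genes. Only gene 0 tracks the class.
expression = [
    [0.1, 5.0, 2.0],
    [0.2, 1.0, 2.1],
    [0.9, 5.2, 2.0],
    [1.0, 0.9, 2.1],
]
labels = ["control", "control", "insulin", "insulin"]
selected = rank_genes(expression, labels, top_k=1)
print(selected)
```

In a setting like the one in the abstract, `top_k` would play the role of the number of retained genes, and the reduced matrix would then be split into training and testing sets before classification.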

1. Introduction

Diabetes is a chronic disease that affects people regardless of age. Its causes, many of which are genetic or related to other illness, produce symptoms such as thirst, persistent fatigue, mobility issues and sweating. Patients with diabetes die from complications such as nephropathy, and the prolonged harmful effects of hyperglycemia on tissues lead to long-lasting problems such as cardiovascular macroangiopathy. In terms of pathophysiology, type 2 diabetes is a classic metabolic condition of insulin resistance. Insulin resistance can lead to compensatory hyperinsulinemia, which has a proliferative effect on the cellular components of the vascular wall, increasing the risk of cardiovascular diseases (Kharroubi, 2015), (di Camillo et al., 2010). The disease can be treated by injecting insulin, with pills, or with herbal remedies. It can lead to infection and to complications of the kidneys, eyes, brain, and other organs. In the endothelium, characterising transcriptional modifications is a key step toward a better understanding of the mechanism of insulin action and of the relationship between insulin resistance and endothelial cell dysfunction [2]–(Statnikov, Aliferis, Tsamardinos, Hardin, & Levy, 2005). Microarrays are a key tool for profiling the global gene expression patterns of tissues and cells. At present, such datasets contain thousands of genes but few samples (Li, Weinberg, Darden, & Pedersen, 2001). An important challenge in recent biomedical research is whether samples can be classified by disease from such data [6]–(Babu & Sarkar, 2017).

Developing a suitable classifier from training examples for genetic diagnosis is a problem in this area. In this study, the challenge is to classify gene expression data into control and exposed-to-insulin categories (Vanitha, Devaraj, & Venkatesulu, 2015a). Therefore, the k-nearest neighbours (KNN) approach to non-parametric pattern recognition is applied. Because the dataset consists of several thousand genes but few samples, many subsets of genes that discriminate between the sample classes may exist for a given dataset. Many such subsets were found, and the significance of each gene for classifying the samples was assessed by examining how frequently it was a member of these near-optimal sets [5], [10]–(Bouazza, Hamdi, Zeroual, & Auhmani, 2015). While KNN is simple and clinically attractive, groups experienced in data analysis have found a large number of alternatives with differing performance (Sheela & Rangarajan, 2018), (Vanitha, Devaraj, & Venkatesulu, 2015b). Reducing the dimensionality of the variable space is a key pre-processing step for all classification and clustering methods. Still, it is unknown whether the increased transcription of specific genes for cellular proliferation in the endothelium is due to insulin itself. In this work, the classifier assigns each sample to either the control or the exposed class.
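As a rough illustration of the KNN decision described above, the following sketch classifies a sample by majority vote among its k nearest training samples. The Euclidean distance, the value k = 3, the function names, and the toy expression values are assumptions for illustration only; the source does not state which distance or k was used.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two expression vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    neighbours = sorted(
        zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_x, train_y, q, k) == y for q, y in zip(test_x, test_y)
    )
    return hits / len(test_y)

# Hypothetical toy data: two selected genes per sample, two classes.
train_x = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],   # control
    [0.90, 1.00], [1.00, 0.80], [0.95, 0.90],   # exposed to insulin
]
train_y = ["control"] * 3 + ["exposed"] * 3
test_x = [[0.12, 0.18], [0.97, 0.95]]
test_y = ["control", "exposed"]

predictions = [knn_predict(train_x, train_y, q) for q in test_x]
test_accuracy = accuracy(train_x, train_y, test_x, test_y)
print(predictions, test_accuracy)
```

An odd k is used here so that a two-class vote cannot tie. With thousands of genes and few samples, distances are computed only over the genes retained by the selection step.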
