A Hybrid Approach to Identify Code Smell Using Machine Learning Algorithms

Archana Patnaik, Neelamdhab Padhy
Copyright: © 2021 |Pages: 15
DOI: 10.4018/IJOSSP.2021040102

Abstract

Code smell detection is the task of identifying design problems in source code that can lead to bugs during software development. The significant causes of code smell are code complexity, violation of programming rules, poor modelling, and lack of unit-level testing by the developer. Open-source systems such as JEdit, Eclipse, and ArgoUML are evaluated in this work. After the data are collected, the best features are selected using recursive feature elimination (RFE). The authors use several anomaly detection algorithms to recognize dirty code efficiently. The average accuracy of k-means, GMM, autoencoder, PCA, and Bayesian networks is 98%, 94%, 96%, 89%, and 93%, respectively. The k-means clustering algorithm is the most suitable of these for code smell detection. Experimentally, the authors show that the ArgoUML project yields better performance than the Eclipse and JEdit projects.

1. Introduction

The primary causes of code complexity are tight time frames, mismanagement, unclean shortcuts taken during the software development process, lack of testing, documentation issues, lack of understanding, communication issues, lack of teamwork, monitoring issues, workload, and late refactoring. Lack of cooperation and coordination often causes these problems. Project transitions can even harm the whole project when the code is poorly written. Code smell refers to a deeper issue inside a program's source code. Code smell may not affect the program's result, but it still harms the source code's performance. Violating the basics of software development decreases code quality and increases technical debt, which motivates identifying code smells automatically. WekaNose is a tool that detects code smells using the Weka software. Other code smell detection tools include PMD, iPlasma, JDeodorant, Decoder, Checkstyle, etc.

Figure 1. Dirty Code

Figure 1 illustrates dirty code with the data clump smell, where groups of variables are combined to form objects at the class level. It increases the execution time of the program by allocating data values to the variables. In Figure 1, data members such as ccno, expmonth, expyear, and amt hold arbitrary data values, which adds to code complexity. This can be avoided by deleting the assigned values.
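
Because the figure itself is not reproduced here, the following is only a minimal Python sketch of what a data clump of this kind might look like and how it could be cleaned up. The field names ccno, expmonth, expyear, and amt come from the text; the class names, the literal values, and the is_expired helper are hypothetical, and the original figure may use a different language and layout.

```python
from dataclasses import dataclass


# Smelly version (hypothetical reconstruction): the card fields always
# travel together and carry made-up values assigned inside the class.
class PaymentSmelly:
    def __init__(self):
        self.ccno = "0000-0000-0000-0000"
        self.expmonth = 1
        self.expyear = 2030
        self.amt = 100.0


# Cleaner version: the clump becomes its own value object (extract class),
# and no values are assigned inside the class; callers pass them in.
@dataclass(frozen=True)
class CardDetails:
    ccno: str
    expmonth: int
    expyear: int

    def is_expired(self, month: int, year: int) -> bool:
        # Tuple comparison orders by year first, then by month.
        return (self.expyear, self.expmonth) < (year, month)


@dataclass
class Payment:
    card: CardDetails
    amt: float
```

Grouping the three card fields into one object means they are validated and passed around as a unit, which is one way to remove the clump without changing what the payment code does.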

Feature selection is the automatic or manual selection of relevant features from the massive amount of data used to construct the model. It improves the accuracy of a model by reducing its complexity: a subset of the best features is selected before any learning algorithm is applied. The parameters involved in feature selection include correlation, entropy, and mutual information. Feature selection methods include Recursive Feature Elimination, the chi-squared test, feature evaluation, etc.

Machine learning enables a machine to learn from data and make predictions without being explicitly programmed. We have used different supervised, unsupervised, and anomaly detection algorithms to identify the smelly data in real-time datasets. In our research, the prime focus is on code smell detection through the identification of outliers. We have used unsupervised anomaly detection methods such as PCA, GMM, autoencoder, K-means clustering, and Bayesian network to identify outliers in the dirty code. We have also assessed the performance of the system by comparing the accuracy of these methods.

Software quality is defined as the robustness or fitness for purpose of a software product. It is analyzed using the following parameters: reusability, correctness, portability, and maintainability. Software quality assurance produces high-quality software while saving time and cost. Code smell affects the source code by violating good program design principles, which has a negative impact on software quality. The primary solution to this problem is to refactor the code. Refactoring changes the internal structure of code without altering its external functionality. Refactoring techniques include replace parameter, inline method, extract class, etc.

  • RQ1: Which type of feature selection method is used for analyzing the open-source projects?

In this work, feature selection reduces complexity and increases the proposed model's efficiency. Recursive Feature Elimination (RFE) is used to select the relevant data by repeatedly removing the weakest features of the dataset.
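
The elimination loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the ranking model is a least-squares linear fit trained by gradient descent (the preview does not say which estimator was used), and the metric names and values are invented for the example.

```python
def standardize(columns):
    """Scale each feature column to zero mean and unit variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5 or 1.0
        scaled.append([(v - mean) / std for v in col])
    return scaled


def fit_linear(columns, y, steps=2000, lr=0.1):
    """Least-squares fit by gradient descent; returns one weight per column."""
    n = len(y)
    weights = [0.0] * len(columns)
    bias = 0.0
    for _ in range(steps):
        preds = [bias + sum(w * col[i] for w, col in zip(weights, columns))
                 for i in range(n)]
        errors = [p - t for p, t in zip(preds, y)]
        bias -= lr * sum(errors) / n
        weights = [w - lr * sum(e * col[i] for i, e in enumerate(errors)) / n
                   for w, col in zip(weights, columns)]
    return weights


def rfe(features, y, keep):
    """Refit the model, drop the feature with the smallest |weight|, repeat."""
    remaining = dict(zip(features, standardize(list(features.values()))))
    while len(remaining) > keep:
        names = list(remaining)
        weights = fit_linear([remaining[name] for name in names], y)
        weakest = min(zip(names, weights), key=lambda pair: abs(pair[1]))[0]
        del remaining[weakest]
    return list(remaining)


# Hypothetical per-class metrics; 1 marks a class labelled as smelly.
metrics = {
    "loc": [10, 20, 30, 40, 110, 120, 130, 140],
    "num_params": [1, 2, 2, 3, 3, 4, 5, 6],
    "blank_lines": [4, 2, 2, 4, 4, 2, 2, 4],  # unrelated to the label
}
smelly = [0, 0, 0, 0, 1, 1, 1, 1]

selected = rfe(metrics, smelly, keep=2)
```

On this toy data the uninformative blank_lines column is the first to be eliminated. The model is refit after every removal, which is what distinguishes RFE from a one-shot ranking of features.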

  • RQ2: Which type of anomaly detection is preferable for analyzing code smell?

This work illustrates the anomaly detection technique for identifying outliers by comparing dirty code with clean code. We have used five different algorithms to identify the extreme code points that deviate from the original data samples. Cluster-based anomaly detection methods give the best results for code smell detection.
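
One common way to turn clustering into an outlier detector is to score each sample by its distance to the nearest cluster centroid and flag the farthest samples. The sketch below assumes that scoring rule, which this preview does not confirm as the authors' choice, and uses invented two-dimensional points in place of real code metrics.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids


def anomaly_scores(points, centroids):
    """Score each point by its distance to the closest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


def flag_outliers(points, k, n_outliers):
    """Return the indices of the n_outliers points with the highest scores."""
    centroids = kmeans(points, k)
    scores = anomaly_scores(points, centroids)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_outliers])


# Hypothetical metric vectors: two tight groups plus one stray sample.
samples = [
    (1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9),
    (0.8, 1.1), (4.9, 5.2), (1.1, 1.2), (5.2, 5.1),
    (9.0, 1.0),
]

suspects = flag_outliers(samples, k=2, n_outliers=1)  # [8]
```

The number of clusters and the number of samples to flag are both tuning choices; in practice they would be set from the data rather than fixed as here.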

  • RQ3: What are the most commonly found code smells, and which refactoring approach is suitable for developing clean code?
