Enhancing the Software Clone Detection in BigCloneBench: A Neural Network Approach

Amandeep Kaur, Munish Saini
Copyright: © 2021 | Pages: 15
DOI: 10.4018/IJOSSP.2021070102

Abstract

In a software system, code snippets that are copied and pasted within the same software or into other software result in cloning. The basic cause of cloning is either a programmer's constraints or language constraints. The major drawback of code clones is an increase in software maintenance cost, so clone detection techniques are required to remove or refactor code clones. Recent studies show that the abstract syntax tree (AST) captures the structural information of source code appropriately. Many researchers have used tree-based convolution to identify clones, but this technique has certain drawbacks. Therefore, in this paper, the authors propose an approach that finds semantic clones through square-based convolution applied to an abstract syntax representation of the source code. Experimental results show the effectiveness of the approach on the popular BigCloneBench benchmark.

Introduction

Code snippets that are copied and pasted into software code, with or without changes, result in code clones. Different authors provide different definitions of code clones. Roy and Cordy (2007) and Koschke (2007) present comprehensive reviews of the clones found in software. Various studies report the percentage of duplication in source code. Commonly, 10-20 percent of software code is cloned (Baker, 1995; Baxter et al., 1998; Mayrand et al., 1996), and in rare cases the share reaches 25-60 percent (Ducasse et al., 1999). Due to advancements in technology, companies spend more on software maintenance.

To keep software maintainable, code clones should be detected and then removed or refactored according to the requirements. If a code fragment that contains a bug is copied or duplicated, each of its clones will also contain that bug. This makes it harder for developers or testers to find the bug in large software that contains thousands or millions of lines of code (LOC). Primarily, four types of clones are found in software, as shown in Figure 1.

  • Type 1: A code snippet that is identical to another snippet except for changes in whitespace and comments. Type 1 clones can be detected easily with text-based and token-based techniques. They are also called exact clones.

  • Type 2: A clone that results from altering the names of identifiers and variables, literals, or keywords. Type 2 clones can be detected with token-based and metric-based techniques. They are also called parameterized clones.

  • Type 3: A clone that results from the addition or deletion of lines of code. Type 3 clones can be detected with tree-based and graph-based approaches. They can be further categorized into strong and moderately strong Type 3 clones, and are also called near-miss clones.

  • Type 4: A clone that occurs when two code snippets have similar functionality but differ in structure. Type 4 clones can be detected with graph-based and hybrid approaches. They are also called semantic clones or function clones.
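
The four categories above can be illustrated with a small, hypothetical example. The sketch below is not the authors' method, and it uses Python snippets rather than the Java code found in BigCloneBench. The function names `tokens` and `classify`, the sample snippets, and the 0.7 similarity threshold are all assumptions made for illustration. It shows why token-based comparison handles Type 1 and Type 2 clones, can flag Type 3 candidates, and misses a Type 4 clone.

```python
import difflib
import io
import keyword
import tokenize

# Hypothetical sample snippets, one per clone type (not taken from BigCloneBench).
ORIGINAL = "def total(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
TYPE1 = "def total(values):\n    # sum the list\n    result = 0\n    for v in values:\n        result   +=   v\n    return result\n"
TYPE2 = "def add_all(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
TYPE3 = "def add_all(items):\n    acc = 0\n    for x in items:\n        if x is None:\n            continue\n        acc += x\n    return acc\n"
TYPE4 = "def add_all(items):\n    return sum(items)\n"

# Token kinds that carry only comments or layout and are ignored.
LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
          tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source, abstract=False):
    """Token strings of `source` without comments and layout.

    With abstract=True, identifiers become "ID" and literals become "LIT",
    so renamed variables and changed constants compare equal.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT:
            continue
        if abstract and tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, threshold=0.7):
    """Rough clone label for two snippets; the threshold is an assumed value."""
    if tokens(a) == tokens(b):
        return "Type-1"
    abs_a, abs_b = tokens(a, abstract=True), tokens(b, abstract=True)
    if abs_a == abs_b:
        return "Type-2"
    ratio = difflib.SequenceMatcher(None, abs_a, abs_b).ratio()
    return "Type-3 candidate" if ratio >= threshold else "not detected"


if __name__ == "__main__":
    for name, snippet in [("TYPE1", TYPE1), ("TYPE2", TYPE2),
                          ("TYPE3", TYPE3), ("TYPE4", TYPE4)]:
        print(name, "->", classify(ORIGINAL, snippet))
```

In this sketch the Type 4 pair goes undetected even though both functions compute the same sum, which is the gap that semantic clone detection aims to close.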

A large number of techniques have been developed for clone detection using traditional approaches, which broadly include text-based, token-based, metric-based, tree-based, and graph-based methods. These techniques primarily detect Type 1, Type 2, and Type 3 clones; very few approaches detect Type 4 clones.
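
As a rough illustration of the tree-based family, the following sketch compares two snippets by the node types in their abstract syntax trees. It assumes Python's standard `ast` module and a hypothetical similarity measure (a multiset Jaccard index over node-type counts). It is not the square-based convolution proposed in the article, and it ignores the shape of the tree, which real tree-based detectors take into account.

```python
import ast
from collections import Counter


def node_type_counts(source):
    """Count how often each AST node type occurs in `source`."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def ast_similarity(a, b):
    """Multiset Jaccard similarity of AST node types, in the range [0, 1]."""
    counts_a, counts_b = node_type_counts(a), node_type_counts(b)
    shared = sum((counts_a & counts_b).values())  # multiset intersection
    total = sum((counts_a | counts_b).values())   # multiset union
    return shared / total if total else 1.0


# Hypothetical snippets: B renames every identifier in A, C is unrelated.
A = "def total(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"
B = "def add_all(items):\n    acc = 0\n    for x in items:\n        acc += x\n    return acc\n"
C = "def double(x):\n    return x * 2\n"

if __name__ == "__main__":
    print("A vs B:", ast_similarity(A, B))
    print("A vs C:", ast_similarity(A, C))
```

Because identifier names are not part of the node types, the renamed snippet scores the same as the original, while the unrelated snippet scores lower.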
