Sentence Extraction Using Machine Learning Algorithms for Bug Reports: Sent Extraction Bug Reports


Som Gupta, Sanjai Kumar Gupta
Copyright: © 2022 |Pages: 16
DOI: 10.4018/IJSSOE.300784

Abstract

Automatic Summarization is an important task for improving the search experience on the internet. Software Repositories are among the richest sources of information for the software development community because they contain varied information such as team behavior, intentions, emotions, bugs, project style, and project management information. This paper extends our previous work, in which we used only a feature-based technique to generate summaries of Bug Reports. Here we combine machine learning approaches with those features to see how the results vary. Of the many machine learning approaches available, we use four popular ones for the observation: KNN, CART, NB, and SVM. We observed that the results improve when the machine learning approaches are integrated with the feature-based approach.

Introduction

The exponential growth of information on the World Wide Web has created a need for automatic summarization to reduce the time users spend searching. Summarization is the process of condensing information without losing its essence, and automatic summarization helps users find relevant information quickly. The availability and exponential increase of unstructured information in databases have made reliable, good-quality summarization systems more necessary. The summarization process involves selecting the important sentences and arranging them in order. To identify the important sentences, the system needs to understand the document. Automatic Summarization is not a new research topic. The first systematic work dates to 1958 (Luhn et al. 1958), and a lot of research followed. Research in this field was popularized by the DUC workshops, which focused on creating summaries from various perspectives for datasets involving different types of data formats. When these workshops stopped, research in the field declined. The exponential growth of information, the popularity of the web, and the emergence of web search engines then revived interest in it.

Even though a lot of research has taken place in this field, systems have still not reached the level where their summaries resemble human-written ones. Summarization is of two types, depending on the kind of summary the system generates: Extractive and Abstractive. Extractive summarization extracts the important sentences and arranges them in the order in which they appear in the text, whereas Abstractive summarization not only extracts the important sentences but also reformulates them into novel sentences. On the basis of the number of documents the system considers, summarization is divided into single document summarization and multi-document summarization. As the names suggest, Single Document Summarization analyzes only one document, while Multi Document Summarization analyzes more than one document to create the summaries. Although a lot of work has been done on both types of text summarization, most approaches for Bug Reports analyze a single document only.
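
The extractive process described above (score sentences, keep the best ones, restore their original order) can be illustrated with a minimal sketch. It is not the method used in this paper: the frequency-based score is a simple stand-in for the feature-based and machine learning scoring, and the names `STOP_WORDS`, `split_sentences`, `tokenize`, and `extractive_summary` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "of", "and", "in", "on", "i"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens with stop words removed."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def extractive_summary(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in chosen]
```

In a setup closer to the one the abstract describes, the `score` function would presumably be replaced by a trained classifier's decision on each sentence's feature vector; the selection and reordering steps would stay the same.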

Software Repositories are a popular way of archiving the software information produced by software development companies. The software development process involves many phases, such as requirement analysis, design, coding, testing, and maintenance. Of these, the testing and maintenance phases take a lot of time. Testing is the key phase of software development, because increasing competition in the software community demands more robust and user-friendly projects. To build software that meets these requirements, it is important to evolve the project continuously according to user requirements and to make it bug free. A Bug Report is the document that a user, developer, or tester creates to document a bug. A Bug Report is usually expected to follow standards, because the development community also consults it in later stages, but the heavy workload on developers affects its quality. Bug Reports are mainly stored in Bug Repositories. Bug Reports allow conversations among users, which makes them informal and makes them resemble conversational artifacts. Mining Bug Reports helps gather a lot of information, such as new user requirements, improvements to the existing project, loopholes in the project, and the resolutions of bugs. Bug Reports are among the artifacts most heavily analyzed to gather important information. Regression testing is also one of the most common kinds of testing performed on a project: to stay in the market, a product needs to be upgraded frequently, which means the whole product is tested again and again. Other, similar projects often face the same issues as well. Access to the information in Bug Reports therefore helps developers and testers a lot.
Summarization of Bug Reports helps create summaries, gather information quickly, and document the best practices for the project, and it also helps triagers quickly determine whether Bug Reports are duplicates.
