An Analysis on Task Migration Strategy of Big Data Streaming Storm Computing Framework for Distributed Processing


Xiling Hu
Copyright: © 2020 | Pages: 18
DOI: 10.4018/IJISMD.2020100102
Abstract

In this modern era, a large volume of data is generated continuously and must be processed to extract the value of the latent information it contains. A major challenge in big data processing is communication overhead. To minimize communication overhead under various resource constraints, a task migration strategy for heterogeneous Storm environments is proposed. The strategy is built on the establishment and validation of a Storm resource constraint model, an optimal communication overhead model, and a task migration model. Its source node selection algorithm adds nodes whose load exceeds a threshold to the source node set, according to the CPU, memory, and network bandwidth load of each worker node in the cluster and a priority order over these resources. Experiments show that, compared with existing work, the proposed strategy effectively reduces the latency and communication overhead between nodes while incurring little execution overhead.
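As a rough illustration of the source node selection step described above, the following Java sketch adds worker nodes whose resource load exceeds a threshold to the source node set, scanning CPU, memory, and network bandwidth in an assumed priority order. The class names, thresholds, and priority order are illustrative assumptions, not the actual implementation evaluated in the article.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of threshold-based source node selection (not the article's code).
// A node whose load exceeds the threshold for any resource, checked in an assumed
// priority order (CPU, then memory, then network bandwidth), joins the source node set.
public class SourceNodeSelection {

    // Hypothetical per-node load snapshot, each value normalized to [0, 1].
    static class WorkerNode {
        final String id;
        final double cpuLoad;
        final double memLoad;
        final double netLoad;

        WorkerNode(String id, double cpuLoad, double memLoad, double netLoad) {
            this.id = id;
            this.cpuLoad = cpuLoad;
            this.memLoad = memLoad;
            this.netLoad = netLoad;
        }
    }

    // Assumed thresholds; the article derives its own from the resource constraint model.
    static final double CPU_THRESHOLD = 0.80;
    static final double MEM_THRESHOLD = 0.80;
    static final double NET_THRESHOLD = 0.75;

    static Set<String> selectSourceNodes(List<WorkerNode> cluster) {
        Set<String> sourceNodes = new LinkedHashSet<>();
        // Resources are examined in the assumed priority order, so CPU-overloaded
        // nodes enter the source node set first.
        for (WorkerNode n : cluster) {
            if (n.cpuLoad > CPU_THRESHOLD) sourceNodes.add(n.id);
        }
        for (WorkerNode n : cluster) {
            if (n.memLoad > MEM_THRESHOLD) sourceNodes.add(n.id);
        }
        for (WorkerNode n : cluster) {
            if (n.netLoad > NET_THRESHOLD) sourceNodes.add(n.id);
        }
        return sourceNodes;
    }

    public static void main(String[] args) {
        List<WorkerNode> cluster = new ArrayList<>();
        cluster.add(new WorkerNode("node-1", 0.92, 0.40, 0.30)); // CPU-overloaded
        cluster.add(new WorkerNode("node-2", 0.35, 0.85, 0.20)); // memory-overloaded
        cluster.add(new WorkerNode("node-3", 0.25, 0.30, 0.10)); // within limits
        System.out.println(selectSourceNodes(cluster)); // prints [node-1, node-2]
    }
}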

1. Introduction

Big data has taken information technology to new heights and is often regarded as the information revolution that follows Internet technology (C. P. Chen & Zhang, 2014). In the context of big data, real-time streaming data (Guo, Kang, & Yuan, 2017) is a new data type that has grown rapidly in recent years and has become an integral part of big data (Rathore et al., 2015). This paper introduces Storm (Peng, Hosseini, Hong, Farivar, & Campbell, 2015), currently the mainstream big data streaming computing framework, together with its system architecture and related concepts. Four basic aspects of task scheduling are elaborated: reducing communication overhead, optimizing resource allocation, balancing load, and improving system flexibility. For task scheduling and optimization, this research surveys the state of existing work, discusses its respective advantages and disadvantages, and summarizes the remaining open problems.

The traditional database management system (DBMS), relying on relational algebra and related technologies, has played an important role in data query and management for a wide range of commercial applications (Idreos et al., 2012). However, when faced with high-speed, concurrent streaming data, for which low latency and high throughput are the ultimate design goals, the traditional DBMS can no longer meet the requirements. For this reason, data stream processing systems emerged and gradually became important online data analysis and processing tools in academia and industry (Zhang et al., 2017). In recent years, research on and application of big data have become a focus of academic and business circles. Its computing modes include batch computing, streaming computing, interactive computing, graph computing, and so on, among which batch computing and streaming computing are the most widely used. Batch computing stores data first and computes afterwards (as in the Hadoop ecosystem), which suits application scenarios with low real-time requirements that cover the global data set (W. Chen, Yao, & Tan, 2018). Stream computing breaks the dominance of MapReduce-style batch processing within the Hadoop framework: it does not require data to be stored first; as long as the data source is active, data is continuously generated and flows, as an unbounded, time-ordered sequence of tuples, through the worker nodes, where it is computed in memory. This makes it suitable for application scenarios with strict real-time requirements that process only the local data within a window.

Streaming big data processing platforms greatly improve the user experience of online data-intensive applications (C. P. Chen & Zhang, 2014). They are widely used in fields such as finance and banking, the Internet, and the Internet of Things, covering typical applications such as real-time analysis of the stock market, search engines and social networks, and real-time traffic alerts. The existing streaming big data processing frameworks are represented by Twitter's Storm system (Fu, Zhao, & Ma, 2011). Storm is an open source distributed real-time computing platform based on a master-slave architecture, with a simple programming model, support for a variety of programming languages including Java, and good horizontal scalability (Goetz & O'Neill, 2014). Compared with Flink and Spark Streaming, Storm offers better real-time performance for big data stream processing (Chan, Huang, & DeFries, 2001). Compared with the non-open-source Puma and with S4, whose community activity has cooled, Storm has a broader commercial future. With the addition of new features, support for more libraries, and seamless integration with other open source projects, Storm has gradually become a research hotspot in academia and industry, and is known as "the Hadoop of real-time processing" (Q. Li, Honglin, & Yan, 2009; Norta, Othman, & Taveter, 2015).
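As a brief illustration of the programming model mentioned above, the following sketch wires a spout that emits an unbounded stream of sentence tuples to a bolt that processes each tuple in memory. It assumes the Storm 2.x Java API (org.apache.storm); the class names, parallelism hints, and local-mode run are illustrative and are not taken from this article.

import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

// Minimal Storm topology sketch: a spout emits an unbounded stream of sentences
// and a bolt processes each tuple in memory as it arrives.
public class MinimalTopology {

    // Spout: the data source; keeps emitting tuples as long as it is active.
    public static class SentenceSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private final String[] sentences = {"storm processes streams", "tuples flow between nodes"};
        private int index = 0;

        @Override
        public void open(Map<String, Object> conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values(sentences[index]));
            index = (index + 1) % sentences.length;
            Utils.sleep(100); // throttle the stream for readability in local mode
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("sentence"));
        }
    }

    // Bolt: per-tuple in-memory processing; here it simply prints the sentence length.
    public static class LengthBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String sentence = tuple.getStringByField("sentence");
            System.out.println(sentence + " -> " + sentence.length());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: no downstream stream is declared
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("sentences", new SentenceSpout(), 1);
        builder.setBolt("lengths", new LengthBolt(), 2).shuffleGrouping("sentences");

        try (LocalCluster cluster = new LocalCluster()) {
            cluster.submitTopology("minimal", new Config(), builder.createTopology());
            Utils.sleep(5000); // let the topology run briefly, then shut down
        }
    }
}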
