1. Introduction
Fireworks Algorithm (FWA) is a novel swarm intelligence algorithm (SIA) under active research. Inspired by the explosion process of fireworks, FWA was originally proposed for solving optimization problems (Tan, 2010). A comparative study shows that FWA is very competitive on real-parameter optimization problems (Tan, 2013). FWA has been successfully applied to many scientific and engineering problems, such as non-negative matrix factorization (Janecek, 2011), digital filter design (Gao, 2011), parameter optimization (He, 2013), document clustering (Yang, 2014), and so forth. New mechanisms and analyses are actively being proposed to further improve the performance of FWA (Li, 2014; Zheng, 2014).
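To make the explosion metaphor concrete, the following is a minimal sketch of FWA's core explosion operator in the spirit of Tan (2010): better fireworks (lower objective values, for minimization) produce more sparks within a smaller amplitude, while worse fireworks produce fewer, more widely scattered sparks. All names, constants, and simplifications here are illustrative, not taken from any particular FWA implementation.

```python
import random

def fwa_explode(fireworks, fitness, n_sparks=50, max_amp=10.0, eps=1e-8):
    """Sketch of FWA's explosion operator (minimization).

    Spark counts are allocated inversely to objective value, and
    explosion amplitudes proportionally to it, so good solutions are
    exploited locally while poor ones explore more broadly.
    """
    vals = [fitness(x) for x in fireworks]
    y_min, y_max = min(vals), max(vals)
    sum_best = sum(y_max - y + eps for y in vals)
    sum_worst = sum(y - y_min + eps for y in vals)
    sparks = []
    for x, y in zip(fireworks, vals):
        # better fitness -> more sparks, smaller amplitude
        s = max(1, round(n_sparks * (y_max - y + eps) / sum_best))
        amp = max_amp * (y - y_min + eps) / sum_worst
        for _ in range(s):
            sparks.append([xi + random.uniform(-amp, amp) for xi in x])
    return sparks

# Usage: one explosion step on a random 2-D population,
# minimizing the sphere function.
pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(5)]
sparks = fwa_explode(pop, lambda x: sum(v * v for v in x))
```

A full FWA iteration would additionally apply mutation sparks and a selection step to form the next generation; this fragment only shows the explosion itself.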
Although FWA, like other SIAs, has achieved success in solving many real-world problems where conventional arithmetic and numerical methods fail, it suffers from intensive computation, which greatly limits its application to problems where function evaluation is time-consuming.
Facing the technical challenge of raising clock speeds within a fixed power envelope, modern computer systems increasingly depend on adding multiple cores to improve performance (Ross, 2008). Originally designed for computationally intensive graphics tasks, the Graphics Processing Unit (GPU) has, from its inception, contained many computational cores and can provide massive parallelism (with thousands of cores) at a reasonable price. As the hardware and software for GPU programming have matured (Kirk, 2010; Munshi, 2011), GPUs have become popular for general-purpose computing beyond the field of graphics processing, and great success has been achieved in applications ranging from embedded systems to high-performance supercomputers (AMD, 2015; NVIDIA, 2015a; Owens, 2007).
Because they are based on interactions within a population, SIAs are naturally amenable to parallelism. This intrinsic property makes them well suited to running on the GPU in parallel, yielding remarkable performance improvements. In effect, GPUs have been used to accelerate SIAs since the first days of GPU computing, and significant progress has been made alongside the emergence of high-level programming platforms such as CUDA (Compute Unified Device Architecture) and OpenCL (Open Computing Language) (Zhou, 2009; Zhu, 2009). In the past few years, different implementations of diverse SIAs have been proposed, targeting GPUs of various architectures and specifications and introducing many techniques and tricks (Tan, 2015).
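The parallelism referred to above is primarily data parallelism: each individual in the population can be evaluated independently, so on a GPU each individual (or each dimension) maps to its own thread. As a hedged, platform-neutral illustration, the sketch below uses a CPU process pool as a stand-in for GPU threads; `sphere` and `evaluate_population` are hypothetical names introduced here for illustration only.

```python
from multiprocessing import Pool

def sphere(x):
    """Stand-in for a time-consuming objective function."""
    return sum(v * v for v in x)

def evaluate_population(population, workers=4):
    """Evaluate every individual independently, in parallel.

    On a GPU, each individual would typically be assigned to one
    thread in a fitness-evaluation kernel; a process pool exhibits
    the same one-individual-per-worker data parallelism on the CPU.
    """
    with Pool(workers) as pool:
        return pool.map(sphere, population)

# Usage: evaluate a small 2-D population in parallel.
population = [[float(i), float(-i)] for i in range(8)]
fitness = evaluate_population(population)
```

Since the individuals share no state during evaluation, the speedup scales with the number of workers up to the population size, which is why fitness evaluation is usually the first stage of an SIA moved onto the GPU.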