Introduction
In recent years, information and communication technologies have developed rapidly, particularly in cloud computing, big data (Eachempati et al., 2022), artificial intelligence (Wang et al., 2019), and 5G networks. As data volumes grow and data types multiply, database workloads have become increasingly diverse and fast-changing, and conventional operation and maintenance methods can no longer meet the requirements of modern database systems. According to Chen et al. (2019), Kraska et al. (2019), and Li, Zhou & Li (2019), with the recent advancement of artificial intelligence technology, AI-based database operation and maintenance methods (M'barek et al., 2016) are gradually replacing traditional ones.
Improving database performance is a main concern of intelligent operation and maintenance. Database tuning generally refers to increasing the throughput of the database per unit time or reducing the latency of a single database operation. A database exposes a number of configurable parameters that are significantly important for the performance of its operations. Oh and Sang (2005) and Weikum et al. (2002) showed that database parameter tuning has always been a key concern of database administrators (DBAs). In recent years, the academic community has also made significant progress on database parameter tuning, which affects database performance (Aken, Pavlo, Gordon & Zhang, 2017; Cai et al., 2022; Cereda et al., 2021; Fekry et al., 2020; Gur et al., 2021; Ishihara & Shiba, 2020; Li, Zhou, Li & Gao, 2019; Kanellis et al., 2020; Kanellis et al., 2022; Zhang et al., 2019; Zhang et al., 2021; Zhang et al., 2022). However, existing research has ignored the following three issues:
1. Database performance failures are rare and difficult to obtain in online environments, whereas performance problem samples are relatively easy to construct in an offline database environment. However, because offline simulation environments differ from real online environments, models trained solely on samples from offline simulation environments may perform poorly online. For these two reasons, samples from both online and offline simulation environments must be combined to train the model jointly.
2. Both online and offline environments ignore the impact of hardware environment information on database tuning when generating samples. If hardware environment information is not included in model training, the model cannot fully characterize database performance, which reduces its accuracy.
3. Not all samples can be used to train the model. Some samples are invalid and must be filtered out as outliers. These samples are not manual input errors, null values, or other such anomalies. Database performance is affected not only by database parameters but also by workload. When the workload is relatively low (performance has not reached the bottleneck), the performance of the entire database depends only on the intensity of the workload and not on the database parameters. Therefore, samples generated before the database system reaches its performance bottleneck need to be filtered out to avoid errors in evaluating the parameters that the model outputs.
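The three issues can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions, not the method of this paper: the `Sample` fields, the `is_saturated` test (a sample counts as bottlenecked when achieved throughput falls short of the offered load by more than a tolerance), and the 5% tolerance are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One tuning observation. All field names are hypothetical."""
    knobs: dict          # database parameter values, e.g. {"buffer_pool_mb": 1024}
    hardware: dict       # hardware description, e.g. {"cpu_cores": 8}
    offered_load: float  # requests per second sent to the database
    throughput: float    # requests per second actually completed
    source: str          # "online" or "offline"


def is_saturated(sample: Sample, tolerance: float = 0.05) -> bool:
    """Assumed bottleneck test: throughput no longer keeps up with offered load."""
    if sample.offered_load <= 0:
        return False
    shortfall = (sample.offered_load - sample.throughput) / sample.offered_load
    return shortfall > tolerance


def to_features(sample: Sample) -> list:
    """Concatenate knob and hardware values (sorted by name) into one vector."""
    knob_part = [float(sample.knobs[k]) for k in sorted(sample.knobs)]
    hw_part = [float(sample.hardware[k]) for k in sorted(sample.hardware)]
    return knob_part + hw_part


def build_training_set(online: list, offline: list, tolerance: float = 0.05) -> list:
    """Merge online and offline samples, keeping only bottlenecked ones."""
    return [s for s in online + offline if is_saturated(s, tolerance)]
```

In this sketch, a lightly loaded sample whose throughput tracks the offered load is dropped, because its performance says little about the parameters, while a sample whose throughput lags the load is kept and described by both its parameters and its hardware.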
At present, the most cutting-edge intelligent parameter tuning is based on reinforcement learning (Li, Zhou, Li & Gao, 2019; Lillicrap et al., 2016; Zhang et al., 2019). The authors introduce a method for identifying whether a configuration recommended by reinforcement learning is effective once it has been applied, and propose a new generation of intelligent database tuning system. DBtune uses reinforcement learning to solve the problem of parameter tuning. The main work and contributions are as follows: