A Technique for Securing Big Data Using K-Anonymization With a Hybrid Optimization Algorithm

Suman Madan, Puneet Goswami
DOI: 10.4018/IJORIS.20211001.oa3

Abstract

Recent cloud-computing techniques for data processing are scalable and secure, which makes cloud infrastructure increasingly attractive for supporting big data applications. This paper proposes an effective anonymization-based privacy preservation model that uses the k-anonymization criterion and Grey Wolf-Cat Swarm Optimization (GWCSO) to achieve privacy preservation in big data. The anonymization step adapts the k-anonymization criterion, under which each record in the released database is indistinguishable from at least k-1 other records on its identifying attributes. The proposed GWCSO integrates the Grey Wolf Optimizer (GWO) and Cat Swarm Optimization (CSO) to construct the k-anonymized database, which reveals only the essential details to end users while hiding confidential information. The proposed technique is compared with several existing techniques using two performance metrics: classification accuracy (CA) and information loss (IL). The experimental results show that the proposed technique attains an improved CA value of 0.005 and an IL value of 0.798.
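To make the k-anonymization criterion concrete, the following minimal Python sketch checks whether a table is k-anonymous with respect to a chosen set of quasi-identifiers, and applies a simple generalization (age ranges and truncated ZIP codes) to reach that property. The records, the attribute names `age`, `zip`, and `disease`, and the helpers `is_k_anonymous`, `generalize_age`, and `anonymize` are hypothetical illustrations. The paper's model instead searches for the anonymized database with GWCSO, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, object]


def is_k_anonymous(records: Sequence[Record], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every combination of quasi-identifier values occurs in at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    group_sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(size >= k for size in group_sizes.values())


def generalize_age(age: int, width: int = 10) -> str:
    """Hypothetical generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records: Sequence[Record]) -> List[Record]:
    """Generalize the quasi-identifiers; the sensitive attribute is left unchanged."""
    return [
        {
            "age": generalize_age(int(r["age"])),
            "zip": str(r["zip"])[:3] + "**",  # keep only the ZIP prefix
            "disease": r["disease"],
        }
        for r in records
    ]


# Hypothetical example table: age and zip are quasi-identifiers, disease is sensitive.
QUASI_IDENTIFIERS = ["age", "zip"]
RAW = [
    {"age": 34, "zip": "13053", "disease": "flu"},
    {"age": 37, "zip": "13068", "disease": "cold"},
    {"age": 31, "zip": "13053", "disease": "asthma"},
    {"age": 45, "zip": "14850", "disease": "flu"},
    {"age": 48, "zip": "14853", "disease": "cold"},
]

if __name__ == "__main__":
    anonymized = anonymize(RAW)
    print(is_k_anonymous(RAW, QUASI_IDENTIFIERS, 2))         # False
    print(is_k_anonymous(anonymized, QUASI_IDENTIFIERS, 2))  # True
    print(is_k_anonymous(anonymized, QUASI_IDENTIFIERS, 3))  # False
```

The raw table fails the check because every age is distinct, while the generalized table forms two groups of sizes 3 and 2, so it is 2-anonymous but not 3-anonymous. Coarser generalizations raise the achievable k at the cost of more information loss, which is the kind of trade-off that metrics such as IL are meant to capture.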

1. Introduction

Advances in big data have opened up many research opportunities for the coming years. Big data is used for knowledge discovery across different sectors of society. It comprises vast amounts of data that are generated through digital processes and shared among many individuals over the web. Big data has paved the way for better decision-making, and this decision support has motivated many users to keep their data online (Xuezhen, et al., 2014). This sharing of data, however, raises several security concerns. The storage of personal information is a major issue in the context of privacy preservation (Karle & Vora, 2017). Because big data handles the data of a large number of users, privacy must be ensured to protect that data (Yang, et al., 2014), (Youke, et al., 2020). Numerous applications are designed to allow users to access data with trust management (Denglong et al., 2020). Privacy and security are major challenges in big data, and big data will not be accepted unless they are addressed. Scalability (S. Atiewi et al., 2020) is another major issue when conventional preservation techniques are applied to big data. Although several techniques have been developed for privacy preservation, most of them cannot preserve privacy efficiently because they fail to handle different attacks (Antony & Antony, 2016). Big data also requires large storage capacity and computational power; hence, it relies on large distributed systems that store data at various locations and allow easy retrieval (Geetha, et al., 2017).

Because preserving privacy is an important issue in processing big data, it affects academia as well as the IT industry. The key requirement when sharing data is to preserve privacy while still providing data utility. The purpose of extracting useful data from large datasets is to obtain data that will not be misused. Several techniques have been devised for privacy preservation, but most of them are ineffective at addressing the security problems that arise during privacy preservation (Thanamani, 2017). Previously used privacy preservation techniques can be grouped into two categories: the first hides the identity of the user, and the second preserves the user's important data. Because of its large volume, velocity, and variety, big data processing must account for communication overhead and computational cost (Guan & Si, 2017). Information transfer can be secured if the privacy of the database is preserved. The parameters considered for privacy preservation while processing big data are integrity, controllability, preservability, and confidentiality. Privacy preservation algorithms have gained attention because of their strong ability to protect big data. However, these techniques often overlook access to data by untrustworthy users, which can lead to data loss and data leakage. Privacy can be preserved through input privacy and output privacy. The performance of anonymization-based algorithms can be improved if optimization-based algorithms are adopted (Tang, et al., 2016).
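The abstract names the Grey Wolf Optimizer as one half of GWCSO. As a hedged illustration of how such a population-based optimizer searches a solution space, the following Python sketch implements the standard GWO position update: each wolf moves to the average of three estimates guided by the alpha, beta, and delta wolves (the three best positions so far), while the coefficient `a` shrinks linearly from 2 toward 0 to shift from exploration to exploitation. This is plain GWO only. The Cat Swarm Optimization component, the way GWCSO hybridizes the two, and the paper's anonymization fitness function are not modeled, and the names `gwo` and `sphere` (a toy objective) are hypothetical placeholders.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sphere(x: Sequence[float]) -> float:
    """Hypothetical stand-in objective; a real model would score an anonymized table."""
    return sum(v * v for v in x)


def gwo(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_wolves: int = 20,
    iterations: int = 100,
    seed: int = 0,
) -> Tuple[Vector, float]:
    """Minimize `objective` with the standard Grey Wolf Optimizer update rule."""
    if n_wolves < 3:
        raise ValueError("GWO needs at least three wolves (alpha, beta, delta)")
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    # Alpha, beta and delta: the three best positions found so far.
    leaders = sorted(wolves, key=objective)[:3]
    for t in range(iterations):
        a = 2.0 - 2.0 * t / iterations  # decreases linearly from 2 toward 0
        updated: List[Vector] = []
        for x in wolves:
            new_x: Vector = []
            for j in range(dim):
                estimates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])
                    estimates.append(leader[j] - A * D)
                # Average the three leader-guided estimates and clip to the bounds.
                new_x.append(min(max(sum(estimates) / 3.0, lo), hi))
            updated.append(new_x)
        wolves = updated
        # Keep the best three positions seen so far (elitist leader update).
        leaders = sorted(leaders + wolves, key=objective)[:3]
    best = leaders[0]
    return best, objective(best)


if __name__ == "__main__":
    best, score = gwo(sphere, dim=5, bounds=(-10.0, 10.0))
    print(best, score)
```

In an anonymization setting, each position would encode a candidate anonymization and the objective would balance data utility against information loss; how the paper defines that encoding and fitness is not described in this section, so the sketch leaves it abstract.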
