1. Introduction
The Semantic Web was proposed by Tim Berners-Lee to provide a common framework for sharing information across multiple domains (Crasso et al., 2012). In the Semantic Web, data are given semantic meaning through metadata, so that real-world concepts and entities can be represented in a machine-readable, structured form. The Resource Description Framework (RDF), proposed by the World Wide Web Consortium (W3C), is a model for representing metadata about resources on the Web. RDF Schema (RDFS) and the Web Ontology Language (OWL) describe the semantics of the vocabularies used in RDF datasets. RDF and RDF Schema (collectively known as RDF(S)) are the core of the Semantic Web. RDF(S) has been increasingly applied in a wide range of Web-based application scenarios, such as semantic data integration (Arsic et al., 2019), semantic search (Xiong, Power and Callan, 2017; Zheng et al., 2019), semantic analysis of Big Data (Smiatacz, 2018; Shen, Hu and Tzeng, 2017), and decision making (Rubio-Largo et al., 2017; Zhou et al., 2017). RDF(S) has become the de facto standard for representing and handling data semantics. In particular, knowledge graphs (KGs) mostly adopt the RDF model to represent massive numbers of instances, and are now widely investigated and applied in diverse domains for the semantic and intelligent processing of massive data (Song et al., 2019).
With the rapid growth in the amount of RDF(S) data on the Web, storing it efficiently has become increasingly important. RDF(S) storage (Ma, Capretz and Yan, 2016) is expected to support efficient querying of RDF data, because the storage structure not only directly determines how completely the semantics are preserved, but also greatly affects query efficiency (Ma et al., 2016; Ma et al., 2018). Many RDF(S) storage methods have been studied, and they can be roughly divided into the following three categories:
1) Memory-based storage (e.g., Sesame (Broekstra et al., 2002) and BitMat (Atre et al., 2008)). Methods in this category allocate memory space directly for RDF data and generally rely on indexing for fast data processing. They are limited by the size of main memory and are therefore suitable only for small RDF datasets;
2) Disk-based storage (e.g., YARS2 (Harth et al., 2007) and System II (Wu et al., 2009)). Methods in this category move the storage location from memory to hard disk. They meet the space requirements of large-scale RDF datasets, but frequent disk reads and writes greatly reduce storage performance;
3) Database-based storage (e.g., Jena-TDB (Wilkinson et al., 2003), 4Store (Harris et al., 2009), Virtuoso (Erling and Mikhailov, 2007), BigOWLIM/OWLIM-SE (Bishop et al., 2011), SPARQLcity/SPARQLverse1, MarkLogic2, and Clark and Parsia3). Methods in this category use database technology to store RDF data. Besides commercial systems, research prototypes such as RDF-3X (Neumann and Weikum, 2010), SW-Store (Abadi et al., 2009) and RDFox4 have also been developed.
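To make the contrast between the categories above more concrete, the following minimal Python sketch stores the same RDF triples in two ways: in an in-memory structure with hash indexes, and in a relational triple table. It is only an illustration under stated assumptions and does not reproduce the internals of any system cited above. The `ex:` resources, the class `MemoryTripleStore`, and the functions `build_sqlite_store` and `sqlite_match` are hypothetical names introduced here, and SQLite stands in for "database technology" purely for convenience.

```python
import sqlite3
from collections import defaultdict

# Hypothetical example data: RDF statements as (subject, predicate, object).
TRIPLES = [
    ("ex:Alice", "rdf:type", "ex:Person"),
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "rdf:type", "ex:Person"),
]


class MemoryTripleStore:
    """Memory-based sketch: triples live in RAM behind simple hash indexes."""

    def __init__(self):
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)

    def match(self, s=None, p=None, o=None):
        # Use an index when the subject or predicate is bound,
        # then filter the candidates on the remaining positions.
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def build_sqlite_store(triples):
    """Database-based sketch: one relational table with a column per position."""
    # ":memory:" keeps the demo self-contained; a file path would persist to disk.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX idx_pso ON triples (p, s, o)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


def sqlite_match(conn, p):
    """Return all triples with predicate p, ordered like MemoryTripleStore.match."""
    cursor = conn.execute(
        "SELECT s, p, o FROM triples WHERE p = ? ORDER BY s, p, o", (p,)
    )
    return cursor.fetchall()


if __name__ == "__main__":
    mem = MemoryTripleStore()
    for triple in TRIPLES:
        mem.add(*triple)
    conn = build_sqlite_store(TRIPLES)
    print(mem.match(p="rdf:type"))
    print(sqlite_match(conn, "rdf:type"))
```

In this sketch the in-memory store answers pattern queries from its own dictionaries but is bounded by available RAM, whereas the table-based variant delegates indexing and persistence to the database engine.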