Introduction
Public health emergencies are characterized by their suddenness, speed, and unpredictability, presenting significant challenges to emergency management (An et al., 2018). Governments and voluntary relief organizations should strive to collect and understand relevant disaster information to aid emergency response operations (Fu et al., 2020). Situational Awareness (SA) (Huang & Xiao, 2015), which involves gathering and comprehending relevant crisis information (i.e., what is occurring in impacted communities during an event), is critical to this process. Social media has become a primary mode of disseminating information online, owing to its speed, versatility, and interactivity. It also serves as a substantial communication channel, particularly for situational awareness during emergencies like natural calamities.
However, because online user communities are so diverse, online news content varies widely, which makes it difficult for relevant agencies to gain situational awareness of events swiftly. The key to speeding up emergency response to unforeseen events and minimizing the associated losses is to collect information pertinent to situational awareness from vast amounts of data in the shortest possible time. Using social media for situational awareness during unforeseen events typically involves tasks such as social media text classification and semantic mining, which include parsing concise and informal messages, managing information overload, and prioritizing the different types of information identified within these messages. These tasks can be mapped to classical information processing operations, such as filtering, categorization, sorting, aggregation, extraction, and summarization (Imran et al., 2015; Liang & Li, 2020).
In recent years, an increasing number of scholars have explored techniques such as natural language processing, machine learning, and deep learning for the automated processing of breaking news messages on social media (Xia et al., 2021). However, a framework for identifying and classifying event-specific information is still lacking, primarily because of the complexity of the task and the dynamic nature of online information (Nan et al., 2022). Such a framework is necessary if relevant agencies are to process and make sense of the vast amount of data generated during unforeseen events efficiently. This gap in existing methodologies arises from two factors: the complexity of information and data imbalance.
On the one hand, the diverse and dynamic nature of online content, particularly during emergencies, poses challenges in developing a comprehensive framework. The sheer volume of information and the rapid evolution of events demand a sophisticated approach to extracting relevant details. On the other hand, dealing with unbalanced data sets, where informative and non-informative data may be unevenly distributed, adds another layer of complexity. A robust framework should account for these imbalances to ensure accurate and unbiased results.
In light of these challenges, developing a comprehensive and adaptable framework becomes imperative. This paper proposes an automated and comprehensive framework for identifying informative information. Grounded in situational awareness, our approach aims to capture the concepts, features, and categories of informative information related to public health emergencies on social media. First, we define the concepts, characteristics, and components of informative information about public health emergencies on social media from a situational awareness perspective. Second, we employ statistical methods to extract traditional features, including linguistic, numeric, punctuation, and source-based features. Third, we enhance our framework by extracting topic-based features using a category-based latent Dirichlet allocation to vector (LDA2vec) model, tailored specifically to unbalanced data sets. Finally, we introduce a fuzzy support vector machine (FSVM) classifier designed to handle unbalanced and noisy data, whose kernel is based on Mahalanobis distance rather than the traditional Euclidean distance. The effectiveness of our framework is assessed through comparisons with traditional machine learning models and state-of-the-art methods. To further validate our approach, we use a pre-trained BERT model to cluster the classification results, demonstrating that informative information has a superior situational awareness effect.
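As a rough illustration of the second step, the sketch below computes a handful of surface features of the four kinds named above (linguistic, numeric, punctuation, and source-based) for a single post. It is a minimal sketch under stated assumptions, not the feature set used in this paper: the function name `extract_traditional_features`, the individual feature names, the `source_verified` flag, and the example post are all hypothetical.

```python
import re
import string


def extract_traditional_features(text, source_verified=False):
    """Return a dict of simple surface features for one social media post.

    All feature names and the source_verified flag are illustrative
    assumptions; they are not taken from the paper.
    """
    # Split on whitespace, then trim leading/trailing punctuation from tokens.
    words = [t.strip(string.punctuation) for t in text.split()]
    words = [w for w in words if w]
    return {
        # Linguistic: length-based cues.
        "n_words": len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words) if words else 0.0,
        # Numeric: how many numbers (integers or decimals) appear in the text.
        "n_numbers": len(re.findall(r"\d+(?:\.\d+)?", text)),
        # Punctuation: counts of exclamation and question marks.
        "n_exclaim": text.count("!"),
        "n_question": text.count("?"),
        # Source-based: presence of a URL and an (assumed) verified-account flag.
        "has_url": int(bool(re.search(r"https?://\S+", text))),
        "source_verified": int(source_verified),
    }


# Hypothetical example post.
post = "Health bureau reports 12 new cases today! Details: https://example.org/a"
features = extract_traditional_features(post, source_verified=True)
```

In a pipeline of the kind described above, a feature dictionary like this would presumably be combined with the topic-based features before classification; how the two are combined is not specified in this section.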
The main contributions of our work can be summarized as follows: