Multimodal Sentiment Analysis Method Based on Hierarchical Adaptive Feature Fusion Network

Huchao Zhang
Copyright: © 2024 | Pages: 23
DOI: 10.4018/IJSWIS.335918

Abstract

Traditional multimodal sentiment analysis (MSA) methods usually treat the features of all modalities as equally important and ignore the differing contributions of individual modalities to the final MSA result. To address this, an MSA method based on a hierarchical adaptive feature fusion network is proposed. First, RoBERTa, ResViT, and LibROSA are used to extract features from the text, image, and audio modalities, respectively, and a hierarchical adaptive multimodal fusion network is constructed. Then, a multimodal feature extraction module and a cross-modal feature interaction module are combined to realize the interactive fusion of information across modalities. Finally, an adaptive gating mechanism is introduced to design a global multimodal feature interaction module that learns the unique features of each modality. Experimental results on three public datasets show that the proposed method makes full use of multimodal information, outperforms other advanced comparison methods, and improves the accuracy and robustness of sentiment analysis, suggesting it can achieve strong results in the sentiment analysis field.
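To make the fusion step concrete, the following is a minimal sketch (not the authors' code) of an adaptive gated fusion module in PyTorch. It assumes each modality has already been encoded into a fixed-size vector (e.g., text via RoBERTa, images via ResViT, audio via LibROSA-derived features); the module names, dimensions, and gating design are illustrative assumptions, not the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class AdaptiveGatedFusion(nn.Module):
    """Sketch: weight each modality's contribution with a learned gate before fusing."""

    def __init__(self, text_dim, image_dim, audio_dim, hidden_dim):
        super().__init__()
        # Project every modality into a shared hidden space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        # Gate network: produces one weight per modality from the concatenated features.
        self.gate = nn.Sequential(
            nn.Linear(hidden_dim * 3, 3),
            nn.Softmax(dim=-1),
        )

    def forward(self, text_feat, image_feat, audio_feat):
        t = torch.tanh(self.text_proj(text_feat))
        v = torch.tanh(self.image_proj(image_feat))
        a = torch.tanh(self.audio_proj(audio_feat))
        # Adaptive weights reflect how much each modality contributes to this sample.
        weights = self.gate(torch.cat([t, v, a], dim=-1))  # shape: (batch, 3)
        fused = weights[:, 0:1] * t + weights[:, 1:2] * v + weights[:, 2:3] * a
        return fused  # passed to a classification head for sentiment prediction
```

In this kind of design, the softmax gate lets the network down-weight a noisy or uninformative modality per sample, which is the intuition behind treating modal contributions as unequal.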

Introduction

Social media, as a network platform on which users create, share, and communicate, makes it more convenient for users to access information and also gives them more choice and editorial control. Unlike print media such as newspapers, social media offers a variety of content forms (Sahoo & Gupta, 2021; Ahmed et al., 2022; Almomani et al., 2022). In addition to text, social media can provide users with more intuitive and vivid content through modalities such as voice and image. Images, speech, and text are the modalities most commonly encountered in daily life (Su et al., 2023; Gao et al., 2022; Balcilar et al., 2021).

Sentiments play a crucial role in our daily lives, helping us communicate, learn, and make decisions. For a long time, researchers have been dedicated to using machines to analyze human sentiments (Tiwari et al., 2021; Schneider et al., 2023; Singh & Sachan, 2021). Early sentiment analysis often focused on a single modality, such as sound, text, visual, or biological signals. However, a single modality is often insufficient to analyze users' sentiments accurately (Salhi et al., 2021; Mohammed et al., 2022; Garcia-Garcia, 2023), because the same text may express opposite meanings in different contexts, making it difficult to predict users' sentiments from one modality alone. Since single-modality sentiment analysis cannot effectively process heterogeneous data or fully exploit the diversity of available information, it is no longer suitable for today's complex environment (Sun et al., 2020; Yuan et al., 2021; Zhang et al., 2022).

As research has deepened, researchers have found that multimodal information can characterize human sentiments more effectively than single-modal information. People usually communicate and express sentiments in daily life through a combination of sound, text, and visual cues. MSA extends single-modality sentiment analysis by mining users' opinions and emotional states from data such as text, vision, and speech (Chen et al., 2022; Niu et al., 2021; Poria et al., 2023).

Multimodal emotion recognition can be used to analyze user emotions on social media: by combining text, image, and video data, users' emotional tendencies and states can be understood more accurately. In the field of education, multimodal technology also plays an important role. By analyzing students' speech, facial expressions, and gestures, multimodal emotion recognition can assess students' emotional states and learning outcomes and support personalized teaching and feedback. A multimodal emotion recognition system can be integrated with an existing system or platform through an API or SDK, and its results can serve as input for decision-making, personalized recommendation, sentiment analysis, and other functions.
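As a purely hypothetical sketch of such an integration, the snippet below shows how a downstream service might call a multimodal emotion recognition endpoint and use its output as a personalization signal. The URL, payload fields, and label names are illustrative assumptions, not a real SDK or the system described in this article.

```python
import requests

def analyze_post(text, image_path, audio_path):
    # Send the three modalities to an (assumed) emotion recognition endpoint.
    with open(image_path, "rb") as img, open(audio_path, "rb") as aud:
        response = requests.post(
            "https://example.org/api/v1/emotion",   # placeholder URL
            data={"text": text},
            files={"image": img, "audio": aud},
            timeout=30,
        )
    response.raise_for_status()
    return response.json()  # e.g., {"label": "negative", "score": 0.87}

def recommend(user_id, emotion_result):
    # Use the predicted emotional state as one input signal for personalization.
    if emotion_result["label"] == "negative" and emotion_result["score"] > 0.8:
        return f"Show supportive content to user {user_id}"
    return f"Show regular feed to user {user_id}"
```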
