1. Introduction
Over the last few years, mobile phones have become a compelling platform for location- and information-based services at historical attractions. There are currently several mobile applications (e.g., Stonehenge1 and Acropolis Interactive 3D2) that visitors can download in advance of their visit to a historical attraction. These applications typically offer cultural content in a full multimedia format and 2D/3D maps to assist visitors with route finding. Some applications even use gamification (Gamar, 2019) to provide a more playful and engaging experience for visitors. However, a map alone cannot ensure an optimal visitor experience, because visitors are accustomed to navigation applications (pedestrian or driving) that provide turn-by-turn voice instructions on top of a map. Only recently have projects (Rubino et al., 2013; Kyubark Shim, 2015) started exploring this type of assisted route finding in prototypes for historic places. As no general positioning solution works well in both indoor and outdoor environments, different applications use different approaches to facilitate navigation. A method of particular interest is landmark-based navigation (Basiri et al., 2014): a navigation service in which users are given instructions such as “go straight” or “then turn left” whenever they approach a landmark (Fang et al., 2012). As this type of navigation needs no additional infrastructure to identify the user’s location, such mobile applications are generally more cost-effective to develop and deploy. Finally, although gamification (e.g., points, badges, and leader boards) can motivate visitors to engage with the application and with routes of cultural interest (Chou, 2019), it alone cannot guarantee that visitors will not get lost.
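The core of landmark-based navigation described above — firing an instruction when the user comes within range of a known landmark — can be sketched in a few lines. This is a minimal illustration, not the implementation used by any of the cited systems; the landmark list, coordinates, and trigger radius are all hypothetical.

```python
import math

# Hypothetical landmarks: (name, latitude, longitude, instruction).
LANDMARKS = [
    ("stone arch", 51.1789, -1.8262, "Turn left at the stone arch."),
    ("visitor centre", 51.1800, -1.8250, "Go straight past the visitor centre."),
]

TRIGGER_RADIUS_M = 25.0  # assumed: speak an instruction within ~25 m of a landmark


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def instruction_for(lat, lon):
    """Return the instruction of the nearest in-range landmark, or None."""
    best = None
    for name, llat, llon, text in LANDMARKS:
        d = haversine_m(lat, lon, llat, llon)
        if d <= TRIGGER_RADIUS_M and (best is None or d < best[0]):
            best = (d, text)
    return best[1] if best else None
```

Because the trigger depends only on the device's own GPS fix and a bundled landmark table, no server-side positioning infrastructure is needed, which is what makes this approach comparatively cheap to deploy.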
To address this, a multimodal interface is needed to maximise the impact of gamification. A multimodal interface enables users to interact with a mobile application through multiple communication channels (e.g., multi-touch, speech recognition, natural language processing). These modalities can work in synergy with game-based design elements to enhance the comprehension of navigational instructions. Enabling multimodality in mobile tour guide applications can therefore open up new opportunities for assisting visitor navigation more effectively and efficiently than existing methods alone.
We present a gamified mobile tour guide prototype with a fully multimodal interface featuring an Embodied Conversational Agent (ECA). An ECA is a computer-generated animated character, usually with a human-like form, that communicates with users through multiple channels (e.g., gestures, natural language processing) (Justine Cassell, 2000). Our ECA uses Natural Language (NL) and embodiment to disambiguate navigation instructions that may be challenging to interpret. For example, it may use iconic gestures (movements of the hands and arms) that resemble the shape of a distinctive characteristic of a landmark, helping visitors make better sense of the provided instructions. The benefits of landmarks in navigation have been well studied in the literature. Several projects have developed prototypes that use pictures of landmarks to guide users in urban environments (Hile et al., 2008; Goodman-Deane et al., 2004). However, we are not aware of any projects that use ECAs as navigation aids in a landmark-based navigation system. The use of ECAs to assist route finding has been studied extensively in virtual reality (VR) environments (Kuijk et al., 2015; Dijk et al., 2003), and some projects have developed close-to-market prototypes (e.g., High Fidelity3) that support the fast implementation of such ECAs. The literature on using ECAs to assist human navigation in physical environments focuses on the development of Augmented Reality (AR) platforms and prototypes (Amazon Web Services, 2019; Schmeil & Broll, 2007). However, these prototypes require accurate location tracking for the ECA to provide proper navigation instructions, and, because users continuously change position and orientation, it is not always possible for the system to render the ECA in the user's field of view.