Abstract
Robot perception is the ability of a robotic platform to perceive its environment by means of sensor inputs, e.g., lasers, inertial measurement units (IMUs), and motor encoders. Much like humans, robots are not limited to perceiving their environment through vision-based sensors such as cameras. Within the scope of this chapter, robot perception encompasses acoustic signal processing techniques for locating a sound source, e.g., a human speaker, within an environment for human-robot interaction (HRI), a topic that has gained great interest within the scientific community. This chapter serves as an introduction to acoustic signal processing within robotics, starting with passive acoustic localization and building up to contemporary active sensing methods, such as the use of neural networks and spatial map generation. The origins of active acoustic localization, which has its roots in biomimetics, are also discussed.
Introduction
The detection and localization of acoustic reflectors such as walls, objects, or people within an environment is a popular topic in robotics. Traditionally, camera and laser-based technologies are used to detect these landmarks, generate a spatial map of a 3D space, and help robotic platforms navigate within their environment. However, these light-based sensing modalities often face challenges such as a lack of light, overexposure (glare), the inability to detect transparent surfaces such as windows, false reflections, and sensitivity to occlusion. These issues can be addressed by incorporating sound-based sensing modalities. Research on animal auditory systems has inspired researchers to develop technologies that locate sound sources within an environment.
This chapter serves as an introduction to robot audition, starting with the subdomains of robot audition and building up to more contemporary active sensing methods, such as the use of Artificial Intelligence (AI) on recorded data to detect and track acoustic sources and to generate spatial maps. The origins of active acoustic sensing, which has its roots in biomimetics, are also discussed. The chapter is written as a reference for people working on robot perception using sound, and it aims to contribute to future work by bringing new challenges to the field of robot perception. It begins with an introduction to biomimicry in robotics, which aims to mimic an animal's auditory system to localize sound sources in the nearby environment. Biomimicry facilitates intelligent robot designs that achieve high performance and robustness when navigating between and localizing acoustic sources in a dynamic environment. Designers of such robots make use of new materials, sensors, and actuators that allow robots to mimic biological processes such as hearing.
Furthermore, this chapter reviews techniques in the scientific literature associated with passive acoustic localization and active acoustic localization, the two main sub-domains of robot audition for sound source localization (SSL). Passive acoustic localization involves detecting sound generated by objects present in an environment, while active acoustic localization probes the environment with a known sound to detect the position of objects within it. Both sub-domains have their advantages and disadvantages. For example, active acoustic localization is useful in a quiet environment, which is normally the case when a robot explores an underground environment such as a cave, tunnel, or sewer. Bats, rats, and even some aquatic mammals are known to use these techniques to navigate and hunt in complete darkness. These animals probe the environment with a unique sound, or call, and use the acoustic echoes to distinguish flora and fauna, different types of animals and prey, and everything else needed for their survival. A discussion of the different types of probe signals that can be used in robotics to acquire spatial information from the environment is therefore an important highlight of this chapter. More specifically, additive white Gaussian noise (AWGN), coded emissions, and chirp signals are analyzed in detail. Spatial mapping using echolocation, which incorporates spatial filtering techniques such as beamforming, is another important highlight of this chapter.
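As a rough illustration of active acoustic localization with a chirp probe, the following minimal sketch emits a linear chirp, simulates a single echo in noise, and estimates the reflector range by matched filtering (cross-correlating the recording with the known probe). The sampling rate, chirp band, speed of sound, reflector distance, and the function names `linear_chirp` and `estimate_echo_delay` are all assumptions chosen for illustration; the direct path from emitter to microphone and multiple reflections are ignored.

```python
import numpy as np

def linear_chirp(f0, f1, duration, fs):
    """Linear frequency sweep from f0 to f1 (Hz) lasting `duration` seconds."""
    t = np.arange(int(round(duration * fs))) / fs
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t**2))

def estimate_echo_delay(recording, probe, fs):
    """Matched filter: return the lag (in seconds) of the strongest correlation peak."""
    corr = np.correlate(recording, probe, mode="valid")
    lag = int(np.argmax(np.abs(corr)))
    return lag / fs

# Assumed values for illustration only.
fs = 48_000          # sampling rate in Hz
c = 343.0            # speed of sound in air, m/s
true_range = 1.5     # distance to a hypothetical reflector, m

probe = linear_chirp(2_000.0, 8_000.0, 0.01, fs)

# Simulate the recording: background noise plus one attenuated echo
# arriving after the round-trip time 2 * range / c.
rng = np.random.default_rng(0)
recording = 0.05 * rng.standard_normal(4096)
delay_samples = int(round(2 * true_range / c * fs))
recording[delay_samples:delay_samples + probe.size] += 0.3 * probe

delay = estimate_echo_delay(recording, probe, fs)
estimated_range = c * delay / 2  # halve the round trip to get one-way distance
print(f"estimated range: {estimated_range:.3f} m")
```

The chirp is used here because its autocorrelation has a narrow main peak, which makes the echo arrival time easy to pick out of the noise; a noise burst or coded emission could be substituted for `probe` without changing the matched-filter step.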
Finally, the chapter reviews data-driven approaches that use contemporary methods such as neural networks to perceive the environment in an artificially intelligent way. This relatively new approach combines physics-based models of sound with machine learning so that robotic platforms learn to classify and predict their surroundings.
Key Terms in this Chapter
ML: – Machine learning
DOA: – Direction of arrival
MISO: – Multiple input and single output
ROV: – Remotely operated vehicle
MVDR: – Minimum variance distortionless response
SONAR: – Sound navigation and ranging
RIR: – Room impulse response
IPD: – Interaural phase difference
ITD: – Interaural time difference
CASA: – Computational auditory scene analysis
MIMO: – Multiple input and multiple output
SIMO: – Single input and multiple output
AIR: – Acoustic impulse response
CFAR: – Constant false alarm rate
DL: – Deep learning
DSB: – Delay and sum beamformer
RNN: – Recurrent neural network