Binaural Rendering of Recorded 3D Soundfields

This work is sponsored by the Sony Corporation of America

Introduction

Binaural recordings are inherently limited in that the 3D localization cues embedded in the binaural signals are ideally suited only for playback to the individual who made the recording, as that individual's unique morphology has already filtered the incoming sound waves in a highly idiosyncratic and direction-dependent manner. However, by using an array of microphones (such as the Eigenmike by mh acoustics), the incident soundfield (i.e., the soundfield that would exist in the absence of the microphone array) can be extracted. This soundfield can then be processed using specific information about the intended listener to generate an individualized binaural rendering of the recorded soundfield. An additional benefit of this approach is that the listener may navigate the soundfield, techniques for which are the focus of another ongoing research project.

It is the aim of this research project to develop tools and techniques to generate individualized binaural renderings of recorded 3D soundfields. In the following write-up, fundamental aspects of soundfield capture and binaural rendering, as well as current avenues of research, are described.

1. How is the soundfield captured?

A commonly used device for capturing a soundfield is an array of microphones flush-mounted on the surface of a rigid sphere. Examples of such a device include the Eigenmike by mh acoustics and the Realspace Audio Visual Camera by VisiSonics. The scattering (i.e., reflection and diffraction) of incoming sound waves off a rigid sphere is a well-understood phenomenon, so the original incident soundfield can be computed using only the pressure on the surface of the sphere (see, for example, Theoretical Acoustics by Morse and Ingard).
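As a concrete illustration of the rigid-sphere scattering mentioned above, the sketch below computes the order-n radial weighting b_n(ka) relating the incident plane-wave expansion to the total pressure on a rigid sphere of radius a. This is a minimal sketch under stated assumptions (e^{-iωt} time convention, spherical Hankel functions of the first kind); the function name is illustrative and not taken from any particular toolkit.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def rigid_sphere_bn(n, ka):
    """Order-n radial weighting for the total pressure on a rigid sphere.

    For an incident plane wave, the rigid (Neumann) boundary condition gives
        b_n(ka) = 4*pi * i^n * [ j_n(ka) - j_n'(ka)/h_n'(ka) * h_n(ka) ],
    which, via the Wronskian of j_n and h_n, simplifies to
        b_n(ka) = 4*pi * i^n * i / ((ka)^2 * h_n'(ka)).
    """
    jn_p = spherical_jn(n, ka, derivative=True)
    yn_p = spherical_yn(n, ka, derivative=True)
    hn_p = jn_p + 1j * yn_p  # derivative of spherical Hankel fn (1st kind)
    return 4 * np.pi * (1j ** n) * 1j / (ka ** 2 * hn_p)
```

In the low-frequency limit the zeroth-order weighting approaches 4π, i.e., the sphere is acoustically transparent to the pressure average, while higher orders roll off steeply; this is one source of the analysis errors discussed below.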

Higher-order ambisonics (HOA) provides a framework for representing a measured soundfield by its spherical harmonic expansion, in which each HOA signal represents a different term of the expansion. The maximum expansion order that can be computed from a given recording is limited by the number of microphones on the recording array. Since spherical microphone arrays provide only a spatial sampling of the surface pressure, rather than the pressure everywhere on the sphere, there will necessarily be errors in the analysis of the soundfield. Strategies to mitigate these errors, as well as their perceptual consequences, are currently being investigated.
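The limit on expansion order can be made concrete: an order-N expansion has (N+1)^2 terms, so a least-squares encoding requires (N+1)^2 ≤ M microphones. A minimal sketch (the function name is our own):

```python
import math

def max_ambisonics_order(num_mics: int) -> int:
    """Highest order N whose expansion, with (N+1)^2 terms,
    is determined by num_mics spatial pressure samples."""
    # (N+1)^2 <= M  =>  N = floor(sqrt(M)) - 1
    return math.isqrt(num_mics) - 1
```

For example, a 32-microphone array supports up to order 4 (25 terms), consistent with fourth-order encoding from the 32-channel Eigenmike.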

At the 3D3A Lab, we are currently using the 32-channel Eigenmike by mh acoustics to record and encode soundfields up to fourth order. The precise way in which the 32 microphone signals are combined to generate the HOA signals is an active area of research.
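One simple way to combine microphone signals into HOA signals, sketched below for first order only, is a least-squares fit of a spherical harmonic expansion to the sampled surface pressures. This is an illustration of the general idea, not the lab's actual encoder; in particular, a practical Eigenmike encoder also applies order-dependent radial equalization that compensates for the rigid-sphere scattering discussed earlier.

```python
import numpy as np

def first_order_encoding_matrix(az, col):
    """Least-squares encoder from M mic signals to first-order ambisonics.

    az, col: arrays of microphone azimuths and colatitudes (radians).
    Uses real spherical harmonics in ACN order (W, Y, Z, X) with N3D
    normalization. Illustrative only: radial equalization is omitted.
    """
    Y = np.column_stack([
        np.full_like(az, np.sqrt(1 / (4 * np.pi))),           # Y_0^0
        np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.sin(az),  # Y_1^-1
        np.sqrt(3 / (4 * np.pi)) * np.cos(col),               # Y_1^0
        np.sqrt(3 / (4 * np.pi)) * np.sin(col) * np.cos(az),  # Y_1^1
    ])
    # Moore-Penrose pseudoinverse: hoa = encoder @ mic_pressures
    return np.linalg.pinv(Y)
```

Encoding a spatially uniform pressure with this matrix recovers only the zeroth-order (W) term, as expected.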

2. How is the binaural rendering generated?

One approach to generating a binaural rendering of a soundfield is by simulating playback over a real HOA loudspeaker array. Research has shown that listeners inside a real HOA loudspeaker array experience a realistic impression of the measured soundfield since the reconstructed sound waves are able to interact with the listener's morphology naturally. Therefore, a realistic binaural rendering of the soundfield can be obtained by filtering each loudspeaker signal by the corresponding head-related transfer function (HRTF) of the listener. The accuracy of the resulting binaural signals depends on having accurate models of the listener's HRTFs, the estimation of which is the focus of another ongoing research project. This binaural rendering approach has been implemented by Matthias Kronlachner in his ambiX plug-in suite. To customize or individualize the ambisonics-to-binaural rendering of the ambiX binaural plug-in, we have developed an open-source collection of MATLAB functions, referred to as the SOFA/ambiX binaural rendering (SABRE) toolkit, which enables a user to generate custom binaural rendering configurations for the plug-in from any SOFA-formatted HRTFs.
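The virtual-loudspeaker approach described above can be sketched in a few lines: decode the HOA signals to loudspeaker feeds, then filter each feed by the corresponding head-related impulse response (HRIR, the time-domain HRTF) for each ear and sum. All function names and array shapes below are illustrative assumptions, not the ambiX or SABRE implementations.

```python
import numpy as np

def binaural_from_hoa(hoa, D, hrirs_l, hrirs_r):
    """Virtual-loudspeaker binaural rendering (a sketch).

    hoa:      (K, T) ambisonics signals, K = (N+1)^2
    D:        (L, K) ambisonics decoding matrix for L loudspeakers
    hrirs_l/r: (L, taps) left/right-ear HRIRs, one per loudspeaker direction
    """
    feeds = D @ hoa  # (L, T) virtual loudspeaker signals
    out_len = hoa.shape[1] + hrirs_l.shape[1] - 1
    left = np.zeros(out_len)
    right = np.zeros(out_len)
    for feed, hl, hr in zip(feeds, hrirs_l, hrirs_r):
        # Filter each feed by the listener's HRIR for that direction and sum
        left += np.convolve(feed, hl)
        right += np.convolve(feed, hr)
    return left, right
```

Individualization enters through the HRIRs: substituting the intended listener's measured or modeled HRTFs for a generic set is precisely what a custom rendering configuration provides.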

An alternative approach, based on a parametric plane-wave decomposition of first-order ambisonics signals, is employed by Harpex. In this approach, plane-wave components of the soundfield are estimated in the time-frequency domain and subsequently filtered by a database of HRTFs, to yield binaural signals.
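A heavily simplified sketch of parametric first-order processing is given below: it estimates a single dominant direction of arrival per time-frequency bin from the pseudo-intensity vector of B-format signals. This is not the actual Harpex algorithm, which estimates two plane waves per bin; it only illustrates the general idea of estimating plane-wave directions from first-order signals.

```python
import numpy as np

def doa_from_bformat(W, X, Y, Z):
    """Estimate a dominant direction of arrival per STFT bin from
    first-order (B-format) signals via the pseudo-intensity vector.

    W, X, Y, Z: complex STFT coefficients (any matching shape).
    Returns unit DOA vectors, shape (3,) + W.shape.
    """
    # Active intensity points along the direction of energy flow
    I = np.stack([np.real(np.conj(W) * X),
                  np.real(np.conj(W) * Y),
                  np.real(np.conj(W) * Z)])
    norm = np.linalg.norm(I, axis=0)
    return I / np.maximum(norm, 1e-12)  # normalize; guard silent bins
```

Once a direction is estimated for each bin, the bin's content can be filtered by the HRTF pair for that direction, yielding binaural signals.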

Other approaches to binaural rendering of ambisonics have been developed, several of which are summarized in this paper. The development of alternative methods for binaural rendering of ambisonics is an active area of research.