Individualization of 3D Sound

This work is sponsored by the Sony Corporation of America


We perceive sound in three dimensions in everyday life. That is, without looking at a sound source, we can tell its location in space relative to us with reasonable precision. We can do this because our brains process the sound signals that reach our eardrums in a manner that is unique to each of us. This should not be surprising: everyone's morphology is different (especially that of the outer ear), and this affects the sound reaching our eardrums in a highly idiosyncratic way. The processing that our brains do is tuned to our unique morphologies, so swapping ears with someone else, for instance, would lead to a disorienting listening experience. To enable certain types of 3D sound reproduction systems (discussed below), one of the tasks is to devise mathematical models that describe the effects our individual morphologies have on the sound we hear. The current project focuses on this task.

3D Sound Reproduction

It is possible to artificially recreate a 3D sound field using a finite number of loudspeakers or a pair of headphones. The common objective of the various reproduction techniques is to produce sound that, upon reaching the eardrums, is exactly the same as it would be in the "actual" listening experience. For example, suppose the actual listening experience consisted of you hearing a firecracker explode in the sky; the sound leaving the exploding firecracker would change considerably before stimulating your eardrums. Now if, somehow, the sound from a pair of loudspeakers located a few feet from you could stimulate your eardrums in exactly the same way as the sound from the firecracker did, you would perceive the explosion as originating from way up in the sky, and not from anywhere near the pair of loudspeakers. The idea, therefore, is that the brain is "tricked" into perceiving the sound as originating not from the loudspeakers, but from any chosen point in 3D space.

One set of techniques focuses on recreating an accurate sound field in a given region of space so that anyone who enters that region will experience sound in 3D. In such a system, the sound is first reproduced using a system of loudspeakers, after which natural interactions with an individual's morphology occur, thereby enabling 3D perception of sound for that individual. Clearly, for such a system, modeling the sound's interactions with our individual morphologies is not required; these interactions occur naturally. Another set of techniques focuses on reproducing sound as it exists just before it excites our eardrums. This means that the natural interactions that the sound would have had with our bodies must be "encoded" into the electrical signals that are sent to the loudspeakers or headphones used to generate the sound. This also means that the sound that is generated must reach the eardrums without any additional interactions. In practice, this is typically achieved by compensating for any such subsequent interactions. Both sets of techniques have their advantages and challenges. We shall not discuss them here, but suffice it to say that this project focuses on enabling sound reproduction using the latter of the two kinds of techniques.

One of the fundamental requirements for the 3D sound reproduction techniques that this project is concerned with is to determine how sound originating from a given point in space changes, due to interactions with a listener's morphology, before it reaches and stimulates the eardrums. This information is commonly contained in a pair of head-related transfer functions (HRTFs) for that listener. An HRTF pair for a given listener is a set of two mathematical functions, one for each ear, that describes how sound originating from a given point in space, relative to the listener, changes as it interacts with the listener's morphology alone (i.e., assuming there are no interactions with any other objects) before stimulating each of the listener's eardrums. Theoretically, then, every point in space has a different HRTF pair for a given listener. It is, therefore, necessary to find a way to obtain these HRTF pairs for any given source location and any given listener morphology, in order to enable an accurate rendering of a 3D sound field for anyone. Furthermore, to be practical, this must be done in a convenient, cost-effective, and rapid way. The goal of the current research project can therefore be restated more accurately as the rapid modeling of head-related transfer functions for individualized 3D sound.
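
To make the role of an HRTF pair concrete, the following is a minimal sketch, not the project's method. It assumes the HRTF pair for one source position is available in its time-domain form, a pair of head-related impulse responses (HRIRs), so that applying it to a mono source signal amounts to one convolution per ear. The function names and the toy HRIR values are hypothetical and chosen only for illustration; real HRIRs are measured or modeled and are far longer and more detailed.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two lists of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source through an HRIR pair: returns (left, right) ear signals."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIR pair for a source on the listener's left (assumed values):
# the left ear receives the sound immediately at full level, while the right ear
# receives it two samples later and attenuated by half.
HRIR_LEFT = [1.0, 0.0, 0.0]
HRIR_RIGHT = [0.0, 0.0, 0.5]

mono_click = [1.0, 0.0, 0.0, 0.0]  # a single-sample click as the source signal
left, right = render_binaural(mono_click, HRIR_LEFT, HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

In this toy case the only differences between the two ear signals are a delay and a level change. An individualized HRTF pair would additionally carry the direction-dependent spectral coloring introduced by the listener's own outer ears, head, and torso, which is the part that has to be modeled per listener.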