The 3D3A Lab Head-Related Transfer Function Database

This work was sponsored in part by the Sony Corporation of America.

Introduction

An individual's head-related transfer function (HRTF) describes the idiosyncratic filtering of incident sound waves by the individual's body (primarily the head and upper torso) and is used, for example, to synthesize binaural signals for spatial audio reproduction. An HRTF is typically obtained through acoustical measurements, modeling from anthropometric data, or numerical computations on 3D head and torso scans. Acoustically measured and numerically computed HRTFs can:

  • be very accurate, although they are typically time-consuming and difficult to obtain
  • serve as benchmarks for validating modeled HRTFs
  • serve as training data for developing HRTF models

Publicly available databases provide measured HRTFs for many human subjects and mannequins. However, as of 2017, few such databases also included corresponding anthropometric data (especially in the form of 3D head and torso scans). This motivated the publicly available 3D3A Lab HRTF database, which, as of 2018, contained HRTFs for 29 subjects measured in the 3D3A Lab's anechoic chamber, along with 3D head and torso scans of 23 of those subjects captured in-house using state-of-the-art 3D scanners. Since the release of our database, other databases containing similar data have been released, some of which also include HRTFs numerically computed from the 3D scans. Consequently, starting in 2021, we have updated our database to include numerically computed versions of the HRTFs where possible. We have also made many additional updates (summarized below) to the database since its original release in 2017. Detailed information regarding these changes will be provided in an upcoming technical report (see Publications list at the bottom of this page).

VERSION HISTORY:

6 Nov 2021

Update all variants of measured HRTFs for all 38 subjects
Add consumer-grade 3D head and torso scans for 1 more subject (Total: 32)
Add reference-grade 3D head and torso scans for 1 more subject (Total: 32)
Add 3D head and ear scans for 1 more subject (Total: 32)
Add 3D head-only scans for 1 more subject (Total: 32)
Add computed HRTFs for 1 more subject (Total: 32)
Add anthropometric data for 1 more subject (Total: 32)

17 Oct 2021

Add measured HRTFs for 9 more subjects (Total: 38)
Add measured HRTFs with modeled low-frequency extension for 38 subjects
Add measured HRTFs with diffuse-field equalization for 38 subjects
Add consumer-grade 3D head and torso scans for 8 more subjects (Total: 31)
Add reference-grade 3D head and torso scans for 8 more subjects (Total: 31)
Add 3D head and ear scans (i.e., no torso) for 31 subjects
Add 3D head-only scans (i.e., no torso or ears) for 31 subjects
Add computed HRTFs based on each of the above 4 scan types for 31 subjects
Add anthropometric data for 31 subjects

13 Aug 2018

Add consumer-grade 3D head and torso scans for 23 subjects
Add reference-grade 3D head and torso scans for 23 subjects

2 Aug 2018

Add measured HRTFs for 14 more subjects (Total: 29)

2 Nov 2017

Add measured HRTFs for 3 more subjects (Total: 15)

23 Oct 2017

Add measured HRTFs for 1 more subject (Total: 12)

19 Oct 2017

Add measured HRTFs for 2 more subjects (Total: 11)

17 Oct 2017

The first release of the database with measured HRTFs for 9 subjects

DATABASE ACCESS: As this is an ongoing project, our database continually evolves as we accumulate more data. Click here to access the latest version of the database. If you use the data in your work, please consider citing all available publications listed in the Publications section at the bottom of this webpage.


This database is made available to the public under a Creative Commons Attribution 4.0 International License.

Measured HRTFs

The figure below illustrates the HRTF measurement setup in the anechoic chamber. Briefly, the subject is seated in the chamber in front of a vertical arc, which holds 9 loudspeakers. Binaural microphones are inserted into the subject's ears, and binaural impulse responses (BIRs) are measured for each loudspeaker. The subject is then rotated in 5° increments using a computer-controlled turntable on which the seat is mounted, and the measurements are repeated at each rotation angle. The HRTF measurement procedures are described generally in this video by AudioStream, and in more detail in this paper.

[Figure: HRTF measurement setup in the anechoic chamber]
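As a minimal sketch, the 648 measurement positions (72 azimuths in 5° steps × 9 loudspeaker elevations at a 76 cm radius, as listed in the table below) can be enumerated and converted to Cartesian coordinates with NumPy. The axis convention (x toward the front, y toward the left, z up, azimuth counterclockwise from the front, as in SOFA's spherical coordinates) and the ordering of positions are assumptions; they are not necessarily what the released SOFA files use.

```python
import numpy as np

RADIUS_M = 0.76                                   # measurement radius (76 cm)
AZIMUTHS_DEG = np.arange(0, 360, 5)               # 72 azimuths: 0°, 5°, ..., 355°
ELEVATIONS_DEG = np.array([-57, -30, -15, 0, 15, 30, 45, 60, 75])  # 9 loudspeakers

def measurement_grid():
    """Return (N, 2) [azimuth, elevation] in degrees and (N, 3) positions in meters.

    Assumed convention: x to the front, y to the left, z up, azimuth measured
    counterclockwise from the front (SOFA-style spherical coordinates).
    """
    az, el = np.meshgrid(AZIMUTHS_DEG, ELEVATIONS_DEG, indexing="ij")
    az, el = az.ravel(), el.ravel()
    az_rad, el_rad = np.deg2rad(az), np.deg2rad(el)
    xyz = RADIUS_M * np.column_stack([
        np.cos(el_rad) * np.cos(az_rad),
        np.cos(el_rad) * np.sin(az_rad),
        np.sin(el_rad),
    ])
    return np.column_stack([az, el]), xyz

angles, positions = measurement_grid()
print(angles.shape, positions.shape)  # (648, 2) (648, 3)
```

Each turntable step supplies one azimuth column of 9 positions, so the full grid follows from the outer product of the two angle lists.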

The table below summarizes information about the HRTF measurements.

Measurement room: Anechoic chamber
Measurement positions:
  • Radius: 76 cm
  • 72 azimuths: [0°, 5°, 10°, …, 355°]
  • 9 elevations: [–57°, –30°, –15°, 0°, 15°, 30°, 45°, 60°, 75°]
  • 72 × 9 = 648 positions in total
Turntable: Outline ET-250 3D
Loudspeakers: 9 × Genelec 8010A
Binaural microphones: Theoretica Applied Physics BACCH-BM Pro
Excitation signal:
  • Multiple exponential sine sweeps (Majdak et al., 2007)
  • Sweep type: phase-controlled (Vetter and di Rosario, 2011)
  • Sweep duration: 500 ms (200 ms inter-sweep delay)
  • Frequency range: 20 Hz to 48 kHz
Sampling rate: 96 kHz
Data export format: Spatially Oriented Format for Acoustics (SOFA)
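To illustrate how a sweep measurement yields an impulse response, the sketch below generates a plain exponential sine sweep with the parameters in the table (20 Hz to 48 kHz, 500 ms, 96 kHz) and its amplitude-compensated inverse filter, then deconvolves an ideal "recording". This is the basic single-sweep technique only: it does not implement the phase-controlled sweep of Vetter and di Rosario (2011) or the interleaved multiple-sweep method of Majdak et al. (2007), and the function names are hypothetical.

```python
import numpy as np

def exp_sweep(f1, f2, duration, fs):
    """Return an exponential sine sweep from f1 to f2 (Hz) and its inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = np.log(f2 / f1)  # natural log of the frequency ratio
    # Instantaneous frequency grows as f1 * exp(t * rate / duration).
    sweep = np.sin(2 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0))
    # Time-reversed sweep whose amplitude drops by 6 dB per octave as its
    # frequency decreases, so convolving sweep with inverse is roughly flat.
    inverse = sweep[::-1] * np.exp(-t * rate / duration)
    return sweep, inverse

def fft_convolve(a, b):
    """Linear convolution via zero-padded FFTs (much faster than np.convolve here)."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    return np.fft.irfft(np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft), nfft)[:n]

fs = 96_000
sweep, inverse = exp_sweep(20.0, 48_000.0, 0.5, fs)
# For an ideal (identity) system the "recording" is the sweep itself, so
# deconvolution should produce a single peak at lag len(sweep) - 1.
ir = fft_convolve(sweep, inverse)
print(int(np.argmax(np.abs(ir))) == len(sweep) - 1)  # True
```

In a real measurement, `sweep` would be replaced by the recorded microphone signal, and the impulse response would be read from the samples following the peak lag.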

To obtain the subject's HRIRs (head-related impulse responses, i.e., the spatially discrete, time-domain representation of the subject's HRTF), the combined free-field responses of the loudspeakers and microphones, which we refer to as the reference impulse responses (RIRs), are deconvolved from the measured BIRs. The RIRs are measured for each loudspeaker using the same binaural microphones, placed at the origin of the measurement arc (i.e., at the position of the center of the subject's head). Inverse filters for the measured RIRs are then designed and applied to the BIRs, yielding the HRIRs. Some of the signal processing details are described in this paper, with updates to the processing methodology to be described in an upcoming technical report (see Publications list at the bottom of this page). Versions of the HRIRs with modeled low-frequency extension and with diffuse-field equalization are also available (SOFA file names suffixed with “_lfc” and “_dfeq”, respectively).
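The deconvolution step can be sketched with a generic Tikhonov-regularized frequency-domain inverse, conj(R) / (|R|² + β), where R is the spectrum of the RIR. This is an illustrative technique, not necessarily the inverse-filter design used for the database; the function name and the relative regularization constant `beta` are hypothetical.

```python
import numpy as np

def deconvolve_reference(bir, rir, beta=1e-6):
    """Remove the reference (loudspeaker + microphone) response from a BIR.

    beta is a regularization constant relative to the peak |R|^2; it limits
    gain where the reference response has little energy. Illustrative value only.
    """
    nfft = 1 << (len(bir) + len(rir) - 2).bit_length()  # power of two >= len(bir)+len(rir)-1
    R = np.fft.rfft(rir, nfft)
    power = np.abs(R) ** 2
    inverse = np.conj(R) / (power + beta * power.max())  # regularized inverse filter
    hrir = np.fft.irfft(np.fft.rfft(bir, nfft) * inverse, nfft)
    return hrir[: len(bir)]

# Toy check: build a "measured" BIR from a known response and recover it.
rir = np.array([1.0, 0.5, -0.2, 0.1])            # toy reference response
true_hrir = np.array([0.0, 0.8, -0.3, 0.1, 0.05])
bir = np.convolve(rir, true_hrir)
est = deconvolve_reference(bir, rir, beta=1e-10)
print(np.allclose(est[: len(true_hrir)], true_hrir, atol=1e-6))  # True
```

Larger values of `beta` trade exactness for robustness at frequencies where the loudspeaker or microphone response is weak, such as below the loudspeaker's usable range.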

3D Head and Torso Scans

The diagram below illustrates the scanning setup. Briefly, the subject is seated, puts on a wig cap, and has colorful adhesive markers placed around their face. The subject's head and torso are then scanned using a PrimeSense Carmine 1.09 sensor (which we refer to as a “consumer-grade” 3D scanner), followed by high-resolution scans of the individual pinnae using the Artec Space Spider, a state-of-the-art structured-light 3D scanner. The 3D scanning procedures are described in more detail in this paper.

[Diagram: Head scan setup]

The table below summarizes information about the 3D head and torso scans.

Head and torso scanner: PrimeSense Carmine 1.09
Pinna scanner: Artec Space Spider
Scanning software: Skanect Pro (head and torso); Artec Studio 12 Pro (pinnae)
Data export format: Polygon File Format (PLY)

The consumer-grade head-and-torso scans are converted into watertight meshes and aligned such that the subject's interaural axis coincides with the y-axis of the scan. Each of these scans is then duplicated, and in the copy, the consumer-grade pinnae are manually replaced by the corresponding high-resolution pinna scans, yielding the “reference-grade” scan. The 3D scan processing procedures are described in more detail in this paper, with updates to the processing methodology to be described in an upcoming technical report (see Publications list at the bottom of this page). Additional variants of the reference-grade scans, namely head-and-ear scans (i.e., torso removed) and head-only scans (i.e., torso and pinnae removed), are also available. We also provide MATLAB *.mat files containing selected anthropometric dimensions (in meters) extracted from the 3D scans.
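The interaural-axis alignment can be sketched as a translation plus a rotation, assuming one landmark point per ear is available (for example, picked at each ear-canal entrance). The sketch centers the mesh between the two landmarks and applies Rodrigues' rotation formula to map the right-to-left ear direction onto +y. The landmark choice, the "left ear on +y" convention, and the function name are assumptions; rotation about the interaural axis itself (head pitch) is left undetermined and would need a third landmark, such as the nose tip.

```python
import numpy as np

def align_interaural_axis(vertices, left_ear, right_ear):
    """Center vertices between two ear landmarks and rotate the axis onto +y.

    vertices: (N, 3) array of mesh vertex coordinates.
    left_ear, right_ear: 3-vectors (hypothetical landmark points).
    """
    left_ear = np.asarray(left_ear, dtype=float)
    right_ear = np.asarray(right_ear, dtype=float)
    center = 0.5 * (left_ear + right_ear)
    a = left_ear - right_ear
    a /= np.linalg.norm(a)                       # unit interaural direction
    b = np.array([0.0, 1.0, 0.0])                # target direction (+y)
    v = np.cross(a, b)
    c = float(np.dot(a, b))
    if np.isclose(c, -1.0):
        R = np.diag([1.0, -1.0, -1.0])           # 180° about x maps -y onto +y
    else:
        vx = np.array([[0.0, -v[2], v[1]],
                       [v[2], 0.0, -v[0]],
                       [-v[1], v[0], 0.0]])      # cross-product matrix of v
        R = np.eye(3) + vx + vx @ vx / (1.0 + c) # Rodrigues: rotates a onto b
    return (np.asarray(vertices, dtype=float) - center) @ R.T

# Toy example: the two landmark points themselves, treated as "vertices".
verts = np.array([[0.1, 0.2, 0.3], [0.05, 0.0, 0.25]])
aligned = align_interaural_axis(verts, verts[0], verts[1])
```

After alignment, the left landmark lies at (0, +d/2, 0) and the right one at (0, -d/2, 0), where d is the distance between them.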

Numerically-Computed HRTFs

We use the fast multipole boundary element method (FM-BEM) implemented in the open-source software Mesh2HRTF to compute HRTFs from our 3D scans. The computations are performed in the frequency domain in steps of 100 Hz up to a maximum frequency of 16 kHz, for a total of 160 discrete frequencies (100 Hz to 16 kHz). For a left- or right-ear HRTF corresponding to a single spatial location, we first set the value of the frequency response at 0 Hz to the magnitude of the value computed at 100 Hz. The frequency response formed by the resulting 161 complex values is then forced to be conjugate symmetric and transformed into a real-valued HRIR of length 321 samples using the inverse fast Fourier transform (IFFT). Finally, to ensure causality, we apply a delay of approximately 1 ms (rounded to the nearest sample, i.e., 32 samples at 32.1 kHz), followed by a Tukey window. For a given 3D scan, once all HRIRs have been computed in this way, they are exported in the SOFA format at a sampling rate of 32.1 kHz (321 samples × 100 Hz frequency spacing). Additional details regarding the processing of the HRIRs prior to exporting in the SOFA format are described in an upcoming technical report (see Publications list at the bottom of this page). Diffuse-field equalized versions of the HRIRs are also available (SOFA file names suffixed with “_dfeq”).
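The spectrum-to-HRIR conversion described above can be sketched with NumPy. For an odd length n = 321, `np.fft.irfft` takes exactly n // 2 + 1 = 161 bins and implicitly enforces conjugate symmetry, so the 0 Hz bin plus the 160 computed bins fill the spectrum. The circular shift used for the delay and the Tukey taper ratio `alpha = 0.1` are assumptions (the actual window parameters are not stated here), and the function names are hypothetical.

```python
import numpy as np

FS = 32_100      # Hz; 321 samples x 100 Hz bin spacing
N_TAPS = 321     # HRIR length in samples

def tukey(n, alpha):
    """Tapered-cosine (Tukey) window; alpha is the fraction of n that is tapered."""
    if alpha <= 0:
        return np.ones(n)
    x = np.linspace(0.0, 1.0, n)
    w = np.ones(n)
    lo, hi = x < alpha / 2, x > 1 - alpha / 2
    w[lo] = 0.5 * (1 + np.cos(np.pi * (2 * x[lo] / alpha - 1)))
    w[hi] = 0.5 * (1 + np.cos(np.pi * (2 * x[hi] / alpha - 2 / alpha + 1)))
    return w

def bem_to_hrir(H, delay_s=1e-3, alpha=0.1):
    """H: 160 complex BEM values at 100, 200, ..., 16000 Hz (one ear, one direction)."""
    H = np.asarray(H, dtype=complex)
    if H.shape != (160,):
        raise ValueError("expected 160 frequency bins")
    spectrum = np.concatenate(([abs(H[0])], H))   # 0 Hz set to |H(100 Hz)| -> 161 bins
    h = np.fft.irfft(spectrum, n=N_TAPS)          # real HRIR; conjugate symmetry implied
    h = np.roll(h, int(round(delay_s * FS)))      # ~1 ms -> 32-sample (circular) delay
    return h * tukey(N_TAPS, alpha)

# Toy input: a flat, zero-phase spectrum gives a unit impulse at the delay.
h = bem_to_hrir(np.ones(160, dtype=complex))
print(len(h), int(np.argmax(h)))  # 321 32
```

The taper length (here 16 samples at each end) is kept shorter than the 32-sample delay so that the onset of the delayed HRIR is not attenuated by the window.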