Neural modeling and interpolation of binaural room impulse responses with head tracking
The use of neural networks for modeling and interpolating binaural room impulse responses (BRIRs) is investigated for facilitating spatial audio applications that require head tracking in multiple degrees of freedom. A deep neural network model is adopted from an architecture originally proposed for neural representation problems to predict unknown BRIRs that contain salient early reflection peaks, given head coordinates. Instead of its original time-domain formulation, a frequency-domain formulation is proposed to enhance the model efficiency and flexibility for band-limited BRIRs. Both model formulations are evaluated with measured and simulated BRIRs in terms of modeling accuracy and interpolation performance, respectively. It is shown that the frequency-domain formulation is more efficient at modeling band-limited BRIRs than its time-domain counterpart, because the former learns only the part of the frequency spectrum that lies within the band, and that models with both formulations significantly outperform conventional methods for interpolating sparse BRIRs.
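The efficiency argument for band-limited responses can be illustrated with a minimal sketch. It is not the model from this work: it only shows that a real signal confined to a low band is fully described by its one-sided DFT bins up to the cutoff, so a frequency-domain predictor has fewer output values to learn than a time-domain one. The function names (`band_limited_bins`, `dft_partial`, `idft_from_partial`), the signal length, the sample rate, and the cutoff are all hypothetical choices made for illustration.

```python
import cmath
import math


def band_limited_bins(n_samples, sample_rate, cutoff_hz):
    """Count one-sided DFT bins (DC included) at or below cutoff_hz."""
    nyquist_bin = n_samples // 2
    k_max = int(math.floor(cutoff_hz * n_samples / sample_rate))
    return min(k_max, nyquist_bin) + 1


def dft_partial(x, n_bins):
    """Naive DFT of a real sequence, computing only the first n_bins bins."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n_bins)
    ]


def idft_from_partial(spec, n):
    """Rebuild a real length-n signal from one-sided bins.

    Bins that are not supplied are treated as zero, which is exact
    only if the signal really is band-limited to the supplied bins.
    """
    out = []
    for t in range(n):
        acc = spec[0].real  # DC bin is real for a real signal
        for k in range(1, len(spec)):
            term = spec[k] * cmath.exp(2j * math.pi * k * t / n)
            # Each bin stands in for its conjugate twin, except the Nyquist bin.
            weight = 1.0 if (n % 2 == 0 and k == n // 2) else 2.0
            acc += weight * term.real
        out.append(acc / n)
    return out


# Illustrative values only (assumptions, not taken from the source).
N, FS, CUTOFF = 64, 48000, 6000
n_bins = band_limited_bins(N, FS, CUTOFF)

# Toy band-limited "response": energy only in bins 3 and 7.
signal = [
    math.cos(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]
spectrum = dft_partial(signal, n_bins)
rebuilt = idft_from_partial(spectrum, N)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))

time_domain_targets = N            # one real value per sample
freq_domain_targets = 2 * n_bins   # at most a real and an imaginary part per bin
print(n_bins, time_domain_targets, freq_domain_targets, max_error < 1e-9)
```

With these assumed values the band covers 9 bins, so a frequency-domain output would hold at most 18 real numbers against 64 time-domain samples, and the toy signal is recovered from those bins up to floating-point error.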