How introducing an app for recording and processing spatial audio within interactive video conferences allowed a Swedish video streaming provider to take the lead in the local market of VR audio solutions.
The main idea is that participants can sit anywhere in front of the video camera, which significantly improves the usability of the entire solution. There is no need to build complex setups of multiple cameras and additional equipment to capture all the participants.
In addition to the 360-degree camera, our customer needed a component that detects the position of the current speaker in live mode and transmits this data to the remote side, where the playback takes place. This is necessary so that the participants on the remote side can see the person who is currently speaking.
To implement the speaker detection component, we decided to pair the 360-degree video camera with a special microphone that records spatial audio in Ambisonics A-format.
We then needed to research and build a software solution (an algorithm) for processing Ambisonics A-format audio that detects the loudest audio source and calculates the direction vector to it. For that, we used FFT (fast Fourier transform), convolution, AGC (automatic gain control), and HRTF (head-related transfer function) processing, as well as a number of processing algorithms from the OpenCV library. The main idea of this approach was to build a sound field map in polar coordinates. Then, using digital image processing algorithms (threshold, erode, dilate, contour detection), the map is analyzed and the coordinates (the direction vector) of the loudest sound source are calculated.
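To make the sound field map idea more concrete, here is a minimal sketch in plain Python. It is not the production algorithm: it assumes an ideal tetrahedral microphone with capsules named FLU, FRD, BLD, and BRU, uses the textbook A-format to B-format sum/difference conversion without capsule-matching filters, steers a simple virtual microphone in the time domain instead of using FFT, and replaces the OpenCV threshold/erode/dilate/contour steps with a plain threshold followed by a weighted centroid. All function names and the `simulate_capsules` helper are hypothetical and exist only for illustration.

```python
import math

# Assumed capsule orientations of an ideal tetrahedral microphone (x=front, y=left, z=up).
CAPSULES = {"flu": (1, 1, 1), "frd": (1, -1, -1), "bld": (-1, 1, -1), "bru": (-1, -1, 1)}

def unit_vector(az_deg, el_deg):
    """Direction vector for an azimuth (counterclockwise from front) and elevation."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))

def a_to_b_format(flu, frd, bld, bru):
    """Textbook A-format to first-order B-format (W, X, Y, Z) conversion."""
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))
        x.append(0.5 * (a + b - c - d))
        y.append(0.5 * (a - b + c - d))
        z.append(0.5 * (a - b - c + d))
    return w, x, y, z

def sound_field_map(w, x, y, z, step_deg=10):
    """Energy of a virtual microphone steered over an azimuth/elevation grid."""
    field = {}
    for el in range(-90, 91, step_deg):
        for az in range(-180, 180, step_deg):
            vx, vy, vz = unit_vector(az, el)
            field[(az, el)] = sum(
                (wi + xi * vx + yi * vy + zi * vz) ** 2
                for wi, xi, yi, zi in zip(w, x, y, z)
            )
    return field

def loudest_direction(field, rel_threshold=0.9):
    """Threshold the map, then take the weighted centroid of the cells that remain.

    Averaging unit vectors (rather than angles) avoids the wraparound at +/-180
    degrees; the cos(elevation) factor compensates for grid cells shrinking
    toward the poles. Returns None for a silent input.
    """
    peak = max(field.values())
    sx = sy = sz = 0.0
    for (az, el), energy in field.items():
        if energy >= rel_threshold * peak:
            weight = energy * math.cos(math.radians(el))
            vx, vy, vz = unit_vector(az, el)
            sx, sy, sz = sx + weight * vx, sy + weight * vy, sz + weight * vz
    norm = math.sqrt(sx * sx + sy * sy + sz * sz)
    if norm == 0:
        return None
    return (sx / norm, sy / norm, sz / norm)

def simulate_capsules(signal, az_deg, el_deg):
    """Synthetic A-format: ideal cardioid capsules hearing a single plane wave."""
    ux, uy, uz = unit_vector(az_deg, el_deg)
    out = {}
    for name, (cx, cy, cz) in CAPSULES.items():
        gain = 0.5 * (1 + (cx * ux + cy * uy + cz * uz) / math.sqrt(3))
        out[name] = [gain * s for s in signal]
    return out

if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
    capsules = simulate_capsules(tone, az_deg=60, el_deg=20)
    direction = loudest_direction(sound_field_map(*a_to_b_format(**capsules)))
    print("estimated direction vector:", direction)
```

With a real recording, the map would be computed per short frame and the thresholded region cleaned up with morphological operations before the contour is located, which is where the image processing steps named above come in.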
The app allows the user to:
analyze Ambisonics A-format audio from various sources (sound card, audio file, HLS stream)
embed metadata containing direction data into the H.264 video stream using Wowza WMS
perform vector detection and calculation in live mode
generate debugging information and visualize the 360-degree map of sound field levels
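As an illustration of what the direction metadata could look like before it is handed to the streaming server, the sketch below converts a direction vector into azimuth and elevation and serializes it as JSON. The payload layout and the field names (`timestampMs`, `azimuthDeg`, `elevationDeg`) are assumptions made for this example; the actual format and the Wowza-side injection are not described here.

```python
import json
import math

def direction_metadata(direction, timestamp_ms):
    """Serialize a direction vector as a small JSON payload (hypothetical layout)."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("direction vector must be non-zero")
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp to guard against rounding pushing the ratio slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    payload = {
        "timestampMs": timestamp_ms,
        "azimuthDeg": round(azimuth, 1),
        "elevationDeg": round(elevation, 1),
    }
    return json.dumps(payload, sort_keys=True)

if __name__ == "__main__":
    print(direction_metadata((0.0, 1.0, 0.0), timestamp_ms=1000))
```

On the receiving side, a player could parse such a payload and rotate the 360-degree view toward the reported angles.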
Digital Signal Processing
HRTF
FFT
Convolution
AGC