Front Row
Automatically generating immersive audio representations of tennis broadcasts for blind viewers

A study participant, who is congenitally blind, using Front Row to watch a tennis match together with their sighted friend. Front Row is a system that automatically generates an immersive audio representation of a tennis broadcast video, allowing BLV viewers to more directly perceive what is happening in a tennis match. Front Row first recognizes gameplay from the video feed using computer vision, then renders players' positions and shots via spatialized (3D) audio cues. Front Row works with a standard pair of headphones.
Blind and low-vision (BLV) people face challenges watching sports due to the lack of accessibility of sports broadcasts. Currently, BLV people rely on descriptions from TV commentators, radio announcers, or their friends to understand the game. These descriptions, however, do not allow BLV viewers to visualize the action by themselves.
We present Front Row, a system that automatically generates an immersive audio representation of sports broadcasts, specifically tennis, allowing BLV viewers to more directly perceive what is happening in the game. Front Row first recognizes gameplay from the video feed using computer vision, then renders players' positions and shots via spatialized (3D) audio cues.
User evaluations with 12 BLV participants show that Front Row gives BLV viewers a more accurate understanding of the game compared to TV and radio, enabling viewers to form their own opinions on players' moods and strategies.
The Project
Formative study
To inform the design of Front Row, we conducted semi-structured interviews and observation sessions with five BLV participants to answer two questions: (1) What challenges do BLV people face when watching sports? and (2) What are BLV viewers' information preferences for achieving a better understanding of the gameplay?
Regarding BLV people's challenges, we found that watching with friends and family was important but difficult for BLV people because they could not participate in the conversations about the game. Participants shared frustration about the amount of information they received, which resulted in loss of immersion and engagement when watching sports. Specifically, all participants concurred that TV commentators were not descriptive enough, whereas radio descriptions were too dense and thus could be overwhelming to decipher.
Regarding BLV people's information preferences, participants expressed a strong desire to understand the spatial aspects of gameplay, such as where the players are on the field. Participants also expressed a preference for receiving neutral, objective information about the gameplay that was not colored by the commentators' opinions.
Based on our findings, we synthesized four design goals for Front Row:
- G1: Facilitating spatial understanding of the gameplay.
- G2: Providing an appropriate amount of information to facilitate immersion.
- G3: Providing a single format that both BLV and sighted viewers can enjoy.
- G4: Supporting agency in gameplay understanding.

Front Row's 3D soundscape. The tennis court is displayed on a 2D plane orthogonal to the BLV viewer. Players' positions are represented by continuous humming sounds, and players' shots are represented by bell sounds similar to those in blind tennis. These sounds are blended with the TV broadcast's original audio to incorporate ambient noises and the announcers' commentary.
Front Row: Immersive Audio Design
Front Row is a system that generates an immersive audio representation of a tennis broadcast video in order to enable BLV viewers to more directly perceive what is happening in a tennis match. The audio rendering consists of three sound cues that together help BLV viewers to gain a spatial understanding of the gameplay (G1), to feel more immersed within the game (G2), to enable co-watching with sighted peers (G3), and to form their own opinions on players' strategies (G4).
The first sound cue allows viewers to visualize and follow players' positions on the court. The second sound cue allows viewers to understand players' shots, including when players make shots and whether those shots are forehands or backhands. The third sound cue is the ambient game sounds from the broadcast video, such as audience cheers and the umpire's calls, which provide a more realistic viewing experience to BLV people.
Front Row renders the sound cues via spatialized (3D) audio on a 2D plane that represents the "bird's-eye view" of the court. This 2D plane is orthogonal to the viewer and sits several feet in front of them in the 3D soundscape.
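As a rough illustration of this layout, the sketch below maps a player's court position onto a vertical plane in front of the listener and reports the direction the sound would come from. It is a minimal sketch under stated assumptions, not Front Row's actual rendering code: the function name `court_to_soundscape`, the plane size, and the plane distance are hypothetical values chosen for the example. Only the court dimensions are standard (a doubles court is 10.97 m wide and 23.77 m long).

```python
import math
from dataclasses import dataclass

# Standard doubles court dimensions in metres.
COURT_WIDTH_M = 10.97
COURT_LENGTH_M = 23.77


@dataclass
class SoundSource:
    azimuth_deg: float    # negative = left of the listener, positive = right
    elevation_deg: float  # negative = below ear level, positive = above
    distance_m: float     # straight-line distance from the listener


def court_to_soundscape(x_m, y_m, plane_width_m=2.0, plane_height_m=2.0,
                        plane_distance_m=1.5):
    """Map a court position to a point on a vertical plane facing the listener.

    x_m: metres from the left sideline (0 .. COURT_WIDTH_M)
    y_m: metres from the near baseline (0 .. COURT_LENGTH_M)
    The plane size and distance are assumed values, not taken from Front Row.
    """
    # Normalize to [0, 1] and clamp so off-court positions stay on the plane.
    u = min(max(x_m / COURT_WIDTH_M, 0.0), 1.0)
    v = min(max(y_m / COURT_LENGTH_M, 0.0), 1.0)

    # Centre the plane on the listener's forward axis: the near baseline is at
    # the bottom of the plane and the far baseline is at the top.
    px = (u - 0.5) * plane_width_m
    py = (v - 0.5) * plane_height_m

    azimuth = math.degrees(math.atan2(px, plane_distance_m))
    elevation = math.degrees(math.atan2(py, math.hypot(px, plane_distance_m)))
    distance = math.sqrt(px ** 2 + py ** 2 + plane_distance_m ** 2)
    return SoundSource(azimuth, elevation, distance)
```

A spatial audio engine could then place the player's humming sound at the returned azimuth and elevation. With this mapping, a player at the centre of the court is heard straight ahead, and a player in the near left corner is heard to the lower left.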

Overview of the Front Row pipeline. The pipeline automatically generates spatialized (3D) audio representation of tennis directly from the source broadcast video using computer vision.
Front Row: Computer Vision Pipeline
To create audio representations, Front Row takes as input only the source broadcast video feed and uses computer vision to extract the necessary gameplay information.
Front Row's pipeline first segments the source video feed to separate the parts of the video where the game is in play (i.e., rallies) from the lull periods in between, such as commercial breaks and players changing sides. For each rally segment, Front Row then recognizes gameplay by extracting information about the court, players, and the ball. Using this information, it detects when players make shots and what types of shots (forehands vs. backhands) they make. Finally, spatialized (3D) audio representations are generated to provide BLV viewers with spatial information about players' positions, shots, and shot types. These audio representations are blended with the original audio from the broadcast video to provide a realistic viewing experience to BLV viewers.
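The segmentation step can be pictured with a small sketch. It assumes that some upstream detector has already labeled each frame as in play or not. That per-frame flag, the function name `segment_rallies`, and the minimum-length threshold are all hypothetical and are not described in the text above. The sketch only shows how such flags could be grouped into rally segments.

```python
from typing import List, Tuple


def segment_rallies(in_play: List[bool], fps: float = 30.0,
                    min_seconds: float = 1.0) -> List[Tuple[float, float]]:
    """Group consecutive in-play frames into (start_s, end_s) rally segments.

    in_play: one flag per frame from a hypothetical in-play detector.
    Runs shorter than min_seconds are dropped as likely detector noise.
    """
    min_frames = int(min_seconds * fps)
    segments = []
    start = None  # index of the first frame of the current run, if any

    for i, flag in enumerate(in_play):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                segments.append((start / fps, i / fps))
            start = None

    # Close a run that is still open at the end of the video.
    if start is not None and len(in_play) - start >= min_frames:
        segments.append((start / fps, len(in_play) / fps))
    return segments
```

Each returned segment would then be passed to the later stages (court, player, and ball recognition, followed by shot detection), while the lull periods between segments are skipped.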

A study participant, who is congenitally blind, viewing a tennis match using Front Row. They are wearing a standard pair of headphones.
User Study
We evaluated Front Row in a user study with 12 BLV participants to understand how well Front Row allows BLV viewers to comprehend tennis gameplay compared to the status quo of listening to TV and radio broadcasts. Participants listened to several audio clips of tennis rallies in three different formats (Front Row, TV, and radio broadcasts) and then answered questions about the gameplay in those rallies.
We found that Front Row provides BLV viewers with a significantly more accurate understanding of the gameplay compared to TV and radio. For instance, Front Row reduced BLV participants' comprehension errors compared to TV by over 90% in recognizing the type of shots players hit and by around 85% in identifying when players approach the net during play. We also found that Front Row facilitates more immersion, with many participants valuing how Front Row affords them the ability to visualize the gameplay and to form their own opinions about the players' moods and strategies during the game.

Future applications for Front Row. (a) Front Row plug-in for video streaming platforms. (b) Making recreational tennis games accessible to BLV audiences.
Future Applications
We also illustrate applications that Front Row could enable in the future. We envision integrating a Front Row plug-in into video streaming platforms, such as YouTube, ESPN+, and Hulu, to make tennis videos accessible across the Web. This could work similarly to how closed captions are implemented on YouTube. Front Row could also make recreational tennis games at high schools, parks, and universities accessible to BLV audiences. By processing a camera feed captured from behind one of the players, Front Row could enable BLV audience members to follow the game in real time.
Videos
30 Second Preview (UIST 2024)
UIST 2024 Talk
Publications

Front Row: Automatically Generating Immersive Audio Representations of Tennis Broadcasts for Blind Viewers

Towards Accessible Sports Broadcasts for Blind and Low-Vision Viewers
CEAL Team
Sponsor
This research was funded in part by National Science Foundation Grants 2051053 and 2051060. The opinions, findings, conclusions, and/or recommendations expressed are those of the authors and do not necessarily reflect the views of the National Science Foundation.