Hearing emotion: Redefining mental health monitoring via voice-based mood detection

Researchers at U-M have received a $3.6 million NIH grant to support their development of new digital phenotyping tools to better detect and measure symptoms of bipolar disorder via audio monitoring.

A voice can reveal a lot about a person: their personality, their age or health status, and even their emotional state. But could changes in voice alone indicate when someone is experiencing potentially dangerous mental health symptoms? Researchers at the University of Michigan have developed new technology that demonstrates this potential. Their technique leverages machine learning and signal processing to analyze individuals' voices alongside their self-reported mood, studying how changes in emotion are associated with changes in mood for people with bipolar disorder.

In the latest iteration of their decade-long research in this area, Prof. Emily Mower Provost of Computer Science and Engineering and her colleagues in the Department of Psychiatry, Prof. Melvin McInnis and Prof. Sarah Sperry, have received a $3.6 million grant from the National Institutes of Health (NIH) to further develop digital phenotyping tools that will lead to the identification of early warning signs of elevated mood symptoms via audio recording.

Through their research in the Heinz C. Prechter Bipolar Research Program, the team will develop an enhanced system for detecting early warning signs of mood shifts in individuals with bipolar disorder, helping them and their care providers better manage symptoms and intervene in a timely manner.

Bipolar disorder is a chronic mental illness that affects more than 40 million people worldwide. The condition most commonly manifests as transitions between periods of mania and depression, and its symptoms can have debilitating effects. Without proper treatment, the condition can even be life-threatening; bipolar disorder has the highest rate of suicide of any mental health condition.

“Bipolar disorder can be very difficult to manage, especially for those whose mood symptoms are unstable and difficult to anticipate,” said Mower Provost. “It would be a major breakthrough to be able to reliably predict risk for impending episodes of mania or depression and alert providers and trusted confidants to initiate preventative measures and mitigate the impact and severity of these mood swings.”

One particularly challenging aspect of bipolar disorder is that the mood shifts that characterize the condition can be difficult to detect, even by the individuals experiencing them. For loved ones of those with bipolar disorder, however, impending mood shifts may not be as much of a mystery. In fact, many can actually hear them coming. “There’s something wrong – I can hear it in their voice,” said one family member of an individual with bipolar disorder.

“Many caregivers, friends, and family members report that they’re able to hear in the voice of their loved one when they’re heading for an episode,” said Mower Provost. “For the past 12 years, this fundamental insight has driven our research. If the human mind can detect this, can we design computational algorithms that also do this?”

The logo of the Heinz C. Prechter Bipolar Research Program

Mower Provost and her collaborators have a long history of exploring the application of machine learning tools to parse and analyze the human voice, with a particular focus on emotion detection. This project began with pilot funding from the Michigan Institute for Clinical & Health Research (MICHR) and has been sustained with longstanding support from the Prechter Family and the Tam Foundation, as well as with recent support from Baszucki Group.

These philanthropic partnerships have provided Mower Provost and her colleagues with the resources to develop new techniques to achieve accurate representations of human emotion. Together, they have integrated machine learning and multimodal signal processing methods with clinical insights to derive and interpret data representations of human emotion based on voice recordings, resulting in unprecedented insights into emotion expression and perception.

The process of deriving meaningful information from data as complex as human speech is not without its challenges, however. From differences in microphone acoustics across devices to the nuances of language and how different individuals express emotion, Mower Provost and her team have faced numerous obstacles along the way in collecting and decoding audio data from participants.

“It can be difficult to characterize what is a risk factor for one person versus another,” said Mower Provost, “and the manner in which one person expresses their emotion or mood may not necessarily be the same as someone else.”

Overcoming these challenges has meant delving deeply into the complexities of language as well as human psychology and designing representations that can control for variation across individuals and contexts.

“To tackle this, we have moved away from the idea that mood should be predicted directly from raw speech acoustics,” said Mower Provost. “Instead, we are studying patterns of emotions, range of expression, anomalous patterns such as outbursts associated with mood variability as well as their trajectory, and how we can collect and distinguish these symptoms in the data.”
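As a loose illustration of this feature-based approach, the sketch below summarizes a sequence of per-utterance emotion estimates into trajectory features such as overall range, variability, and outburst-like outliers. The feature set, function name, and valence scale here are hypothetical, chosen for illustration; the article does not specify the team's actual representations.

```python
import statistics

def emotion_trajectory_features(emotion_scores):
    """Summarize a sequence of per-utterance emotion estimates
    (e.g., hypothetical valence scores in [-1, 1]) into trajectory
    features, rather than predicting mood from raw acoustics.
    """
    mean = statistics.fmean(emotion_scores)
    stdev = statistics.pstdev(emotion_scores)
    rng = max(emotion_scores) - min(emotion_scores)
    # Count "outburst"-like points far from the speaker's own mean.
    outliers = (
        sum(1 for s in emotion_scores if abs(s - mean) > 2 * stdev)
        if stdev else 0
    )
    return {"mean": mean, "stdev": stdev, "range": rng, "outliers": outliers}

# Example: a week of daily valence estimates for one speaker,
# with one anomalous high-valence day.
features = emotion_trajectory_features([0.1, 0.2, 0.15, 0.9, 0.1, 0.05, 0.2])
```

Because the features are computed relative to each speaker's own mean and spread, the same machinery can flag anomalies for individuals whose baseline expression differs, which echoes the person-specific framing described above.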

To accomplish this, the research team will recruit 160 participants with bipolar disorder. With their consent, participants will install an app on their smartphones that periodically records ambient audio, then parses that audio using the models Mower Provost and her team have developed to look for variations in emotion.
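In simplified form, the app's sampling cycle amounts to: record a clip, run the emotion model, report the result, and wait for the next window. In the sketch below, all three components are placeholder callables, since the article does not describe the app's internals.

```python
import time

def monitoring_loop(record_clip, estimate_emotion, report,
                    n_samples, interval_s=0.0):
    """Simplified ambient-audio sampling loop.

    record_clip, estimate_emotion, and report stand in for the app's
    actual recording, modeling, and upload components (not described
    in the article); n_samples bounds the loop for demonstration.
    """
    for _ in range(n_samples):
        clip = record_clip()                # capture a short ambient audio sample
        report(estimate_emotion(clip))      # run the model, log the estimate
        time.sleep(interval_s)              # wait until the next sampling window

# Demo with stub components standing in for microphone, model, and uploader.
log = []
monitoring_loop(
    record_clip=lambda: b"fake-audio-bytes",
    estimate_emotion=lambda clip: {"valence": 0.2},
    report=log.append,
    n_samples=3,
)
```

A real deployment would run this on the device's scheduler rather than a blocking loop, but the sketch captures the record-analyze-report cycle the article describes.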

With the data collected via this app, the researchers hope to be able to develop person-specific models that take into account variation across individuals in order to accurately identify and predict mood shifts.

Audio recorded via an app on participants’ smartphones will allow researchers to analyze their voice for signals that could indicate they are at risk for bipolar disorder symptoms. Source: Pexels.

While this method will enable researchers to glean important information about the manifestations and warning signs of symptoms of bipolar disorder, the team is very aware of the privacy concerns that this type of data collection might raise and emphasizes that the safety of the individuals it serves is its central priority.

“We take the privacy and security of our study participants extremely seriously,” said Mower Provost. “Everything we are doing is with the consent of our participants. All the data the app collects is sent to the cloud encrypted and is decrypted on HIPAA-compliant servers, all under the oversight of our IRB.”

With the critical input of these participants, the researchers hope to come away with an effective method for anticipating mood variation in bipolar disorder, which could ensure timely interventions and potentially save lives. 

This model of symptom detection and prediction has powerful implications, not just for the treatment of bipolar disorder but for health monitoring more broadly. The research team is hopeful that their work will help introduce a new paradigm in health tracking, one that is able to leverage the latest data technologies and methods in highly complex, real-world contexts.

“The overarching goal of this project and the work of our lab in general is to try to understand how we can bring measurement from clinical or laboratory environments to the real world,” said Mower Provost, “and use that measurement to support long-term, longitudinal tracking of health.”