Beyond the Spectrum: Custom MFCC Processing For Acoustic Health Monitoring

Mel-Frequency Cepstral Coefficients (MFCCs) are a critical feature in audio signal processing and are widely used in systems that require audio classification, including health monitoring and human activity recognition. Their importance lies in their ability to mimic the response of the human auditory system, which makes them particularly useful for analyzing audio signals in ways that are meaningful for classification tasks. This work introduces a method for enhancing audio classification by developing custom MFCCs for health monitoring and the classification of human activities. Starting from a diverse audio dataset, a balanced subset is selected and the periodogram of each file is examined, which involves calculating the magnitude squared of the frequency response. The predominant frequency in each audio file, defined as the frequency with the highest power, is identified, and these frequencies are collected into an ordered array. This array, with duplicates removed, is used to establish the frequency bins for the MFCC algorithm, laying the groundwork for a customized filter bank automatically tailored to the dataset's specific characteristics. This analysis exposes a limitation of conventional MFCCs in handling extremely high or low frequencies. The limitation arises from the logarithmic distribution of frequency bins, which places a denser concentration of bins at lower frequencies, starting from 20 Hz, and becomes progressively sparser towards the upper frequency limit of 20 kHz. This characteristic calls for careful consideration when employing MFCCs in sound analysis, particularly for health monitoring purposes. To address the issue, a custom MFCC approach is proposed. The customized approach significantly enhances audio classification performance, as shown by extensive testing on various human-related audio datasets.
These datasets include gender identification, environmental sounds, health-related sounds (such as breath and vocal sounds for disease detection), and emotional speech analysis. The comparison between the traditional MFCC and the custom version shows a notable increase in classification accuracy, particularly with pitch-shifted audio samples. To evaluate the custom MFCC, the data were randomly divided into a training set (80%) and a test set (20%). Across the four datasets, the average improvement of the custom MFCC is 7.2%, and 6.7% with pitch-shifting. This indicates the custom MFCC's superior ability to handle variations in human sound, highlighting its potential to improve audio classification tasks and its applicability to complex audio scenarios. Such advancements benefit a range of technologies that rely on sound analysis, marking a significant step forward in the field.
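
The bin-selection procedure described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustrative reading of the abstract, not the thesis implementation: the function names, the synthetic test tones, the rounding to 0.1 Hz before de-duplication, and the use of triangular filters between consecutive edges are all assumptions made for the example.

```python
import numpy as np

def dominant_frequency(signal, sample_rate):
    """Frequency (Hz) with the highest power in a simple periodogram."""
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2  # magnitude squared of the spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(power)]

def custom_bin_edges(signals, sample_rate):
    """Ordered, de-duplicated dominant frequencies of a set of signals."""
    peaks = [dominant_frequency(s, sample_rate) for s in signals]
    # Assumption: round to 0.1 Hz so near-identical peaks collapse together.
    # np.unique both sorts the values and removes duplicates.
    return np.unique(np.round(peaks, 1))

def triangular_filter_bank(edges_hz, n_fft, sample_rate):
    """Assumed filter shape: triangles spanning three consecutive edges."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((len(edges_hz) - 2, len(fft_freqs)))
    for i in range(len(edges_hz) - 2):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

# Hypothetical stand-in for a dataset: one-second pure tones at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tones_hz = [250.0, 100.0, 4000.0, 250.0, 1000.0]  # note the repeated 250 Hz
signals = [np.sin(2 * np.pi * f * t) for f in tones_hz]

edges = custom_bin_edges(signals, sample_rate)
bank = triangular_filter_bank(edges, n_fft=512, sample_rate=sample_rate)
```

Here `edges` holds the sorted, unique dominant frequencies and `bank` has one row per filter. With real recordings the strongest bin may sit at or near 0 Hz, so a practical version would likely remove the mean or ignore the lowest bins before taking the maximum.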

Creator
Contributors
Degree
Unit
Publisher
Identifier
  • etd-121456
Keyword
Advisor
Orcid
Committee
Defense date
Year
  • 2024
UN Sustainable Development Goals
Date created
  • 2024-04-24
Resource type
Source
  • etd-121456
Rights statement

Permanent link to this page: https://digital.wpi.edu/show/q811kp90r