Auditory Grouping: Using Machine Learning to Predict Locations of Groups in Music Clips

Jan, Sean; Chen, Yang

Student Work

Humans perceive a variety of features in an auditory stream: our acoustic sensors detect frequency, pitch, dynamics, and so on. We process music in several different ways based on these features, but it is difficult for machines to do the same. Existing models already achieve state-of-the-art performance in predicting acoustic boundaries, but machine audio segmentation that reflects human perception has yet to be achieved. Our project uses machine learning algorithms to build a model that enables machines to separate music into segments as humans do. The model we built produced clear grouping distinctions for audio clips of the same musical genre it was trained on, but generalized poorly to other genres. We believe the model can be improved with more training data of broader scope and with higher-quality grouping-boundary labels.

This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
