
Abstract: While there is a breadth of research mapping Western musical features to perceived emotion, a common critique is that these methodologies lack inter-communication, which may reduce the generalizability of findings across the field. We consolidate previous research in this area into a parameterized composition guide that maps musical features to their associated emotional expression. We then use this guide to compose the “IsoVAT” dataset, a collection of symbolic MIDI clips in a variety of popular Western styles. The dataset contains 90 clips in total, with 30 clips per affective dimension, organized into 10 sets of 3 clips. Each clip within a set is composed to express a low, medium, or high level of an affective dimension relative to the other clips in the same set. We empirically evaluate the validity of our affective composition guide and establish the ground-truthed emotional expression of the dataset. The ground-truthing reveals 19 sets whose labels match the composed labels, 10 sets whose ground-truthed labels disagree with the composed labels, and 1 clip that does not have clear agreement across the three study designs.
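The layout described above (30 clips per affective dimension, in 10 sets of 3 intensity levels) can be sketched as a simple enumeration. This is a hypothetical illustration of the structure only: the dimension names (taken from the V, A, T in “IsoVAT”) and the filename scheme are assumptions, not the dataset's actual file layout.

```python
from itertools import product

# Assumed dimension names per the "VAT" in IsoVAT; hypothetical filenames.
DIMENSIONS = ["valence", "arousal", "tension"]
LEVELS = ["low", "medium", "high"]
NUM_SETS = 10

# 3 dimensions x 10 sets x 3 levels = 90 clips total.
clips = [
    f"{dim}_set{set_id:02d}_{level}.mid"
    for dim, set_id, level in product(DIMENSIONS, range(1, NUM_SETS + 1), LEVELS)
]

assert len(clips) == 90  # 30 clips per dimension, as stated in the abstract
```

Each triple of clips sharing a dimension and set number forms one comparison set, within which the low/medium/high labels are relative rather than absolute.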
The IsoVAT guide provides a set of musical parameters for Western music that are commonly manipulated in composition or performance, together with the changes in emotional expression associated with changes in each parameter. For example, a composer who wishes to increase the perceived valence of their music can use more notes in major modes, employ more consonant harmonies, use larger intervallic leaps, and adhere more closely to hierarchical tonal relationships.
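One way to picture such a guide is as a mapping from musical parameters to the direction of change associated with increasing a given affective dimension. The sketch below encodes only the valence guidelines listed above; the structure and names are illustrative assumptions, not the authors' actual guide format.

```python
# Hypothetical encoding of the valence guidelines summarized above:
# parameter -> change associated with *increasing* perceived valence.
VALENCE_GUIDE = {
    "mode": "use more notes in major modes",
    "harmony": "employ more consonant harmonies",
    "melodic_interval": "use larger intervallic leaps",
    "tonality": "adhere more closely to hierarchical tonal relationships",
}

def checklist(dimension_guide):
    """Render a composer-facing checklist from a parameter->direction mapping."""
    return [f"{param}: {direction}" for param, direction in dimension_guide.items()]

for line in checklist(VALENCE_GUIDE):
    print(line)
```

In this picture, the full guide would hold one such mapping per affective dimension, and composers consult the relevant checklist while writing each clip.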
The IsoVAT guide is constructed by aggregating the findings of several surveys on music perception and emotion modeling. Intended primarily as a research tool, the guide is meant to be interpreted by human composers while providing specific, empirically grounded guidelines for affective composition. We use the IsoVAT guide to control the emotional expression of music that serves as input to the Multi-track Music Machine (MMM) transformer model. Because MMM generates music conditioned on its input, following the IsoVAT guide while composing input music leads MMM to produce output with similar affective expression.
The IsoVAT guide is itself empirically evaluated by composing a corpus of music that follows the guide. We compare ground-truth rankings and ratings for this corpus and generally find the guide to be effective.