LazyVoice: A multi-agent approach to smooth voice leading

LazyVoice is a generative music system that realizes chord progressions into flexible, smooth voice leading.

An annotated screenshot of the LazyVoice application

Abstract: We describe LazyVoice, an interactive system that realizes chord progressions as individual voices with fluid voice leading, inspired by choral voice leading techniques. Polyphonic music consists of multiple musical lines that, taken together, form an implicit or explicit harmonic progression. Generative music systems exist that create harmonic progressions, but they lack a means to translate a progression into individual polyphonic lines. We apply a technique used to improvise multi-part harmony in choral settings to generate fluid musical lines from a harmonic progression. The result is a flexible voice leading system that translates abstract harmonic progressions into multiple fluid musical lines.


LazyVoice targets the compositional transformation from chord symbols to individual voices. Chord symbols indicate common collections of musical notes, and there are many models for representing relationships between chords. Sequences of chords, often repeated, are called “progressions”, and they are widely used as an organizational and compositional tool in music. Chord progressions can represent similar structures across a variety of songs and genres: Pachelbel’s Canon in D and Green Day’s Good Riddance (Time of Your Life) both make heavy use of the I-V-vi-IV chord progression, among many other examples. Several music generation systems generate chord progressions using chord symbols, such as “C Major”, or Roman numeral notation, such as “I-vi-IV-I”.
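As a small illustration of how the two notations relate, the sketch below expands a Roman numeral progression into chord symbols for a given major key. It is not taken from LazyVoice: the function names, the restriction to diatonic major and minor triads, and the sharps-only note spelling are all assumptions made for the example.

```python
# Hypothetical helper, not part of LazyVoice: expand Roman numerals into
# chord symbols, assuming a major key and plain major/minor triads.
MAJOR_SCALE = (0, 2, 4, 5, 7, 9, 11)  # semitones above the tonic
NOTE_NAMES = ("C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B")
DEGREES = {"i": 0, "ii": 1, "iii": 2, "iv": 3, "v": 4, "vi": 5, "vii": 6}


def roman_to_symbol(numeral, tonic=0):
    """Map one numeral (e.g. 'vi') to a symbol (e.g. 'A Minor')."""
    degree = DEGREES[numeral.lower()]
    root = (tonic + MAJOR_SCALE[degree]) % 12
    # Upper case conventionally marks a major triad, lower case a minor one.
    quality = "Major" if numeral.isupper() else "Minor"
    return f"{NOTE_NAMES[root]} {quality}"


def progression_to_symbols(progression, tonic=0):
    """Expand a dash-separated progression such as 'I-V-vi-IV'."""
    return [roman_to_symbol(n, tonic) for n in progression.split("-")]


print(progression_to_symbols("I-V-vi-IV"))
# ['C Major', 'G Major', 'A Minor', 'F Major']
```

Passing `tonic=2` yields the same progression in D major, which shows why Roman numerals are a convenient key-independent description of a progression.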

Transforming a chord progression into individual notes is called “voicing”. Most generative systems that use symbolic chord representation voice their chords as root position triads: the root, which is the note the symbol names, is placed at the bottom of the chord, and the other two notes are stacked above it in thirds. In actual music, these voicings are rarely used. LazyVoice instead uses a multi-agent approach in which each voice selects its next note by choosing, from the pitches that fall within the chord, the one at the smallest distance.
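The nearest-pitch rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the LazyVoice implementation: pitches are MIDI note numbers, a chord is a set of pitch classes, distance is measured from each voice's current pitch, ties go to the lower pitch, and the names `nearest_chord_tone` and `lead_voices` are hypothetical.

```python
# Minimal sketch (assumed design, not LazyVoice's code): each voice is an
# independent agent that moves to the closest pitch belonging to the chord.

def nearest_chord_tone(current, pitch_classes, low=0, high=127):
    """Return the MIDI pitch in the chord that is closest to `current`."""
    candidates = [p for p in range(low, high + 1) if p % 12 in pitch_classes]
    # min() keeps the first minimum, so a tie resolves to the lower pitch.
    return min(candidates, key=lambda p: abs(p - current))


def lead_voices(voices, pitch_classes):
    """Move every voice independently to its nearest chord tone."""
    return [nearest_chord_tone(v, pitch_classes) for v in voices]


C_MAJOR = {0, 4, 7}   # C, E, G
F_MAJOR = {5, 9, 0}   # F, A, C
G_MAJOR = {7, 11, 2}  # G, B, D

voices = [48, 64, 67, 72]  # C3, E4, G4, C5
for chord in (F_MAJOR, G_MAJOR, C_MAJOR):
    voices = lead_voices(voices, chord)
    print(voices)
```

Because each voice decides on its own, two voices can land on the same pitch in this sketch (the middle voices both move to F4 on the first chord). The source does not say how LazyVoice treats such doublings, so any rule for avoiding them would be a further assumption.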

To represent a wide variety of chords, we use a modal representation that does not assume any tonal hierarchy. Each chord is represented as a root pitch and a set of extensions: the difference between a C7 chord and a C13 chord is mostly in how many extensions there are, while the pitches are drawn from the same scale. The depth of the chord is selectable in real time, so the harmonies can be as deep as a full 11-note scale or as simple as triads while still following the same progression. Inversions can be allowed or disallowed; if they are allowed, the bass line simply responds like any other voice and seeks the smoothest path.
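One way to picture the root-plus-extensions idea is to stack thirds on a scale and keep only the first few notes. The sketch below does this under assumptions of its own: it uses a seven-note Mixolydian scale, treats "depth" as the number of chord tones kept, and uses the hypothetical name `chord_pitch_classes`. The representation actually used in LazyVoice may differ.

```python
# Illustrative sketch (assumed representation): a chord is a root pitch class
# plus the first `depth` tones obtained by stacking thirds on a scale.
MIXOLYDIAN = (0, 2, 4, 5, 7, 9, 10)  # semitones above the root


def chord_pitch_classes(root, scale, depth):
    """Return the first `depth` stacked-third tones as pitch classes."""
    if not 1 <= depth <= len(scale):
        raise ValueError("depth must be between 1 and the scale length")
    # Stacking thirds takes every other scale degree: 1, 3, 5, 7, 9, 11, 13.
    return [(root + scale[(2 * i) % len(scale)]) % 12 for i in range(depth)]


C = 0
print(chord_pitch_classes(C, MIXOLYDIAN, 3))  # [0, 4, 7]  C triad
print(chord_pitch_classes(C, MIXOLYDIAN, 4))  # [0, 4, 7, 10]  C7
print(chord_pitch_classes(C, MIXOLYDIAN, 7))  # [0, 4, 7, 10, 2, 5, 9]  C13
```

Changing only `depth` moves between a triad, C7, and C13 without touching the root or the scale, which is the property that lets the harmonic depth vary while the progression stays the same.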