ICCC'24 update:
Cultivating Open-Earedness with Sound Objects discovered by Open-Ended Evolutionary Systems
Presented at EvoMUSART 2024, International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar):
All sounds have become admissible in the recent history of music, at least since Russolo's work on the Art of Noises, which sought to replace the limited variety of timbres provided by traditional instruments with the infinite variety of timbres in noises, “reproduced with appropriate mechanisms”. And all sounds are in principle accessible with modern technology, especially digital sound synthesis. But not all sounds are yet equally accessible: access to sounds can, for example, depend on technical expertise and dedication.
Maybe you don't know what you're looking for, so you can't prompt for specific results.
Maybe you would rather like to be surprised, by serendipitous discoveries, and go with the flow of those and see where they lead your creative process.
We're interested in enabling further access to sounds through automatic exploration of the space of sounds, hopefully expanding the horizon towards all sounds.
Our approach to this is based on Quality Diversity (QD) optimisation algorithms, which keep track of many different classes of solutions and check how offspring from one class perform in other classes; this can lead to the discovery of stepping stones through many classes on the path to interesting discoveries.
To automate the exploration, Innovation Engines combine such QD algorithms with a model that is capable of evaluating whether new solutions are interestingly new.
A long-term goal of Innovation Engines is to learn to:
- classify the things they have seen so far
- and seek to produce new types of things
Unsupervised, without labeled data.
But to begin our Innovation Engine explorations with sound, we start from a pre-trained model:
- YAMNet, a DNN classifier, to define the measurement space:
- the classification defines diversity
- and the confidence level for each class serves as quality, as sketched below
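As a rough sketch of how the classifier's output can serve as both axes of the measurement space (the `classifyAudio` stub and the type names below are illustrative, not our actual implementation):

```typescript
// Sketch: a classifier's score vector as the QD measurement space.
// `classifyAudio` is a hypothetical stand-in for running YAMNet on a rendered sound;
// it returns one confidence score per class.
declare function classifyAudio(samples: Float32Array): number[];

type Measurement = {
  scores: number[];  // quality: classifier confidence, one value per class
  topClass: number;  // diversity: the class (cell) this sound fits best
};

function measure(samples: Float32Array): Measurement {
  const scores = classifyAudio(samples);
  let topClass = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[topClass]) topClass = i;
  }
  // During evolution the full score vector is used: a sound can replace the elite
  // of any class where its confidence is higher (see the MAP-Elites sketch further below).
  return { scores, topClass };
}
```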
Compositional Pattern Producing Networks are a part of our approach to synthesising sounds:
CPPNs abstract the unfolding of development in evolutionary processes by composing different functions at their nodes.
This can be compared with the process of timbral development, where musical expression depends on changes and nuances over time.
For synthesising sounds with CPPNs we use the classic synthesiser waveforms as potential activation functions:
- sine, square, triangle and sawtooth
A corresponding DSP graph can contain a variety of nodes, such as filters, noise, distortion, reverb, and specialised wavetable and additive synthesis nodes.
Each DSP node can receive audio and control signals from the CPPN, at any frequency
- each unique frequency requires a separate CPPN activation, for each sample
Linear ramp and periodic signals are input to the CPPN.
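As a loose illustration of this, the sketch below evaluates a toy CPPN for one output sample, with the classic synthesiser waveforms as candidate activation functions and a linear ramp plus a periodic signal as inputs; the fixed two-node composition and the weights are made up for the example, since in practice the network topology is evolved:

```typescript
// Sketch: CPPN activation functions based on classic synthesiser waveforms.
type Activation = (x: number) => number;

const activations: Record<string, Activation> = {
  sine: x => Math.sin(Math.PI * x),
  square: x => (Math.sin(Math.PI * x) >= 0 ? 1 : -1),
  triangle: x => (2 / Math.PI) * Math.asin(Math.sin(Math.PI * x)),
  sawtooth: x => 2 * (x / 2 - Math.floor(0.5 + x / 2)),
};

// One output sample from a toy, hard-coded composition of two nodes;
// a real CPPN composes many such nodes with evolved weights and topology.
function cppnSample(ramp: number, periodic: number): number {
  const hidden = activations.triangle(0.7 * ramp + 1.3 * periodic);
  return activations.sine(0.9 * hidden + 0.2 * ramp);
}

// Rendering a one-second pattern: the CPPN is activated once per sample,
// with a linear ramp over the duration and a periodic signal at a given frequency.
function renderPattern(sampleRate = 48000, freq = 220): Float32Array {
  const out = new Float32Array(sampleRate);
  for (let i = 0; i < out.length; i++) {
    const t = i / sampleRate;
    out[i] = cppnSample(t, Math.sin(2 * Math.PI * freq * t));
  }
  return out;
}
```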
We use MAP-Elites as our QD algorithm:
It usually divides the feature, or behaviour, space into a grid of cells; this grid is the container that holds the elites:
In this experiment, the classes from YAMNet define the cells. Each cell in the container that MAP-Elites works on holds a sound genome, which is rendered into a sound for evaluation; if that sound performs better on another class than that class's current elite, it becomes the elite in that class (sketched below).
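The core placement loop can be sketched as follows; the genome representation, mutation, rendering and classification are stubbed out, so this is an illustration of the idea rather than our exact implementation:

```typescript
// Sketch: a MAP-Elites loop where classifier classes are the cells.
type Genome = unknown;
type Elite = { genome: Genome; score: number };

declare function randomGenome(): Genome;
declare function mutate(g: Genome): Genome;
declare function render(g: Genome): Float32Array;
declare function classScores(samples: Float32Array): number[]; // one confidence per class

function mapElites(iterations: number, numClasses: number): (Elite | undefined)[] {
  const container: (Elite | undefined)[] = new Array(numClasses);

  for (let i = 0; i < iterations; i++) {
    // Pick a parent: a random existing elite, or a fresh genome early on.
    const elites = container.filter((e): e is Elite => e !== undefined);
    const parent = elites.length > 0
      ? elites[Math.floor(Math.random() * elites.length)].genome
      : randomGenome();

    const child = mutate(parent);
    const scores = classScores(render(child));

    // The offspring may become the elite of any class where it outperforms
    // the current incumbent; this is where goal switching can happen.
    for (let c = 0; c < numClasses; c++) {
      const incumbent = container[c];
      if (!incumbent || scores[c] > incumbent.score) {
        container[c] = { genome: child, score: scores[c] };
      }
    }
  }
  return container;
}
```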
To feed the Innovation Engine with Sound Objects, we use CPPNs to emit patterns as waveforms (a minimal wiring sketch follows below):
- they can be used as audio signals, either raw or fed through a Digital Signal Processing (DSP) graph
- when combined with a DSP graph, the CPPN patterns can also be used as control signals for various nodes in the DSP graph
The classification model - YAMNet - evaluates the Sound Objects for diversity and quality, and based on that evaluation the QD algorithm - MAP-Elites - potentially declares a new Sound Object the elite in a new class, if it is the highest performer there.
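As a minimal Web Audio sketch of that wiring, assuming the CPPN patterns have already been rendered into Float32Arrays; the single filter node and the cutoff mapping below are illustrative, while the evolved DSP graphs in our runs are larger and more varied:

```typescript
// Sketch: one CPPN output as the audio signal, another as a control signal,
// wired into a small offline Web Audio graph (buffer source -> lowpass filter).
async function renderWithDsp(
  audioPattern: Float32Array,    // CPPN output used as the audio signal
  controlPattern: Float32Array,  // CPPN output used as a control signal
  sampleRate = 48000
): Promise<AudioBuffer> {
  const durationSeconds = audioPattern.length / sampleRate;
  const ctx = new OfflineAudioContext(1, audioPattern.length, sampleRate);

  // Audio signal: play the CPPN pattern from a buffer source.
  const buffer = ctx.createBuffer(1, audioPattern.length, sampleRate);
  buffer.copyToChannel(audioPattern, 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // Control signal: map the CPPN pattern (roughly -1..1) onto the filter cutoff.
  const filter = ctx.createBiquadFilter();
  filter.type = 'lowpass';
  const cutoffCurve = Float32Array.from(controlPattern, v => 200 + (v + 1) * 4000);
  filter.frequency.setValueCurveAtTime(cutoffCurve, 0, durationSeconds);

  source.connect(filter);
  filter.connect(ctx.destination);
  source.start();

  // The rendered buffer is what gets passed to the classifier for evaluation.
  return ctx.startRendering();
}
```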
What did we find?
One quantitative measure is the QD-score, which summarises the performance from all classes in the container / map:
In the plot below we see the red line of our baseline experiment reaching a QD-score of around 300, out of a maximum of 500:
- 500 classes, each with a maximum classifier confidence of 1 (the score computation is sketched below)
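A minimal sketch of that score computation, assuming each filled cell stores its elite's classifier confidence as the score:

```typescript
// Sketch: QD-score as the sum of elite scores over all filled cells in the map.
// With 500 classes and a maximum classifier confidence of 1, the ceiling is 500.
type CellElite = { score: number } | undefined;

function qdScore(container: CellElite[]): number {
  return container.reduce((sum, elite) => sum + (elite?.score ?? 0), 0);
}
```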
But the question came up: why couple CPPNs with DSP graphs at all - why not just let the CPPNs alone emit the pattern of a sound waveform?
So we tried that:
Instead of wiring multiple CPPN outputs to multiple DSP graph nodes, we tried using just one CPPN output for the sound, with no DSP:
We can see that the performance of that configuration, in terms of the QD-score, is lower
- and it comes at the cost of higher CPPN complexity and correspondingly longer rendering times, as we'll see later
We observed more diversity when coupling CPPNs with DSP graphs:
The set of unique elites at the end of CPPN-only runs is smaller than when co-evolving the DSP graphs.
- The distribution of CPPN activation function types is quite uniform in all variants of our runs
- Apart from the stock Web Audio API nodes, custom DSP nodes for wavetable and additive synthesis are prominent
- This indicates that it might be worthwhile to implement other classic synthesis techniques for the DSP part
- CPPN-only runs resulted in more complex function compositions, likely to compensate for the lack of a co-evolving DSP graph
- High scores across most classes
- CPPN + DSP higher overall
- The synthesiser struggles with scoring high on classes for musical genres, such as Flamenco, Jazz and Funk (the blue cluster on the heatmaps below)
- Understandably? - it may be too much to ask of a sound synthesiser to represent whole musical genres, rather than specific sound events and instruments
How did evolution leverage the diversity promoted by our classifier?
We measured the stepping stones across the classes by counting goal switches, defined as “the number of times during a run that a new class champion was the offspring of a champion of another class”, and found a mean of 21.7±3.6 goal switches, which is 63.2% of the 34.3±4.5 mean new champions per class.
This can be compared to the 17.9% goal switches in previous Innovation Engine experiments with image generation.
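A minimal sketch of how such goal switches could be counted from a lineage log, assuming records of new class champions and the genomes they were mutated from; the record format is hypothetical, not our actual log format:

```typescript
// Sketch: counting goal switches, i.e. cases where a new class champion is the
// offspring of a champion of another class.
type ChampionRecord = {
  genomeId: string;
  parentId: string | null;  // null for randomly initialised genomes
  classIndex: number;       // the class this genome just became champion of
};

function countGoalSwitches(records: ChampionRecord[]): number {
  // Remember every class each genome has been champion of so far.
  const championOf = new Map<string, Set<number>>();
  let goalSwitches = 0;

  for (const rec of records) {
    const parentClasses = rec.parentId ? championOf.get(rec.parentId) : undefined;
    // A goal switch: the parent was champion of some class other than this one.
    if (parentClasses && Array.from(parentClasses).some(c => c !== rec.classIndex)) {
      goalSwitches++;
    }
    const own = championOf.get(rec.genomeId) ?? new Set<number>();
    own.add(rec.classIndex);
    championOf.set(rec.genomeId, own);
  }
  return goalSwitches;
}
```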
We were also curious to see how a single-objective search, where all effort is spent on one goal, would compare in the domain of sound.
So we selected 10 classes as single objectives of separate runs and compared the performance and genome complexity with the performance from the QD runs on those same classes.
The single-objective runs scored similarly to the QD runs, though with high variance, and at the cost of much higher genome complexity.
Where the single-class runs did perform better, this somewhat unexpected result may be attributed to the narrow set of chosen classes.
In addition to numbers and plots, it’s also interesting to hear the sounds discovered:
Instead of showing a visual phylogenetic tree of the evolutionary paths, we provide an online explorer that offers sonic access to the results from all our experiments:
There you can scrub through evoruns and their classes, and for each class you can scrub through the generations throughout the run, which can in many cases reveal the goal switching behaviour that we’ve measured.
Here's a short screen recording of interaction with the explorer, where some goal switching behaviour can be observed:
It’s also interesting to observe if the discovered sounds are of any use in music composition or sound organisation.
To test that applicability, the sounds have been loaded into the experimental sampler AudioStellar, which is then configured to drive evolutionary sequences through the sound objects.
Several live streams with such sequencing have been broadcast. Many hours of recordings…
Some of those organised evolutionary sounds can be found on main-streaming services.
So we’ve tried to have silicon-based machines compose stuff with those sounds.
But what can meat-based machines - we - do with those sounds?
Here’s an opportunity to give that a try: by navigating to the dataset accompanying this publication, you can click the Tree tab to easily find the evoruns-render folder, full of WAV files rendered from genomes discovered during our evolutionary runs. Maybe you can come up with interesting compositions of those sounds - AudioStellar is great for getting an overview of the sounds and starting to sequence them; Blockhead is also an interesting DAW for playing with soundfiles.
In conclusion, this work demonstrates that
- applying diversity-promoting algorithms with classifier reward signals is a viable approach to sound discovery
- our current sound synthesis approach can achieve high confidence from a DNN classifier
- the diverse set of sounds generated suggests further explorations with this system