ICCC'24 update:
Cultivating Open-Earedness with Sound Objects discovered by Open-Ended Evolutionary Systems
Presented at EvoMUSART 2024, International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar):
All sounds have become admissible in the recent history of music, at least since Russolo's work on the Art of Noises, which sought to replace the limited variety of timbres provided by traditional instruments with the infinite variety of timbres in noises, “reproduced with appropriate mechanisms”. And all sounds are in principle accessible with modern technology, especially digital sound synthesis. But not all sounds are yet equally accessible: access to sounds can, for example, depend on technical expertise and dedication.
Maybe you don't know what you're looking for, so you can't prompt for specific results.
Maybe you would rather like to be surprised, by serendipitous discoveries, and go with the flow of those and see where they lead your creative process.
We're interested in enabling further access to sounds through automatic exploration of the space of sounds, hopefully expanding the horizon towards all sounds.
Our approach to this is based on Quality Diversity (QD) optimisation algorithms, which keep track of many different classes of solutions and check how offspring from one class perform in other classes; this can lead to the discovery of stepping stones through many classes on the path to interesting discoveries.
To automate the exploration, Innovation Engines combine such QD algorithms with a model that is capable of evaluating whether new solutions are interestingly new.
A long-term goal of Innovation Engines is to learn to:
- classify the things they have seen so far
- and seek to produce new types of things
Unsupervised, without labeled data.
But to begin our Innovation Engine explorations with sound, we start from a pre-trained model:
- YAMNet, a DNN classifier, to define the measurement space:
- the classification defines diversity
- and the confidence level for each class serves as quality, as sketched below
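As a rough sketch of how the classifier's output can serve as both axes of the measurement space (the `classifyAudio` stub and the type names below are illustrative, not our actual implementation):

```typescript
// Sketch: a classifier's score vector as the QD measurement space.
// `classifyAudio` is a hypothetical stand-in for running YAMNet on a rendered sound;
// it returns one confidence score per class.
declare function classifyAudio(samples: Float32Array): number[];

type Measurement = {
  scores: number[];  // quality: classifier confidence, one value per class
  topClass: number;  // diversity: the class (cell) this sound fits best
};

function measure(samples: Float32Array): Measurement {
  const scores = classifyAudio(samples);
  let topClass = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[topClass]) topClass = i;
  }
  // During evolution the full score vector is used: a sound can replace the elite
  // of any class where its confidence is higher (see the MAP-Elites sketch further below).
  return { scores, topClass };
}
```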
Compositional Pattern Producing Networks are a part of our approach to synthesising sounds:
CPPNs abstract the unfolding of development in evolutionary processes by composing different functions at their nodes.
This can be compared with the process of timbral development, where musical expression depends on changes and nuances over time.
For synthesising sounds with CPPNs we use the classic synthesiser waveforms as potential activation functions:
- sine, square, triangle and sawtooth
A corresponding DSP graph can contain a variety of nodes, such as filters, noise, distortion, reverb, and specialised wavetable and additive synthesis nodes.
Each DSP node can receive audio and control signals from the CPPN, at any frequency
- each unique frequency requires a separate CPPN activation, for each sample
Linear ramp and periodic signals are input to the CPPN.
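As a loose illustration of this, the sketch below evaluates a toy CPPN for one output sample, with the classic synthesiser waveforms as candidate activation functions and a linear ramp plus a periodic signal as inputs; the fixed two-node composition and the weights are made up for the example, since in practice the network topology is evolved:

```typescript
// Sketch: CPPN activation functions based on classic synthesiser waveforms.
type Activation = (x: number) => number;

const activations: Record<string, Activation> = {
  sine: x => Math.sin(Math.PI * x),
  square: x => (Math.sin(Math.PI * x) >= 0 ? 1 : -1),
  triangle: x => (2 / Math.PI) * Math.asin(Math.sin(Math.PI * x)),
  sawtooth: x => 2 * (x / 2 - Math.floor(0.5 + x / 2)),
};

// One output sample from a toy, hard-coded composition of two nodes;
// a real CPPN composes many such nodes with evolved weights and topology.
function cppnSample(ramp: number, periodic: number): number {
  const hidden = activations.triangle(0.7 * ramp + 1.3 * periodic);
  return activations.sine(0.9 * hidden + 0.2 * ramp);
}

// Rendering a one-second pattern: the CPPN is activated once per sample,
// with a linear ramp over the duration and a periodic signal at a given frequency.
function renderPattern(sampleRate = 48000, freq = 220): Float32Array {
  const out = new Float32Array(sampleRate);
  for (let i = 0; i < out.length; i++) {
    const t = i / sampleRate;
    out[i] = cppnSample(t, Math.sin(2 * Math.PI * freq * t));
  }
  return out;
}
```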
We use MAP-Elites as our QD algorithm:
It usually divides the feature, or behaviour, space into a grid of cells; this grid is the container that holds the elites:
In this experiment, the classes from YAMNet define the cells. Each cell in the container that MAP-Elites works on holds a sound genome, which is rendered into a sound for evaluation; if that sound performs better on another class than that class's current elite, it becomes the elite in that class (sketched below).
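The core placement loop can be sketched as follows; the genome representation, mutation, rendering and classification are stubbed out, so this is an illustration of the idea rather than our exact implementation:

```typescript
// Sketch: a MAP-Elites loop where classifier classes are the cells.
type Genome = unknown;
type Elite = { genome: Genome; score: number };

declare function randomGenome(): Genome;
declare function mutate(g: Genome): Genome;
declare function render(g: Genome): Float32Array;
declare function classScores(samples: Float32Array): number[]; // one confidence per class

function mapElites(iterations: number, numClasses: number): (Elite | undefined)[] {
  const container: (Elite | undefined)[] = new Array(numClasses);

  for (let i = 0; i < iterations; i++) {
    // Pick a parent: a random existing elite, or a fresh genome early on.
    const elites = container.filter((e): e is Elite => e !== undefined);
    const parent = elites.length > 0
      ? elites[Math.floor(Math.random() * elites.length)].genome
      : randomGenome();

    const child = mutate(parent);
    const scores = classScores(render(child));

    // The offspring may become the elite of any class where it outperforms
    // the current incumbent; this is where goal switching can happen.
    for (let c = 0; c < numClasses; c++) {
      const incumbent = container[c];
      if (!incumbent || scores[c] > incumbent.score) {
        container[c] = { genome: child, score: scores[c] };
      }
    }
  }
  return container;
}
```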
To feed the Innovation Engine with Sound Objects, we use CPPNs to emit patterns as waveforms (a minimal wiring sketch follows below):
- they can be used as audio signals, either raw or fed through a Digital Signal Processing (DSP) graph
- when combined with a DSP graph, the CPPN patterns can also be used as control signals for various nodes in the DSP graph
The classification model - YAMNet - evaluates the Sound Objects for diversity and quality, and based on that evaluation the QD algorithm - MAP-Elites - potentially declares a new Sound Object the elite in a new class, if it is the highest performer there.
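As a minimal Web Audio sketch of that wiring, assuming the CPPN patterns have already been rendered into Float32Arrays; the single filter node and the cutoff mapping below are illustrative, while the evolved DSP graphs in our runs are larger and more varied:

```typescript
// Sketch: one CPPN output as the audio signal, another as a control signal,
// wired into a small offline Web Audio graph (buffer source -> lowpass filter).
async function renderWithDsp(
  audioPattern: Float32Array,    // CPPN output used as the audio signal
  controlPattern: Float32Array,  // CPPN output used as a control signal
  sampleRate = 48000
): Promise<AudioBuffer> {
  const durationSeconds = audioPattern.length / sampleRate;
  const ctx = new OfflineAudioContext(1, audioPattern.length, sampleRate);

  // Audio signal: play the CPPN pattern from a buffer source.
  const buffer = ctx.createBuffer(1, audioPattern.length, sampleRate);
  buffer.copyToChannel(audioPattern, 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;

  // Control signal: map the CPPN pattern (roughly -1..1) onto the filter cutoff.
  const filter = ctx.createBiquadFilter();
  filter.type = 'lowpass';
  const cutoffCurve = Float32Array.from(controlPattern, v => 200 + (v + 1) * 4000);
  filter.frequency.setValueCurveAtTime(cutoffCurve, 0, durationSeconds);

  source.connect(filter);
  filter.connect(ctx.destination);
  source.start();

  // The rendered buffer is what gets passed to the classifier for evaluation.
  return ctx.startRendering();
}
```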
What did we find?
One quantitative measure is the QD-score, which summarises the performance from all classes in the container / map:
In the plot below we see the red line of our baseline experiment reaching a QD-score of around 300, out of a maximum of 500:
- 500 classes, each with a maximum classifier confidence of 1 (the score computation is sketched below)
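A minimal sketch of that score computation, assuming each filled cell stores its elite's classifier confidence as the score:

```typescript
// Sketch: QD-score as the sum of elite scores over all filled cells in the map.
// With 500 classes and a maximum classifier confidence of 1, the ceiling is 500.
type CellElite = { score: number } | undefined;

function qdScore(container: CellElite[]): number {
  return container.reduce((sum, elite) => sum + (elite?.score ?? 0), 0);
}
```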
But the question came up: why couple CPPNs with DSP graphs at all - why not just let the CPPNs alone emit the pattern of a sound waveform?
So we tried that:
Instead of wiring multiple CPPN outputs to multiple DSP graph nodes, we tried using just one CPPN output for the sound, with no DSP:
We can see that the performance of that configuration, in terms of the QD-score, is lower
- and it comes at the cost of higher CPPN complexity and correspondingly longer rendering times, as we'll see later
We observed more diversity when coupling CPPNs with DSP graphs:
The set of unique elites at the end of CPPN-only runs is smaller than when co-evolving the DSP graphs.
- The distribution of CPPN activation function types is quite uniform in all variants of our runs
- Apart from the stock Web Audio API nodes, custom DSP nodes for wavetable and additive synthesis are prominent
- This indicates that it might be worthwhile to implement other classic synthesis techniques for the DSP part
- CPPN-only runs resulted in more complex function compositions, likely to compensate for the lack of a co-evolving DSP graph
- High scores across most classes
- CPPN + DSP higher overall
- The synthesiser struggles with scoring high on classes for musical genres, such as Flamenco, Jazz and Funk (the blue cluster on the heatmaps below)
- Understandably? - it may be too much to ask of a sound synthesiser to represent whole musical genres, rather than specific sound events and instruments
How did evolution leverage the diversity promoted by our classifier?
We measured the stepping stones across the classes by counting goal switches, defined as “the number of times during a run that a new class champion was the offspring of a champion of another class”, and found a mean of 21.7±3.6 goal switches, which is 63.2% of the 34.3±4.5 mean new champions per class.
This can be compared to the 17.9% goal switches in previous Innovation Engine experiments with image generation.
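A minimal sketch of how such goal switches could be counted from a lineage log, assuming records of new class champions and the genomes they were mutated from; the record format is hypothetical, not our actual log format:

```typescript
// Sketch: counting goal switches, i.e. cases where a new class champion is the
// offspring of a champion of another class.
type ChampionRecord = {
  genomeId: string;
  parentId: string | null;  // null for randomly initialised genomes
  classIndex: number;       // the class this genome just became champion of
};

function countGoalSwitches(records: ChampionRecord[]): number {
  // Remember every class each genome has been champion of so far.
  const championOf = new Map<string, Set<number>>();
  let goalSwitches = 0;

  for (const rec of records) {
    const parentClasses = rec.parentId ? championOf.get(rec.parentId) : undefined;
    // A goal switch: the parent was champion of some class other than this one.
    if (parentClasses && Array.from(parentClasses).some(c => c !== rec.classIndex)) {
      goalSwitches++;
    }
    const own = championOf.get(rec.genomeId) ?? new Set<number>();
    own.add(rec.classIndex);
    championOf.set(rec.genomeId, own);
  }
  return goalSwitches;
}
```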
We were also curious to see how a single-objective search, where all effort is spent on one goal, would compare in the domain of sound.
So we selected 10 classes as single objectives of separate runs and compared the performance and genome complexity with the performance from the QD runs on those same classes.
The single-objective runs scored similarly to the QD runs, though with high variance, and at the cost of much higher genome complexity.
Where the single-class runs did perform better, this somewhat unexpected result may be attributed to the narrow set of chosen classes.
In addition to numbers and plots, it’s also interesting to hear the sounds discovered:
Instead of showing a visual phylogenetic tree of the evolutionary paths, we provide an online explorer that offers sonic access to the results from all our experiments:
There you can scrub through evoruns and their classes, and for each class you can scrub through the generations throughout the run, which can in many cases reveal the goal switching behaviour that we’ve measured.
Here's a short screen recording of interaction with the explorer, where some goal switching behaviour can be observed:
It’s also interesting to observe if the discovered sounds are of any use in music composition or sound organisation.
To test that applicability, the sounds have been loaded into the experimental sampler AudioStellar, which is then configured to drive evolutionary sequences through the sound objects.
Several live streams with such sequencing have been broadcast. Many hours of recordings…
Some of those organised evolutionary sounds can be found on main-streaming services.
So we’ve tried to have silicon-based machines compose stuff with those sounds.
But what can meat-based machines - we - do with those sounds?
Here’s an opportunity to give that a try: by navigating to the dataset accompanying this publication, you can click the Tree tab to easily find the evoruns-render folder, full of WAV files rendered from genomes discovered during our evolutionary runs. Maybe you can come up with interesting compositions of those sounds - AudioStellar is great for getting an overview of the sounds and starting to sequence them; Blockhead is also an interesting DAW for playing with soundfiles.
In conclusion, this work demonstrates that
- applying diversity-promoting algorithms with classifier reward signals is a viable approach to sound discovery
- our current sound synthesis approach can achieve high confidence from a DNN classifier
- the diverse set of sounds generated suggests further explorations with this system