Thermodynamics of harmony: Extending the analogy across musical systems

  • L. Nasser ,

    Contributed equally to this work with: L. Nasser, A. Tillotson, X. Hernandez

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Supervision, Writing – original draft

    lnasser@colum.edu

    Affiliation Department of Science and Mathematics, Columbia College Chicago, Chicago, Illinois, United States of America

  • A. Tillotson ,

    Contributed equally to this work with: L. Nasser, A. Tillotson, X. Hernandez

    Roles Formal analysis, Writing – review & editing

    Affiliation Department of Physics, NYU, Abu Dhabi, United Arab Emirates

  • X. Hernandez

    Contributed equally to this work with: L. Nasser, A. Tillotson, X. Hernandez

    Roles Formal analysis

    Affiliation Instituto de Astronomía, Universidad Nacional Autónoma de México, México, D.F., México

Abstract

Science and art are commonly regarded as disparate, or at most only vaguely related, fields. In physics, one of the biggest successes of thermodynamics is its explanation of how order arises from disordered phases of matter through the minimization of free energy. In 2019, Berezovsky showed [1] that the mechanism describing emergent order from disorder in matter can be used to explain how ordered sets of pitches can arise out of disordered sound, thus bridging the gap between science and the arts in a powerful way. In this paper we analyze his method in detail, clarify some of its details, and generalize it beyond the 12 tone system of intonation of Western music by explicitly considering Gamelan instruments, in the hope of strengthening the method and making it better known and recognized.

Introduction

In his classic treatise “Theory of Harmony” [3], Schoenberg states “Art in its most primitive state is a simple imitation of nature. But it quickly becomes imitation of nature in the wider sense of this idea, that is, not merely imitation of outer but also of inner nature”. Provocative as that statement is, it’s unlikely Schoenberg ever imagined that one day it would be possible to trace the origins of harmony to thermodynamics by direct analogy to the mechanisms that give rise to ordered phases in nature. To our knowledge, the first investigation in this direction was by Berezovsky [1, 8]. In this paper we extend the ideas explored in [1], clarifying and exploring in detail the essential elements needed for the method to work, and generalizing some of the results needed to understand harmony as the result of an ordered phase that arises from interacting sounds. Before we outline how this was done, it is useful to review some basic ideas of music theory.

Harmony refers to the sound of two or more notes played together e.g. [30]. Its rules are based on certain relationships among notes that the human ear will either intuitively accept or reject. These rules are also expressible mathematically and have been the subject of scientific investigation for centuries e.g. [11, 23]. In the Pythagorean tradition it was known that by comparing the sound made by plucking strings of different lengths, the distances between notes (or musical intervals) that the ear found agreeable had lengths in specific ratios, leading to the expression: “There is geometry in the humming of the strings.” Examples of these intervals are the octave, corresponding to lengths in the ratio $2:1$, the perfect fifth for lengths in the ratio $3:2$, the fourth for lengths in the ratio $4:3$, and so forth. In essence, Western harmony is built upon chords that are purposefully constructed upwards from their bass or fundamental note, using other notes whose intervals are perceived as agreeable; a succession of chords is then defined by the distance, or intervals, between their roots.

The use of the word “agreeable” is worthy of note: By necessity it requires human perception which by no means is a mathematically exact standard. Indeed, music can be seen as a language that employs the subtle interplay between the perception of consonance and dissonance to convey an emotional response in the listener. Consonance refers to the combination of notes accepted as “agreeable” or restful. Dissonance refers to combinations of notes that are perceived as tense, and it is a key element in music; dissonance creates movement and gives the music flow between states of tension and relaxation as the dissonance is followed by consonance; without dissonance, music would be static and boring e.g. [31]. Given its crucial role, it is essential to bear in mind that dissonance is a culturally-shaped construct, and thus, it helps us understand why different cultures employ different musical styles and structures to express themselves.

The octave is of very special interest in the construction of a musical system. In physical terms, an octave is an interval defined by a note of fundamental frequency f and another with fundamental frequency 2f. These two notes are essentially perceived as “the same note”, differing only by their rate of vibrations, which manifests in the perception of one being “higher” than the other [11]. A scale is a succession of notes that form a progression from a note to its octave. In Western music, it was established that the octave should be divided into 12 steps. Initially the size of the interval between successive notes was not constant, and this led to problems with intonation [7, 14] that complicated polyphony. To understand why, consider the Pythagorean scale [7]: start from a base note of frequency f and raise it by a fifth, which means multiplying it by $\tfrac{3}{2}$. You then raise the new note by a fifth again, giving a note with frequency $\left(\tfrac{3}{2}\right)^2 f = \tfrac{9}{4}f$. Because $\tfrac{9}{4} > 2$, you bring it down by an octave by dividing by 2, which yields a note of frequency $\tfrac{9}{8}f$, and so forth, until you have 12 notes spanning the octave. To get the notes in the octave above, multiply all your frequencies by 2. To get the ones on the octave below, you divide by 2. However, what happens when you increase by 7 octaves? You get a note with frequency $2^7 f = 128f$. You ought to be able to get the same note by starting with your root note f and raising it by fifths 12 times. Unfortunately, $\left(\tfrac{3}{2}\right)^{12} f \approx 129.75f \neq 128f$. This small difference is known as the Pythagorean comma and it creates a serious intonation problem. As instruments became more sophisticated, new forms of intonation were sought. Still enamoured by the whole number ratios of Pythagoras, Just Intonation used approximations to the Pythagorean ratios that used smaller integers, but this still did not fully solve the issue. For example, the Pythagorean ratios that define a major scale are [20]:

$$1,\ \tfrac{9}{8},\ \tfrac{81}{64},\ \tfrac{4}{3},\ \tfrac{3}{2},\ \tfrac{27}{16},\ \tfrac{243}{128},\ 2$$

which, in spite of having many consonant intervals, gives rise to the Pythagorean comma and the difficulty of polyphony. If we now try to build a scale that maximizes the number of consonant intervals having exact rational ratios, the result is what is known as “Just Temperament”. In this case, the major scale is defined by the following ratios:

$$1,\ \tfrac{9}{8},\ \tfrac{5}{4},\ \tfrac{4}{3},\ \tfrac{3}{2},\ \tfrac{5}{3},\ \tfrac{15}{8},\ 2$$

On the surface, this intonation is just as harmonious as the Pythagorean (same perfect 4th, 5th and octave). It also uses only rational numbers, but with smaller integers: $\tfrac{5}{4}$, $\tfrac{5}{3}$ and $\tfrac{15}{8}$. However, let’s compare the scales of C and D major in just intonation:

The problem is clear: we can see that the notes E, A and B don’t have the same frequencies! This means it would be impossible for transposing instruments to play simultaneously. This persistent difficulty was eventually solved by a brute force compromise: The size of every interval in the 12 note octave was forced to be equal. This solution, known as 12 tone equal temperament (TET) was developed independently by Zhu Zaiyu (1584) [5] and Simon Stevin (1585) [6] and lies at the heart of modern Western Harmony. In this intonation, only the octaves remain exact, and every other interval is slightly flat or sharp, as we can see by comparing a scale of C Major in Just Intonation and in 12 TET:

This 12 TET system is what allows musicians to modulate (change keys) freely and polyphony - as we know it in the western culture - to exist and flourish, even if it comes at a loss of the consonant frequency ratios. The percentage differences may seem small and fussing about them could therefore be seen as pedantic, but they are significant when we consider timbre; every harmonic present in a note will also be shifted, and this can cause beats to become more or less prevalent in the perception of sound when chords are played.
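The arithmetic behind the Pythagorean comma and the small 12 TET deviations can be checked with a short Python sketch. It assumes the standard five-limit just major scale ratios; the names `JUST_MAJOR`, `TET_STEPS` and the helper functions are illustrative choices and are not taken from [20].

```python
from fractions import Fraction

FIFTH = Fraction(3, 2)

# Standard five-limit just major scale ratios (assumed here for illustration).
JUST_MAJOR = {
    "C": Fraction(1), "D": Fraction(9, 8), "E": Fraction(5, 4),
    "F": Fraction(4, 3), "G": Fraction(3, 2), "A": Fraction(5, 3),
    "B": Fraction(15, 8),
}
# Number of equal-tempered semitones above C for each note of the major scale.
TET_STEPS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pythagorean_comma():
    """Exact ratio between twelve stacked fifths and seven octaves."""
    return FIFTH ** 12 / Fraction(2) ** 7


def tet_ratio(semitones):
    """Frequency ratio of an interval of `semitones` steps in 12 TET."""
    return 2.0 ** (semitones / 12.0)


def percent_deviation(note):
    """Percentage by which the 12 TET note departs from its just ratio."""
    just = float(JUST_MAJOR[note])
    return 100.0 * (tet_ratio(TET_STEPS[note]) - just) / just


print("twelve fifths:", float(FIFTH ** 12), "vs seven octaves:", 2 ** 7)
print("Pythagorean comma:", pythagorean_comma(), "=", float(pythagorean_comma()))
for name in JUST_MAJOR:
    print(name, "12 TET deviation from just: %+.3f %%" % percent_deviation(name))
```

Exact fractions are used for the comma so that no rounding enters the comparison; only the equal-tempered ratios, which are irrational, are evaluated in floating point.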

There is a central question that remains: why 12 notes to span the octave? Why not 13, 19 or 36, say? The answer is that it doesn’t have to be 12 notes to the octave at all! As we discuss in Sect 2, when we formulate harmony as an ordered phase that arises from the interaction of sounds, all manner of octave divisions arise. The key distinction rests on how we define dissonance, and that will vary from culture to culture. This is a tantalizing insight: when students learn “music theory” they are essentially learning the harmonic style of 17th century European composers. Thinking of harmony in the same way we think about ordered phases in nature brings all intonations observed across cultures to sit at the same table.

The outline of this paper is as follows: In Sect 1 we will delve into the mathematical formulation of dissonance, which leads to the concept of “roughness” and discuss the importance of timbre. In Sect 2, we investigate the claims made by Berezovsky [1] that it is possible to obtain harmony via the interplay of a musical internal energy (our dissonance function) and entropy, mediated by a constant parameter that is analogous to temperature. We give new analytical arguments, confirmed by numerical computation, to explain why the ordered phase has the number of notes that it does, and clearly show that some of the conditions Berezovsky identified as necessary for this formulation to work in fact are not, meaning the model is more robust than previously stated. We also extend his choice of timbre (Sawtooth with no further comment) to generalize the method. In Sect 3 we present our new results and analyze Indonesian intonation using the thermodynamic analogy.

1. Timbre and dissonance

Whenever we hear a “tone” produced by an acoustic instrument we are never only hearing a single frequency. Each note consists of a fundamental frequency f1 of amplitude A1, and a number of harmonics of that fundamental, each with an amplitude $a(n)$ and a frequency $f_n$ with n>1. The details of both a(n) and $f_n$ are what specify the timbre of a given note. Timbre explains why the “same” note (meaning a note of a given fundamental frequency f1) sounds different when played by different instruments. In other words, each instrument has its own unique timbre.

We can think of music as a collection of notes or tones that are played sequentially and purposefully in time. How this music is perceived will depend on how a listener perceives the interactions between these notes. In this context, it becomes useful to define the dissonance between notes. Let’s begin by considering only two pure sinusoidal oscillations, which can only be obtained from an electronic instrument. For those two sounds we can say that “dissonance” quantifies the extent to which they will sound unpleasant or “rough” when played together. Indeed, Helmholtz attempted to define dissonance entirely based on the beat frequency that results from overlapping two pure sinusoidal waves [11]. For those not familiar, a quick review is in order: consider two waves, $y_1$ and $y_2$, moving to the right. The waves have equal amplitude A, but different wavelengths ($\lambda_1$, $\lambda_2$) and frequencies ($f_1$, $f_2$):

$$y_1(x,t) = A\sin(k_1 x - \omega_1 t) \tag{1}$$
$$y_2(x,t) = A\sin(k_2 x - \omega_2 t) \tag{2}$$

where, for brevity, we have introduced the wave number $k_i = 2\pi/\lambda_i$ and the angular frequency $\omega_i = 2\pi f_i$. When these two waves overlap, the principle of superposition tells us that the resulting wave is $y = y_1 + y_2$. Using the well known angle sum and difference identities $\sin(\alpha \pm \beta) = \sin\alpha\cos\beta \pm \cos\alpha\sin\beta$, it is straightforward to show that

$$y(x,t) = 2A\cos\left(\frac{k_1 - k_2}{2}x - \frac{\omega_1 - \omega_2}{2}t\right)\sin\left(\frac{k_1 + k_2}{2}x - \frac{\omega_1 + \omega_2}{2}t\right) \tag{3}$$

Eq (3) is illustrated in Fig 1, and the bottom plot corresponds to what a microphone would capture if two overlapping pure sinusoids were recorded. We can see from Eq (3) that there are two basic frequencies that characterise $y$. First, we can see a fast oscillation at a frequency $F = (f_1 + f_2)/2$ coming from the sine term, which would be the frequency captured by a microphone. It is important to note that, because the ear functions as a frequency spectrum analyzer, F is not necessarily the frequency a human being will hear. F will only be the frequency heard provided that $|f_1 - f_2|$ is smaller than the resolution of the cochlea. When that happens, a “fused” tone F will be heard. However, as $|f_1 - f_2|$ increases above cochlear resolution, f1 and f2 can be interpreted separately by the ear. This is what allows a listener to hear the multiple notes that form a chord. For greater insight into how perception goes beyond superposition, see [23].

Fig 1. Superposition of two pure sinusoids.

This is a time plot of two plane waves $y_1$ and $y_2$ of different frequency and wavelength but equal amplitude, as well as their superposition taken at the point x = 1 in arbitrary units.

https://doi.org/10.1371/journal.pone.0322385.g001

In addition, Fig 1 shows there is a slower variation in amplitude coming from the cosine term at a frequency $|f_1 - f_2|/2$. Now, because every cycle of the amplitude envelope contains a crest and a trough, and they both represent a maximum in amplitude (and thus intensity) of the wave, the loudness of the wave will pulse at twice this envelope frequency. Indeed, twice the envelope frequency is known as the beat frequency, fB, between two pure tones of frequencies f1 and f2, defined as

$$f_B = |f_1 - f_2| \tag{4}$$

where the absolute value ensures we are only considering the magnitude of the difference in frequencies. If the beat frequency is not a whole number multiple of either one of the frequencies being overlapped, and yet it is large enough to be perceived as an audible tone, this beat frequency will be perceived as “dissonance”. It should be noted that this is not exactly the same thing as musical dissonance, which is more nuanced and has a more complex meaning and use in music e.g. [23]. The Helmholtz definition refers to roughness; a phenomenon due to the beats that occur when the two sounds are played together. Based on this definition, one would expect that when the two notes are of the same frequency (unison) roughness is zero. As the difference in frequency between the notes increases, so does roughness, peaking not far past unison and then decreasing to zero as the difference in frequency between the two pure sinusoids increases.
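The superposition result and the beat frequency can be verified numerically. The sketch below samples both waves at the fixed point x = 0 (an arbitrary choice made here for simplicity), compares their sum with the envelope-times-carrier form, and evaluates the beat frequency; the frequencies 440 Hz and 444 Hz are illustrative values only.

```python
import math


def wave(frequency, t, amplitude=1.0):
    """Right-moving sinusoid A*sin(k*x - w*t) sampled at x = 0."""
    return amplitude * math.sin(-2.0 * math.pi * frequency * t)


def product_form(f1, f2, t, amplitude=1.0):
    """Slow cosine envelope times fast sine carrier, evaluated at x = 0."""
    envelope = 2.0 * amplitude * math.cos(-math.pi * (f1 - f2) * t)
    carrier = math.sin(-math.pi * (f1 + f2) * t)
    return envelope * carrier


def beat_frequency(f1, f2):
    """Beat frequency between two pure tones: magnitude of the difference."""
    return abs(f1 - f2)


F1, F2 = 440.0, 444.0                        # illustrative frequencies in Hz
TIMES = [i / 8000.0 for i in range(8000)]    # one second sampled at 8 kHz

# Largest discrepancy between the direct sum and the product form.
MAX_ERROR = max(
    abs(wave(F1, t) + wave(F2, t) - product_form(F1, F2, t)) for t in TIMES
)

print("max |sum - product form| =", MAX_ERROR)
print("carrier frequency (Hz):", 0.5 * (F1 + F2))
print("beat frequency (Hz):", beat_frequency(F1, F2))
```

The discrepancy stays at the level of floating point rounding, which is what one expects if the two expressions are algebraically identical.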

The perception of dissonance between two real notes is more nuanced and depends on a number of factors:

  1. The physical superposition between the pure harmonics that comprise the notes - their spectral interference and how the notes beat against each other - and thus the amount of roughness in the Helmholtz sense.
  2. Even then, surprises may arise because beat frequencies may give rise to the perception of sound with little to no roughness. For example, if you play two pure sinusoidal frequencies of f1 = 220 Hz (A3, as it is referred to on a piano) and f2 = 330 Hz (a very slightly sharp version of E4, which in modern 12-tone equal temperament corresponds to a frequency of 329.6 Hz), the beat frequency will be fB = |f1 − f2| = 110 Hz. This is exactly half the frequency of f1. In music, we would say that fB is exactly one octave below f1, and it would create the perception of a bass note of A2. This phenomenon is also known as the “missing fundamental”.
  3. The notes produced by acoustic instruments are never pure sinusoidal oscillations; each one is accompanied by a harmonic series of overtones of different amplitudes which constitute the timbre of the note, and the perception of dissonance when they are played together will depend on the specific timbre of the notes. If we were to play an A3 and an E4 on a guitar tuned to modern 12 tone equal temperament, there would be multiple harmonics beating against each other - including a very good approximation to the missing fundamental at 109.6 Hz - and not all of the resulting beat frequencies would create a sensation of "roughness".
  4. Last and by no means least, the perception of musical dissonance will depend significantly on the upbringing, culture and listening habits of the individual.
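To make the third point concrete, the following sketch lists the beat frequencies between the first few harmonics of A3 and of an equal-tempered E4. It assumes ideal harmonic partials at integer multiples of each fundamental; the cutoff `max_beat` and the choice of six partials are arbitrary illustrative parameters.

```python
A3 = 220.0                          # Hz
E4_TET = A3 * 2.0 ** (7.0 / 12.0)   # seven equal-tempered semitones above A3


def partials(fundamental, count):
    """Frequencies of the first `count` ideal harmonic partials."""
    return [n * fundamental for n in range(1, count + 1)]


def beat_pairs(f_low, f_high, count=6, max_beat=120.0):
    """Pairs (k, l, beat) of partial indices whose difference is below max_beat Hz."""
    pairs = []
    for k, fk in enumerate(partials(f_low, count), start=1):
        for l, fl in enumerate(partials(f_high, count), start=1):
            beat = abs(fk - fl)
            if beat <= max_beat:
                pairs.append((k, l, beat))
    return sorted(pairs, key=lambda item: item[2])


PAIRS = beat_pairs(A3, E4_TET)
print("E4 in 12 TET: %.2f Hz" % E4_TET)
for k, l, beat in PAIRS:
    print("A3 partial %d vs E4 partial %d: beat of %.2f Hz" % (k, l, beat))
```

The slowest beat comes from the third partial of A3 against the second partial of E4, which would coincide exactly for a just fifth, while the two fundamentals beat at about 109.6 Hz, close to the missing fundamental.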

We will have to consider the effect of cultural conditioning a bit later; for now, we will only consider the first of the above points as our measure of dissonance, in accord with Plomp and Levelt [2].

If we consider two notes with fundamental frequencies fi and fj, we can define a strength of dissonance interaction between them that we call Dij, and can then write the total dissonance Dtotal as

$$D_{total} = \sum_{i<j} D_{ij} \tag{5}$$

where we will be taking the sum over all the tones in the music being considered. In order to calculate this total dissonance we must first find a way to calculate the various Dij coefficients. This, it turns out, can be very subtle and will depend crucially on cultural factors that determine whether two notes played together are perceived as dissonant.

We can begin by seeking a function that quantifies dissonance between two notes of fundamental frequencies fi and fj. It will be important to know the timbre of these notes, and how strong the interaction of these notes is perceived. If we define a dissonance interaction coefficient , we can write

(6)

where, if two notes overlap in time, the corresponding value of the interaction coefficient would be higher than if they don’t. In the literature one finds different ways to compute the dissonance between two notes. Following [4, 12] we first calculate the pure tone roughness between two pure frequencies fk and fl of amplitudes A = 1 that have no harmonics, i.e. an = 0 for n>1; these two frequencies would have to be generated electronically. We define this roughness function for pure tones as

(7)

where we define the pitch difference, $\Delta x$, as

$$\Delta x = \left|\log_2\frac{f_k}{f_l}\right| \tag{8}$$

This definition of $\Delta x$ is convenient to identify octaves; in music, an octave is a span of frequencies between f and 2f that are perceived as being the “same” note because every harmonic of 2f is contained in the harmonics of f. Here, although we are dealing with pure tones that have no harmonics, we can still count octaves conveniently: if $f_k = 2f_l$, we have that $\Delta x = 1$. We can see in the literature that the pure tone roughness function will increase as a function of $\Delta x$ up to a critical value wc, after which it will decrease for increasing values of $\Delta x$. It is essential that the two-tone roughness vanishes at $\Delta x = 1$. Otherwise, the implication is that roughness will be perceived for a perfect octave, which is unphysical given that roughness is a consequence of beats. There are many different ways in which authors parametrize the two-tone roughness function in the literature, and it pays to be careful. Plomp and Levelt [2] correctly plot a two-tone consonance function which is zero at both $\Delta x = 0$ and $\Delta x = 1$. They do not give an analytical expression for their function, but in various studies several analytical expressions have been fit to the qualitative expectations found in the literature [2]. The following expression for the two-tone roughness was used as a fit in [1]:

(9)

Our aim is to use Eq (9) to calculate the dissonance between two real notes, taking into account their timbres. However, to do so we need to have the value of wc, and there is a subtlety in [1] worth clarifying. Human hearing ranges from a minimum audible frequency Hz to a maximum audible frequency Hz. In [1], it is reported that for frequencies in the range of fmin, Eq (10) yields , while for the higher audible frequencies . It is then proposed that

(10)

However, in the range from 10,000 to 20,000 Hz, Eq (10) yields . We believe this is a typo, and the parametrization for wc that accurately gives the range of observed values reported in [1] is given by

(11)

We remark on this discrepancy because as we will see, how we quantify dissonance plays a crucial role in the analogy we draw between thermodynamics and harmony. We can now finally calculate the dissonance between two real notes by summing over all pairs of pure partials:

(12)

where the Ikl is a parameter given by

(13)

that estimates the perceived “loudness” of the pair of harmonics [4]. It is useful to recall that a given note of frequency fi will have many audible partials, characterized by parameters of amplitude a(n) and frequency $f_n$, which together specify the note’s timbre. As an example, we can use Eq (12) to calculate the dissonance between two real notes. We assume the timbre of the notes is that of a sawtooth waveform, with $a(n) = 1/n$, and add up to 10 partials to obtain D(x) as shown in Fig 2.

Fig 2. Dissonance between two real notes with a sawtooth timbre.

Plot of the dissonance function D(x) as a function of the pitch difference x. In this case, wc = 0.03, and we see that D is zero for the unison, and zero again at the octaves (x = 1 and x = 2). Vertical dashed lines represent the twelve-tone equal temperament (12 TET) octave division.

https://doi.org/10.1371/journal.pone.0322385.g002

Other minima are clearly visible in Fig 2. For example, the second largest minimum occurs at x = 0.585. This means that the frequency ratio is $2^{0.585} \approx 3/2$, which corresponds to a perfect fifth interval, which is commonly used in music. Other minima occur at other common musical intervals, for example when x = 0.322, which corresponds to a major third where the frequency ratio is $2^{0.322} \approx 5/4$, or when x = 0.415, which happens when the frequency ratio is $2^{0.415} \approx 4/3$ and corresponds to a perfect fourth, etc.
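The appearance of dips at simple frequency ratios can be reproduced with a small Python sketch. Two ingredients are assumptions made here and are not the expressions of Eq (9) or Eq (13): the pure tone roughness is modelled by a stand-in curve that is zero at unison, peaks at wc and decays afterwards, and the loudness weight of a pair of partials is taken as the smaller of the two amplitudes. Only cross terms between the two notes are summed, and all function names are illustrative.

```python
import math

W_C = 0.03        # critical pitch difference quoted for Fig 2
N_PARTIALS = 10   # number of sawtooth partials, as in the text


def pure_tone_roughness(delta_x, w_c=W_C):
    """ASSUMED stand-in roughness: 0 at unison, 1 at w_c, decaying beyond."""
    delta_x = abs(delta_x)
    if delta_x == 0.0:
        return 0.0
    return math.exp(-math.log(delta_x / w_c) ** 2)


def sawtooth_amplitude(n):
    """Amplitude of the n-th sawtooth partial, a(n) = 1/n."""
    return 1.0 / n


def dissonance(x, n_partials=N_PARTIALS, w_c=W_C):
    """Sum of weighted roughness over all pairs of partials of two notes
    whose fundamentals are separated by x octaves."""
    total = 0.0
    for k in range(1, n_partials + 1):
        for l in range(1, n_partials + 1):
            # ASSUMED loudness weight: the smaller of the two amplitudes.
            weight = min(sawtooth_amplitude(k), sawtooth_amplitude(l))
            # Pitch difference between partial l of the upper note
            # and partial k of the lower note.
            delta = x + math.log2(l / k)
            total += weight * pure_tone_roughness(delta, w_c)
    return total


def is_dip(x, half_width=0.02):
    """True if D(x) is lower than at both neighbours half_width away."""
    centre = dissonance(x)
    return centre < dissonance(x - half_width) and centre < dissonance(x + half_width)


for name, ratio in (("fifth", 3 / 2), ("fourth", 4 / 3), ("major third", 5 / 4)):
    x = math.log2(ratio)
    print("%-12s x = %.3f  D = %.3f  dip: %s" % (name, x, dissonance(x), is_dip(x)))
```

With these stand-ins the dips arise because several pairs of partials coincide exactly at a simple ratio, so their roughness contributions vanish together; the absolute values of D depend on the assumed curve and weight.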

One interesting result that can be obtained from Fig 2 is that for the sawtooth timbre, there are a total of 12 minima within an octave, each one corresponding to pitches with simple, rational frequency ratios. We can use the minima in the dissonance function to understand why the octave is divided into 12 pitches in Western music. Other cultures have a different concept of dissonance, and therefore, as we will see later on, the minima distribution of the corresponding D(x) is different, leading to octaves that can be divided in 5, 7 and other numbers of pitches.

One final remark is in order: Eq (9) satisfies the necessary condition that the roughness vanishes at unison. However, it does not vanish exactly at a pitch difference of one octave. For small values of wc the roughness at the octave is approximately zero, but as the value of wc increases this approximation may no longer be valid. For example, if wc is small, say wc = 0.03, Eq (9) is effectively zero at the octave. However, if we increase to wc = 0.22, the condition is no longer satisfied, as shown in Fig 3. This deviation from zero at the octave is something worth investigating to make sure it bears no effect on the results presented in [1]. To do so, we can multiply Eq (9) by an arbitrary factor q(x) chosen to force the zero at the octave:

(14)
Fig 3. Roughness between two pure sinusoidal tones.

Plot of the two pure tone roughness function as a function of the pitch difference for wc = 0.03 and wc = 0.22, where we see that as the pitch difference increases, the two pure tone roughness decreases to zero only for the smaller value of wc.

https://doi.org/10.1371/journal.pone.0322385.g003

If we use Eq (12) to calculate D(x) for the pure two-tone dissonance forced to be equal to zero at x = 1 and compare to the result obtained without this condition being met, i.e. using Eq (9), shown in Fig 3, we obtain Fig 4, from which we can see that there is no inherent problem with the fact that Eq (9) is not exactly equal to zero at x = 1.

Fig 4. Roughness between two notes.

Plot of D(x) for both (blue line) and (orange line).

https://doi.org/10.1371/journal.pone.0322385.g004
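The size of the residual roughness at the octave, and the effect of forcing it to zero, can be illustrated with a brief sketch. Both functions below are assumptions made for illustration: the roughness curve is a stand-in with the qualitative shape described in Sect 1, not Eq (9) itself, and the factor `1 - |delta_x|` is a hypothetical choice of q(x), not the factor of Eq (14).

```python
import math


def pure_tone_roughness(delta_x, w_c):
    """ASSUMED stand-in roughness: 0 at unison, 1 at w_c, decaying beyond."""
    delta_x = abs(delta_x)
    if delta_x == 0.0:
        return 0.0
    return math.exp(-math.log(delta_x / w_c) ** 2)


def forced_roughness(delta_x, w_c):
    """Same curve times a HYPOTHETICAL factor q that vanishes at the octave."""
    q = max(0.0, 1.0 - abs(delta_x))
    return q * pure_tone_roughness(delta_x, w_c)


for w_c in (0.03, 0.22):
    print(
        "w_c = %.2f: roughness at octave = %.2e, forced = %.1f, peak value = %.3f"
        % (w_c, pure_tone_roughness(1.0, w_c), forced_roughness(1.0, w_c),
           forced_roughness(w_c, w_c))
    )
```

For the stand-in curve the value at the octave is negligible when wc = 0.03 and of the order of a tenth of the peak when wc = 0.22, which mirrors the qualitative behaviour described above; the precise numbers depend on the assumed functional form.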

2. Music as a thermodynamic system

Thermodynamics is a powerful framework that allows us to understand how order arises from disordered states of matter via the minimization of the Helmholtz free energy F given by:

$$F = U - TS \tag{15}$$

where U is the internal energy of the thermodynamic system and TS is the amount of that energy that is disordered. Here S is the entropy and T denotes the thermodynamic temperature, a parameter that mediates the tradeoff between the decreasing U and the increasing S, which yields a minimum value for F. The tantalizing idea explored by Berezovsky was that it might be possible to quantify harmony in a similar way: ordered phases of sound arising from disordered sound. In this case, the key point was to introduce a musical entropy calculated in terms of information - the number of notes per octave - and to allow it to grow against a decreasing musical energy he identified as the total dissonance, calculated using Eqs (6) and (12). In this calculation, we are therefore minimizing the value of a “musical” free energy FM given by

$$F_M = D_{tot} - TS \tag{16}$$

where T is a parameter we call temperature by analogy with thermodynamics, which in this case mediates the tradeoff between Dtot and S.

Clearly, the thermodynamic formulation presented above follows from an analogy between music and physics, not from any formal derivation. While the identification of entropy is consistent with information content formulations e.g. Shannon entropy (see also [28, 29]), the identification of energy as total dissonance can only be justified heuristically. As shown in the following sections, this analogy results in the emergence of evenly spaced frequency intervals, which hints at the relevance of the approach in the context of music. Presently however, the underlying foundations of the analogy remain the subject of speculation. A formal derivation of the exact interpretation of dissonance (in the sense of music) as energy will necessarily require a more thorough understanding of the problem in terms of the physiology of musical perception and the conscious perception of the problem, which lie beyond our current study. We note that broad correspondences and affinity of structure between disciplines as apparently remote as arts and sciences have been evidenced before, e.g. the universality of rank-ordering distributions shown by [25], the compatible spectral density of fluctuations in music and various physical phenomena studied in [26], the departures from such spectral densities presented in [27], or the rank-ordering distribution in various musical styles analysed in [24].

Calculation of the probability distribution function of relative pitches

The idea is to calculate P(x) - the distribution function of relative pitches x that occur in music - such that the free energy given by Eq (16) is minimized, subject to the constraint that P(x) is normalized. The relative pitches x in music are defined as

(17)

We calculate the total dissonance Dtot as

(18)

where Dp is for now defined by adding over all octaves as:

(19)

We will explore altering this definition later. Eq (19) is illustrated in Fig 5. Similarly, the entropy is written as:

$$S = -\int P(x)\ln P(x)\,dx \tag{20}$$
Fig 5. Total Dissonance between two real notes with a sawtooth timbre.

Plot of the total dissonance function DP(x) as a function of the pitch difference x. In this case, wc = 0.03, and we see that D is zero for the unison, and zero again at the octaves (x = 1 and x = 2). We also note the symmetry about the point x = 0.5

https://doi.org/10.1371/journal.pone.0322385.g005

The functional we must minimize then becomes

(21)

subject to the normalization condition

$$\int P(x)\,dx = 1 \tag{22}$$

Using a Lagrange multiplier , we want to calculate an extremum of a new functional given by

(23)

This gives

(24)

where

(25)

putting Eq (25) into (24), we obtain

(26)

setting the Lagrange multiplier we then obtain that

(27)

where P0 is a normalization constant given by

(28)

and therefore, the equilibrium distribution of relative pitches that minimizes the musical free energy, P(x), is given by

(29)
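A numerical route to an equilibrium distribution of this kind is a damped fixed-point iteration on a pitch grid. The sketch below rests on several assumptions that are not taken from the text: the total dissonance is modelled in mean-field form as a double integral of P(x) P(x') Dp(x − x'), which gives a prefactor of 2 in the exponent; the roughness curve and the loudness weight are stand-ins rather than Eqs (9) and (13); the octave sum is truncated to the four nearest octave images; and the grid size, mixing factor and iteration count are arbitrary.

```python
import math

W_C = 0.03        # critical pitch difference used for the sawtooth example
N_PARTIALS = 10   # number of sawtooth partials
N_GRID = 120      # grid points across one octave (illustrative)


def pure_tone_roughness(delta_x, w_c=W_C):
    """ASSUMED stand-in roughness: 0 at unison, 1 at w_c, decaying beyond."""
    delta_x = abs(delta_x)
    if delta_x == 0.0:
        return 0.0
    return math.exp(-math.log(delta_x / w_c) ** 2)


def periodic_dissonance(x, w_c=W_C):
    """Octave-summed dissonance for two sawtooth notes x octaves apart,
    truncated to the four nearest octave images of each pair of partials."""
    total = 0.0
    for k in range(1, N_PARTIALS + 1):
        for l in range(1, N_PARTIALS + 1):
            weight = min(1.0 / k, 1.0 / l)        # ASSUMED loudness weight
            y = (x + math.log2(l / k)) % 1.0      # pitch difference folded into one octave
            for image in (y, 1.0 - y, 1.0 + y, 2.0 - y):
                total += weight * pure_tone_roughness(image, w_c)
    return total


def equilibrium_distribution(temperature, d_p, iterations=40, mixing=0.5):
    """Damped iteration of P ~ exp(-2 (D_p * P) / T), renormalized each step."""
    n = len(d_p)
    dx = 1.0 / n
    # Start from a uniform distribution with a small periodic perturbation.
    p = [1.0 + 0.01 * math.cos(2.0 * math.pi * 12 * i / n) for i in range(n)]
    norm = sum(p) * dx
    p = [value / norm for value in p]
    for _ in range(iterations):
        # Circular convolution of the current P with the periodic dissonance.
        field = [sum(p[j] * d_p[(i - j) % n] for j in range(n)) * dx for i in range(n)]
        base = min(field)  # subtracting a constant only changes the normalization
        new = [math.exp(-2.0 * (f - base) / temperature) for f in field]
        norm = sum(new) * dx
        p = [(1.0 - mixing) * old + mixing * value / norm for old, value in zip(p, new)]
    return p


D_P = [periodic_dissonance(i / N_GRID) for i in range(N_GRID)]
P_HOT = equilibrium_distribution(1000.0, D_P)
P_COOL = equilibrium_distribution(1.0, D_P)
for label, p in (("T = 1000", P_HOT), ("T = 1", P_COOL)):
    print(label, "max/min of P(x):", max(p) / min(p))
```

At high temperature the iteration returns an essentially flat P(x), as expected when entropy dominates. Whether and where structure appears at lower temperatures depends on the assumed roughness curve, weight and prefactor, so the low-temperature output should be read as qualitative only.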

Eq (19) ensures that a given musical system is preserved across octaves, so that the relative pitches x and x + n, where n is an integer, are treated as equivalent, i.e. a melody will be recognized as the same when played in any octave. Now, because Dp(x) is an even function of x, it follows that its Fourier expansion may be written as

$$D_p(x) = \sum_{n=0}^{\infty} d_n \cos(2\pi n x) \tag{30}$$

If we insert Eq (30) into (29), we have:

(31)

where

(32)

and

(33)

3. Results

Periodicity Prediction for P(x)

Both of the integrals given in Eqs (32) and (33) are seen to be approximately constant and of O(1) over a wide range of values of n. The presence of an audible upper limit of 20,000 Hz implies that in practice, the sum in Eq (31) will not have an infinite number of terms. Intuitively, one would expect that the behavior of the entire P(x) will be dominated by the terms in the sum with the largest amplitudes. Given the exponential function in Eq (29), the largest term present in Eq (30) will significantly dominate over all others. Indeed, at sufficiently high temperatures, all terms in the sum (Eq (31)) will be much smaller than 1, and P(x) will tend to a constant. As the temperature decreases, the largest term in the sum, with index nmax set by DP(x), will cross the unity threshold and dominate over all other terms, leading to an nmax-periodic P(x) solution. It is then expected that below a certain critical temperature, Tc2, to very good approximation over the entire range $0 \le x \le 1$, we can write

(34)

where dmax is the maximum Fourier coefficient of Dp and nmax tells us for which harmonic this maximum occurs. Above Tc2, P(x) will tend to a constant.

This simple analytical interpretation has been borne out by various numerical experiments shown in Figs 6, 7, and 8.

Fig 6. Fourier spectrum of DP(x) as a function of k.

Plot of the Fourier representation of DP(x) as a function of k with a sawtooth timbre structure and harmonic amplitudes decreasing as 1/n, for three values of wc, and we see that largest Fourier coefficient is the 19th when wc = 0.018, the 12th when wc = 0.03 and the 5th when wc = 0.07.

https://doi.org/10.1371/journal.pone.0322385.g006

Fig 7. P(x) for selected temperatures for wc = 0.018, 0.03, and 0.07.

Note that as temperature increases, the solution tends toward a periodic one with a single dominant Fourier coefficient. In the wc = 0.018 case, nmax = 19. In the wc = 0.03 case, nmax = 12. In the wc = 0.07 case, nmax = 5.

https://doi.org/10.1371/journal.pone.0322385.g007

Fig 8. Lowest twelve Fourier harmonic amplitudes from the decomposition of P(x) over a range of temperatures using wc = 0.03.

Between the two critical temperatures Tc1 and Tc2, the dominant component contributing to P(x) is kP = 12, leading to the periodic solution seen in Fig 7.

https://doi.org/10.1371/journal.pone.0322385.g008

These plots illustrate that the number of peaks of P(x) corresponds to nmax, and therefore depends on the value of wc. Indeed, the width of the peaks of DP(x) is directly proportional to the value of wc, and therefore the value of wc also determines how many notes will fill an octave.

As we have shown numerically, the number of peaks in the resulting P(x) depends sensitively on wc. This is clear from the analysis presented, as a small wc implies narrower features in both D(x) and DP(x). Indeed, the width of the leading features near x = 0 and x = 1 of these functions scales directly with wc. Narrow features in D(x) and DP(x) imply higher frequency spectral features and hence larger dominant Fourier components in DP(x). This in turn leads to P(x) solutions with a higher frequency periodicity. Conversely, going to larger wc values results in lower Fourier frequency DP(x) decompositions, and hence lower frequency P(x) periodicity, as shown explicitly in the examples above. See reference [1].
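The dependence of the dominant Fourier component on wc can be explored with the following sketch, which samples an octave-periodic dissonance on a grid and evaluates its cosine coefficients by direct summation. As before, the roughness curve and the loudness weight are stand-ins assumed for illustration, and the octave sum is truncated to the four nearest images, so the harmonic index reported for each wc need not coincide with the values quoted for Fig 6.

```python
import math

N_PARTIALS = 10     # sawtooth partials, amplitudes 1/n
N_GRID = 256        # samples across one octave (illustrative)
MAX_HARMONIC = 30   # highest Fourier harmonic inspected


def pure_tone_roughness(delta_x, w_c):
    """ASSUMED stand-in roughness: 0 at unison, 1 at w_c, decaying beyond."""
    delta_x = abs(delta_x)
    if delta_x == 0.0:
        return 0.0
    return math.exp(-math.log(delta_x / w_c) ** 2)


def periodic_dissonance(x, w_c):
    """Octave-summed dissonance, truncated to the four nearest octave images."""
    total = 0.0
    for k in range(1, N_PARTIALS + 1):
        for l in range(1, N_PARTIALS + 1):
            weight = min(1.0 / k, 1.0 / l)        # ASSUMED loudness weight
            y = (x + math.log2(l / k)) % 1.0
            for image in (y, 1.0 - y, 1.0 + y, 2.0 - y):
                total += weight * pure_tone_roughness(image, w_c)
    return total


def sample_periodic_dissonance(w_c, n_grid=N_GRID):
    """Samples of the periodic dissonance at x = i / n_grid."""
    return [periodic_dissonance(i / n_grid, w_c) for i in range(n_grid)]


def cosine_coefficients(samples, max_harmonic=MAX_HARMONIC):
    """Cosine-series coefficients for harmonics 1..max_harmonic."""
    n = len(samples)
    return [
        2.0 / n * sum(s * math.cos(2.0 * math.pi * h * i / n) for i, s in enumerate(samples))
        for h in range(1, max_harmonic + 1)
    ]


def dominant_harmonic(w_c):
    """Index of the harmonic with the largest coefficient magnitude."""
    coefficients = cosine_coefficients(sample_periodic_dissonance(w_c))
    best = max(range(len(coefficients)), key=lambda h: abs(coefficients[h]))
    return best + 1


DOMINANT = {w_c: dominant_harmonic(w_c) for w_c in (0.018, 0.03, 0.07)}
for w_c, harmonic in DOMINANT.items():
    print("w_c = %.3f: dominant harmonic of the sampled dissonance = %d" % (w_c, harmonic))
```

Direct summation is used instead of a fast Fourier transform to keep the sketch dependency-free; with a few hundred grid points the cost is negligible.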

Confirming Berezovsky’s results

To confirm the claim that P(x) will be dominated by the predicted Fourier coefficient as described above and shown in Fig 7, we performed temperature scans for low, intermediate and high wc values. The results are summarized in Fig 7.

Once P(x) had been numerically calculated, we performed a Fourier decomposition of P(x), arriving at a series of amplitudes for the integer harmonics kP = 1 to 12. Fig 8 shows these amplitudes for wc = 0.03 over a range of temperatures. As also demonstrated by Berezovsky, we see the emergence of two critical temperatures, Tc1 and Tc2. For temperatures lower than Tc1, all kP contribute to P(x). As the temperature increases above Tc1, we see the dominance of kP = 12, as predicted by the analysis in Results Sect 1. For T > Tc2, P(x) tends to a constant, so kP = 0 (the constant coefficient, not plotted in the figure) is the only remaining nonzero component.

Is the model robust to timbre variations?

We next investigate whether varying the timbre affects the results significantly. In addition to the sawtooth timbre already discussed, we run the same model for three alternative timbres (triangle, square and a sample of a human voice), keeping the periodicity over one octave, as prescribed by Eq (19).

The square and triangle wave harmonic series notably contain only odd harmonics. Disregarding constant multiplicative factors, these harmonic amplitudes can be written:

In all these tests, we include harmonics only up to n = 10, as we did earlier with the sawtooth wave. As a final timbre test, we recorded the spectrum of a male human voice singing the long-“e” vowel sound and measured the first 10 harmonic amplitudes to be

taking a(1) to be unity by definition. This vocal sample was sung by Roey Ben-Yoseph on the album “A Sky Full of Ghosts” by the band Sonus Umbra, on which both A. Tillotson and L. Nasser performed and produced.
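For the three synthetic timbres, the harmonic amplitudes can be tabulated with a short sketch. It uses the textbook Fourier amplitudes of ideal sawtooth, square and triangle waves, up to constant factors and signs. Whether the model works with these amplitudes directly or with some function of them is not assumed here, and the measured voice amplitudes are not reproduced. The function name `timbre_amplitudes` is hypothetical.

```python
def timbre_amplitudes(kind, n_max=10):
    """Relative amplitude a(n) of harmonics n = 1..n_max for an ideal waveform.

    Textbook values, ignoring constant factors and signs:
      sawtooth: 1/n for every n
      square:   1/n for odd n, 0 for even n
      triangle: 1/n^2 for odd n, 0 for even n
    """
    amplitudes = {}
    for n in range(1, n_max + 1):
        if kind == "sawtooth":
            amplitudes[n] = 1.0 / n
        elif kind == "square":
            amplitudes[n] = 1.0 / n if n % 2 == 1 else 0.0
        elif kind == "triangle":
            amplitudes[n] = 1.0 / n**2 if n % 2 == 1 else 0.0
        else:
            raise ValueError(f"unknown timbre: {kind}")
    return amplitudes

for kind in ("sawtooth", "square", "triangle"):
    print(kind, timbre_amplitudes(kind))
```

With the cutoff at n = 10, the square and triangle timbres each contribute only five nonzero harmonics (n = 1, 3, 5, 7, 9), and the triangle amplitudes fall off much faster than the other two.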

In Fig 9 we show the calculated P(x) for all four timbres at two temperatures, one slightly below the critical temperature and another slightly above. We can see from these comparisons that the model is fundamentally robust to a variety of integer harmonic timbres. However, we note that the transition rates around the critical temperatures can vary significantly with timbre, and if more than 10 harmonics are used in the calculation of DP(x), we do start to see some significant qualitative departures from the results shown in Fig 9.

Fig 9. Calculated P(x) for four different harmonic timbres at two temperatures, one slightly below critical temperature (left), and another slightly above (right).

From top to bottom, the timbres used were sawtooth, triangle, square, and human voice.

https://doi.org/10.1371/journal.pone.0322385.g009

Relaxing the summation constraint

We next investigate what happens with this model if we relax the constraint described in Eq (19). Note that since we are summing over positive and negative octaves, this definition of DP(x) is symmetric about the midpoint of its domain, x = 0.5. That is, DP(x) = DP(1 − x). For this test we simply set DP(x) = D(x), which means the symmetry of DP(x) about the domain midpoint is no longer present. Since we no longer expect the resulting P(x) to be necessarily periodic over the octave, we extend the domain of our calculation to three octaves (an arbitrary choice), in order to investigate whether the octave periodicity deteriorates.
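The role of the octave summation in producing this symmetry can be checked numerically with a rough sketch. Several ingredients here are assumptions rather than the definitions used in this paper: the pairwise kernel exp(−ln²(|Δx|/wc)) is only a stand-in for a Plomp-Levelt-type dissonance curve, the sawtooth weights 1/n are the textbook amplitudes, and the function names are hypothetical.

```python
import numpy as np

def pair_dissonance(dx, wc):
    """Assumed kernel: zero at dx = 0, maximal at |dx| = wc (dx in octaves)."""
    dx = np.abs(np.asarray(dx, dtype=float))
    out = np.zeros_like(dx)
    nonzero = dx > 0
    out[nonzero] = np.exp(-np.log(dx[nonzero] / wc) ** 2)
    return out

def dissonance(x, wc=0.03, n_max=10):
    """D(x): dissonance summed over all harmonic pairs of two tones separated
    by x octaves, with assumed sawtooth weights 1/n."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(1, n_max + 1):
        for m in range(1, n_max + 1):
            total += pair_dissonance(x + np.log2(n / m), wc) / (n * m)
    return total

def octave_summed(x, octaves=6, wc=0.03, n_max=10):
    """DP(x): D(x) summed over positive and negative octave shifts."""
    return sum(dissonance(x + j, wc, n_max) for j in range(-octaves, octaves + 1))

x = np.linspace(0.01, 0.99, 99)
# With the octave sum, DP(x) matches DP(1 - x) and DP(x + 1) up to tiny tails.
sym_gap = np.max(np.abs(octave_summed(x) - octave_summed(1.0 - x)))
period_gap = np.max(np.abs(octave_summed(x) - octave_summed(x + 1.0)))
# Without the sum, D(x) and D(1 - x) differ visibly.
raw_gap = np.max(np.abs(dissonance(x) - dissonance(1.0 - x)))
print(sym_gap, period_gap, raw_gap)
```

In this sketch the symmetry follows from D(x) being even in x together with the sum running over both signs of the octave shift; dropping the sum leaves only the evenness about x = 0.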

We once again return to the sawtooth waveform and perform a temperature scan over the three-octave domain, setting wc = 0.03. Fig 10 shows P(x) at a few of these temperatures. These results show that even if we relax the octave constraint in the definition of DP(x) and in the chosen domain size, we still see a tendency toward 12 pitches per octave. We can also see that P(x) remains larger at the domain boundaries than in the middle, even for high temperatures where we usually expect to find a flat P(x). This is possibly because at those extreme x values, pitches can no longer “interact over the periodicity.” That is, pitches just above x = 0 simply do not interact with pitches just below x = 3, and so less dissonance “piles up” there. By this logic, it also makes sense that P(x) tends toward a global minimum near the middle of the domain.

Fig 10. P(x) over 3 octaves for the sawtooth waveform and wc = 0.03.

In this case, we do not impose periodicity or symmetry over the octaves, and instead simply define DP(x) = D(x).

https://doi.org/10.1371/journal.pone.0322385.g010

To justify this claim, we perform another test where we only force DP(x) to be symmetric about the midpoint of the domain by defining

over the domain from x = 0 to 3. The resulting P(x) are shown in Fig 11. Note that these are exactly the same results as the full octave summation (Eq (4) in [1]), shown in Fig 7, just repeated over three octaves. As predicted, by reintroducing only this particular DP(x) symmetry, we no longer see P(x) increasing close to the domain boundaries. Both of these results seem to support the idea that the octave simply emerges as “natural” for an integer harmonic series, even when we do not sum over the octaves or impose a pure octave domain.

Fig 11. Same conditions as Fig 10, but forcing DP(x) to be symmetric about the domain midpoint.

https://doi.org/10.1371/journal.pone.0322385.g011

Gamelan

We will now explore a special group of instruments from the Indonesian Gamelan tradition, which primarily features metallic percussion instruments. Unlike instruments built to exploit standing waves in pipes and strings, these instruments produce non-integer harmonics. The bonang and saron are central to this ensemble, with the bonang made up of small gongs and the saron consisting of metal bars, both arranged horizontally on racks. Each gong or bar is precisely tuned to a specific pitch and played by striking it with either padded or hard mallets. The saron typically plays the “balungan”, or core melody, which serves as the foundation of a gamelan piece, while the bonang and other instruments add embellishments around this central structure. In contrast with traditional Western musical instruments, largely dominated by strings and pipes, the spectra of Gamelan instruments reveal peaks that do not occur in integer ratios, as the vibrating element is not a 1-D system but a more complicated 2-D or 3-D structure [7].

The prevailing musical scales of the Gamelan system are the five-pitch Slendro and the seven-pitch Pelog, both of which are significantly different in character from the Western chromatic divisions. Both of these scales are unevenly spaced within the span of a single octave, and the periodicity of the scale is usually not even marked by standard octave divisions (frequency doublings, or 1200 cents), but rather by an amount that is usually slightly larger (around 1210 cents); see [4], page 213. In addition, the precise values of scale pitches sometimes vary from octave to octave.
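To make the cent values concrete, the following sketch converts between cents, frequency ratios and octave units. It assumes that the pitch coordinate x is measured in octaves (log base 2 of the frequency ratio), so that one unit of x is 1200 cents. The helper names are hypothetical.

```python
import math

def octaves_to_cents(x):
    """Convert an interval in octaves (log2 of the frequency ratio) to cents."""
    return 1200.0 * x

def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval in cents."""
    return 2.0 ** (cents / 1200.0)

def ratio_to_cents(ratio):
    """Interval in cents corresponding to a frequency ratio."""
    return 1200.0 * math.log2(ratio)

pure_octave = cents_to_ratio(1200.0)       # exactly a doubling
stretched_octave = cents_to_ratio(1210.0)  # slightly more than a doubling
print(pure_octave, stretched_octave)
```

A repeat interval of 1210 cents therefore corresponds to a frequency ratio a little above 2.01 instead of exactly 2.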

We wish to investigate whether our model can reproduce these scales. For the bonang harmonics, we have used

For the saron,

both of which are based on data reported in [4]. Since we do not wish to impose the octave as a natural musical division, we will once again allow our domain to span three octaves, and we will not sum over the octave. In other words, once again we will simply set DP(x) = D(x).

We can see in Fig 12 the predicted maximum negative Fourier coefficient per octave of DP(x) for both of these harmonic series as a function of wc. The Slendro 5-pitch scale is associated with the bonang, and so we choose a wc value for which the predicted periodicity is five pitches per octave (wc = 0.05). The results of this P(x) calculation for the bonang harmonic series are shown in Fig 13. We can see clear evidence for something similar to the 5-pitch Slendro scale, the standard intervals of which we have indicated with dashed vertical lines. Interestingly, we can also see that the octave is stretched by about 18 cents, consistent with observations of typical Slendro tunings [4].
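One way to read an octave stretch off a computed P(x) is to locate its peaks and convert their mean spacing to cents. The sketch below does this for a synthetic stand-in, not the bonang result itself: a cosine with five equally spaced peaks per 1218-cent period, sampled over three octaves. Equal spacing is a simplification, since Slendro intervals are uneven, and the function names are hypothetical.

```python
import numpy as np

def peak_positions(x, p):
    """x values of the strict interior local maxima of a sampled curve."""
    interior = (p[1:-1] > p[:-2]) & (p[1:-1] > p[2:])
    return x[1:-1][interior]

def repeat_interval_cents(peaks, pitches_per_period):
    """Estimate the scale's repeat interval from the mean peak spacing,
    assuming x is measured in octaves (1 octave = 1200 cents)."""
    mean_spacing = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return 1200.0 * pitches_per_period * mean_spacing

# Synthetic stand-in for P(x): five peaks per period of 1218 cents.
period = 1218.0 / 1200.0
x = np.linspace(0.0, 3.0, 30001)
p = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * x / period)

peaks = peak_positions(x, p)
stretch = repeat_interval_cents(peaks, 5) - 1200.0
print(len(peaks), stretch)
```

For an unevenly spaced scale, a more careful estimate would compare each peak with the corresponding peak one period higher rather than averaging all spacings.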

Fig 12. Maximum negative Fourier coefficient of DP(x), as a function of wc, and thus a prediction of the periodicity of P(x) over the octave for each harmonic series, bonang and saron.

https://doi.org/10.1371/journal.pone.0322385.g012

Fig 13. Select regions of the three-octave P(x) for each harmonic series, bonang and saron.

Dashed lines indicate five-pitch Slendro scale divisions on the left, and seven-pitch Pelog scale divisions on the right.

https://doi.org/10.1371/journal.pone.0322385.g013

For the saron, which is typically tuned to the 7-pitch Pelog scale, we can see there are no values of wc that correspond to seven pitches per octave. Fig 12, right, shows that the majority of wc values predict nine. When we choose one of these values (wc = 0.04), we do indeed see the emergence of 9 peaks in the P(x) prediction, as shown in Fig 13. Typical tunings of the uneven Pelog 7-pitch scale are shown with dashed lines, and these lines appear to align fairly well with seven of the nine peaks. This tantalizingly suggests that the tuning of Pelog scale instruments may be based on a subset of a chromatic 9-pitch scale, similar to the way that the major scale in Western harmony is a 7-pitch subset of a 12-pitch chromatic scale, as has been previously hypothesized by Braun [22]; see also [21].

Conclusion

In this paper, we have investigated in detail various aspects of the method proposed by Berezovsky [1], in which the tools of statistical mechanics used to describe emergent order in phase transitions are also used to show how harmony arises as an ordered phase of discrete pitches of sound. We have sought to clarify that the choice of timbre is important in determining the efficacy of the method (in [1] only a sawtooth timbre is used, without further comment or explanation for the choice). We have corrected some typos in [1] that are relevant to the calculation of the dissonance function, and given intuitive analytical arguments to predict how the emergent ordered phase depends on the largest Fourier coefficient of DP(x). We have further extended the model beyond the summation constraint over octaves, and have shown that, for instruments with integer harmonics such as pipes or strings, the octave simply emerges as “natural” even when we do not sum over the octaves or impose a pure octave domain.

While we have generalized and further explored the ideas of Berezovsky, we have not reached a full understanding of the basis of the thermodynamical musical analogy, particularly regarding the identification of dissonance with energy. It is one of the purposes of this paper to help bring these ideas to a wider research community in the hope of fostering progress precisely on this point.

We have also applied the method to Gamelan musical systems, which are explicitly non-periodic in octaves and therefore appear to fall beyond the purview of the method, and have shown that it can still accurately capture these systems of tuning. This allows the interpretation that the 7-note Pelog scale could be understood as a subset of a larger 9-note scale, much in the same way that the major scale in Western intonation is a subset of a larger 12-note partition of the octave. This result is particularly interesting because traditionally, what students learn in a harmony class is, strictly speaking, the harmonic style of 17th century European composers. It is our hope that this first paper brings attention to the fact that the thermodynamic framework proposed in [1] is much more robust and powerful than it may seem at first: it allows us to understand multiple tuning systems used across human history and culture as a natural outcome that depends only on the details with which said cultures perceive dissonance, and therefore gives all forms of music culture an equal seat at the table. After all, the results show they are in essence no different from the natural processes that give rise to order in chemistry and biology, which have long been understood in terms of the minimization of a free energy.

Supporting information

S1 Computer codes.

Zip file containing all the code used to generate the figures presented.

https://doi.org/10.1371/journal.pone.0322385.s001

(ZIP)

Acknowledgments

The authors wish to thank two anonymous referees and the editor of this paper for abundant constructive criticism leading to a clearer and more complete final version of our work.

References

  1. Berezovsky J. The structure of musical harmony as an ordered phase of sound: A statistical mechanics approach to music theory. Sci Adv. 2019;5(5):eaav8490. pmid:31114802
  2. Plomp R, Levelt WJ. Tonal consonance and critical bandwidth. J Acoust Soc Am. 1965;38(4):548–60. pmid:5831012
  3. Schoenberg A. Theory of harmony. 100th anniversary ed. University of California Press. 2010.
  4. Sethares WA. Tuning, timbre, spectrum, scale. 2nd ed. Springer-Verlag London Limited; 2005.
  5. Xu F. Zhu Zaiyu and the equal temperament. In: Jiang X, editor. The high tide of science and technology development in China. Singapore: Springer. 2021.
  6. Van de Stevin S. Spiegheling der singconst. In: Rudolf R. editors. The Diapason Press; 30 June 2009. Archived from the original on 17 July 2011 [cited 1212 Mar 20]. diapason.xentonic.org
  7. Benson D. Music: a mathematical offering. Cambridge University Press. 2006.
  8. Buechele R, Berezovsky J. Renormalization-group approach to ordered phases in music. Phys Rev E. 2024;110(1–1):014145. pmid:39161006
  9. Gill K, Purves D. A biological rationale for music scales. PLoS One. 2009;4.
  10. Marjieh R, Harrison PMC, Lee H, Deligiannaki F, Jacoby N. Timbral effects on consonance disentangle psychoacoustic mechanisms and suggest perceptual origins for musical scales. Nat Commun. 2024;15(1):1482. pmid:38369535
  11. Helmholtz H. On the sensations of tone. New York: Dover Publications. 2013.
  12. Sethares WA. Local consonance and the relationship between timbre and scale. The Journal of the Acoustical Society of America. 1993;94(3):1218–28.
  13. Lerdahl F, Jackendoff R. A generative theory of tonal music. MIT Press. 1983.
  14. Barbour JM. Tuning and temperament: a historical survey. Courier Corporation. 2013.
  15. Surjodiningrat W, Susanto A, Sudarjana P. Tone measurements of outstanding Javanese gamelans in Jogjakarta and Surakarta. Gadjah Mada University Press. 1972.
  16. Morton D, Duriyanga C. The traditional music of Thailand. University of California Press. 1976.
  17. Tymoczko D. The geometry of musical chords. Science. 2006;313(5783):72–4. pmid:16825563
  18. Youngblood JE. Style as Information. Journal of Music Theory. 1958;2(1):24.
  19. Aucouturier J, Miranda E. The hypothesis of self-organization for musical tuning systems. Leonardo Music J. 2008;18:63–9.
  20. Rigden JS. Physics and the sound of music. Wiley; 1972. p. 243–7.
  21. Rahn J. Javanese Pélog Tunings Reconsidered. Yearb Int Folk Music Counc. 1978;10:69–82.
  22. Braun M. The gamelan pelog scale of central java as an example of a non-harmonic musical scale. NeuroScience-of-Music.se. 2006.
  23. Heller E. Why you hear what you hear: an experiential approach to sound, music and psychoacoustics. Princeton University Press; 2013.
  24. Martínez-Mekler G, Alvarez Martínez R, Beltrán del Río M, Mansilla R, Miramontes P, Cocho G. Universality of rank-ordering distributions in the arts and sciences. PLoS One. 2009;4(3):e4791. pmid:19277122
  25. Beltrán del Río M, Cocho G, Naumis GG. Universality in the tail of musical note rank distribution. Physica A: Statistical Mechanics and its Applications. 2008;387(22):5552–60.
  26. Voss R, Clarke J. 1/f noise in music: music from 1/f noise. J Acoust Soc Am. 1978;63:258–63.
  27. González-Espinoza A, Martínez-Mekler G, Lacasa L. Arrow of time across five centuries of classical music. Phys Rev Res. 2020;2:033166.
  28. Useche J, Hurtado R. Melodies as Maximally Disordered Systems under Macroscopic Constraints with Musical Meaning. Entropy (Basel). 2019;21(5):532. pmid:33267246
  29. Sakellariou J, Tria F, Loreto V, Pachet F. Maximum entropy models capture melodic styles. Sci Rep. 2017;7(1):9172. pmid:28835642
  30. Ottman RW. Elementary harmony; theory and practice. Englewood Cliffs, NJ: Prentice-Hall. 1970.
  31. Poudrier È, Bell BJ, Lee JYH, Sapp CS. The Influence of Dissonance on Listeners’ Perceived Emotions in Rhythmically Complex Musical Excerpts. Auditory Perception & Cognition. 2024;7(4):291–318.