How Your Voice Conveys Your Gender
One of the more researched areas of voice and speech is the distinction between male and female voices and speech patterns. Much work has gone into both analyzing the differences and putting these findings into practice to change the perceived "sex" of the speaker. Let's talk a little about what has been discovered so far!
Sex-Based Differences
Determined From Conception
Biology plays the primary role in determining how our voice and speech sound, and it starts at conception. Once the human embryo's sex is determined as either "male" or "female" based on chromosomes, a cascade of changes occurs at around 8 weeks gestational age (in the womb). These include development of genitalia, while dormant developmental instructions stored within our DNA will later direct the development of secondary sex characteristics (including sex-based differences of the upper airway and vocal tract).
The precise timing of an embryo's determination of which primary sex characteristics it will have is still not fully understood. Embryos are identical up until about 8 weeks of gestation. If the egg was fertilized by a sperm carrying an X chromosome, the embryo continues to develop with female structures (the "default"); however, if it was fertilized by a sperm carrying a Y chromosome, a hormonal "switch" happens and male structures develop in place of female ones--at least if nothing goes awry.
Childhood Vocal Development
The vocal tract matures at about the same rate as the rest of the body until puberty. Interestingly, both sexes' vocal tracts grow at roughly the same rate and to roughly the same size, although males may have slightly wider vocal tracts (Vorperian et al., 2009). Males, however, tend to have higher fundamental frequencies (F₀) than do females of the same age (until puberty)! An infant's fundamental frequency at birth is around 500 Hz. By age 8 it drops to about 275 Hz.
Puberty is when things really start to diverge between the sexes. The female vocal tract continues to grow in proportion to the rest of the body, reaching an average maximum length of 144 mm, with vocal folds about 12-21 mm in length. Males, on the other hand, reach a vocal tract length of around 156 mm (about 15% longer than females) and a vocal fold length of about 17-29 mm (about 60% longer than females) (Fitch & Giedd, 1999). That's a big difference when it comes to sound!
As the vocal folds lengthen, the F₀ of adult females drops to an average of 220 Hz, while males drop all the way down to 130 Hz by age 18 (Titze, 1994)!
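To put that gap in musical terms, here is a minimal Python sketch that converts the two adult averages quoted above into an interval in equal-tempered semitones. The function name `semitones_between` is just an illustrative choice, and the only inputs are the 220 Hz and 130 Hz figures from the text.

```python
import math


def semitones_between(f_high_hz: float, f_low_hz: float) -> float:
    """Interval between two frequencies in equal-tempered semitones."""
    # One octave (a doubling of frequency) is 12 semitones.
    return 12 * math.log2(f_high_hz / f_low_hz)


female_f0_hz = 220.0  # adult female average quoted in the text
male_f0_hz = 130.0    # adult male average quoted in the text

interval = semitones_between(female_f0_hz, male_f0_hz)
print(f"Average adult F0 gap: {interval:.1f} semitones")
```

With these averages the gap comes out to roughly nine semitones, a bit short of a full octave.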
The Adult Vocal Tract
By adulthood the larynx rests lower in the throat, and the size and shape of the cavities above it are altered, affecting the resonance of the voice. The vocal folds in males have become thicker and allow better closure when voicing, which results in less breathiness than in females and also produces a richer sound.
As I mentioned in "What is Pitch?", pitch (F₀) and resonance are the two biggest cues listeners use when labeling the sex of the speaker (Whiteside, 1998), so with the thickening of the vocal folds and enlargement of the resonating cavities above them, the speech signal is primed to be differentiated as a "male" versus "female" voice.
A Little More About Resonance
Once the vocal tract has matured, the differences in its length and shape alter the resonance accordingly. Kreiman & Sidtis (2013) provide a good summary of the science behind this. A longer vocal tract (e.g., of an adult male) has resonances of a lower frequency than does a shorter vocal tract (e.g., of an adult female). The resonances of the vocal tract are called formants.
When the vocal folds vibrate, they set the air above them in the vocal tract into vibration as well. The resonances of the vocal tract become excited by this, enhancing some frequencies of the source energy (the vibration of the vocal folds) and damping others.
We can change the formants, or tuning of the resonating cavities, by changing the shape of the vocal tract--protruding the lips, opening the jaw, tensing the muscles of the throat, etc. As I said above, a longer vocal tract has lower resonant frequencies, so to raise those frequencies we would need to shorten the vocal tract, for instance by smiling.
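The link between length and resonance can be sketched with a standard textbook idealization that is not part of the studies cited here: treat the vocal tract as a uniform tube closed at the vocal folds and open at the lips, so its resonances fall at odd multiples of c / 4L. In the Python sketch below, the speed of sound (350 m/s) and the name `tube_formants` are my assumptions; the two lengths are the adult averages quoted earlier. A real vocal tract is not a uniform tube, so treat the numbers as ballpark figures only.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def tube_formants(length_m: float, count: int = 3) -> list:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * SPEED_OF_SOUND_M_S / (4 * length_m)
            for n in range(1, count + 1)]


# Average adult vocal tract lengths quoted earlier in this post.
for label, length_mm in [("female, 144 mm", 144), ("male, 156 mm", 156)]:
    formants = tube_formants(length_mm / 1000)
    print(label, [round(f) for f in formants])
```

Every resonance of the longer tube sits below the matching resonance of the shorter one, which is the same direction of change described above. This simple model only captures the effect of length, not the changes in shape that also move formants around.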
"Male" versus "Female"
Females have formants that are about 20% higher in frequency than those of males. Whiteside (1999) also found many characteristics in speech patterns that differentiated males and females. Women tend to have more exact pronunciation. They omit sounds less often (e.g., "noth-ing" versus "noth-in'"). A review by Smith (1979) discusses differences in use of grammar, vocabulary, and speech rate between males and females.
Although these speech patterns have been shown to distinguish the sexes, evidence suggests that altering them alone, without addressing pitch (F₀) and formant frequencies, does not change listener perception from one sex to the other. Similarly, altering only F₀ does not change perception of the sex of the speaker (Coleman, 1971). So, F₀ is important, but by itself is insufficient. There needs to be a combined change in:
- F₀ (fundamental frequency)
- Formants
- Articulation
- Prosody (the patterns of stress and intonation in speech)
Avery & Liss (1996) asked women to rate the "masculinity" of male voices, and found that a higher F₀ by itself was not enough to make speech sound less masculine (or more feminine). Instead, a combination of higher F₀, larger and faster pitch variations (inflection), rising intonation contours (prosody), and more precise articulation was found to be more feminine sounding, adding weight to Coleman's (1971) findings.
Perception of Sex in Transgender Speakers
Transgender males benefit from the changes to the vocal folds brought on by hormones/androgen, which thicken the folds and result in lowering the F₀. As we've talked about, lowering the F₀ by itself won't necessarily affect perception of the voice as being from a "male", but it absolutely helps! As of right now we have seen no change in the length of the vocal tract from hormone/androgen use which would cause formant frequencies to naturally drop.
There is a bit more work involved for transgender women; female hormones have not been shown to alter the size or shape of the vocal tract. It has been reported that an F₀ of at least 155 Hz is necessary to achieve a feminine perception (Wolfe, Ratusnik, Smith, and Northrop, 1990), in conjunction with the other variables discussed above.
In Conclusion
So we can summarize our discussion by saying that the shape and size of the vocal tract are determined at conception, but the changes that distinguish a "male" versus "female" voice don't arise until puberty. At this time the male larynx and vocal folds grow out of proportion to the rest of the body, and become larger than in females. The structures of the vocal tract above the vocal folds also grow longer and larger in size, which affects resonance, or more precisely formant frequencies.
These changes in conjunction with variations in articulation and prosody are some of the most important cues listeners pick up on when determining the sex of the speaker.
I hope you enjoyed reading and learned a thing or four! Please post comments or questions below! In the future we'll discuss more specifics regarding altering speech and the voice to change the perception of the voice as "male" or "female". See you next time.