(adapted from section 6, "Respiratory, postural and spatio-kinetic motor stabilization, internal models, top-down timed motor coordination and expanded cerebello-cerebral circuitry: a review".)
Humans are unique vocalizers
The art of vocalization is widespread amongst animals. Birds, in
particular, are considered to be exquisite songsters. While speech is
appreciated to be biologically unique to humans, human song, in an albeit
hidden way, is also biologically unique. Bird song, unlike human song, is
produced with minibreaths between each syllable (with the exception of
high-frequency "trills" at ca. 30 s⁻¹ in canaries and 16 s⁻¹ in cardinals
(Suthers, Goller et al. 1999)). Birds can do this because they use a
different respiratory apparatus from that of mammals. This employs anterior
and posterior air sacs to create a unidirectional airflow through their
lungs, which allows for the insertion of such minibreaths in between their
song notes (Suthers, Goller et al. 1999).
Such minibreaths suggest bird
vocalization is built upon low-level
reflexive processes that ensure adequate concurrent respiration.
Humans, in contrast, whether in speech or song, produce multiple
vocalizations upon prolonged single out-breaths, a phenomenon called
“thoracic breathing”, which in terms of normal respiration is distinct from
the everyday nonvocal and reflex-controlled form of “quiet” respiratory
breathing (Ladefoged 1960; Hixon 1973; Proctor 1986; Provine 1996;
MacLarnon and Hewitt 1999; Ghazanfar and Rendall 2008). As with the human
uniqueness in bipedality and dexterity, this respiratory phenomenon is
argued here to link to a unique competence in accurately timed motor
stabilization (in this case the stabilization of subglottal pulmonary
pressure) that results from expanded cerebello-cerebral circuits and
internal modeling that overrides a lower level of preflex and reflex motor
control.
Further, this capacity for timed control in vocalization, even more than
for dexterity and bipedality, allows for the construction of novel kinds of
complex motor executions; in this case, the interarticulator actions in
the
vocal tract that create the phonetic features that provide different
phones
with their distinct phonetic identities (Lofqvist and Gracco 1999).
There exist several unique
related traits in human vocalization.
Hierarchical stringing of units
Humans generate vocalizations
strung together at several levels of
hierarchical organization. Such vocalization can be made up of speech
phones
(syllables, words, clauses, sentences), or song notes (beats, meter,
phrases,
melodies).
Diverse recombinable units
Human vocalization draws on a large set of recombinable
phone/note units (most languages contain 20 to 45 vowel, consonant, and
diphthong phones; there are 12 semitones in an octave and most singers
can
range across several). The International Phonetic Alphabet
(International
Phonetic Association 1999) lists for consonants 12 places and nine
types of
articulation that can be either voiced or unvoiced (plus five types of
anterior
release clicks); for vowels it lists five positions and seven manners
(plus
being rounded or not). In addition, it notes the existence of three
kinds of suprasegmentals: stress (seven types), tone (15 types), and
intonation (four types). Such features create a large pool of potential
phones: for
example, in
one sample of 317 human languages, there were 757 different kinds of
phones (Maddieson
1981).
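The arithmetic behind such a pool can be sketched as follows. This is only a rough illustration in Python: the counts are those quoted above from the International Phonetic Association (1999), while the variable and function names are hypothetical, and multiplying the features out is a simplifying assumption, since not every cell of such a grid corresponds to an attested or even producible sound.

```python
# Feature counts as quoted in the text (International Phonetic Association 1999).
# All names below are illustrative assumptions, not part of any standard.
CONSONANT_PLACES = 12
CONSONANT_TYPES = 9
VOICING_STATES = 2      # voiced or unvoiced
CLICK_TYPES = 5         # anterior release clicks, counted separately

VOWEL_POSITIONS = 5
VOWEL_MANNERS = 7
ROUNDING_STATES = 2     # rounded or not


def consonant_grid_cells() -> int:
    """Cells in the place x type x voicing grid, plus the click types."""
    return CONSONANT_PLACES * CONSONANT_TYPES * VOICING_STATES + CLICK_TYPES


def vowel_grid_cells() -> int:
    """Cells in the position x manner x rounding grid."""
    return VOWEL_POSITIONS * VOWEL_MANNERS * ROUNDING_STATES


if __name__ == "__main__":
    print("consonant grid cells:", consonant_grid_cells())
    print("vowel grid cells:", vowel_grid_cells())
```

The grid sizes are counts of feature combinations only; they say nothing about which combinations any particular language actually uses.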
Diverse uses and modes of production
Humans modify and use their
vocalizations in diverse mastered ways as
distinct as falsetto, esophageal speech (after laryngectomies),
yodeling,
whistle speech, throat singing, and entertainment ventriloquism.
Further, some
human hunters learn to imitate the vocalizations of their prey to stalk
them (Willerslev
2004).
The various components of human
vocalization provided by the lung
(pulmonary), glottis (vocal cords), larynx, and supralaryngeal vocal
tract can
be isolated, omitted, or used for other purposes. Humans can, for
example,
speak without glottal phonation, as in whistle speech, or without
normal pulmonary
pressure and phonation as in esophageal speech or buccal-source speech
(also
called ‘Donald Duck’ talk). (In this, the vocal tract is partially
blocked by the back of the tongue, the teeth, and the cheeks, and oral
pressure created by the tongue causes the arches in the back of the mouth
to vibrate (Smith 1994: p. 4221).) Following spinal injuries, the pulmonary
control of thoracic
breathing can shift to being based upon the diaphragm without employing
the
normally used abdominal and intercostal muscles (Meyer 2003).
Respiratory control is also used for nonvocal activities such as playing
woodwind and brass instruments. In addition to such respiratory control,
saxophonists and clarinetists can modify their instrument’s sound in the
altissimo register by changing their vocal tract resonance (Fritz and Wolfe
2005; Chen, Smith et al. 2008).
Unique amongst primates
Human vocal capacities are of particular biological interest because
no nonhuman primate makes any comparable vocalizations. This is in
spite of
nonhuman primates already having many of the required competences: they
can individually produce some of the phonetic units of human speech
(Richman 1976), hear them (Steinschneider, Arezzo et al. 1982), and, if
trained, comprehend the pronunciation of spoken words (Savage-Rumbaugh and
Lewin 1994) and intersperse vocalizations with human and conspecific
interactors in a conversational manner (Savage-Rumbaugh, Fields et al.
2004). However,
even with
these vocal-related advantages, while nonhuman primates can be tutored
to
communicate with gesture and sign-board based languages, they cannot be
tutored
to talk (Hayes 1951). The language-tutored bonobo Kanzi, for example, can
do no more in his vocal interactions than contextually modulate the
spectral and temporal features of his vocalizations, a notable contrast to
his considerable abilities to communicate manually with a sign board
(Taglialatela, Savage-Rumbaugh et al. 2003). This is odd since gesturing
and sign board pointing would seem to be of comparable motor complexity to
speech, and nonhuman primates already use vocalization (unlike sign boards)
to communicate. Indeed,
evolution has enhanced nonhuman ape vocalization in a manner not found
in
humans in the form of vocal sacs (Nishimura, Mikami et al. 2007;
Ghazanfar and
Rendall 2008). Interestingly, the shape of the hyoid bone in a partial
Australopithecus afarensis skeleton suggests that pre-Homo hominins also
might have possessed such vocal sacs (Alemseged, Spoor et al. 2006, p.
300).
Such vocal sacs enable chimpanzees (and perhaps other apes) to produce
very
loud piercing calls that in the case of chimpanzees are made of two
simultaneous tones that are three octaves distant from each other
(Yerkes and
Learned 1925, pp. 61-62).
Much less is understood about motor stabilization in human vocalization
than about that in bipedality and dexterity (it is known, though, that the
vocal articulators adjust quickly after perturbation (Gracco and Löfqvist
1994)). Research upon attempts to teach higher apes to make voluntary
vocalizations suggests a link to a unique human ability to control the
respiratory/vocal tract musculoskeletal system. There are two such accounts
(Furness 1916; Hayes 1951); both report difficulties in directly
controlling the vocal apparatus. The account of Viki is the most detailed.
Viki could create some speech sounds, but this depended upon her first
being prompted with external help (Hayes 1951). Keith (her human speech
tutor) trained Viki by using his fingers to open and shut her lips to form
speech syllables. This was because Viki could make an “asking sound” but,
without such external help, could not modify it on her own into other
sounds. As his wife Catherine Hayes noted in her book upon Viki
(1951: p. 67): "She
soon
got the idea and began to inhibit her asking sound until Keith’s
fingers were
on her lips. If he was too slow in getting ready, Viki often took his
hand and
put it in the helping positions". Much earlier William Furness (1916)
reported upon his attempts to teach an orangutan. In order to have her say
“cup”, he used a spatula to push her tongue back to make the /k/ phone:
“after several
lessons
.. she would draw back her tongue to the position even before the
spatula had
touched it, but she would not say ka unless I place
my finger over her
nose. The next advance was that she herself would place my finger over
her nose
and then said it without any use of the spatula” (Furness 1916, p.284).
To take the case of Viki, she
could create the pulmonary pressure and
phonation needed for a particular “asking” vocalization, and she could
also manipulate
her lips to create a different one (as evidenced when triggered to do
so by
Keith’s hand). What she could not do, or found very difficult, was
combine them
as independent motor elements so she could pronounce on her own a new
type of
nonevolved vocalization. The nearest she could do was use another part
of her
motor system (her hands) to get hold of Keith’s hand to reshape her
mouth, and
so use this indirect and external means to control her vocal
articulation. A similar phenomenon seems also to have characterized the
attempts of Furness’ orangutan to vocalize. This suggests that nonhuman
apes have problems
unlocking
the separate musculoskeletal elements that make up the vocalization
chain to
create the motor coordination that underlies the motor production of
human
speech. That the nonhuman vocal chain should be locked in this way makes
evolutionary sense in view of the critical importance of the links of
respiration to cardiovascular function and locomotion (Lee and Banzett
1997), and the fact that the larynx is involved not only in phonation but
also in several survival-critical reflexive actions such as swallowing,
respiration and coughing (Ludlow 2005).
Reflecting this innate locking,
while breathing is under voluntary
control in humans (Loucks, Poletto et al. 2007; Simonyan, Saad et al.
2007), it
is difficult to train in nonhuman primates such as chimpanzees (Hayes
1951: p.
69). Humans also seem unique in related voluntary respiratory abilities
such as
suppressing and voluntarily activating (in the absence of sensory
triggers)
coughing and sniffing (Simonyan, Saad et al. 2007). Moreover, nonhuman
vocalizations, when made, are nearly always produced in emotional contexts
and performed in a highly stereotyped and genetically determined manner.
This is
evidenced in the strong correlations that exist between the
vocalizations of
chimpanzees and bonobos (in spite of them being two species), a
correlation
that does not exist, in contrast, for their manual gestures (Pollick
and de
Waal 2007). The human brain control needed for voluntary respiration, such
as that for exhalation, and for the production of sound syllables also
seems to be closely related, in that they involve similar
cerebello-cerebral circuit activations (except for the auditory cortices)
(Loucks, Poletto et al. 2007).
Subglottal pressure stabilization
Controlling pulmonary pressure requires that the thoracic muscles
stabilize lung exhalation, in a time-sensitive manner, as a motor control
element separate from the later ones in the vocal chain involved in
phonation (voicing), vocal resonance change (vowels), and its gestural
modification (consonants). There is here a direct parallel with the
anticipatory adjustment used in human bipedality and dexterity, but in
regard to the stabilization of the motor parameter of pulmonary pressure
below the glottis (vocal cords).
This,
for functional speech, needs to be maintained at a constant level (for
a given
degree of loudness) throughout successive strings of vocalizations in
spite of
this producing considerable decrease in lung volume (Ladefoged 1960;
Hixon
1973; Proctor 1986). Such pulmonary pressure stability requires that the
muscles controlling it are anticipatorily adjusted in regard to each
upcoming vocalization and its particular subglottal pressure needs (which
might vary, for example, in regard to its individual phones, vocalization
loudness, prosodic stress and emotional emphasis). The thoracic muscles
also need action planning, in regard to forthcoming pauses in speech and
song, as to when to refill the lungs (Whalen and Kinsella-Shaw 1997).
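The need for continuous adjustment can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the passive recoil pressure of the respiratory system falls linearly with lung volume, and asks what active muscle pressure would be needed at each point of an utterance to hold subglottal pressure constant. The linear model, every numerical value and every function name are assumptions made for this sketch and are not taken from the cited studies.

```python
# Toy model: subglottal pressure = passive recoil pressure + active muscle pressure.
# All constants are hypothetical values chosen only to make the sketch concrete.
REST_VOLUME_L = 2.5          # assumed lung volume at which recoil pressure is zero
STIFFNESS_CMH2O_PER_L = 10.0  # assumed slope of the linear recoil model
TARGET_CMH2O = 7.0           # assumed constant subglottal pressure for the utterance


def recoil_pressure(volume_l: float) -> float:
    """Passive recoil pressure (cmH2O) under the assumed linear model."""
    return STIFFNESS_CMH2O_PER_L * (volume_l - REST_VOLUME_L)


def muscle_pressure_needed(volume_l: float, target: float = TARGET_CMH2O) -> float:
    """Active pressure the muscles must add (positive) or hold back (negative)."""
    return target - recoil_pressure(volume_l)


def plan_utterance(start_volume_l: float = 4.0, end_volume_l: float = 2.0,
                   steps: int = 5) -> list[tuple[float, float]]:
    """Return (lung volume, required muscle pressure) pairs along one out-breath.

    Requires steps >= 2 so that both the start and end volumes are sampled.
    """
    plan = []
    for i in range(steps):
        volume = start_volume_l + (end_volume_l - start_volume_l) * i / (steps - 1)
        plan.append((volume, muscle_pressure_needed(volume)))
    return plan


if __name__ == "__main__":
    for volume, muscle in plan_utterance():
        print(f"volume {volume:.2f} L -> muscle pressure {muscle:+.1f} cmH2O")
```

Under these assumed values the required muscle pressure starts negative (recoil at high lung volume must be held back) and ends positive (expiratory effort must be added at low volume), so a constant subglottal pressure demands a muscle command that changes throughout the out-breath.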
Time-scheduling and phone articulation construction
Humans not only engage in thoracic breathing but also, when articulating
phones, engage in exquisite “dexterity” of the vocal tract. The reason for
this, I suggest, is that in nonhuman animals, pulmonary pressure and the
vocal tract are restricted by reflexes to articulating a limited set of
evolved vocalizations. But because human vocal tract actions are “unlocked”
from such reflexes by direct cortical control (Kuyper 1958; Liscic, Zidar
et al. 1998; Ludlow 2005; Ghazanfar and Rendall 2008; Teitti, Maatta et al.
2008), they can be synchronized and coordinated in complex sequences of
diverse and differently timed glottal, laryngeal and supralaryngeal
movements. It is this ability to combine, as independent motor elements,
glottal phonation, laryngeal/supralaryngeal gesture and vocal tract
modifications (Lofqvist and Gracco 1999) with timed anticipatory motor
adjustment that could be responsible for enabling the human motor system to
create, and then string together, its rich diversity of speech phones into
spoken words.
If glottal phonation, for
example, can be adjusted independently and
anticipatorily to the rest of the vocal chain, it can be synchronized on a
time schedule to create speech sounds that differ in the timing between
their glottal onset and their acoustic shaping by vocal tract gestures
(voiced/unvoiced contrast; glottal phones). Likewise, if the laryngeal
shape is not reflexively locked to
locked to
articulators higher up the vocal chain, then its resonance “vowel”
quality can
be changed independently of them so that vowel vocalizations can be
conjoined in a time-exact manner with a great variety of gestures in
different vocal
tract
locations (bilabial, labio-dental, dental, alveolar, post-alveolar,
retroflex,
palatal, velar, uvular, pharyngeal, epiglottal, and glottal), and
manners
(nasal, plosive, fricative, approximant, trill, tap/flap, and their
lateral
variants). As a result, vowels can be provided with diverse kinds of
associated
consonantal sounds. For example, using data from the International
Phonetic Alphabet (International Phonetic Association 1999), the movement
of the lips (bilabiality) can create six consonants depending upon their
timing with the onset of phonation in the glottis (voiced vs. unvoiced),
the presence or not of nasality (/m/) (created by soft palate opening), and
how that lip movement is carried out (plosive, /p/, /b/; fricative, /ɸ/,
/β/; or trill, /ʙ/). With such top-down control, the lips can create
further pronunciations such as anterior release “click” consonants that do
not even use pulmonary air pressure. This motor ability to independently
stabilize different vocal components explains the diversity, noted above,
with which the human vocal apparatus can be used.
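The bilabial example can be written out as a small lookup table to make the combinatorial point concrete: one articulator (the lips) yields different consonants once manner and voicing are varied independently. This is a minimal Python sketch; the six symbols are the IPA bilabials named in the text, while the table layout and the function name are hypothetical choices made for illustration.

```python
# Bilabial consonants indexed by (manner of lip movement, voiced?).
# The feature labels follow the text; the data structure itself is an
# illustrative assumption, not a standard phonetic representation.
BILABIALS = {
    ("nasal", True): "m",        # soft palate open, voiced
    ("plosive", False): "p",
    ("plosive", True): "b",
    ("fricative", False): "ɸ",
    ("fricative", True): "β",
    ("trill", True): "ʙ",
}


def bilabial(manner: str, voiced: bool):
    """Return the bilabial consonant for a feature pair, or None if not listed."""
    return BILABIALS.get((manner, voiced))


if __name__ == "__main__":
    for (manner, voiced), symbol in BILABIALS.items():
        voicing = "voiced" if voiced else "unvoiced"
        print(f"/{symbol}/: {voicing} bilabial {manner}")
```

A call such as `bilabial("plosive", False)` looks up the unvoiced plosive, and combinations not named in the text, such as an unvoiced trill, simply return `None` in this sketch.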
In this context, it is interesting to note that internal models in the
cerebellum of the auditory signal of phone production have been suggested to
underlie phone perception (Callan, Tsytsarev et al. 2006), vocal tract
articulation
(right side) (Callan, Kawato et al. 2007) and speech prosody (left
side) (Callan,
Kawato et al. 2007). There is evidence that phone perception involves
processes
used in its production (Liberman, Cooper et al. 1967; Pulvermuller,
Huss et al.
2006). This research suggests that there may be a considerable
opportunity to
explain phenomena already identified in phonetic and speech sciences
with the
internal model processes that became more complex when the human brain
expanded.
Possible link to syntax
As with knapping, the nature of internal models allows that such
musculoskeletal-level predictive internal models can engage in complex
hierarchical interaction with higher-level ones. As noted
above, it is a peculiarity of human vocalization that it is made in the
context of several layers of hierarchical organization that concern not
only
productive ones (such as in speech syllable, word, phrase, and
sentence) but
also those involved in communication such as semantics, syntax,
pragmatics and
emotions. There is even evidence that the speech production system not only
aids the perception of speech (Liberman, Cooper et al. 1967; Pulvermuller,
Huss et al. 2006) but also provides prediction and imitation abilities that
aid higher-level language comprehension (Pickering and Garrod 2007).
Of particular importance in
this context is that strings of phones are
made into units that are organized and arranged in planned syntactic
ways. This
syntax level directly interacts down upon the lower musculoskeletal ones, a
phenomenon that can be seen in the way that syntactic tense can modify
vowel vocalization, as in "swim", "swum", "swam".
This suggests that the syntax and musculoskeletal levels are in some
way
closely interlinked. While any ideas in this area are necessarily
preliminary,
this raises the possibility that the internal models needed for
low-level
musculoskeletal control of the vocal tract could have created the
opportunity
by which higher-level models are constructed in motor control upon them
so that
the speech units that they create can be structured to support
communication
and semantics. It is interesting to note that Broca’s area, a brain region
in the premotor cortex traditionally associated with syntax and, more
recently, syntactic working memory (Fiebach, Schlesewsky et al. 2005), has
also recently been identified as underlying the anticipatory
hierarchization of actions (Fiebach and Schubotz 2006). This is consistent
with lower motor level models in
vocalization providing the basis for the development of higher-level
ones that
have come in their organization of lower ones to possess what are
analyzed as
syntactic functions.
Summary of vocalization and internal models
These brief observations suggest that human vocalization and voluntary
respiratory control could, like human dexterity and bipedality, gain their
evolutionary novelty from top-down internal model timed motor
stabilization. As with them, this is consistent with a link to
cerebello-cerebral cortex circuits (Murphy, Corfield et al. 1997; Dresel,
Castrop et al.
2005;
Schulz, Varga et al. 2005; Callan, Tsytsarev et al. 2006; Callan,
Kawato et al.
2007; Loucks, Poletto et al. 2007; Spencer and Slocomb 2007). Further,
like
dexterity and bipedality, the kinematics of speech production continues
to be
refined into adolescence and after (Smith and Zelaznik 2004).
References
Alemseged, Z., F. Spoor, et al. (2006). "A
juvenile early
hominin skeleton from
Callan, D. E., M. Kawato, et al. (2007).
"Speech and song: The
role of the cerebellum." Cerebellum: 1-7.
Callan, D. E., V. Tsytsarev, et al. (2006).
"Song and speech:
brain regions involved with perception and covert production." Neuroimage
31(3): 1327-42.
Chen, J. M., J. Smith, et al. (2008).
"Experienced saxophonists
learn to tune their vocal tracts." Science
319(5864): 776.
Dresel, C., F. Castrop, et al. (2005). "The
functional
neuroanatomy of coordinated orofacial movements: sparse sampling fMRI
of
whistling." Neuroimage 28(3): 588-97.
Fiebach, C. J., M. Schlesewsky, et al. (2005).
"Revisiting the
role of Broca's area in sentence processing: syntactic integration
versus
syntactic working memory." Hum Brain Mapp 24(2):
79-91.
Fiebach, C. J. and R. I. Schubotz (2006).
"Dynamic anticipatory
processing of hierarchical sequential events: A common role for Broca's
area
and ventral premotor cortex across domains?" Cortex
42: 499-502.
Fritz, C. and J. Wolfe (2005). "How do clarinet
players adjust
the resonances of their vocal tracts for different playing effects?" J
Acoust Soc Am 118(5): 3306-15.
Furness, W. H. (1916). "Observations on the
mentality of
chimpanzees and orang-utans." Proceedings of the American
Philosophical
Society 55: 281-290.
Ghazanfar, A. A. and D. Rendall (2008).
"Evolution of human
vocal production." Current Biology 18: R457-R460.
Gracco, V. L. and A. Löfqvist (1994). "Speech motor
coordination and control: evidence from lip, jaw, and laryngeal
movements." Journal of Neuroscience 14: 6585-6597.
Hayes, C. (1951). The ape in our house.
Hixon, T. J. (1973). "Kinematics of the chest
wall during
speech production: volume displacements of the rib cage, abdomen, and
lung." J Speech Hear Res 16(1): 78-115.
International Phonetic Association (1999). Handbook
of the
International Phonetic Association.
Kuyper, H. G. (1958). "Corticobulbar connexions
to the pons and
lower brain-stem in man." Brain 81: 364-388.
Ladefoged, P. (1960). "The regulation of
sub-glottal
pressure." Folia Phoniatrica 12: 169-175.
Lee, H.-t. and R. B. Banzett (1997).
"Mechanical links between
locomotion and breathing." News in Physiological Science
12: 273-.
Liberman, A. M., F. S. Cooper, et al. (1967).
"Perception of
the speech code." Psychological Review 74: 431-461.
Liscic, R. M., J. Zidar, et al. (1998).
"Evidence of direct
connection of corticobulbar fibers to orofacial muscles in man." Muscle
and Nerve 21: 561-566.
Lofqvist, A. and V. L. Gracco (1999).
"Interarticulator
programming in VCV sequences: lip and tongue movements." J
Acoust Soc
Am 105(3): 1864-76.
Loucks, T. M., C. J. Poletto, et al. (2007).
"Human brain
activation during phonation and exhalation: Common volitional control
for two
upper airway functions." Neuroimage 15(131-143).
MacLarnon, A. M. and G. P. Hewitt (1999). "The
evolution of
human speech: the role of enhanced breathing control." Am J
Phys
Anthropol 109(3): 341-63.
Maddieson,
Meyer, M. (2003). "Vertebrae and Language
Ability in Early
Hominids." PaleoAnthropology 1: 20-21.
Murphy, K., D. R. Corfield, et al. (1997).
"Cerebral areas
associated with motor control of speech in humans." Journal
of Applied
Physiology 85: 1438-1447.
Nishimura, T., A. Mikami, et al. (2007).
"Development of the
Laryngeal Air Sac in Chimpanzees." International Journal of
Primatology
28: 483-492.
Pickering, M. J. and S. Garrod (2007). "Do
people use language
production to make predictions during comprehension?" Trends
in
Cognitive Science 11: 105-110.
Pollick, A. S. and F. B. M. de Waal (2007).
"Ape gestures and
language evolution." Proceedings of the
Proctor, D. F. (1986). Modifications of
breathing for phonation. Handbook
of physiology, The respiratory system. A. P. Fishman.
Provine, R. R. (1996). "Laughter." American
Scientist
84: 38-45.
Pulvermuller, F., M. Huss, et al. (2006).
"Motor cortex maps
articulatory features of speech sounds." Proc Natl Acad Sci U
S A
103(20): 7865-70.
Richman, B. (1976). "Some vocal distinctive
features used by
gelada monkeys." Journal of the Acoustical Society of
Savage-Rumbaugh, S., W. M. Fields, et al.
(2004). "The
emergence of knapping and vocal expression embedded in a Pan/Homo
culture." Biology and Philosophy 19: 541-575.
Savage-Rumbaugh, S. and R. Lewin (1994). Kanzi.
Schulz, G. M., M. Varga, et al. (2005).
"Functional
neuroanatomy of human vocalization: an H215O PET study." Cereb
Cortex
15(12): 1835-47.
Simonyan, K., Z. S. Saad, et al. (2007).
"Functional
neuroanatomy of human voluntary cough and sniff production." Neuroimage
37: 401-409.
Smith, A. and H. N. Zelaznik (2004).
"Development of functional
synergies for speech motor coordination in childhood and adolescence." Developmental
Psychobiology 45: 22-33.
Smith, B. L. (1994). Speech production,
Atypical aspects. The
encyclopedia of language and linguistics. R. E. Asher.
Spencer, K. A. and D. L. Slocomb (2007). "The
neural basis of
ataxic dysarthria." Cerebellum 6: 58-65.
Steinschneider, M., J. Arezzo, et al. (1982).
"Speech evoked
activity in the auditory radiations and cortex of the awake monkey." Brain
Res 252(2): 353-65.
Suthers, R. A., F. Goller, et al. (1999). "The
neuromuscular
control of birdsong." Philosophical Transactions of the Royal
Society
of
Taglialatela, J. P., S. Savage-Rumbaugh, et al.
(2003). "Vocal
production by a language-competent Pan
paniscus." International Journal of Primatology
24: 1-47.
Teitti, S.,
Whalen, D. H. and J. M. Kinsella-Shaw (1997).
"Exploring the
relationship of inspiration duration to utterance duration." Phonetica
54: 138-152.
Willerslev, R. (2004). "Not animal, not not-animal: Hunting, imitation and
empathetic knowledge among the
Siberian Yukaghirs." Journal of the Royal Anthropological
Institute
10: 629-652.
Yerkes, R. M. and B. W. Learned (1925). Chimpanzee
intelligence
and its vocal expressions.