Barriers to Progress in Speaker Identification with Comments on the Trayvon Martin Case
DOI: https://doi.org/10.5195/lesli.2013.3
Keywords: speaker identification, automated speech processing, expert witnesses, Trayvon Martin
Abstract
Linguistics and phonetics overlap in many areas. The essay that follows reviews some of the problems experienced by phoneticians in one of these areas; it may provide some insight for linguists confronted by barriers in their own field. The example involves individuals who attempt to identify speakers from voice analysis. The fundamental challenge they face is, of course, the thousands of variables associated with that task. These include differences in speakers’ gender, age, size, physiology, language, dialect, psychological and health states, background and education, and reason for speaking, as well as in the situation, the environment and the configuration of the acoustic channel, plus many others. Many formal assessment procedures, both aural-perceptual ones conducted by humans and machine/computer-based systems, have been proposed and/or used for these analyses. Unfortunately, few have achieved particularly high levels of success. Worse yet, progress has been hindered by external impediments, some of which are outlined in this report. Among the problems considered are: 1) competition (verification vs. identification, from voiceprints), 2) conceptual disputes, 3) the continued undervaluation of relevant evidence and 4) markedly dissimilar philosophies among professionals from different disciplines. In response, a short review is presented of the data and concepts that clearly support the possibility of robust speaker identification. Also included are suggestions for enhancing the effectiveness of disciplines such as ours.
References
Note: This article reviews so many events and experiments, occurring over such a long period of time, that more than 300 references would be needed to document them fully. To reduce that number to a manageable level, certain steps were taken. First, the well-known “rule of three” was imposed. In addition, a reference was included only when 1) identification of an event or project was absolutely necessary or 2) further explanation of a concept was considered desirable. Finally, when any of many dozens of references would have been relevant, only the best or most important one was included.
Adcock, J.M. (Editor) Investigative Sciences Journal, Contact: jmadcock@jma-forensics.org or www.investigativesciencejournal.org
Agnitio (2009) Batvox 3.0, Basic User Manual, Madrid, Spain.
Alexander, A., Botti, F., Dessimoz, D. and Drygajlo, A. (2005) The Effect of Mismatched Recording Conditions on Human and Automatic Speaker Recognition in Forensic Application, Forensic Sci. Internat., S95-99.
Atal, B.S. (1972) Automatic Speaker Recognition Based on Pitch Contours, J. Acoust. Soc. Amer., 52: 1687-1697.
Beigi, H. (2011) Fundamentals of Speaker Recognition, Secaucus, NJ, Springer.
Bower, B. (2013) Closed Thinking, Science News, 183: 26-29.
Bricker, P. and Pruzansky, S. (1966) Effects of Stimulus Content and Duration on Talker Identification, J. Acoust. Soc. Amer., 40: 1441-1450.
Bronkhorst, A.W. (2000) The Cocktail Party Phenomenon: A Review of Research on Speech Intelligibility in Multiple-talker Conditions, Acustica, 86: 117-128.
Campbell, J., Shen, W., Campbell, W., Schwartz, R., Bonastre, J.F. and Matrouf, D. (2009) Forensic Speaker Recognition, IEEE Signal Processing Mag., March: 95-103.
Daubert vs. Merrell Dow Pharms. Inc., (1993) 509 U.S. 579, 113 S. Ct. 2786.
DeJong, G. (1998) Earwitness Characteristics and Speaker Identification Accuracy, PhD dissertation, Univ. of Florida.
Fiedler, K., Kutzner, F. and Krueger, J. (2012) The Long Way from α-Error Control to Validity Proper, Perspect. Psychol. Sci., 7: 661-669.
Florida vs. Zimmerman, (2013) No. 1712F4573 Circuit Court, Seminole County, Florida.
Gelfer, M.P., Massey, K.P. and Hollien, H. (1989) The Effects of Sample Duration and Timing on Speaker Identification Accuracy by Means of Long-term Spectra, J. Phonet., 17: 327-338.
Gigerenzer, G. (2010) Personal Reflections on Theory and Psychology, Theory and Psychology, 20: 733-743.
Hautamäki, V., Kinnunen, T., Nosratighods, M., Lee, K.A., Ma, B. and Li, H. (2010) Approaching Human Listener Accuracy with Modern Speaker Verification, In INTERSPEECH-2010, 1473-1476.
Hecker, M.H.L. (1971) Speaker Recognition: An Interpretive Survey of the Literature, ASHA, Monograph #16, Washington, D.C.
Hollien, H. (1990) Acoustics of Crime, New York, Plenum Press.
Hollien, H. (2002) Forensic Voice Identification, London, Academic Press Forensics.
Hollien, H. (2012) On Earwitness Lineups, Investigat. Sci. J., 4: 1-17.
Hollien, H. and Harnsberger, J. (2010) Speaker Identification: The Case for Speech Vector Analysis, J. Acoust. Soc. Amer., 128: 2394A (and submitted)
Hollien, H. and Harnsberger, J. (2013) Attempted Speaker Identification: Florida vs. Zimmerman (1712F4573), submitted to the Office of the State Attorney, Fourth Judicial Circuit, Jacksonville, FL.
Hollien, H. and Hollien, P.A. (1995) Improving Aural-Perceptual Speaker Identification Techniques, Stud. Forensic Phonet., 64: 87-97.
Hollien, H. and Jiang, M. (1998) The Challenge of Effective Speaker Identification, RLA2C, Avignon, France, 1: 2-9.
Hollien, H. and Majewski, W. (1977) Speaker Identification Using Long-term Spectra Under Normal and Distorted Speech Conditions, J. Acoust. Soc. Amer., 62: 975-980.
Hollien, H. and Majewski, W. (2009) Unintended Consequences: Due to Lack of Standards for Speaker Identification and Other Forensic Procedures, Proceed. 16th Internat. Congr. Sound/Vib., Krakow, Poland, July 866: 1-6.
Hollien, H., Hicks, J.W. and Oliver, L.H. (1990) A Semi-Automatic System for Speaker Identification, Neue Tend. Angewandten Phon. III (V.A. Borowski and J.P. Koester, Eds.), Hamburg, Helmut Buske Verlag, 62: 89-106.
Hollien, H., Majewski, W. and Doherty, E.T. (1982) Perceptual Identification of Voices Under Normal, Stress and Disguise Speaker Conditions, J. Phonetics, 10: 139-148.
Jacewicz, E., Fox, R.A. and Wei, L. (2010) Between-speaker and Within-speaker Variation in Speech Tempo of American English, J. Acoust. Soc. Amer., 128: 839-850.
Jiang, M. (1995) Experiments on a Speaker Identification System, PhD dissertation, Univ. of Florida.
Jiang, M. (1996) Fundamental Frequency Vector for a Speaker Identification System, Forensic Ling., 3: 95-106.
Johnson, C.C., Hollien, H. and Hicks, J.W. (1984) Speaker Identification Utilizing Selected Temporal Speech Features, J. Phonet., 12: 319-327.
Kersta, L. (1962) Voiceprint Identification, Nature, 196: 1253-1257.
Koester, J.P. (1987) Performance of Experts and Naïve Listeners in Auditory Speaker Recognition (in German), Festschrift für H. Wangler (R. Weiss, Ed.) Hamburg, Buske, 171-180.
Kraus, N. and Nicol, T. (2010) The Musician’s Auditory World, Acoustics Today, 3: 15-27.
Kraus, N., McGee, T., Carrell, T.D. and Sharma, A. (1995) Neurophysiologic Bases of Speech Discrimination, Ear and Hear., 16: 19-37.
Kraus, N., Skoe, E., Parbery-Clark, A. and Ashley, R. (2009) Experience-induced Malleability in Neural Encoding of Pitch, Timbre and Timing: Implications for Language and Music, Annals New York Acad. Sci., Neurosci. and Music III, 1169: 543-557.
Krishnan, A., Xu, Y.S., Gandour, J. and Cariani, P. (2005) Encoding of Pitch in the Human Brainstem is Sensitive to Language Experience, Cognitive Brain Res., 25: 161-168.
Künzel, H. (2013) Automatic Speaker Recognition with Cross-language Speech Material, Internat. J. Speech, Lang. Law, 20: 21-44.
LaRiviere, C.L. (1975) Contributions of Fundamental Frequency and Formant Frequencies to Speaker Identification, Phonetica, 31: 185-197.
Lea, W. (1981) Voice Analysis on Trial, Springfield, IL, Charles C. Thomas.
Mack vs. State of Florida, (1907) 54 Fla. 55, 44 So. 706, citing 5 Howell’s State Trials 1186.
McGehee, F. (1937) The Reliability of the Identification of the Human Voice, J. Gen. Psychol., 17: 249-271.
Mischel, W. (2008) The Toothbrush Problem, The Observer, Assn. Psychol. Sci., 21: 1-3.
Morrison, G.S. (2009) Likelihood-ratio Forensic Voice Comparison Using Parametric Representations of the Formant Trajectories of Diphthongs, J. Acoust. Soc. Amer., 125: 2387-2397.
Morrison, G.S. (2009) Vowel Inherent Spectral Change in Forensic Voice Comparison, J. Acoust. Soc. Amer., 125: 2695A.
Orchard, T. and Yarmey, A.D. (1995) The Effects of Whispers, Voice-sample Duration and Voice Distinctiveness on Criminal Speaker Identification, Appl. Cogn. Psychol., 9: 249-260.
Papcun, G., Kreiman, J. and Davis, A. (1989) Long-term Memory for Unfamiliar Voices, J. Acoust. Soc. Amer., 85: 913-925.
Pollack, I., Pickett, J.M. and Sumby, W.H. (1954) On the Identification of Speakers by Voice, J. Acoust. Soc. Amer., 26: 403-412.
Reynolds, D.A. (1995) Speaker Identification and Verification Using Gaussian Mixture Speaker Models, Speech Comm., 17: 91-108.
Sambur, M.R. (1976) Speaker Recognition Using Orthogonal Linear Prediction, IEEE Trans., ASSP, 24: 283-287.
Schmidt-Nielsen, A. and Crystal, T.H. (2000) Speaker Verification by Human Listeners: Experiments Comparing Human and Machine Performance Using the NIST Speaker Evaluation Data, Digit. Sign. Proc., 10: 249-266.
Schwartz, M.F. (1968) Identification of Speaker Sex from Isolated, Voiceless Fricatives, J. Acoust. Soc. Amer., 43: 1178-1179.
Shirt, M. (1984) An Auditory Speaker Recognition Experiment, Proceed., Conf. Police Appli. Speech, Tape Record. Evidence, London, Instit. Acoust., 71-74.
Siegfried, T. (2010) Odds Are, It’s Wrong, Science News, 177: 26-35.
Simmons, J., Nelson, L. and Simonsohn, U. (2011) Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant, Psychol. Sci., 22: 1359-1366.
Stevens, K.N. (1971) Sources of Inter- and Intra-speaker Variability in the Acoustic Properties of Speech Sounds, Proceed. 7th Int. Cong. Phonetic Sci., Montreal, 206-232.
Strait, D., Skoe, E., Kraus, N. and Ashley, R. (2009) Musical Experience and Neural Efficiency: Effects of Training on Subcortical Processing of Vocal Expressions of Emotion, Europ. J. Neurosci., 29: 661-668.
Tsai, W.H. and Wang, H.M. (2006) Speech Utterance Clustering Based on the Maximization of Within-clustering Homogeneity of Speaker Voice Characteristics, J. Acoust. Soc. Amer., 120: 1631-1645.
Wong, P., Skoe, E., Russo, N., Dees, T. and Kraus, N. (2007) Musical Experience Shapes Human Brainstem Encoding of Linguistic Pitch Patterns, Nature Neurosci., 10: 420-422.
Yarmey, A.D. (1995) Earwitness Speaker Identification, Psychol. Public Policy Law, 1: 792-816.
License
- The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons Attribution 4.0 License or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:
- Attribution: other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
- The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post online a pre-publication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access). Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
- Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
- The Author represents and warrants that:
- the Work is the Author’s original work;
- the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
- the Work is not pending review or under consideration by another publisher;
- the Work has not previously been published;
- the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
- the Work contains no libel, invasion of privacy, or other unlawful matter.
- The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties contained in Paragraph 7 above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.