Seeing through Deception: A Computational Approach to Deceit Detection in Spanish Written Communication
DOI: https://doi.org/10.5195/lesli.2013.5
Keywords: deception detection, opinion mining, Support Vector Machine, bag of words
Abstract
The present paper addresses the nature of deceptive language. Specifically, its main aim is to explore deceit in Spanish written communication. We have designed an automatic classifier based on Support Vector Machines (SVM) to identify deception in an ad hoc opinion corpus. To test the effectiveness of the LIWC2001 categories in Spanish, we compare them with a Bag-of-Words (BoW) model. The results indicate that texts are classified more successfully with our initial set of variables (the LIWC categories) than with the BoW model. These findings are potentially applicable to areas such as forensic linguistics and opinion mining, where extensive research on languages other than English is needed.
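The abstract compresses the pipeline into two steps: each text becomes a feature vector (either LIWC-style category frequencies or bag-of-words counts), and a linear SVM is trained on each representation so the two can be compared. The sketch below illustrates that contrast under explicit assumptions. The four-category Spanish lexicon is hypothetical (the real LIWC dictionaries are far larger and are not reproduced here), and the classifier is a from-scratch Pegasos linear SVM, not the authors' implementation; the reference list cites WEKA, but the exact configuration is not given in this abstract. All function names are illustrative.

```python
import random
import re
from collections import Counter

# HYPOTHETICAL mini-lexicon: four LIWC-style categories with a handful of
# Spanish words each. It only stands in for the real LIWC dictionaries.
CATEGORIES = {
    "I": {"yo", "me", "mi", "mí", "conmigo"},
    "Negate": {"no", "nunca", "nada", "jamás", "ni"},
    "Posemo": {"bueno", "genial", "feliz", "encanta", "excelente"},
    "Negemo": {"malo", "odio", "triste", "horrible", "peor"},
}


def tokenize(text):
    """Lower-case word tokens; accented letters such as 'ó' stay inside tokens."""
    return re.findall(r"\w+", text.lower())


def category_features(text):
    """One value per category: share of tokens found in that category's list."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return [sum(tok in words for tok in tokens) / n for words in CATEGORIES.values()]


def build_vocab(texts):
    """Sorted list of every token type seen in the training texts."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def bow_features(text, vocab):
    """Bag-of-words vector: raw count of each vocabulary word in the text."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularised hinge loss). Labels must be +1 / -1. A constant 1.0 feature
    is appended so the bias is learned (and regularised) like any other weight."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink step (regulariser)
            if margin < 1.0:  # hinge loss active: move towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """+1 or -1 according to the sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

To mirror the comparison in spirit, one would build both representations for the same labelled opinions (the +1 = deceptive, -1 = truthful convention is an assumption), train one model per representation, and compare accuracy on held-out texts. The category representation has a fixed, small dimension regardless of corpus size, whereas the BoW vector grows with the vocabulary.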
References
Almela, A. (2011). Can lexical choice betray a liar? Paper presented at the I Symposium on the Sociology of Words, University of Murcia, Spain.
Alpers, G. W., Winzelberg, A., Classen, C., Roberts, H., Dev, P., Koopman, C. and Taylor, B. (2005). Evaluation of computerized text analysis in an Internet breast cancer support group. Computers in Human Behavior, 21, 361-376.
Bishop, J. (2009). Enhancing the understanding of genres of web-based communities: The role of the ecological cognition framework. International Journal of Web-Based Communities, 5(1), 4-17.
Bond, G. D. and Lee, A. Y. (2005). Language of lies in prison: Linguistic classification of prisoners’ truthful and deceptive natural language. Applied Cognitive Psychology, 19, 313-329.
Bouckaert, R. R., Frank, E., Hall, M. A., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. (2010). WEKA: Experiences with a Java open-source project. Journal of Machine Learning Research, 11, 2533-2541.
Burgoon, J. K., Blair, J. P., Qin, T. and Nunamaker, J. F. (2003). Detecting deception through linguistic analysis. Intelligence and Security Informatics, 2665, 91-101.
Cade, W. L., Lehman, B. A. and Olney, A. (2010). An exploration of off topic conversation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 669-672. Association for Computational Linguistics.
Chung, C. and Pennebaker, J. W. (2007). The psychological functions of function words. In K. Fiedler (Ed.), Social Communication, 343-359. New York: Psychology Press.
Dave, K., Lawrence, S. and Pennock, D. M. (2003). Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In Proceedings of the 12th international conference on World Wide Web (WWW '03). ACM, New York, NY, USA, 519-528.
DePaulo, B. M., Kashy, D. A., Kirkendol, S. E., Wyer, M. M. and Epstein, J. A. (1996). Lying in everyday life. Journal of Personality and Social Psychology, 70, 979-995.
Fornaciari, T. and Poesio, M. (2011). Lexical vs. Surface Features in Deceptive Language Analysis. In A. Wyner and K. Branting (Eds.), Proceedings of the ICAIL 2011 Workshop Applying Human Language Technology to the Law.
Granhag, P. A. and Strömwall, L. A. (2004). The detection of deception in forensic contexts. Cambridge, UK: Cambridge University Press.
Hancock, J. T., Curry, L. E., Goorha, S. and Woodworth, M. T. (2004). Lies in conversation: an examination of deception using automated linguistic analysis. Annual Conference of the Cognitive Science Society. Taylor and Francis Group, Psychology Press, Mahwah, NJ.
Hancock, J. T., Curry, L. E., Goorha, S. and Woodworth, M. T. (2008). On lying and being lied to: A linguistic analysis of deception in computer-mediated communication. Discourse Processes, 45, 1-23.
Joachims, T. (1998). Text categorization with support vector machines: learning with many relevant features. ECML-98, 137-142.
Labov, W. (1972). Sociolinguistic Patterns. Oxford, UK: Blackwell.
Leshed, G., Hancock, J. T., Cosley, D., McLeod, P. L. and Gay, G. (2007). Feedback for guiding reflection on teamwork practices. In Proceedings of the GROUP’07 conference on supporting group work, 217-220. New York: Association for Computing Machinery Press.
Lewis, D. D. (1998). Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval. In Proceedings of ECML-98, 10th European Conference on Machine Learning, Springer Verlag, Heidelberg, Germany.
Mairesse, F., Walker, M. A., Mehl, M. and Moore, R. K. (2007). Using linguistic cues for the automatic recognition of personality in conversation and text. Journal of Artificial Intelligence Research, 30(1), 457-500.
Mihalcea, R. and Strapparava, C. (2009). The Lie Detector: Explorations in the Automatic Recognition of Deceptive Language. In Proceedings of the Association for Computational Linguistics (ACL-IJCNLP 2009), Singapore, 309-312.
Newman, M. L., Pennebaker, J. W., Berry, D. S. and Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29, 665-675.
Ott, M., Choi, Y., Cardie, C. and Hancock, J. T. (2011). Finding deceptive opinion spam by any stretch of the imagination. In Proceedings of ACL, 309-319.
Pennebaker, J. W., Francis, M. E. and Booth, R. J. (2001). Linguistic Inquiry and Word Count. Erlbaum Publishers, Mahwah, NJ.
Pennebaker, J. W., Chung, C. K., Ireland, M., Gonzales, A. L. and Booth, R. J. (2007). The development and psychometric properties of LIWC2007. LIWC.net, Austin, TX.
Picornell, I. (2011). The Rake’s Progress: Mapping deception in written witness statements. Paper presented at the International Association of Forensic Linguists Tenth Biennial Conference, Aston University, Birmingham, United Kingdom.
Provost, J. (1999). Naive-Bayes vs. rule-learning in classification of email. Technical Report AI-TR-99-284, University of Texas at Austin, Artificial Intelligence Lab.
Ramírez-Esparza, N., Pennebaker, J. W. and García, F. A. (2007). La psicología del uso de las palabras: Un programa de computadora que analiza textos en español [The psychology of word use: A computer program that analyzes texts in Spanish]. Revista Mexicana de Psicología, 24, 85-99.
Rude, S. S., Gortner, E. M. and Pennebaker, J. W. (2004). Language use of depressed and depression-vulnerable college students. Cognition and Emotion, 18, 1121-1133.
Rushdi-Saleh, M., Martín-Valdivia, M. T., Montejo, A. and Ureña, L. A. (2011). Experiments with SVM to classify opinions in different domains. Expert Systems with Applications, 38(12), 14799-14804.
Tausczik, Y. R. and Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29, 24-54.
Vrij, A. (2010). Detecting lies and deceit: Pitfalls and opportunities. 2nd edition. John Wiley and Sons, Chichester, UK.
Vrij, A., Mann, S., Kristen, S. and Fisher, R. P. (2007). Cues to deception and ability to detect lies as a function of police interview styles. Law and Human Behavior, 31(5), 499-518.
License
- The Author shall grant to the Publisher and its agents the nonexclusive perpetual right and license to publish, archive, and make accessible the Work in whole or in part in all forms of media now or hereafter known under a Creative Commons Attribution 4.0 License or its equivalent, which, for the avoidance of doubt, allows others to copy, distribute, and transmit the Work under the following conditions:
  - Attribution: other users must attribute the Work in the manner specified by the author as indicated on the journal Web site;
- The Author is able to enter into separate, additional contractual arrangements for the nonexclusive distribution of the journal's published version of the Work (e.g., post it to an institutional repository or publish it in a book), as long as there is provided in the document an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post online a pre-publication manuscript (but not the Publisher’s final formatted PDF version of the Work) in institutional repositories or on their Websites prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access). Any such posting made before acceptance and publication of the Work shall be updated upon publication to include a reference to the Publisher-assigned DOI (Digital Object Identifier) and a link to the online abstract for the final published Work in the Journal.
- Upon Publisher’s request, the Author agrees to furnish promptly to Publisher, at the Author’s own expense, written evidence of the permissions, licenses, and consents for use of third-party material included within the Work, except as determined by Publisher to be covered by the principles of Fair Use.
- The Author represents and warrants that:
  - the Work is the Author’s original work;
  - the Author has not transferred, and will not transfer, exclusive rights in the Work to any third party;
  - the Work is not pending review or under consideration by another publisher;
  - the Work has not previously been published;
  - the Work contains no misrepresentation or infringement of the Work or property of other authors or third parties; and
  - the Work contains no libel, invasion of privacy, or other unlawful matter.
- The Author agrees to indemnify and hold Publisher harmless from Author’s breach of the representations and warranties listed above, as well as any claim or proceeding relating to Publisher’s use and publication of any content contained in the Work, including third-party content.