Seeing through Deception: A Computational Approach to Deceit Detection in Spanish Written Communication

Ángela Almela, Rafael Valencia-García, Pascual Cantos


This paper addresses the nature of deceptive language. Specifically, its main aim is to explore deceit in Spanish written communication. We have designed an automatic classifier based on Support Vector Machines (SVM) to identify deception in an ad hoc opinion corpus. To test the effectiveness of the LIWC2001 categories in Spanish, we compare them with a Bag-of-Words (BoW) model. The results indicate that the texts are classified more successfully with the LIWC2001 categories than with the BoW model. These findings are potentially applicable to areas such as forensic linguistics and opinion mining, where extensive research on languages other than English is needed.


deception detection, opinion mining, Support Vector Machine, bag of words
