Abstract
This study explores the potential of large language models (LLMs) for automating fine-grained speech act annotation by assessing the performance of GPT-4o and DeepSeek on this task. Fine-grained annotation here refers to annotating speech acts within the framework of local grammar, which labels both speech act utterances and the pragmatically meaningful syntactic units within each utterance. Focusing on the speech act of thanking and drawing on data from the British National Corpus, we found that both models achieved high accuracy (90.29% for GPT-4o and 92.95% for DeepSeek). This indicates that LLMs can approach human performance in a domain that has traditionally relied on manual annotation. Subsequent marker-by-marker analyses revealed that each model has distinct strengths and weaknesses: GPT-4o excelled with frequent, informal and context-dependent markers, whereas DeepSeek performed better with explicit and formal markers. Overall, the study shows that LLMs have considerable potential to facilitate complex tasks such as fine-grained speech act annotation. This suggests not only that LLMs can serve as a valuable methodological resource but also that a human-LLM collaboration framework for speech act research could be developed.
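The abstract reports overall accuracy as a single percentage per model and then breaks performance down marker by marker. The following is a minimal sketch of how such figures could be computed. It assumes that accuracy is simply the proportion of annotated units whose model label matches the human (gold) label. The function names `accuracy` and `per_marker_accuracy`, the label strings and the toy records are all hypothetical and do not reproduce the authors' data or evaluation procedure.

```python
from collections import defaultdict


def accuracy(pairs):
    """Share of (gold, predicted) pairs where the model label matches the human label.

    Assumption: accuracy = matches / total; returns 0.0 for an empty list.
    """
    if not pairs:
        return 0.0
    correct = sum(1 for gold, pred in pairs if gold == pred)
    return correct / len(pairs)


def per_marker_accuracy(records):
    """Group (marker, gold, predicted) records by marker and score each group separately."""
    groups = defaultdict(list)
    for marker, gold, pred in records:
        groups[marker].append((gold, pred))
    return {marker: accuracy(pairs) for marker, pairs in groups.items()}


# Hypothetical toy records: (thanking marker, human label, model label).
records = [
    ("thanks", "thanking", "thanking"),
    ("thanks", "thanking", "thanking"),
    ("thank you", "thanking", "thanking"),
    ("cheers", "thanking", "other"),
]

# Overall accuracy ignores the marker; per-marker accuracy keeps it.
overall = accuracy([(gold, pred) for _, gold, pred in records])
print(f"overall: {overall:.2%}")
for marker, score in sorted(per_marker_accuracy(records).items()):
    print(f"{marker}: {score:.2%}")
```

With these toy records the script prints an overall score of 75.00%, while the per-marker breakdown shows 0.00% for "cheers". This illustrates why a marker-by-marker analysis can reveal weaknesses that a high aggregate figure hides.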
Notes
- All the examples used in the present study, unless otherwise noted, were taken from the British National Corpus.
Funding
None.
Ethics declarations
Statement
The study reported in this paper has not been published previously or submitted for consideration in any other journals.
Conflict of Interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic Supplementary Material
Below is the link to the electronic supplementary material.
Supplementary Material 1
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Su, H., Ye, J. Large Language Models for Automating Fine-grained Speech Act Annotation: A Critical Evaluation of GPT-4o and DeepSeek. Corpus Pragmatics (2025). https://doi.org/10.1007/s41701-025-00200-w
- DOI https://doi.org/10.1007/s41701-025-00200-w
Keywords
- Large Language Models
- Speech Act Annotation
- Local Grammar
- GPT-4o
- DeepSeek