
Overview
Background
Martin Schweinberger uses big data and computational methods to explore the messy, fascinating reality of how people actually talk—including all the swear words, filler words, and informal expressions that traditional language education overlooks. As a Lecturer in Applied Linguistics at the University of Queensland, he bridges the gap between computer science and linguistics to understand how language evolves in our digital age.
Uncovering Hidden Language Patterns
Much of Martin's research focuses on the language phenomena that schools don't teach but that permeate everyday conversation. He analyzes massive datasets to study vulgarity and swearing patterns, as well as discourse markers—those ubiquitous filler words like "like," "you know," "well," and "I mean" that pepper our speech. By applying statistical methods to real-world language use, he reveals how these supposedly "incorrect" forms of expression actually follow sophisticated social and linguistic rules.
His work also tracks how language changes over time and varies between different social settings, using computational tools to identify patterns that would be impossible to detect through traditional research methods alone.
Building Australia's Language Data Future
Martin is Director of the Language Technology and Data Analysis Laboratory (LADAL), a free upskilling platform for language data science with hundreds of thousands of users worldwide. He is also a key figure in the Language Data Commons of Australia (LDaCA), one of Australia's major research infrastructure projects, where he is helping build the digital infrastructure that will support language research across the country. LDaCA has received substantial funding to create accessible tools and resources that allow researchers to analyze text and speech data more effectively.
Championing Research Transparency
Beyond his linguistic research, Martin advocates for reproducibility and transparency in humanities and social science research. He provides guidance on how language researchers can adopt more rigorous, open research practices—addressing a growing concern about the reliability of academic findings across disciplines.
Martin's international visibility is reflected in his leadership roles: he serves as Vice-President Professional of the International Society for the Linguistics of English (ISLE) and sits on the board of The International Computer Archive of Modern and Medieval English (ICAME), one of the oldest and most reputable societies for corpus linguistics. These positions demonstrate his commitment to advancing computational language research on a global scale.
Potential topics for supervision
I would be particularly interested in supervising theses on the following topics:
Sociolinguistics / Language Variation and Change / World Englishes
- General extenders
- Terms-of-address and salutations
- Discourse particles and markers
- Vulgarity
- Adjective amplification
Learner Language / Applied Linguistics / Corpus Phonetics / Learner Corpus Research
- Vowel production among L1 speakers and learners of English
- Voice onset times among L1 speakers and learners of English
- Fluency and pauses in learner and L1 speech
- Accent and intelligibility / comprehension
Text Analytics / Digital Humanities / Corpus Linguistics
- Applications of word embeddings in the language sciences
- Comparison of different association / keyness measures
Availability
Dr Martin Schweinberger is:
- Available for supervision
- Media expert
Qualifications
- Doctor of Philosophy, Universität Hamburg
Research interests
- Vulgarity and Swearing
I investigate how swear words and taboo language are used in everyday speech and online discourse. Contrary to popular belief, vulgar language follows systematic social and linguistic rules. My research uncovers how such expressions function in communication and what they reveal about speakers’ identities, emotions, and group memberships.
- Discourse Markers and Filler Words
I study words like "like," "you know," and "well," which are often dismissed as meaningless. Using computational analysis, I show how these elements structure conversations and convey nuanced meanings. My work demonstrates that such "filler" words play important roles in signaling attitudes, managing interactions, and guiding listener expectations.
- Open Science and Research Transparency
I actively promote reproducible, open research practices in the humanities and social sciences. I provide practical training and resources to help language researchers adopt transparent workflows. My advocacy supports greater academic rigor and long-term trust in empirical research.
- Text Analytics and Computational Linguistics
I apply computational methods—like machine learning and statistical modelling—to large corpora to uncover hidden linguistic patterns. These tools help quantify language use in a way that supports replicable, empirical research. My work is at the intersection of computer science and linguistics, making it especially relevant in the digital age.
- Digital Infrastructure and Research Tools
As Director of LADAL and a lead in LDaCA, I am building accessible digital platforms that support large-scale language analysis. These initiatives democratize access to language data and computational tools for researchers, students, and educators alike. My infrastructure work enhances the capacity for advanced language research in Australia and beyond.
- Language Variation and Change
I explore how language evolves over time and across different social settings. By analyzing large-scale linguistic datasets, I identify subtle patterns of variation in how people speak, particularly in informal and digital contexts. This research helps reveal how social norms and technology influence the way we communicate.
- Learner Language and Second Language Acquisition
I analyze how learners of English produce sounds, manage fluency, and develop pronunciation over time. This includes examining features like vowel quality, voice-onset time, pauses, and accent intelligibility. By comparing learner and native speaker data, my research informs language teaching and helps improve learner outcomes.
- Corpus Phonetics
I use corpus-based methods to investigate the phonetic characteristics of spoken language, including pronunciation patterns among both native and non-native speakers. I focus on measurable acoustic features such as vowel production and timing cues. This approach allows for the large-scale, data-driven analysis of speech in real-life settings.
Research impacts
As director and initiator of the Language Technology and Data Analysis Laboratory (LADAL), I am very proud that LADAL has emerged as one of Australia’s most prominent web-based collaborative support infrastructures for digital and computational humanities, with more than 1.1 million page views from more than 500,000 active users across nearly 750,000 engaged sessions since 2021.
Works
2025
Journal Article
Vulgarity in online discourse around the English-speaking world
Schweinberger, Martin and Burridge, Kate (2025). Vulgarity in online discourse around the English-speaking world. Lingua, 321 103946, 1-25. doi: 10.1016/j.lingua.2025.103946
2025
Other Outputs
Indigenous Australian languages – linguistic features, revitalization efforts, and research infrastructure for archiving and accessibility
Schweinberger, Martin (2025). Indigenous Australian languages – linguistic features, revitalization efforts, and research infrastructure for archiving and accessibility. Hamburg, Germany: University of Hamburg.
2024
Journal Article
Seeded topic modeling as a more appropriate alternative to unsupervised standard topic models
Schweinberger, Martin (2024). Seeded topic modeling as a more appropriate alternative to unsupervised standard topic models. Discourse Studies. doi: 10.1177/14614456241293895
2024
Other Outputs
Vulgarity in online discourse around the English-speaking world
Schweinberger, Martin (2024). Vulgarity in online discourse around the English-speaking world. Bonn, Germany: University of Bonn.
2024
Book Chapter
A corpus-based comparative acoustic analysis of target-like vowel production by L1-Japanese learners and native speakers of English
Schweinberger, Martin and Komiya, Yuki (2024). A corpus-based comparative acoustic analysis of target-like vowel production by L1-Japanese learners and native speakers of English. Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics. (pp. 41-61) edited by Sarah Buschfeld, Patricia Ronan, Theresa Neumaier, Andreas Weilinghoff and Lisa Westermayer. Amsterdam, Netherlands: John Benjamins Publishing Company. doi: 10.1075/scl.119.03sch
2024
Book Chapter
An introduction to sociopragmatic variation
Ronan, Patricia and Schweinberger, Martin (2024). An introduction to sociopragmatic variation. Socio-Pragmatic Variation in Ireland. (pp. 1-8) Berlin, Germany: De Gruyter. doi: 10.1515/9783110791457-001
2024
Book Chapter
Concluding remarks and future directions in studies on sociopragmatic variation
Schweinberger, Martin and Ronan, Patricia (2024). Concluding remarks and future directions in studies on sociopragmatic variation. Socio-pragmatic variation in Ireland: using pragmatic variation to construct social identities. (pp. 235-240) edited by Martin Schweinberger and Patricia Ronan. Berlin, Germany: De Gruyter Mouton. doi: 10.1515/9783110791457-012
2024
Book Chapter
Boring much? Semantic determinants of constructional attraction in Irish English
Schweinberger, Martin and Ronan, Patricia (2024). Boring much? Semantic determinants of constructional attraction in Irish English. Socio-Pragmatic Variation in Ireland. (pp. 107-130) Berlin, Germany: De Gruyter. doi: 10.1515/9783110791457-007
2024
Conference Publication
Automated, Corpus- and Usage-Based Semantic Classification of Word Class using Word Embeddings
Schweinberger, Martin and Luo, Chang-Hao (2024). Automated, Corpus- and Usage-Based Semantic Classification of Word Class using Word Embeddings. ICAME45, Vigo, Spain, 18-21 June 2024.
2024
Other Outputs
Introduction to R for Social Science
Schweinberger, Martin (2024). Introduction to R for Social Science. Joensuu, Finland: University of Eastern Finland.
2024
Other Outputs
Introduction to Computational Text Analytics (Workshop)
Schweinberger, Martin and Hames, Sam (2024). Introduction to Computational Text Analytics (Workshop). Brisbane, QLD Australia: The University of Queensland.
2024
Journal Article
Corpus-based discourse analysis: from meta-reflection to accountability
Bednarek, Monika, Schweinberger, Martin and Lee, Kelvin K. H. (2024). Corpus-based discourse analysis: from meta-reflection to accountability. Corpus Linguistics and Linguistic Theory, 20 (3), 1-28. doi: 10.1515/cllt-2023-0104
2024
Journal Article
A corpus‐based analysis of adjective amplification in Hong Kong, Indian and Philippine English
Schweinberger, Martin (2024). A corpus‐based analysis of adjective amplification in Hong Kong, Indian and Philippine English. World Englishes. doi: 10.1111/weng.12640
2024
Book Chapter
When natural language processing meets corpus linguistics
Schweinberger, Martin (2024). When natural language processing meets corpus linguistics. Digitally-assisted historical English linguistics. (pp. 73-88) edited by Carolina P. Amador-Moreno, Dagmar Haumann and Arne Peters. New York, NY, United States: Routledge. doi: 10.4324/9781003360285-6
2023
Other Outputs
Introduction to Dimension Reduction Methods with R
Schweinberger, Martin (2023). Introduction to Dimension Reduction Methods with R. Tromsø, Norway: The Arctic University of Norway.
2023
Journal Article
On the L1-acquisition of the pragmatics of discourse like
Schweinberger, Martin (2023). On the L1-acquisition of the pragmatics of discourse like. Functions of Language, 30 (3), 255-286. doi: 10.1075/fol.20025.sch
2023
Other Outputs
F%$# Twitter. A corpus-based analysis of vulgar language on Twitter
Schweinberger, Martin (2023). F%$# Twitter. A corpus-based analysis of vulgar language on Twitter. Bayreuth, Germany: Bayreuth University.
2023
Journal Article
Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in arts and humanities
Crosthwaite, Peter, Ningrum, Sulistya and Schweinberger, Martin (2023). Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in arts and humanities. International Journal of Corpus Linguistics, 28 (3), 344-377. doi: 10.1075/ijcl.21072.cro
2023
Conference Publication
An introduction to the resources provided by LADAL - the Language Technology and Data Analysis Laboratory
Schweinberger, Martin (2023). An introduction to the resources provided by LADAL - the Language Technology and Data Analysis Laboratory. 7th Meeting of the International Society for the Linguistics of English (ISLE7), Brisbane, QLD, Australia, 19-23 June 2023. Brisbane, QLD, Australia: University of Queensland.
2023
Conference Publication
Who swears most – and in what social settings?
Schweinberger, Martin, Fatemi, Masoud, Hames, Sam, Haugh, Michael, Laitinen, Mikko, Rautionaho, Paula and Takahashi, Marissa (2023). Who swears most – and in what social settings? 7th Meeting of the International Society for the Linguistics of English (ISLE7), Brisbane, QLD, Australia, 19-23 June 2023. Brisbane, QLD, Australia: University of Queensland.
Supervision
Supervision history
Current supervision
- Doctor of Philosophy
A multifactorial study of morpho-syntactic errors across different L1 backgrounds and language proficiency levels
Principal Advisor
Other advisors: Associate Professor Peter Crosthwaite
- Doctor of Philosophy
A corpus-based analysis of conspiracy theory discourse on Reddit: Understanding conspiracy-fuelled anomie and moral panics during COVID-19
Principal Advisor
Other advisors: Professor Ryan Ko
- Doctor of Philosophy
Corpus-based investigation of three-minute thesis presentations: Register perspective
Associate Advisor
Other advisors: Associate Professor Peter Crosthwaite
- Doctor of Philosophy
The Relationship Between Writing Tasks and Second Language Writers' Use of Metadiscourse
Associate Advisor
Other advisors: Associate Professor Peter Crosthwaite
- Doctor of Philosophy
Integrating Artificial Intelligence and Machine Learning in TESOL: A Study on Personalised Learning and Impact on Student Engagement and Motivation in A Rural Indonesian University
Associate Advisor
Other advisors: Associate Professor Peter Crosthwaite
Completed supervision
- 2023
Doctor of Philosophy
The acquisition of number marking: The case of Indonesian as a second language
Associate Advisor
Other advisors: Associate Professor Peter Crosthwaite
Media
Enquiries
Contact Dr Martin Schweinberger directly for media enquiries about their areas of expertise.