Martin Schweinberger

Email: m.schweinberger@uq.edu.au
Phone: +61 7 336 56892

Background

Martin Schweinberger uses big data and computational methods to explore the messy, fascinating reality of how people actually talk—including all the swear words, filler words, and informal expressions that traditional language education overlooks. As a Senior Lecturer in Applied Linguistics at the University of Queensland, he bridges the gap between computer science and linguistics to understand how language evolves in our digital age.

Uncovering Hidden Language Patterns

Much of Martin's research focuses on the language phenomena that schools don't teach but that permeate everyday conversation. He analyzes massive datasets to study vulgarity and swearing patterns, as well as discourse markers—those ubiquitous filler words like "like," "you know," "well," and "I mean" that pepper our speech. By applying statistical methods to real-world language use, he reveals how these supposedly "incorrect" forms of expression actually follow sophisticated social and linguistic rules.

His work also tracks how language changes over time and varies between different social settings, using computational tools to identify patterns that would be impossible to detect through traditional research methods alone.

Building Australia's Language Data Future

Martin directs the Language Technology and Data Analysis Laboratory (LADAL), a free upskilling platform for language data science with hundreds of thousands of users worldwide. He is also a key figure in the Language Data Commons of Australia (LDaCA), one of Australia's major research infrastructure projects, through which he is helping build the digital infrastructure that will support language research across the country. LDaCA has received substantial funding to create accessible tools and resources that allow researchers to analyze text and speech data more effectively.

Championing Research Transparency

Beyond his linguistic research, Martin advocates for reproducibility and transparency in humanities and social science research. He provides guidance on how language researchers can adopt more rigorous, open research practices—addressing a growing concern about the reliability of academic findings across disciplines.

Martin's international visibility is reflected in his leadership roles: he serves as Vice-President Professional of the International Society for the Linguistics of English (ISLE) and sits on the board of The International Computer Archive of Modern and Medieval English (ICAME), one of the oldest and most reputable societies for corpus linguistics. These positions demonstrate his commitment to advancing computational language research on a global scale.

Potential topics for supervision

I would be particularly interested in supervising theses on the following topics:

Sociolinguistics / Language Variation and Change / World Englishes

General extenders
Terms-of-address and salutations
Discourse particles and markers
Vulgarity
Adjective amplification

Learner Language / Applied Linguistics / Corpus Phonetics / Learner Corpus Research

Vowel production among L1 speakers and learners of English
Voice-onset times among L1 speakers and learners of English
Fluency and pauses in learner and L1 speech
Accent and intelligibility / comprehension

Text Analytics / Digital Humanities / Corpus Linguistics

Applications of word embeddings in the language sciences
Comparison of different association / keyness measures

Availability

Dr Martin Schweinberger is: Available for supervision; Media expert

Fields of research

Corpus linguistics
English language
Language studies
Language, Communication and Culture
Linguistics

Qualifications

Doctor of Philosophy, Universität Hamburg

Research interests

Vulgarity and Swearing

I investigate how swear words and taboo language are used in everyday speech and online discourse. Contrary to popular belief, vulgar language follows systematic social and linguistic rules. My research uncovers how such expressions function in communication and what they reveal about speakers’ identities, emotions, and group memberships.

Discourse Markers and Filler Words

I study words like "like," "you know," and "well," which are often dismissed as meaningless. Using computational analysis, I show how these elements structure conversations and convey nuanced meanings. My work demonstrates that such "filler" words play important roles in signaling attitudes, managing interactions, and guiding listener expectations.

Open Science and Research Transparency

I actively promote reproducible, open research practices in the humanities and social sciences. I provide practical training and resources to help language researchers adopt transparent workflows. My advocacy supports greater academic rigor and long-term trust in empirical research.

Text Analytics and Computational Linguistics

I apply computational methods, such as machine learning and statistical modelling, to large corpora to uncover hidden linguistic patterns. These tools help quantify language use in a way that supports replicable, empirical research. My work is at the intersection of computer science and linguistics, making it especially relevant in the digital age.

Digital Infrastructure and Research Tools

As Director of LADAL and a lead in LDaCA, I am building accessible digital platforms that support large-scale language analysis. These initiatives democratize access to language data and computational tools for researchers, students, and educators alike. My infrastructure work enhances the capacity for advanced language research in Australia and beyond.

Language Variation and Change

I explore how language evolves over time and across different social settings. By analyzing large-scale linguistic datasets, I identify subtle patterns of variation in how people speak, particularly in informal and digital contexts. This research helps reveal how social norms and technology influence the way we communicate.

Learner Language and Second Language Acquisition

I analyze how learners of English produce sounds, manage fluency, and develop pronunciation over time. This includes examining features like vowel quality, voice-onset time, pauses, and accent intelligibility. By comparing learner and native speaker data, my research informs language teaching and helps improve learner outcomes.

Corpus Phonetics

I use corpus-based methods to investigate the phonetic characteristics of spoken language, including pronunciation patterns among both native and non-native speakers. I focus on measurable acoustic features such as vowel production and timing cues. This approach allows for the large-scale, data-driven analysis of speech in real-life settings.

Research impacts

As director and initiator of the Language Technology and Data Analysis Laboratory (LADAL), I am very proud that LADAL has emerged as one of Australia’s most prominent web-based collaborative support infrastructures for digital and computational humanities, with more than 1.1 million page views from more than 500,000 active users in nearly 750,000 engaged sessions since 2021.

103 works between 2008 and 2026

Works by type: Book (1), Journal Article (27), Other Outputs (18), Conference Publication (36), Book Chapter (21)

2022

Conference Publication

Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in arts and humanities

Crosthwaite, P., Ningrum, S. and Schweinberger, M. (2022). Research trends in corpus linguistics: a bibliometric analysis of two decades of Scopus-indexed corpus linguistics research in arts and humanities. ICAME43, Cambridge, United Kingdom, 27-30 July 2022.

2022

Other Outputs

From tables to forests – working with tables and tree-based models

Schweinberger, Martin (2022). From tables to forests – working with tables and tree-based models. Tromsø, Norway: The Arctic University of Norway.

2022

Other Outputs

Introduction to Power Analysis with R

Schweinberger, Martin (2022). Introduction to Power Analysis with R. Tromsø, Norway: The Arctic University of Norway.

2022

Other Outputs

Introduction to data visualization with R

Schweinberger, Martin (2022). Introduction to data visualization with R. Tromsø, Norway: The Arctic University of Norway.

2022

Book Chapter

Absolutely fantastic and really really good: language variation and change in Irish English

Schweinberger, Martin (2022). Absolutely fantastic and really really good: language variation and change in Irish English. Expanding the landscapes of Irish English research. (pp. 129-145) edited by Stephen Lucek and Carolina P. Amador-Moreno. New York, United States: Routledge. doi: 10.4324/9781003025078-7

2021

Journal Article

Ongoing change in the Australian English amplifier system

Schweinberger, Martin (2021). Ongoing change in the Australian English amplifier system. Australian Journal of Linguistics, 41 (2), 166-194. doi: 10.1080/07268602.2021.1931028

2021

Journal Article

Training disciplinary genre awareness through blended learning: an exploration into EAP students’ perceptions of online annotation of genres across disciplines

Crosthwaite, Peter, Sanhueza, Alicia Gazmuri and Schweinberger, Martin (2021). Training disciplinary genre awareness through blended learning: an exploration into EAP students’ perceptions of online annotation of genres across disciplines. Journal of English for Academic Purposes, 53 101021, 1-16. doi: 10.1016/j.jeap.2021.101021

2021

Journal Article

Which word gets the nuclear stress in a turn-at-talk?

Rühlemann, Christoph and Schweinberger, Martin (2021). Which word gets the nuclear stress in a turn-at-talk? Journal of Pragmatics, 178, 426-439. doi: 10.1016/j.pragma.2021.04.005

2021

Journal Article

Voices from the periphery: perceptions of Indonesian primary vs secondary pre-service teacher trainees about corpora and data-driven learning in the L2 English classroom

Crosthwaite, Peter, Luciana and Schweinberger, Martin (2021). Voices from the periphery: perceptions of Indonesian primary vs secondary pre-service teacher trainees about corpora and data-driven learning in the L2 English classroom. Applied Corpus Linguistics, 1 (1) 100003, 1-13. doi: 10.1016/j.acorp.2021.100003

2021

Other Outputs

Fixed- and mixed-effects regression models in R

Schweinberger, Martin (2021). Fixed- and mixed-effects regression models in R. Brisbane, QLD, Australia: The University of Queensland, School of Languages and Cultures.

2021

Journal Article

Analysing discourse around COVID-19 in the Australian Twittersphere: a real-time corpus-based analysis

Schweinberger, Martin, Haugh, Michael and Hames, Sam (2021). Analysing discourse around COVID-19 in the Australian Twittersphere: a real-time corpus-based analysis. Big Data and Society, 8 (1), 20539517211021437. doi: 10.1177/20539517211021437

2021

Book Chapter

Using intensifier-adjective collocations to investigate mechanisms of change

Schweinberger, Martin (2021). Using intensifier-adjective collocations to investigate mechanisms of change. Variation in time and space: observing the world through corpora. (pp. 231-255) edited by Anna Čermáková and Markéta Malá. Berlin, Germany: De Gruyter. doi: 10.1515/9783110604719-010

2021

Other Outputs

Tree-based models in R

Schweinberger, Martin (2021). Tree-based models in R. Brisbane, QLD, Australia: The University of Queensland, School of Languages and Cultures.

2021

Book Chapter

On the waning of forms – a corpus-based analysis of decline and loss in adjective amplification

Schweinberger, Martin (2021). On the waning of forms – a corpus-based analysis of decline and loss in adjective amplification. Lost in change: causes and processes in the loss of grammatical elements and constructions. (pp. 235-260) edited by Svenja Kranich and Tine Breban. Amsterdam, Netherlands: John Benjamins Publishing Company. doi: 10.1075/slcs.218.08sch

2021

Journal Article

Analyzing Historical Changes in the Irish English Amplifier System

Schweinberger, M. (2021). Analyzing Historical Changes in the Irish English Amplifier System. Anglistik, 32 (1), 139-158. doi: 10.33675/angl/2021/1/11

2020

Journal Article

A corpus-based analysis of differences in the use of very for adjective amplification among native speakers and learners of English

Schweinberger, Martin (2020). A corpus-based analysis of differences in the use of very for adjective amplification among native speakers and learners of English. International Journal of Learner Corpus Research, 6 (2), 163-192. doi: 10.1075/ijlcr.20011.sch

2020

Journal Article

Less is more? The impact of written corrective feedback on corpus-assisted L2 error resolution

Crosthwaite, Peter, Storch, Neomy and Schweinberger, Martin (2020). Less is more? The impact of written corrective feedback on corpus-assisted L2 error resolution. Journal of Second Language Writing, 49 100729, 100729. doi: 10.1016/j.jslw.2020.100729

2020

Journal Article

How learner corpus-research can inform language learning and teaching

Schweinberger, Martin (2020). How learner corpus-research can inform language learning and teaching. Australian Review of Applied Linguistics, 43 (2), 195-217.

2020

Journal Article

How learner corpus research can inform language learning and teaching: an analysis of adjective amplification among L1 and L2 English speakers

Schweinberger, Martin (2020). How learner corpus research can inform language learning and teaching: an analysis of adjective amplification among L1 and L2 English speakers. Australian Review of Applied Linguistics, 43 (2), 196-218. doi: 10.1075/aral.00032.sch

2020

Journal Article

Speech-unit final like in Irish English

Schweinberger, Martin (2020). Speech-unit final like in Irish English. English World-Wide, 41 (1), 89-117. doi: 10.1075/eww.00041.sch

Current funding

2024 - 2028

Language Data Commons of Australia (LDaCA-RDC)

Australian Research Data Commons Limited

Past funding

2021 - 2024

Language Data Commons of Australia HASS RDC (LDaCA-RDC)

ARDC - Australian Data Partnerships

Availability

Dr Martin Schweinberger is: Available for supervision

Supervision history

Current supervision

Doctor Philosophy

A multifactorial study of morpho-syntactic errors across different L1 backgrounds and language proficiency levels

Principal Advisor

Other advisors: Associate Professor Peter Crosthwaite

Doctor Philosophy

The Relationship Between Writing Tasks and Second Language Writers’ Use of Metadiscourse

Associate Advisor

Other advisors: Associate Professor Peter Crosthwaite

Doctor Philosophy

Enhancing Lexical Resources for Argumentative Essay Writing through Corpus Integration

Associate Advisor

Other advisors: Associate Professor Peter Crosthwaite

Doctor Philosophy

Integrating Artificial Intelligence and Machine Learning in TESOL: A Study on Personalised Learning and Impact on Student Engagement and Motivation in A Rural Indonesian University

Associate Advisor

Other advisors: Associate Professor Peter Crosthwaite

Doctor Philosophy

Corpus-based investigation of three-minute thesis presentations: Register perspective

Associate Advisor

Other advisors: Associate Professor Peter Crosthwaite

Completed supervision

2025

Doctor Philosophy

A corpus-based analysis of conspiracy theory discourse on Reddit: Understanding conspiracy-fuelled anomie and moral panics during COVID-19

Principal Advisor

Other advisors: Professor Ryan Ko

2023

Doctor Philosophy

The acquisition of number marking: The case of Indonesian as a second language

Associate Advisor

Other advisors: Associate Professor Peter Crosthwaite

Enquiries

Contact Dr Martin Schweinberger directly for media enquiries about their areas of expertise.

Need help?

For help with finding experts, story ideas and media enquiries, contact our Media team:

communications@uq.edu.au
