British English Language Datasets | 150+ Years of Research | Natural Language Processing (NLP) Data | LLMs | TTS | Dictionary Display | EU Coverage

Oxford Languages
No reviews yet | Verified Data Provider
Volume: 600K synonyms
Available formats: .csv, .json, and .mp3 files
Coverage: 15 countries
History: 150 years

Description

Derived from 150+ years of lexical research, this British English dataset offers comprehensive annotations including headwords, definitions, senses, examples, POS tags, semantic metadata, and usage info. Optimized for use in NLP, dictionary tools, TTS systems, and language model fine-tuning.
Our British English language datasets are meticulously curated and annotated by experienced linguists and language experts, ensuring exceptional accuracy, consistency, and linguistic depth. The following British English datasets are available for license:

1. British English Monolingual Dictionary Data
2. British English Synonyms and Antonyms Data
3. British English Pronunciations with Audio

Key Features (approximate numbers):

1. British English Monolingual Dictionary Data
Our British English monolingual dataset delivers clear, reliable definitions and authentic usage examples, with a high volume of headwords and in-depth coverage of British English. As one of the world’s most authoritative lexical resources, it is trusted by leading academic, AI, and language technology organizations.
- Headwords: 146,000
- Senses: 230,000
- Sentence examples: 149,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: twice a year

2. British English Synonyms and Antonyms Data
This dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for NLP tasks such as semantic search, word sense disambiguation, and language generation.
- Synonyms: 600,000
- Antonyms: 22,000
- Usage examples: 39,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing)
- Update frequency: annually

3. British English Pronunciations with Audio (word-level)
This dataset provides IPA transcriptions and mapped audio files for words in contemporary British English, with a focus on UK speaker usage. It includes syllabified transcriptions, variant spellings, part-of-speech tags, and pronunciation group identifiers. Audio files are supplied separately and linked where available, making the dataset well suited to TTS, ASR, and pronunciation modeling.
- Transcriptions (IPA): 250,000
- Audio files: 180,000
- Format: XLSX (transcriptions), MP3 and WAV (audio files)
- Update frequency: annually

Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translation, word embeddings, and word sense disambiguation (WSD). If you have a specific use case in mind that isn’t listed here, we’d be happy to explore it with you. Get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed through term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating the data into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.
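The dictionary data is delivered as XML or JSON, but its exact schema is not documented on this page. The following is a minimal Python sketch that assumes a hypothetical nested JSON layout (headword, then part-of-speech groups, then senses, then examples); the field names and sample values are invented for illustration only. It flattens each entry into one row per sense and example, a shape often used when preparing dictionary data for NLP pipelines or LLM fine-tuning.

```python
import json
from dataclasses import dataclass
from typing import Iterator, Optional

# HYPOTHETICAL schema and invented sample values, for illustration only.
# The real delivery format is defined by Oxford Languages.
SAMPLE_JSON = """
[
  {
    "headword": "queue",
    "entries": [
      {
        "pos": "noun",
        "senses": [
          {"definition": "a line of people or vehicles awaiting their turn",
           "examples": ["there was a long queue at the post office"]}
        ]
      },
      {
        "pos": "verb",
        "senses": [
          {"definition": "wait in a queue", "examples": []}
        ]
      }
    ]
  }
]
"""


@dataclass(frozen=True)
class SenseRow:
    headword: str
    pos: str
    definition: str
    example: Optional[str]


def flatten_entries(entries: list) -> Iterator[SenseRow]:
    """Yield one row per (sense, example) pair.

    A sense with no examples still yields one row, with example=None,
    so that no definition is dropped.
    """
    for entry in entries:
        for group in entry.get("entries", []):
            pos = group.get("pos", "")
            for sense in group.get("senses", []):
                examples = sense.get("examples") or [None]
                for example in examples:
                    yield SenseRow(entry["headword"], pos, sense["definition"], example)


if __name__ == "__main__":
    for row in flatten_entries(json.loads(SAMPLE_JSON)):
        print(row)
```

Keeping senses without examples (rather than skipping them) is a design choice: it preserves every definition for tasks such as word sense disambiguation, at the cost of some rows having an empty example field.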

Country Coverage

Africa (3)
Kenya
Nigeria
South Africa
Asia (5)
Hong Kong
India
Malaysia
Pakistan
Singapore
Europe (2)
Ireland
United Kingdom
Oceania (2)
Australia
New Zealand
Caribbean (3)
Barbados
Jamaica
Trinidad and Tobago

History

150 years of historical data

Volume

180,000 Audio files
149,000 Sentence examples
146,000 Headwords
230,000 Senses
600,000 Synonyms
22,000 Antonyms
39,000 Usage examples
250,000 Transcriptions (IPA)

Pricing

License starts at: pricing available upon request
One-off purchase Available
Monthly License Not available
Yearly License Available
Usage-based Available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Delivery

Methods
Email
REST API
Frequency
yearly
Format
.csv
.json
.mp3
.wav
.xls
.xml

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Gaming
LLM Training


Frequently asked questions

What are the British English Language Datasets?

Derived from 150+ years of lexical research, this British English dataset offers comprehensive annotations including headwords, definitions, senses, examples, POS tags, semantic metadata, and usage info. Optimized for use in NLP, dictionary tools, TTS systems, and language model fine-tuning.

What are the British English Language Datasets used for?

This product has 4 key use cases. Oxford Languages recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Gaming, and LLM Training. Global businesses and organizations buy Natural Language Processing (NLP) Data from Oxford Languages to fuel their analytics and enrichment.

Who can use the British English Language Datasets?

This product is best suited if you’re a Small Business, Medium-sized Business, or Enterprise looking for Natural Language Processing (NLP) Data. Get in touch with Oxford Languages to see what their data can do for your business and find out which integrations they provide.

How far back does the data in the British English Language Datasets go?

This product has 150 years of historical coverage. It can be delivered on a yearly basis.

Which countries do the British English Language Datasets cover?

This product includes data covering 15 countries, including India, the UK, Australia, Nigeria, and South Africa. Oxford Languages is headquartered in the United Kingdom.

How much do the British English Language Datasets cost?

Pricing for the British English Language Datasets is available on request from Oxford Languages. Contact Oxford Languages to get a quote and arrange a custom pricing model based on your data requirements.

How can I get the British English Language Datasets?

Businesses can buy Natural Language Processing (NLP) Data from Oxford Languages and receive the data via Email and REST API. Depending on your data requirements and subscription budget, Oxford Languages can deliver this product in .csv, .json, .mp3, .wav, .xls, and .xml formats.

What is the data quality of the British English Language Datasets?

You can compare and assess the data quality of Oxford Languages using Datarade’s data marketplace.

What are similar products to the British English Language Datasets?

This product has 3 related products. These alternatives include LATAM Data Suite 1.8M+ Sentences Natural Language Processing (NLP) Data TTS Dictionary Display Translation Data LATAM Coverage, In-Cabin Speech Data 15,000 Hours AI Training Data Speech Recognition Data Audio Data Natural Language Processing (NLP) Data, and Machine Learning (ML) Data 800M+ B2B Profiles AI-Ready for Deep Learning (DL), NLP & LLM Training. You can compare the best Natural Language Processing (NLP) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Pricing available upon request