American English Language Datasets | 150+ Years of Research | Textual Data | NLP | LLMs | TTS | Dictionary Display | Game | US English Coverage

Oxford Languages
Verified Data Provider
Volume: 600K synonyms
Available file formats: .csv, .json, and .mp3
Coverage: 7 countries
History: 150 years

Description

Derived from 150+ years of lexical research, these comprehensive English datasets, focused on American English, offer linguistically annotated data including headwords, definitions, senses, examples, POS tags, and semantic metadata. Ideal for dictionary tools, NLP, and AI applications.
One of our flagship datasets, the American English data is expertly curated and linguistically annotated by professionals, with annual updates to ensure accuracy and relevance.

The following American English datasets are available for license:
1. American English Monolingual Dictionary Data
2. American English Synonyms and Antonyms Data
3. American English Pronunciations with Audio

Key Features (approximate numbers):

1. American English Monolingual Dictionary Data
Our American English Monolingual Dictionary Data is the foremost authority on American English. It includes detailed tagging and labelling covering parts of speech (POS), grammar, region, register, and subject, providing rich linguistic information. All grammar and usage information is also included to ensure relevance and accuracy.
- Headwords: 140,000
- Senses: 222,000
- Sentence examples: 140,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

2. American English Synonyms and Antonyms Data
The American English Synonyms and Antonyms Dataset is a leading resource offering comprehensive and up-to-date coverage of word relationships in contemporary American English. It includes rich linguistic detail such as precise definitions and part-of-speech (POS) tags, making it an essential asset for developing AI systems and language technologies that require deep semantic understanding.
- Synonyms: 600,000
- Antonyms: 22,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

3. American English Pronunciations with Audio (word-level)
This dataset provides IPA transcriptions and mapped audio files for words in contemporary American English, with a focus on US speaker usage. It includes syllabified transcriptions, variant spellings, part-of-speech tags, and pronunciation group identifiers. Audio files are supplied separately and linked where available, making them ideal for TTS, ASR, and pronunciation modeling.
- Transcriptions (IPA): 250,000
- Audio files: 180,000
- Format: XLSX (transcriptions), MP3 and WAV (audio files)
- Update frequency: annually

Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include Natural Language Processing (NLP) applications, TTS, dictionary display tools, games, translations, word embedding, and word sense disambiguation (WSD). If you have a specific use case in mind that isn't listed here, we'd be happy to explore it with you. Get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you're integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals.
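
As a rough illustration of how the monolingual dictionary data described above might be consumed once delivered as JSON, the sketch below flattens one entry into one row per sense. The listing does not publish a schema, so every field name here (`headword`, `pos`, `senses`, `definition`, `register`) and the placeholder values are hypothetical; the actual licensed format may differ.

```python
import json

# Hypothetical record layout: the field names and placeholder values are
# assumptions for illustration only, not the licensed Oxford Languages schema.
SAMPLE_ENTRY = """
{
  "headword": "example-headword",
  "pos": "noun",
  "senses": [
    {"definition": "<definition text 1>", "register": "informal"},
    {"definition": "<definition text 2>", "register": null}
  ]
}
"""


def flatten_senses(entry):
    """Return one (headword, pos, sense_number, definition) row per sense."""
    rows = []
    for number, sense in enumerate(entry.get("senses", []), start=1):
        rows.append((entry["headword"], entry["pos"], number, sense["definition"]))
    return rows


rows = flatten_senses(json.loads(SAMPLE_ENTRY))
for row in rows:
    print(row)
```

A flat, one-row-per-sense layout like this is a common starting point for word sense disambiguation or dictionary display work, since each sense can then be indexed independently.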

Country Coverage

Asia (1)
Philippines
North America (3)
United States of America
Puerto Rico
Virgin Islands (U.S.)
Oceania (3)
American Samoa
Guam
Northern Mariana Islands

History

150 years of historical data

Volume

180,000 Audio files
140,000 Sentences
140,000 Words
222,000 Senses
600,000 Synonyms
22,000 Antonyms
250,000 Transcriptions (IPA)
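
The volume figures above can be turned into a couple of rough density ratios when sizing a project. The short sketch below uses only the approximate numbers quoted in this listing; treating every audio file as mapped to exactly one IPA transcription is an assumption, so the coverage figure is indicative only.

```python
# Approximate volumes as quoted in the listing.
volumes = {
    "headwords": 140_000,
    "senses": 222_000,
    "synonyms": 600_000,
    "antonyms": 22_000,
    "ipa_transcriptions": 250_000,
    "audio_files": 180_000,
}


def ratio(numerator, denominator):
    """Ratio between two quoted volume figures."""
    return volumes[numerator] / volumes[denominator]


senses_per_headword = ratio("senses", "headwords")
# Assumes one audio file per transcription, which the listing does not state.
audio_coverage = ratio("audio_files", "ipa_transcriptions")

print(f"Senses per headword: {senses_per_headword:.2f}")   # 1.59
print(f"Transcriptions with audio: {audio_coverage:.0%}")  # 72%
```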

Pricing

License Starts at
One-off purchase Available
Monthly License Not available
Yearly License Available
Usage-based Available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Delivery

Methods
Email
REST API
Frequency
yearly
Format
.csv
.json
.mp3
.wav
.xls
.xml

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Gaming
LLM Training


Frequently asked questions

What is American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage?

Derived from 150+ years of lexical research, these comprehensive English datasets, focused on American English, offer linguistically annotated data including headwords, definitions, senses, examples, POS tags, and semantic metadata. Ideal for dictionary tools, NLP, and AI applications.

What is American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage used for?

This product has 4 key use cases. Oxford Languages recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Gaming, and LLM Training. Global businesses and organizations buy Natural Language Processing (NLP) Data from Oxford Languages to fuel their analytics and enrichment.

Who can use American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage?

This product is best suited if you’re a Small Business, Medium-sized Business, or Enterprise looking for Natural Language Processing (NLP) Data. Get in touch with Oxford Languages to see what their data can do for your business and find out which integrations they provide.

How far back does the data in American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage go?

This product has 150 years of historical coverage. It can be delivered on a yearly basis.

Which countries does American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage cover?

This product covers 7 countries, including the USA, the Philippines, Guam, the Northern Mariana Islands, and American Samoa. Oxford Languages is headquartered in the United Kingdom.

How much does American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage cost?

Pricing information for American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage is available by getting in contact with Oxford Languages. Connect with Oxford Languages to get a quote and arrange custom pricing models based on your data requirements.

How can I get American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage?

Businesses can buy Natural Language Processing (NLP) Data from Oxford Languages and receive it via email or REST API. Depending on your data requirements and subscription budget, Oxford Languages can deliver this product in .csv, .json, .mp3, .wav, .xls, and .xml formats.

What is the data quality of American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage?

You can compare and assess the data quality of Oxford Languages using Datarade’s data marketplace.

What are similar products to American English Language Datasets 150+ Years of Research Textual Data NLP LLMs TTS Dictionary Display Game US English Coverage?

This product has 3 related products. These alternatives include British English Language Datasets 150+ Years of Research Natural Language Processing (NLP) Data LLMs TTS Dictionary Display EU Coverage, Native & Accented English Speech Data 40,000 Hours Audio Data Speech Recognition Data Natural Language Processing (NLP) Data, and Global English Speech with Accent Conversational Dataset — Multi-Region Validated Speech with Gender, Age & Metadata for AI & NLP Training. You can compare the best Natural Language Processing (NLP) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Pricing available upon request