LATAM Data Suite | 1.8M+ Sentences | NLP | TTS | Dictionary Display | Translations | Spanish, Portuguese, American English Coverage

Oxford Languages
Verified Data Provider
Volume: 2.02M sentences
Available formats: .csv, .json, and .mp3
Coverage: 21 countries

Description

LATAM Data Suite provides high-quality datasets in Spanish, Portuguese, and American English. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.
Discover our expertly curated language datasets in the LATAM Data Suite. Compiled and annotated by language and linguistic experts, this suite offers high-quality resources tailored to your needs.

This suite includes:
- Monolingual and Bilingual Dictionary Data: headwords, definitions, word senses, part-of-speech (POS) tags, and semantic metadata.
- Sentences: curated examples of real-world usage with contextual annotations.
- Synonyms & Antonyms: lexical relations to support semantic search, paraphrasing, and language understanding.
- Audio Data: native speaker recordings for TTS and pronunciation modeling.
- Word Lists: frequency-ranked and thematically grouped lists.

Learn more about the datasets included in the data suite:
1. Portuguese Monolingual Dictionary Data
2. Portuguese Bilingual Dictionary Data
3. Spanish Monolingual Dictionary Data
4. Spanish Bilingual Dictionary Data
5. Spanish Sentences Data
6. Spanish Synonyms and Antonyms Data
7. Spanish Audio Data
8. Spanish Word List Data
9. American English Monolingual Dictionary Data
10. American English Synonyms and Antonyms Data
11. American English Pronunciations with Audio

Key Features (approximate numbers):

1. Portuguese Monolingual Dictionary Data
Our Portuguese monolingual dictionary data covers both European and Latin American varieties, featuring clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Portuguese language.
- Headwords: 143,600
- Senses: 285,500
- Sentence examples: 69,300
- Format: XML
- Delivery: Email (link-based file sharing)

2. Portuguese Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to Portuguese and from Portuguese to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language, with a substantial volume of high-quality translations spanning both European and Latin American Portuguese varieties.
- Translations: 300,000
- Senses: 158,000
- Example sentences: 117,800
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

3. Spanish Monolingual Dictionary Data
Our Spanish monolingual dictionary data offers clear definitions and examples, a large volume of headwords, and comprehensive coverage of the Spanish language.
- Headwords: 73,000
- Senses: 123,000
- Sentence examples: 104,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

4. Spanish Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to Spanish and from Spanish to English. It is reviewed and updated annually by our in-house team of language experts. It offers significant coverage of the language, with a large volume of high-quality translations.
- Translations: 221,300
- Senses: 103,500
- Example sentences: 74,500
- Example translations: 83,800
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

5. Spanish Sentences Data
Spanish sentences retrieved from a corpus, suited to NLP model training and totaling approximately 20 million words. The sentences give broad coverage of Spanish-speaking countries, and each is tagged with its country or dialect.
- Sentences volume: 1,840,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API

6. Spanish Synonyms and Antonyms Data
This Spanish language dataset offers a rich collection of synonyms and antonyms, accompanied by detailed definitions and part-of-speech (POS) annotations, making it a comprehensive resource for building linguistically aware AI systems and language technologies.
- Synonyms: 127,700
- Antonyms: 9,500
- Format: XML
- Delivery: Email (link-based file sharing)
- Update frequency: annually

7. Spanish Audio Data (word-level)
Curated word-level audio data covering all varieties of world Spanish, providing rich dialectal diversity.
- Audio files: 20,900
- Format: XLSX (for index), MP3 and WAV (audio files)

8. Spanish Word List Data
A carefully curated and comprehensive list of 450,000 Spanish words.
- Wordforms: 450,000
- Format: CSV and TXT
- Delivery: Email (link-based file sharing)

9. American English Monolingual Dictionary Data
Our American English Monolingual Dictionary Data is the foremost authority on American English, with detailed tagging and labelling covering parts of speech (POS), grammar, region, register, and subject. All grammar and usage information is included to ensure relevance and accuracy.
- Headwords: 140,000
- Senses: 222,000
- Sentence examples: 140,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

10. American English Synonyms and Antonyms Data
The American English Synonyms and Antonyms Dataset is a leading resource offering comprehensive, up-to-date coverage of word relationships in contemporary American English. It includes precise definitions and part-of-speech (POS) tags, making it an essential asset for developing AI systems and language technologies that require deep semantic understanding.
- Synonyms: 600,000
- Antonyms: 22,000
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: annually

11. American English Pronunciations with Audio (word-level)
This dataset provides IPA transcriptions and mapped audio files for words in contemporary American English, with a focus on US speaker usage. It includes syllabified transcriptions, variant spellings, part-of-speech tags, and pronunciation group identifiers. Audio files are supplied separately and linked where available, which makes the dataset well suited to TTS, ASR, and pronunciation modeling.
- Transcriptions (IPA): 250,000
- Audio files: 180,000
- Format: XLSX (for transcriptions), MP3 and WAV (audio files)
- Update frequency: annually

Use Cases:
We work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD). If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Get in touch with us at Oxford.Languages@oup.com to start the conversation.

Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Oxford.Languages@oup.com to explore pricing options and discover how our language data can support your goals.
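
The description notes that the Spanish Sentences Data is tagged by country or dialect and is available as JSON. As a rough illustration of how such tags might be used to build per-country training subsets, here is a minimal Python sketch. The record layout is an assumption made for this example: the field names "text" and "country", the two-letter codes, and the placeholder sentences are all invented, and the supplier's actual schema may differ.

```python
import json
from collections import defaultdict

# Hypothetical records. The field names "text" and "country" and the
# two-letter codes are assumptions for illustration, not the supplier's
# documented schema. The sentences are invented placeholders.
SAMPLE_JSON = """
[
  {"text": "Placeholder sentence one.", "country": "MX"},
  {"text": "Placeholder sentence two.", "country": "AR"},
  {"text": "Placeholder sentence three.", "country": "MX"}
]
"""


def group_by_country(records):
    """Group sentence texts under their country tag."""
    grouped = defaultdict(list)
    for record in records:
        # Records without a tag go under "unknown" rather than being dropped.
        grouped[record.get("country", "unknown")].append(record["text"])
    return dict(grouped)


records = json.loads(SAMPLE_JSON)
by_country = group_by_country(records)

# Report how many sentences fall under each tag.
print({country: len(texts) for country, texts in by_country.items()})
```

Grouping first and filtering afterwards keeps the per-country counts visible, which can help when balancing a training set across regions.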

Country Coverage

Europe (1)
Spain
North America (7)
Costa Rica
El Salvador
Guatemala
Honduras
Mexico
Nicaragua
Panama
South America (13)
Argentina
Bolivia (Plurinational State of)
Brazil
Chile
Colombia
Cuba
Dominican Republic
Ecuador
Paraguay
Peru
Puerto Rico
Uruguay
Venezuela (Bolivarian Republic of)

Volume

271,000 Audio files
2.02 million Sentences
356,000 Words
631,000 Senses
521,000 Translations
727,000 Synonyms
31,500 Antonyms
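
The listing does not say how these headline totals were derived. One plausible reading is that several of them are rounded sums of the per-dataset figures in the description, with words taken as monolingual headwords and senses as monolingual senses. The Python sketch below checks that reading for five of the totals. The grouping of datasets is an assumption of this example, and the sentence and audio totals are left out because the description does not make their composition clear.

```python
# Per-dataset figures copied from the description (all approximate).
# Which datasets feed each headline total is an assumption of this sketch.
PER_DATASET = {
    "words": [143_600, 73_000, 140_000],    # monolingual headwords: PT, ES, US EN
    "senses": [285_500, 123_000, 222_000],  # monolingual senses: PT, ES, US EN
    "translations": [300_000, 221_300],     # bilingual: PT, ES
    "synonyms": [127_700, 600_000],         # ES, US EN
    "antonyms": [9_500, 22_000],            # ES, US EN
}

# Headline totals as shown in the Volume section.
HEADLINE = {
    "words": 356_000,
    "senses": 631_000,
    "translations": 521_000,
    "synonyms": 727_000,
    "antonyms": 31_500,
}


def differences(per_dataset, headline):
    """Return (sum of per-dataset figures) minus (headline total) per metric."""
    return {name: sum(values) - headline[name] for name, values in per_dataset.items()}


diffs = differences(PER_DATASET, HEADLINE)
for name, diff in diffs.items():
    total = sum(PER_DATASET[name])
    print(f"{name}: sum={total:,} headline={HEADLINE[name]:,} diff={diff:+,}")
```

Under this reading, each of the five sums lands within 1,000 of its headline figure, which is consistent with rounding of approximate numbers.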

Pricing

License Starts at
One-off purchase Available
Monthly License Not available
Yearly License Available
Usage-based Available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Delivery

Methods
Email
REST API
Frequency
yearly
Format
.csv
.json
.mp3
.wav
.xls
.xml

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Gaming
LLM Training


Frequently asked questions

What is LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage?

LATAM Data Suite provides high-quality datasets in Spanish, Portuguese, and American English. Ideal for NLP, AI, LLMs, translation, and education, it combines linguistic depth and regional authenticity to power scalable, multilingual language technologies.

What is LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage used for?

This product has 4 key use cases. Oxford Languages recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Gaming, and LLM Training. Global businesses and organizations buy Natural Language Processing (NLP) Data from Oxford Languages to fuel their analytics and enrichment.

Who can use LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage?

This product is best suited if you’re a Small Business, Medium-sized Business, or Enterprise looking for Natural Language Processing (NLP) Data. Get in touch with Oxford Languages to see what their data can do for your business and find out which integrations they provide.

Which countries does LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage cover?

This product includes data covering 21 countries, including Brazil, Spain, Mexico, Argentina, and Colombia. Oxford Languages is headquartered in the United Kingdom.

How much does LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage cost?

Pricing information for LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage is available by contacting Oxford Languages. Connect with Oxford Languages to get a quote and arrange custom pricing models based on your data requirements.

How can I get LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage?

Businesses can buy Natural Language Processing (NLP) Data from Oxford Languages and receive the data via Email and REST API. Depending on your data requirements and subscription budget, Oxford Languages can deliver this product in .csv, .json, .mp3, .wav, .xls, and .xml formats.

What is the data quality of LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage?

You can compare and assess the data quality of Oxford Languages using Datarade’s data marketplace.

What are similar products to LATAM Data Suite 1.8M+ Sentences NLP TTS Dictionary Display Translations Spanish, Portuguese, American English Coverage?

This product has 3 related products. These alternatives include Spanish Language Datasets 1.8M+ Sentences NLP TTS Dictionary Display Game Translations European & Latin Amer. Coverage, In-Cabin Speech Data 15,000 Hours AI Training Data Speech Recognition Data Audio Data Natural Language Processing (NLP) Data, and Machine Learning (ML) Data 800M+ B2B Profiles AI-Ready for Deep Learning (DL), NLP & LLM Training. You can compare the best Natural Language Processing (NLP) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Pricing available upon request