Chinese Language Datasets | 583K Translations | 178K Words | NLP | Dictionary Display | Translations Data | APAC coverage | Mandarin | Cantonese

Oxford Languages
Verified Data Provider
[Data sample preview: contents are masked in the listing]
Volume: 766K Senses
Avail. Formats: .json, .xml, and .csv
Coverage: 7 Countries

Description

Comprehensive Chinese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. The datasets cover both the Simplified and Traditional writing systems.
Our Chinese language datasets are carefully compiled and annotated by language and linguistic experts. The following datasets are available for license:
1. Mandarin Chinese (simplified) Monolingual Dictionary Data
2. Mandarin Chinese (traditional) Monolingual Dictionary Data
3. Cantonese Chinese Monolingual Dictionary Data
4. Mandarin Chinese (simplified) Bilingual Dictionary Data
5. Mandarin Chinese (traditional) Bilingual Dictionary Data
6. Mandarin Chinese (simplified) Synonyms and Antonyms Data

Key Features (approximate numbers):

1. Mandarin Chinese (simplified) Monolingual Dictionary Data
Our Mandarin Chinese (simplified) monolingual dataset features clear definitions, headwords, examples, and comprehensive coverage of the Mandarin Chinese language spoken today.
- Headwords: 81,300
- Senses: 62,400
- Sentence examples: 80,700
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API

2. Mandarin Chinese (traditional) Monolingual Dictionary Data
Our Mandarin Chinese (traditional) monolingual dataset features clear definitions, headwords, examples, and comprehensive coverage of the Mandarin Chinese language spoken today.
- Headwords: 60,100
- Senses: 144,700
- Sentence examples: 29,900
- Format: XML
- Delivery: Email (link-based file sharing)

3. Cantonese Chinese Monolingual Dictionary Data
Our Cantonese Chinese monolingual dataset features clear definitions, headwords, examples, and comprehensive coverage of the Cantonese Chinese language spoken today.
- Headwords: 37,700
- Senses: 51,400
- Sentence examples: 53,400
- Format: XML
- Delivery: Email (link-based file sharing)

4. Mandarin Chinese (simplified) Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to Mandarin Chinese (simplified) and from Mandarin Chinese (simplified) to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language and a substantial volume of high-quality translations.
- Translations: 367,600
- Senses: 204,500
- Translation sentences: 150,900
- Format: XML and JSON
- Delivery: Email (link-based file sharing) and REST API
- Update frequency: Annually

5. Mandarin Chinese (traditional) Bilingual Dictionary Data
The bilingual data provides translations in both directions, from English to Mandarin Chinese (traditional) and from Mandarin Chinese (traditional) to English. It is reviewed and updated annually by our in-house team of language experts. It offers comprehensive coverage of the language and a substantial volume of high-quality translations.
- Translations: 215,600
- Senses: 202,800
- Sentence examples: 149,700
- Format: XML
- Delivery: Email (link-based file sharing)

6. Mandarin Chinese (simplified) Synonyms and Antonyms Data
The Mandarin Chinese (simplified) Synonyms and Antonyms Dataset offers comprehensive, up-to-date coverage of word relationships in contemporary Mandarin Chinese. It includes rich linguistic detail such as precise definitions and part-of-speech (POS) tags, which makes it well suited to developing AI systems and language technologies that require deep semantic understanding.
- Synonyms: 3,800
- Antonyms: 3,180
- Format: XML
- Delivery: Email (link-based file sharing)

Use Cases:
We consistently work with our clients on new use cases as language technology continues to evolve. These include NLP applications, TTS, dictionary display tools, games, translation, word embedding, and word sense disambiguation (WSD). If you have a specific use case in mind that isn't listed here, we’d be happy to explore it with you. Get in touch with us at Growth.OL@oup.com to start the conversation.

Pricing:
Oxford Languages offers flexible pricing based on use case and delivery format. Our datasets are licensed via term-based IP agreements, with tiered pricing for API-delivered data. Whether you’re integrating into a product, training an LLM, or building custom NLP solutions, we tailor licensing to your specific needs. Contact our team or email us at Growth.OL@oup.com to explore pricing options and discover how our language data can support your goals. Please note that some datasets may have rights restrictions. Contact us for more information.

About the sample:
The sample above offers a brief overview of one or two language datasets (monolingual and/or bilingual dictionary data). If you would like the complete original sample or detailed information about any of the datasets mentioned, please contact us (Growth.OL@oup.com).

Country Coverage

Asia (7)
China
Hong Kong
Indonesia
Macao
Malaysia
Singapore
Taiwan

Volume

179,000 Words
583,000 Translations
766,000 Senses
151,000 Translation sentences
314,000 Sentence examples
3,800 Synonyms
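
The headline volumes for words, translations, and sentence examples appear to be sums of the per-dataset counts given in the description, rounded to the nearest thousand. The sketch below checks that arithmetic. The grouping of datasets under each headline figure is an assumption inferred from the numbers, not something the provider states. The senses figure is left out because the per-dataset senses counts in the description sum to roughly 666,000 rather than 766,000.

```python
# Per-dataset counts copied from the description (approximate numbers).
# Which datasets feed each headline figure is an assumption.
headwords = {
    "Mandarin (simplified) monolingual": 81_300,
    "Mandarin (traditional) monolingual": 60_100,
    "Cantonese monolingual": 37_700,
}
translations = {
    "Mandarin (simplified) bilingual": 367_600,
    "Mandarin (traditional) bilingual": 215_600,
}
sentence_examples = {
    "Mandarin (simplified) monolingual": 80_700,
    "Mandarin (traditional) monolingual": 29_900,
    "Cantonese monolingual": 53_400,
    "Mandarin (traditional) bilingual": 149_700,
}


def rounded_thousands(counts):
    """Sum the per-dataset counts and round to the nearest thousand."""
    return round(sum(counts.values()), -3)


totals = {
    "Words": rounded_thousands(headwords),
    "Translations": rounded_thousands(translations),
    "Sentence examples": rounded_thousands(sentence_examples),
}

for label, value in totals.items():
    print(f"{value:,} {label}")
# 179,000 Words
# 583,000 Translations
# 314,000 Sentence examples
```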

Pricing

License starts at: Pricing available upon request
One-off purchase: Available
Monthly license: Not available
Yearly license: Available
Usage-based: Available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Delivery

Methods: Email, REST API
Frequency: Yearly
Format: .json, .xml, .csv, .txt
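
As a rough illustration of working with a JSON delivery, the sketch below loads one dictionary entry and counts its senses and examples. The listing does not publish a schema, so every field name (`headword`, `part_of_speech`, `senses`, `definition`, `examples`) and the sample entry itself are hypothetical, chosen only to mirror the annotations named in the description. The provider's actual structure may differ.

```python
import json

# Hypothetical entry: field names and content are illustrative only,
# not the provider's actual schema or data.
SAMPLE_ENTRY = """
{
  "headword": "学习",
  "part_of_speech": "verb",
  "senses": [
    {
      "definition": "to study; to learn",
      "examples": ["我在学习中文。"]
    }
  ]
}
"""


def summarize_entry(raw):
    """Parse one JSON entry and return basic counts for it."""
    entry = json.loads(raw)
    senses = entry.get("senses", [])
    return {
        "headword": entry["headword"],
        "part_of_speech": entry.get("part_of_speech"),
        "sense_count": len(senses),
        "example_count": sum(len(s.get("examples", [])) for s in senses),
    }


summary = summarize_entry(SAMPLE_ENTRY)
print(summary)
```

Using `.get` with defaults for the optional fields keeps the function tolerant of entries that lack examples or a part-of-speech tag, which is a reasonable precaution when the real schema is not known in advance.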

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Gaming
LLM Training

Frequently asked questions

What is Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese?

Comprehensive Chinese language datasets with linguistic annotations, including headwords, definitions, word senses, usage examples, part-of-speech (POS) tags, semantic metadata, and contextual usage details. The datasets cover both the Simplified and Traditional writing systems.

What is Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese used for?

This product has 4 key use cases. Oxford Languages recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Gaming, and LLM Training. Global businesses and organizations buy Natural Language Processing (NLP) Data from Oxford Languages to fuel their analytics and enrichment.

Who can use Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese?

This product is best suited if you’re a Small Business, Medium-sized Business, or Enterprise looking for Natural Language Processing (NLP) Data. Get in touch with Oxford Languages to see what their data can do for your business and find out which integrations they provide.

Which countries does Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese cover?

This product includes data covering 7 countries, including China, Indonesia, Hong Kong, Singapore, and Malaysia. Oxford Languages is headquartered in the United Kingdom.

How much does Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese cost?

Pricing information for Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese is available on request from Oxford Languages. Connect with Oxford Languages to get a quote and arrange custom pricing models based on your data requirements.

How can I get Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese?

Businesses can buy Natural Language Processing (NLP) Data from Oxford Languages and receive the data via email or REST API. Depending on your data requirements and subscription budget, Oxford Languages can deliver this product in .json, .xml, .csv, and .txt formats.

What is the data quality of Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese?

You can compare and assess the data quality of Oxford Languages using Datarade’s data marketplace.

What are similar products to Chinese Language Datasets 583K Translations 178K Words NLP Dictionary Display Translations Data APAC coverage Mandarin Cantonese?

This product has 3 related products. These alternatives include Portuguese Language Datasets 300K Translations Natural Language Processing (NLP) Data Dictionary Display Translation EU & LATAM Coverage, Parallel Corpus Data 200 Million Pairs Machine Translation Data Natural Language Processing Data Translation Data, and Machine Learning (ML) Data 800M+ B2B Profiles AI-Ready for Deep Learning (DL), NLP & LLM Training. You can compare the best Natural Language Processing (NLP) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Oxford Languages

The home of expertly curated language data

Verified Provider
1h Avg. response time
100% Response rate
