TAUS Language Translation Data | Parallel translation for Colloquial English into various languages for Machine Learning

TAUS
Rating: 5.0 (1 review)
Verified Data Provider
English - Assamese
English - Urdu
English - Tamil
English - Hindi
English - Nepali
English - Turkish
English - Pashto
English - Sorani
English - Bengali
English - Burmese
English - Telugu
English - Sinhalese
English - Dari
English - Punjabi (Pakistan)
English - Kurmanji (lat)
English - Kurmanji (arab)
Volume: 1M words per language pair
Data Quality: 100% words
Available formats: .xml, .csv, and .xls files
Coverage: 15 countries
History: 7 months

Data Dictionary

Product Attributes
Attribute Type Example Mapping
English - Assamese
I just had the most wonderful dream. - মই এইমাত্ৰ আটাইতক...
English - Urdu
It's really cool. - Thanks. یہ سچ میں اچھا ہے ، شکریہ ۔
English - Tamil
I always wanted to try one. நான் எப்போதும் இதை முயற்சிக்க...
English - Hindi
Tom announced that he was quitting football. टॉम ने घोषणा...
English - Nepali
It's in the history books. त्यो इतिहासका किताबमा छ ।
English - Turkish
It was a really bad day. Gerçekten çok kötü bir gündü.
English - Pashto
Do I need to shout it out in the airport? ايا زه اړتيا ل...
English - Sorani
Always in front of TV. هەمیشە لەبەردەم تەلەفزیۆندا
English - Bengali
Thank you. That looks lovely. ধন্যবাদ। এটা খুব সুন্দর দেখ...
English - Burmese
I am thinking about the future of? ငါအနာဂတ််အကြောင်းကိုတွ...
English - Telugu
I got a feeling I'd like to go down there right now. నాకు...
English - Sinhalese
You showed me your wedding ring. నీ పెళ్లి నాటి ఉంగరాన్ని...
English - Dari
But my arrangements sound like some cheesy Saturday morni...
English - Punjabi (Pakistan)
Our report calls for the latter. ساہڈی رپورٹ بعد الذکر دا...
English - Kurmanji (lat)
One day I'd like to fly. Ez dixwazim rojekê bifirim.
English - Kurmanji (arab)
Not so long ago, few people had heard of the Internet. هە...

Description

A carefully selected part of the colloquial corpus has been translated and reviewed by native speakers in many long-tail languages, to get the highest-quality customized set for your MT training.
The corpus is a great fit for training MT engines for chatbots or social media content, and will give conversations with your local audience a friendly, casual tone. From product user reviews and blog post comments to everyday business small talk, your MT engine will be able to handle even the most creative user voices.

This corpus contains over 1 million words and a total vocabulary of more than 37,000 different words.

Need more data? In the coming months, TAUS will release more equally sized corpora for the same domain and language combinations, with a significant increase in vocabulary.

English - Hindi
English - Urdu
English - Tamil
English - Nepali
English - Turkish
English - Pashto
English - Sorani
English - Bengali
English - Burmese
English - Assamese
English - Telugu
English - Sinhalese
English - Dari
English - Punjabi (Pakistan)
English - Punjabi (India)
English - Lao
English - Kurmanji (lat)
English - Kurmanji (arab)

Other languages are available on demand.

Country Coverage

Asia (13)
Bangladesh
India
Indonesia
Iran (Islamic Republic of)
Iraq
Lao People's Democratic Republic
Myanmar
Nepal
Pakistan
Sri Lanka
Timor-Leste
Turkey
Vietnam
Europe (1)
United Kingdom
North America (1)
United States of America

History

7 months of historical data

Volume

1 million words per language pair
37,000 unique words

Pricing

Free sample available
One-off purchase: starts at €100,000 / purchase
Monthly License: Not available
Yearly License: Not available
Usage-based: Not available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Quality

Self-reported by the provider
100%
words

Delivery

Methods
Email
Frequency
on-demand
Format
.xml
.csv
.xls
.txt

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Custom Machine Translation Engine Training
Base Machine Translation Engine Training

Frequently asked questions

What is TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

A carefully selected part of the colloquial corpus has been translated and reviewed by native speakers in many long-tail languages, to get the highest-quality customized set for your MT training.

What is TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning used for?

This product has 4 key use cases. TAUS recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Custom Machine Translation Engine Training, and Base Machine Translation Engine Training. Global businesses and organizations buy AI Training Data from TAUS to fuel their analytics and enrichment.

Who can use TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

This product is best suited if you’re an Enterprise looking for AI Training Data. Get in touch with TAUS to see what their data can do for your business and find out which integrations they provide.

How far back does the data in TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning go?

This product has 7 months of historical coverage. It can be delivered on an on-demand basis.

Which countries does TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning cover?

This product includes data covering 15 countries, including the USA, India, the United Kingdom, Indonesia, and Turkey. TAUS is headquartered in the Netherlands.

How much does TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning cost?

Pricing for TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning starts at €100,000 per purchase. Connect with TAUS to get a quote and arrange custom pricing models based on your data requirements.

How can I get TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

Businesses can buy AI Training Data from TAUS and get the data via Email. Depending on your data requirements and subscription budget, TAUS can deliver this product in .xml, .csv, .xls, and .txt format.

What is the data quality of TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

TAUS has reported that this product has the following quality and accuracy assurances: 100% words. You can compare and assess the data quality of TAUS using Datarade’s data marketplace. TAUS has received 1 review from clients.

What are similar products to TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

This product has 3 related products. These alternatives include Nexdata Multilingual Parallel Corpus Data 200 Million Pairs Text AI Training Data Natural Language Processing Data Translation Data, FileMarket Dataset for Face Anti-Spoofing (Videos) in Computer Vision Applications Machine Learning (ML) Data Deep Learning (DL) Data, and AI & ML Training Data 800M Profiles for LLMs, Generative AI, NLP & Predictive Models. You can compare the best AI Training Data providers and products via Datarade’s data marketplace and get the right data for your use case.
