TAUS Language Translation Data | Parallel translation for Colloquial English into various languages for Machine Learning

TAUS
Rating: 5.0 (1 review) | Verified Data Provider
Data sample preview (cell values are masked in the listing). Columns:
English - Assamese
English - Urdu
English - Tamil
English - Hindi
English - Nepali
English - Turkish
English - Pashto
English - Sorani
English - Bengali
English - Burmese
English - Telugu
English - Sinhalese
English - Dari
English - Punjabi (Pakistan)
English - Kurmanji (lat)
English - Kurmanji (arab)
Volume: 1M words per language pair
Data Quality: 100% words
Available Formats: .xml, .csv, and .xls files
Coverage: 15 countries
History: 7 months

Data Dictionary

Product Attributes
Attribute Type Example Mapping
English - Assamese
I just had the most wonderful dream. - মই এইমাত্ৰ আটাইতক...
English - Urdu
It's really cool. - Thanks. یہ سچ میں اچھا ہے ، شکریہ ۔
English - Tamil
I always wanted to try one. நான் எப்போதும் இதை முயற்சிக்க...
English - Hindi
Tom announced that he was quitting football. टॉम ने घोषणा...
English - Nepali
It's in the history books. त्यो इतिहासका किताबमा छ ।
English - Turkish
It was a really bad day. Gerçekten çok kötü bir gündü.
English - Pashto
Do I need to shout it out in the airport? ايا زه اړتيا ل...
English - Sorani
Always in front of TV. هەمیشە لەبەردەم تەلەفزیۆندا
English - Bengali
Thank you. That looks lovely. ধন্যবাদ। এটা খুব সুন্দর দেখ...
English - Burmese
I am thinking about the future of? ငါအနာဂတ််အကြောင်းကိုတွ...
English - Telugu
I got a feeling I'd like to go down there right now. నాకు...
English - Sinhalese
You showed me your wedding ring. నీ పెళ్లి నాటి ఉంగరాన్ని...
English - Dari
But my arrangements sound like some cheesy Saturday morni...
English - Punjabi (Pakistan)
Our report calls for the latter. ساہڈی رپورٹ بعد الذکر دا...
English - Kurmanji (lat)
One day I'd like to fly. Ez dixwazim rojekê bifirim.
English - Kurmanji (arab)
Not so long ago, few people had heard of the Internet. هە...

Description

A carefully selected part of the colloquial corpus has been translated and reviewed by native speakers in many long-tail languages, to get the highest-quality customized set for your MT training.
The corpus is a great fit for training MT engines for chatbots or social media content, and will give the conversation with your local audience a friendly, casual tone. From product user reviews and blog post comments to everyday business small talk, your MT engine will be able to handle even the most creative user voices.

This corpus contains over 1 million words and a total vocabulary of more than 37,000 different words. Need more data? In the following months, TAUS will release more equally sized corpora for the same domain and language combinations, with a significant increase in vocabulary.

Available language pairs:
English - Hindi
English - Urdu
English - Tamil
English - Nepali
English - Turkish
English - Pashto
English - Sorani
English - Bengali
English - Burmese
English - Assamese
English - Telugu
English - Sinhalese
English - Dari
English - Punjabi (Pakistan)
English - Punjabi (India)
English - Lao
English - Kurmanji (lat)
English - Kurmanji (arab)
Other languages are available on demand.

Geography

Asia (13)
Bangladesh
India
Indonesia
Iran (Islamic Republic of)
Iraq
Lao People's Democratic Republic
Myanmar
Nepal
Pakistan
Sri Lanka
Timor-Leste
Turkey
Vietnam
Europe (1)
United Kingdom
North America (1)
United States of America

History

7 months of historical data

Volume

1 million words per language pair
37,000 unique words
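
To make the two volume figures concrete, here is a minimal sketch of how total words and unique words could be counted on the English side of a delivered .csv file. The column names `source` and `target`, the helper `corpus_stats`, and the simple regex tokenizer are assumptions for illustration only; the listing does not document the file layout or how TAUS counts words. The two sample rows reuse example sentences from the Data Dictionary above and come from different language pairs.

```python
import csv
import io
import re


def corpus_stats(csv_text, source_column="source"):
    """Return (total_words, unique_words) for one column of a parallel CSV.

    The column name is hypothetical; adjust it to the delivered file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    vocab = set()
    for row in reader:
        # Lowercase, then keep runs of letters and apostrophes as words.
        # A production tokenizer would likely differ.
        tokens = re.findall(r"[a-z']+", row[source_column].lower())
        total += len(tokens)
        vocab.update(tokens)
    return total, len(vocab)


# Tiny in-memory sample standing in for a delivered file.
sample = (
    "source,target\n"
    "It was a really bad day.,Gerçekten çok kötü bir gündü.\n"
    "One day I'd like to fly.,Ez dixwazim rojekê bifirim.\n"
)

total_words, unique_words = corpus_stats(sample)
print(total_words, unique_words)
```

Running the same function over a full language pair would give figures comparable to the "1 million words" and "37,000 unique words" above, subject to the tokenization assumption.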

Pricing

Free sample available
One-off purchase: starts at €100,000 / purchase
Monthly License: Not available
Yearly License: Not available
Usage-based: Not available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Quality

Self-reported by the provider: 100% words

Delivery

Methods: Email
Frequency: On-demand
Format: .xml, .csv, .xls, .txt

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Custom Machine Translation Engine Training
Base Machine Translation Engine Training

Frequently asked questions

What is TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

A carefully selected part of the colloquial corpus has been translated and reviewed by native speakers in many long-tail languages, to get the highest-quality customized set for your MT training.

What is TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning used for?

This product has 4 key use cases. TAUS recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Custom Machine Translation Engine Training, and Base Machine Translation Engine Training. Global businesses and organizations buy AI & ML Training Data from TAUS to fuel their analytics and enrichment.

Who can use TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

This product is best suited if you’re an Enterprise looking for AI & ML Training Data. Get in touch with TAUS to see what their data can do for your business and find out which integrations they provide.

How far back does the data in TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning go?

This Tabular Data has 7 months of historical coverage. It can be delivered on an on-demand basis.

Which countries does TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning cover?

This product includes data covering 15 countries, including the USA, India, the United Kingdom, Indonesia, and Turkey. TAUS is headquartered in the Netherlands.

How much does TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning cost?

Pricing for TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning starts at EUR100,000 per purchase. Connect with TAUS to get a quote and arrange custom pricing models based on your data requirements.

How can I get TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

Businesses can buy AI & ML Training Data from TAUS and get the data via Email. Depending on your data requirements and subscription budget, TAUS can deliver this product in .xml, .csv, .xls, and .txt format.

What is the data quality of TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

TAUS has reported that this product has the following quality and accuracy assurances: 100% words. You can compare and assess the data quality of TAUS using Datarade’s data marketplace. TAUS has received 1 review from clients.

What are similar products to TAUS Language Translation Data Parallel translation for Colloquial English into various languages for Machine Learning?

This Tabular Data has 3 related products. These alternatives include Nexdata Multilingual Parallel Corpus Data 200 Million Pair Text AI & ML Training Data Natural Language Processing Data Translation Data, TAUS Language Translation Data Parallel translation for Covid-19, Medical and Healthcare, various languages for Machine Learning, and WebAutomation Off the Shelf Datasets Audio Data for AI & ML Training 600+ Hours of Recording Speech Recognition, Natural Language Processing. You can compare the best AI & ML Training Data providers and products via Datarade’s data marketplace and get the right data for your use case.
