TAUS

5.0 (1 review) · Verified Data Provider

Optimized for quick response

- Clean and High-quality Data
- Community of Data Workers
- On-demand Data Production
- 15+ Years of Experience

TAUS Data Products: APIs & Datasets

Explore TAUS’s datasets, databases, and data feeds.
Reliable product descriptions and information are a crucial asset in any e-commerce environment. In these corpora you'll find carefully f...
1M words per language pair
100% words
11 countries covered
1 year of historical data
A carefully selected part of the colloquial corpus has been translated and reviewed by native speakers in many long-tail languages, to ge...
1M words per language pair
100% words
15 countries covered
7 months of historical data
High fidelity MT training data is always important, even more so when it comes to medical subjects. This is a must-have corpus for anyone...
3M words per language
100% words
6 countries covered
1 year of historical data
These corpora are the result of a collective industry charity effort where participants contributed their own translation memories coveri...
123M Target words in total
100% words
19 countries covered
1 year of historical data
When settling an agreement, there should be no doubt about the conditions and mutual obligations. Contracts and agreements are subject to...
5M words per language
100% words
7 countries covered
1 year of historical data

TAUS Pricing & Cost

Learn about TAUS’s prices, subscription cost, and API pricing.

We set our prices based on factors such as the rarity of the language pair, the locale the data is requested from, and the volume of data requested. Contact us to describe your data needs and get a pricing quote.

TAUS’s APIs and datasets range in cost from €5,000 / purchase to €100,000 / purchase. TAUS offers free samples for individual data requirements. Talk to a member of the TAUS team to receive custom pricing options, information about data subscription fees, and quotes for TAUS’s data offering tailored to your use case.

TAUS Reviews

Read authentic reviews about TAUS from your peers.
5.0 (1 review)
Miloš Milovanović
TAUS
5.0
Data Quality
Data Volume
Value for Money
Customer Service
“Oracle International Product Solutions has worked with TAUS on a joint pilot project to enable data discovery within TAUS's Data Cloud corpora. The process consisted in Oracle IPS supplying TAUS with a sample of approximately 30K English strings, representing content that is aligned to Oracle projects. TAUS used the sample to explore Data Cloud for similarity & proximity, across 5 languages, and reverted back with three categories of data output, with score ranges on similarity and proximity. Oracle IPS then performed a linguistic assessment of this output. Our in-depth linguistic review rendered positive results and the content supplied by TAUS was of good quality, appropriate to consume as aligned corpora to that supplied in the Oracle sample with an average score of 84% across the 5 languages. Oracle IPS will continue to work with TAUS to assess the effect that consuming this discovered corpora will have on engine quality. We look forward to having data search and discovery features on Data Cloud, whereby a user is capable of discovering their own project aligned content as a consumable self-service. We believe this will allow TAUS and its members to drive increased value from the TAUS data assets and in turn will likely continue to fuel growth in the pool of data and value-add services.”


TAUS Competitors & Alternatives

Find data providers that are similar to TAUS.
Webautomation
United Kingdom
Webautomation offers a marketplace of hundreds of pre-built extractors and ready-to-use datasets to start collecting web data in minutes. Collect millions of data points from e-commerce sites, soci...
APISCRAPY
USA
APISCRAPY is an AI-driven web scraping & automation platform that converts any web data into ready-to-use data. The platform can extract data from websites, process data, automate workflo...
Nexdata
USA
Founded in 2011, Nexdata has grown to be a globally renowned AI training data service company. Nexdata owns an extensive library of off-the-shelf datasets and provides flexible data collection, ann...
PromptCloud
USA
PromptCloud, a global leader in web data extraction, delivers tailored data solutions to fuel your business. Our services include Custom Web Crawling, Hosted Indexing & Live Crawls, catering to sec...

About TAUS

Learn more about TAUS’s data sources, use cases, and integrations.

TAUS in a Nutshell

TAUS was founded in 2005 as a think tank with a mission to automate and innovate translation. Ideas transformed into actions. TAUS became the language data network offering the largest industry-shared repository of data, deep know-how in language engineering, and a network of data contributors and annotators around the globe. Our mission today is to empower global enterprises and their service and technology providers with data solutions that help them to communicate in all languages.

Headquarters
Netherlands

Country Coverage

Asia (17)
Bangladesh
China
India
Indonesia
Iran (Islamic Republic of)
Iraq
Japan
Korea (Republic of)
Lao People's Democratic Republic
Myanmar
Nepal
Pakistan
Sri Lanka
Taiwan
Timor-Leste
Turkey
Vietnam
Europe (13)
Czech Republic
Denmark
Estonia
France
Germany
Italy
Netherlands
Poland
Portugal
Russian Federation
Spain
Sweden
United Kingdom
North America (2)
Canada
United States of America
Oceania (1)
Australia
South America (1)
Brazil

Data Offering

TAUS offers text, speech, and image data. You can buy domain-specific and crowdsourced datasets from the Data Library, or buy and sell on the new Data Marketplace. TAUS also offers cleaning, anonymization, quality review, annotation, and other data preparation services performed by our own global community of data contributors and annotators.

Use Cases

Our use cases include:
Bilingual or monolingual language data collection in any given domain and language pair
Low-resource language data generation
NER (Named Entity Recognition) tagging
Text, speech, or image data annotation
Speech, text, or image data collection
Domain-specific dataset generation based on the client’s own sample dataset

Artificial Intelligence (AI)
Machine Learning (ML)

Data Sources & Collection

TAUS has several sources of data:
1- A large repository of legacy data coming from TAUS member uploads, with more than 35B words in 600+ language pairs.
2- Data Marketplace: a language data monetization and acquisition platform for trading data.
3- Data Library: Tailor-made datasets matching the sample data you provide. You can also buy from the library of ready-made corpora.
4- HLP Community: Data generation and annotation in the requested locales by our community of data contributors.

Key Differentiators

TAUS has 15+ years of experience in thought leadership and innovation in the language data space. We have a proven ability to mobilize a large community of data contributors and annotators around the globe. With our in-house NLP team, we can match your sample dataset to create a tailor-made dataset for you. In addition to our vast library of ready-made datasets, we are happy to provide on-demand data solutions for your projects.

Data Privacy

The TAUS IT environment has been implemented with IT security and data protection as the highest priority. Our infrastructure is hosted on AWS, which is fully GDPR compliant. We have performed final revisions and corrections in order to deliver a complete IT Security Framework. This framework is based on “General IT security” policies and best practices, which also include the GDPR provisions on personal data protection and processing limitations.

GDPR compliant


Frequently asked questions about TAUS

What does TAUS do?

TAUS is the language data network offering the largest repository of language data, deep know-how in language engineering, and a network of data contributors around the globe. Our mission is to empower global enterprises and their technology providers with data solutions.

How much does TAUS cost?

TAUS’s APIs and datasets range in cost from €5,000 / purchase to €100,000 / purchase. TAUS offers free samples for individual data requirements. Talk to a member of the TAUS team to receive custom pricing options, information about data subscription fees, and quotes for TAUS’s data offering tailored to your use case.

What kind of data does TAUS have?

AI Training Data, Ecommerce Data, Court Data, Natural Language Processing (NLP) Data, and 2 others

What data does TAUS offer?

TAUS offers text, speech, and image data. You can buy domain-specific and crowdsourced datasets from the Data Library, or buy and sell on the new Data Marketplace. TAUS also offers cleaning, anonymization, quality review, annotation, and other data preparation services performed by our own global community of data contributors and annotators.

How does TAUS collect data?

TAUS has several sources of data:
1- A large repository of legacy data coming from TAUS member uploads, with more than 35B words in 600+ language pairs.
2- Data Marketplace: a language data monetization and acquisition platform for trading data.
3- Data Library: Tailor-made datasets matching the sample data you provide. You can also buy from the library of ready-made corpora.
4- HLP Community: Data generation and annotation in the requested locales by our community of data contributors.

What’s TAUS’s data privacy policy?

The TAUS IT environment has been implemented with IT security and data protection as the highest priority. Our infrastructure is hosted on AWS, which is fully GDPR compliant. We have performed final revisions and corrections in order to deliver a complete IT Security Framework. This framework is based on “General IT security” policies and best practices, which also include the GDPR provisions on personal data protection and processing limitations.

What are the best use cases for TAUS’s data?

Our use cases include:
Bilingual or monolingual language data collection in any given domain and language pair
Low-resource language data generation
NER (Named Entity Recognition) tagging
Text, speech, or image data annotation
Speech, text, or image data collection
Domain-specific dataset generation based on the client’s own sample dataset