Best Textual Datasets & Databases

Easily explore, compare & preview top Textual datasets via Datarade.
50+ Results

FileMarket | Text Recognition Data | 50,000 Images | Computer Vision Data | AI Model Training Data | Textual data | Annotated Imagery Data

This dataset is part of our extensive offerings, which also include Textual Data, Object Detection Data ... , Large Language Model (LLM) Data, and Deep Learning (DL) Data.
Available for 160 countries
50K images
97% accuracy
Pricing available upon request
Free sample preview

Bitext | AI Training Data | Textual Data | 9 Languages for Synthetic Text Data | 100% Utterances Semantically Equivalent | 20 Verticals Covered

by Bitext
trusted source for top-tier Textual Data. ... Enhance your AI models with Bitext’s comprehensive Textual Data and access high-quality data with 100%
Available for 249 countries
9 Languages
100% Utterances Semantically Equivalent
Available Pricing:
One-off purchase
Monthly License
Yearly License
Free sample preview
4.9(2)

Factori AI & ML Training Data | Consumer Data | USA | Machine Learning Data

by Factori
data is gathered and aggregated via surveys, digital services, and public data sources. ... Our comprehensive data enrichment solution includes a variety of data sets that can help you address
Available for 1 country
300+ Million Profiles
1 year of historical data
97% fill rate
Starts at
$360,000 / year
Free sample preview
4.8(12)

Coresignal | Clean Data | Company Data | AI-Enriched Datasets | Global / 35M+ Records / Updated Weekly

Clean data is an excellent data solution for companies with limited data engineering capabilities and ... It’s an excellent data solution for companies with limited data engineering capabilities and those who
Available for 248 countries
35 million records
Available Pricing:
One-off purchase
Monthly License
Yearly License
Usage-based
Free sample preview

FileMarket | 20,000 Voice Memos | Multilingual Training Data for Conversational AI | Machine Learning (ML) Data

Whether you require Transcription Data, Machine Learning (ML) Data, Large Language Model (LLM) Data, ... Deep Learning (DL) Data, or Audio Data, we are equipped to provide comprehensive solutions that align
Available for 240 countries
20K voice memos
Pricing available upon request
Free sample preview
5.0(1)

WebAutomation Off the Shelf Datasets | Audio Data for AI & ML Training | 600+ Hours of Recording | Speech Recognition, Natural Language Processing

We offer a comprehensive collection of audio data, amounting to over 600 hours of high-quality recordings ... Key Features of Our Audio Data Datasets: Vast Collection: Our repository consists of over 600 hours
Available for 64 countries
600 Hours of Recording
Pricing available upon request

Nexdata | Multilingual Parallel Corpus Data | 200 Million Pairs | Text AI Training Data | Natural Language Processing Data | Translation Data

by Nexdata
Specifications: Storage format: TXT; Data content: Parallel Corpus Data; Data size: 200 million pairs ... Off-the-shelf parallel corpus data (Translation Data) covers many fields including spoken language, traveling
Available for 129 countries
200 million pairs
10 years of historical data
90% Accuracy
Starts at
$5,000 / purchase
Free sample preview

SMS Contextual Consumer Behavior Data / USA / 2,000+ Segments / 500MM MAIDs / Monthly Updates / Digital Advertising

The SMS contextual audience data sets are built from 1st party publishers created by grouping users who
Available for 1 country
2.2K audience segments
1 month of historical data
100% deterministic
Available Pricing:
Monthly License
Yearly License
Free sample preview
10% Datarade discount
4.9(7)

AI & ML Training Data | Artificial Intelligence (AI) | Machine Learning (ML) Datasets | Deep Learning Datasets | Easy to Integrate | Free Sample

[Related tags: AI Training Data, Textual data, Machine Learning (ML) Data, Deep Learning (DL) Data, ... Annotated Imagery Data, Synthetic Data, Audio Data, Large Language Model (LLM) Data, ML Training Data,
Available for 61 countries
50M Records
1 day of historical data
100% Data Coverage
Starts at
$25 / month

Automaton AI Data labeling services

Services we provide: Data collection & sourcing Data cleaning Data mining Data labeling ... We have developed our custom inbuilt data-labeling tool which reduces the cost of data-labeling by at
Available for 240 countries
Available Pricing:
One-off purchase
Monthly License
Yearly License
Usage-based

More Textual data Products

Discover related textual data products.

4.5M images
100% Image attachment
249 countries covered
Wirestock's AI/ML Image Training Data, 4.5M Files with Metadata: This data product offers a vast collection of images and associated metadata, ideal for trai...
1K hour per month
99.5% word accuracy
136 countries covered
Nexdata provides multi-language, multi-timbre, multi-domain and multi-style speech synthesis data collection services for Deep Learning Data.
10K recordings
95% accuracy
64 countries covered
Authentic and spoofed faces recorded with different mobile phone cameras, showcasing both men and women, with and without glasses, under indoor and outdoor l...
50 Hours
99% Accurate
South Africa covered
50 hours of simulated, unscripted agent-caller dialogue. Domains include: Insurance, Retail, Debt Collection, Travel. 50+ participants from KwaZulu-Natal, ...
15K Hours
98% sentence/word
61 countries covered
The Natural Language Processing (NLP) Data of in-car speech covers 20+ languages, including read, wake-up word, command word, code-switching, multimodal and n...
598M records
249 countries covered
Clean Data is an excellent solution for companies with limited information engineering capabilities and those who want to reduce time to value. Dataset consi...
20K Hours of Audio
95% Match Rate
215 countries covered
We help the client source, curate, & transcribe the right set of data required to train AI/ML models, with utmost precision. We offered audio data collection ...
50K Hours
98% sentence/word
21 countries covered
The recorded text is a mixture of multi-language sentences, covering general scenes and human-computer interaction scenes. The audio data is rich in content and...
50M Records
100% Data Coverage
61 countries covered
APISCRAPY's AI & ML training data is meticulously curated and labelled to ensure the best quality. Our training data comes from a variety of areas, including...
50 TB per month
98% accuracy
137 countries covered
Nexdata provides high-quality Natural Language Processing (NLP) Data annotation for text cleaning, entity tagging, named entity tagging, text classification ...
50 TB of text data
98% accuracy
141 countries covered
For the high-quality training data required in unsupervised learning and supervised learning, Nexdata provides flexible and customized Large Language Model(L...
26M records
249 countries covered
45 months of historical data
Easily find and get job postings from any industry and location. Job postings API allows you to use a wide selection of filters to discover job listings you'...
240 countries covered
At Bitext, we offer advanced linguistic tools designed for automated pre-labeling of datasets to help scale Data Annotation and Labeling (DAL) projects.
2.2K audience segments
100% deterministic
USA covered
SMS offers over 2,000 contextual-based audiences comprised of users categorized based on the context of the content they are engaging with and other online c...
350K calls per month
63 countries covered
1 year of historical data
Access a vast collection of transcribed customer call records tailored to your needs. Ideal for in-depth analysis of customer interactions and behavior trend...
10 hours
Bulgaria covered
Fourth dataset of 10 hours of Bulgarian dialogue (two people, separate tracks) about general topics. The dataset is high quality with no noise and high-quali...
20 hours
Bulgaria covered
The third dataset of 20 hours of Bulgarian dialogue (two people, separate tracks) about general topics. The dataset is high quality with no noise and high-qu...