Let data providers come to you!

Post your request to reach 1240+ data providers and find the best match for your data needs

How it works

Tell us what you need (2-3 mins)
Receive proposals (within 24 hours)
Connect with providers

Best Large Language Model (LLM) Datasets & Databases

Easily explore, compare & preview top Large Language Model (LLM) Datasets via Datarade.
46 Results

Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data

by Nexdata
About Nexdata: Nexdata owns off-the-shelf, PB-level Large Language Model (LLM) Data, 1 million hours of ... speech data and 800TB of Annotated Imagery Data.
Available for 98 countries
1M hours
5 years of historical data
95% Accuracy
Starts at
$20,000 / purchase

FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data

In addition to LLM Data, we also offer comprehensive datasets across Object Detection Data, Machine Learning ... (ML) Data, Deep Learning (DL) Data, and Biometric Data.
Available for 249 countries
20K photos
95% accuracy
Pricing available upon request
Free sample preview
Rating: 4.9 (2 reviews)

TagX | 10000+ Multilingual Image Dataset | Text Detection | Global coverage | LLM data | LLM finetuning

by TagX
Our dataset allows you to train and test your text detection models with diverse scenes and languages ... With this dataset, you can train and evaluate your text detection and recognition models effectively,
Available for 102 countries
10K images
2 years of historical data
Pricing available upon request

Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores

by MealMe
relevant information. ... This dataset includes highly detailed, structured information such as: Menus: Restaurant menus with
Available for 250 countries
1B Records
1 year of historical data
Pricing available upon request
Free sample preview

Large Language Model (LLM) Noise Data | Noise Complaints + Urban Noise Levels | CCPA, GDPR Compliant | 100% Traceable Consent

Additional metadata includes device information used during the complaint submission. ... Silencio provides the world’s largest human-labeled global noise complaint dataset, including 160,000
Available for 236 countries
160K Records
2 years of historical data
Starts at
$1,350 / month (regularly $1,500)
Free sample preview
10% Datarade discount
20% revenue share
Rating: 5.0 (2 reviews)

Canaria | Salary Data | US | 25M+ Monthly Job Postings & 2 Year Historical | AI-LLM Enhanced Salary Data

However, enhancing the data with AI-LLM models takes time, so salary data is delivered daily to ensure ... , and LLM-based summarization models that condense large chunks of salary data for you.
Available for 1 country
500M Job Postings
2 years of historical data
97.3% Genuine Job Score
Available Pricing:
One-off purchase
Monthly License
Yearly License
Usage-based
Free sample preview
Rating: 5.0 (1 review)

Dappier | Breaking News Data | RAG API, LLM Compatible | Real-Time Updates | Unlimited Data

by Dappier
The LLM-agnostic API reduces hallucinations and enhances the reliability of AI outputs with real-time ... Dappier's Breaking News Data API enables AI developers to integrate real-time, high-quality news data
Available for 250 countries
100K News Sources
100% real-time and up-to-date
Starts at
$0.27 / 100 queries (regularly $0.30)
Free sample preview
10% Datarade discount
50% revenue share
Rating: 5.0 (1 review)

Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training

by Xverum
How Is the Data Sourced? ... What Makes Our Data Unique?
Available for 250 countries
730M Individual Profiles
3 years of historical data
100% Open Web Data
Starts at
$900 / month (regularly $1,000)
Free sample preview
10% Datarade discount
Rating: 4.8 (4 reviews)

CrawlBee | ML Training Data | LLM Data | Generative AI Data | Code Base Training Data | Healthcare Training Data

the highest quality training data available. ... CrawlBee ML datasets are specially curated and cleansed to provide the highest quality training data
Available for 1 country
5B records
1 day of historical data
98% accuracy
Pricing available upon request

Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model (LLM) Data | AI Datasets

by Nexdata
About Nexdata: Nexdata owns off-the-shelf, PB-level Large Language Model (LLM) Data, 1 million hours of ... Audio Data and 800TB of Annotated Imagery Data.
Available for 80 countries
1 PB
5 years of historical data
95% Accuracy
Starts at
$20,000 / purchase

Can't find the data you're looking for?

Let data providers come to you by posting your request


More Large Language Model (LLM) Data Products

Discover related Large Language Model (LLM) data products.
50K records
249 countries covered
Comprehensive dataset of Telegram users' geolocations with IP addresses, fully consented, comprising 50,000 records. Ideal for AI, ML, DL, and LLM training, ...
50 million records
250 countries covered
Our Sotheby's International Realty dataset is specifically designed for AI and ML training, offering premium, structured real estate data from a globally rec...
500M Job Postings
97.3% Genuine Job Score
USA covered
Canaria's Salary Data includes 25M+ monthly US job postings and 2 years of historical data. Our AI & LLM models, verified by humans, provide accurate insight...
35K Hours
98% sentence/word
76 countries covered
Nexdata has off-the-shelf 35,000 hours Machine Learning (ML) Data of 16kHz conversational speech, covering 100+ countries including English, German, French, ...
15K Hours
98% sentence/word
83 countries covered
The Natural Language Processing (NLP) Data of in-car speech covers 20+ languages and includes read, wake-up word, command word, code-switching, multimodal and n...
35B Data Points
95% Precision
236 countries covered
Combines 10M+ hours of noise data with mobility and POI visitation data. Ideal for AI models combining environmental, mobility, and behavioral signals. CSV o...
730M Individual Profiles
100% Open Web Data
250 countries covered
Xverum’s Machine Learning (ML) data will help you to train LLMs and generative AI with 800M B2B profiles. 100+ attributes, global coverage, and GDPR-complian...
65K Hours
98% sentence/word
103 countries covered
Off-the-shelf Scripted Monologues Speech Datasets cover 100+ languages. All the Machine Learning (ML) Data are collected from native speakers, with signed au...
150M Business Professionals
249 countries covered
1 year of historical data
Fuel your AI and machine learning models with over 15 million companies and 150 million business professionals. Our global contact and company data is ideal ...
1M financial data points daily
100% real-time and up-to-date coverage
250 countries covered
Enhance your AI with real-time, LLM-agnostic RAG APIs for real-time market data from the world's leading exchanges like NASDAQ, NYSE, crypto markets & more. ...
1B Records
250 countries covered
1 year of historical data
Comprehensive training data on 1M+ stores across the US & Canada. Includes detailed menus, inventory, pricing, and availability. Ideal for AI/ML models, powe...
2M pairs
95% Accuracy
51 countries covered
Off-the-shelf SFT text data with 2 million pairs. Contains 12 types of SFT QA, with accuracy of no less than 95%. All prompts are manually written to meet di...
10K Annotated Flows
USA covered
AI Training Data featuring meticulously annotated checkout flows from leading retail, restaurant, and marketplace websites. Includes detailed step-by-step us...
10M Hours
95% Precision
236 countries covered
Starter dataset for AI teams with sampled noise (from 10M+ hours of measurements), mobility, and POI data. Ideal for rapid prototyping and AI research. CSV o...
35B Data Points
95% Accuracy
236 countries covered
Interpolated noise dataset built on 10M+ hours of real-world acoustic data combined with AI-generated predictions. Ideal for map generation, AI training, and...
35B Data Points
3.7% Horizontal Accuracy (Meters)
236 countries covered
Time-series dataset based on 10M+ hours of measured dBA data. Includes hourly, daily, and seasonal noise patterns. Ideal for AI models focused on forecasting...
160K Records
236 countries covered
2 years of historical data
Contains user-submitted noise complaints recorded via mobile devices. Each entry captures the time, location, type of noise source, and the emotional respons...