Best Large Language Model (LLM) Datasets & Databases
Easily explore, compare & preview top Large Language Model (LLM) Datasets via Datarade.
62 Large Language Model (LLM) Datasets

Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data
Available in
and 35 more countries

Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores
Latitude
ZIP Code
City Name
URL
State Abbreviation
and 6 more attributes
Available in
and 245 more countries

Large Language Model (LLM) Data | 10 Million POI Average Noise Levels | 35B+ Data Points | 100% Traceable Consent
Latitude
Longitude
City Name
POI Name
POI ID
and 5 more attributes
Available in
and 231 more countries

TagX | 10000+ Multilingual Image Dataset | Text Detection | Global coverage | LLM data | LLM finetuning
Available in
and 97 more countries

Dappier | Breaking News Data | RAG API, LLM Compatible | Real-Time Updates | Unlimited Data
URL
Company Domain
Website
Available in
and 245 more countries

Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training
Available in
and 245 more countries

CrawlBee | ML Training Data | LLM Data | Generative AI Data | Code Base Training Data | Healthcare Training Data

Canaria | Indeed Job Postings Data | U.S. | 4M+ Monthly Indeed Job Postings Data | AI-LLM Enhanced with 3 Years of Historical Indeed Job Postings Data
Company Name
ZIP Code
Company Industry
City Name
Company ID
and 10 more attributes

Foundation Model Data Collection and Data Annotation | Large Language Model (LLM) Data | SFT Data | Red Teaming Services
Available in
and 110 more countries

Large Language Model (LLM) Noise Data | Noise Complaints + Urban Noise Levels | CCPA, GDPR Compliant | 100% Traceable Consent
Latitude
Longitude
Country Code Alpha-2
Available in
and 231 more countries
More Large Language Model (LLM) Data Products
Discover related Large Language Model (LLM) data products.

Mixed Speech Data | 5,000 Hours | Code-switching | Audio Data | Speech Recognition Data | AI Datasets
Free sample preview
API available
Starts at
$20,000 / purchase

Canaria | Indeed Job Postings Data | U.S. | 4M+ Monthly Indeed Job Postings Data | AI-LLM Enhanced with 3 Years of Historical Indeed Job Postings Data
Free sample preview
Pricing available upon request

Scripted Monologues Speech Data | 65,000 Hours | Generative AI Audio Data | Speech Recognition Data | Machine Learning (ML) Data
Free sample preview
API available
Starts at
$20,000 / purchase

Dappier | Breaking News Data | RAG API, LLM Compatible | Real-Time Updates | Unlimited Data
Free sample preview
API available
Starts at
$0.27 / 100 queries (reduced from $0.30)

Foundation Model Data Collection and Data Annotation | Large Language Model (LLM) Data | SFT Data | Red Teaming Services
Free sample preview
API available
Starts at
$20,000 / purchase

Large Language Model (LLM) Data | 10 Million POI Average Noise Levels | 35B+ Data Points | 100% Traceable Consent
Free sample preview
Starts at
$4,500 / month (reduced from $5,000)

TagX Data collection for AI/ML training | LLM data | Data collection for AI development & model finetuning | Text, image, audio, and document data
API available
Starts at
$1,000 / month

TagX | 10000+ Multilingual Image Dataset | Text Detection | Global coverage | LLM data | LLM finetuning
Pricing available upon request

CrawlBee | ML Training Data | LLM Data | Generative AI Data | Code Base Training Data | Healthcare Training Data
API available
Pricing available upon request

8kHz Conversational Speech Data | 15,000 Hours | Audio Data | Speech Recognition Data | Machine Learning (ML) Data
Free sample preview
API available
Starts at
$20,000 / purchase

AI Training Data | Annotated Checkout Flows for Retail, Restaurant, and Marketplace Websites
Free sample preview
Pricing available upon request

Large Language Model (LLM) Data | 10M Hours of Urban Noise Level Measurement | CCPA, GDPR Compliant | 35B+ Data Points | 100% Traceable Consent
Free sample preview
Starts at
$2,250 / month (reduced from $2,500)

Large Language Model (LLM) Noise Level Data | 236 Countries Coverage | CCPA, GDPR Compliant | 35B+ Data Points | 100% Traceable Consent
Free sample preview
Starts at
$2,250 / month (reduced from $2,500)

Large Language Model (LLM) Training Data | 236 Countries | AI-Enhanced Ground Truth Based | 10M+ Hours of Measurements | 100% Traceable Consent
Free sample preview
Starts at
$4,500 / month (reduced from $5,000)

Noise-Level Time-Series Dataset — Over 10M Hours of Temporal Acoustic Data for AI Forecasting
Free sample preview
Starts at
$4,500 / month (reduced from $5,000)

Large Language Model (LLM) Noise Data | Noise Complaints + Urban Noise Levels | CCPA, GDPR Compliant | 100% Traceable Consent
Free sample preview
Starts at
$1,350 / month (reduced from $1,500)