Best Large Language Model (LLM) Datasets & Databases

Easily explore, compare & preview top Large Language Model (LLM) Datasets via Datarade.

73 Large Language Model (LLM) Datasets

Can't find the data you're looking for?

Let data providers come to you by posting your request

More Large Language Model (LLM) Data Products

Discover related Large Language Model (LLM) data products.

Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data

FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data

Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores

Large Language Model (LLM) Noise Level Data | Noise Complaints | CCPA, GDPR Compliant | 160k Data Points | 100% Traceable Consent

TagX | 10000+ Multilingual Image Dataset | Text Detection | Global coverage | LLM data | LLM finetuning

Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training

Foundation Model Data Collection and Data Annotation | Large Language Model (LLM) Data | SFT Data | Red Teaming Services

Large Language Model (LLM) Training Data | 180+ Countries | AI-Enhanced Ground Truth Based | 10M+ Hours of Measurements | 100% Traceable Consent

TagX Data collection for AI/ML training | LLM data | Data collection for AI development & model finetuning | Text, image, audio, and document data

French Language Datasets | 150+ Years of Research | AI | NLP | LLMs | Dictionary Display | Translation Data | EU, Africa, Canada Coverage

African English Accent Conversational Dataset — Gender, Age, City Metadata with Validated Speech Samples

FileMarket | Biometric Data | Human Palm Image Dataset: 20,000 Photos for Machine Learning (ML) Data and AI Model Training

American English Language Datasets | 150+ Years of Research | Textual Data | NLP | LLMs | TTS | Dictionary Display | Game | US English Coverage

Venue Noise-Level Dataset for AI — Real Acoustic Profiles from 10M+ POI Measurements

AI Training Data - Spontaneous Conversations On-Demand - Accent & Dialect Focus

Noise-Level Time-Series Dataset — Over 10M Hours of Temporal Acoustic Data for AI Forecasting

Global Call Center & Conversational Audio Dataset — Multilingual, Validated, with Demographics + Custom Collection Available

Call Center Audio Recordings (100,000+ Hours, High-Quality) in Multiple Languages | Available now (off-the-shelf)

AI Training Audio Data - Scripted Conversations On-Demand - Accent & Dialect Focus

AI Training - Text-To-Speech Dataset - Very Diverse Languages and Dialects

AI Training - Spontaneous Conversations Dataset - ANY LANGUAGE

EMEA Data Suite | 3.3M Translations | 1.9M Words | 23 Languages | Natural Language Processing (NLP) Data | Translation Data | TTS | EMEA Coverage

German Language Datasets | 393K Translations | NLP | Dictionary Display | Machine Learning (ML) Data | Translations | EU Coverage
