Best Large Language Model (LLM) Datasets & Databases
Easily explore, compare & preview top Large Language Model (LLM) Datasets via Datarade.
Recommended Large Language Model (LLM) Data Products
43 Results
Test Questions Data | 50 Millions | Foundation Model | Unsupervised Text Data | Large Language Model(LLM) Data
by
Nexdata
About Nexdata
Nexdata owns off-the-shelf PB-level Large Language Model(LLM) Data, 1 million hours of ... For more details, please visit us at https://www.nexdata.ai/datasets/llm?source=Datarade
Available for 8 countries
1 PB
5 years of historical data
90% Accuracy
Starts at
$20,000 / purchase
Large Language Model (LLM) Data | 800,000 SFX Professional Sound Effects | Human Metadata
by
Soundsnap
Our audio dataset stands out from the rest and is ideal for Large Language Model (LLM) Data use cases. ... learning and generative AI applications
Available for 247 countries
800K audio files
10 years of historical data
85% 48 kHz 24 bit or better
Starts at
$100,000 / year
Free sample preview
Large Language Model (LLM) Data | Machine Learning (ML) Data | AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores
by
MealMe
Comprehensive training data on 1M+ stores across the US & Canada. ... Pricing: Real-time and historical pricing data for dynamic pricing strategies and recommendations.
Available for 250 countries
1B Records
1 year of historical data
Pricing available upon request
Free sample preview
Large Language Model (LLM) Data | 10 Million POI Average Noise Levels | 35 B + Data Points | 100% Traceable Consent
Connect with our experts for Street and Venue Noise-Level Data. ... acoustic data, combining over 35 billion datapoints with AI-driven interpolation, developed together
Available for 236 countries
10M hours
2 years of historical data
Starts at
$4,500 / month (regular price $5,000)
Free sample preview
10% Datarade discount
20% revenue share
FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data
by
FileMarket
FileMarket provides premium Large Language Model (LLM) Data designed to support and enhance a wide range ... Key use cases of our Large Language Model (LLM) Data:
Text generation
Chatbots and virtual assistants
Available for 249 countries
20K photos
95% accuracy
Pricing available upon request
Free sample preview
Foundation Model Data Collection and Data Annotation | Large Language Model (LLM) Data | SFT Data | Red Teaming Services
by
Nexdata
provides flexible and customized Large Language Model (LLM) Data annotation services for tasks such ... We provide Large Language Model (LLM) Data cleaning and personnel support services based on the specific
Available for 119 countries
50 TB of text data
5 years of historical data
98% accuracy
Starts at
$20,000 / purchase
Free sample preview
Music Data for Large Language Models LLM | 50,000 Music Files | Updated Weekly | Royalty Free Music | Pre-cleared for Generative AI
by
Soundsnap
The number one music dataset in the world. 50,000 professional music tracks in all genres with human-crafted metadata. All rights are cleared for use in machine learning and generative AI.
Available for 249 countries
50K music tracks
10 years of historical data
80% instrumental
Starts at
$500,000 / purchase
Free sample preview
Machine Learning (ML) Data | 800M+ B2B Profiles | AI-Ready for Deep Learning (DL), NLP & LLM Training
by
Xverum
From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries ... Ideal for chatbots, language models, and content categorization.
Available for 250 countries
730M Individual Profiles
3 years of historical data
100% Open Web Data
Starts at
$900 / month (regular price $1,000)
Free sample preview
10% Datarade discount
Large Language Model (LLM) Noise Level Data | 236 Countries Coverage | CCPA, GDPR Compliant | 35 B + Data Points | 100% Traceable Consent
dataset is designed for cutting-edge AI use cases where real-world, multi-source information enhances model ... Combines 10M+ hours of noise data with mobility and POI visitation data.
Available for 236 countries
35B Data Points
2 years of historical data
95% Precision
Starts at
$2,250 / month (regular price $2,500)
Free sample preview
10% Datarade discount
20% revenue share
FileMarket | Telegram Users Geolocation Data with IP & Consent | 50,000 Records | AI, ML, DL & LLM Training Data
by
FileMarket
Large Language Model (LLM) Data: The geolocation data can be used to train LLMs to better understand ... This data is specifically tailored for use in AI, ML, DL, and LLM models, as well as applications requiring
Available for 249 countries
50K records
Pricing available upon request
Free sample preview
More Large Language Model (LLM) Data Products
Discover related Large Language Model (LLM) data products.
50K music tracks
80% instrumental
249 countries covered
The number one music dataset in the world. 50,000 professional music tracks in all genres with human-crafted metadata. All rights are cleared for use in machi...
15K Hours
98% sentence/word
82 countries covered
The Natural Language Processing (NLP) Data of in-car speech covers 20+ languages, including read, wake-up word, command word, code-switching, multimodal and n...
10M records per week
250 countries covered
Our Upwork dataset provides detailed freelance and remote work listings, client profiles, and project trends from a leading platform for freelancers. Perfect...
240 countries covered
At Bitext, we offer advanced linguistic tools designed for automated pre-labeling of datasets to help scale Data Annotation and Labeling (DAL) projects.
50K Hours
98% sentence/word
28 countries covered
The recorded text is a mixture of multi-language sentences, covering general scenes and human-computer interaction scenes. The audio data is rich in content and...
35B Data Points
95% Precision
236 countries covered
Combines 10M+ hours of noise data with mobility and POI visitation data. Ideal for AI models combining environmental, mobility, and behavioral signals. CSV o...
730M Individual Profiles
100% Open Web Data
250 countries covered
Xverum’s Machine Learning (ML) data will help you to train LLMs and generative AI with 800M B2B profiles. 100+ attributes, global coverage, and GDPR-complian...
15K Hours
98% sentence/word
85 countries covered
Nexdata has off-the-shelf 15,000 hours Machine Learning (ML) Data of 8kHz conversational speech, covering 100+ countries including English, German, French, S...
35K Hours
98% sentence/word
81 countries covered
Nexdata has off-the-shelf 35,000 hours Machine Learning (ML) Data of 16kHz conversational speech, covering 100+ countries including English, German, French, ...
40K Hours
98% sentence/word
54 countries covered
The speech data is collected from native English speakers in 40 countries, covering a variety of pronunciation habits and characteristics. The script is design...
65K Hours
98% sentence/word
102 countries covered
Off-the-shelf Scripted Monologues Speech Datasets cover 100+ languages. All the Machine Learning (ML) Data are collected from native speakers, with signed au...
10M Hours
95% Precision
236 countries covered
Starter dataset for AI teams with sampled noise (from 10M+ hours of measurements), mobility, and POI data. Ideal for rapid prototyping and AI research. CSV o...
35B Data Points
95% Accuracy
236 countries covered
Interpolated noise dataset built on 10M+ hours of real-world acoustic data combined with AI-generated predictions. Ideal for map generation, AI training, and...
35B Data Points
3.7% Horizontal Accuracy (Meters)
236 countries covered
Time-series dataset based on 10M+ hours of measured dBA data. Includes hourly, daily, and seasonal noise patterns. Ideal for AI models focused on forecasting...
10M Records
95% Precision
236 countries covered
Real-world venue noise-level data (restaurants, nightlife, gyms, etc.) based on 10M+ hours of measured dBA data. Ideal for AI training in acoustic classifica...
10M hours
236 countries covered
2 years of historical data
Silencio provides the world’s largest real-world street and venue noise-level dataset, combining over 35 billion datapoints with AI-powered interpolation. Fu...