39 Results

Large Language Model (LLM) Data | 800,000 SFX Professional Sound Effects | Human Metadata

... learning and generative AI applications. Our audio dataset stands out from the rest and is ideal for Large Language Model (LLM) Data use cases. ...
Available for 247 countries
800K audio files
10 years of historical data
85% at 48 kHz / 24-bit or better
Starts at
$100,000 / year
Free sample preview

Nexdata | Unsupervised Text Data | 1 PB | Foundation Model | Pre-training Data | Large Language Model(LLM) Data

by Nexdata
About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours ... type; Format: jsonl; Language: English, Korean, French, German, Spanish e-books Data Volume ...
Available for 88 countries
1 PB
5 years of historical data
90% Accuracy
Starts at
$5,000 / purchase
Free sample preview

FileMarket | 20,000 photos | AI Training Data | Large Language Model (LLM) Data | Machine Learning (ML) Data | Deep Learning (DL) Data

FileMarket provides premium Large Language Model (LLM) Data designed to support and enhance a wide range ... Key use cases of our Large Language Model (LLM) Data: text generation, chatbots and virtual assistants ...
Available for 249 countries
20K photos
95% accuracy
Pricing available upon request
Free sample preview

Music Data for Large Language Models LLM | 50,000 Music Files | Updated Weekly | Royalty Free Music | Pre-cleared for Generative AI

The number one music dataset in the world: 50,000 professional music tracks in all genres with human-crafted metadata. All rights are cleared for use in machine learning and generative AI.
Available for 249 countries
50K music tracks
10 years of historical data
80% instrumental
Starts at
$500,000 / purchase
Free sample preview

Nexdata | Foundation Model Data Collection and Data Annotation | Large Language Model(LLM) Data | SFT Data | Red Teaming Services

by Nexdata
... provides flexible and customized Large Language Model (LLM) Data annotation services for tasks such ... We provide Large Language Model (LLM) Data cleaning and personnel support services based on the specific ...
Available for 119 countries
50 TB of text data
5 years of historical data
98% accuracy
Starts at
$5,000 / purchase
Free sample preview
Rating: 5.0 (1 review)

AI & ML Training Data | 800M Profiles for LLMs, Generative AI, NLP & Predictive Models

by Xverum
From natural language processing (NLP) to predictive analytics, our data empowers a wide range of industries ... Ideal for chatbots, language models, and content categorization.
Available for 250 countries
730M Individual Profiles
3 years of historical data
99% Complete and Fully Updated Data
Starts at
$900 / month (regularly $1,000)
Free sample preview
10% Datarade discount

FileMarket | Telegram Users Geolocation Data with IP & Consent | 50,000 Records | AI, ML, DL & LLM Training Data

Large Language Model (LLM) Data: The geolocation data can be used to train LLMs to better understand ... This data is specifically tailored for use in AI, ML, DL, and LLM models, as well as applications requiring
Available for 249 countries
50K records
Pricing available upon request
Free sample preview

Nexdata | Image and Video Description Data | 1 PB | Multimodal Data | GenAI | LLM Data | Large Language Model(LLM) Data

by Nexdata
... formats; .xlsx (annotation file format). About Nexdata: Nexdata owns off-the-shelf PB-level Large Language Model (LLM) Data, 1 million hours of Audio Data and 800TB of Annotated Imagery Data.
Available for 81 countries
1 PB
5 years of historical data
95% Accuracy
Starts at
$5,000 / purchase

Bitext | AI Training Data | Hybrid Synthetic Data for LLM Finetuning | Custom Training and Evaluation Datasets for Chatbots

by bitext
Enhance your large language models (LLMs) globally with precise and comprehensive Synthetic Data from ... Use cases of our Hybrid Synthetic Data: LLM finetuning, custom chatbot training, bias mitigation
Available for 249 countries
9 Languages
100% Utterances Semantically Equivalent
Available Pricing:
One-off purchase
Monthly License
Yearly License
Free sample preview

AI Training Data (RAG) for Grocery, Restaurant, and Retail RAG Models – 1M+ Stores in US & Canada

by MealMe
Comprehensive training data on 1M+ stores across the US & Canada. ... Pricing: Real-time and historical pricing data for dynamic pricing strategies and recommendations.
Available for 250 countries
1B Records
1 year of historical data
Pricing available upon request
Free sample preview

More Large Language Model (LLM) Data Products

Discover related Large Language Model (LLM) Data products.
20K images
95% accuracy
240 countries covered
Our pre-compiled biometric data set (human faces) includes comprehensive features such as 3D depth, segmentation of facial organs and accessories, key points...
9 Languages
100% Utterances Semantically Equivalent
249 countries covered
Enhance your AI models with Bitext's comprehensive Textual Data and access high-quality data with 100% semantically equivalent utterances across 20 verticals.
50K music tracks
80% instrumental
249 countries covered
The world's leading music dataset, featuring 50,000 professional tracks across all genres, complete with expertly crafted metadata. All rights are fully clea...
50K music tracks
80% instrumental
249 countries covered
The number one music dataset in the world: 50,000 professional music tracks in all genres with human-crafted metadata. All rights are cleared for use in machi...
20K photos
95% accuracy
249 countries covered
Enhance your LLMs with our comprehensive and diverse large language model data sets, designed for optimal training and performance.
1B Records
250 countries covered
1 year of historical data
Comprehensive training data on 1M+ stores across the US & Canada. Includes detailed menus, inventory, pricing, and availability. Ideal for AI/ML models, powe...
20K photos
95% accuracy
8 countries covered
A comprehensive dataset of 20,000 human palm images from Bangladesh, Russia, Nigeria, Ukraine, and other countries. Ideal for AI model training, gesture reco...
20K voice memos
240 countries covered
We help clients source, curate, and transcribe data for AI and machine learning models. Our services include customized audio data collection and transcripti...
50K records
249 countries covered
Comprehensive dataset of Telegram users' geolocations with IP addresses, fully consented, comprising 50,000 records. Ideal for AI, ML, DL, and LLM training, ...
1M hours
95% Accuracy
74 countries covered
Off-the-shelf 1 million hours of unsupervised speech data, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc.), 100,000 h...
15K Hours
98% sentence/word accuracy
85 countries covered
Nexdata has 15,000 hours of off-the-shelf Machine Learning (ML) Data of 8 kHz conversational speech, covering 100+ countries including English, German, French, S...
35K Hours
98% sentence/word accuracy
81 countries covered
Nexdata has 35,000 hours of off-the-shelf Machine Learning (ML) Data of 16 kHz conversational speech, covering 100+ countries including English, German, French, ...
1 PB
90% Accuracy
88 countries covered
Off-the-shelf 1 PB of unsupervised text data covering test questions, textbooks, e-books, papers, parallel corpora, online Q&A, chat dialogues, etc.
1 PB
95% Accuracy
81 countries covered
Off-the-shelf 1PB image and video description data covers multiple scenes, languages, and domains.
20K hours
98% Word Accuracy Rate
41 countries covered
Off-the-shelf 20,000 hours of casual conversation speech data covering 30+ languages and diverse domains such as self-media, conversations, live streams,...
200 Countries
250 countries covered
16 years of historical data
Get 50 TB of historical data spanning 10+ years, delivered continuously via a live API, plus on-demand historical datasets. We offer a firehose option, with 170+ languages and c...