Nexdata | Multilingual Unsupervised Speech Data | 1 Million Hours | Spontaneous Speech | LLM | Pre-training | Large Language Model (LLM) Data

Nexdata
No reviews yet
Verified Data Provider
Volume: 1M hours
Data Quality: 95% accuracy
Available Formats: .bin, .json, and .xml
Coverage: 47 countries
History: 5 years

Description

An off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc.; 100,000 hours each). The content covers dialogues and monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.
1. Specifications
Format: 16 kHz, 16 bit, WAV, mono channel
Content category: Dialogue or monologue in several common domains, such as daily vlogs, travel, podcasts, technology, and beauty
Language: English (USA, UK, Canada, Australia, India, Philippines, etc.), French, German, Japanese, Arabic (MSA, Gulf, Levantine, Egyptian accents, etc.), etc.
Recording condition: Mixed (indoor, public place, entertainment, etc.)

2. About Nexdata
Nexdata owns off-the-shelf PB-level Large Language Model (LLM) data, 1 million hours of speech data, and 800 TB of annotated imagery data. This ready-to-go data supports instant delivery and can quickly improve the accuracy of AI models. For more details, please visit us at https://www.nexdata.ai/datasets/speechrecog?source=Datarade
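
The audio format stated under Specifications (16 kHz, 16 bit, mono WAV) can be checked on a delivered sample before committing to a purchase. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_spec` and the constant names are hypothetical and not part of any Nexdata tooling, and the sketch assumes the files are uncompressed PCM WAV.

```python
import wave

# Expected values taken from the Specifications section of this listing.
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def check_wav_spec(path):
    """Return a list of mismatches; an empty list means the file conforms."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8} bit, expected 16 bit")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channel count is {channels}, expected {EXPECTED_CHANNELS}")
    return problems
```

Calling `check_wav_spec("sample.wav")` on a file from the free sample (the file name here is a placeholder) returns an empty list when the file matches the stated format.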

Country Coverage

Africa (6)
Algeria
Egypt
Libya
Morocco
South Africa
Tunisia
Asia (8)
Hong Kong
Japan
Korea (Republic of)
Lebanon
Macao
Saudi Arabia
Taiwan
United Arab Emirates
Europe (16)
Austria
Belgium
France
Germany
Iceland
Ireland
Italy
Luxembourg
Malta
Netherlands
Poland
Portugal
Russian Federation
Spain
Switzerland
United Kingdom
North America (3)
Canada
Mexico
United States of America
Oceania (2)
Australia
New Zealand
South America (12)
Argentina
Brazil
Chile
Colombia
Dominica
Dominican Republic
Ecuador
Jamaica
Peru
Puerto Rico
Uruguay
Venezuela (Bolivarian Republic of)

History

5 years of historical data

Volume

1 million hours
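
For a sense of scale, the raw size of 1 million hours can be estimated from the format stated in the Specifications (16 kHz, 16 bit, mono). The sketch below is a back-of-the-envelope calculation, not a figure from the provider: it assumes uncompressed PCM, ignores WAV headers and any metadata files, and uses decimal units (1 TB = 10^12 bytes). The function name `pcm_bytes` is hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # 16 kHz, per the Specifications
BYTES_PER_SAMPLE = 2      # 16 bit
CHANNELS = 1              # mono
TOTAL_HOURS = 1_000_000   # stated volume


def pcm_bytes(hours):
    """Raw PCM payload size in bytes for the given number of hours."""
    seconds = hours * 3600
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS


bytes_per_hour = pcm_bytes(1)
total_bytes = pcm_bytes(TOTAL_HOURS)

print(f"{bytes_per_hour / 1e6:.1f} MB per hour")  # 115.2 MB per hour
print(f"{total_bytes / 1e12:.1f} TB in total")    # 115.2 TB in total
```

Under these assumptions the full dataset is on the order of 115 TB of audio, which is worth factoring in when choosing between the delivery methods listed below.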

Pricing

Free sample available
License: Starts at
One-off purchase: $20,000 / purchase
Monthly License: Not available
Yearly License: Not available
Usage-based: Not available

Suitable Company Sizes

Small Business
Medium-sized Business
Enterprise

Quality

95% accuracy (self-reported by the provider)

Delivery

Methods
S3 Bucket
SFTP
Email
UI Export
REST API
SOAP API
Streaming API
Feed API
Frequency
secondly
minutely
hourly
daily
weekly
monthly
quarterly
yearly
real-time
on-demand
Format
.bin
.json
.xml
.csv
.xls
.sql
.txt

Use Cases

Artificial Intelligence (AI)
Machine Learning (ML)
Deep Learning
Speech Recognition
LLM Training

Frequently asked questions

What is Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data?

An off-the-shelf unsupervised speech dataset of 1 million hours, covering 10+ languages (English, French, German, Japanese, Arabic, Mandarin, etc.; 100,000 hours each). The content covers dialogues and monologues in 28 common domains, such as daily vlogs, travel, podcasts, technology, and beauty.

What is Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data used for?

This product has 5 key use cases. Nexdata recommends using the data for Artificial Intelligence (AI), Machine Learning (ML), Deep Learning, Speech Recognition, and LLM Training. Global businesses and organizations buy Machine Learning (ML) Data from Nexdata to fuel their analytics and enrichment.

Who can use Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data?

This product is best suited if you’re a Medium-sized Business, Enterprise, or Small Business looking for Machine Learning (ML) Data. Get in touch with Nexdata to see what their data can do for your business and find out which integrations they provide.

How far back does the data in Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data go?

This product has 5 years of historical coverage. It can be delivered on a secondly, minutely, hourly, daily, weekly, monthly, quarterly, yearly, real-time, and on-demand basis.

Which countries does Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data cover?

This product includes data covering 47 countries, such as the USA, Japan, Germany, the United Kingdom, and France. Nexdata is headquartered in the United States of America.

How much does Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data cost?

Pricing for Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data starts at USD 20,000 per purchase. Connect with Nexdata to get a quote and arrange custom pricing models based on your data requirements.

How can I get Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data?

Businesses can buy Machine Learning (ML) Data from Nexdata and get the data via S3 Bucket, SFTP, Email, UI Export, REST API, SOAP API, Streaming API, and Feed API. Depending on your data requirements and subscription budget, Nexdata can deliver this product in .bin, .json, .xml, .csv, .xls, .sql, and .txt format.

What is the data quality of Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data?

Nexdata has reported that this product has the following quality and accuracy assurances: 95% Accuracy. You can compare and assess the data quality of Nexdata using Datarade’s data marketplace.

What are similar products to Nexdata Multilingual Unsupervised Speech Data 1 Million Hours Spontaneous Speech LLM Pre-training Large Language Model(LLM) Data?

This product has 3 related products. These alternatives include Nexdata Multilingual Conversational Speech Data 8kHz Telephone 15,000 Hours Audio Data Speech Recognition Data Machine Learning (ML) Data, Large Language Model (LLM) Data Machine Learning (ML) Data AI Training Data (RAG) for 1M+ Global Grocery, Restaurant, and Retail Stores, and Large Language Model (LLM) Data 800,000 SFX Professional Sound Effects Human Metadata. You can compare the best Machine Learning (ML) Data providers and products via Datarade’s data marketplace and get the right data for your use case.

Nexdata

Sharpen Your AI with Better Data

Verified Provider
1h Avg. response time
100% Response rate