Best Transcription Datasets & Databases

Transcription data is a valuable resource for many industries, providing accurate and reliable text versions of audio and video content. Whether you need it for research, training, or analysis, access to high-quality transcription datasets and databases is crucial. In this article, we explore what transcription data is, its applications, and the best data sources to obtain and purchase this data on Datarade.ai.

Nexdata | Multilingual Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI & ML Training Data

by Nexdata
... 800TB of image/video data, about 2 billion pieces of NLP data. ... Speaker: native speaker. Annotation features: word transcription, part-of-speech, phoneme boundary, ...
Available for 42 countries
400 hours
5 years of historical data
95% sentence accuracy
Starts at $5,000 / purchase
Free sample preview

Picasso Podcast Data: Transcriptions of All Popular Podcasts (5K+ Podcasts)

by Picasso
It contains transcriptions of all episodes of the ~5,000 most popular podcasts, updated weekly.
Available for 240 countries
5K Podcasts
Pricing available upon request

Broadcast Transcript Feed with Sentiment Analysis (GBTS)

by TVEyes
Broadcast TV transcripts are collected from cable providers, satellite, streaming or terrestrial broadcast. Information extraction and enrichment tasks performed on the associated transcripts inclu...
Available for 13 countries
8 Years
8 years of historical data
Available pricing: one-off purchase, yearly license

DecaData: Online Purchase Data (InstaCart, Shipt, DoorDash, UberEats)

DecaData Online Purchase data provides a view into the transaction velocity of online grocery delivery services like InstaCart, Shipt, DoorDash and UberEats.
Available for 1 country
14 years
14 years of historical data
Pricing available upon request

Walmart (NYSE: WMT) | US Same Store Sales Prediction Data | Accurate (Corr: 0.85, MAPE: 3.8%) | Quarterly

Tap into Huq’s accurate same-store sales prediction data to gain an edge in the market. ... The data is available quarterly, ahead of analyst estimates, and is delivered as a chart and a CSV file.
Available for 1 country
1 Prediction per quarter
4 years of historical data
0.85 net revenue correlation (17 quarters)
Starts at £5,000 / purchase

Nexdata | Multilingual Children Speech Data | 10,000 Hours | AI & ML Training Data | Speech Recognition Data | Audio Data

by Nexdata
Norwegian, Finnish, Hungarian, Thai, Hindi, Indonesian, Vietnamese, Malay, Burmese, Filipino (Tagalog) Transcription ... 800TB of Annotated Imagery Data, about 2 billion pieces of Natural Language Processing (NLP) Data.
Available for 44 countries
10K hours
5 years of historical data
95% sentence accuracy
Starts at $10,000 / purchase
Free sample preview

US Public Companies Earning Calls Audio and Video Database - FactSquared Transcribe

FactSquared Transcribe provides automated, full-text, searchable, indexed feeds of audio and video content.
Available for 1 country
Pricing available upon request

Nexdata | In-Car Speech Data | 15,000 Hours | AI & ML Training Data | Speech Recognition Data | Audio Data | Natural Language Processing (NLP) Data

by Nexdata
Device: high-fidelity microphone; binocular camera. Language: 20 languages. Transcription content: ... Imagery Data, about 2 billion pieces of Natural Language Processing (NLP) Data.
Available for 61 countries
15K Hours
5 years of historical data
98% sentence/word accuracy
Starts at $5,000 / purchase
Free sample preview

Wall Street Horizon Corporate Event Data - Historical

Calendar Data ... We archive data as we publish it.
Available for 249 countries
15 years of historical data
Pricing available upon request

FactSquared Stock Sentiment Speech Analytics Data USA

FactSquared Analyze sits atop FactSquared Transcribe, which offers accurate, near-instant voice-to-text transcription ... FactSquared Analyze offers unique data-driven insights into what public figures are, and aren’t, saying.
Available for 1 country
Pricing available upon request

More Transcription Data Products

Discover related transcription data products.

10K hours
95% sentence accuracy
44 countries covered
Nexdata has off-the-shelf AI & ML Training Data of 10,000 hours of children's speech, covering more than 40 languages and accents. The recorded text contains comm...
65K Hours
98% sentence/word accuracy
94 countries covered
Off-the-shelf read speech data cover 100+ languages. All the Machine Learning (ML) Data are collected from native speakers, with signed authorization agreeme...
1K hour per month
99.5% word accuracy
136 countries covered
Nexdata provides multi-language, multi-timbre, multi-domain and multi-style speech synthesis data collection services for Deep Learning Data.
15K Hours
98% sentence/word accuracy
61 countries covered
The Natural Language Processing (NLP) Data of in-car speech covers 20+ languages, including read, wake-up word, command word, code-switching, multimodal and n...
40K Hours
98% sentence/word accuracy
47 countries covered
The Natural Language Processing (NLP) Data is collected from native English speakers in 40 countries, covering a variety of pronunciation habits and characteri...
50K Hours
98% sentence/word accuracy
21 countries covered
The recorded text is a mixture of multi-language sentences, covering general scenes and human-computer interaction scenes. The Natural Language Processing (NLP)...
63.1K audio recordings
USA covered
15 years of historical data
Helios MERCURY is built upon Helios quant technology, is specially designed for fundamental researchers and traders, and features an easy-to-understand dashb...
249 countries covered
15 years of historical data
Wall Street Horizon's historical corporate event data are the industry's largest, most detailed and most accurate archiving of this type of event data. We ar...
35K Hours
98% sentence/word accuracy
60 countries covered
Nexdata has 35,000 hours of off-the-shelf Machine Learning (ML) Data of 16kHz conversational speech, covering 100+ countries including English, German, French, ...
USA covered
15 years of historical data
A completely new type of product in the alternative data space that combines understanding in audio analytics, biosignals, neuroscience and AI to derive nove...
14 years
USA covered
14 years of historical data
DecaData Online Purchase data provides a view into the transaction velocity of online grocery delivery services like InstaCart, Shipt, DoorDash and UberEats...
100K hours per month
99.5% word accuracy
136 countries covered
Nexdata provides high-quality Natural Language Processing (NLP) Data services for speech cleaning, speech transcription, phoneme annotation, etc., with word ac...

Where can I buy Transcription Data?

Data providers and vendors listed on Datarade sell Transcription Data products and samples. Popular Transcription Data products and datasets available on our platform are Nexdata | Multilingual Speech Synthesis Data | 400 Hours | TTS Data | Audio Data | AI & ML Training Data by Nexdata, Picasso Podcast Data: Transcriptions of All Popular Podcasts (5K+ Podcasts) by Picasso, and Broadcast Transcript Feed with Sentiment Analysis (GBTS) by TVEyes.

How can I get Transcription Data?

You can get Transcription Data via a range of delivery methods; the right one for you depends on your use case. For example, historical Transcription Data is usually available to download in bulk and delivered via an Amazon S3 bucket. If your use case is time-critical, you can instead buy real-time Transcription Data APIs, feeds, and streams to access the most up-to-date data.