Takuya Umeki

consome2

otoearth

AI & ML interests

None yet

Recent Activity

reacted to their post with ❤️ about 11 hours ago

We’ve released two conversational speech datasets from oto on Hugging Face 🤗 Both are based on real, casual, full-duplex conversations, but with slightly different focuses.

Dataset 1: Processed / curated subset
https://huggingface.co/datasets/otoearth/otoSpeech-full-duplex-processed-141h
* Full-duplex, spontaneous multi-speaker conversations
* Participants filtered for high audio quality
* PII removal and audio enhancement applied
* Designed for training and benchmarking S2S or dialogue models

Dataset 2: Larger raw(er) release
https://huggingface.co/datasets/otoearth/otoSpeech-full-duplex-280h
* Same collection pipeline, with broader coverage
* More diversity in speakers, accents, and conversation styles
* Useful for analysis, filtering, or custom preprocessing experiments

We intentionally split the release to support different research workflows: clean and ready-to-use vs. more exploratory and research-oriented use.

The datasets are currently private, but we’re happy to approve access requests. Feel free to request access if you’re interested.

If you’re working on speech-to-speech (S2S) models or are curious about full-duplex conversational data, we’d love to discuss and exchange ideas. Feedback is very welcome!

posted an update about 11 hours ago


upvoted a collection about 18 hours ago

otoSpeech Full-duplex Dataset

liked a dataset 3 days ago

otoearth/otoSpeech-full-duplex-processed-141h

Preview • Updated 3 days ago • 35 • 6

liked a dataset 10 days ago

otoearth/otoSpeech-full-duplex-280h

Preview • Updated 3 days ago • 203 • 4

liked 8 models 8 months ago

pyannote/speaker-diarization-3.1

Automatic Speech Recognition • Updated May 10, 2024 • 13.3M • 1.46k

pyannote/voice-activity-detection

Automatic Speech Recognition • Updated May 10, 2024 • 424k • 222

Qwen/Qwen2-Audio-7B-Instruct

Audio-Text-to-Text • 8B • Updated Jan 12, 2025 • 505k • 512

fixie-ai/ultravox-v0_5-llama-3_2-1b

Audio-Text-to-Text • 0.7B • Updated Nov 27, 2025 • 334k • 68

SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 735k • 1.15k

hexgrad/Kokoro-82M

Text-to-Speech • Updated Apr 10, 2025 • 2.03M • 5.61k

coqui/XTTS-v2

Text-to-Speech • Updated Dec 11, 2023 • 5.54M • 3.34k

nari-labs/Dia-1.6B

Text-to-Speech • Updated Jun 1, 2025 • 72.3k • 2.83k