tuandunghcmut's Collections
Multilingual Pretraining Corpus for Southeast Asian Language
Updated Dec 2, 2025
aisingapore/SEA-PILE-v2 • Updated Apr 14, 2025 • 187M • 1.53k • 4
aisingapore/SEA-PILE-v1 • Updated Dec 2, 2025 • 636M • 2.22k • 17
airesearch/scb_mt_enth_2020 • Updated Jan 18, 2024 • 159 • 9
aisingapore/WangchanLION-Web • Updated Sep 3, 2025 • 19.8M • 90 • 3
aisingapore/WangchanLION-Curated • Updated Sep 3, 2025 • 402k • 82 • 3
tuandunghcmut/PhoMT-MTet-Mixture • Updated Aug 11, 2025 • 7.62M • 43 • 1
HuggingFaceFW/clean-wikipedia • Updated Oct 21, 2025 • 61.2M • 1.16k • 23
uonlp/CulturaX • Updated Dec 16, 2024 • 7.18B • 38.9k • 569
allenai/c4 • Updated Jan 9, 2024 • 10.4B • 596k • 502