mjkmain committed on
Commit 90763f4 · verified · 1 Parent(s): fd01da5

Update README.md

Files changed (1)
  1. README.md +64 -11
README.md CHANGED
@@ -7,27 +7,73 @@ sdk: static
  pinned: false
  ---
 
- <h1 align="center">KORMo Research</h1>
  <p align="center">
- An open-source hub for Korean data and language model research
  </p>
 
  ---
 
  ## 🧠 Open Models
 
- - **Tokenizer** — An EPK tokenizer optimized for representing Korean text
- - **PT Model** — The <b>KORMo-10B</b> pretrained model trained on large-scale Korean and English data
- - **Mid-train Model** — An intermediate-stage model further trained on long-context and reasoning data
- - **SFT Model** — A high-performance model fine-tuned on instruction datasets
- > 💡 The full training history and checkpoints of each model can be found in the **`Revisions` tab** at the top of each model page.
 
 
  ---
 
  ## 📚 Open Datasets
 
- - **KOR-Clean** — A quality-filtered Korean token corpus
  - **Instruction Dataset** — A dataset for fine-tuning
  - **Synthetic Dataset** — Training resources built on large-scale generated data
 
@@ -35,8 +81,8 @@ pinned: false
 
  ## 🆕 News 🗞️
 
- - 🪄 **First public release of Korean LLM training code and data**
- - 🚀 <b>KORMo-10B</b> released 🎉
 
  ---
 
@@ -45,4 +91,11 @@ pinned: false
  <a href="https://github.com/kormo-project"><img src="https://img.shields.io/badge/GitHub-black?logo=github&style=for-the-badge"></a>
  </p>
 
- ---

  pinned: false
  ---
 
+ <h1 align="center">KORMo: Korean Open Reasoning Model for Everyone</h1>
  <p align="center">
+ An open-source hub for Korean language data and model research
  </p>
 
  ---
 
+ <details open>
+ <summary><b>🌐 English (default)</b></summary>
+
+ ## 🧠 Open Models
+
+ - **KORMo-Team/KORMo-tokenizer** — A tokenizer optimized for bilingual (Korean–English) language representation
+ - **KORMo-Team/KORMo-10B-base** — The <b>KORMo-10B</b> pretrained model trained on large-scale Korean and English corpora
+ - **KORMo-Team/KORMo-10B-sft** — A fine-tuned model enhanced with long-context reasoning and instruction-following data
+
+ > 💡 You can explore the full training history and checkpoints in each model’s **`Revisions` tab** on Hugging Face.
+
+ ---
+
+ ## 📚 Open Datasets
+
+ - **KOR-Clean** — A high-quality filtered Korean text corpus
+ - **Instruction Dataset** — Supervised fine-tuning data for downstream tasks
+ - **Synthetic Dataset** — Large-scale synthetic data resources generated for model training
+
+ ---
+
+ ## 🆕 News 🗞️
+
+ - 🪄 **The first fully open-source Korean LLM**
+ - 🚀 <b>KORMo-10B</b> released on **October 13, 2025** 🎉
+
+ ---
+
+ ## 🌐 Links
+ <p align="center">
+ <a href="https://github.com/kormo-project"><img src="https://img.shields.io/badge/GitHub-black?logo=github&style=for-the-badge"></a>
+ </p>
+
+ ---
+
+ ### 📖 About KORMo
+
+ KORMo is an open research initiative dedicated to advancing Korean language understanding and generation through large-scale, fully open-source models and datasets.
+ We aim to make Korean NLP research transparent, reproducible, and accessible to the global community.
+
+ </details>
+
+ ---
+
+ <details>
+ <summary><b>🇰🇷 Korean</b></summary>
+
  ## 🧠 Open Models
 
+ - **KORMo-Team/KORMo-tokenizer** — A tokenizer optimized for bilingual Korean/English language representation
+ - **KORMo-Team/KORMo-10B-base** — The <b>KORMo-10B</b> pretrained model trained on large-scale Korean and English data
+ - **KORMo-Team/KORMo-10B-sft** — A model fine-tuned with long-context extension plus reasoning and instruction-following data
 
+ > 💡 The full training history and checkpoints of each model can be found in the **`Revisions` tab** at the top of each model page.
 
  ---
 
  ## 📚 Open Datasets
 
+ - **KOR-Clean** — A quality-filtered Korean corpus
  - **Instruction Dataset** — A dataset for fine-tuning
  - **Synthetic Dataset** — Training resources built on large-scale generated data
 
 
  ## 🆕 News 🗞️
 
+ - 🪄 **The first fully open-source Korean LLM**
+ - 🚀 2025.10.13 <b>KORMo-10B</b> released 🎉
 
  ---
 
  <a href="https://github.com/kormo-project"><img src="https://img.shields.io/badge/GitHub-black?logo=github&style=for-the-badge"></a>
  </p>
 
+ ---
+
+ ### 📖 About KORMo
+
+ KORMo is a research project building large-scale open-source language models for Korean language understanding and generation.
+ It aims to make Korean NLP research more transparent and reproducible through open models and datasets that anyone can access.
+
+ </details>
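The updated README points readers to each model's `Revisions` tab for training history and checkpoints. Those revisions can also be reached from code. What follows is a minimal sketch under several assumptions:

- `huggingface_hub` and `transformers` are installed.
- The `KORMo-Team/...` repositories named in the README are public.
- The checkpoints load through the standard `transformers` Auto classes.

The helper names (`revision_url`, `list_revisions`, `load_checkpoint`) are hypothetical. It is also an assumption that intermediate checkpoints are stored as branches or tags rather than as plain commits.

```python
# Hypothetical helpers, not part of the KORMo repositories: a sketch of how the
# checkpoints shown in each model's `Revisions` tab could be reached from code.
# Assumes `huggingface_hub` and `transformers` are installed; both are imported
# inside the functions, so defining the helpers needs no network access.

HUB_URL = "https://huggingface.co"


def revision_url(repo_id: str, revision: str = "main") -> str:
    """Return the Hub page listing the files of one revision of a repo."""
    return f"{HUB_URL}/{repo_id}/tree/{revision}"


def list_revisions(repo_id: str) -> dict[str, str]:
    """Map every branch and tag of a Hub repo to the commit hash it points to.

    Assumption: intermediate checkpoints are published as branches or tags.
    If they exist only as plain commits, `huggingface_hub.list_repo_commits`
    would be the call to use instead.
    """
    from huggingface_hub import list_repo_refs

    refs = list_repo_refs(repo_id)
    return {ref.name: ref.target_commit for ref in [*refs.branches, *refs.tags]}


def load_checkpoint(
    repo_id: str = "KORMo-Team/KORMo-10B-base",
    revision: str = "main",
    tokenizer_id: str = "KORMo-Team/KORMo-tokenizer",
    trust_remote_code: bool = False,
):
    """Load the tokenizer and a causal LM pinned to `revision`.

    `revision` accepts a branch name, tag, or commit hash. Whether the
    architecture needs `trust_remote_code=True` is not confirmed here;
    check the model card before enabling it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer_id, trust_remote_code=trust_remote_code
    )
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, revision=revision, trust_remote_code=trust_remote_code
    )
    return tokenizer, model
```

To pin one checkpoint:

1. Call `list_revisions("KORMo-Team/KORMo-10B-base")`.
2. Choose an entry and pass its commit hash as `revision` to `load_checkpoint`.

Pinning a hash rather than a branch name keeps a run reproducible even if the branch is later updated.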