Update README.md
README.md
CHANGED
|
@@ -1,10 +1,58 @@
| 1 |
---
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
| 1 |
+
<p align="center">
|
| 2 |
+
<img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" width="100" />
|
| 3 |
+
</p>
|
| 4 |
+
|
| 5 |
+
<h1 align="center">🇰🇷 KORMo Research</h1>
|
| 6 |
+
<p align="center">
|
| 7 |
+
An open-source hub for high-quality Korean data and language model research
|
| 8 |
+
This is the home for <b>KORMo models</b> and <b>high-quality Korean pre-training datasets</b>.
|
| 9 |
+
</p>
|
| 10 |
+
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
## Released Models
|
| 14 |
+
|
| 15 |
+
- 🧹 **Tokenizer**: Korean-specific EPK tokenizer
|
| 16 |
+
→ Optimized Korean representation and improved downstream performance
|
| 17 |
+
|
| 18 |
+
- **PT Model (Pretraining)**: <b>KORMo-10B</b> pretrained model, trained on 40B+ tokens of data
|
| 19 |
+
→ Old-both deduplication + quality filtering applied
|
| 20 |
+
|
| 21 |
+
- **Mid-train Model**: intermediate-step training checkpoints released
|
| 22 |
+
→ Useful for analyzing learning curves and performance
|
| 23 |
+
|
| 24 |
+
- **SFT Model**: model fine-tuned on instruction datasets
|
| 25 |
+
→ High-performance instruction-following model
|
| 26 |
+
|
| 27 |
+
> 💡 **To see all checkpoints for a model**, check the `Revisions` tab at the top of each model page.
|
| 28 |
+
|
| 29 |
---
|
| 30 |
+
|
| 31 |
+
## Released Datasets
|
| 32 |
+
|
| 33 |
+
- 🧹 **KOR-Clean**: 40B+ token Korean corpus with Old-both deduplication and quality filtering
|
| 34 |
+
→ Removes defective, low-information, and toxic data to improve training quality
|
| 35 |
+
|
| 36 |
+
- 🧾 **Instruction Dataset**: instruction-based dataset for fine-tuning
|
| 37 |
+
→ Dialogue and question-answering data modeled on real-world tasks
|
| 38 |
+
|
| 39 |
+
- **Synthetic Dataset**: training resource based on large-scale generated data
|
| 40 |
+
→ Stable performance gains and greater diversity
|
| 41 |
+
|
| 42 |
---
|
| 43 |
|
| 44 |
+
## News
|
| 45 |
+
|
| 46 |
+
- **First public release of training code and data for a Korean LLM**
|
| 47 |
+
- <b>KORMo-10B</b> released
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## Links
|
| 52 |
+
<p align="center">
|
| 53 |
+
<a href="https://github.com/kormo-project"><img src="https://img.shields.io/badge/GitHub-black?logo=github&style=for-the-badge"></a>
|
| 54 |
+
<a href="https://huggingface.co/kormo-project"><img src="https://img.shields.io/badge/HuggingFace-orange?logo=huggingface&logoColor=white&style=for-the-badge"></a>
|
| 55 |
+
<a href="https://kormo.ai"><img src="https://img.shields.io/badge/Website-blue?logo=web&style=for-the-badge"></a>
|
| 56 |
+
</p>
|
| 57 |
+
|
| 58 |
+
---
|