Update README.md
README.md
CHANGED
|
@@ -26,6 +26,17 @@ model-index:
|
|
| 26 |
|
| 27 |
The core of the model is a Transformer encoder that processes the 8x8 board, considering piece types, locations (via positional embeddings), and whose turn it is (via a turn embedding). It outputs a **256-dimensional embedding vector** for a given position (represented by a FEN string).
|
| 28 |
|
| 29 |
## Intended Uses & Limitations
|
| 30 |
|
| 31 |
### Intended Use
|
| 26 |
|
| 27 |
The core of the model is a Transformer encoder that processes the 8x8 board, considering piece types, locations (via positional embeddings), and whose turn it is (via a turn embedding). It outputs a **256-dimensional embedding vector** for a given position (represented by a FEN string).
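As a minimal sketch of the input side, the snippet below expands a FEN string into 64 square tokens plus the side to move. The function name `fen_to_tokens` and the use of `"."` for empty squares are illustrative assumptions, not the model's actual tokenizer.

```python
def fen_to_tokens(fen):
    """Expand a FEN string into 64 square tokens plus the side to move.

    Hypothetical helper for illustration; the real preprocessing may differ.
    """
    board_field, turn = fen.split()[:2]
    squares = []
    for rank in board_field.split("/"):
        for ch in rank:
            if ch.isdigit():
                # A digit encodes a run of empty squares.
                squares.extend(["."] * int(ch))
            else:
                # A letter encodes a piece, e.g. "P" (white pawn) or "k" (black king).
                squares.append(ch)
    if len(squares) != 64:
        raise ValueError(f"expected 64 squares, got {len(squares)}")
    return squares, turn


START_FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
squares, turn = fen_to_tokens(START_FEN)
```

Each of the 64 tokens would then be paired with a positional embedding, and `turn` with a turn embedding, as described above.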
|
| 28 |
|
| 29 |
+
## Model Architecture and Training
|
| 30 |
+
|
| 31 |
+
The model uses a Transformer encoder architecture with 6 layers, each with 8 attention heads. It has approximately 4.5 million parameters, all of which are trainable.
|
| 32 |
+
|
| 33 |
+
To encourage the model to learn comprehensive representations of chess positions, we employ a multi-task learning strategy combining two self-supervised objectives, mirroring techniques used in large language model pre-training:
|
| 34 |
+
|
| 35 |
+
- Masked Piece Prediction (MPP): Analogous to BERT’s Masked Language Model task, a random subset of the pieces on the input board is masked (replaced with a mask token). The model’s objective is to predict the original identity of each masked piece from the surrounding context (the remaining pieces and whose turn it is). This task encourages the model to learn typical piece configurations, legal placements, and the relationships between pieces. For MPP, 10% of the pieces were masked.
|
| 36 |
+
- Moves Difference Prediction: This task involves presenting the model with two distinct board states (a start and an end position) from actual game sequences. The model must predict the number of moves (plies) separating these two positions. This objective encourages the model to learn about piece mobility, game dynamics, and the plausible evolution of a position over time.
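The masking step of MPP can be sketched as follows. This is an illustration only: the `[MASK]` token name, the rounding rule, and the choice to always mask at least one piece are assumptions; the source states only that 10% of the pieces were masked.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name


def mask_pieces(squares, mask_fraction=0.10, rng=None):
    """Mask a random subset of occupied squares.

    Returns the masked board and a dict {square_index: original_piece}
    that serves as the prediction target.
    """
    rng = rng or random.Random()
    piece_idx = [i for i, s in enumerate(squares) if s != "."]
    if not piece_idx:
        return list(squares), {}
    # Assumption: round to the nearest count, but mask at least one piece.
    n_mask = max(1, round(mask_fraction * len(piece_idx)))
    masked = list(squares)
    targets = {}
    for i in rng.sample(piece_idx, n_mask):
        targets[i] = masked[i]
        masked[i] = MASK
    return masked, targets


# Starting position as 64 tokens, with "." for empty squares.
squares = list("rnbqkbnr" + "p" * 8 + "." * 32 + "P" * 8 + "RNBQKBNR")
masked, targets = mask_pieces(squares, rng=random.Random(0))
```

Only occupied squares are candidates for masking here, which matches the description of masking pieces rather than arbitrary squares.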
|
| 37 |
+
|
| 38 |
+
Training used two distinct datasets, pre-processed into structured formats to facilitate the self-supervised tasks. These datasets are derived from a large corpus of chess games and positions drawn from the Lichess database (https://database.lichess.org/) and the CCRL (Computer Chess Rating Lists) database (https://www.computerchess.org.uk/ccrl/).
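One plausible way to build training examples for Moves Difference Prediction is to sample two positions from the same game and record the ply distance between them. The sketch below assumes a game is available as a list of positions, one per ply; the function name, the `max_gap` parameter, and the uniform sampling scheme are hypothetical and not taken from the actual pipeline.

```python
import random


def sample_ply_pair(positions, rng=None, max_gap=None):
    """Pick a (start, end, ply_distance) triple from one game.

    positions: list of position strings (e.g. FENs), one per ply, in game order.
    max_gap:   optional cap on the ply distance (an assumed knob).
    """
    if len(positions) < 2:
        raise ValueError("need at least two positions")
    rng = rng or random.Random()
    start = rng.randrange(len(positions) - 1)
    last = len(positions) - 1
    if max_gap is not None:
        last = min(last, start + max_gap)
    end = rng.randint(start + 1, last)  # inclusive on both ends
    return positions[start], positions[end], end - start


# Placeholder strings stand in for real FENs from a parsed game.
game = [f"position_after_ply_{i}" for i in range(10)]
start_pos, end_pos, plies = sample_ply_pair(game, rng=random.Random(1), max_gap=3)
```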
|
| 39 |
+
|
| 40 |
## Intended Uses & Limitations
|
| 41 |
|
| 42 |
### Intended Use
|