About the synthetic data
#2 opened by Spico
Hi there, thanks for open-sourcing such great code embedding models. According to the technical report, these models are trained on synthetic data generated by GPT-4o. Do you have any insights from data ablations? How much does the synthetic data improve the metric scores?
Thanks a lot~