About the synthetic data

#2
by Spico - opened

Hi there, thanks for open-sourcing such great code embedding models. From the technical report, I see that these models were trained on synthetic data generated by GPT-4o. Do you have any insights from data ablations? How much does the synthetic data improve the metric scores?

Thanks a lot~
