The ultimate guide to RL environments: building and scaling them in the LLM era
Building and scaling RL environments for LLM training
What annotation tool did you use? If it's open source, could you please link it? I'm hoping to fine-tune for use cases that have no data, so I would first need to generate that data and then add or fix some of the reasoning steps.