Your dataset is huge, so I'd recommend scaling down at first. Try working with a sample (about 1 GB or less) to get the hang of the process. That way you won't burn through GPU time on things that might not work right away. Once you've got that working, you can always scale back up.
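If your PDFs sit in a folder, one way to carve out a roughly 1 GB sample is to shuffle the file list and keep adding files until a size budget is hit. A minimal sketch, assuming the data is a directory tree of `.pdf` files; `sample_files` and its parameters are hypothetical names for illustration:

```python
import os
import random


def sample_files(root, budget_bytes=1_000_000_000, ext=".pdf", seed=0):
    """Randomly pick files under `root` until their total size reaches `budget_bytes`.

    Hypothetical helper: assumes the dataset is a directory tree of files ending in `ext`.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(ext):
                paths.append(os.path.join(dirpath, name))

    paths.sort()  # deterministic order before shuffling, so the seed fully controls the sample
    random.Random(seed).shuffle(paths)

    picked, total = [], 0
    for path in paths:
        size = os.path.getsize(path)
        if total + size > budget_bytes:
            continue  # skip files that would push the sample over the budget
        picked.append(path)
        total += size
    return picked, total
```

For example, `picked, total = sample_files("data/pdfs")` (the path is a placeholder) returns the chosen paths and their combined size in bytes. The fixed `seed` keeps the sample reproducible, which helps when you compare runs.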
I'm going to assume you have some knowledge of working with PDF data: parsing -> cleaning -> structuring.
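For the cleaning and structuring steps, here's a rough sketch that assumes you've already extracted raw text per page (e.g. with a PDF library such as pypdf). The function names, the 200-word chunk size and the 50-word overlap are illustrative assumptions, not recommendations:

```python
import re


def clean_text(raw):
    """Light cleanup of text extracted from one PDF page."""
    # Re-join words hyphenated across line breaks: "imple-\nment" -> "implement"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain only a page number
    lines = [ln for ln in text.splitlines() if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


def chunk_words(text, size=200, overlap=50):
    """Split text into overlapping windows of `size` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def to_records(doc_id, pages):
    """Turn a list of raw page strings into structured chunk records."""
    records = []
    for page_no, raw in enumerate(pages, start=1):
        for n, chunk in enumerate(chunk_words(clean_text(raw))):
            records.append({"doc_id": doc_id, "page": page_no, "chunk": n, "text": chunk})
    return records
```

The hyphen rule is a blunt heuristic: it also glues genuine compounds that happen to be split at a line end (`well-\nknown` becomes `wellknown`), so inspect a few cleaned pages before running it over everything.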
Next, decide whether you want to fine-tune a model or build a RAG (retrieval-augmented generation) system. There are some pretty decent beginner-friendly articles on Medium that should give you an idea of the trade-offs.
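If it helps to see what the "retrieval" part of RAG boils down to, here's a toy sketch: score your text chunks against the question, keep the best few, and paste them into the prompt. Real RAG setups typically use an embedding model and a vector store rather than the plain bag-of-words cosine scoring assumed here, and all names below are hypothetical:

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = bag_of_words(question)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_prompt(question, chunks, k=2):
    """Stuff the retrieved chunks into a prompt for whatever model you use."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```

With fine-tuning you'd instead bake the documents' content into the model's weights; with RAG the model stays as-is and only the prompt changes per question.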
Look into GPU-as-a-service (GPUaaS) providers like RunPod and Vast.ai, which (iirc) offer hourly rates.
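Once you've timed a run on your sample, you can ballpark what the full dataset will cost on an hourly-rate provider. This sketch assumes runtime scales roughly linearly with data size, which won't always hold, and the rate you pass in is whatever the provider currently charges (the function name is hypothetical):

```python
def estimate_rental_cost(rate_per_gpu_hour, hours_on_sample, sample_gb, full_gb, num_gpus=1, runs=1):
    """Extrapolate GPU rental cost from a timed run on a sample.

    Assumes runtime grows linearly with dataset size; ignores storage and egress fees.
    """
    hours_full = hours_on_sample * (full_gb / sample_gb)
    return {
        "sample_run_usd": rate_per_gpu_hour * num_gpus * hours_on_sample * runs,
        "full_run_usd": rate_per_gpu_hour * num_gpus * hours_full * runs,
        "full_run_hours": hours_full,
    }
```

For example, `estimate_rental_cost(0.50, hours_on_sample=2, sample_gb=1, full_gb=50)` gives 100 hours and $50 for one full run; the $0.50/hour figure is just a placeholder, not a quoted price.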
And last but not least: take it easy and enjoy what you're doing.