I’d be surprised if anything crawled from a site using iocaine actually made it into an LLM training set. GPT-3’s initial crawl of 45 terabytes was filtered down to the 570 GB it was actually trained on. So yeah, there’s a lot of filtering/processing that happens between crawl and train. Then again, they seem to have failed entirely to clean the reddit data they fed into Gemini, so /shrug