Months ago I was brainstorming something almost identical to this concept: use the reverse proxy to serve pre-generated AI slop to AI crawler user agents while serving the real content to everyone else. Looks like someone did exactly that, and now I can just deploy it. Fantastic.
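For anyone curious what that routing looks like, here's a minimal sketch in plain Python rather than an actual reverse proxy config. The crawler user-agent list and both payloads are placeholder assumptions, not a vetted bot list or a production setup:

```python
# Sketch: serve pre-generated slop to AI crawler UAs, real content to everyone else.
# The UA substrings and page bodies below are illustrative placeholders.
from http.server import BaseHTTPRequestHandler, HTTPServer

AI_CRAWLER_UAS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended")  # assumed list

REAL_PAGE = b"<html><body>Real content for humans.</body></html>"
SLOP_PAGE = b"<html><body>Pre-generated AI slop goes here.</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        # Route on user agent: any matching crawler substring gets the slop.
        body = SLOP_PAGE if any(bot in ua for bot in AI_CRAWLER_UAS) else REAL_PAGE
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

In practice you'd do the same match in your existing reverse proxy's config instead of running a separate server, but the logic is identical.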
If you use natural text to train model A, and then use model A's output to train model B, then model B's output will be worse than model A's. The quality degrades with each generation, but it happens across generations of models. So random data is worse than AI slop, because random data is already of the lowest possible quality for AI training.
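You can see the generational degradation in a toy simulation (in the spirit of the model-collapse literature, e.g. Shumailov et al.): fit a Gaussian to samples, sample from the fit, refit, repeat. Finite-sample estimation error compounds and the distribution narrows. This is a sketch of the effect, not real training dynamics; the sample size and generation count are chosen to exaggerate it:

```python
# Toy model collapse: each "model" is a Gaussian fit to the previous model's output.
import random, statistics

random.seed(42)
N = 20  # small samples per generation exaggerate the compounding error
data = [random.gauss(0.0, 1.0) for _ in range(N)]  # "natural" data

for gen in range(201):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if gen % 25 == 0:
        print(f"gen {gen:3d}: mean={mu:+.3f}, std={sigma:.3f}")
    # Train the next model only on the previous model's output.
    data = [random.gauss(mu, sigma) for _ in range(N)]
```

Run it and the standard deviation drifts toward zero: each generation loses a bit of the tails it was never shown.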
Oh hell yeah.
AI slop is actually better than random data because it gets into a feedback loop, which is more destructive.
Yes, but random data might be easier to detect in the first place, and could then be filtered.
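For a sense of how cheap that filtering can be, here's a hedged sketch using character-level entropy, one plausible first-pass filter; the 5.0 bits/char threshold is an illustrative assumption, not anyone's actual pipeline:

```python
# Uniformly random text has near-maximal character entropy (~6.6 bits/char over
# string.printable); English prose sits around 4-4.5, so a simple threshold
# catches noise while passing natural (or AI-written) text.
import math, random, string
from collections import Counter

def char_entropy(text: str) -> float:
    counts = Counter(text)
    total = len(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

random.seed(1)
noise = "".join(random.choice(string.printable) for _ in range(2000))
prose = ("If you use natural text to train model A, and then use model "
         "A's output to train model B, quality degrades. " * 20)

for name, text in [("random", noise), ("prose", prose)]:
    h = char_entropy(text)
    print(f"{name}: entropy={h:.2f} bits/char, filtered={h > 5.0}")
```

AI slop passes a filter like this by construction, which is exactly why it's the more effective poison.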