Looks like the NYT has updated its robots.txt file to disallow OpenAI’s bot from scraping its data. I’m pretty interested to see whether OpenAI just changes its bot’s user agent string or actually respects it.
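
For context on how this kind of block works: robots.txt rules are grouped by `User-agent`. A crawler that honors the file looks up the group matching its own name, and OpenAI documents its crawler’s name as `GPTBot`. The sketch below uses Python’s standard-library `urllib.robotparser`. The robots.txt text is an illustrative assumption, not a copy of the NYT’s actual file, and `is_allowed` and `SomeOtherBot` are hypothetical names. It also shows why the question above matters: the file is purely advisory, so a crawler reporting a different name falls through to the catch-all `*` group.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, not the NYT's actual file):
# one group that blocks OpenAI's GPTBot from the whole site, and a
# catch-all group that lets every other crawler in.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler identifying as `user_agent` may fetch `url`.

    The stdlib parser compares group names case-insensitively against the
    token before the first "/" in `user_agent`. Nothing here is enforced
    by the server: a crawler only obeys robots.txt if it chooses to.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.nytimes.com/"
    print(is_allowed("GPTBot", url))            # False: matches the GPTBot group
    print(is_allowed("SomeOtherBot/1.0", url))  # True: falls through to "*"
```

Under these assumptions, the only thing separating the two results is the name the crawler reports about itself. That is exactly the loophole being asked about.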

  • AutoTL;DR@lemmings.world (bot) · 10 months ago

    This is the best summary I could come up with:


    Based on the Internet Archive’s Wayback Machine, it appears the NYT blocked OpenAI’s GPTBot crawler as early as August 17th.

    The change comes after the NYT updated its terms of service at the beginning of this month to prohibit the use of its content to train AI models.

    OpenAI didn’t immediately reply to a request for comment.

    The NYT is also considering legal action against OpenAI for intellectual property rights violations, NPR reported last week.

    If it did sue, the Times would join others who have already gone after the company. Sarah Silverman and two other authors sued OpenAI in July over its use of Books3, a dataset used to train ChatGPT that may contain thousands of copyrighted works. Matthew Butterick, a programmer and lawyer, alleges that the company’s data scraping practices amount to software piracy.

    Update August 21st, 7:55PM ET: The New York Times declined to comment.


    The original article contains 202 words; the summary contains 146 words. Saved 28%. I’m a bot and I’m open source!