• SootyChimney [any]@hexbear.net
    English
    arrow-up
    29
    arrow-down
    1
    ·
    11 months ago

    They used the “Torrance Tests of Creative Thinking”, a pseudo-scientific test that measures nothing of any objective value.

    • gravitas_deficiency@sh.itjust.works
      English
      arrow-up
      11
      ·
      11 months ago

      Hah, yeah, that was my kneejerk reaction too: I read that as “the metric we use to determine creativity was found to be wildly inaccurate, with ML regularly placing in the 99th percentile”.

  • Veritas@lemmy.mlOP
    arrow-up
    29
    arrow-down
    1
    ·
    edit-2
    11 months ago

    Embarrassing, considering how uncreative and unoriginal GPT-4 is. It’s an actual struggle to get ChatGPT to think outside the box. Claude 2, on the other hand, is much better at it.

    But this goes to show how unimaginative the general population is if this truly is the case.

    • SpikesOtherDog@ani.social
      arrow-up
      10
      ·
      11 months ago

      I have been playing with ChatGPT for tabletop character creation. It’s not bad at coming up with new ideas. It is terrible at sticking to the rules of the game.

      • Veritas@lemmy.mlOP
        arrow-up
        5
        arrow-down
        2
        ·
        11 months ago

        The context window is still too short for any story. They just forget about old messages and only remember the newest context.
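
A rough sketch of what that forgetting looks like, assuming the chat frontend simply drops the oldest messages once a token budget is exceeded. The names `count_tokens` and `fit_to_window` are made up for illustration, and counting whitespace-separated words is only a crude stand-in for a real tokenizer; actual products may truncate or summarize differently.

```python
def count_tokens(text: str) -> int:
    # Assumption: approximate tokens by whitespace-separated words.
    # Real tokenizers split text differently.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages that fit in the budget; drop older ones."""
    kept = []
    used = 0
    # Walk from newest to oldest so recent context wins.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break  # everything older than this point is "forgotten"
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "Rule: wizards cannot wear heavy armor.",
    "My character is a wizard named Elra.",
    "She finds a suit of plate armor in the dungeon.",
    "What does she do next?",
]

window = fit_to_window(history, max_tokens=20)
for line in window:
    print(line)
```

With a small budget, the rule stated in the first message falls out of the window, so the model never sees it when answering the last one.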

        • SpikesOtherDog@ani.social
          arrow-up
          3
          ·
          11 months ago

          That makes sense. The further back a piece of information was, the harder it was for it to recall. Its answer wasn’t to think harder, but to fill in the gaps.

  • BabaIsPissed [he/him]@hexbear.net
    English
    arrow-up
    6
    ·
    11 months ago

    evaluating LLM

    ask the researcher if they are testing form or meaning

    they don’t understand

    pull out illustrated diagram explaining what is form and what is meaning

    they laugh and say “the model is demonstrating creativity sir”

    looks at the test

    it’s form