• Mister Bean@lemmy.dbzer0.com · 4 points · 5 hours ago

    Couldn’t you just treat the socketed RAM like another layer of memory, so that L1–L3 are on the CPU, “L4” would be the soldered RAM, and “L5” would be the extra socketed RAM? Alternatively, couldn’t you just treat it like really fast swap?

    • enumerator4829@sh.itjust.works · 3 points · 4 hours ago

      Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.

      Caches won’t help with bandwidth-bound compute (read: “AI”) if the streamed dataset is significantly larger than the cache. A cache only speeds up repeated access to a limited set of data.
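      To put rough numbers on that (the figures below are illustrative assumptions, not from the thread): when a working set far larger than the last-level cache is streamed through once per pass, nearly every access misses, so extra capacity only helps as memory, not as cache.

```python
# Back-of-envelope cache hit estimate for a streaming workload.
# Sizes are illustrative assumptions, not measurements.
cache_bytes = 32 * 2**20        # 32 MiB last-level cache
working_set = 100 * 2**30       # 100 GiB of weights streamed per pass
hit_fraction = cache_bytes / working_set
print(f"best-case hit fraction: {hit_fraction:.6%}")  # well under 0.1%
```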

    • Balder@lemmy.world · 1 point · edited · 3 hours ago

      Could it work?

      Yes, but it would require:

      • A redesigned memory controller capable of tiering RAM (which would be more complex).
      • OS-level support for dynamically assigning memory usage based on speed (operating systems and applications currently assume all RAM operates at the same speed).
      • Applications/libraries optimized to take advantage of this tiering.

      Right now, the easiest solution for fast, high-bandwidth RAM is just to solder all of it.
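      As a toy sketch of that third requirement (tier-aware applications): placement logic in software could look something like the following. The class, names, and policy here are hypothetical, purely to illustrate the idea of a small fast pool and a large slow pool.

```python
# Hypothetical tier-aware placement: hot buffers go to a small fast
# pool (think soldered RAM) while it has room; everything else spills
# to the slow pool (socketed RAM). All names/policy invented for
# illustration.
class TieredPool:
    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast_used = 0
        self.placement = {}

    def allocate(self, name, size, hot):
        # Use the fast tier only if the buffer is hot and still fits.
        if hot and self.fast_used + size <= self.fast_capacity:
            self.fast_used += size
            self.placement[name] = "fast"
        else:
            self.placement[name] = "slow"
        return self.placement[name]

pool = TieredPool(fast_capacity=100)
print(pool.allocate("weights", 80, hot=True))    # fast
print(pool.allocate("kv_cache", 50, hot=True))   # slow: fast tier is full
print(pool.allocate("scratch", 10, hot=False))   # slow: not hot
```

      Real operating systems already do something in this spirit for NUMA systems, which is why the tiering problem is considered tractable.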

    • barsoap@lemm.ee · 2 points · 4 hours ago

      Using it as cache would reduce total capacity, since a cache implies coherence: every cached line shadows a copy that also exists in the backing tier. And treating it as ordinary swap would mean copying pages into main memory before you access them, which is silly when you can access the memory directly. That is, you’d want to write a couple of lines of kernel code to use it effectively, but it’s nowhere close to rocket science; nowhere near as complicated as making proper use of NUMA architectures.
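      The “access it directly instead of copying” point is the same idea as memory mapping: pages are addressed in place rather than staged through an extra copy, the way swap stages pages through DRAM. A file-backed Python sketch of that distinction (purely illustrative; not the kernel change being described):

```python
import mmap
import os
import tempfile

# mmap exposes backing storage directly in the address space; a read
# touches it in place, with no explicit read()-style copy into a
# separate buffer first.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"A" * 4096)
    with mmap.mmap(fd, 4096) as m:
        first = m[:1]   # accessed in place
finally:
    os.close(fd)
    os.remove(path)
print(first)  # b'A'
```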