Cryptography nerd

  • 1 Post
  • 1.7K Comments
Joined 11 months ago
Cake day: August 16th, 2023

  • Yes, but should big companies with business models designed to be exploitative be allowed to act hypocritically?

    My problem isn’t with ML as such, or with learning over such large sets of works, but these companies are designing their services specifically to push the people whose works they rely on out of work.

    The irony of overfitting is that having numerous copies of common works is a problem AND removing the duplicates would be a problem. The models need an understanding of what’s representative of language, but the training algorithms can’t learn that on their own, it’s not feasible to have humans teach it, and the training algorithms can’t effectively detect duplicates and “tune down” their influence to stop replicating them exactly. Trying to do that latter thing algorithmically will ALSO break things, because it would break the model’s understanding of stuff like standard legalese and boilerplate language.

    The current generation of generative ML doesn’t do what it says on the box, AND the companies running them deserve to get screwed over.

    And yes, I understand the risk of screwing up fair use, which is why my suggestion is not to hinder learning, but to require the companies to track the copyright status of samples and inform end users of the licensing status when the system detects that a sample is substantially replicated in the output. This will not hurt anybody training on public domain or fairly licensed works, nor anybody who tracks authorship when crawling for samples, and it will also not hurt anybody who has designed their ML system to be sufficiently transformative that it never replicates copyrighted samples. It just hurts exploitative companies.
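A minimal sketch of what that replication check could look like: compare model output against tracked samples with word n-gram overlap and surface the licensing metadata when the overlap crosses a threshold. The corpus, the threshold, and the metadata format are all illustrative assumptions, not a description of any real system.

```python
# Hypothetical sketch: flag output that substantially replicates a tracked
# training sample, using word n-gram overlap. Threshold and metadata are
# illustrative assumptions only.

def ngrams(text, n=8):
    """Set of word n-grams in a text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def replication_score(output, sample, n=8):
    """Fraction of the sample's n-grams that reappear in the output."""
    sample_grams = ngrams(sample, n)
    if not sample_grams:
        return 0.0
    return len(sample_grams & ngrams(output, n)) / len(sample_grams)

def check_output(output, tracked_samples, threshold=0.5):
    """Return licensing metadata for every tracked sample the output
    substantially replicates, so the user can be informed."""
    return [
        meta
        for text, meta in tracked_samples
        if replication_score(output, text) >= threshold
    ]

# Hypothetical tracked corpus entry with its licensing status.
sample = ("the quick brown fox jumps over the lazy dog while the "
          "cat watches from the fence")
tracked = [(sample, {"license": "CC-BY-4.0", "author": "example"})]

# An output that copies the sample wholesale triggers the notice;
# unrelated output does not.
hits = check_output("Model said: " + sample, tracked)
```

A production version would need fuzzier matching (paraphrase detection, embeddings), but even exact n-gram overlap catches the verbatim replication case described above without touching training itself.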


  • My Xperia 1 III used to be quite disappointing at times (it was too focused on RAW output for editing, even for stacked HDR shots), but the 1 V is legitimately good, and I can tell the new stacked sensor improved light capture (less noise in low light). Auto mode is much better too; while I still see limitations in both auto and manual, it’s not so bad. The most annoying parts have to do with focus and color balance when zooming in certain light conditions, and with contrast in complex scenes in auto mode.

  • Wine/Proton on Linux occasionally beats Windows on the same hardware in gaming, because there are inefficiencies in the original environment that don’t get replicated unnecessarily.

    It’s not quite the same with CPU instruction translation, but the main efficiency gain of ARM comes from being designed to idle everything it can, while this hasn’t been a design goal of x86 for ages. A substantial factor in efficiency is figuring out what you don’t have to do, and ARM is better suited for that.


  • The problem here is that we don’t have real AI.

    We have fancier generative machine learning, and despite the claims it does not in fact generalize that well from most inputs; a lot of recurring samples end up actually embedded in the model and can thus be replicated (there are papers on this, such as on sample recovery attacks).

    The models heavily embed genre tropes and replicate existing biases and patterns much too strongly to truly claim nothing is being copied; the copying is more of a remix situation than accidental recreation.

    Elements of the originals are there, and many features can often be attributed to the original authors (especially since the models often learn to mimic the style of individual authors, which means they embed information about features of copyrighted works by individual authors and how to replicate them).

    While it’s not a 1:1 replication in most instances, it frequently gets close enough that a human doing it would be sued.

    This photographer lost in court for recreating the features of another work too closely:

    https://www.copyrightuser.org/educate/episode-1-case-file-1/