The OP is a popular shitposting account, doing what we in the biz refer to as “a bit.”
POV: you’re standing at the bottom of a slide at a playground in Boston
So the standard approach to this is so-called “perceptual hashing.” Effectively, using cryptographic hashes (sha256, etc.) doesn’t really work well in this case. Given a piece of illegal content, that content is likely to still be just as illegal with a single pixel changed – however, it’ll have a completely different cryptographic hash. So instead, you use a hash function that measures how “similar-looking” two images are, ignoring things like dimensions, color palette, JPEG compression artifacts, etc. This is obviously way fuzzier, and is prone to both false positives and false negatives.
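To make the difference concrete, here's a minimal sketch of a toy "average hash" next to sha256. Big caveats: this is not PhotoDNA or any real scanner's algorithm, the function names are made up for illustration, and it assumes the image has already been shrunk to a tiny grayscale grid (real perceptual hashes do that resizing step themselves).

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set if the pixel is above the mean.

    `pixels` is a 2D list of grayscale values (0-255), assumed to be an
    already-downscaled image. Hypothetical helper, not a real library API.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; small distance means similar-looking."""
    return bin(a ^ b).count("1")

def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes, for comparison."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

# A 4x4 checkerboard, and a copy with a single pixel nudged by one level.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
tweaked = [row[:] for row in original]
tweaked[0][0] = 11

same_crypto = sha256_hex(original) == sha256_hex(tweaked)
distance = hamming(average_hash(original), average_hash(tweaked))
print("sha256 matches:", same_crypto)
print("perceptual distance:", distance)
```

The one-pixel tweak changes the sha256 digest completely but leaves the toy perceptual hash untouched. In practice you'd match on "distance below some threshold" rather than exact equality, which is exactly where the false positives and negatives come from.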
Because all this is inherently kinda fuzzy, the exact database of hashes is usually “secret sauce” if you will. If it were public, it would be super easy to circumvent. As an example, given an illegal image:
As a result, even “public” databases are distributed with NDAs etc. This obviously does not jibe well with an open source, federated network like Mastodon, and I have my doubts as to how willing the relevant agencies would be to give their databases to every rando with $5 to spin up a Pleroma instance on a VPS. A public DB might help in some cases, but unfortunately more illegal content is produced every day, and so it would be extremely hard to keep up with the bad actors.
In my opinion the biggest issue the author points out is that cached materials are sometimes retained even after moderator action. Which honestly just sounds like a straight up bug more than anything. Though if I were running an instance, the feds showing up at my door with a warrant because I’ve been accidentally distributing CSAM would be my nightmare scenario. And of course jurisdiction plays a part, too: an American user on a Canadian server might see drawn depictions of sexualized minors, think “weird but not illegal,” and now the Canadian admin has content that’s illegal in Canada on their Canadian server and has no idea.
IMO the best solution to this is something similar to what Renaud Chaput (Mastodon’s resident infra boffin) described in his recent blog post. Effectively, give admins a way to hand this off to pluggable third-party services. Admins that are worried about this sort of thing can then have some degree of safety via e.g. PhotoDNA, whereas others can take on additional risk and preserve additional privacy.
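Roughly what "pluggable" could look like, as a sketch only. Every name here (`MediaScanner`, `NoopScanner`, `BlocklistScanner`, `accept_upload`) is hypothetical and not anything Mastodon actually ships, and the exact-digest blocklist is just a stand-in for a call out to a real third-party service.

```python
import hashlib
from typing import Protocol

class MediaScanner(Protocol):
    """Anything that can say whether an upload should be blocked."""
    def is_flagged(self, media: bytes) -> bool: ...

class NoopScanner:
    """For admins who opt out: nothing leaves the server, nothing is flagged."""
    def is_flagged(self, media: bytes) -> bool:
        return False

class BlocklistScanner:
    """Stand-in for an external scanning service.

    A real plugin would send the media (or a perceptual hash of it) to the
    third party; here we only check sha256 digests against a local set.
    """
    def __init__(self, blocked_digests: set[str]) -> None:
        self.blocked_digests = blocked_digests

    def is_flagged(self, media: bytes) -> bool:
        return hashlib.sha256(media).hexdigest() in self.blocked_digests

def accept_upload(media: bytes, scanner: MediaScanner) -> bool:
    """Return True if the upload may be stored and federated."""
    return not scanner.is_flagged(media)

# Each admin picks a scanner in config; the server code doesn't care which.
strict = BlocklistScanner({hashlib.sha256(b"known-bad-bytes").hexdigest()})
print(accept_upload(b"known-bad-bytes", strict))         # rejected by the strict setup
print(accept_upload(b"known-bad-bytes", NoopScanner()))  # accepted when opted out
```

The point of the interface is that the risk/privacy tradeoff becomes a per-instance config choice instead of something baked into the server.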
All that said: yeah the headline makes it sound like .social is some 8chan-esque hellhole, whereas in reality my feed is 99% German programmers sharing milquetoast political takes.
Oooh as a communist… where to even start. Most of this is US/anglo centric…
I truly do have optimism that we can build a better world. Every once in a while, it shines through the cracks: kids partying in the street while cops look on powerless, a little old lady cheering from the window while marchers chant “fuck 12,” even a single trans person finding a community that accepts them wholeheartedly.
But damn do you internet mfs make it hard sometimes.
My fave Lin-Manuel Miranda song is the one where he advocates on behalf of the US government for the privatization of the Puerto Rican electrical grid claiming it will improve reliability but then it actually becomes even less reliable. It’s really catchy 😊
Yup was just typing a comment to basically this effect. Federation adds a ton of overhead – you can still do things fairly efficiently, but every interaction having to fan out to (and fan in from!) many servers instead of like a single RDBMS is gonna cost you.
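A back-of-the-envelope sketch of the fan-out part, assuming the best case where you only need one delivery per remote server rather than one per follower. The `fan_out` helper and the handles are made up for illustration; this isn't how any particular server implements delivery.

```python
from collections import defaultdict

def fan_out(followers):
    """Group 'user@server' handles by server.

    Each key in the result is one remote server that needs its own
    network delivery when the author posts. Hypothetical helper.
    """
    by_server = defaultdict(list)
    for handle in followers:
        user, server = handle.split("@", 1)
        by_server[server].append(user)
    return dict(by_server)

followers = [
    "alice@one.example",
    "bob@one.example",
    "carol@two.example",
    "dave@three.example",
]

deliveries = fan_out(followers)
# Centralized: one write to one database.
# Federated: one outbound request per distinct server, each of which can be
# slow, down, or need retries.
print(len(deliveries), "remote deliveries for", len(followers), "followers")
```

And that's just one post going out. Likes, boosts, and replies coming back in from all those servers are the fan-in half.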
In all likelihood the code is not as efficient as it could be, but usually you get time to work those out gradually. A giant influx of users quickly turns “TODO: fix in the next six months” into “Oh god the servers are melting fuck fuck.”
That said, assuming the devs can get over this hump, I suspect using a compiled language will pay off long-term. Sure things will still be primarily IO-bound, but making things less CPU-bound is usually a good thing.
For some illustrative examples: Mastodon is in Ruby and hits dumb scaling limitations far more often than other fedi microblogs. Pleroma/Akkoma are Elixir (and BEAM is super well optimized for fast message passing/scaling/IO), Calckey (primarily Typescript) is moving some code to Rust, GoToSocial (Golang) is able to run in a fraction of the resources of Mastodon. The admins of one of the bigger tech instances recently announced they’re basically giving up on administrating Mastodon and are instead going to write a new server from scratch in a compiled language because it’s easier for them than scaling a Rails monolith.
TL;DR everything is IO-bound til it’s not.
one time i was pretty hammered and tried to light a cheeky lil kitchen cig on the stove and burnt my eyebrows/hair a bit. my barber at the time was this kinda alt girl and when i explained why i had all those weird baby hairs she was like “ah yeah makes sense same shit happened to me, had to draw in my brows for months.” anyway this was like 2018 if anyone was wondering.