Maven Imported 1.12 Million Fediverse Posts

hedge@beehaw.org · 15 days ago

Freeman@lemmings.world · 15 days ago

They pulled DMs of two users of the same instance?! Quite concerning tbh

Skull giver@popplesburger.hilciferous.nl · edit-2 15 days ago

ActivityPub doesn’t do DMs per se. Many ActivityPub implementations will use AP messages that are not posted on any public list or timeline. Basically, a Tweet with visibility set to “only people mentioned in this thread”.

This design makes it quite easy for AP servers to misimplement DMs. When a server is asked for all messages of a particular user (to build their timeline), it is easy to forget to filter out the messages that were never published publicly.
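To make the filtering step concrete, here is a minimal sketch in Python. It assumes posts are plain dicts with ActivityStreams-style `to` and `cc` fields; `public_timeline` and the helper names are hypothetical and not taken from any real server. The only real constant is the ActivityStreams public collection URI. Compacted forms of that URI (such as `as:Public`) are not handled here.

```python
# The ActivityStreams "public" collection: posts addressed to it are world-readable.
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def _as_list(value):
    """Addressing fields may hold a single string or a list; normalise to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def is_public(post):
    """True if the post is addressed to the public collection via `to` or `cc`."""
    audience = _as_list(post.get("to")) + _as_list(post.get("cc"))
    return AS_PUBLIC in audience


def public_timeline(posts):
    """Hypothetical timeline query: keep only publicly addressed posts.

    Returning `posts` unfiltered here is exactly the kind of mistake
    that would expose "DMs" to anyone asking for a user's timeline.
    """
    return [p for p in posts if is_public(p)]
```

A "DM" in this model is simply a post whose `to` lists only specific accounts, so it drops out of `public_timeline` as long as the check is actually applied.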

ActivityPub DMs are, in my opinion, not a good feature. This has come up before in Mastodon, where DMs mentioning a third account will add that account to the thread and destination of all future messages (and possibly authorise it for accessing past messages); one mention will give them full access to your “direct” messages.
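As a rough illustration of the mention behaviour described above (not Mastodon's actual code; `reply_audience` and the account names are made up for the example), the audience of the next message in a thread could be modelled as everyone already in the thread plus anyone newly mentioned:

```python
def reply_audience(thread_audience, mentions):
    """Audience for the next message: existing participants plus new mentions.

    This is an assumed model of the behaviour described in the comment,
    not an implementation taken from any Fediverse server.
    """
    return sorted(set(thread_audience) | set(mentions))


# Two people are talking "privately"...
audience = ["@alice@a.example", "@bob@b.example"]
# ...then one message mentions a third account.
audience = reply_audience(audience, ["@carol@c.example"])
# From here on, every reply is addressed to carol as well.
```

Under this model a single mention is enough to widen the audience permanently, which is why such messages are better thought of as limited-visibility posts than as direct messages.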

I doubt this scraper did anything wrong here. I think it’s just a matter of a buggy server, or of users sending DMs that aren’t really DMs because of Fediverse software with GUI design flaws.

Edit: looks like it’s probably a Mastodon bug: https://hackers.town/@thegibson/112604700601089641

jherazob@beehaw.org · 15 days ago

I recall somebody’s working on actual E2EE Mastodon DMs, but I couldn’t give you details. I guess we’ll know it’s ready when people start using it.

Peter1986C@lemmings.world · edit-2 15 days ago

That would be Sup: https://github.com/theSupApp

By the same person who started Pixelfed.

jherazob@beehaw.org · 15 days ago

How the hell does he do so much? 😄

4am@lemm.ee · 15 days ago

Seems if the messages are sent in an inherently insecure fashion, all one would need to do is set up an instance that purposefully does not filter out all the things it’s supposed to be kind/competent enough to filter out, and boom it has everything.

Skull giver@popplesburger.hilciferous.nl · 15 days ago

Yes, just like on Twitter, Reddit, and most of the other platforms the Fediverse is trying to replace, server admins are free to read your messages. There’s no encryption. The Fediverse just adds more server admins to the mix.

I would not recommend using the DM function on most Fediverse platforms for things you’d like to keep private. While in most cases there are no privacy risks, there are also very few guardrails to ensure that.

You’re better off using a federated platform with encryption support like Matrix or XMPP. Neither of those are very safe if you don’t verify the other’s keys (although neither is any other chat service, even Signal) but both are much safer.

If it weren’t for the lack of shared credentials, I would’ve expected someone to add a minimal secure chat client to the Lemmy frontend already, especially on the servers that already host a Matrix server.

kevincox@lemmy.ml · 15 days ago

It’s not “inherently insecure”, at least not to that degree. (One could argue that the lack of E2EE is insecure.) If you stand up an unrelated instance, you shouldn’t be able to access private messages that don’t relate to an account on your instance. So only bugs in your instance, or in your conversation partner’s instance, will be able to leak those messages.
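A small sketch of that point, assuming recipients are identified by actor URLs and that delivery goes only to the servers hosting those actors (real servers look up each actor's inbox URL; `delivery_hosts` and `should_receive` are hypothetical names):

```python
from urllib.parse import urlparse


def delivery_hosts(recipients):
    """Set of hostnames that host at least one addressed recipient."""
    return {urlparse(actor_url).hostname for actor_url in recipients}


def should_receive(instance_host, recipients):
    """An instance only gets a copy if one of its accounts is addressed."""
    return instance_host in delivery_hosts(recipients)


dm_recipients = [
    "https://a.example/users/alice",
    "https://b.example/users/bob",
]
```

With these recipients, `a.example` and `b.example` receive the message and an unrelated `evil.example` never sees it, so a leak needs a bug (or a malicious admin) on one of the two participating servers.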

IllNess@infosec.pub · 15 days ago

If we hit these AI companies with targeted lawsuits, like how Scientology got its way with the IRS, maybe then they’ll listen and stop stealing our shit.

The MPAA and RIAA have created all these laws and used our own government against us. Maybe we can use these same laws and do the same.

sfera@beehaw.org · 15 days ago

I was confused for a minute, not understanding what (Apache) Maven has to do with social networks.

Pekka@feddit.nl · 14 days ago

Maybe we have some bias on this topic, but I had the same thought. Maven is such a well-known tool in IT that I’m surprised they just created a social network with the same name. Until they get a bit famous, this won’t be good for SEO.

darkphotonstudio@beehaw.org · 14 days ago

I wouldn’t have a problem with all this scraping, if these companies had to release their models trained on this data as open source.

esaru@beehaw.org · 14 days ago

That’s a great idea. Can we not apply a license to that social content that forces AI models trained on it to be open source?