I have deeply mixed feelings about #ActivityPub's adoption of JSON-LD, as someone who's spent way too long dealing with it while building #Fedify.
-
@evan @cwebber @kopper @hongminhee Couldn’t we agree to standardize on expanded json-ld? We would not need any json-ld processor, we would not need to fetch or cache any context. There would be no way to shadow properties.
@gugurumbe @hongminhee @evan @cwebber
from my brief tests, compacting with no context (which is basically expanded json-ld, with very minor differences) compresses better, but standardizing on expanded ld would still be better than the status quo. yes backwards compatibility would be broken, but pretty much any other solution to this problem beyond not solving it would end up breaking it anyway
i'm still unsure about certain aspects of json-ld such as everything having the capability for multiple values, but without any context defined it's at least explicit and implementations can take that into account where it's actually helpful (sec:publicKey comes to mind) and ignore it where it isn't
(edit: ignore the last part, i just re-checked and compact-with-no-context collapses arrays with single values, expanded would be clearer here)
RE: not-brain.d.on-t.work/notes/aihftmbjpxdyb9k7 -
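To make the compacted-vs-expanded distinction being discussed concrete, here is a hand-written illustration. The shapes are simplified stand-ins (a real processor such as pyld would derive the expanded form from the compacted document plus the AS2 context), but they show the key property: expanded form needs no context and always uses explicit lists, even for single values.

```python
# Illustrative only: the same AS2 Note in compacted vs expanded JSON-LD.
# Hand-written shapes, not the output of a real JSON-LD processor.

compacted = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Note",
    "content": "Hello",
}

# In expanded form every term becomes a full IRI and every value an
# explicit list, so no context document is needed to interpret it.
expanded = [{
    "@type": ["https://www.w3.org/ns/activitystreams#Note"],
    "https://www.w3.org/ns/activitystreams#content": [{"@value": "Hello"}],
}]

# The "multiple values" point from the thread: expanded form keeps the
# list wrapper even for a single value, where compaction collapses it.
assert isinstance(
    expanded[0]["https://www.w3.org/ns/activitystreams#content"], list)
assert isinstance(compacted["content"], str)
```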
@gugurumbe @cwebber @kopper @hongminhee AS2 requires compacted JSON-LD.
-
There is no data format we can choose to eliminate programmer errors in online protocols. That's a quixotic aim.
-
@cwebber @kopper @hongminhee It would be a huge backwards-incompatible change for almost zero benefit. People would still make mistakes in their ActivityPub implementations (sorry, Minhee, but that's life on an open network). We'd need to adopt another mechanism for defining extensions, and guess what? People are going to make mistakes with that, too.
@evan @cwebber @kopper @hongminhee maybe a compromise approach could be to specify a simpler “json-ld as it is used in practice”, similar to what HTML5 was, that remains backward compatible while simplifying the spec to the point that it is actually feasible to implement
-
@gugurumbe @kopper I don't think that's the model of ActivityPub. It's made to allow reading remote objects.
Most implementations pre-load or compile in the external contexts. I agree, it's a big performance hit to load context URLs at runtime.
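The pre-loading Evan describes could be sketched as a document loader that serves bundled contexts from memory and never hits the network per incoming activity. Everything here is hypothetical (the function name, the truncated inline context stub); it only illustrates the caching strategy, not any server's actual code.

```python
# Hypothetical sketch: serve pre-bundled JSON-LD contexts from memory
# instead of fetching context URLs at runtime.

PRELOADED_CONTEXTS = {
    "https://www.w3.org/ns/activitystreams": {
        # Truncated stand-in for the real AS2 context document.
        "@context": {"id": "@id", "type": "@type"},
    },
}

def load_context(url):
    """Return a bundled context document, or None for unknown URLs
    (the caller then decides whether to fetch, fail, or ignore)."""
    return PRELOADED_CONTEXTS.get(url)

assert load_context("https://www.w3.org/ns/activitystreams") is not None
assert load_context("https://example.social/custom-context") is None
```

The unknown-URL case is exactly the interop gap raised later in the thread: a pre-loading server has to choose a policy for contexts it has never seen.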
-
@evan @gugurumbe it's infeasible to preload all contexts, pretty much every pleroma instance hosts their own context on their own instance for example. then there is the obvious interop problems of how to handle contexts for new extensions your software is not aware of (though pretending like they're empty might work i guess?) -
I would be strongly opposed to any effort to remove JSON-LD from AS2. We use it for a lot of extensions. Every AP server uses the Security vocabulary for public keys.
@evan @kopper @hongminhee The problem is that signing json-ld is extremely hard, because effectively you have to turn to the RDF graph normalization algorithm, which has extremely expensive compute times. The lack of signatures means that when I boost peoples' posts, it takes down their instance, since effectively *every* distributed post on the network doesn't actually get accepted as-is; recipients dial back to check its contents.
Which, at that point, we might as well not distribute the contents at all when we post to inboxes! We could just publish with the object of the activity being the object's id uri
-
@cwebber @hongminhee @evan admittedly, codeberg.org/fediverse/fep/src/branch/main/fep/8b32/fep-8b32.md does kind of solve this specific problem. the json canonicalization used there is much lighter than rdf canonicalization (which iceshrimp had to implement in dotNetRdf specifically for its ld signature support, so tooling availability is not really an excuse in favor of json-ld either!) -
@kopper @hongminhee @evan Interesting... I guess it means you can't re-compact with a new outer context, but maybe that's fine
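The JSON canonicalization FEP-8b32 builds on (RFC 8785, JCS) is indeed far lighter than RDF canonicalization. For simple ASCII documents with integer values it can be roughly approximated with the standard library alone; real JCS additionally pins down number and Unicode string serialization, which this sketch skips.

```python
import json

def jcs_like(obj):
    """Rough approximation of RFC 8785 canonicalization: sorted keys,
    no insignificant whitespace. Not a full JCS implementation."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)

a = {"type": "Note", "content": "hi"}
b = {"content": "hi", "type": "Note"}
# Key order no longer matters, so the bytes being signed are stable:
assert jcs_like(a) == jcs_like(b) == '{"content":"hi","type":"Note"}'
```

Compare this with RDF dataset canonicalization, which requires a full graph algorithm; that asymmetry is the tooling point made above.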
-
@cwebber @kopper @hongminhee I talk about this in my book. Unless the receiving user is online at the time the server receives the Announce, it's ridiculous to fetch the content immediately. Receiving servers should pause a random number of minutes and then fetch the content. It avoids the thundering herd problem.
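The random-pause idea can be sketched in a few lines. The function name and the jitter window are made up for illustration; the point is only that spreading fetches over a random interval flattens the spike when a popular post is announced.

```python
import random

def fetch_delay_seconds(recipient_online, max_jitter_minutes=10):
    """If the recipient is online, fetch now; otherwise wait a random
    number of minutes so receiving servers don't all fetch at once
    (the thundering herd problem)."""
    if recipient_online:
        return 0.0
    return random.uniform(0, max_jitter_minutes * 60)

assert fetch_delay_seconds(True) == 0.0
delay = fetch_delay_seconds(False)
assert 0 <= delay <= 600
```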
-
@evan @cwebber @kopper @hongminhee I think that is a better algorithm than a brain dead exponential back off. Perhaps put the two together.
-
@kopper It does not; if a malicious context redefines the security properties then the JSON-LD processor will understand the data differently than the unaware processor.
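A toy illustration of the shadowing hazard: the same short term resolves to different full IRIs depending on which context a document declares. The context mappings below are hand-written stand-ins, not real context documents, but they show why a JSON-LD-aware processor and a plain-JSON consumer can disagree about the same bytes.

```python
# Toy illustration of property shadowing via a malicious context.
# These one-term mappings stand in for full context documents.

honest_context = {"publicKey": "https://w3id.org/security#publicKey"}
malicious_context = {"publicKey": "https://evil.example/ns#publicKey"}

def expand_term(term, context):
    """What a context-aware processor resolves a short term to."""
    return context.get(term, term)

# A context-aware verifier and a context-ignoring consumer now read
# different meanings from identical-looking JSON:
assert expand_term("publicKey", honest_context) != \
       expand_term("publicKey", malicious_context)
```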
-
@patmikemid I call it trust, then verify. Usually caching the data with a ttl of a short number of minutes is enough.
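"Trust, then verify" with a short TTL can be sketched as a tiny cache. This is a hypothetical illustration (class and parameter names are invented), not any server's actual implementation.

```python
import time

class TtlCache:
    """Minimal 'trust, then verify' cache: serve a fetched object for a
    few minutes, then re-fetch to verify. Hypothetical sketch."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (expires_at, value)

    def get(self, url, fetch):
        now = time.monotonic()
        hit = self.store.get(url)
        if hit and hit[0] > now:
            return hit[1]          # still fresh: trust the cached copy
        value = fetch(url)          # stale or missing: verify upstream
        self.store[url] = (now + self.ttl, value)
        return value

calls = []
cache = TtlCache(ttl_seconds=300)
fetch = lambda url: calls.append(url) or {"id": url}
cache.get("https://example.social/note/1", fetch)
cache.get("https://example.social/note/1", fetch)
assert len(calls) == 1  # second read served from cache, no re-fetch
```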
-
-
@evan @gugurumbe i know what caching is, thanks. in fact, my current project is building one that's tailor made for solving the activitypub thundering herd problem (codeberg.org/KittyShopper/middleap)
i've been trying to keep civil through this thread largely because i started the conversation mentioning software i (temporarily) help maintain and therefore represent it even implicitly, but leaving that aside and letting my own personal thoughts enter the picture:
i think this passive aggressive reply is the last straw. thinking that i somehow know enough to write code for this protocol without knowing what a cache is? plugging your book in a network largely developed by poor minorities (i myself have the rough equivalent of less than 40 USD in my bank account total)? this inability to consider change? ("as2 requires compaction", because you're the one defining the spec saying it does), the inability to consider the people and software producing and building upon the data, as opposed to the data itself? the inability to consider the consequences of your specifications and how they're being used in the real world?
i honestly do not know if this line of thought is truly capable of leading this protocol out the slump it's currently in. if you're insistent on shooting yourself in the foot, so be it, but please take the time to consider how this behavior affects other people.
i've largely been burnt out of interacting in socialhub and other official protocol communities due to exactly this behavior, whether from you or others with influence on the final specs, and the only reason i keep trying is because of what's probably a self-destructive autistic hyperfixation on this niche network and trying to make it actually work for me and my friends, as opposed to receiving funding from the well-known genocide enablers at meta and trying to shove failing standards where they don't belong.
please be a better example. if the protocol was actually desirable then sure, you may have earnt it, after all, atproto is teeming with silicon valley e/acc death cult weirdos and yet people seem to prefer it. have you wondered why? or do you prefer to dismiss anything not coming from you without thinking about it -
@evan@cosocial.ca @cwebber@social.coop @kopper@not-brain.d.on-t.work @hongminhee@hollo.social shared inboxes are a thing, Evan
The protocol isn't just how it was initially defined. Protocols evolve and change from their ideals to fit the needs of their operation, and getting rid of individual inboxes is one of those changes.
Social media platforms are real-time: you can't just defer stuff like that. -
@julia @cwebber @hongminhee @kopper
Hi!

Nice to meet you. I'm well aware of `sharedInbox` and helped design it.
Realtime is an illusion. You can make it pretty convincing.
Your users are mostly not online. Remote users are mostly not online. Tracking the last time remote and local users were seen can help you prioritize local and remote delivery.
It's a lot better to deliver to the tiny percent of users currently online first rather than delivering to the user named `aaaaaaaaamng` first.
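The prioritization Evan describes (recently-seen recipients first, instead of alphabetical order) can be sketched like this. The timestamps and usernames are made up for illustration.

```python
# Sketch: deliver to likely-online recipients first rather than in
# alphabetical order. last_seen timestamps are invented for the demo.

last_seen = {
    "aaaaaaaaamng": 1_000,   # seen long ago
    "active_user": 99_000,   # seen just now
    "idle_user": 50_000,
}

def delivery_order(recipients):
    """Most recently seen first; never-seen recipients go last."""
    return sorted(recipients, key=lambda u: last_seen.get(u, 0),
                  reverse=True)

assert delivery_order(list(last_seen)) == \
       ["active_user", "idle_user", "aaaaaaaaamng"]
```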
-
@evan@cosocial.ca @cwebber@social.coop @hongminhee@hollo.social @kopper@not-brain.d.on-t.work I feel like deferring activity resolution and publishing based on online status would only serve to create more reasons for your average person to feel that the fediverse is unstable: explaining the logistics of the herd problem to someone who doesn't know what a distributed system is, is kinda difficult.
-
@evan@cosocial.ca @kopper@not-brain.d.on-t.work @gugurumbe@mastouille.fr Evan, to put it bluntly, the status quo only creates a further divide between the "big certified implementations" and the small implementation independent developers can make without worrying about making the implementation vulnerable-by-default
There could totally be a body that at least attempts to standardize well-known LD prefixes while retaining compatibility with JSON-LD (like IANA and protocol schemes), but there isn't
There could be a subset of JSON-LD that prohibits common pitfalls, but there is not one. In fact, there are very few high-quality openly available libraries that can process ActivityPub objects. There is no way to declare the actual shapes of objects without heavy fuzzing.
There is no safe amount of JSON-LD in a distributed network where context URIs may fade in and out of existence. I'm saying distributed because that's essentially what happens in practice over sufficient time.
Can we for the love of all that's serializable shrink the state space of this mess? It's possible and it's actionable, without anyone left out. We don't need a Rube-Goldberg machine to share a JPEG online
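A minimal declared-shape check of the kind this post says is missing could look like the following. This is a hypothetical sketch (the required-field set and function name are invented): validate an incoming object's structure up front instead of fuzzing and hoping.

```python
# Hypothetical sketch of a declared-shape check for an incoming AS2
# Note: reject malformed objects early with an explicit, small schema.

REQUIRED = {"type", "id", "content"}

def looks_like_note(obj):
    """True if obj has the minimal shape we accept for a Note."""
    return (
        isinstance(obj, dict)
        and obj.get("type") == "Note"
        and REQUIRED <= obj.keys()
        and isinstance(obj["content"], str)
    )

assert looks_like_note({"type": "Note",
                        "id": "https://example.social/notes/1",
                        "content": "a JPEG caption"})
assert not looks_like_note({"type": "Note"})  # missing id and content
```

Shrinking the state space like this is exactly the actionable step the post asks for: a small allowed shape instead of the full JSON-LD state space.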