FEP-b2b8: Long-form Text

aschrijver@socialhub.activitypub.rocks

Hello!

This is a discussion thread for the proposed FEP-b2b8: Long-form Text. Please use this thread to discuss the proposed FEP and any potential problems or improvements that can be addressed.

Summary

Multi-paragraph text is an important content type on the Social Web. This FEP defines best practices for representing and using properties of a long-form text object in Activity Streams 2.0.

cc @eprodrom

aschrijver@socialhub.activitypub.rocks

I am happy to mention that @thebaer of #software:writefreely has a pull request available that, when merged, will make Writefreely eligible to be mentioned in the "Implementations" section (see FEP document sections for more info). The announcement was made in this toot just now.

julian@community.nodebb.org

I will have to review the FEP for any recent changes, but NodeBB is also compatible with this FEP.

sortekanin@socialhub.activitypub.rocks

Personally I find the distinction between the Note, Document, Article and Page types in the Activity Vocabulary entirely arbitrary and they ought to all just be the same type.

I would rather suggest that implementations should consider all of these types to be completely equivalent to each other. If an implementation wants to differentiate how they present short-form and long-form text, then simply check the length of the content and act accordingly - don't rely on the arbitrary type field to tell you whether something is "long" or "short", whatever that means.

The actual length of the content should be the source of truth about whether something is long-form or short-form (according to whatever definition of short and long you want to use). The type field is not the source of truth of this information.

There is nothing preventing an implementation from sending a Note with 10 paragraphs or even 1000 paragraphs, so any implementation that hopes to handle such a thing would need to include checks for the length anyway - so again, it is much simpler and easier to just consider all these arbitrary types equivalent and let the actual length of the content decide how to present it.
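SorteKanin's proposal can be sketched roughly as follows: treat the four types as one interchangeable "post" class and derive short or long presentation from the content alone. This is a minimal sketch, assuming HTML `content` and hypothetical thresholds (`max_short_chars=500`, at most one paragraph); neither the thresholds nor the helper names come from AS2 or the FEP.

```python
from html.parser import HTMLParser

# Under this proposal these types are treated as one interchangeable "post" class.
TEXT_TYPES = {"Note", "Article", "Page", "Document"}


class _TextStats(HTMLParser):
    """Collects the visible text of an HTML fragment and counts <p> elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.paragraphs = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraphs += 1

    def handle_data(self, data):
        self.chunks.append(data)


def is_text_post(obj: dict) -> bool:
    """True if any of the object's types is one of the equivalent text types."""
    types = obj.get("type", [])
    if isinstance(types, str):
        types = [types]
    return any(t in TEXT_TYPES for t in types)


def presentation_for(obj: dict, max_short_chars: int = 500,
                     max_short_paragraphs: int = 1) -> str:
    """Return "short" or "long" from the content alone; obj["type"] is never consulted."""
    stats = _TextStats()
    stats.feed(obj.get("content", ""))
    stats.close()
    text = "".join(stats.chunks).strip()
    # Plain text without any <p> still counts as one paragraph if it is non-empty.
    paragraphs = max(stats.paragraphs, 1 if text else 0)
    if len(text) <= max_short_chars and paragraphs <= max_short_paragraphs:
        return "short"
    return "long"
```

With these defaults a two-paragraph Note such as `"<p>One.</p><p>Two.</p>"` comes out as "long" and a one-sentence Article comes out as "short", which is the case the type field alone would get wrong.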

aschrijver@socialhub.activitypub.rocks

SorteKanin:

Personally I find the distinction between the Note, Document, Article and Page types in the Activity Vocabulary entirely arbitrary and they ought to all just be the same type.

They are only arbitrary when we don't assign distinctive semantic meaning to them. Here are the meanings as described in ActivityStreams. I looked up the schema.org equivalents and put them in for comparison.

| ActivityStreams | schema.org |
| :--- | :--- |
| Article: Represents any kind of multi-paragraph written work. | Article: An article, such as a news article or piece of investigative report. Newspapers and magazines have articles of many different types and this is intended to cover them all. |
| Document: Represents a document of any kind. | DigitalDocument: An electronic file or document. |
| Note: Represents a short written work typically less than a single paragraph in length. | Statement: A statement about something, for example a fun or interesting fact. |
| Page: Represents a Web Page. | WebPage: A web page. |

My observation is that the type definition and intended purpose should be further clarified. In the comparison to schema.org you see that an Article is usually some piece of text that is published to an audience. Whereas a Note is a brief statement, a notification status (some apps use the term 'statuses'). Page is confusing until you know it means web page.

These types convey semantically meaningful information. A Document isn't necessarily also an Article that can be published. A Note is what a library may present you on their Page to tell you a Document is unavailable. Etcetera.

SorteKanin:

There is nothing preventing an implementation from sending a Note with 10 paragraphs or even 1000 paragraphs

This is still no reason to throw the extra semantic context overboard. "Different type, different business logic" - should there be a need for that wrt increasing interoperability - is much easier to deal with than implicit rules. We already see that many developers who only deal in Notes are inclined to add extra properties to them to indicate special behaviors, instead of typing them properly as linked data intents.

sortekanin@socialhub.activitypub.rocks

aschrijver:

They are only arbitrary when we don’t assign distinctive semantic meaning to them.

But is there any meaningful semantic difference?

ActivityStreams says Article is just a multi-paragraph written work. Schema.org says it is specifically for news articles, but that's clearly not what this FEP is suggesting (and doesn't make sense with ActivityStreams' definition).
Document is literally a tautology and is completely meaningless (the definition may as well have been "A document is a document").
A Note's only distinguishing characteristic seems to be that it is short (schema.org's Statement is not at all how Notes are currently used on the fediverse and is clearly not what this FEP is suggesting).
Page is currently used by Lemmy for all posts in communities. Page also inherits from Document, which is sort of confusing (aren't pages usually part of a document, not the other way around?). And what is a web page other than HTML? But all of these things are essentially just HTML.

My point is that these things are so tenuously defined that it becomes vacuous. They all just boil down to HTML, or less technically, what most people associate with any general "post" on social media (at least those that aren't restricted to short-form content).

In addition, these definitions aren't fitting how these types are used on the fediverse at all. For instance, comments on Lemmy are currently Notes but have no length restriction.

EDIT: Even this post itself is posted on ActivityPub as a Note, despite having many paragraphs.

The only actual meaningful distinction between these types seems to be their length, with an arbitrary distinction between single-paragraph and multi-paragraph. But we don't need a standard to tell each implementation where to put the border between "short-form" and "long-form." Each implementation, or even each client, can easily choose by itself what it considers to be "short-form" and "long-form" by simply checking the length itself.

julian@community.nodebb.org

No, there's not much meaningful semantic difference even in the wild. Granted, use of non-Note types is still rather limited currently, but we can draw some expectations (which come with a hefty dose of exceptions):

A Note is shorter than an Article (unless it is not), and vice versa (unless it is not)
An Article contains inline images (unless there aren't any)
Notes tend to contain attachments (unless there aren't any)

... I could go on, but everything I'd say would come with "(unless..)" alongside it.

I think what evan@cosocial.ca is attempting to do with the FEP is assign some suggestions as to how to classify content, and suggesting that there could be display differences on the implementor side for each individual type (unless there aren't any differences ha ha ha)

> Personally I find the distinction between the Note, Document, Article and Page types in the Activity Vocabulary entirely arbitrary and they ought to all just be the same type.

The problem here is that Note is now loaded with expectations so as to become highly-specific. You can't use inline images, you must cap attachments at 4, you may have to re-order attachments, etc.

evan@cosocial.ca

@julian those four types are very different.

sortekanin@socialhub.activitypub.rocks

julian:

The problem here is that Note is now loaded with expectations so as to become highly-specific. You can't use inline images, you must cap attachments at 4, you may have to re-order attachments, etc.

Says who? I don't see any such requirements in the spec. In Lemmy I can put as many images in a comment as I want. Here on Discourse I don't think there is a limit on any of these things either?

But again, if any implementation wants to handle content differently (like short or long form content, or content with lots of images, or whatever), then that's that implementation's prerogative and you can't use these types to enforce anything anyway.

Handling all manner of arbitrary requirements from different implementations would also be way too complicated. Implementations should rather try to handle as broad a set of content as possible and display it in an appropriate way.

julian@community.nodebb.org

Says Mastodon, implicitly, because those are the restrictions you have to follow if you want your content adequately represented on there.

You can say it doesn't matter what Mastodon says, and you're right, but my users don't care about that, they just want their content displayed on Mastodon properly.

sortekanin@socialhub.activitypub.rocks

I wasn't aware Mastodon had such arbitrary requirements. I would say you should not handle this by restricting your users or your display of content to fit the whims of other implementations. That would only spread these arbitrary requirements.

One way to handle this is to display a warning to your users when they are writing a post. If they create a post that is incompatible with Mastodon or other known incompatible implementations, display a (small, nonintrusive) warning to your users that it may not display correctly or at all in those implementations.
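A compose-time check along these lines could look roughly like the sketch below. The limits in `KNOWN_LIMITS` are assumptions based only on the restrictions julian describes in this thread (at most 4 attachments, no inline images); they are not taken from Mastodon's documentation, and `compatibility_warnings` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Assumed limits, taken from the restrictions described earlier in this
# thread; not an official Mastodon specification.
KNOWN_LIMITS = {
    "mastodon": {"max_attachments": 4, "inline_images": False},
}


class _ImgCounter(HTMLParser):
    """Counts <img> tags in an HTML fragment (self-closing ones included)."""

    def __init__(self):
        super().__init__()
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1


def compatibility_warnings(content_html: str, attachments: list) -> list:
    """Return non-blocking warning strings; an empty list means no known issue."""
    counter = _ImgCounter()
    counter.feed(content_html)
    counter.close()
    warnings = []
    for software, limits in KNOWN_LIMITS.items():
        if len(attachments) > limits["max_attachments"]:
            warnings.append(
                f"{software}: only the first {limits['max_attachments']} attachments may be shown")
        if counter.images and not limits["inline_images"]:
            warnings.append(f"{software}: inline images may not be displayed")
    return warnings
```

The warnings are only returned, never enforced, which matches the idea of informing the author instead of restricting what they can publish.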

Hopefully this would put pressure on Mastodon to remove these limitations or encourage people to use other implementations with fewer limitations.

aschrijver@socialhub.activitypub.rocks

SorteKanin:

That would only spread these arbitrary requirements.

This. I have advocated a lot for a standards movement that dares to set its own course depending on what's best for the ecosystem as a whole. But it is not what developers were interested in or able to put their weight behind, and Mastodon stayed the post-facto interoperability leader plotting the direction. The activities around FEPs and the Forum task force are a worthy effort to bring change to that reality.

SorteKanin:

My point is that these things are so tenuously defined that it becomes vacuous.

I agree. Yet then I'd focus on defining the types better, rather than having a single type that only takes concrete shape in the eye of the beholder.

sortekanin@socialhub.activitypub.rocks

aschrijver:

Yet then I’d focus on defining the types better

I'm all for well-defined types and I think they ought to be used in more places (for instance FEP-1b12 groups), but let's be clear about what a type should provide: data about how to process the given content or object, in ways that can't be inferred from the rest of the data itself. That is, the type is metadata, not data. But when trying to answer a question about the data, such as "how long is this content", you should look at the data, not the metadata. The data is the source of truth.

I think this problem really stems from the use of ActivityStreams and its unfortunate formulation of these types, which has multiple issues: the redundancy of certain types, their ambiguity, and the way they seem forced into an object-oriented inheritance hierarchy, just to name the ones I can think of off the top of my head. Unfortunately we can't easily change the vocabulary we have to use.

trwnh@socialhub.activitypub.rocks

I think the semantics of fedi tend to be less about Note vs Article and more in the bucket of what SIOC calls a Post: http://rdfs.org/sioc/spec/#term_Post

They allow for subtypes like BlogPost, BoardPost, Comment, InstantMessage, MailMessage, WikiArticle. But those are less important than the fact that the Thing is a Post.

wrt the intention of AS2 Vocab:

Note and Article are intended to be contentful types, i.e. they typically have content
Document is intended to be an artifact or record of information, with subclasses of Image/Video/Audio/Page
- Page is not the same as an Article. There might be an Article on the Page, and the Article might be the main subject of the Page, but they are not the same.

Consider the following:

Some PageSome Article

The equivalent would be:

```json
{
  "@context": {"as": "https://www.w3.org/ns/activitystreams#"},
  "@graph": [
    {
      "@id": "#page",
      "@type": "as:Page",
      "as:name": "Some Page"
    },
    {
      "@id": "#article",
      "@type": "as:Article",
      "as:name": "Some Article"
    }
  ]
}
```

In general what Article boils down to is a sort of formality, an intent to publish something as an Article is interpreted as an intent to render it more like a blog than a social media post. Note that this is still a pretty contrived distinction, but it's the best we have.

Beyond that, I continue to have some issues with the FEP itself:

rimu1@socialhub.activitypub.rocks

Overall I'm fine with this FEP.

Mastodon uses 'summary' for content warnings, so when Mastodon ignores this FEP and downgrades an article to a note, they'll probably turn the summary into a content warning?

We could plan for Mastodon to drag the chain and just sidestep the issue by calling it something else?

julian@community.nodebb.org

Hi rimu@mastodon.nzoss.nz ! Actually two representatives from Mastodon were at the pre-FOSDEM meet to provide their input. Despite the illusion that they throw their weight around on purpose, it definitely is not the case.

In this case Mastodon actually already treats Article and Note differently! They are both converted into a "status", but content warnings only apply for Notes.
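As a rough illustration of the behavior julian describes (both types become a "status", but `summary` only acts as a content warning on a Note), a consumer could branch on the type like this. The simplified status dict and `to_status` are hypothetical and do not reflect Mastodon's actual code; the sketch also assumes `type` is a single string.

```python
def to_status(obj: dict) -> dict:
    """Map an incoming AS2 object to a simplified, hypothetical "status" record."""
    obj_type = obj.get("type")
    summary = obj.get("summary")
    status = {"text": obj.get("content", ""), "content_warning": None, "description": None}
    if obj_type == "Note" and summary:
        # On a Note, summary is treated as a content warning that hides the body.
        status["content_warning"] = summary
    elif obj_type == "Article" and summary:
        # On an Article, summary is treated as an abstract shown alongside the body.
        status["description"] = summary
    return status
```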

NodeBB already made the switch to publishing Articles with a summary property, and it is working quite well. It is not treated as a content warning (GtS on the other hand does! Funny how that works.)

Future iterations call for using preview to represent the Note (for Mastodon and other microblogs) while the threadiverse and long-formers can ingest the article directly!

trwnh@socialhub.activitypub.rocks

julian:

Future iterations call for using preview to represent the Note (for Mastodon and other microblogs) while the threadiverse and long-formers can ingest the article directly!

This is where I would point to my comment in https://codeberg.org/evanp/fep/issues/21#issuecomment-3765626

this use of preview muddies the waters because you can't necessarily assume that any property of the Article also applies to the preview Note.
taking the FEP at its word:
Especially for microblogging applications, the preview property is a useful fallback for supporting unrecognized object types like Article [...] For an article, the preview can be a Note that gives a well-formatted preview of the article content in its content property. For example, the name, summary, and a link to the url would be an appropriate representation.
if this is an appropriate representation to just take name, summary and url, then why not just do that directly? i mean, we already have a "useful fallback" described in AS2-Core: using the name and summary! https://www.w3.org/TR/activitystreams-core/#text-representations
i understand that there is a tension between publishers wanting to be accurate and publish Article objects vs. consumers special-casing Article objects and using fallback representations for them, but i don't see how preview helps here -- you'd still be special-casing.
what is really going on here is that you are dealing with a desire for multiple representations of the same resource, but you are forced to stuff it all into a single HTTP resource. so you are not really asking for a "preview", you are asking for an "alternate representation". rel=alternate, not rel=preview.
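To make the question concrete: the representation the FEP suggests for a preview (name, summary, and a link to the url) can be computed by a consumer directly from the Article, or embedded by the publisher as a separate preview Note. The helper names below are hypothetical; the sketch assumes `url` is a plain string (AS2 also allows Link objects and arrays, which it ignores) and relies on AS2 defining `name` as plain text and `summary` as HTML.

```python
import html


def fallback_html(article: dict) -> str:
    """Render name, summary and a link to url as one HTML fragment."""
    parts = []
    if article.get("name"):
        # name is plain text in AS2, so it must be escaped before embedding in HTML.
        parts.append(f"<p><strong>{html.escape(article['name'])}</strong></p>")
    if article.get("summary"):
        # summary is already HTML in AS2 and is passed through unchanged here.
        parts.append(article["summary"])
    url = article.get("url")
    if isinstance(url, str):
        parts.append(f'<p><a href="{html.escape(url)}">{html.escape(url)}</a></p>')
    return "".join(parts)


def with_preview(article: dict) -> dict:
    """Publisher side: attach the same fallback as a separate preview Note."""
    return {**article, "preview": {"type": "Note", "content": fallback_html(article)}}
```

Both paths yield the same text, but note that the preview Note carries none of the Article's other properties (replies, audience, and so on), which is the concern raised below.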

Matthias Pfefferle also raised this concern:

If we treat the preview Note as a standalone object that can be used as-is, we’d also need to include the context, replies, reposts, likes collections, audiences, and more. From my perspective, it’s impossible to ignore the surrounding object.

You can't just treat the preview as "the same object" and assume that any properties of the Article will also apply to the Note. They are separate objects, always.

The more idiomatic way to have an object be represented or interpreted in multiple ways is to use specific vocabularies and classes as appropriate. For example, instead of quibbling about the difference between Note and Article, you could capture the semantics of a "post" which is what you really care about:

```json
{
  "@id": "#something",
  "@type": [
    "https://www.w3.org/ns/activitystreams#Note",
    "http://rdfs.org/sioc/ns#Post",
    "http://joinmastodon.org/ns#Status"
  ]
}
```

Here, we are making the claim that #something is simultaneously a "Note", "Post", and "Status".
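For a consumer, an object typed like this only needs a set-membership test per class it understands. This sketch assumes every type is either a bare AS2 term (such as "Note") or a full IRI; compact IRIs like `as:Note` would need proper JSON-LD expansion against the `@context`, which it does not perform. `types_of` and `is_a` are hypothetical names.

```python
AS = "https://www.w3.org/ns/activitystreams#"


def types_of(obj: dict) -> set:
    """Return the object's types as IRIs; bare terms are assumed to be AS2 terms."""
    raw = obj.get("@type", obj.get("type", []))
    if isinstance(raw, str):
        raw = [raw]
    # Simplification: anything containing ":" is taken as a full IRI as-is.
    return {t if ":" in t else AS + t for t in raw}


def is_a(obj: dict, iri: str) -> bool:
    """True if the object claims membership in the class identified by iri."""
    return iri in types_of(obj)
```

A microblogging consumer can then check for `http://joinmastodon.org/ns#Status`, a forum for `http://rdfs.org/sioc/ns#Post`, and an AS2-only consumer for the Note type, all on the same object.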

The core of the UX issue is that Note is overloaded to mean "Mastodon-style status" instead of actually defining an unambiguous class to represent the set of all things that can be considered Mastodon-style statuses.

You might think that it's bad for interop if everyone defines their own classes for everything, but that's not what I'm saying. What I'm saying is that reusing or overloading a class is also bad for interop, probably even worse. Think of types and classes as a sort of interface that can be fulfilled -- if the interface is fulfilled, the current object/thing belongs to the class or set representing all things that fulfill the interface. You should reuse classes that already exist, but only if the semantics match EXACTLY. If the semantics aren't an exact match, then you have introduced ambiguity.

We might say that for a "Post", what we really care about is the "content", plus some other supporting properties. This is the level at which "social media post" use cases should operate. In some ways, the AS2 content model is a bad fit for those use cases, because the AS2 content model is very heavily oriented toward Activities rather than Posts. The problems of "longform text" are largely artificial.

silverpill@mitra.social

@julian What is the purpose of preview if Mastodon can already render summary?

wistex@socialhub.activitypub.rocks

To me, the difference between an article and a note is not just its length, but also what formatting is allowed.

A Note typically is shorter but doesn't have to be, and has minimal formatting (typically limited to what Markdown or BBCode would allow).
Whereas an Article is more like a blog post or journalistic article, which typically contains formatting and layout beyond what BBCode or Markdown would allow.

This would be a much better distinction than using some arbitrary length as a factor. It would also solve the problem of blog posts looking weird when shown in a client.
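wistex's formatting-based distinction could be checked with a tag allow-list. This is a sketch only: `MINIMAL_TAGS` is an assumed approximation of what Markdown or BBCode typically produce, not something defined by AS2 or this FEP, and `suggested_type` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Assumed "minimal formatting" set, roughly what Markdown or BBCode can express.
MINIMAL_TAGS = {
    "p", "br", "hr", "a", "strong", "b", "em", "i", "u", "s", "del",
    "code", "pre", "blockquote", "ul", "ol", "li", "img",
} | {f"h{n}" for n in range(1, 7)}


class _TagCollector(HTMLParser):
    """Records every start tag name used in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)


def extra_formatting(content_html: str) -> set:
    """Return the tags used beyond the minimal set; empty means Note-style formatting."""
    collector = _TagCollector()
    collector.feed(content_html)
    collector.close()
    return collector.tags - MINIMAL_TAGS


def suggested_type(content_html: str) -> str:
    return "Article" if extra_formatting(content_html) else "Note"
```

Length plays no role here: a long post written with only paragraphs and emphasis would still be suggested as a Note, while a short post using figures or tables would be suggested as an Article.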

aschrijver@socialhub.activitypub.rocks

Yes, that might work. Note having an agreed upon and well-defined constraint to its formatting, and a more general-purpose Article lacking that constraint.

For both types we should be wary of how base type definitions may or may not affect extended types. I think it is not well defined what happens in terms of behavior for extended and/or multi-types (think e.g. "type": ["as:Note", "my:StickyNote"]).

Golden Hammer: Article versus Note?

This section is a bit of an aside, not directly related to the FEP, yet it relates to the overall design direction of AS/AP and how that may cause new trouble in the future as things are adopted by the installed base.

Wikipedia quote: "[Golden hammer is] the comfort zone state where you don't change anything to avoid risk. The problem with using the same tools every time you can is that you don't have enough arguments to make a choice because you have nothing to compare to and is limiting your knowledge."

For Article, schema.org suggests the following meaningful sub-types.

Article

And I copied that whole list because in AS/AP land the default urge is to hammer every information model into the limited set of objects and activities that ActivityStreams provides, and not touch the extensibility mechanism of the protocol specification at all.

As I see it AS only provides the toolbox of basic primitives to build from, but it is the extension mechanism where real domains are modeled for rich social networking use cases. We should go past the everything-is-a-note-or-it-doesnt-work-with-mastodon rule, but also not end up with everything-is-either-note-or-article-or-you-are-on-your-own.

Had schema.org been incorporated somehow in AS/AP, we might have avoided the "if all you have is a hammer, everything looks like a nail" trap. #software:nodebb and #software:discourse would be building interoperable DiscussionForumPosting and #software:wordpress would have support for BlogPosting. Maybe #software:lemmy, being a link aggregator, would add support for Review and its subtypes.

The true power of the social web does not come from cramming society into a one-size-fits-all model, but from facilitating ever-growing support for interoperable (app-independent) domain designs. To move towards a heterogeneous social web. So imho..

Microblogging
Forums crammed into microblogging
Media publishing crammed into microblogging
Forge federation means it microblogs

The direction to follow is for AS/AP to become more meaningful standards again wrt their promise of universal social networking. It means that we should step off the Mastodon-first approach of turning any social app into a microblogging app and then extending it. The standards movement must find its own healthy path again.

AS/AP has huge flexibility. Its extension mechanism should be its strength, not its weakness. We must focus more on it, or AS/AP is doomed in protocol decay hell. It is meaningless if a project says "We are now fediverse-enabled". It is like saying "we have an internet connection".

Instead a project should say "We added Fediverse Microblogging support" or "Task management" support, or "Code reviewing" (better than the not-so-clear "forge federation" umbrella term).

Update copied from this toot:

The versatility of the Linked Data standards that AS/AP is based on is such that specific data models for any social networking use case can be defined.

While LD is suited as storage format for the social / knowledge graph, it is not out-of-the-box a good fit for the AS/AP extension mechanism. Using closed-world models based on strategic design (part of domain-driven design) would be best to define the interoperable msg exchange patterns that occur between actors.