The Semantic Social Network

This is another talk from ActivityPub Conference 2019, this time by Pukkamustard. You can find it at https://archive.org/details/apconf-talks/Talk3_Pukkamustard_compressed.mov.

The idea of a Semantic Network is that the various kinds of data are linked. The model he uses has subjects linked to objects via predicates. A subject can have multiple objects, an object can have multiple subjects, and any subject can itself be the object of another subject. Linking data this way lets us run interesting queries. As an example, consider a search for vegetarian restaurants in Brussels. Brussels could be the subject and take Restaurants as an object. There could of course be other objects for Brussels, like Museums, and one object of that might be the Brussels Tram Museum. If everything is properly linked, you can run queries like “Are there any vegetarian restaurants near the Tram Museum in Brussels?” So far, this is just basic search, and when a single entity like Google controls the data, it is not too difficult to manage.
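To make the subject, predicate, object model concrete, here is a minimal sketch of a toy triple store in Python. Everything in it is an assumption made for illustration: the place names (GreenFork, SteakHouse), the predicate names (hasRestaurant, near), and the helper functions are not from the talk.

```python
# Toy in-memory triple store. All names and values are illustrative.
TRIPLES = {
    ("Brussels", "hasRestaurant", "GreenFork"),
    ("Brussels", "hasRestaurant", "SteakHouse"),
    ("Brussels", "hasMuseum", "TramMuseum"),
    ("GreenFork", "servesCuisine", "vegetarian"),
    ("SteakHouse", "servesCuisine", "grill"),
    ("GreenFork", "near", "TramMuseum"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in sorted(TRIPLES)
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def vegetarian_restaurants_near(city, landmark):
    """Combine three simple pattern matches into one linked query."""
    in_city = {o for _, _, o in match(city, "hasRestaurant")}
    vegetarian = {s for s, _, _ in match(predicate="servesCuisine", obj="vegetarian")}
    nearby = {s for s, _, _ in match(predicate="near", obj=landmark)}
    return sorted(in_city & vegetarian & nearby)


print(vegetarian_restaurants_near("Brussels", "TramMuseum"))  # ['GreenFork']
```

The query is nothing more than the intersection of three pattern matches, which is why it only works when everything is properly linked.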

But when it is a network of independent sites, you have to start thinking about how things will be named and labelled. What if one site labels what we want as a “restaurant” and another as a “cafe”? To avoid problems you need a naming convention. If you identify subjects and objects using URIs, the ambiguity disappears, since each one has a single, unique URI. But then you need to add a “Name” field to make it useful to humans.
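A small sketch of why shared URIs help, assuming two hypothetical sites (a.example and b.example) that describe their places with the same type URI while using whatever human-readable names they like. The site data and the helper function are invented for illustration.

```python
# Shared identifiers: both sites agree on these URIs.
TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NAME = "https://schema.org/name"
RESTAURANT = "https://schema.org/Restaurant"

# Hypothetical data from two independent sites.
site_a = [
    ("https://a.example/places/1", TYPE, RESTAURANT),
    ("https://a.example/places/1", NAME, "Green Fork"),
]
site_b = [
    ("https://b.example/venues/42", TYPE, RESTAURANT),
    ("https://b.example/venues/42", NAME, "Cafe Verde"),
]


def names_of_type(triples, type_uri):
    """Return the human-readable names of every subject with the given type."""
    subjects = {s for s, p, o in triples if p == TYPE and o == type_uri}
    return sorted(o for s, p, o in triples if p == NAME and s in subjects)


# Merging the two data sets needs no label matching, only URI equality.
print(names_of_type(site_a + site_b, RESTAURANT))  # ['Cafe Verde', 'Green Fork']
```

The URI carries the meaning for machines, and the name property carries the label for humans.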

A good place to begin is http://schema.org, where you can find the vocabulary you need to start marking up essentially everything as structured data. Looking there at Restaurant, I see that it includes the property servesCuisine, which is a text field. You could put “vegetarian” in as the text, and that takes care of one thing. Another available property is areaServed, which lets you identify where it is, and so on. There does not appear to be a direct link to nearby museums, but if each museum were an object with a similar geographic identifier, you can see how things would link up. If you also used URIs to name the properties, you would be pretty close to the Resource Description Framework (RDF), which is a W3C standard model for data interchange and part of the Semantic Web project.
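As a rough sketch of that last step, the snippet below describes a restaurant with the schema.org properties mentioned above and then names each property with a URI, giving RDF-style triples. The restaurant, its identifier, and the to_triples helper are assumptions for illustration; the prefixing is deliberately naive and is not a real JSON-LD or RDF processor.

```python
import json

SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical restaurant described with schema.org terms.
restaurant = {
    "@context": "https://schema.org",
    "@type": "Restaurant",
    "@id": "https://example.org/places/green-fork",
    "name": "Green Fork",
    "servesCuisine": "vegetarian",
    "areaServed": "Brussels",
}


def to_triples(doc):
    """Naively turn a flat document into (subject, predicate URI, object) triples."""
    subject = doc["@id"]
    triples = [(subject, RDF_TYPE, SCHEMA + doc["@type"])]
    for key, value in doc.items():
        if not key.startswith("@"):  # skip @context, @type, @id
            triples.append((subject, SCHEMA + key, value))
    return triples


print(json.dumps(restaurant, indent=2))
for triple in to_triples(restaurant):
    print(triple)
```

A real application would hand this work to an RDF or JSON-LD library rather than prefix keys by hand.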

ActivityPub enters this picture by providing an agreed structure for identifying data. As an example, suppose Alice and Bob are on two different servers. Bob makes a post, and Alice likes it. Then Clarissa, on a third server, sees that Alice liked Bob’s post. The idea of federated media is that you should be able to link to any remote content in an understandable way. Of course, this could go beyond ActivityPub, since it is the agreed framework that matters, and in essence that is what the W3C is trying to establish with RDF and the Semantic Web project. But because ActivityPub is a shared protocol, it makes it very easy to get there. This is why the speaker makes this definition:

The Fediverse is a distributed graph of interlinked content created by social interactions.

And that is why he calls it the Semantic Social Network. This in turn means that:

  • ActivityPub content can be seen as documents or as graphs.
  • A graph can be traversed and queried in interesting ways.
  • There is no limit to the kind of data that can be created in a crowd-sourced manner.
  • Open data sets are publicly available – link to them. As an example, see the 5-star Open Data plan from Tim Berners-Lee.
  • There is a whole Semantic Web community with tools, research, and standards.
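
The first two points can be sketched with the Alice and Bob example from earlier: the same Like activity is read once as a document and once as a graph. The terms Like, Note, actor, object and attributedTo come from the ActivityStreams vocabulary that ActivityPub uses. The server names, the ids, and both helper functions are hypothetical, and the flattening is a simplification rather than real JSON-LD processing.

```python
# Alice's Like of Bob's post, as a nested document. All URLs are made up.
like_doc = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "https://alice.example/likes/1",
    "type": "Like",
    "actor": "https://alice.example/users/alice",
    "object": {
        "id": "https://bob.example/notes/1",
        "type": "Note",
        "attributedTo": "https://bob.example/users/bob",
    },
}


def document_to_edges(doc):
    """Flatten a nested document into (subject, predicate, object) edges."""
    subject = doc["id"]
    edges = []
    for key, value in doc.items():
        if key in ("id", "@context"):
            continue
        if isinstance(value, dict):
            # An embedded object becomes a node of its own.
            edges.append((subject, key, value["id"]))
            edges.extend(document_to_edges(value))
        else:
            edges.append((subject, key, value))
    return edges


def reachable(edges, start):
    """Return every node reachable from start by following edges forward."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for s, _, o in edges:
            if s == node and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


edges = document_to_edges(like_doc)
# Starting from the Like, a reader such as Clarissa can reach both Alice and Bob.
print(sorted(reachable(edges, "https://alice.example/likes/1")))
```

Nothing in the traversal cares which server a node lives on, which is the sense in which the graph is distributed.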

At the end there was some discussion about the use of JSON-LD (JavaScript Object Notation for Linked Data), which I gathered can be controversial, though I don’t quite get what the dispute is about.