Return to ActivityPub Conference 2019

Decentralised Hashtag Search and Subscription in Federated Social Networks

This talk by Trolli Schmittlauch can be found at https://archive.org/details/apconf-talks/Talk5_Schmittlauch_compressed.mov. He is a computer science student at the Technical University in Dresden. This talk is organized around some work he has done as a student, and is more of a proposal for discussion than a finished piece of work, though he has worked out a lot of his proposal.

Although hashtags may have started on Twitter, they have become a standard on virtually all social media platforms, which is a strong argument that they meet a need. They are used for events, political discussions, and general discussions. They have been used to coordinate demonstrations and other activities, and for social movements like #MeToo and #BlackLivesMatter. And of course hashtags are used in the Fediverse. The problem here is that in a decentralised environment you don’t see all of the posts around a given hashtag, just the ones that have reached your node. This could push people to move to larger nodes to see more posts, which is the opposite of what federation is meant to encourage.

Currently you can subscribe to someone on a different instance if you know their user name and the name of their instance. For example, I am @ahuka@octodon.social. If someone on another instance wants to follow my particular posts, they can use that full address to send a subscribe message to my instance, and from then on they will see all of my posts. But they would not see the posts of anyone else on my instance unless they had explicitly subscribed to them.
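To make the subscription step a little more concrete, here is a minimal sketch in Python. It splits a full address into its user and instance parts, builds the WebFinger lookup URL that is commonly used to find an account on another instance, and constructs an ActivityStreams Follow activity. The function names and the actor URLs in the usage note are my own illustrative assumptions, not something from the talk, and nothing here is actually sent over the network.

```python
from urllib.parse import urlencode


def parse_handle(handle: str) -> tuple[str, str]:
    """Split '@user@instance' (or 'user@instance') into (user, instance)."""
    user, _, domain = handle.lstrip("@").partition("@")
    if not user or not domain:
        raise ValueError(f"not a full Fediverse address: {handle!r}")
    return user, domain


def webfinger_url(handle: str) -> str:
    """Build the WebFinger URL used to discover the account on its home instance."""
    user, domain = parse_handle(handle)
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def follow_activity(follower_actor: str, target_actor: str) -> dict:
    """Build a Follow activity; both actor URLs are supplied by the caller."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Follow",
        "actor": follower_actor,
        "object": target_actor,
    }
```

For example, parse_handle("@ahuka@octodon.social") gives the pair ("ahuka", "octodon.social"). The actor URLs passed to follow_activity would come from the WebFinger response in a real client; any values you plug in by hand are placeholders.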

There is a partial solution right now, which is relays. For Mastodon, there is a relay at https://relay.mastodon.host/, and this is described as

“A service-type ActivityPub actor that will re-broadcast anything sent to it to anyone who subscribes to it.”

https://source.joinmastodon.org/mastodon/pub-relay

Well, that sounds like the firehose, which is not what we need. And there are other problems with it:

  • It is a centralised actor relaying all incoming posts
  • It is a single point of failure
  • Bringing in all posts is a huge load to place on a small instance
  • You only see posts sent since you subscribed

So the proposal from Trolli Schmittlauch is for an architecture that would:

  • Relay and subscribe – instances can subscribe to all public posts of a given hashtag
  • Store and query – instances can retrieve one year of history for a hashtag without needing to subscribe
  • Fully decentralised, no single point of authority for all tags.

To accomplish this, he proposes a core idea of a distributed hash table (DHT), based on Chord, that would distribute responsibility for hashtags among instances. Here is where we need to be careful, since we now have the term hash in two different contexts. A DHT such as Chord does not simply store hashes. It runs each key through a hash function (SHA-1 in the original Chord design) to get an identifier, and uses that identifier to decide which node is responsible for the key. The key could be anything. In this case it would be a hashtag, which is something completely different from a cryptographic hash. In any case, the idea is that this would let you subscribe to a hashtag, just like you can now subscribe to an individual. The way this would work is that you would calculate hashes for both the hashtags and the nodes (instances), so that they share one identifier space. In Chord, each key then becomes the responsibility of the first node whose identifier equals or follows the key’s identifier. Each node would also keep a routing table so that a lookup can be forwarded toward the responsible node.
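Here is a small Python sketch of the shared identifier space, under some loud assumptions. It hashes hashtags and instance names with SHA-1 and picks the first instance at or after the hashtag on the ring, which is how Chord assigns keys. Lowercasing the names and using the instance domain as the node name are my assumptions, and the sketch cheats by looking at the full list of instances, whereas a real Chord node only knows a small routing table and forwards lookups step by step.

```python
import hashlib
from bisect import bisect_left


def ring_id(name: str) -> int:
    """Map a hashtag or instance name onto the 160-bit SHA-1 identifier space."""
    digest = hashlib.sha1(name.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def responsible_instance(hashtag: str, instances: list[str]) -> str:
    """Return the instance whose ID is the first at or after the hashtag's ID.

    The search wraps around to the lowest ID, since the space is a ring.
    Assumes a global view of all instances, which real Chord nodes lack.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    ring = sorted((ring_id(name), name) for name in instances)
    ids = [node_id for node_id, _ in ring]
    index = bisect_left(ids, ring_id(hashtag))
    return ring[index % len(ring)][1]
```

Calling responsible_instance("#MeToo", ["octodon.social", "mastodon.host"]) returns one of the two names, and every instance that runs the same calculation gets the same answer, which is what lets them agree on who relays a hashtag without asking a central authority.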

Then we can put it to work. The life cycle of a new post would now look like:

  • The publishing instance calculates the hash of each hashtag added to the post and looks up the responsible relay instance on the Distributed Hash Table (DHT) for each included hashtag.
  • The publishing instance sends the post to the responsible relay instance.
  • The relay instance looks up the responsible storage node on the DHT. Note that this implies that being a relay node and being a storage node can be separate roles.
  • The relay instance verifies the incoming post’s signature, then relays the post’s URI (ID) to all subscribers and the storage node.
  • Subscribing instances can now retrieve the full authenticated post from the received post URI.
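The steps above can be sketched as a toy in-memory simulation in Python. Everything here is a stand-in that I made up for illustration: the class names, the signature_valid flag that replaces real signature checking, and the lookup_relay function that replaces the DHT lookup. It is only meant to show the order of events, not how the proposal would actually be implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Post:
    uri: str
    hashtags: list[str]
    signature_valid: bool = True  # stand-in for a real signature check


@dataclass
class StorageNode:
    history: dict[str, list[str]] = field(default_factory=dict)

    def store(self, hashtag: str, uri: str) -> None:
        self.history.setdefault(hashtag, []).append(uri)


@dataclass
class Subscriber:
    name: str
    inbox: list[str] = field(default_factory=list)  # received post URIs


@dataclass
class Relay:
    storage: StorageNode  # in the proposal this is found via the DHT
    subscribers: dict[str, list[Subscriber]] = field(default_factory=dict)

    def subscribe(self, hashtag: str, subscriber: Subscriber) -> None:
        self.subscribers.setdefault(hashtag, []).append(subscriber)

    def receive(self, hashtag: str, post: Post) -> bool:
        """Verify the post, then pass only its URI to subscribers and storage."""
        if not post.signature_valid:
            return False
        for subscriber in self.subscribers.get(hashtag, []):
            subscriber.inbox.append(post.uri)
        self.storage.store(hashtag, post.uri)
        return True


def publish(post: Post, lookup_relay: Callable[[str], Relay]) -> None:
    """Send the post to the relay responsible for each of its hashtags."""
    for hashtag in post.hashtags:
        lookup_relay(hashtag).receive(hashtag, post)
```

Note that subscribers only ever receive the URI. Fetching the full post from its home instance is left out of the sketch, but it is what lets a subscriber check that the post is authentic rather than trusting the relay.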

Note that there are other problems to be addressed, and he does address security (stopping man-in-the-middle attacks that would suppress certain hashtags), load balancing, and redundancy of nodes, but I am not going to go into all of the technical details here. You can view his talk if you want that information.

In the last part he opens it up for discussion, beginning with the social aspects. Do we even want global hashtags in the Fediverse? There are benefits in allowing more conversation and coordination of activities (think of Tahrir Square as an example of this). But there might also be a downside in facilitating spam and harassment. Then there is the question of the visibility level. Should this only apply to public posts, or should it also include unlisted posts? Would we maybe need a new level to make this work? Also, none of the necessary architecture exists in ActivityPub right now, including routing for the DHT.

From a security standpoint, we need to make sure that no attacker can gain control over a given hashtag or introduce an arbitrary number of nodes. Again, think of Tahrir Square, and use the Egyptian Government as the attacker to see what is at stake here.

All in all I thought this was an excellent presentation and provided a lot of food for thought.

Note: The full paper on which this talk was based can be found at https://git.orlives.de/schmittlauch/paper_hashtag_federation/src/branch/master/paper_hashtag_federation.pdf

Listen to the audio version of this post on Hacker Public Radio!