I have previously written about PubSubHubbub as the not quite real time web, but I am still seeing misleading information about PubSubHubbub being posted on the Internet.
This post isn’t intended to promote Comet servers, but to highlight the difference between PubSubHubbub and client push technologies such as Comet (which in my view includes HTML5 WebSocket). It is important to understand the difference, and why the technology and protocols used in each area are not suitable for the other. I have seen people consider PubSubHubbub for client delivery, and full streaming techniques for server to server delivery, but in both cases the fit is poor.
PubSubHubbub is a server to server protocol, and this is the key point. It increases the speed at which servers such as Google’s servers for Google Reader receive updates to blogs and other RSS feeds. However, PubSubHubbub does not change the last hop, which is how client applications or browsers receive these updates to blogs.
Of course, if the server gets the update quicker, the client will too, but PubSubHubbub is only responsible for getting the update to the server.
There are lots of techniques for clients to receive real time updates, and it is becoming reasonably common for websites to have elements of live data. Often this is polling behind the interface, so although the user does not have to take any action, and the whole page does not reload, there is still a delay in the screen updating – the user is just not as aware of it.
The following diagram shows the sequence of events for PubSubHubbub. The green events happen once, when an aggregating server such as one of Google Reader’s servers subscribes to a blog. The red events happen each time the blog is updated. The blue events are not part of PubSubHubbub, but show a typical polling client to complete the picture.
One of the key aspects of PubSubHubbub is that the Aggregator Server does not stay connected to the Hub. The subscription process is a single HTTP request, after which the Hub only has to remember a list of subscribers rather than maintain open connections to them. When a new item is published on the blog, the blog server simply pokes the Hub to tell it the feed has changed, and the Hub then does the hard work of figuring out what is new and sending the new items to all the subscribers.
Informing the subscribers is a simple HTTP request, as are all the events in this workflow. There are no persistent connections other than HTTP Keep Alive efficiencies at a lower level.
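To make the workflow above concrete, here is a minimal in-memory sketch of the Hub's role. It is not the PubSubHubbub wire protocol: the `Hub` class, the `fetch_feed` and `deliver` callables and the example URLs are all hypothetical stand-ins for the HTTP requests described above, and the subscriber verification that a real hub would do is left out.

```python
from typing import Callable, Dict, List, Set


class Hub:
    """Toy hub: remembers callback URLs and keeps no open connections."""

    def __init__(self,
                 fetch_feed: Callable[[str], List[str]],
                 deliver: Callable[[str, List[str]], None]) -> None:
        # Both callables are assumptions standing in for plain HTTP requests.
        self._fetch_feed = fetch_feed
        self._deliver = deliver
        self._subscribers: Dict[str, List[str]] = {}  # topic URL -> callback URLs
        self._seen: Dict[str, Set[str]] = {}          # topic URL -> item ids already sent

    def subscribe(self, topic: str, callback: str) -> None:
        # One-off request from the aggregator: the hub only stores the URL.
        callbacks = self._subscribers.setdefault(topic, [])
        if callback not in callbacks:
            callbacks.append(callback)

    def publish(self, topic: str) -> int:
        # The blog pokes the hub; the hub fetches the feed, works out which
        # items are new, and sends only those to every stored callback.
        seen = self._seen.setdefault(topic, set())
        new_items = [item for item in self._fetch_feed(topic) if item not in seen]
        seen.update(new_items)
        if new_items:
            for callback in self._subscribers.get(topic, []):
                self._deliver(callback, new_items)
        return len(new_items)


def demo():
    topic = "http://blog.example/feed"                  # hypothetical feed URL
    callback = "http://aggregator.example/callback"     # hypothetical subscriber URL
    feed = ["post-1"]
    delivered = []
    hub = Hub(fetch_feed=lambda _topic: list(feed),
              deliver=lambda cb, items: delivered.append((cb, items)))
    hub.subscribe(topic, callback)
    hub.publish(topic)          # first poke: post-1 is new
    feed.append("post-2")
    hub.publish(topic)          # second poke: only post-2 is new
    unchanged = hub.publish(topic)  # nothing new, so nothing is sent
    return delivered, unchanged
```

Note that between `subscribe` and `publish` the hub holds nothing but a dictionary entry, which is the point: every step is a short request, and no connection has to be kept alive.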
PubSubHubbub for client delivery?
Delivery to clients is a very different use case. There could be many more clients, most of which will not be running at any one time. With a persistent streaming connection the server knows when a client is connected and only has to send updates to currently connected clients. With PubSubHubbub a subscription is just a list of URLs to inform – if the list contains a large number of clients that are no longer running then it is a large overhead to attempt to send out all the HTTP messages.
Another reason PubSubHubbub is not suitable for client delivery is that clients are often not publicly addressable. In other words, the Hub cannot send an HTTP request to a client even if the client did implement a web server interface. For this reason all real time client delivery techniques either use client polling or maintain an open connection initiated by the client.
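The overhead of notifying a long list of clients that are mostly not running can be illustrated with a rough count of delivery attempts. This is only a sketch: the function names and the subscriber numbers are made up for illustration, and it ignores retries and timeouts, which would make the callback-list case worse.

```python
def callback_list_attempts(subscribers, online):
    """Hub-style delivery: one HTTP attempt per stored callback URL,
    whether or not the client behind it is still running."""
    attempts = len(subscribers)
    wasted = sum(1 for s in subscribers if s not in online)
    return attempts, wasted


def streaming_attempts(subscribers, online):
    """Streaming-style delivery: the server knows who is connected,
    so it only writes to clients with an open connection."""
    attempts = sum(1 for s in subscribers if s in online)
    return attempts, 0


# Illustrative numbers only: 10,000 registered clients, 200 running right now.
subscribers = [f"client-{i}" for i in range(10_000)]
online = set(subscribers[:200])
```

With these assumed numbers the callback list costs 10,000 HTTP attempts per update, 9,800 of them to clients that are not there, while the streaming server performs 200 writes.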
HTTP Streaming for server to server?
Conversely, client delivery techniques are not as suitable for server to server communication here. If polling is used, we simply go back to how things were done before PubSubHubbub with all the problems associated with that – poll too often and you have large overheads. Poll too infrequently and you only get updates infrequently.
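The polling trade-off can be put into rough numbers. The sketch below assumes updates arrive at a random point between two polls, so the average delay is half the polling interval; the function name and the figures are illustrative only.

```python
def polling_cost(interval_seconds, feeds):
    """Return (requests per day, average delay, worst-case delay) for an
    aggregator polling `feeds` feeds every `interval_seconds` seconds."""
    requests_per_day = feeds * 86_400 / interval_seconds
    average_delay = interval_seconds / 2   # assumes updates land uniformly between polls
    worst_delay = interval_seconds
    return requests_per_day, average_delay, worst_delay


# Polling 1,000 feeds once a minute versus once an hour.
every_minute = polling_cost(60, 1_000)
every_hour = polling_cost(3_600, 1_000)
```

Under these assumptions, polling every minute costs 1,440,000 requests a day for an average delay of 30 seconds, while polling every hour costs 24,000 requests a day but leaves updates waiting 30 minutes on average.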
However, a full streaming connection would be OK, right? Well, not really, as blog updates and other RSS feeds are not millisecond critical, and scalability and stability are probably more important. A Hub having to maintain persistent connections to all server subscribers would add complexity and increase resource usage. Clients may have to use a persistent connection, but you can communicate with a server using a simple HTTP request.
There are some use cases where low latency is essential, and in these cases a streaming connection is more suitable. In these situations the number of clients is usually controlled more closely than a typical PubSubHubbub network of public blogs.
Conclusion
PubSubHubbub may not address the final hop to the client, and as a protocol it is not suitable for that hop, but it works well for what it was designed for. The server to client hop is covered by Comet or other streaming solutions, which together with PubSubHubbub provide end to end real time publication of RSS feeds.
Nice article 🙂 And yes, it’s good to stress the difference between all the realtime protocols.
I believe, however, that sooner or later even browsers will be addressable. Probably not via HTTP, but maybe with a WebSocket URI or an XMPP JID!
Thanks Julien.
If that does happen I think it will be later rather than sooner unfortunately 🙂 Legacy browsers and network infrastructure will be the stumbling blocks.
Glad you wrote this article. It’s easy to overlook the client-side aspect of the problem. In the Google demo videos I was surprised it even worked so fast without long polling.