The State of Mobile XMPP in 2016
XMPP is not suited for mobile devices. That’s a myth that has been around for ages. It is mostly spread by people who want to sell you their own proprietary instant messaging solution. But it also gained some popularity through a blog post entitled “The (Sad) State of Mobile XMPP in 2014” by Georg Lukas. While Georg wasn’t wrong with his status report per se, it is important to understand that he describes a temporary problem. There was a brief period of time where XMPP actually had some catching up to do. XMPP was fine before, when we had different requirements (in fact I was using an XMPP client on my Nokia E71 back in 2008, more than two years before WhatsApp was even invented), and it’s fine now in 2016.
Why should I care?
There is Signal, there is WhatsApp, there is Slack. Why should I even bother to use XMPP? Well, there are two kinds of answers: the political kind and the personal one. The political one is easy to understand: while Signal and WhatsApp nowadays both use end-to-end encryption, your messages are still routed over centralized servers. That means the vendor still knows exactly who you are communicating with and when. Those vendors fall under US jurisdiction and have to cooperate with various three-letter agencies. Unfortunately these agencies are murdering people based on exactly that information. We call that information (the with whom and when) metadata. With XMPP this metadata still exists, but it is not kept in a central place. Instead it is spread over several thousand XMPP providers in various jurisdictions all around the globe. When you are chatting with someone on XMPP, only you, your server operator and the server operator of your contact know with whom you are communicating. We call that federation. Federation is also what makes XMPP more resistant to censorship. While a country like Brazil can simply flip the off switch on WhatsApp, it would be impossible for them to switch off all XMPP providers.
Fortunately most readers are rather unlikely to become the victim of a drone strike. It is also unlikely that our governments will start to restrict our access to WhatsApp any time soon.
But XMPP gives you another freedom: the freedom to choose your user experience. User experience describes the way we use something, the way it looks and feels. Different people might have different preferences and different workflows. Some people might prefer to use full size hardware keyboards instead of their tiny software keyboards and thus prefer to use a PC. Other people might even go so far as to prefer a completely text based user experience in their terminal emulators. But it can even be much simpler than that. Some users might like to see the presences and status messages of their contacts. Some other users are confused by that flood of information. Proprietary instant messaging solutions do not give you that choice. You are stuck with whatever some designer thought was right for most people. You cannot even properly use WhatsApp and Signal on your PC and have to use yet another proprietary solution like Slack to do basically the same thing. Slack in turn fails to provide a decent mobile experience.
XMPP just works everywhere. You only need one account to cater to all your instant messaging needs. If your workflow changes you just pick a different client.
Now that we have talked about the benefits of XMPP let us get back to the technical details. XMPP uses a long-lived TCP connection to the server. Within that connection there is something called a session. Within a session you are authenticated towards your server and are, in layman’s terms, online. For years this was completely acceptable behaviour. You start your computer. You open your XMPP client. It creates a TCP connection to your server and starts a session within that. You are online. You exchange messages. At the end of the day you close your XMPP client. It finishes the session and closes the TCP connection. You are offline.
In a mobile world TCP connections tend to have a very short life span. Your phone constantly switches between WiFi and 3G. If you drive through one of those cliche tunnels that don’t have 3G repeaters or base stations in them, your connection might even drop out entirely for a few minutes. When you tie your session to your TCP connection, an unreliable connection means an unreliable session. In the best case scenario it means your contacts see you constantly switching between offline and online. In the worst case it means you might even lose messages. Fortunately the solution to this is pretty simple. Instead of starting a new session with every TCP connection, you just resume the previous one. If your TCP connection drops out unexpectedly, your server keeps your session open for a while (usually about 5 minutes). This buys you enough time to reach the other end of that tunnel. As a side effect the same technology gives you the guarantee that a message you sent has actually reached the server. Meaning if the message changes from the sending state to the sent state, it actually was sent. If it was not, your client would automatically resend that message in the next session. That technology is called XEP-0198: Stream Management and is nowadays available in all actively maintained servers and clients.
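To make the acknowledgement part a bit more concrete, here is a rough Python sketch of the bookkeeping a client could do. In XEP-0198 the server reports a counter h, the number of stanzas it has handled so far, and everything the client sent beyond that count has to be resent after a resume. The class and method names are made up for illustration, and the sketch ignores details such as the counter wrapping around; it is not how any particular client implements it.

```python
from collections import deque


class OutboundQueue:
    """Illustrative bookkeeping for XEP-0198 acks (names are hypothetical)."""

    def __init__(self):
        self.acked = 0          # how many stanzas the server has confirmed
        self.unacked = deque()  # stanzas sent but not yet confirmed

    def send(self, stanza):
        # A real client would also write the stanza to the socket here.
        self.unacked.append(stanza)

    def handle_ack(self, h):
        # The server answers with <a h='N'/> (or <resumed h='N'/> after a
        # resume): it has handled N stanzas in total since the stream began.
        while self.acked < h and self.unacked:
            self.unacked.popleft()
            self.acked += 1

    def pending(self):
        # Whatever is still queued must be resent in the resumed session.
        return list(self.unacked)


queue = OutboundQueue()
for text in ("m1", "m2", "m3"):
    queue.send(text)
queue.handle_ack(2)     # the connection dropped after the server saw two
print(queue.pending())  # the third message still needs to be resent
```

A message only moves from sending to sent in the user interface once it has left that queue.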
Keeping sessions open indefinitely, even though you are not really connected, is undesirable for two reasons. First of all an open session makes you show up as online to your contacts which might not be a good idea if you really are not and second of all it will block resources on your server. However keeping the session open is unnecessary if all you want is to receive messages that were sent to you while your device was offline. The solution to this is called XEP-0313: Message Archive Management. It allows a client that failed to resume a session, because it was offline for more than a couple of minutes, to request the backlog of messages from the server. To the user this process is seamless. They usually will not notice whether the messages came from a resumed session or the archive. That technology is currently being rolled out. Conversations on Android has been capable of this since early 2015. On the server side ejabberd has support for this and there is a module for Prosody.
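For the curious, a catch-up request is little more than an IQ stanza that asks the archive for everything after the last message the client has already seen. The following Python sketch builds such a request with the standard library. The function name and the example IDs are hypothetical, and I am assuming the urn:xmpp:mam:2 namespace from the current revision of the XEP; older servers may announce an earlier version.

```python
import xml.etree.ElementTree as ET

MAM_NS = "urn:xmpp:mam:2"  # assumption: current revision of XEP-0313
RSM_NS = "http://jabber.org/protocol/rsm"  # Result Set Management, used for paging


def build_catchup_query(query_id, last_seen_id):
    """Build an IQ asking the archive for everything after last_seen_id."""
    iq = ET.Element("iq", {"type": "set", "id": query_id})
    query = ET.SubElement(iq, f"{{{MAM_NS}}}query", {"queryid": query_id})
    rsm = ET.SubElement(query, f"{{{RSM_NS}}}set")
    # <after/> carries the archive ID of the last message we already have.
    ET.SubElement(rsm, f"{{{RSM_NS}}}after").text = last_seen_id
    return iq


iq = build_catchup_query("catchup1", "archive-id-of-last-message")
print(ET.tostring(iq, encoding="unicode"))
```

The server answers with the missed messages, each wrapped in a result element carrying the same queryid, so the client can tell archive results apart from live traffic.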
XMPP maintains a constant TCP connection to your server. It is a fact that an idle TCP connection (that is, a TCP connection that is not sending or receiving data) has virtually zero impact on battery life. Nonetheless, every time you do receive data, your device will wake up from a sleeping state and actually start to consume power, not only for the brief second when you receive data but also for a few seconds after that, because it will naturally assume that there is more data incoming. The simple trick to keeping a low battery profile is to exchange as little data as possible as infrequently as possible. A lot of the data that is traditionally exchanged over XMPP is not important to us if we are not actively looking at our chat app. The prime example for this is presence information. If your phone is in your pocket you really do not care who is currently online. It would be perfectly fine to receive this data only when the chat app is actually open. Yet there is other information we would like to receive instantly: usually information that triggers a notification, like messages. Fortunately XMPP has a solution to this as well, called XEP-0352: Client State Indication. That extension has been around since late 2014 and is supported by several servers as well as Conversations. The concept is extremely simple and straightforward. The client lets the server know whether it is currently open in the foreground and the user is actively interacting with it, or whether the client is not being used right now. Based on that information the server can withhold unimportant information and keep the TCP connection idle for long periods of time. (Remember, idle TCP connections don’t influence your battery life!) This approach works so well that you regularly have to scroll all the way to the bottom of the battery stats in Android to find Conversations. Sometimes the battery consumption of Conversations is even lower than that of the Google Play Services.
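What a server does with that hint is up to the server. As a rough idea, here is a Python sketch of one possible policy: while the client is inactive, presence updates are held back (only the latest one per contact is kept) and everything else is delivered right away. The class name, the tuple layout and the policy itself are assumptions for illustration and not the behaviour of any specific server.

```python
class ClientStateFilter:
    """One possible server-side policy for XEP-0352 (hypothetical sketch)."""

    def __init__(self):
        self.active = True
        self.held = {}  # latest withheld presence per contact

    def set_active(self, active):
        """Handle <active/> or <inactive/>; return stanzas to flush now."""
        self.active = active
        if not active:
            return []
        flushed = [("presence", contact, status)
                   for contact, status in self.held.items()]
        self.held.clear()
        return flushed

    def route(self, kind, sender, payload):
        """Return the stanzas that should be written to the socket now."""
        if self.active or kind != "presence":
            return [(kind, sender, payload)]
        # Inactive client: remember only the newest presence, send nothing.
        self.held[sender] = payload
        return []


csi = ClientStateFilter()
csi.set_active(False)                    # phone goes into the pocket
csi.route("presence", "alice", "away")   # withheld
csi.route("presence", "alice", "online") # replaces the earlier one
print(csi.route("message", "bob", "hi")) # delivered immediately
print(csi.set_active(True))              # app opened: latest presence arrives
```

The two presence changes cost the phone nothing while it was in the pocket, and only one of them is ever transmitted.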
Images and multiple devices
The synchronization of messages, which allows me to switch seamlessly between devices, has become completely natural to me over recent years. I got so used to chatting with someone from my desktop PC, quickly grabbing my mobile phone when I go to the kitchen, continuing the same conversation there and switching back to my computer after returning to my desk, that I cannot comprehend how competing solutions like WhatsApp or Slack are still struggling with this. The technology that powers this is called XEP-0280: Message Carbons and has been around for many years now. Several servers and clients have support for it.
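The idea behind carbons fits in a few lines. Whenever one of my devices sends or receives a message, the server hands a copy to every other device of mine that has enabled carbons, marked as either sent or received. The Python sketch below only models that fan-out decision; the function name and the resource names are invented for the example.

```python
def carbon_copies(direction, origin_resource, carbon_enabled):
    """Decide which of my other devices get a copy of a message.

    direction is 'sent' (I wrote it on origin_resource) or 'received'
    (it was delivered to origin_resource). Hypothetical helper.
    """
    if direction not in ("sent", "received"):
        raise ValueError("direction must be 'sent' or 'received'")
    return [(resource, direction)
            for resource in sorted(carbon_enabled)
            if resource != origin_resource]


# I answer from the desktop; phone and tablet get a 'sent' copy.
print(carbon_copies("sent", "desktop", {"desktop", "phone", "tablet"}))
```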
Up until recently the multi device experience was limited by the fact that file transfer in XMPP was P2P. That meant all files were sent directly between two devices. Sending a photo from my mobile phone I had to guess whether my contact wanted to see the picture on their phone or on their computer. And of course I would not have received a copy of that picture on my own computer, leaving the message history on that device incomplete.
The obvious solution, called XEP-0363: HTTP File Upload, is to upload the file to your own server and then distribute a link to that file to all participating devices. The individual clients can, and will, display that as a normal inline file in your message history. After its introduction in 2015 HTTP File Upload has seen a very fast adoption rate. Several servers and clients already have support for it.
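In practice the client first asks its server for an upload slot, which consists of a URL to PUT the file to and a URL under which it can later be fetched. The Python sketch below shows the two steps that follow, using only the standard library. The UploadSlot type, the function names and the example URLs are assumptions for illustration, and the request is only constructed here, not actually sent.

```python
import urllib.request
from dataclasses import dataclass


@dataclass
class UploadSlot:
    """What the server hands out: where to upload and where to download."""
    put_url: str
    get_url: str


def build_upload_request(slot, data, content_type):
    # The file itself travels over plain HTTP PUT, outside of XMPP.
    return urllib.request.Request(
        slot.put_url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def build_share_body(slot):
    # The chat message only carries the link; every device of every
    # participant can fetch the file from there.
    return slot.get_url


slot = UploadSlot(
    put_url="https://upload.example.org/slot123/photo.jpg",  # hypothetical
    get_url="https://files.example.org/slot123/photo.jpg",   # hypothetical
)
request = build_upload_request(slot, b"...jpeg bytes...", "image/jpeg")
print(request.get_method(), request.full_url)
print(build_share_body(slot))
```

Because the link is just a normal message, it is carbon copied and archived like any other message.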
Mobile ready encryption
Any discussion on end-to-end encryption in XMPP should begin with a reminder that in a federated system end-to-end encryption is not always necessary. When you trust your provider, encrypting the transport layer is sufficient. XMPP does that, and some clients like Conversations simply won’t even connect without encrypting the transport layer (TLS). If you actually have something to hide you can simply operate your own server and get all your friends onto it. That’s also the reason why organizations and companies that have to trust their IT department anyway usually have very little interest in end-to-end encryption. They simply don’t need it.
End-to-end encryption caters primarily to the needs of average users who don’t run their own servers and don’t fully trust their provider. Some users might also want to use end-to-end encryption in case their server gets hacked to have an extra layer of security.
In any case (if we ignore OTR, which was never made for XMPP and has always been a barely functioning hack) there are two methods for end-to-end encryption in XMPP: PGP and OMEMO. Those two methods are not competing with each other but are dealing with different requirements. Remember when we talked about XMPP letting you choose your user experience at the beginning of this article?
OMEMO has some nice crypto features, like deniability and forward secrecy, that can come in handy if you are actually being prosecuted or otherwise under attack. Unfortunately those traits prevent you from keeping a server-side archive of your messages. A new device of yours won’t have access to the messages that were sent before that device existed. (But it does not influence the ability to get the backlog of messages after being offline for a short period of time.) Here is where PGP comes in. PGP does not offer these advanced cryptographic qualities but still prevents your server operator from snooping. And in return you always have access to your own archive.
XMPP gives you the freedom to choose. Don’t care about the archive? Choose OMEMO. Care about the archive? Choose PGP.
Neither of those encryption methods will have any negative impact on your ability to synchronize messages between multiple devices, exchange images or send messages to offline contacts.
OMEMO is currently available on Conversations and as a plugin for Gajim with more clients like ChatSecure on iOS working on it. PGP has been around in the XMPP community for several years but is currently being reworked into a more modern extension called XEP-0374: OpenPGP for XMPP Instant Messaging that promises to make the onboarding easier for novice users.
An Excursus on Push
There are a lot of misconceptions regarding push that can lead to heated debates among people who fail to understand what it does and what it is used for.
So what is push exactly? Traditionally when you design a protocol that transmits any kind of information you have the choice between designing it as a push protocol or as a pull protocol. In a pull protocol you ask the server in some interval if there is new information available. POP3, the protocol we used to retrieve our email with, was a pull protocol. Your email client would, every five minutes or so, ask the server: Hey, do you have mail for me? Sometimes the server would respond with: Yes. Here is your new mail. The newer IMAP on the other hand can act as a push protocol. Client and server maintain an idle TCP connection between each other (remember, idle connections don’t consume resources) and every time there is new mail the server just pushes that mail to the client. Using pull protocols on mobile devices can have a huge impact on battery life because your application has to wake up in short intervals and ask the server for new information. Unfortunately pull protocols are much simpler to design, and especially in the mobile app market, where budgets are extremely tight, developers tend to create pull protocols. As a consequence vendors of mobile operating systems (Google, Apple, Microsoft) offered a way out. They told app developers they could still use their own badly designed pull protocols, but instead of pulling in set intervals they would receive an event from a centralized, well designed push protocol stack operated by the vendor. The only thing an app developer has to do is to notify the vendor’s server, which then notifies the user’s phone, which in turn does a pull using the developer’s pull protocol.
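A bit of back-of-the-envelope arithmetic shows why pulling hurts. The Python sketch below simply counts how often the device has to wake up per day under each model. The helper names are made up, and the three incoming messages are an arbitrary example, not a measurement.

```python
def polling_wakeups(hours, interval_minutes):
    """Wake-ups caused by asking the server every interval_minutes."""
    return (hours * 60) // interval_minutes


def push_wakeups(events):
    """With push, the device only wakes up when something actually arrives."""
    return len(events)


# Polling every five minutes for a day, versus three messages pushed.
print(polling_wakeups(24, 5))                                  # 288
print(push_wakeups(["msg from bob", "msg from alice", "msg"]))  # 3
```

Most of the 288 wake-ups in the pull case end with the server answering that there is nothing new.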
To make a long story short, if you have the money and the resources to design your own push protocol there is simply no need to use the one provided by the vendor.
However, having the choice to fall back to the vendor provided push service instead of pulling the data in set intervals did not stop unqualified or underpaid developers from still doing so. The consequences are battery drain and people blaming the phone and the phone’s vendor for the low battery life (instead of blaming the app developer who is actually responsible). But vendors, understandably, don’t like to be blamed for something they didn’t cause and step by step are limiting the potential damage a bad app can cause. Fortunately on Android you can still manually whitelist a well-behaved app. On iOS you cannot do this. Thus on iOS an app developer has to fall back to using the vendor provided push service even when their own protocol is already a well designed push protocol like XMPP. An alternative to simply restricting apps and their background activity would have been to let the user identify (and uninstall) potentially badly designed apps themselves. And to be fair, Google is constantly working on that as well and keeps improving the battery stats. But the average user probably won’t ever be looking at those.
In summary, using a push service to control your push protocol (like XMPP) is pretty much useless in itself and only becomes a requirement once the operating system blocks other push protocols. To put that in layman’s terms: on Android you don’t need the push service operated by the vendor; on iOS you do.
XMPP nowadays can interact with various push services. XEP-0357: Push Notifications specifies a way for your own XMPP server to contact the server of the app developer, which in turn contacts the push service, which then contacts your phone. The detour over the server of your app developer is a limitation imposed by the provider of the push service. Only the developer holds a private key that can wake up the app. The information pushed through that chain is of a pure “please refresh” kind that just tells your app to wake up and pull new information from the server. Neither the app developer nor the push service will see any actual content.
While unnecessary from a technical standpoint, Conversations does have support to push wake-up events through GCM (Google’s push service). This might come in handy to users who don’t feel like whitelisting the app, but it also provides a tech demo and a proof of concept for app developers on other platforms that are forced to use the vendor’s push services due to artificial restrictions of the respective platform.
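The chain described above can be sketched in a few lines of Python. The point of the sketch is what is left out: the event that travels from hop to hop identifies which app instance to wake up and nothing else. The function names, the dictionary layout and the hop labels are all invented for illustration; the XEP itself defines the real stanzas exchanged between the XMPP server and the app server.

```python
def build_wakeup_event(stanza, node):
    """Reduce an incoming stanza to a content-free wake-up event.

    node is a hypothetical identifier the app registered for this device.
    Sender and body of the stanza are deliberately not copied.
    """
    return {"node": node, "action": "please-refresh"}


def relay_chain(stanza, node):
    """Show what each hop after the user's own XMPP server gets to see."""
    event = build_wakeup_event(stanza, node)
    hops = ["app-server", "push-service", "phone"]
    return [(hop, dict(event)) for hop in hops]


message = {"from": "bob@example.org", "to": "me@example.net", "body": "secret"}
for hop, seen in relay_chain(message, "device-node-42"):
    print(hop, seen)
```

Once the phone has been woken up, the app connects to its own XMPP server and fetches the actual message over the regular connection.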
XMPP has managed to overcome issues and restrictions from its past. However as requirements shift XMPP will face many more challenges in the future. But there is no doubt that XMPP will find ways to manage those challenges as well. XMPP, after more than 16 years of existence, has already outlived many of its competitors. Surely it can outlive a few more.