Introducing the HTTP Transport Layer

Warning: this post is not about technical details. Expect high doses of abstraction, nitpickery, and wanna-be-philosophy.

The web runs on HTTP. It doesn’t run on SOAP, XMLRPC, WS-*, IPsec, and other friends from the grave, but on the shoulders of a protocol that is easy for humans to read and hard (even in 2013) for machines to parse.

At its core, though, HTTP is a transport-layer protocol: request something with a path, a method and a bunch of other headers (and maybe a blob), and you get back another bunch of headers plus a blob. This is what I call the HTTP Transport Layer. Resource-oriented application concepts like REST are layered above this transport layer, which is basically a specialized RPC scheme.

The most common implementation of the HTTP Transport Layer used today is still HTTP/1.1, but there are already competing alternatives such as SPDY (the starting point for the upcoming HTTP/2.0 work). Rising demand for better security, privacy, resistance and performance in networking is bound to lead to further advances and new network protocols.

The implications are interesting: while it could be argued that SPDY and the like should use their own URI schemes, that would not make a lot of semantic sense; spdy://google.com would still be semantically the same resource as http://google.com. Contrast this with ftp://google.com.

The difference boils down to the ambiguous definition of a Uniform Resource Identifier (URI), which can be both a Name and a Locator. Basically, it’s a name if it identifies (erm..) what I’m looking at, and it’s a locator if it gives me a clear indicator of how I would go ahead and fetch the thing. For example, a post in your application could have a URL of http://myblog.com/posts/1 (which others know how to resolve) and have a URN of myblog-post:1. The un-intuitiveness of the second example already shows how pervasive the “locator” role is in reality.

The http: URI scheme used to have purely locational character: if I wanted to fetch http://google.com, I would take that as an order to establish a TCP connection to google.com:80, and talk HTTP/1.1 over that connection.

But now I want to go to https://google.com. Again, my course of action is determined by the URI scheme: the spec says, establish a TCP connection to google.com:443, do the TLS dance and then talk HTTP/1.1 again.
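To make the "locator" reading concrete, here is a minimal sketch of how the two schemes above translate into a course of action. The names connection_plan and SCHEME_DEFAULTS are made up for this post; only the ports and the TLS step come from the description above.

```python
from urllib.parse import urlsplit

# Default port and TLS requirement for the two schemes discussed above.
# (Hypothetical table for illustration; not taken from any library.)
SCHEME_DEFAULTS = {
    "http": (80, False),
    "https": (443, True),
}


def connection_plan(uri):
    """Return (host, port, use_tls) as implied by an http: or https: URI."""
    parts = urlsplit(uri)
    if parts.scheme not in SCHEME_DEFAULTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    default_port, use_tls = SCHEME_DEFAULTS[parts.scheme]
    # An explicit port in the URI overrides the scheme's default port.
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port, use_tls


if __name__ == "__main__":
    print(connection_plan("http://google.com"))
    print(connection_plan("https://google.com"))
```

Note that the scheme alone decides the port, the TLS step and the protocol spoken afterwards. The server never gets a say, and that is exactly the "purely locational" character described above.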

It can be argued that this is a clear indicator that in the examples above, the URI is 100% a locator; even though I am fetching the same resource (“Google’s front page”) twice, there are two different things that I work with. But, as I see it, there is a little bit of semantic meaning: the https resource is actually “Google’s front page that is verified to be sent by Google”. It becomes clear that there is a breach of layers here; this resource by definition interferes with the transport layer, because what “verified to be sent by Google” actually means depends on the way I fetched the resource in the first place (and on the verification policy, of course, which is a big topic in itself).

What I am proposing is to start treating http: and https: URIs as names. http://google.com would then be defined as “a resource I can fetch from google.com at path /, using some implementation of the HTTP Transport Layer that both google.com and I know.” Note that the mechanism used is not defined, and everybody has to make sure they degrade gracefully if the other end doesn’t support their favorite transport.

The real-world implication is this: if the thing you’re developing provides an infrastructure for creating what I above described as the “HTTP Transport Layer” (and you’re not working within HTTP/1.1), then make sure you design it well enough so that clients can seamlessly upgrade to your transport layer from whatever they are currently using, which at the time of writing is most likely HTTP/1.1.

To be more specific, that means determining whether your transport is supported by a site should not cause a performance loss. HTTPS itself is a bad example here: a client has to check whether google.com:443 responds and, if it doesn’t, fall back to HTTP/1.1. It’s so bad that, in fact, nobody even tries. I personally like the way Google did it with SPDY; adding Next Protocol Negotiation (NPN) to TLS is a step in the right direction, away from magic port numbers and toward mutual, well, negotiation of the protocol to be used next. Transport layers built on top of TLS (which is definitely not the worst spot to be in) should use NPN to provide a clear discovery and upgrade path for supporting clients and servers.
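The negotiation idea itself fits in a few lines. The following is only a sketch of the selection logic, not of the actual TLS extension: it assumes the server advertises a list of protocol names and the client picks the first one it prefers, falling back to HTTP/1.1 if nothing matches. The function name select_protocol and the example lists are hypothetical.

```python
def select_protocol(server_advertised, client_preferred, fallback="http/1.1"):
    """Pick the transport both sides know, in the client's order of preference.

    server_advertised: protocol names the server says it supports.
    client_preferred:  protocol names the client supports, best first.
    fallback:          what to speak when there is no overlap at all.
    """
    for proto in client_preferred:
        if proto in server_advertised:
            return proto
    # No common transport: degrade gracefully instead of failing.
    return fallback


if __name__ == "__main__":
    # A client that would love spdy/3, talking to a server that only has spdy/2.
    print(select_protocol(["spdy/2", "http/1.1"], ["spdy/3", "spdy/2", "http/1.1"]))
    # A server that advertises nothing at all.
    print(select_protocol([], ["spdy/3", "spdy/2"]))
```

The point of the sketch is that discovery costs nothing extra: the choice happens inside a handshake that is taking place anyway, and a server that knows nothing about the new transport simply ends up speaking HTTP/1.1.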

Transitioning HTTP away from the role of a concrete protocol to the role of an abstract transport layer will open up new possibilities for the web, and will play a key role in the deployment and acceptance of new technology that will make web browsing, communication, and distributed computing more efficient, secure, and generally awesome.