A Web-Centric Approach to State Transition

People implementing systems using REST often do not know how to translate their familiar state-transition diagrams into resources. This article shows how.

Introduction

The word "state" has two meanings -- these can be somewhat confusing. The first meaning of "state" is the data associated with an object. The second is like a "step" in a multi-step conversation or computation.

The latter type of state is often encountered in networking circles. For instance, you can think of an HTTP request-response pair as passing through a sequence of states.

The transition between states is communicated by a variety of syntactic devices. But there is no way for either party to ask the other what state the conversation is in. The current state is implicit: both the client and server are expected to "just remember."

The problem arises when we extend this implicit state-transition model to applications built on top of HTTP that span multiple messages. Doing so decreases reliability and makes it more difficult to deploy intermediaries and other third-party participants.

People creating Web Services protocols from scratch (on top of SOAP or pure XML) will usually diagram the state transitions described by a protocol and then implement those transitions on the client and server side. "First the client sends this message, which will put the server in either this state or that state. Next the client may send this other message which will put the conversation in the next state. etc." What you can say and how it is interpreted depends on which of the enumerated states you are in.

This approach is inherently against the principles of REST because the meaning of a message will depend on what state the client and the server are in, and the state is implicit. This means that third parties trying to interpret the conversation also need the state transition table and can only be brought into the conversation if they are told what state the discussion is in. This implies some sort of naming scheme for transitions, which in turn implies complicated specifications like XLANG, XFDL etc. The web services world seems to be deeply wedded to implicit state-based flow control!

Let's compare this situation to the world of programming language APIs. Good programming language API designers go to great lengths to avoid having methods that work when the object is in one state but fail when it is in another state. And when it really is necessary to do that, they always make sure that the state is something you can inspect on the object. So for instance you can ask file objects if they are open or closed before you try to read from them. That way, the success of a method always depends only on observable properties of the object, not on some magic, hidden away concept of "the object's current state." The REST strategy is similar. The current state of a conversation should be represented by a resource addressable by a URI. The allowed actions can always be determined based only on the information available from that resource or from other resources it points to.
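The file-object case above can be sketched in a few lines of Python. This is only an illustration: the helper name read_if_open is made up for this article, and an in-memory stream stands in for a real file so that the sketch is self-contained.

```python
import io

# An in-memory stream stands in for a real file (an assumption for this sketch).
f = io.StringIO("What... is your name?")

def read_if_open(stream):
    """Read the stream only if its observable state says that is allowed."""
    # The caller inspects a visible property instead of remembering
    # which methods it has already called on the object.
    if stream.closed:
        return None
    return stream.read()

first = read_if_open(f)   # the stream is open, so the text is returned
f.close()
second = read_if_open(f)  # the stream reports closed, so nothing is read
```

The point is that read_if_open needs no history of earlier calls. Everything it has to know is available by asking the object.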

Programming language APIs avoid implicit state transitions because they are inherently error prone and in a sense they violate good type system practices. After all, given an object with a particular type, the methods you can call should depend more on the object's type and observable properties than on what methods you've called on it before. In the rare case where method calling order matters, it can be represented through properties and sub-objects.

Similarly, a REST web service does not use implicit state transitions. REST uses connected, hyperlinked resources to represent the current state of a discussion.

A State-shifting Example

We're going to use an extremely simple example. Hopefully its levity will reinforce rather than distract from the lesson. In order to cross the Bridge of Death, the client must answer three questions.

The server wants to collect the information one item at a time because it may choose a different second question based upon your answer to the first. That's why this problem is representative of the sorts of things that involve evolution of state and state transitions.

Let's first discuss the non-REST way to do it. The most simplistic solution would involve an XML message going in one direction that says: "QUESTION 1:..." and the answer goes back "ANSWER 1:..." and "QUESTION 2:..." and "ANSWER 2:..." etc. Third parties cannot be easily brought into this conversation because there is no record of it. It does not generate its own record as the REST/HTTP version will.

Let's discuss the REST/HTTP way to solve this problem:

-->
GET /cross_bridge 

<--
200 OK

<challenge><p>Stop! 
Who would cross the Bridge of Death must answer me 
these questions three, 
ere the other side he see. 
Do you agree?</p>
<method>POST</method> your answer in 
<uri>/sessions</uri></challenge>

We start with a basic static page. The client does a "GET" and the server responds with a document. Using content negotiation we could deliver HTML or XML or anything else, depending on the user-agent. In this case we've delivered XML. The document says that the client should POST to create a new session. We do that next:

-->
POST /sessions

<answer continue="true">Ask me the questions, bridgekeeper. I am not afraid.</answer>

<--
201 Created
Location: /sessions/42

<challenge session="/sessions/42">
<p>Very well. What... is your name?</p>
<method>PUT</method> your answer in 
<uri>/sessions/42/name</uri>
</challenge>

The resource at "/sessions/42" represents this session. At any point, either party, or any third party, may determine the state of the conversation by examining that resource. The resource "/sessions" is probably a list of all sessions. It could be useful for looking at answers that have been given previously or for mirroring the sessions on another computer. Next we must answer the first question:

-->
PUT /sessions/42/name

<answer continue="true">My name is 'Sir Launcelot of Camelot'</answer> 

<--
201 Created
Location: /sessions/42/name

...

I've elided the body of the response to this PUT for reasons that will become clear in a moment. The "Location" header tells us where we can find an answer element containing the person's name. We can check what other information we can provide by doing a GET on our main URI, "/sessions/42".

-->
GET /sessions/42

<--
200 OK
<challenge session="/sessions/42">
<name href="/sessions/42/name"/>, 
<p>what... is your quest?</p>
<method>PUT</method> your answer in 
<uri>/sessions/42/quest</uri></challenge> 

Note that the state of the conversation is completely communicated at this URI. It is clear that the service already knows my name. It is clear that it now needs an answer to a question about my quest.
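One way a server might produce that representation is to derive it, on every GET, from the answers it already holds. The following is a rough sketch under that assumption. The function render_session, the QUESTIONS table and the idea of passing in a set of answered keys are all invented here for illustration. Unlike the hand-written examples, it also puts the instruction text only inside the elements.

```python
import xml.etree.ElementTree as ET

# Question order and wording follow the example; the table layout is an assumption.
QUESTIONS = [
    ("name", "What... is your name?"),
    ("quest", "What... is your quest?"),
    ("hard_question", "What... is the air-speed velocity of an unladen swallow?"),
]

def render_session(session_uri, answered):
    """Build the <challenge> document for a session.

    `answered` is the set of question keys the server already has answers for.
    """
    challenge = ET.Element("challenge", {"session": session_uri})
    for key, text in QUESTIONS:
        if key in answered:
            # Already answered: link to the sub-resource holding the answer.
            ET.SubElement(challenge, key, {"href": f"{session_uri}/{key}"})
        else:
            # First unanswered question: say what to do next, then stop.
            ET.SubElement(challenge, "p").text = text
            ET.SubElement(challenge, "method").text = "PUT"
            ET.SubElement(challenge, "uri").text = f"{session_uri}/{key}"
            break
    return ET.tostring(challenge, encoding="unicode")

doc = render_session("/sessions/42", {"name"})
```

Because the document is computed from stored answers rather than from a remembered "step", anyone who fetches the URI sees the same state the two original parties see.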

A sufficiently link-smart schema language (e.g. Schematron or one of the semantic web languages) could require that the "name" link does point to the document types it is supposed to. The client and server web services toolkits could enforce this rule.

Our protocol is a little bit inefficient because we have an essentially empty response from the PUT and then we have to do another GET to get the real data. That adds network latency to the system. There are two ways to handle this. One is to use HTTP 1.1 pipelining. The simpler way is to return the representation of "/sessions/42" in the response from the PUT. HTTP is very flexible about what the response to a PUT should mean. Returning a representation of the current state of the transaction is perfectly acceptable.

That's why I elided the body of the response to the previous PUT: to avoid confusing things. Really, it could have been the <challenge> element. In the next exchange I'll return the full <challenge> element in the response to the PUT. But remember that the Location header gives the location where the PUT resource went. It is not the location of the resource holding the <challenge> element.

-->
PUT /sessions/42/quest

<answer continue="true">To seek the Holy Grail.</answer> 

<--
201 Created
Location: /sessions/42/quest

<challenge session="/sessions/42">
<name href="/sessions/42/name"/>,
<quest href="/sessions/42/quest"/>
<p>What... is the air-speed velocity of an unladen swallow?</p>
<method>PUT</method> your answer in 
<uri>/sessions/42/hard_question</uri></challenge> 

This time we returned the full challenge to avoid an extra GET. You could still do the GET and find the same information, but you do not have to. Please keep in mind what I said about the Location. It is the location of the resource that was created from the PUT. If you wanted to GET the resource containing the <challenge> element, you would still do that at the "/sessions/42" URI.

One last round-trip:

-->
PUT /sessions/42/hard_question

<answer continue="confused">Err....What do you mean? An African or European swallow?</answer> 

<--
406 Not Acceptable

<p>Answer not correct!</p>

And we're done (unsuccessfully in this case).

A major advantage of making the conversation's state explicit at the "/sessions/42" URI is the ability to integrate third parties. A third party could be brought in at any point to complete the transaction. For instance we could have brought in a service that knows about the air-speed of unladen swallows at the appropriate point.

Other interesting third parties would include loggers and auditors. One of these third parties could be handed a single URI and it could walk from one resource to the next gathering information. A logger would simply download everything, much like a recursive "wget" or an IE "Save Web Page". An auditor would look at each URI and validate it against schemas and business rules. Business rules can be expressed in terms of the state of the transaction at a particular moment rather than in terms of the current state of the transaction.

In general, any third party can be apprised of the entire transaction through a single URI. In many cases the third parties do not even have to understand the semantics of your particular application. For instance a logger could stupidly follow the links without knowing what the XML elements mean. An XML Schema validator could follow the links and validate based upon namespaces without knowing what the element types mean. One issue is that it must be possible to recognize links, so it might be advisable to use XLink rather than ad hoc link attributes and elements.
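A "stupid" logger of this kind might look roughly like the sketch below. It assumes that links are always carried in an attribute named href, as in the examples in this article. The RESOURCES dictionary is a hypothetical snapshot standing in for real HTTP GETs, and the function name crawl is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot of the server's resources, keyed by URI.
RESOURCES = {
    "/sessions/42": '<challenge session="/sessions/42">'
                    '<name href="/sessions/42/name"/>'
                    '<quest href="/sessions/42/quest"/></challenge>',
    "/sessions/42/name": "<answer>My name is 'Sir Launcelot of Camelot'</answer>",
    "/sessions/42/quest": "<answer>To seek the Holy Grail.</answer>",
}

def crawl(start_uri, fetch):
    """Collect every document reachable from start_uri through href attributes."""
    seen = {}
    pending = [start_uri]
    while pending:
        uri = pending.pop()
        if uri in seen:
            continue  # already logged; also protects against link cycles
        body = fetch(uri)
        seen[uri] = body
        # The logger never looks at element names, only at href attributes.
        for element in ET.fromstring(body).iter():
            href = element.get("href")
            if href is not None:
                pending.append(href)
    return seen

log = crawl("/sessions/42", RESOURCES.__getitem__)
```

Swapping the fetch argument for a function that performs a real GET would turn this into a working logger, and nothing in it depends on what a "quest" is.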

Another nice thing is that if it makes sense for the service, the client could go back and change its existing answers merely by PUT-ting to the appropriate URIs. If you change your mind about your quest, just go in and change the data! The server is always in control of data in its namespace, however, so it could also make that URI read-only.

As usual with REST services, one benefit is that it would be trivial to make the service so that it works both with HTML-based web browsers and XML-based web services clients.

Also note that the client participant can choose to keep track of its progress through a locally maintained state table, but it could also choose not to do so. In other words we could simplify the construction of the client by having it just answer the questions presented to it rather than having it keep track of how many questions it has already been asked.
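A client built this way only has to read the next instruction out of whatever <challenge> document it was last given. Here is a minimal sketch, assuming the element names used in this article's examples. The function next_action is a name I made up, and the sample document is a well-formed version of the one shown earlier.

```python
import xml.etree.ElementTree as ET

def next_action(challenge_xml):
    """Return (method, uri, question) as read from a <challenge> document."""
    root = ET.fromstring(challenge_xml)
    # The client keeps no counter of questions asked so far;
    # the document itself says what to do next and where.
    method = root.findtext("method")
    uri = root.findtext("uri")
    question = root.findtext("p")
    return method, uri, question

doc = """<challenge session="/sessions/42">
<name href="/sessions/42/name"/>,
<p>what... is your quest?</p>
<method>PUT</method> your answer in
<uri>/sessions/42/quest</uri></challenge>"""

method, uri, question = next_action(doc)
```

If the server decides to ask a different second question, this client does not need to change.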

Who Owns the Resources?

It may make sense in some circumstances for the client to submit not data to be incorporated into new resources, but the URI of an existing resource. For instance, if the person's quest changes once per hour, and the person is engaged in thousands of "bridge crossing transactions", it would be difficult to go around updating the data at those URIs constantly. It might make more sense for the data to live on a single server and have references to that one instance of the data everywhere else. This is trivial to do thanks to the power of URIs. Instead of PUT-ting a constant string in an XML element, you would just PUT a hyperlinking element.

PUT /sessions/42/name

<answer href="http://mysite.com/myname"/>
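
On the receiving side, a reader of the answer then has to cope with both forms. The sketch below shows one way that might be done. It assumes that the linked resource is itself an <answer> document, which the example above does not actually specify. The function resolve_answer and the remote dictionary (standing in for a real GET) are hypothetical.

```python
import xml.etree.ElementTree as ET

def resolve_answer(answer_xml, fetch):
    """Return the answer text, following an href if the element is only a link."""
    element = ET.fromstring(answer_xml)
    href = element.get("href")
    if href is None:
        # Inline form: the data lives in this resource.
        return element.text
    # Linked form: the data lives elsewhere and is assumed to be an <answer> too.
    return resolve_answer(fetch(href), fetch)

# Stand-in for the remote server that owns the name.
remote = {"http://mysite.com/myname": "<answer>Sir Launcelot of Camelot</answer>"}

inline = resolve_answer("<answer>To seek the Holy Grail.</answer>", remote.__getitem__)
linked = resolve_answer('<answer href="http://mysite.com/myname"/>', remote.__getitem__)
```

When the owner updates the data at its own URI, every session that links to it sees the new value on the next fetch.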

A Stateless Example

We could also take a completely different approach to this service. We could send all of the state back and forth with every message. Here is what such a conversation might look like:

-->
GET /cross_bridge

<-- 
200 OK
<challenge><p>Stop! 
Who would cross the Bridge of Death must answer me 
these questions three, 
ere the other side he see. 
Do you agree?</p>
If so, <method>GET</method> your next question from 
<uri>/questions/name_question</uri></challenge>

-->
GET /questions/name_question

<--
200 OK

<challenge>
<p>Very well. What... is your name?</p>
<method>GET</method> the next question by combining your answer
with this URI: <uri>/questions/quest_question</uri>
</challenge>

-->
GET /questions/quest_question?name=Sir+Launcelot+of+Camelot

<--
200 OK

<challenge>
<p>What... is your quest?</p>
<method>GET</method> the next question by combining your answer with this URI: 
<uri>/questions/hard_question?name=Sir+Launcelot+of+Camelot</uri></challenge> 

-->
GET /questions/hard_question?name=Sir+Launcelot+of+Camelot
		&quest=To+Seek+The+Holy+Grail

<--
200 OK

<challenge>
<p>What... is the air-speed velocity of an unladen swallow?</p>
<method>GET</method> the next question by combining your answer with 
<uri>/questions/hard_question?name=Sir+Launcelot+of+Camelot
			&quest=To+Seek+The+Holy+Grail</uri></challenge> 

-->
GET /questions/hard_question?name=Sir+Launcelot+of+Camelot
	&quest=To+Seek+The+Holy+Grail&air_speed=unknown

<--
406 Not Acceptable

<p>Answer not correct!</p>

This strategy is nice because the server does not have to store anything between requests. Basically the question is whether you want the created resources to be persistent objects or transient responses to questions.
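On the client side, "combining your answer with this URI" amounts to appending one more query parameter to the URI the server handed back. A small sketch using Python's standard urllib.parse module follows. The function name add_answer is made up, and the parameter names simply mirror the ones in the exchange above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def add_answer(next_uri, key, value):
    """Append one answer to the query string of the URI the server handed back."""
    parts = urlsplit(next_uri)
    # Keep whatever state the URI already carries, then add the new answer.
    params = parse_qsl(parts.query)
    params.append((key, value))
    return parts.path + "?" + urlencode(params)

step1 = add_answer("/questions/quest_question", "name", "Sir Launcelot of Camelot")
step2 = add_answer(
    "/questions/hard_question?name=Sir+Launcelot+of+Camelot",
    "quest",
    "To Seek The Holy Grail",
)
```

Each URI produced this way carries the whole conversation so far, so it can be handed to a third party just like "/sessions/42" in the earlier example.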

Lessons about REST state management

REST basically advises you to choose between these two strategies: send all of the state back and forth (in this case using GET and URIs) or capture parts of the state as resources with URIs and transmit links to that state back and forth. What REST advises against is state without a URI name. Cookies are an example of un-REST-like data. Cookies are implicit, so third parties have no access to them and even the client application will not typically have any way to know what cookies were in effect when a particular message was created hours ago. REST stands for REpresentational State Transfer: the constant transfer of state (or references to state) back and forth as representations is key to its design.