AtomServer, An Introduction


Chris Berry, Bryon Jacob. Updated 05/01/08

Introduction

AtomServer is a normalized, universal data store, implemented as a RESTful web service and designed as an Atom Store. AtomServer builds on the following concepts and protocols, each explained in detail below: REST, Atom and the Atom Publishing Protocol, and the OpenSearch and Feed Paging specifications.
Parts of this document were culled from books and the Internet; these sources are acknowledged below.

REST

There is nothing all that remarkable about REST (Representational State Transfer). It is not some amazing new technology. REST is really just the realization that every web application (every web site) is, in fact, a service, and that these "web services" can scale to enormous size while delivering unparalleled loose coupling between the client and the service. It is probably more remarkable that it took us so long to apply the principles of the web to distributed computing. But old habits die hard.

It is important to understand that REST is a design pattern, not a technology like SOAP or HTTP. Rather, REST is a technique for decomposing an application into smaller parts so that the whole works better in a distributed setting. REST is a proven design pattern for building loosely coupled, highly scalable applications, and there are important benefits to sticking to it.

The basic technologies of REST are the basic technologies of the web: the HTTP protocol, the URI naming standard, and the XML markup language. It is the simplicity of these combined technologies that gives REST its unprecedented power. REST is not collapsing under the complicated morass of protocols and standards (SOAP, WSDL, XSD, WS-*, ...) that are smothering "Big Web Services". This quote from John Gall sums it up perfectly:

"A complex system that works is invariably found to have evolved from a simple system that worked."

So why is it called Representational State Transfer? The explanation is as follows: the web is composed of resources. A resource is any item of interest identified by some URL. For example, a website may define a resource for a particular Order with, say, this URL:

    http://www.foo.com/orders/12345

When the client accesses that URL, a Representation of the resource is returned (e.g., 12345.en.xml). The representation places the client application in a State. If the client traverses some other hyperlink embedded in 12345.en.xml, another resource is accessed, and the new representation places the client application into yet another state. Thus the client application changes (Transfers) state with each resource representation, which yields the term Representational State Transfer.
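The state transfer described above can be sketched with a toy, in-memory model in Python. The dictionary standing in for the web, its URIs (other than the order URL above), and the `get` and `follow` helpers are hypothetical; a real client would issue HTTP GETs rather than look up a dictionary. What it shows is that the client learns where it can go next only from the links in the representation it currently holds.

```python
# Toy model of REST state transfer: a dictionary stands in for the web.
# Each URI maps to a representation: some data plus hyperlinks to other resources.
# All URIs besides the order URL, and all link names, are hypothetical.
WEB = {
    "http://www.foo.com/orders/12345": {
        "data": {"order": 12345, "status": "shipped"},
        "links": {"customer": "http://www.foo.com/customers/42"},
    },
    "http://www.foo.com/customers/42": {
        "data": {"customer": 42, "name": "Example Customer"},
        "links": {"orders": "http://www.foo.com/orders/12345"},
    },
}


def get(uri):
    """Stand-in for an HTTP GET: return the representation of a resource."""
    return WEB[uri]


def follow(start_uri, link_names):
    """Start at one resource and follow named links, recording each state visited."""
    representation = get(start_uri)
    states = [representation["data"]]
    for name in link_names:
        # The next URI comes from the current representation, not from client-side knowledge.
        next_uri = representation["links"][name]
        representation = get(next_uri)
        states.append(representation["data"])
    return states


history = follow("http://www.foo.com/orders/12345", ["customer", "orders"])
for state in history:
    print(state)
```

Each entry in `history` is one application state; moving between them happens only by following a link.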

The fundamental principles of the REST design pattern are:

Atom

Fundamentally, Atom is an XML vocabulary for describing lists of timestamped entries. These entries can be anything, although because Atom was originally conceived to replace RSS (Rich Site Summary), the entries often contain human-authored text, such as weblog posts or news stories. Thus, the internal structure of an Atom entry (i.e., its XML elements and attributes) conveys the semantics of publishing: authors, languages, titles, and so on.

In fact, the entire idea behind web syndication (i.e., RSS and Atom) is lifted from the world of publishing. In the world of newspapers, a syndicate (e.g., the Associated Press) distributes information to subscribers, allowing each publication to tailor the content of the information it receives. Comics, news stories, and opinion columns are often distributed by syndicates, providing greater exposure for the authors and more content for the readers. Web syndication works the same way: web content is distributed to subscribers (as feeds) who tailor it to their needs, often aggregating it and providing it (as aggregated feeds) to further downstream subscribers.

Atom, like RSS, provides the basis for a web syndication framework. The Atom Publishing Protocol (APP) leverages the work done on the Atom Syndication Format and the basics of HTTP to form a simple, yet powerful, publishing protocol. APP is useful because so many web services are, in a broad sense, ways of publishing information. Furthermore, there is a large number of web service clients out there that understand the semantics of Atom documents. This includes both browser-based clients and reusable clients written in practically every computer language you might imagine.

Atom lists are feeds, and the items in the lists are entries. Atom is a RESTful protocol. These days, most weblogs, wikis, news services, etc. expose a special resource whose representation is an Atom feed. The entries in the feed describe and link to other resources: weblog or wiki entries, or news stories published on the site. The client can consume these resources with a feed reader or some other external program. An example Atom feed might look something like this:

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.com/MyBlog"/>
  <updated>2007-04-14T20:00:39Z</updated>
  <author><name>Chris Berry</name></author>
  <title>My Weblog</title>
  <id>urn:uuid:1aaabbbb-2ccc-3ddd-4567-ffffffffffff</id>
  <entry>
    <title>First Post</title>
    <link rel="edit" href="http://example.com/MyBlog/1234"/>
    <updated>2007-04-14T20:00:39Z</updated>
    <id>urn:uuid:3bbbcccc-2ccc-1ddd-1234-ffffffffffff</id>
    <summary>
      Details about my summer vacation
    </summary>
  </entry>
</feed>

In this example you can see some of the tags that convey the semantics of publishing: author, title, link, updated, and so on. The feed as a whole contains an author, and since the entry does not, the entry inherits the author information from the feed. The feed's link presumably points to an alternate representation of the underlying resource, such as the weblog's HTML home page. The entry also has a link, which identifies the entry as a resource in its own right. The entry contains a summary, which a feed reader would most likely show to the user. Presumably, to get the full blog post, the user must subsequently GET the entry's URI.
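A minimal Python sketch of how a feed reader might consume a feed like this, using the standard library's ElementTree. It embeds its own copy of the example feed, written against the Atom 1.0 namespace (http://www.w3.org/2005/Atom) with the author wrapped in a `name` element, as RFC 4287 requires; the `author_names` and `summarize` helpers are hypothetical. Note how an entry with no author of its own falls back to the feed's author.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

FEED_XML = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.com/MyBlog"/>
  <updated>2007-04-14T20:00:39Z</updated>
  <author><name>Chris Berry</name></author>
  <title>My Weblog</title>
  <id>urn:uuid:1aaabbbb-2ccc-3ddd-4567-ffffffffffff</id>
  <entry>
    <title>First Post</title>
    <link rel="edit" href="http://example.com/MyBlog/1234"/>
    <updated>2007-04-14T20:00:39Z</updated>
    <id>urn:uuid:3bbbcccc-2ccc-1ddd-1234-ffffffffffff</id>
    <summary>Details about my summer vacation</summary>
  </entry>
</feed>"""


def author_names(element):
    """Return the author names declared directly on a feed or entry element."""
    return [a.findtext(ATOM + "name") for a in element.findall(ATOM + "author")]


def summarize(feed_xml):
    """Extract what a simple feed reader would display for each entry."""
    feed = ET.fromstring(feed_xml)
    feed_authors = author_names(feed)
    results = []
    for entry in feed.findall(ATOM + "entry"):
        # An entry without its own author inherits the feed's author(s).
        authors = author_names(entry) or feed_authors
        edit_link = next(
            (link.get("href") for link in entry.findall(ATOM + "link")
             if link.get("rel") == "edit"),
            None,
        )
        results.append({
            "title": entry.findtext(ATOM + "title"),
            "authors": authors,
            "edit": edit_link,
            "summary": (entry.findtext(ATOM + "summary") or "").strip(),
        })
    return results


for item in summarize(FEED_XML):
    print(item)
```

For brevity the sketch ignores the other inheritance path RFC 4287 allows, namely authors taken from an entry's `source` element.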

An Atom document is basically a listing of published resources. You can use Atom to represent practically any published resource: a set of purchase orders, images, a list of search results, whatever. Or you can omit the link element in the entry and use Atom as a container for the original content itself, carried in the entry's content element.
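To illustrate the container case, here is a small Python sketch that wraps a purchase order directly in an entry's content element instead of linking to it. It assumes RFC 4287's rule that content with an XML media type such as application/xml holds a single child element; the purchase-order namespace, its element names, and the `order_entry` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
PO_NS = "http://example.com/ns/purchase-order"  # hypothetical payload namespace


def order_entry(order_id, updated, items):
    """Wrap a purchase order as the inline content of an Atom entry."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"http://www.foo.com/orders/{order_id}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"Order {order_id}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # type="application/xml": the content element carries exactly one XML child.
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "application/xml"})
    order = ET.SubElement(content, f"{{{PO_NS}}}order", {"id": str(order_id)})
    for sku, quantity in items:
        ET.SubElement(order, f"{{{PO_NS}}}item", {"sku": sku, "quantity": str(quantity)})
    return entry


entry = order_entry(12345, "2007-04-14T20:00:39Z", [("WIDGET-1", 2), ("GADGET-7", 1)])
order = entry.find(f"{{{ATOM_NS}}}content/{{{PO_NS}}}order")
print(order.get("id"), len(order))  # 12345 2
```

A generic Atom client still sees an ordinary entry (id, title, updated); only clients that know the purchase-order vocabulary look inside the content.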

APP is all about pushing around Atom entries. It is important to note that entries, like feeds, are first-class citizens within the protocol. Each entry has a corresponding representation, and thus each entry has a corresponding URI that identifies it.

Atom is, by definition, RESTful. Do an HTTP GET on that URI to retrieve an entry representing the underlying data; PUT a new entry to the URI to update the represented data; DELETE that URI and the represented data is deleted. The entries that are used to represent the data are grouped together in a collection. That, too, is a resource and has its own URI: POSTing a new entry to the collection's URI creates a new member, and the server responds with the URI it assigned to that member.
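Here is a minimal in-memory sketch of this uniform interface, assuming the RFC 5023 conventions: POST to the collection URI creates a member and answers 201 Created with a Location header, while GET, PUT, and DELETE act on the member URI. The `AtomCollection` class, its `handle` method, and the URIs are hypothetical, and entries are plain strings rather than parsed XML to keep it short.

```python
import itertools


class AtomCollection:
    """Hypothetical in-memory APP collection: maps member URIs to entry documents."""

    def __init__(self, uri):
        self.uri = uri
        self.members = {}
        self._ids = itertools.count(1)

    def handle(self, method, uri, body=None):
        """Dispatch an HTTP-like request and return (status, headers, body)."""
        if uri == self.uri:
            if method == "POST":
                # Creating a member: the server, not the client, picks the new URI.
                member_uri = f"{self.uri}/{next(self._ids)}"
                self.members[member_uri] = body
                return 201, {"Location": member_uri}, body
            if method == "GET":
                # The collection is a resource too; a real server would return a feed.
                return 200, {}, list(self.members.values())
            return 405, {}, None
        if uri not in self.members:
            return 404, {}, None
        if method == "GET":
            return 200, {}, self.members[uri]
        if method == "PUT":
            self.members[uri] = body  # replace the represented data
            return 200, {}, body
        if method == "DELETE":
            del self.members[uri]
            return 204, {}, None
        return 405, {}, None


blog = AtomCollection("http://example.com/MyBlog")
status, headers, _ = blog.handle("POST", blog.uri, "<entry>First Post</entry>")
entry_uri = headers["Location"]
print(status, entry_uri)                    # 201 http://example.com/MyBlog/1
blog.handle("PUT", entry_uri, "<entry>First Post (edited)</entry>")
print(blog.handle("GET", entry_uri)[2])     # <entry>First Post (edited)</entry>
print(blog.handle("DELETE", entry_uri)[0])  # 204
print(blog.handle("GET", entry_uri)[0])     # 404
```

The same four verbs work on every member, which is what lets a generic APP client manage any collection without service-specific code.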

Atom feeds and entries must contain certain elements. For example, every feed and entry carries an updated element. Together with the standard HTTP conditional-request headers (If-Modified-Since with a date, or If-None-Match with an ETag), it provides a mechanism for returning only the entries that have changed since the client last asked. (If-Unmodified-Since, or If-Match with an ETag, plays the complementary role for updates, guarding a PUT against overwriting someone else's change.) This can yield a significant performance improvement. Likewise, Atom can make use of the standard HTTP cache headers to provide further performance enhancements.
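A minimal server-side sketch of a conditional feed request: the client sends If-Modified-Since, and the server returns only entries whose updated time is newer, or 304 Not Modified if there are none. The `changed_entries` function and the entry tuples are hypothetical; the date handling uses Python's standard `email.utils` helpers. HTTP dates have one-second resolution, so a real server must also decide how to treat entries updated within the same second. An ETag-based variant would compare an opaque version token sent in If-None-Match instead of a date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical store contents: (entry id, time of last update in UTC).
ENTRIES = [
    ("urn:uuid:entry-1", datetime(2007, 4, 14, 20, 0, 39, tzinfo=timezone.utc)),
    ("urn:uuid:entry-2", datetime(2007, 4, 15, 9, 30, 0, tzinfo=timezone.utc)),
    ("urn:uuid:entry-3", datetime(2007, 4, 16, 12, 0, 0, tzinfo=timezone.utc)),
]


def changed_entries(entries, if_modified_since=None):
    """Return (status, headers, entries) for a possibly conditional feed GET."""
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        entries = [e for e in entries if e[1] > since]
        if not entries:
            return 304, {}, []  # Not Modified: nothing new to send
    headers = {}
    if entries:
        newest = max(updated for _, updated in entries)
        headers["Last-Modified"] = format_datetime(newest, usegmt=True)
    return 200, headers, entries


status, headers, changed = changed_entries(ENTRIES, "Sun, 15 Apr 2007 00:00:00 GMT")
print(status, [entry_id for entry_id, _ in changed])
# 200 ['urn:uuid:entry-2', 'urn:uuid:entry-3']
print(headers["Last-Modified"])
# Mon, 16 Apr 2007 12:00:00 GMT
print(changed_entries(ENTRIES, headers["Last-Modified"])[0])
# 304
```

The client simply echoes back the Last-Modified value it received, so the server never needs to remember what each client has already seen.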

If an application doesn't quite fit the Atom schema, it is possible to embed XML elements from other namespaces in an Atom feed. It is even possible to define a custom namespace and embed its elements in your Atom feeds. Clients that do not understand your special elements will see a normal Atom feed with some mysterious data in it, which they must not treat as an error and may simply ignore.
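A short Python sketch of how this plays out for two kinds of clients, using ElementTree. The `inv` namespace URI (http://example.com/ns/inventory), its `stockLevel` element, and both reader functions are hypothetical. The generic reader looks only for Atom elements and never notices the extension; the extension-aware reader picks it up from the same document.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
INV = "{http://example.com/ns/inventory}"  # hypothetical extension namespace

ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:inv="http://example.com/ns/inventory">
  <id>urn:uuid:3bbbcccc-2ccc-1ddd-1234-ffffffffffff</id>
  <title>Widget</title>
  <updated>2007-04-14T20:00:39Z</updated>
  <author><name>Example Author</name></author>
  <inv:stockLevel>17</inv:stockLevel>
</entry>"""


def generic_reader(entry_xml):
    """Reads only the Atom elements it knows; foreign markup is skipped without error."""
    entry = ET.fromstring(entry_xml)
    return {
        "id": entry.findtext(ATOM + "id"),
        "title": entry.findtext(ATOM + "title"),
    }


def inventory_reader(entry_xml):
    """An extension-aware client reads the same entry plus inv:stockLevel."""
    entry = ET.fromstring(entry_xml)
    data = generic_reader(entry_xml)
    stock = entry.findtext(INV + "stockLevel")
    data["stock_level"] = int(stock) if stock is not None else None
    return data


print(generic_reader(ENTRY_XML))
print(inventory_reader(ENTRY_XML))
```

Because the extension lives in its own namespace, adding it changes nothing for existing clients.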

Atom is an important addition to a RESTful architecture. It provides a standard for both program control and error processing. By definition, a client knows exactly how to interact with Atom and what will happen (error codes, etc.) when things go wrong. This makes it easy to write generic clients, and to mash together disparate Atom feeds into something greater than the individual feeds alone.

Atom also provides a mechanism for assigning categories to entries. This is a very powerful concept: it essentially allows clients to extend the original content, making it much richer, yet leaving it untouched.
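A small ElementTree sketch of that idea: a client tags an existing entry by appending `category` elements (each with the `term` attribute RFC 4287 requires and an optional `scheme`), while the entry's original content stays exactly as it was. The `add_category` helper and the scheme URI are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{" + ATOM_NS + "}"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace

ENTRY_XML = f"""<entry xmlns="{ATOM_NS}">
  <id>urn:uuid:3bbbcccc-2ccc-1ddd-1234-ffffffffffff</id>
  <title>First Post</title>
  <updated>2007-04-14T20:00:39Z</updated>
  <content type="text">Details about my summer vacation</content>
</entry>"""


def add_category(entry, term, scheme=None):
    """Append a category element to the entry; no existing element is modified."""
    attrs = {"term": term}
    if scheme is not None:
        attrs["scheme"] = scheme
    return ET.SubElement(entry, ATOM + "category", attrs)


entry = ET.fromstring(ENTRY_XML)
add_category(entry, "travel", scheme="http://example.com/schemes/topics")
add_category(entry, "summer")

terms = [c.get("term") for c in entry.findall(ATOM + "category")]
print(terms)                             # ['travel', 'summer']
print(entry.findtext(ATOM + "content"))  # Details about my summer vacation
print(ET.tostring(entry, encoding="unicode"))
```

Categories are additive metadata, so different clients can layer their own schemes onto the same entry without coordinating with the original author.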

Additional Paging Specs

When a client requests a feed of all entries that have changed since some particular date, it is essentially doing a time-sensitive search, where the search parameter is the If-Modified-Since date. It is not hard to imagine that a feed request like this might produce a huge set of results. In order to save bandwidth, and to avoid overwhelming the client with possibly irrelevant data, it is common to divide large feeds into successive "pages", giving the user the ability to chain through the pages as they wish (as Google does with its search results).

Atom does not address the problem of "paging", so a couple of Internet specifications have emerged. OpenSearch is one XML vocabulary that's sometimes embedded in Atom documents. OpenSearch is a Creative Commons-licensed specification created by A9.com, an Amazon subsidiary, in 2005. OpenSearch defines a RESTful protocol for searching, including a format for advertising what kind of search your site supports, and specifies how to return your search results in Atom or RSS. An OpenSearch-enabled web service returns the results of a query as an Atom feed, with the individual results represented as Atom entries.

Some aspects of a list of search results cannot be represented in a standard Atom feed, so OpenSearch defines three new elements in the opensearch namespace: totalResults (the number of results the search matched), startIndex (the position of the first result on this page), and itemsPerPage (the number of results per page).

Unfortunately, OpenSearch does not address the paging problem well for time-based data (i.e., If-Modified-Since). Because this data is not static, startIndex is not sufficient for determining the next page of results, particularly when the server does not want to maintain state for every individual search. Another specification, Feed Paging (RFC 5005), addresses the problem more generically by adding next and previous links, which the client can use to chain through pages of results.
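To see why a bare startIndex breaks down for time-ordered data, the Python sketch below compares two approaches over a toy in-memory change list. The helper names and the (updated, id) cursor are assumptions for illustration, not part of either specification or of AtomServer. Index paging recounts positions from the start on every request, so an entry updated between two requests shifts everything after it and another entry is silently skipped. A cursor instead records the last (updated, id) pair already delivered, so the server can build the next page without keeping any per-search state.

```python
def ordered(store):
    """Entries as (updated, entry_id) pairs, oldest change first."""
    return sorted((updated, entry_id) for entry_id, updated in store.items())


def page_by_index(store, start_index, items_per_page):
    """OpenSearch-style paging: positions are recomputed on every request."""
    rows = ordered(store)
    return [entry_id for _, entry_id in rows[start_index:start_index + items_per_page]]


def page_by_cursor(store, after, items_per_page):
    """Cursor paging: entries strictly after `after`, plus the cursor for a next link."""
    rows = [row for row in ordered(store) if after is None or row > after]
    page = rows[:items_per_page]
    next_cursor = page[-1] if len(rows) > items_per_page else None
    return [entry_id for _, entry_id in page], next_cursor


def index_demo():
    store = {"a": 1, "b": 2, "c": 3, "d": 4}  # entry id -> hypothetical update time
    seen = page_by_index(store, 0, 2)
    store["a"] = 5                            # "a" changes between page requests
    seen += page_by_index(store, 2, 2)
    return seen


def cursor_demo():
    store = {"a": 1, "b": 2, "c": 3, "d": 4}
    page, cursor = page_by_cursor(store, None, 2)
    seen = list(page)
    store["a"] = 5                            # the same change between requests
    while cursor is not None:                 # no cursor means no rel="next" link
        page, cursor = page_by_cursor(store, cursor, 2)
        seen += page
    return seen


print(index_demo())   # ['a', 'b', 'd', 'a']: "c" is never delivered
print(cursor_demo())  # ['a', 'b', 'c', 'd', 'a']: every change is delivered
```

In a real feed the cursor would be encoded into the next link's URL (for example as a hypothetical `?after=...` query parameter), so everything the server needs to produce the following page arrives with the request itself.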

The Atom Store

In the last couple of years these two ideas, the Atom Publishing Protocol and the additional paging specs, have started to show up together. The most important and seminal example is GData, Google's API for managing data on the web.

Atom already provides the capability to return an Atom entry for each dataset you choose to represent on your site, so returning a bunch of them in the form of a feed, in response to an OpenSearch request, isn't such a great leap. When we stop limiting Atom entries to web content like weblogs, and expand APP to the management of general data, we arrive at an Atom Store: a generic, interlinked blob of Atom entries, which you can edit using APP and then search using OpenSearch.

One huge advantage of an Atom Store is that it's built on top of RESTful services. That means we get the advantages of REST: caching, uniform interfaces, and hypermedia as the engine of application state. For both OpenSearch and Atom there is a self-describing XML document (the OpenSearch description document and the APP service document) that describes the capabilities of each endpoint. That allows another service to come along and wrap several Atom Stores together by reading those description documents and then presenting itself as an Atom Store, an aggregate of all the stores it uses. This aggregate store might be a melange of your disparate data, or, on the other hand, it might be a uniform series of servers, each holding a subset of a huge store, yielding, in turn, a monster database.
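A minimal Python sketch of the aggregation idea: an aggregate store fans a query out to several underlying stores and merges their results into one list ordered by updated time, newest first. The `Entry`, `ShardStore`, and `AggregateStore` classes and their `search` method are hypothetical stand-ins; a real aggregator would discover each store's capabilities from its service and OpenSearch description documents and fetch results over HTTP.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: str
    updated: str  # same fixed UTC format everywhere, so string order is time order
    title: str


class ShardStore:
    """Stands in for one Atom Store holding a subset of the data."""

    def __init__(self, entries):
        self.entries = entries

    def search(self, term):
        # Each store returns its own matches, newest first.
        hits = [e for e in self.entries if term.lower() in e.title.lower()]
        return sorted(hits, key=lambda e: e.updated, reverse=True)


class AggregateStore:
    """Presents several stores as one: same search interface, merged results."""

    def __init__(self, stores):
        self.stores = stores

    def search(self, term, limit=10):
        per_store = [store.search(term) for store in self.stores]
        # Every input list is already newest-first, so a k-way merge is enough.
        merged = heapq.merge(*per_store, key=lambda e: e.updated, reverse=True)
        return list(itertools.islice(merged, limit))


east = ShardStore([
    Entry("urn:uuid:e1", "2007-04-14T20:00:39Z", "Order 12345"),
    Entry("urn:uuid:e2", "2007-04-16T08:00:00Z", "Order 12350"),
])
west = ShardStore([
    Entry("urn:uuid:w1", "2007-04-15T12:00:00Z", "Order 12347"),
    Entry("urn:uuid:w2", "2007-04-13T09:00:00Z", "Invoice 77"),
])
everything = AggregateStore([east, west])
print([e.entry_id for e in everything.search("order")])
# ['urn:uuid:e2', 'urn:uuid:w1', 'urn:uuid:e1']
```

Because `AggregateStore` exposes the same `search` method as the stores it wraps, it can itself be wrapped by another aggregator, which is the layering the paragraph above describes.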

Acknowledgments

Giving credit where it is due, parts of this document were culled from the following sources: