Chris Berry, Bryon Jacob. Updated 05/01/08
Introduction
AtomServer is a normalized, universal data store, implemented as a RESTful web service and designed as an Atom Store. AtomServer is based on the following concepts and protocols, which are explained in detail below.
Parts of this document were culled from books and the Internet. These sources are acknowledged below.
REST
There is nothing all that remarkable about REST (Representational State Transfer). It is not some amazing new technology. REST is really just the realization that every web application, every web site, is in fact a service, and that these "web services" can scale to enormous size while keeping the client and the service very loosely coupled. It is probably more remarkable that it took us so long to apply the principles of the web to distributed computing. But old habits die hard.
It is important to understand that REST is a design pattern. It is not a technology like SOAP or HTTP. Instead, REST is a technique for decomposing an application into smaller parts, so that the whole works better in a distributed setting. REST is a proven design pattern for building loosely coupled, highly scalable applications. There are important benefits to sticking to the REST design pattern:
- Simple. REST is incredibly simple to define. There are just a handful of principles and well defined semantics associated with it.
- Scalable. REST leads to a very scalable solution by promoting a stateless protocol and allowing state to be distributed across the web.
- Layered. REST allows any number of intermediaries, such as proxies, gateways, and firewalls. Similarly, because a REST service is ultimately just a web site, albeit one that adheres to a design pattern, one can easily layer on aspects such as security and compression on an as-needed basis, as one would with any web site.
"A complex system that works is invariably found to have evolved from a simple system that worked."
So why is it called Representational State Transfer? The explanation is as follows: the web is composed of resources. A resource is any item of interest identified by some URL. For example, a website may define a resource for a particular Order with, say, this URL:
http://www.foo.com/orders/12345
When the client accesses that URL, a Representation of the resource is returned (e.g., 12345.en.xml). The representation places the client application in a State. If the client traverses some other hyperlink embedded in 12345.en.xml, then another resource is accessed, and the new representation places the client application into yet another state. Thus, the client application changes (Transfers) state with each resource representation, which yields the term Representational State Transfer.
The fundamental principles of the REST design pattern are:
- Everything is a Resource. A resource can be thought of as a distant thing that you can interact with, but do not manipulate directly.
- Resources have names. Every resource has at least one name. This name must be unique and can only refer to one resource. On the web, URIs are a perfect match for this requirement. They enable a resource to be found by any application.
- Resources support simple verbs. Every resource is interacted with using a universally predefined set of "verbs". These verbs are not defined by individual resources, but across all resources. On the web, the verbs are the standard HTTP methods POST, GET, PUT, and DELETE. Each verb has clearly defined semantics that can be relied upon. This is critical so intermediaries can do the right thing.
- Resources have Representations. Since resources can be thought of as faraway things, it is not possible to manipulate a resource directly. Instead, through communication, we exchange representations of the resource. The representation can be intended for human interaction (e.g. an HTML page) or for machine-to-machine interaction (e.g. XML or even binary data).
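The four principles above can be illustrated with a toy, in-memory stand-in for a web server. This is only a sketch: the ResourceStore class, its handle method, and the status codes it returns are hypothetical choices made for illustration, not part of AtomServer or of any HTTP library. It shows named resources, a fixed set of verbs shared by every resource, and clients that only ever exchange representations.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceStore:
    """Hypothetical in-memory server: maps URI paths to representations."""

    resources: dict = field(default_factory=dict)
    next_id: int = 1  # used to name resources created through POST

    def handle(self, verb, uri, body=None):
        """Apply one uniform verb to a URI and return (status, result)."""
        if verb == "GET":
            if uri in self.resources:
                return 200, self.resources[uri]
            return 404, None
        if verb == "PUT":
            # PUT creates or replaces the resource at a URI the client names.
            status = 200 if uri in self.resources else 201
            self.resources[uri] = body
            return status, body
        if verb == "POST":
            # POST asks the collection at `uri` to create a new member;
            # the server, not the client, chooses the member's name.
            new_uri = f"{uri}/{self.next_id}"
            self.next_id += 1
            self.resources[new_uri] = body
            return 201, new_uri
        if verb == "DELETE":
            if uri not in self.resources:
                return 404, None
            del self.resources[uri]
            return 204, None
        return 405, None  # verb is not part of the uniform interface
```

Because the verbs are defined once for all resources, an intermediary that knows nothing about orders can still tell that a GET is safe to cache and that a DELETE is not.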
Atom
Fundamentally, Atom is an XML vocabulary for describing lists of timestamped entries. These entries can be anything, although because Atom was originally conceived to replace RSS (Rich Site Summary) the entries often contain human-authored text, such as a weblog or news site. Thus, the internal structure of an Atom entry (i.e. the XML elements and attributes) conveys the semantics of publishing: authors, languages, titles, and so on.
In fact, the entire idea behind web syndication (i.e. RSS and Atom) is lifted from the world of publishing. In the world of newspapers, a syndicate (e.g. the Associated Press) distributes information to subscribers, allowing each publication to tailor the content of the information it receives. Comics, news stories, and opinion columns are often distributed by syndicates, providing greater exposure for the authors and more content for the readers. Web syndication works the same way: web content is distributed to subscribers (as feeds) who tailor it to their needs, often aggregating it and providing it (as aggregated feeds) to further downstream subscribers.
Atom, like RSS, provides the basis for a web syndication framework. The Atom Publishing Protocol (APP) leverages the work done on the Atom Syndication Format and the basics of HTTP to form a simple, yet powerful, publishing protocol. It is useful because so many web services are, in a broad sense, ways of publishing information. Furthermore, there are a large number of web service clients out there that understand the semantics of Atom documents. This includes both browser-based clients, as well as reusable clients written in practically every computer language you might imagine.
Atom lists are feeds, and the items in the lists are entries. Atom is a RESTful protocol. These days, most weblogs, wikis, news services, etc. expose a special resource whose representation is an Atom feed. The entries in the feed describe and link to other resources: weblog or wiki entries, or news stories published on the site. The client can consume these resources with a feed reader or some other external program. An example Atom feed might look something like this:
<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link rel="alternate" href="http://example.com/MyBlog"/>
  <updated>2007-04-14T20:00:39Z</updated>
  <author>
    <name>Chris Berry</name>
  </author>
  <title>My Weblog</title>
  <id>urn:1aaabbbb-2ccc-3ddd-4567ffffffff</id>
  <entry>
    <title>First Post</title>
    <link rel="edit" href="http://example.com/MyBlog/1234"/>
    <updated>2007-04-14T20:00:39Z</updated>
    <id>urn:3bbbcccc-2ccc-1ddd-1234ffffffff</id>
    <summary>
      Details about my summer vacation
    </summary>
  </entry>
</feed>
In this example you can see some of the tags that convey the semantics of publishing: author, title, link, updated, and so on. The feed as a whole contains an author, and since the entry does not, it inherits the author information from the feed. The feed has a link that presumably points to an alternate URI for the underlying feed resource. The entry also has a link, which identifies the entry as a resource in its own right. The entry contains a summary, which the feed reader would most likely expose to the user. Presumably, to get the full post, the user must subsequently GET the entry's URI.
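The author inheritance and the per-entry link described above can be sketched with Python's standard xml.etree.ElementTree module. The entry_rows helper and the embedded FEED sample are illustrative assumptions, and the sample uses the Atom 1.0 namespace (http://www.w3.org/2005/Atom) with the author's name wrapped in a name element, as RFC 4287 defines it.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced tags as "{namespace}local-name".
ATOM = "{http://www.w3.org/2005/Atom}"

FEED = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>My Weblog</title>
  <author><name>Chris Berry</name></author>
  <entry>
    <title>First Post</title>
    <link rel="edit" href="http://example.com/MyBlog/1234"/>
    <summary>Details about my summer vacation</summary>
  </entry>
</feed>"""


def entry_rows(feed_xml):
    """Return (title, author, edit link) for each entry in the feed.

    An entry without its own author falls back to the feed-level author.
    """
    feed = ET.fromstring(feed_xml)
    feed_author = feed.findtext(f"{ATOM}author/{ATOM}name")
    rows = []
    for entry in feed.findall(f"{ATOM}entry"):
        author = entry.findtext(f"{ATOM}author/{ATOM}name") or feed_author
        link = entry.find(f"{ATOM}link[@rel='edit']")
        href = link.get("href") if link is not None else None
        rows.append((entry.findtext(f"{ATOM}title"), author, href))
    return rows
```

A feed reader would typically show the title and summary, and use the link only when the user asks for the full entry.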
An Atom document is basically a listing of published resources. You can use Atom to represent practically any published resource - a set of purchase orders, images, a list of search results, whatever. Or you can omit the link element in the entry and use Atom as a container for the original content.
APP is all about pushing around Atom entries. And it is important to note that entries, like feeds, are first class citizens within the protocol. Each entry has a corresponding representation, and thus, each entry has a corresponding URI that represents it.
Atom is, by definition, RESTful. Do an HTTP GET on an entry's URI to retrieve an entry representing the underlying data; PUT an updated entry to that URI to update the represented data; do an HTTP DELETE on that URI and the represented data is deleted. The entries that are used to represent the data are grouped together in a collection. That, too, is a resource and has its own URI.
Atom feeds and entries must contain certain elements. For example, each must carry an updated element, which may be used along with the standard HTTP headers If-Modified-Since and If-Unmodified-Since, or alternatively with ETags, to provide a mechanism for returning only the entries that have changed. This can yield a significant performance improvement. Likewise, Atom can make use of the standard HTTP cache headers to provide further performance enhancements.
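As a rough sketch of how the updated element and If-Modified-Since can work together, the server-side filter below returns only entries changed after the date the client sends. The changed_since function, the dictionary shape of an entry, and the choice to answer 304 when nothing has changed are assumptions made for illustration; they do not describe how AtomServer itself is implemented.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_updated(text):
    """Parse an Atom (RFC 3339) timestamp such as 2007-04-14T20:00:39Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def changed_since(entries, if_modified_since):
    """Return (status, entries) for a feed request carrying If-Modified-Since.

    `entries` is a list of dicts with an "updated" key (hypothetical shape).
    `if_modified_since` is an HTTP date, e.g. "Thu, 12 Apr 2007 00:00:00 GMT".
    """
    cutoff = parsedate_to_datetime(if_modified_since)
    fresh = [e for e in entries if parse_updated(e["updated"]) > cutoff]
    # 304 Not Modified lets the client keep using what it already has.
    return (200, fresh) if fresh else (304, [])
```

A client that remembers the time of its last successful request can poll a large feed cheaply, since unchanged entries never cross the wire.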
If an application doesn't quite fit the Atom schema, it is possible to embed XML tags from other namespaces in an Atom feed. It is even possible to define a custom namespace and embed its elements in your Atom feeds. Clients that do not understand your special elements will see a normal Atom feed with some mysterious data in it, which they are required to simply ignore.
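The sketch below shows one way a client might separate the Atom elements it understands from extension elements it should skip. The inventory namespace, the stockLevel element, and the split_children helper are all invented for this example.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# The "inv" namespace and its stockLevel element are hypothetical extensions.
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:inv="http://example.com/ns/inventory">
  <title>Widget</title>
  <inv:stockLevel>42</inv:stockLevel>
</entry>"""


def split_children(entry_xml):
    """Return (atom_names, foreign_names) for the entry's child elements."""
    known, foreign = [], []
    for child in ET.fromstring(entry_xml):
        tag = child.tag
        # ElementTree writes qualified tags as "{namespace}local-name".
        if tag.startswith("{"):
            namespace, local = tag[1:].split("}", 1)
        else:
            namespace, local = "", tag
        (known if namespace == ATOM_NS else foreign).append(local)
    return known, foreign
```

A generic feed reader would process only the first list, while a client written for this particular service could also read the second.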
Atom is an important addition to a RESTful architecture. It provides a standard for both program control, and for error processing. By definition, a client knows exactly how to interact with Atom and what will happen (error codes, etc.) when things go wrong. This makes it easy to write generic clients, and to mash together disparate Atom feeds into something greater than the individual feeds alone.
Atom also provides a mechanism for assigning categories to Entries. This is a very powerful concept. It essentially allows Clients to extend the original content, making it much richer, yet leaving it untouched.
Additional Paging Specs
When a client requests a feed of all entries that have changed since some particular date, it is essentially doing a time-sensitive search, where the search parameter is the If-Modified-Since date. It is not hard to imagine that a feed request like this might produce a huge set of results. In order to save bandwidth, and to avoid overwhelming the client with possibly irrelevant data, it is common to divide large feeds into successive "pages", giving the user the ability to step through the pages as they wish (as with Google search results).
Atom does not address the problem of "paging", so a couple of Internet specifications have emerged. OpenSearch is one XML vocabulary that is sometimes embedded in Atom documents. OpenSearch is a Creative Commons-licensed specification that was created by A9, an Amazon subsidiary, and first released in 2005. OpenSearch defines a RESTful protocol for searching, including a format for advertising what kind of search your site supports, and for specifying how to return your search results in Atom or RSS. An OpenSearch-enabled web service returns the results of a query as an Atom feed, with the individual results represented as Atom entries.
Some aspects of a list of search results cannot be represented in a standard Atom feed. So OpenSearch defines three new elements in the opensearch namespace:
- totalResults: The total number of results that matched the query.
- itemsPerPage: How many items are returned in a single "page" of search results.
- startIndex: If all search results are numbered from zero to totalResults, then the first result in this feed document is entry number startIndex. Combined with itemsPerPage, this tells you which "page" of results you are on.
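The arithmetic behind these three elements can be sketched as follows. The page_info function and its index_offset parameter are assumptions made for illustration: the default of 0 follows the zero-based numbering described above, and a service that numbers its results from 1 would pass index_offset=1.

```python
import math


def page_info(total_results, items_per_page, start_index, index_offset=0):
    """Derive paging facts from the three OpenSearch response elements.

    index_offset is the number given to the very first result overall
    (an assumption of this sketch; 0 matches the text above).
    """
    position = start_index - index_offset          # zero-based position
    current_page = position // items_per_page + 1  # pages are counted from 1
    page_count = math.ceil(total_results / items_per_page)
    next_start = start_index + items_per_page
    if next_start - index_offset >= total_results:
        next_start = None                          # already on the last page
    return {"page": current_page, "pages": page_count, "next_start": next_start}
```

For example, with 95 total results, 10 items per page, and a startIndex of 20, the client is on the third of ten pages and would request startIndex 30 to get the next one.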
The Atom Store
In the last couple of years these two ideas, the Atom Publishing Protocol and the additional Paging Specs, have started to show up together. The most important and seminal example of this is GData, Google's API for managing data on the web. Atom already provides the capability to return an Atom entry for each dataset you choose to represent on your site, so returning a bunch of them in the form of a feed, in response to an OpenSearch request, isn't such a great leap. When we stop limiting Atom entries to web content like weblogs, and expand APP to the management of general data, we arrive at an Atom Store: a generic, inter-linked blob of Atom entries, which you can edit using the APP and then search over using OpenSearch.
One huge advantage of an Atom Store is that it is built on top of RESTful services. That means that we get the advantages of REST: caching, uniform interfaces, and hypermedia as the engine of application state. For both OpenSearch and Atom there is a self-describing XML document that describes the capabilities of each endpoint. That allows another service to come along and wrap several Atom Stores together by reading those description documents and then presenting itself as an Atom Store, an aggregate of all the stores it uses. This aggregate store might be a melange of your disparate data or, on the other hand, it might be a uniform series of servers, each with a subset of a huge store, yielding, in turn, a monster database.
Acknowledgments
Giving credit where it is due. Parts of this document were culled from various sources:
- RESTful Web Services, by Leonard Richardson & Sam Ruby
- An Introduction to REST, by Steve Bjorg
- At the Forge - Aggregating with Atom, by Reuven Lerner
- Dreaming of an Atom Store: A Database for the Web, by Joe Gregorio
AtomServer, An Introduction