As the author of EtherPad I'm familiar with CRDT, which is a cousin of OT. They ...

cemerick · on May 13, 2014

Hi, author here. I'm not sure you read the whole piece. :-) (Modern) APIs are a very limited mechanism of state transfer that happens to be paired with often side-effecting operations. Thus, a "synchronization" (I don't think that word is particularly useful because reasons) mechanism paired with reactive computational services _does_ replace APIs, and offers the ability to do much, much more.

OTs (operational transforms) _are_ a related precursor to CRDTs, but only in that both are ways to reconcile concurrent changes; that's really the limit of the connection. Unfortunately, the substrate for OTs (text, integer-indexed sequences of characters) is fundamentally not amenable to commutative operations. This makes implementing OTs _very_ difficult and error-prone, and certain combinations of concurrent operations are completely irreconcilable (a result that came out of a Korean group's study, can't find the cite for it right now).

_urga · on May 13, 2014

I think the paper you are referencing might be [1]?

It's one of my favorite papers on CRDTs and provides practical pseudocode for learning how to implement CRDTs yourself.

The structures they present are simple to understand and have good performance characteristics compared to similar CRDTs [2].

A key insight from the second paper is to write CRDTs that optimize for applying remote operations over applying local operations, as each client applies far more remote operations than local ones. For example, if 100 clients each make 1 change to a CRDT, every one of those clients has to apply 99 remote operations and 1 local operation.
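The arithmetic above, as a tiny Python sketch. The function name `operation_counts` is made up for illustration, and it assumes every client makes the same number of changes and every change reaches every other client.

```python
from __future__ import annotations


def operation_counts(clients: int, changes_per_client: int = 1) -> tuple[int, int]:
    """Return (local, remote) operation counts applied by ONE client."""
    local = changes_per_client
    # Every other client's changes arrive here as remote operations.
    remote = (clients - 1) * changes_per_client
    return local, remote


local_ops, remote_ops = operation_counts(100)
print(local_ops, remote_ops)
```

With 100 clients the remote path runs 99 times for each local operation, which is why it is the one worth optimizing.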

[1] Replicated abstract data types: Building blocks for collaborative applications - http://dl.acm.org/citation.cfm?id=1931272

[2] Evaluating CRDTs for Real-time Document Editing - http://hal.archives-ouvertes.fr/docs/00/62/95/03/PDF/doce63-...

cemerick · on May 13, 2014

The cite I'm missing at the moment is a multi-year study that catalogued all known operational transforms over text (there were many more than I imagined prior), along with proofs showing that certain combinations of concurrent operations simply could not be reconciled consistently.

Thanks for the other pointers, though!

dgreensp · on May 13, 2014

There's actually an interesting deeper connection between OT and CRDT, in which OT comes across as a special case of CRDT.

Suppose your state is a text document or array of characters (we could also examine other kinds of state like an unordered set of objects with properties, but it's less interesting). CRDT assigns a semi-permanent name to each unique data element (character), which is typically a string that indexes into a tree. It's permanent unless the names get too long, in which case you rebalance the tree. The papers I've read treat the rebalancing as an offline operation, to be done one day at 3am when no one is using the system, but in principle you could do it online, as long as you save enough information to rewrite the names in any operations you receive that were meant for the old tree to apply to the new tree. OT is equivalent to rebalancing the tree after every operation. You don't actually need a tree, then, and the names are just numbers (in the case of an array). Names are scoped to a revision, and operations are always rewritten to use the appropriate names before applying them.
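A rough Python sketch of the "semi-permanent name" idea, under loud assumptions: instead of strings that index into a tree, it uses a `(Fraction, site_id)` pair as the name, which is a simplification swapped in for brevity and not what Logoot or Treedoc actually do. `NamedSequence`, `name_between`, and `apply_insert` are hypothetical names.

```python
from fractions import Fraction


class NamedSequence:
    """Toy sequence in which every character keeps a permanent, ordered name."""

    def __init__(self):
        # name -> character, where a name is (position, site_id).
        self.elements = {}

    def name_between(self, left, right, site_id):
        """Pick a name between two neighbouring names (None = document edge).

        Toy limitation: two sites may pick the same position; the site_id
        breaks the tie, but nothing can later be inserted between them.
        """
        lo = left[0] if left else Fraction(0)
        hi = right[0] if right else Fraction(1)
        return ((lo + hi) / 2, site_id)

    def apply_insert(self, name, char):
        # Inserts only add a key, so they commute with each other.
        self.elements[name] = char

    def text(self):
        return "".join(self.elements[n] for n in sorted(self.elements))


# Two replicas share the document "ac".
a, b = NamedSequence(), NamedSequence()
name_a = a.name_between(None, None, "A")
name_c = a.name_between(name_a, None, "A")
for replica in (a, b):
    replica.apply_insert(name_a, "a")
    replica.apply_insert(name_c, "c")

# Concurrent inserts between 'a' and 'c', delivered in opposite orders.
op_from_a = (a.name_between(name_a, name_c, "A"), "b")
op_from_b = (b.name_between(name_a, name_c, "B"), "X")
a.apply_insert(*op_from_a)
a.apply_insert(*op_from_b)
b.apply_insert(*op_from_b)
b.apply_insert(*op_from_a)
print(a.text(), b.text())
```

Because operations address elements by name rather than by index, no rewriting is needed on receipt; the cost is that names grow as the document is edited, which is where the rebalancing comes in.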

Another maintenance operation you might do on a CRDT tree is to remove "garbage" (deleted elements, which you keep around so that you can perform insertion operations relative to them). OT always deletes garbage immediately, and operations that refer to a deleted element are rewritten (when they are transformed against the operation that deleted the element).
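To make the rewriting concrete, here is a minimal Python sketch of transforming an insert against a concurrent delete and vice versa. It covers only that one pair of single-character operations; it is nowhere near a complete OT system, and all the names (`Insert`, `Delete`, `apply_op`, the two transform functions) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    index: int
    char: str


@dataclass(frozen=True)
class Delete:
    index: int


def apply_op(text, op):
    """Apply a single-character insert or delete to a string."""
    if isinstance(op, Insert):
        return text[:op.index] + op.char + text[op.index:]
    return text[:op.index] + text[op.index + 1:]


def transform_insert_against_delete(ins, dele):
    """Rewrite `ins` so it can be applied after `dele` has already run."""
    if dele.index < ins.index:
        return Insert(ins.index - 1, ins.char)  # text to the left got shorter
    return ins


def transform_delete_against_insert(dele, ins):
    """Rewrite `dele` so it can be applied after `ins` has already run."""
    if ins.index <= dele.index:
        return Delete(dele.index + 1)  # target character shifted right
    return dele


# Concurrent edits to "abc": one site deletes 'b', the other inserts 'X' before 'b'.
doc = "abc"
delete_b = Delete(1)
insert_x = Insert(1, "X")

site_1 = apply_op(apply_op(doc, delete_b),
                  transform_insert_against_delete(insert_x, delete_b))
site_2 = apply_op(apply_op(doc, insert_x),
                  transform_delete_against_insert(delete_b, insert_x))
print(site_1, site_2)
```

Note there is no tombstone for 'b' anywhere: the insert that was positioned relative to it is simply re-addressed against the new revision.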

I'm not saying one is better than the other. People seem to have an easier time wrapping their heads around CRDT, but maybe just because OT hasn't been explained well. The CRDT tree and name strings sound like kind of a pain to implement versus OT's arrays, but I've only implemented OT and not CRDT.

Saying that APIs are a "mechanism of state transfer" is as overbroad as saying function calls are a mechanism of state transfer. The article at first seems to provide itself an out, by saying that only a certain class of APIs is being considered, but then it defines API as a "set of names." Similarly, you say that any application touching more than one computer is a distributed system, and then you preemptively defend against exceptions by saying, "If this doesn't apply to you, maybe you don't have a distributed system."

More concretely, APIs do a lot of stuff. They send and receive text messages and emails; they transcode video; they turn on your coffee maker; they post to your Facebook wall. Often there is little or no shared representation, except perhaps the status of the operation, which can typically be communicated in a simple way.

Don't get me wrong, I think more APIs could work by synchronizing state. Basically, use something equivalent to a git repo under the hood. Gmail could work this way. Maybe mail servers could even work this way.

Posting to a Facebook wall doesn't work this way. The way to make posting to a Facebook wall use CRDT would be to replace API calls like addPost and deletePost (say) with a single API call "updateWall" which performs arbitrary operations on a user's wall. Thanks to CRDT, this operation never fails (though the client may still want to know when it has completed). In casual conversation at Meteor, we call it the "Lotus Notes" model when all operations go through the data layer, which synchronizes over the network. Asana's internal framework also uses this model, so a couple Meteor devs who worked at Asana have experience with it. The main drawback is that it is difficult to perform validation and security checks. If the Facebook API only has "updateWall," Facebook must determine whether the diff it receives constitutes a valid operation or series of operations for user A to perform on user B's wall (for example, you can add any number of posts to anyone's wall, but only delete posts off your own). This is much more complicated than having addPost and deletePost, each with the appropriate security checks, and knowing that no other operations are permitted.

To abolish The API completely like you say, you'd have to not just have updateWall but basically one, unnamed API call for all of Facebook, and then you could say there's no API.

cemerick · on May 14, 2014

A lot of different distributed storage and computation architectures are special cases of CRDTs, just with different sets of commutative operations and/or convergent types of state. (This is one of the aspects of CRDTs that I most appreciate, as it provides a framework within which one can compare different technologies in a thoroughgoing way.) Ones I like to cite as common examples that people have often touched before are datastores like Riak, CouchDB, and S3.
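For anyone who hasn't seen a "convergent type of state" spelled out, a grow-only counter is probably the smallest example. This is a generic textbook-style sketch in Python, not how Riak, CouchDB, or S3 implement anything; the class and method names are my own.

```python
class GCounter:
    """State-based grow-only counter: one slot per replica, merged with max()."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, replica_id, amount=1):
        # A replica only ever increases its own slot.
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter({k: max(self.counts.get(k, 0), other.counts.get(k, 0))
                         for k in keys})

    def value(self):
        return sum(self.counts.values())


x, y = GCounter(), GCounter()
x.increment("x", 2)
y.increment("y", 3)
print(x.merge(y).value(), y.merge(x).value())
```

Swap the merge function and the state type and you get a different point in the same design space, which is what makes the comparison between systems possible.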

The document model treatment you describe is talked about some in the Shapiro et al. paper as a "continuous sequence", and is roughly what was used by Logoot and Treedoc. The latter is explored more thoroughly here: http://arxiv.org/abs/0907.0929.

I was only talking about network APIs in the original piece. The "set of names" bit was there to establish the lineage between "classic" programming language/library APIs and those that touch the network.

APIs themselves do exactly nothing. It is the computational service on the other side of an API that does something. This conflation is exactly the sort of thing that is allowed and encouraged by the construction of APIs as "just another function you call in your runtime".

I find the Facebook examples you offer very curious. APIs have no inherent model for authentication and authorization, and the same goes for CRDTs. So, why do you think that verifying authorization over a set of operations or set of modifications to some state is any different than verifying authorization on N operations attempted via N API endpoints? I'll certainly grant that the latter comes with a body of current programming practice and infrastructure, but that's hardly an endorsement of its relative quality or suitability for the job-to-be-done.

My preferred characterization is that the Facebook API would be replaced with a data model. The original piece already hints at a number of advantages to such an architecture, and omits many others that I'll talk about at a later date.

dgreensp · on May 20, 2014

"I'll certainly grant that the latter comes with a body of current programming practice and infrastructure, but that's hardly an endorsement of its relative quality or suitability for the job-to-be-done."

You don't see "comes with a body of current programming practice and infrastructure" as an endorsement for "suitability for the job-to-be-done"? :) It doesn't bear directly on the main discussion we're having, but for people who are trying to get things done, I would say it's quite a strong endorsement of a particular practice to say that we understand how to apply it successfully and that there are tools, known patterns, and infrastructure around it.

Verifying authorization on a set of operations is hard or easy depending on what the operations are. High-level operations that correspond to the actions that users are presented with in the user interface tend to be easy to secure and validate. Low-level database operations tend to be hard to secure and validate as your data model becomes complicated, because you basically have to reverse-engineer the high-level operation that licensed the low-level operations (which could be many and spread across different tables).

I think the best way to expose a Facebook wall (say) via CRDT is to define a Wall datatype whose operations are the permitted high-level actions you can take on a wall. Then they are easy to validate, as you say. This is sort of how Google Docs is implemented -- the core of the application consumes high-level operations (insert column, sort rows, etc.) from different users and updates the document state, and then this core is replicated in different data centers. Most discussions I've seen seem to assume that CRDT and OT operations are simple, generic operations, but I think the real magic is in treating it as a paradigm like OO and defining datatypes within it.
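A hedged Python sketch of what such a Wall datatype might look like. Everything here is hypothetical (`Wall`, `AddPost`, `DeletePost`, the permission rule that only the wall's owner may delete); it assumes post ids are globally unique and uses tombstones so that an add and a delete commute. It is not how Facebook or Google Docs work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddPost:
    post_id: str
    author: str
    text: str


@dataclass(frozen=True)
class DeletePost:
    post_id: str
    actor: str


class Wall:
    """Replicated wall whose only operations are the permitted high-level verbs."""

    def __init__(self, owner):
        self.owner = owner
        self.posts = {}       # post_id -> AddPost
        self.deleted = set()  # tombstones, so add/delete order does not matter

    def is_permitted(self, op):
        if isinstance(op, AddPost):
            return True                    # anyone may post on any wall
        if isinstance(op, DeletePost):
            return op.actor == self.owner  # only the owner may delete
        return False                       # unknown verbs are rejected

    def apply(self, op):
        if not self.is_permitted(op):
            return False
        if isinstance(op, AddPost):
            self.posts[op.post_id] = op
        else:
            self.deleted.add(op.post_id)
        return True

    def visible(self):
        return [self.posts[pid].text for pid in sorted(self.posts)
                if pid not in self.deleted]


ops = [AddPost("p1", "bob", "hi"), AddPost("p2", "carol", "yo"),
       DeletePost("p1", "alice")]
wall_1, wall_2 = Wall("alice"), Wall("alice")
for op in ops:
    wall_1.apply(op)
for op in reversed(ops):
    wall_2.apply(op)
print(wall_1.visible(), wall_2.visible())
```

The validation is a one-line check per verb because the operation still carries the intent; nothing has to be reverse-engineered from a diff.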

I still have a hard time conceptualizing an API call as a timeless, commutative modification to a model. I love CRDT and OT, but I've just seen too many network APIs to put them in a box. Meteor actually has more shared data model between client and server than any other framework I know of. The general case of APIs, for us, is you tell the server to go do something, and it tells you when it's done, and you also get streaming updates for the parts of the data model you care about (and a marker telling you at what point the changes you caused landed).

Basically my main point is that you still need high-level verbs that carry intent, whether they are API calls or operations. Otherwise you have data changing and no accounting for it. It's like how banking is mainly about transactions, not about telling each other how much money is in account X or Y.

gritzko · on May 13, 2014

I'm the author (the leading one) of Yandex Live Letters, which is a CRDT-based EtherPad-like thing. Some flavours of CRDT are indeed related to OT. My favorite technique (pure op-based CRDT variant) is very much operation-centric, but instead of transformations (like in OT), it employs per-operation Lamport identifiers.
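A minimal Python sketch of what "per-operation Lamport identifiers" can look like, assuming a single last-writer-wins register as the datatype. This is a generic illustration, not the technique used in Live Letters or Swarm; `Replica`, `local_set`, and `apply` are hypothetical names.

```python
class Replica:
    """Register replica where each operation carries a (counter, site_id) id."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.clock = 0            # Lamport counter
        self.value = None
        self.value_id = (0, "")   # id of the operation that set `value`

    def local_set(self, value):
        # Stamp the operation with a fresh, globally unique Lamport id.
        self.clock += 1
        op = ((self.clock, self.site_id), value)
        self.apply(op)
        return op  # would be broadcast to the other replicas

    def apply(self, op):
        op_id, value = op
        # Keep the local counter at least as large as anything seen so far.
        self.clock = max(self.clock, op_id[0])
        # No transformation: the operation with the larger id wins.
        if op_id > self.value_id:
            self.value, self.value_id = value, op_id


r1, r2 = Replica("a"), Replica("b")
op_1 = r1.local_set("x")  # concurrent with op_2
op_2 = r2.local_set("y")
r1.apply(op_2)
r2.apply(op_1)
print(r1.value, r2.value)
```

Operations are never rewritten on arrival; the ids alone decide the outcome, so delivery order and delay do not matter.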

Based on our new project named Swarm [1], I can say that CRDT and "async RPC" fit rather nicely together.

OT indeed behaves poorly in a highly asynchronous environment. I suspect that is the reason why Google Docs doesn't have a decent offline mode yet. CRDT (any flavor) is async-friendly.

[1] http://slideshare.net/gritzko/swarm-34428560

_urga · on May 13, 2014

I think operational transformation is more of a predecessor to CRDTs than a cousin, and OT simply does not work offline, whereas CRDTs do.