Scalability through client-driven workflows

This post discusses a way to increase a service architecture’s scalability by removing any communication paths between services and instead burdening clients with that communication.

Bureaucracy in real life is a trade-off between ease of work for the applicant and ease of work for officers

In “Les 12 travaux d’Astérix” [1], Asterix and Obelix are on a Kafkaesque quest to find the elusive permit A38, which in turn triggers a cascade of “side quests” for forms, permits, certificates and signatures from various departments in “the house that drives you mad”, an obvious allusion to the state bureaucracy of the time. Every dependency the heroes try to resolve, instead of bringing them closer to obtaining A38, only uncovers further prerequisites (forms, permits etc.) that need to be resolved, leading to a recursive, never-ending struggle.

Les 12 travaux d’Astérix – Find Permit A38 in “The Place That Sends You Mad”
Source: http://www.ravennanotizie.it/articoli/2014/08/28/la-posta-dei-lettori-lospedale-disfunzionale.html

Each office the heroes visit adds a piece needed to complete A38, either directly or by completing a dependency of a dependency… Some offices provide information (properly signed and stamped), while others validate information and then sign and stamp documents.

The degree of bureaucracy in various organisations probably ranges from the refreshing experience of modern self-service to character-building adventures in “the house that drives you mad”, with anything in between well within the realm of the possible. The issue at the heart of bureaucracy may be complicated processes, but I, as an applicant, won’t feel much of that complexity unless it directly involves me. If I fill out a simple expense reimbursement form, hand it in and get my money back, then that was a rather painless process which I wouldn’t hesitate to reuse. My colleagues who process the expense claim may have a different view of it, as they do all the bureaucratic work to get me my 3.50 back:

Example of an expense claim workflow where the back office handles all process steps

The bureaucratic counter-example makes me do most of the work: I have to obtain all of these documents myself, which requires me to know how and where to apply for them, possibly perform conversions and do a lot of writing. To counter fraud and forgery, every document is signed, stamped and printed on official company paper, and my boss needs to approve each step. After all, I’m the potentially unreliable courier with a motive…

Example of the same workflow where the client handles all process steps

From the organisation’s point of view, sending me from office to office to get everything signed relieves the departments involved of work and responsibility, because I am essentially doing their work:

Decoupling: they don’t have to talk to each other or forward applications to other departments. In the event of a reorganisation, departments don’t need to hash out their communication paths, since there is no communication between them.

Single responsibility: they can concentrate on a few tasks; they are responsible only for their own part, not for the process end-to-end.

Authentication, authorisation: all a department has to know is how to validate the legitimacy of a request, the authenticity of any supporting documents and how to issue a signed document.

Stateless: they don’t need to keep track of a long-running process, since they never communicate with other departments; they work in a fire-and-forget manner.

Error handling: if something goes wrong they’ll tell me to come back tomorrow and resubmit my application. If that doesn’t work either, I have to come up with a solution (e.g. contact a different office in the department or figure out an alternative workflow) – getting the application processed is in my interest, after all.

This setup is as obviously inhumane as it is scalable; in fact, its scalability is limited only by the amount of office space and width of stairs one can build.

Increasing scalability with bureaucracy in IT systems

If SOA, DDD and Conway’s law have taught us anything, it is that fast, resilient and maintainable IT systems must replicate real-life organisations, and that they will succeed and fail in exactly the same ways their role models do.

We may easily dismiss bureaucratic organisations as failures, but in all honesty, they ruled the world (and to some extent still do), and they didn’t fail until they collapsed under pressure from nimble organisations with automated business workflows. No matter how complicated a process is, if it is automated there won’t be a human (other than IT people) around to complain about it.

The expense claim example revealed two different ways of wiring a system together: either services communicate with each other directly, or services are isolated and the client talks to each service at every step of the workflow. Unsurprisingly, the pros and cons of the “coupled” vs. the “decoupled” service architecture are exactly those of a real-life organisation whose departments do or don’t communicate directly, and they become obvious when looking at a dependency schematic of each.

A coupled architecture moves the burden of communication (discovering services, adapting to their APIs and domain models, and exchanging messages) from the client to one or more services. To complete a business process, a client talks to a single service, which reduces network traffic and keeps client logic simple. That service obtains any extra information it needs to execute the process, trigger sub-workflows and perform validations by communicating with other services. Each service needs to maintain state for the duration of the process, reason about error handling (roll back? compensate? retry?), and discover and communicate with other services. Maintaining a coupled architecture also involves aligning each service’s roadmap with those of other service owners, which makes service governance an important task.
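A minimal sketch of the coupled style, assuming in-process stand-ins (the hypothetical `LedgerService`, `PayrollService` and `ExpenseClaimService`) for what would really be remote services. The orchestrating service tracks the state of each process and decides the error policy itself. It retries the payroll call, and if the call keeps failing it compensates the earlier ledger booking. Service discovery is reduced to constructor injection here.

```python
class TransientError(Exception):
    """A failure worth retrying, e.g. a timeout from a peer service."""

class LedgerService:
    def __init__(self):
        self.entries: list[tuple[str, float]] = []

    def book(self, cost_centre: str, amount: float) -> None:
        self.entries.append((cost_centre, amount))

    def reverse(self, cost_centre: str, amount: float) -> None:
        self.entries.append((cost_centre, -amount))  # compensating entry

class PayrollService:
    def __init__(self, failures: int = 0):
        self.failures = failures  # simulated: calls that fail before one succeeds
        self.items: list[tuple[str, float]] = []

    def add_item(self, personnel_id: str, amount: float) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("payroll unavailable")
        self.items.append((personnel_id, amount))

class ExpenseClaimService:
    """Coupled orchestrator: knows its peers, tracks each process, owns error handling."""

    def __init__(self, ledger: LedgerService, payroll: PayrollService, retries: int = 2):
        # Peers are injected here instead of being discovered at runtime.
        self.ledger, self.payroll, self.retries = ledger, payroll, retries
        self.status: dict[str, str] = {}  # process id -> current step

    def submit(self, process_id: str, personnel_id: str, cost_centre: str, amount: float) -> str:
        self.status[process_id] = "booked"
        self.ledger.book(cost_centre, amount)
        for _ in range(1 + self.retries):
            try:
                self.payroll.add_item(personnel_id, amount)
                self.status[process_id] = "done"
                return "done"
            except TransientError:
                continue  # this service's own policy: retry
        self.ledger.reverse(cost_centre, amount)  # give up and compensate
        self.status[process_id] = "compensated"
        return "compensated"

claims = ExpenseClaimService(LedgerService(), PayrollService(failures=1))
print(claims.submit("p1", "gg123", "C1111", 3.50))  # done
```

All of this bookkeeping, including the `status` table and the choice between retrying and compensating, lives inside a service. That is the price the coupled style pays for a simple client.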

A decoupled architecture, on the other hand, reverses the advantages and disadvantages of a coupled one. Here services are not aware of each other: they don’t need to discover each other, no network connection between them is required, they execute only the atomic tasks they are responsible for, they don’t mirror the state of other services, and the error conditions they need to handle are limited to those of their own implementation. Since services don’t depend on each other directly, service owners can decouple their release schedules from those of other services. The downside is that clients are now tasked with service discovery, error handling and maintaining state, which means much more network communication and complex client logic. And security.
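The decoupled counterpart can be sketched the same way. In this hypothetical example, the three "services" (`hr_login`, `erp_cost_centres`, `BillingService`) never reference one another. The client function `expense_claim` threads the output of one step into the next, keeps the process state in a local dictionary and owns the retry policy. The names, the fixed return values and the retry count are all assumptions for illustration.

```python
class TransientError(Exception):
    """A failure the client may retry, e.g. 'come back tomorrow'."""

# Isolated services: none of them knows about, or calls, any other.
def hr_login(username: str) -> dict:
    # Hypothetical: always returns the same employee.
    return {"personnelId": "gg123", "employmentStatus": "employed"}

def erp_cost_centres(identity: dict) -> list[dict]:
    return [{"id": "C1111", "label": "Project 1"}]

class BillingService:
    def __init__(self, failures: int = 0):
        self.failures = failures  # simulated: calls that fail before one succeeds

    def reimburse(self, identity: dict, cost_centre: dict, amount: float) -> dict:
        if self.failures > 0:
            self.failures -= 1
            raise TransientError("come back tomorrow")
        return {"status": "OK", "personnelId": identity["personnelId"],
                "costCentre": cost_centre["id"], "amount": amount}

def with_retry(step, *args, attempts: int = 3):
    """Error handling lives in the client, not in the services."""
    for attempt in range(1, attempts + 1):
        try:
            return step(*args)
        except TransientError:
            if attempt == attempts:
                raise

def expense_claim(username: str, amount: float, billing: BillingService) -> dict:
    state: dict = {}  # the client carries the process state between steps
    state["identity"] = with_retry(hr_login, username)
    state["costCentre"] = with_retry(erp_cost_centres, state["identity"])[0]
    state["reimbursement"] = with_retry(billing.reimburse, state["identity"],
                                        state["costCentre"], amount)
    return state
```

Each service stays trivial, while the client accumulates the discovery, state and retry logic that the coupled sketch kept inside a service.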

Authentication, authenticity, legitimacy, validity…

Coupled architectures often deal with security aspects such as access control, user authentication and input validation by invoking other services. Referring back to the expense claim example: if a client (programme) submits an electronic document containing my personnel number, the expense incurred and the project’s cost centre to the imaginary reimbursement service, that service would validate my personnel number and employment status with the company’s HR system, check my project assignments with the ERP system and finally place the item with the company’s payroll system. This works because the services trust each other and there are only a few points of communication with external clients (like me).
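A sketch of the coupled reimbursement service just described. Hypothetical in-memory `HRSystem`, `ERPSystem` and `PayrollSystem` classes stand in for the company's real systems. The claim the client submits carries no signature. The service relies entirely on the answers of peers it trusts.

```python
class HRSystem:
    def __init__(self, staff: dict[str, str]):
        self.staff = staff  # personnel id -> employment status

    def employment_status(self, personnel_id: str):
        return self.staff.get(personnel_id)

class ERPSystem:
    def __init__(self, assignments: dict[str, set[str]]):
        self.assignments = assignments  # personnel id -> assigned cost centres

    def is_assigned(self, personnel_id: str, cost_centre: str) -> bool:
        return cost_centre in self.assignments.get(personnel_id, set())

class PayrollSystem:
    def __init__(self):
        self.items: list[tuple[str, str, float]] = []

    def place(self, personnel_id: str, cost_centre: str, amount: float) -> None:
        self.items.append((personnel_id, cost_centre, amount))

class ReimbursementService:
    """Coupled: validates an unsigned claim by asking trusted peer services."""

    def __init__(self, hr: HRSystem, erp: ERPSystem, payroll: PayrollSystem):
        self.hr, self.erp, self.payroll = hr, erp, payroll

    def submit(self, claim: dict) -> str:
        pid, cc = claim["personnelId"], claim["costCentre"]
        if self.hr.employment_status(pid) != "employed":
            return "rejected: not employed"
        if not self.erp.is_assigned(pid, cc):
            return "rejected: not assigned to cost centre"
        self.payroll.place(pid, cc, claim["amount"])
        return "accepted"

payroll = PayrollSystem()
service = ReimbursementService(HRSystem({"gg123": "employed"}),
                               ERPSystem({"gg123": {"C1111"}}), payroll)
print(service.submit({"personnelId": "gg123", "costCentre": "C1111", "amount": 3.50}))  # accepted
```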

In a decoupled architecture, however, services interact only with the untrusted me, so any data the client provides has to be digitally signed by its source. Here the client first authenticates me against the HR system and obtains a digitally signed document verifying my identity and employment status:

Step 1: obtain a digitally signed document that proves my identity

In:

GET /HR/login?username=george&password=unguessabl3

Out:
{
  "personnelId": "gg123",
  "employmentStatus": "employed",
  "signature": "FF3ZALF5LLBAAAAAAA03"
}
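The signature values in these examples are placeholders. As one possible concrete form, Step 1 could look like the sketch below. It assumes HMAC-SHA256 over a canonical JSON serialisation and a hypothetical key `HR_KEY` held by the HR service. The `USERS` store is also made up.

```python
import hashlib
import hmac
import json

HR_KEY = b"hr-secret-key"  # hypothetical signing key, known only to the HR service

# Hypothetical user store: username -> (password, personnel id, employment status).
# A real service would store salted password hashes instead of plain passwords.
USERS = {"george": ("unguessabl3", "gg123", "employed")}

def canonical(doc: dict) -> bytes:
    # Sorted keys and no whitespace, so signer and verifier hash identical bytes.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def hr_login(username: str, password: str) -> dict:
    record = USERS.get(username)
    if record is None or not hmac.compare_digest(record[0], password):
        raise PermissionError("invalid credentials")
    _, personnel_id, status = record
    doc = {"personnelId": personnel_id, "employmentStatus": status}
    # Sign the document body; the signature then travels inside the document.
    doc["signature"] = hmac.new(HR_KEY, canonical(doc), hashlib.sha256).hexdigest()
    return doc

print(hr_login("george", "unguessabl3"))
```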

Step 2: get all active cost centres I’m allowed to book against:

In:
POST /ERP/costcentres/
{
  "identity": {
    "personnelId": "gg123",
    "employmentStatus": "employed",
    "signature": "FF3ZALF5LLBAAAAAAA03"
  }
}

Out:
{
  "costCentres": {
    "items": [
      {
        "id": "C1111",
        "label": "Project 1",
        "signature": "IDF74723JHSHK"
      },
      {
        "id": "C1122",
        "label": "Project 2",
        "signature": "IHAH234AJFKLD"
      }
    ],
    "signature": "26LXDD3290843JKDF"
  }
}
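Under the same HMAC assumption, the ERP side of Step 2 might verify the HR signature locally. It then signs each cost centre and the list as a whole, mirroring the response shape above. HMAC is symmetric, so this sketch has to give the ERP service HR's key. That means the ERP could also forge identity documents. An asymmetric signature scheme would avoid this. `HR_KEY`, `ERP_KEY` and the `ASSIGNMENTS` table are hypothetical.

```python
import hashlib
import hmac
import json

HR_KEY = b"hr-secret-key"    # hypothetical; with HMAC the ERP must know HR's key
ERP_KEY = b"erp-secret-key"  # hypothetical key held by the ERP service
ASSIGNMENTS = {"gg123": [("C1111", "Project 1"), ("C1122", "Project 2")]}

def canonical(doc: dict) -> bytes:
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(doc: dict, key: bytes) -> dict:
    body = {k: v for k, v in doc.items() if k != "signature"}
    return {**body, "signature": hmac.new(key, canonical(body), hashlib.sha256).hexdigest()}

def verify(doc: dict, key: bytes) -> bool:
    body = {k: v for k, v in doc.items() if k != "signature"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(doc.get("signature", "")))

def erp_cost_centres(request: dict) -> dict:
    identity = request["identity"]
    # Local check only: no call to the HR service is needed.
    if not verify(identity, HR_KEY) or identity["employmentStatus"] != "employed":
        raise PermissionError("identity document rejected")
    items = [sign({"id": cc, "label": label}, ERP_KEY)
             for cc, label in ASSIGNMENTS.get(identity["personnelId"], [])]
    return {"costCentres": sign({"items": items}, ERP_KEY)}
```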

Step 3: submit my claim to a billing service:

In:
POST /BILLING/reimbursement
{
  "identity": {
    "personnelId": "gg123",
    "employmentStatus": "employed",
    "signature": "FF3ZALF5LLBAAAAAAA03"
  },
  "costCentre": {
    "id": "C1111",
    "label": "Project 1",
    "signature": "IDF74723JHSHK"
  },
  "amount": 3.50
}

Out:
{
  "reimbursement": {
    "status": "OK",
    "executionDate": "12/02/2019",
    "signature": "LDJFLN12341234"
  }
}
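On the billing side, Step 3 reduces to two local signature checks. This sketch keeps the same HMAC assumption and uses hypothetical keys. No request goes to HR or ERP.

```python
import hashlib
import hmac
import json

# Hypothetical keys. With HMAC, billing must share HR's and ERP's secrets;
# with asymmetric signatures it would only need their public keys.
HR_KEY, ERP_KEY, BILLING_KEY = b"hr-secret-key", b"erp-secret-key", b"billing-secret-key"

def canonical(doc: dict) -> bytes:
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign(doc: dict, key: bytes) -> dict:
    body = {k: v for k, v in doc.items() if k != "signature"}
    return {**body, "signature": hmac.new(key, canonical(body), hashlib.sha256).hexdigest()}

def verify(doc: dict, key: bytes) -> bool:
    body = {k: v for k, v in doc.items() if k != "signature"}
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, str(doc.get("signature", "")))

def billing_reimburse(request: dict) -> dict:
    identity, cost_centre = request["identity"], request["costCentre"]
    # Both checks are local: the signatures alone vouch for the submitted data.
    if not verify(identity, HR_KEY) or identity["employmentStatus"] != "employed":
        raise PermissionError("identity document rejected")
    if not verify(cost_centre, ERP_KEY):
        raise PermissionError("cost centre document rejected")
    return {"reimbursement": sign({"status": "OK",
                                   "personnelId": identity["personnelId"],
                                   "costCentre": cost_centre["id"],
                                   "amount": request["amount"]}, BILLING_KEY)}
```

As in the JSON example, the signed cost-centre item does not name the employee it was issued to. This sketch therefore cannot tell whether a colleague's cost-centre document is being replayed. Including the personnel ID in the signed item would be one way to close that gap.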

… you get the idea. Whereas in a “coupled” architecture the ERP service would verify my identity with the HR service and the billing service would verify the cost centre I gave it with the ERP service, here the signatures on the documents I submit prove the legitimacy of my claim.

The “signatures” from the examples above are intentionally simple, but in essence they must fulfil the following requirements:

self-contained: a recipient should be able to verify the signature without further communication
complete: the signature should be sufficient to validate the contents of a document
tamper-proof: nobody other than the document issuer should be able to produce a valid signature for a document
authentic: the issuer should be identifiable by the signature
expiring: signatures must contain an expiration date

An obvious implementation of a document signing mechanism that has all of the above properties is a cryptographically signed hash code accompanying the document; the signature is included in the document and must be submitted by clients as part of their request to other services that need information contained in the document.
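Here is a minimal sketch of that idea. It uses HMAC-SHA256 as a stand-in for a real public-key signature, because Python's standard library has no asymmetric signing. The document body is serialised canonically and hashed with SHA-256. The hash is then signed, and the signature is embedded in the document together with an issuer name and an expiry timestamp. The field names `issuer`, `expires` and `content` and the `ISSUER_KEYS` table are hypothetical. With a symmetric key, the tamper-proof and authentic requirements hold only against parties that don't hold the key. With Ed25519 or RSA signatures, verifiers would need only the issuer's public key.

```python
import hashlib
import hmac
import json

# Hypothetical issuer keys. HMAC is symmetric, so every verifier must hold the
# issuer's secret; an asymmetric scheme would need public keys only.
ISSUER_KEYS = {"HR": b"hr-secret-key", "ERP": b"erp-secret-key"}

def digest(body: dict) -> bytes:
    # Canonical JSON (sorted keys, no whitespace), so that issuer and verifiers
    # hash exactly the same bytes regardless of key order.
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).digest()

def issue(issuer: str, content: dict, ttl: int, now: float) -> dict:
    body = {"issuer": issuer, "expires": int(now + ttl), "content": content}
    sig = hmac.new(ISSUER_KEYS[issuer], digest(body), hashlib.sha256).hexdigest()
    return {**body, "signature": sig}  # the signature travels inside the document

def check(doc: dict, now: float):
    """Return None if the document is acceptable, otherwise the reason to reject it.
    No network call is needed: the key table and the document suffice."""
    key = ISSUER_KEYS.get(doc.get("issuer"))
    if key is None:
        return "unknown issuer"                     # authentic
    body = {k: v for k, v in doc.items() if k != "signature"}
    expected = hmac.new(key, digest(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, str(doc.get("signature", ""))):
        return "bad signature"                      # complete, tamper-proof
    if now >= body["expires"]:
        return "expired"                            # expiring
    return None

doc = issue("HR", {"personnelId": "gg123", "employmentStatus": "employed"},
            ttl=3600, now=1_000_000)
print(check(doc, now=1_000_100))  # None
```

Because the expiry is checked only after the signature, a client cannot extend a document's lifetime by editing `expires`.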

JSON web token

A recent implementation that has gained industry-wide traction is JSON Web Token (JWT) [2], which can be made to cover all of the above requirements. So far JWT seems to be used only for authenticating users, but there doesn’t seem to be any limitation in principle to extending its use to the cases outlined earlier. Two possible issues may hinder JWT’s adoption here. The first is the large range of supported signing and encryption algorithms. The second is the token format itself: a JWT carries the entire document, base64url-encoded, inside the token, which inflates it significantly. A document sent alongside its token is essentially transmitted twice, which doubles its size.
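To make the size argument concrete, here is a hand-rolled HS256 JWT built only from the standard library. It follows the JWS compact serialisation, `base64url(header).base64url(payload).base64url(signature)`. It is a sketch: it checks the `alg` header and the signature but ignores registered claims such as `exp`, and `KEY` is hypothetical. The payload is not hashed away. It travels base64url-encoded inside the token, roughly a third larger than the raw JSON.

```python
import base64
import hashlib
import hmac
import json

KEY = b"hypothetical-shared-secret"

def b64url(data: bytes) -> str:
    # base64url without padding, as used in JWS compact serialisation
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def compact(obj: dict) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")

def jwt_encode(payload: dict, key: bytes) -> str:
    signing_input = b64url(compact({"alg": "HS256", "typ": "JWT"})) + "." + b64url(compact(payload))
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)

def jwt_decode(token: str, key: bytes) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))

identity = {"personnelId": "gg123", "employmentStatus": "employed"}
token = jwt_encode(identity, KEY)
print(len(compact(identity)), len(token))  # raw JSON size vs. token size
```

Sending such a token next to the readable JSON document means the payload crosses the wire twice.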

In a subsequent post I’ll discuss a new, proprietary encoding scheme that builds on JSON by adding a non-redundant signature.

[2016.08.11] If you are interested in an early sample implementation, have a look at https://github.com/ggeorgovassilis/client-driven-workflows