This post is about software application architecture patterns that simplify application design by leveraging cloud features.


Packaged configuration

Packaged configuration cue card

What

Configuration is packaged with deployment artefacts

Motivation

Simplify system, increase resilience by removing runtime dependency on configuration service

How

Configuration is managed in a configuration repository; CI/CD combines the generic application artefact with stage/tenant-specific configuration and deploys it

When

Multiple stages / tenants, build pipeline is flexible

Cons

No runtime update of configuration, configuration changes require redeployment

PaaS products often come with a facility (e.g. Spring Cloud Config [1]) for distributing configuration, such as 3rd-party system settings, tuning parameters and i18n (locale-specific translations of UI elements), to services and applications at run-time, which means that:

facility needs to be maintained
it is critical for operations and uptime
it needs a service discovery mechanism
applications need to implement a specific configuration API

Application configuration in a cloud environment leverages the fact that deployments are fast and cheap enough that configuration can be deployed together with the deployment artefact (e.g. a WAR file). The illustration depicts a possible implementation of that pattern: a CI server produces the application binary and stores it in an artefact repository. A further CI stage then pulls the deployment artefact from that repository and combines it with configuration files from a dedicated configuration repository into a stage/tenant-specific deployment artefact. That artefact is then deployed to the cloud runtime and, optionally, stored again in the artefact repository.
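The combining step can be sketched in a few lines of Python. This is only an illustration under assumptions: the artefact is treated as a plain zip archive (which a WAR file is), and the `config/` prefix, the file names and the helper functions `build_generic_artefact` and `package` are hypothetical, not part of any particular CI product.

```python
import io
import zipfile


def build_generic_artefact() -> bytes:
    """Stand-in for the CI stage that produces the generic application binary."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("app/main.class", b"compiled application code")
    return buf.getvalue()


def package(generic_artefact: bytes, config_files: dict[str, bytes]) -> bytes:
    """Return a new archive: the generic artefact plus stage/tenant configuration."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(generic_artefact)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        # Copy every entry of the generic artefact unchanged.
        for item in src.infolist():
            dst.writestr(item, src.read(item.filename))
        # Add the configuration under a (hypothetical) config/ prefix.
        for name, content in config_files.items():
            dst.writestr(f"config/{name}", content)
    return out.getvalue()


if __name__ == "__main__":
    artefact = build_generic_artefact()
    # One generic artefact, one packaged artefact per stage.
    for stage in ("test", "production"):
        packaged = package(artefact, {"application.properties": f"stage={stage}\n".encode()})
        with zipfile.ZipFile(io.BytesIO(packaged)) as archive:
            print(stage, sorted(archive.namelist()))
```

The generic artefact is built once and never modified; only the packaging step knows about stages or tenants, which is what keeps the application itself free of a configuration API.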

The “Packaged configuration” architectural pattern moves a runtime concern into the build phase, thus eliminating a central runtime service, simplifying the system design and increasing its reliability. It does, however, require a powerful build pipeline and managed access to the configuration repository. Also, configuration updates require redeployments, which may affect system uptime (but not necessarily, see “Swarm uptime” below).

Natural multi-tenancy

Multi-tenancy pattern cue card

What

Each logical tenant and/or stage is an isolated installation

Motivation

Simplify system by designing only for single tenant use

How

Each tenant runs in a dedicated and isolated environment

When

Groups of users requiring isolated setups

Cons

Problematic when users have access to multiple tenants

A typical multi-tenant-capable application is able to run multiple tenants on a single appliance, which requires that the tenant differentiator is present on all APIs, sessions and persistence entities. The application also needs to include access control logic which hides other tenants from a logged-in user. Furthermore, multi-tenant applications can’t easily scale for selected tenants, but only as a whole system.

Cloud platforms simplify multi-tenancy by provisioning isolated and distinct environments for each tenant or stage, allowing application designers to push tenant access control logic to the cloud environment (also see “Security and access control”).
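The difference in application code can be illustrated with a small Python sketch. The classes `Order`, `SharedOrderStore` and `IsolatedOrderStore` are hypothetical and exist only for this comparison; they assume an in-memory list in place of a real persistence layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    tenant_id: str  # the tenant differentiator on a persistence entity
    order_id: int


class SharedOrderStore:
    """Classic multi-tenancy: one store for all tenants, so every query filters by tenant."""

    def __init__(self, orders: list[Order]) -> None:
        self._orders = list(orders)

    def orders_for(self, tenant_id: str) -> list[int]:
        # Forgetting this filter would leak other tenants' data.
        return [o.order_id for o in self._orders if o.tenant_id == tenant_id]


class IsolatedOrderStore:
    """One installation per tenant: the store only ever holds that tenant's data."""

    def __init__(self, order_ids: list[int]) -> None:
        self._order_ids = list(order_ids)

    def orders(self) -> list[int]:
        # No tenant parameter: isolation is provided by the environment.
        return list(self._order_ids)


if __name__ == "__main__":
    shared = SharedOrderStore([Order("acme", 1), Order("globex", 2), Order("acme", 3)])
    print(shared.orders_for("acme"))
    print(IsolatedOrderStore([1, 3]).orders())
```

In the isolated variant the tenant differentiator disappears from the API and the entity, which is the simplification this pattern is after.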

Possible limits to this pattern are use cases where information needs to be exchanged between tenants, e.g. users who access multiple tenants during a session, request reports that span multiple tenants or exchange information between tenants.

Swarm uptime

Uptime pattern cue card

What

Combining uptime of multiple, unreliable service instances into high application uptime

Motivation

Lower cost by using cheap, volatile instances

How

The cloud runtime sends tasks to running service instances, but not to failing instances. Failing instances are restarted automatically.

When

App instances don’t rely on internal state, or that state can be restored easily. App instances boot quickly.

Cons

Containerised applications must be able to handle arbitrary restarts

In the good old days of computing, application uptime was bound by the upper limit of server uptime, so server maintenance also brought down the applications running on it. Cloud-aware applications distribute tasks to multiple service instances on multiple servers, so a true application downtime occurs only when large parts of the data centre are offline. For all of that to work, service instances need to be able to start quickly and be either stateless or able to resume conversations with clients from a serialized persistence store.
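A rough back-of-the-envelope calculation shows why a swarm of unreliable instances can still yield high uptime. The sketch below rests on two simplifying assumptions that are not from the post: instances fail independently of each other, and a single healthy instance is enough to serve the application. Real deployments satisfy neither perfectly, so treat the result as an upper bound. The function name `swarm_availability` is made up for this example.

```python
def swarm_availability(instance_availability: float, instances: int) -> float:
    """Probability that at least one of `instances` independent instances is up."""
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # The application is down only if every instance is down at the same time.
    return 1.0 - (1.0 - instance_availability) ** instances


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(n, f"{swarm_availability(0.9, n):.5f}")
```

Under these assumptions, three instances that are each available only 90% of the time combine to 1 - 0.1³ = 99.9% availability.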

Automated maintenance

Automation pattern cue card

What

Offload maintenance tasks to runtime platform

Motivation

Simplify application & project design

How

The cloud runtime handles maintenance tasks transparently

When

Periodic instance restarts to combat latent resource leaks, clean filesystem, backup, store logs, produce metrics

Cons

Can’t handle overly specific tasks

Regardless of how diligent a programmer I am, at some point my application is going to fail inexplicably in production; it won’t handle more requests, won’t connect to a database, will run out of memory or will just consume all CPU without any obvious reason. The immediate solution is almost always to restart the application and, in tougher cases, the server. Sometimes the issue occurs too rarely for a debugging and fixing cycle to make economic sense; then a nightly cron job restarting the server is an acceptable workaround. This is just one example of a maintenance task; others include making backups of local data and configuration, rotating and storing log files, and collecting metrics. Previously, much of that had to be either implemented in the application or scripted by administrators in cron jobs or shell scripts.

Cloud runtimes handle such jobs routinely as part of their service description, which simplifies both developers’ and administrators’ lives, not to mention change management. Tying in with “Swarm uptime”, that’s another great example of how cloud runtimes simplify application design.

Outscale caching

Pattern cue card

What

Achieve high application performance

Motivation

Simplify application design by avoiding the complexity of caching

How

Scale service instances instead of caching data

When

Caching (and cache invalidation) are overly complex for the domain, service instances don’t rely on internal state, service invocation cascades are shallow, latency requirements can be met

Cons

Potentially resource-hungry

The primary motivation for the “Outscale caching” pattern is to throw hardware at a problem. While caching is the second-best way to speed up a computation (the best way being to avoid the computation in the first place…), it doesn’t come without issues: caches need to be primed first (slowing down startup), they consume memory and they hinder scalability as they need to be synced. Caching is a trade-off between staleness and performance. Invalidating caches can be tricky, and so can syncing caches between app instances.
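The staleness problem is easy to reproduce. The following Python sketch uses the standard library's `functools.lru_cache`; the `PRICES` dictionary and the two lookup functions are hypothetical stand-ins for a system of record and a service call.

```python
from functools import lru_cache

# Stand-in for the system of record (e.g. a database); an assumption for this sketch.
PRICES = {"apple": 100}


def price_uncached(item: str) -> int:
    """Stateless lookup: always reads the current value."""
    return PRICES[item]


@lru_cache(maxsize=None)
def price_cached(item: str) -> int:
    """Cached lookup: fast on repeat calls, but blind to later changes."""
    return PRICES[item]


if __name__ == "__main__":
    print(price_cached("apple"))    # first call fills the cache
    PRICES["apple"] = 120           # the source of truth changes
    print(price_cached("apple"))    # still the old value: the cache is stale
    print(price_uncached("apple"))  # reads the current value
    price_cached.cache_clear()      # explicit invalidation is needed
    print(price_cached("apple"))
```

The uncached function has no invalidation logic and nothing to sync between instances, so it can be scaled out simply by starting more instances, at the price of repeating the work on every call.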

Service discovery

Service discovery pattern cue card

What

Broker between service instances

Motivation

Simplify application design by avoiding the complexity of service discovery

How

The cloud platform injects service location information as part of “Packaged configuration” and routes requests dynamically at runtime

When

Always

Cons

None

Configuring an application can involve providing IP addresses and ports of other services, with DNS helping a bit by mapping names to IP addresses. When service locations change, all dependent services need to get the new configuration, or wait until DNS updates the stale entry. More downtime ahead.

A cloud runtime orchestrates dependent services by not only injecting aliases for their locations as part of “Packaged configuration”, but also dynamically routing requests to live service instances and around failing instances (see “Swarm uptime”). Some modern microservice architectures I have seen implement service discovery with brokers like Eureka in order to remove the dependency on a particular cloud runtime, which introduces more complexity.
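From the application's point of view, injected service locations can be as simple as environment variables. The sketch below assumes a naming convention of `<ALIAS>_URL`, which is hypothetical; actual platforms each have their own way of injecting this information. The function `service_location` is likewise made up for illustration.

```python
import os
from collections.abc import Mapping
from typing import Optional
from urllib.parse import urlsplit


def service_location(alias: str, env: Optional[Mapping[str, str]] = None) -> tuple[str, int]:
    """Resolve a service alias to (host, port) from injected configuration."""
    env = os.environ if env is None else env
    key = f"{alias.upper()}_URL"  # hypothetical naming convention
    url = env.get(key)
    if url is None:
        raise LookupError(f"no location injected for service '{alias}' (expected {key})")
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"{key} must contain a host and a port, got {url!r}")
    return parts.hostname, parts.port


if __name__ == "__main__":
    injected = {"BILLING_URL": "http://billing.internal:8080"}
    print(service_location("billing", injected))
```

The application only knows the alias `billing`; where that alias points, and which live instance finally answers, stays the platform's concern.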

Security and access control

Security pattern cue card

What

Secure application easily

Motivation

Simplify application design by delegating (more) access control to the cloud runtime

How

Less infrastructure is exposed, access is controlled by the platform

When

Always

Cons

Reduced flexibility

Cloud runtimes famously eliminate the server from a system overview, as servers are replaced by containers, which are volatile and run with minimal software and privileges. Access paths to applications are strictly defined through the cloud network routers and subject to the cloud’s own access control. Last but not least, environment isolation reduces the need for application-level access check logic because the cloud runtime won’t permit certain users to access an application.

Infrastructure as code

Infrastructure pattern cue card

What

Configure infrastructure through (versioned & audited) code

Motivation

Simplify change management

How

Infrastructure as code allows defining entire systems through declarative scripts

When

In a cloud environment

Cons

Hard to get used to, slows down development, won’t work on non-cloud-native infrastructure

Infrastructure as code is a major contributor towards repeatable deployments and software-defined infrastructure. Developers or administrators prescribe the exact constitution of a system, with all its components, deployment artefacts and dependencies, in a set of configuration files. The configuration resides in a versioned and audited source code repository from where the cloud runtime obtains it and constructs suitable application environments.
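The core idea of declarative infrastructure, comparing a desired state with the current one and deriving the necessary actions, can be sketched as follows. This is a toy model under assumptions: the component names and settings are invented, and the `plan` function is a hypothetical illustration, not the behaviour of any specific tool.

```python
def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Derive the actions that turn the `current` environment into the `desired` one."""
    actions = []
    for name in sorted(desired.keys() | current.keys()):
        if name not in current:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != current[name]:
            actions.append(("update", name))
        # Components that already match need no action.
    return actions


if __name__ == "__main__":
    # The desired state would live in the versioned configuration repository.
    desired = {"web": {"instances": 3}, "db": {"size": "small"}}
    # The current state is what the runtime reports as actually running.
    current = {"web": {"instances": 2}, "cache": {}}
    for action, component in plan(desired, current):
        print(action, component)
```

Because the desired state is plain data, every change to it can be reviewed, versioned and audited like any other code change.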

A possible downside is that trial-and-error development can slow to a standstill when every minor change has to be coded into configuration and then deployed to a cloud environment: despite all advances, typical enterprise applications still take several minutes to start, even on contemporary cloud platforms. Furthermore, the development lifecycle of applications relying on 3rd-party systems that do not reside in the cloud (as is sometimes the case with mainframe services and databases) won’t benefit as much from infrastructure as code.