Defensive software architecture

In this (hopefully last) post on this blog for the year I will try to gather my scattered thoughts on software architecture, and specifically on the need for defensive software architecture, and substantiate them with anecdotal evidence from my own project work. I will first talk about my definition of architecture and of the architect (I use those terms interchangeably), because there seem to be many definitions, none of which directly serves the point of why defensive architecture is important to a software project.

Architecture in a software project is a fuzzy term for a real thing

There are so overwhelmingly many definitions of architecture in enterprises, and so many facets of it, that I suspect none of them really serves all needs. The majority consensus seems to be that an architect needs to have many skills and, if in doubt, can do everything [5], which I think is an improvement over another view that wants the architect to only produce designs. And this lack of a single crisp definition is OK, for several reasons.
First, the brain doesn’t store dictionary knowledge but encyclopedic knowledge, so effortlessly coming up with definitions of anything doesn’t exactly feel natural. Information isn’t stored as succinct definitions but as a graph of related examples and experiences. A well-formulated definition of an architect’s role would probably raise more questions than it answers if it weren’t accompanied by sufficiently concrete examples.
Second, in his “business faster” track Dan North [4] uses the quote [3] “all models are wrong, but some are useful“. A model is a set of assumptions about reality derived for a particular purpose, so that we can work with these assumptions and still arrive at useful results. We can debate endlessly about the correct definition of an architect’s role, but if the architect already fulfils a project need well, then that is a good place to start looking for a useful definition.
Third, the job description of an IT architect was only recently derived from that of a senior programmer anyway, so it’s understandable that there may still be dispute about it. My former colleague Olaf Grossmann [2], whom I consider a major influence on how I think about defensive architecture, defines the job description of an architect comprehensively as a “shelf for senior programmers who lack people skills but must be promoted nonetheless“.

Adam Smith identifies division of labour [1] as the major productivity enhancement, and hence it’s no surprise that we find well-defined roles and separation of concerns in IT projects all over the world. A small project, e.g. a proof of concept or a pilot, may not require elaborate QA, procurement or project management and hence may not assign dedicated roles to dedicated people; the few people who work on the small project share roles and responsibilities. Larger projects, however, will have dedicated roles assigned to individuals or groups of people, with the distinction of roles and responsibilities becoming more accentuated as projects grow. The space in a project that is left when all roles have been assigned is that of the architect, and that shall be my definition for the remainder of this post.

The architect connects roles in a project

In my vain attempt to define the role of the architect I tried to define other roles I felt more familiar with, but no matter how hard I tried, I could never do it without referring to a further role: a programmer implements the technical design in source code, a functional analyst documents business processes as dictated by the application owner… you get the gist. Assuming that every role in the project is highly specialized and optimized, the only person who makes sure everybody is talking to the right counterparts and that processes are working is the project manager – but that role is often not technical enough to deal with anything that comes after functional analysis, and that’s where the architect steps in.

Defensive (software) architecture minimizes the impact of risks

My job [6] involves reasoning about and dealing with risk in enterprise IT strategy on many levels. Cutting to the chase, an organization effects change through projects → projects have a defined budget, a defined scope and a defined action plan → risk affects the budget, scope and action plan → risk affects the project.
Projects are real processes in the real world and thus have dependencies on factors of the real world: an unanticipated change request, malfunctioning infrastructure, seasonal flu, delayed third party dependencies and legislative amendments are only some indicative risks that negatively impact a project. And just as a seasoned sailor will reef the sail in a storm and make sure no loose cargo is lying around, an experienced IT architect will design processes and systems to deal flexibly with unanticipated risk, which often requires compromises.
Projects use standard tools and techniques to deal with risk:
  • On the process level there are techniques that deal with fluctuating requirements and uncertain workforce composition, with Agile being the most prominent representative. 
  • Temporary infrastructure shortages are best mitigated with cloud platforms and infrastructure virtualization.
  • The overall development cost, and the cost of changes in particular, is reduced by thoroughly automating repetitive tasks.
The most important deliverable of a software project is the application or system of applications (and infrastructure) under development. Software architecture needs to be drafted with risk mitigation techniques such as Agile, virtualization and automation in mind; the best risk mitigation techniques won’t save our derailing project if we haven’t built the necessary hooks and nozzles into the software.

Software architecture for agility means decoupling and testing

I find short user stories to be the most enticing element of Agile software development; application owners can quickly steer development in a new direction whenever they please, and software architects can contain the impact of (bad) technical decisions within a few functional components… if the software (and its delivery process) has been set up with incremental changes in mind.
Component decoupling ensures that components have no dependencies on each other in the source code. Their functional dependencies are implemented by exchanging messages at runtime instead of invoking methods which must be known at compile time. A component can thus be changed in a single unit of work, without the change immediately evolving into a monumental task of adjusting hundreds of components – far too much work for a single person to carry out within a sprint without checking something into source control that doesn’t compile.
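To make this concrete, here is a minimal sketch of message-based decoupling; the class and topic names are hypothetical and not taken from any particular project. The ordering component depends only on a generic message bus interface, and the billing behaviour is attached at runtime rather than referenced at compile time:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // A message with a topic; components agree on topics, not on each other's classes.
    final class Message {
        final String topic;
        final String payload;
        Message(String topic, String payload) { this.topic = topic; this.payload = payload; }
    }

    // The only compile-time dependency shared by all components.
    interface MessageBus {
        void publish(Message message);
        void subscribe(String topic, Consumer<Message> handler);
    }

    // Trivial in-memory bus; a real project would back this with a message queue.
    final class InMemoryBus implements MessageBus {
        private final Map<String, List<Consumer<Message>>> handlers = new HashMap<>();
        public void publish(Message m) {
            handlers.getOrDefault(m.topic, List.of()).forEach(h -> h.accept(m));
        }
        public void subscribe(String topic, Consumer<Message> handler) {
            handlers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }
    }

    // Publishes an event instead of calling a billing component directly.
    final class OrderComponent {
        private final MessageBus bus;
        OrderComponent(MessageBus bus) { this.bus = bus; }
        void placeOrder(String orderId) {
            bus.publish(new Message("order.placed", orderId));
        }
    }

    public class DecouplingDemo {
        public static void main(String[] args) {
            MessageBus bus = new InMemoryBus();
            // The "billing" side is wired up at runtime, not referenced at compile time.
            bus.subscribe("order.placed", m -> System.out.println("billing order " + m.payload));
            new OrderComponent(bus).placeOrder("42");
        }
    }

Changing how billing reacts to an order now touches only the subscriber; OrderComponent and its tests stay untouched.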
Practitioners of test-driven development (TDD) work in the reassuring comfort of an excellent regression test suite, which allows for simple or extensive component changes without unknowingly breaking other functions or components. Proper test coverage becomes even more important with decoupling, where the extra safety check of a compiler nagging about incompatible interface changes goes away.
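Sticking with the hypothetical components sketched above, and assuming JUnit 5, a regression test for the message contract could look like this; it is exactly the kind of check that replaces the compiler once components no longer reference each other directly:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.ArrayList;
    import java.util.List;
    import org.junit.jupiter.api.Test;

    class OrderComponentTest {

        @Test
        void placingAnOrderPublishesAnOrderPlacedMessage() {
            MessageBus bus = new InMemoryBus();
            List<String> received = new ArrayList<>();
            // The test subscribes in place of the real billing component.
            bus.subscribe("order.placed", m -> received.add(m.payload));

            new OrderComponent(bus).placeOrder("42");

            assertEquals(List.of("42"), received);
        }
    }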
It is not a coincidence that decoupling and TDD walk hand in hand: you can’t test components where there are none, and when components know too much about each other [7] they become untestable, because the test environment cannot possibly satisfy all their run-time dependencies. In a research project a while ago I worked on extending a monolithic Flex application which ran in a Flash container in the browser. Since there were no clearly distinct components, every previous attempt at extending the application had resulted in copying, pasting and modifying code that “somehow” did something similar. And because Flex required a very specific setup and licensing, unit tests couldn’t be run by every developer, which quickly led to many broken windows [8] where nobody dared to touch any code.

Software architecture for virtualization means coding against principles, not implementations

The spectrum of cloud platforms is broad, ranging from infra-red container provisioning such as Docker to more elaborate container managers like Cloud Foundry and OpenShift, with full-fledged service platforms such as GAE or AWS in the far ultraviolet range.
They all have their strengths and weaknesses, and properly using a platform’s features can greatly reduce a code base’s complexity and its development and maintenance effort. A feature like AWS Lambda greatly increases scalability but requires an absolutely 100% stateless application design… not to mention that it ties you to the Amazon cloud, which disqualifies it for on-premise operation. Using a platform feature is a considerable bet on stability: you need to know that the virtualization platform won’t change halfway through the project and that its features will be available both in production and in development.
One of my recent projects was about modernizing several legacy applications to a modern Spring/Angular stack on a cloud platform, with extra bonus points for an open-ended installation date of said cloud platform. That meant we might have had to go into production without ever having seen the target cloud platform, yet once it was installed our applications should be able to leverage the cloud’s features.
We knew from our extensive research that the lowest common denominator of all cloud platforms was: stateless Java web applications, no HTTP sessions, no file system access, no direct communication between application instances, no guaranteed instance uptime, with any persistence or message queues provided by the cloud implementation.
Taking these constraints into consideration we knew that our applications had to exhibit certain architectural features if we wanted to make our deadlines:
  • Environment independence: the applications had to run on our local machines for development, and possibly on bare Linux servers, until the cloud platform was installed and we could start playing with it.
  • Dependencies: we also knew that we wouldn’t be able to install our own binaries (other than direct application JAR dependencies) in the JVMs used by the platform, and that included deploying security certificates. The application would have to provide these dependencies as part of its deployment.
  • Statelessness: we couldn’t store any state in the applications (session beans, stateful beans, HTTP sessions) since the platform guaranteed uptime only for a group of application instances as a whole, not for any particular instance. Any application state would have to be stored in a cloud-managed persistence store such as an RDBMS or Redis.
  • Communication: we knew that only HTTP from the load balancer towards application instances was possible; if an application instance needed to talk to another instance or to a service, it would have to do that over a cloud-managed message queue.
  • Configuration: different cloud platforms have different ways of providing configuration to applications, ranging from special environment variables passed to our applications up to configuration broker agents.
The solution was in all cases the same: design against a principle and code against an interface. Whenever we needed an environment feature we encapsulated it in a component that would simply pass calls through to feature-rich cloud platforms and otherwise provide a full implementation of its own for simpler platforms. We generally designed our applications to store any state in cloud-provisioned databases and to communicate with each other through message queues, which were again hidden as implementations behind generic communication components. Configuration was thankfully handled by the excellent spring-cloud library, which provides a common configuration API to Spring applications regardless of whether they run on a developer machine, a Linux server or one of the many cloud platforms.
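To illustrate the pattern (this is a sketch of the idea, not the project’s actual code), imagine a file storage feature behind a hypothetical FileStore interface: the application only ever sees the interface, a local implementation covers developer machines and bare Linux servers, and a cloud-backed implementation (object store, database BLOBs, etc.) would implement the same interface and be selected by configuration at startup, for example through a Spring profile.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // The principle the application codes against; nothing here mentions a platform.
    interface FileStore {
        void put(String key, byte[] content) throws IOException;
        byte[] get(String key) throws IOException;
    }

    // Implementation for developer machines and plain Linux servers.
    final class LocalDirectoryFileStore implements FileStore {
        private final Path root;
        LocalDirectoryFileStore(Path root) { this.root = root; }
        public void put(String key, byte[] content) throws IOException {
            Files.createDirectories(root);
            Files.write(root.resolve(key), content);
        }
        public byte[] get(String key) throws IOException {
            return Files.readAllBytes(root.resolve(key));
        }
    }

    // A cloud-backed FileStore (object store, BLOB table, ...) would implement the
    // same interface; which one gets wired in is a deployment decision, not a code change.

    public class FileStoreDemo {
        public static void main(String[] args) throws IOException {
            FileStore store = new LocalDirectoryFileStore(Path.of("build/demo-store"));
            store.put("greeting.txt", "hello cloud".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(store.get("greeting.txt"), StandardCharsets.UTF_8));
        }
    }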

Software architecture for automation means keeping things simple and walking that extra mile

Computers are good at performing plain tasks fast; people aren’t good at performing any task fast. A computer will do something faster than a human or not do it at all, which makes it an ideal candidate for carrying out automated tasks such as building and deploying software packages, running QA tests and preparing reports. We’ll all agree that these tasks are useful, even necessary, and that the software development workflow greatly benefits from these tasks being carried out swiftly and without human intervention.

Now, that brings me back to “computers perform tasks either fast or not at all”: if the task is complex and requires considerable intelligence, a computer might not be able to execute it. Ordering complex tasks is easy: you just describe to a team member what you’d like to have done and they’ll figure out the rest. Ordering simple, automated tasks is not easy: you have to invest considerable thought, discipline and upfront manual work into redefining a complex task as a sequence of simple, automatable tasks which the computer can carry out, and not all tasks are worth that effort. In typical software projects the automation of the aforementioned build and QA tasks reduces costs to such a great extent that any non-automatable task introduced by a user story will stand out at any sprint planning [9] meeting.
I already praised testing as a great way of improving software, and luckily it is a task that can be completely automated, even including end-to-end testing with different browsers [10]. Generating code metrics reports and building and deploying packages to multiple environments are also easily automatable, and in the age of virtualization and cloud platforms even the environments themselves can be provisioned automatically.
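As a taste of how far automation can reach, here is a minimal browser-based end-to-end check using Selenium WebDriver and JUnit 5; the URL and element IDs are made up for the example, and a hosted service such as BrowserStack [10] could run the same test against many browser and OS combinations.

    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.chrome.ChromeDriver;

    class LoginEndToEndTest {

        @Test
        void userCanLogInThroughTheBrowser() {
            WebDriver driver = new ChromeDriver();
            try {
                // Hypothetical application URL and element IDs, purely for illustration.
                driver.get("https://app.example.com/login");
                driver.findElement(By.id("username")).sendKeys("demo");
                driver.findElement(By.id("password")).sendKeys("secret");
                driver.findElement(By.id("login-button")).click();

                assertTrue(driver.getTitle().contains("Dashboard"));
            } finally {
                driver.quit();
            }
        }
    }

Once a check like this runs in the build pipeline, nobody has to remember to click through the login screen before a release.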

Risk is dealt with on many levels and the architect is the only role that moves on all these levels

How does my definition of an architect fit in with all of this? That was probably the longest streak of “bear with me” I’ve ever asked readers of this blog to… bear with. A quick recap: we talked about the different types of risk that jeopardize a software project, about the tools that projects (generally speaking) have to deal with those risks, and about how a software architecture (specifically speaking) needs to be prepared to use these tools and hence mitigate risk.
A common pattern should have emerged by now: countering risk means dealing with it on many levels, which is where people, processes, requirements, infrastructure, tools and software interface. Whoever deals with risk at this broad scope needs to have the big picture while at the same time being able to go into detail when countermeasures need to become concrete… and that is, by my definition, the architect.

Wrapping it up

We’ve talked about how a software project is threatened by risk on all of its execution levels: infrastructure, requirements, HR topics, regulations. We then went into general project management techniques that deal with these threats on all levels, provided that the software design plays along, and it is the responsibility of the software architect to ensure that it does.

Resources

[1] Division of labour
https://en.wikipedia.org/wiki/Division_of_labour

[2] Olaf Grossmann
https://www.facebook.com/olaf.grossmann.1

[3] George E.P. Box
https://en.wikiquote.org/wiki/George_E._P._Box

[4] Dan North
http://dannorth.net/

[5] The Architect Elevator
http://www.enterpriseintegrationpatterns.com/ramblings/79_elevator.html

[6] Allianz managed operations & services
https://www.allianz.com/en/products_solutions/global_lines/amos/amos.html

[7] God object
https://en.wikipedia.org/wiki/God_object

[8] Broken windows theory
https://en.wikipedia.org/wiki/Broken_windows_theory

[9] Sprint planning
https://en.wikipedia.org/wiki/Planning_poker

[10] Browser Stack
https://www.browserstack.com/
