The art of copy & paste in programming

Code structures help fit code into your head

Every programming course starts with showing a “hello world”, moves on to computations, I/O, control structures, data structures and code structures. The necessity for data and code structures doesn’t become obvious until one has written a larger programme. Until then (and often even then) it feels like a necessary evil, a bureaucratic impediment, one we take for the team so that a programme can be read and understood by those who’ll grade our assignment or inherit our project as we move on to new adventures.

In my early programming days I resisted code structures and wrote “code blankets” (σεντόνια for my fellow Greek alumni). These not only made it difficult to pass programmes on to other people; I also soon found out that the lack of structure made it hard to understand the intention of a programme I had written myself a few weeks earlier. Still, I didn’t really suffer from the effects of bad coding practices, because I was coding for my own learning experience and the fun that came with it; other people’s opinions and needs were not a priority at that time.

Code structure conveys intention, not function

Fast forward to adulthood and pay-the-rent needs: conveying intention and keeping code maintainable have become major concerns in the open source community and in our professional lives, so much so that they far outweigh the functional correctness of a programme. In a modern economy of scale it is more important that a programme can easily be read, corrected and executed by somebody else than that it is functionally correct but unmaintainable.

For a while I had a personal rule of moving code into a function or class when the functionality was used at least twice in the programme and the resulting total line count wouldn’t exceed the original count. But because code reusability is only one of many concerns alongside implicit code documentation, I soon moved on to grouping code into functions and classes whenever they clearly delineated functional scope.

Why is it so hard to structure code? Because structure is an up-front effort that documents functionality but doesn’t contribute any, and so further delays a working result. Structure pays off only in the long run, when the size of a code base requires collaboration or when it would overtax the short-term memory of the main contributor.

Let’s look at this simple programme for calculating the consumer end price (base price + VAT):

function main()
    const vat = 0.21
    print("Base price:")
    var base_price = read(stdin)
    // consumer price = base price + VAT on the base price
    var consumer_price = base_price + vat * base_price
    print("Consumer price: "+consumer_price)
end

and here’s a structured (and slightly over-engineered) version:

class VatCalculatorApp{

var console = getConsole()

function getConsole()
   return new TerminalConsole()
   // was: return new NetworkConsole("localhost",8080)
end


function getConfiguration()
   return System.readConfiguration("vatcalculator.ini")
end


function getVat()
   return getConfiguration().getProperty("vat")
end


function computeVat(basePrice)
   return basePrice * getVat()
end


function askForBasePrice()
   console.print("Enter base price:")
   return console.read()
end


function showConsumerPrice(consumer_price)
   console.print("Consumer price: "+consumer_price)
end


function main()
   var base_price = askForBasePrice()
   // computeVat returns only the VAT amount, so add it to the base price
   var consumer_price = base_price + computeVat(base_price)
   showConsumerPrice(consumer_price)
end
}

That’s quite a lot of code for a simple thing, but it adds a few more features: the VAT is configurable and I/O can be flexibly routed over a local terminal or a network socket. By packaging reusable functions into classes or other code packages such as DLLs or JAR files, code can not only be reused in multiple places in a programme, but also be shared among multiple programmes.
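
To make the routing benefit concrete, here is a minimal runnable sketch of the same idea in Python. It is not a translation of the listing above: the `Console` protocol, the `ScriptedConsole` stand-in and the constructor arguments are names I made up for illustration, and the VAT rate is passed in directly instead of being read from a configuration file.

```python
from typing import Protocol


class Console(Protocol):
    """Hypothetical interface: anything that can print and read a line."""

    def print(self, text: str) -> None: ...
    def read(self) -> str: ...


class ScriptedConsole:
    """Stand-in for a terminal or network console: replays canned input."""

    def __init__(self, inputs: list[str]) -> None:
        self._inputs = list(inputs)
        self.output: list[str] = []

    def print(self, text: str) -> None:
        self.output.append(text)

    def read(self) -> str:
        return self._inputs.pop(0)


class VatCalculatorApp:
    def __init__(self, console: Console, vat_rate: float) -> None:
        # Both the I/O channel and the rate are injected, so neither is
        # hard-wired into the calculation.
        self.console = console
        self.vat_rate = vat_rate

    def compute_vat(self, base_price: float) -> float:
        return base_price * self.vat_rate

    def run(self) -> float:
        self.console.print("Enter base price:")
        base_price = float(self.console.read())
        consumer_price = base_price + self.compute_vat(base_price)
        self.console.print(f"Consumer price: {consumer_price:.2f}")
        return consumer_price
```

Swapping `ScriptedConsole` for a class that wraps `input()` and `print()`, or a socket, would leave `VatCalculatorApp` untouched, which is the whole point of the extra structure.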

Code structures create dependencies

This second version however doesn’t come without drawbacks: a function is an API contract between the function provider (the function implementation) and the function caller (the code that invokes the function), and changes in the API contract require changes in every caller. Cleaning a programme of code duplication by moving the duplicates into reusable functions is sometimes easier than refactoring code that already builds on reusable functions, because the refactoring can be hindered by the API contract of the functions being refactored. Suppose a function needs to become more configurable, e.g. in our earlier example a VAT function needs to be passed in for variable VATs depending on the product code. Then all callers suddenly need to deal with the concept of a configurable VAT; they need to construct the configuration object and pull the various data together. That becomes especially annoying if the old callers don’t need the new functionality because their products fall under the old VAT category.
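
A small Python sketch of that contract change, under my own assumptions: the function names, the product codes and the rates are all invented for illustration. The second version of the function takes the product code and a rate table, and a caller written against the first version breaks even though it never needed the new feature.

```python
# Version 1 of the shared function: one flat rate for every product.
def compute_vat_v1(base_price: float) -> float:
    return base_price * 0.21


# Version 2: the rate depends on the product code, so the signature changes.
# The rate table below is made up for this example.
EXAMPLE_RATES = {"book": 0.09, "laptop": 0.21}


def compute_vat_v2(base_price: float, product_code: str,
                   rates: dict[str, float]) -> float:
    return base_price * rates[product_code]


def call_like_v1(vat_function, base_price: float):
    """Simulates an old caller that only knows the one-argument contract."""
    try:
        return vat_function(base_price)
    except TypeError as error:
        # Missing arguments surface as a TypeError at call time.
        return f"caller broken: {type(error).__name__}"
```

`call_like_v1(compute_vat_v1, 100.0)` still works, while `call_like_v1(compute_vat_v2, 100.0)` reports a broken caller. Default arguments could soften this particular change, but the caller is still bound to whatever the shared function decides its defaults should be.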

To make things worse, all caller code that depends on changed code must be tested after the change. Modern test automation tools make this easier, but tests nevertheless have to be written (I remind you that I just complained about the overhead of code structure, imagine how I feel about tests; actually it’s not that bad). That would not have been the case if only one instance of replicated code had been changed, where the caller is the same as the implementation.

Code duplication guards against unintended change

“But”, one might object, “the change to that one code instance means that all other recurrences must also be changed”. That depends on the case. Let’s add a check to the askForBasePrice function that guards against negative numbers:

function askForBasePrice()
   // declared outside the loop so the loop condition can see it
   var price
   do
     console.print("Enter base price:")
     price = console.read()
     if price < 0 then console.print("Price can't be negative")
   while price < 0
   return price
end

Seems to make sense, right? It turns out that negative VAT makes sense elsewhere [2] and I just pissed off my Japanese users. By moving code into its own silos one creates dependencies which need to be met at compile- or run-time, leading to well-known dependency problems [1]. There are various approaches to keeping dependency problems at bay, like version management, private packaging, static compilation and containerisation.
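
Here is how the duplicated alternative could look, as a hedged Python sketch. The two function names and the idea of a second caller that legitimately handles negative amounts are my own illustration, not something taken from a real code base.

```python
def parse_price_checkout(raw: str) -> float:
    """Copy used by the checkout flow: negative prices are rejected."""
    price = float(raw)
    if price < 0:
        raise ValueError("Price can't be negative")
    return price


def parse_price_adjustment(raw: str) -> float:
    """Independent copy used where negative amounts are legitimate.

    Because this is a duplicate and not a shared function, the guard
    added to the checkout copy never reaches it.
    """
    return float(raw)
```

Had both callers shared one function, adding the guard would have changed the behaviour of the second caller without anybody touching its code.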

Code duplication in the container age

The Go programming language [3] makes dependency management a major priority by taking the opinionated stance that programmes should be compilable on all platforms and statically linked at compile time, which simply means that all dependencies are packaged into the programme binary.

Build tools like Ivy and Maven, concepts like application containerisation (e.g. Docker), and microservices as an architectural deployment pattern all complement and advance decoupling as part of a defensive software architecture strategy [4]. Together they amount to what is essentially code duplication at scale, bundled with the comfort of build automation.

By bundling entire applications with their dependencies, databases and even operating systems (e.g. VM images), a programmer defines a stable base for an application, shielding it against many compile- and run-time dependency incompatibilities. In this scenario, reusable code is deployed as multiple copies throughout a (distributed) system and used in multiple places in the form of binary dependencies or remote services, most likely with different versions of the code deployed and operable at the same time.

“But”, one might argue again, “this is not code duplication”. While it may indeed not look like code duplication at the source code level, it certainly is code duplication at the runtime level, and it forces the programmer to think about which versions are deployed where and what the implications are, ranging from nuisances like the precise meaning of a singleton [5] to service discovery [6] and API versioning [7]. Even better, with code duplicated automatically through deployment automation, the pain of structuring code “just because” can be deferred, possibly forever, because each programme is complete in its isolated view.
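
The singleton nuisance can be simulated inside a single Python process. The sketch below is only an approximation of what bundling does: it loads the same module source twice under two made-up names, the way two applications might each carry their own copy of a library. Real deployments separate the copies by process, container or machine instead.

```python
import types

# Source of a tiny "library" with a module-level singleton.
MODULE_SOURCE = '''
_instance = None

def get_registry():
    """Return the one registry of this module copy."""
    global _instance
    if _instance is None:
        _instance = {}
    return _instance
'''


def load_copy(name: str) -> types.ModuleType:
    """Create an independent module object from the same source text."""
    module = types.ModuleType(name)
    exec(MODULE_SOURCE, module.__dict__)
    return module


# Two hypothetical applications, each bundling its own copy.
app_a_lib = load_copy("app_a_bundled_registry")
app_b_lib = load_copy("app_b_bundled_registry")

app_a_lib.get_registry()["vat"] = 0.21
```

After this, `app_a_lib.get_registry()` contains the VAT entry while `app_b_lib.get_registry()` is still empty: each copy has its own “single” instance, so a singleton is only single within one bundled copy.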