# Sustainable software development

Sustainable software development requires a theory of control that takes into account the complex nature of the development process. Simplistic models that do not account for interdependence often lead to unnecessary overhead and fail to recognize and eliminate bottlenecks in the development process, leading to poor sustainability. In this article we propose a simple yet expressive theory that allows us to model sustainability and develop effective control mechanisms for sustainable software development.

## Introduction

What do software development and complex numbers have in common? Don’t worry, we are only going to talk about complex numbers in a few paragraphs of the introduction.

### Interdependence

If you are new to complex numbers, there are many approaches to developing an intuition for this object. Complex numbers can be introduced as the set of numbers that satisfy algebraic equations that cannot be solved in the domain of real numbers, for example: $$x^2+1=0$$. The idea is to extend the set of real numbers to contain solutions for them.

Another, geometric, approach introduces complex numbers as points on a two-dimensional plane with algebraic operations defined on them, like addition and multiplication. This approach works best if you already have a solid understanding of real numbers and basic geometry. In a way, it uses the concept of real numbers as the building blocks for this new construction, allowing you to reuse any knowledge and intuition you already possess.

Yet another way to approach complex numbers is to consider them as abstract entities that can be “observed” by projecting them onto a simpler domain (of real numbers).

An interesting aspect of the last approach is the clear distinction between the independently existing abstract numbers and our more or less concrete but simplified perception of them. Our view of a number, be it the real or imaginary projection, contains only limited information about it, and in order to fully characterize the number, we must take both projections into account.

An implication of this is that as long as we perform only certain operations on the numbers – or deal with an operationally closed subset of them – we can somewhat blissfully ignore the distinction between the simplifying projection and the number itself. For example, adding two complex numbers yields a number whose real part is equal to the sum of the real parts of the addends. Analogously, multiplication of two numbers yields a number whose real part is equal to the product of the real parts of the multiplicands – as long as the imaginary parts of the operands are zero. We might effectively substitute the real projections for the numbers themselves, because the projected behavior is consistent with the arithmetic laws of the real numbers, which we are accustomed to.

But if we forget that in reality we are dealing with complex numbers and confuse the projection with the number itself, we will notice that the projected behavior is not consistent with the laws of real arithmetic. In particular, multiplying two complex numbers whose imaginary parts are not zero yields a number whose real part is not the product of the real parts of the multiplicands. For example, given $$a=1+i$$ and $$b=2-i$$:

\begin{align*} \Re(a)=1 \\ \Re(b)=2 \\ \Re(a)\cdot\Re(b)=2 \\ \Re(a\cdot b)=3 \end{align*}
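The arithmetic above can be checked with Python's built-in complex type. This is only a quick sanity check of the example values; the variable names are chosen here for illustration.

```python
# a = 1 + i and b = 2 - i, written with Python's imaginary unit j
a = 1 + 1j
b = 2 - 1j

product_of_real_parts = a.real * b.real   # Re(a) * Re(b) = 1 * 2
real_part_of_product = (a * b).real       # Re(a * b), where a * b = 3 + i

print(product_of_real_parts, real_part_of_product)  # prints: 2.0 3.0
```

The two results agree only when both imaginary parts are zero, which is exactly the case in which the projection can stand in for the number.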

Confusion can arise when we do not know a priori whether we are dealing with phenomena involving complex or real arithmetic.

The main takeaway here is that since our theories are based on perception, even if they happen to be accurate most of the time, confusing perception with reality can lead to models that will fail us in unexpected ways. The reason for this is interdependence, and we will discuss it in more detail later. Accordingly, when we base our decisions on models that fail to accurately describe important aspects of reality, we will get into trouble.

The same thinking applies to software development. The software development process is a complex dynamic system that we cannot grasp completely by itself – we can only develop an understanding of it by taking projections. This allows us to reduce complexity to a manageable level, but special care must be taken not to oversimplify. Similarly to complex numbers, if we perform certain operations on a software system (extend it, change it, etc.), a simplistic model might yield reliable results most of the time, but fail in some other cases, especially if the chosen model happens to be single-dimensional. There are hidden variables that influence the result of operations significantly and unexpectedly.

In this article we are going to present a simple (but not too simple) and expressive model of the software development process, which will allow us to study sustainability and show us how to make better decisions. Furthermore, we will show that the software development process cannot be effectively reduced to a single-dimensional measure of progress and effective control cannot be based on that simplistic measure. We will explain why this is true and provide implications for software practitioners. Finally, from this point forward we will not talk about complex numbers any more. But before we introduce the theory more formally, let us define the problem context in some more depth.

### Growth

Software development is essentially about growth. Computation requires algorithms to be expressed in code; source code is text; and writing is an inherently incremental process. Although there are many ways to structure that process, it is basically impossible to produce any amount of text instantaneously; all texts have been (in fact, must have been) written one word at a time. Change also can be considered a variation of growth: old features die off and are eventually removed from the codebase, whereas new features and capabilities are developed and integrated into the system.

One important aspect of growth is sustainability. What does sustainability mean in the context of software development and what conditions must be met for growth to be sustainable?

There are many paradigms of growth in software engineering, albeit not labeled as such. One of them has become the de facto standard: continuous integration. It enables organic growth by continuously augmenting the system with small increments, keeping it healthy at all times. Sustainability in the context of this paradigm means that the number and size of the increments are kept at an acceptable rate, even when the system has grown considerably. In other words, sustainable development means that the system is scalable with regard to scope.

Investigating the second question (regarding the conditions for sustainability) is a central theme of this article. Our hypothesis is that for organic growth to be sustainable, two distinct but interdependent system qualities must be balanced: structure and function. Other authors have considered similar decompositions, for example: internal (conceptual) vs. external (perceived) integrity, or system capability vs. architectural runway. There is a significant overlap between these concepts, and they can be generalized into a single concept.

A system cannot function without structure. For example, an organism that is able to run requires a solid skeleton to which muscle is attached. Similarly, cognition requires a brain, and metabolism requires a digestive system.

On the other hand, structure must be shaped by function. Every software system has a specific purpose and that purpose demands a specifically tailored architecture. Although sometimes there is a tendency to build “one framework to rule them all” (especially with junior developers), experience shows that this simply does not work.

The main point here is that structure and function are strongly interdependent. Taking only one of these aspects into account and ignoring the other is simplistic and often leads to inappropriate actions that might solve acute problems in the short term but worsen the situation in the long term (or do not work at all). We will come back to this theme later in the article.

## Theory

Here we will present a simple theory of growth in software development. We begin by defining the concepts more formally.

### Dimensionality and interdependence

As mentioned in the introduction, complex systems including software are inherently multi-dimensional. It is not possible to reduce a complex system to a simpler one (of lower dimensionality) without losing significant information. Our view and understanding of the system must be based on multiple projections that we take in order to reduce the complexity of individual components to a manageable level; we call the target domains of these projections components or aspects.

These components influence each other. Although there is no universally right number of dimensions for decomposition, the majority of phenomena can be explained by projecting the system into two or three components. For our purposes of analyzing software systems, we choose to call them function (or functional capability) and structure (or structural integrity). We denote the software system under development with $$S$$; its functional and structural projections will be denoted with $$S_1$$ and $$S_2$$, respectively.

### Force and change

Software systems can only be changed (grown) by applying some force (effort). In the particular domain of software development, writing new code and refactoring existing code can be considered an application of force.

Like complex systems, force itself can be decomposed into its constituent parts. For example, force and its effect on the system can extend along multiple dimensions: functional change (incrementation) or structural change (refactoring). The fact that refactoring is possible in the first place indicates that our choice of decomposition into function and structure makes sense.

Analogously to the software system, we denote the force applied to the system with $$F$$; its functional and structural projections will be denoted with $$F_1$$ and $$F_2$$, respectively. The total magnitude of the force that can be generated in a time interval is the capacity of the development team. We denote the capacity with $$C$$ and express its relation with force as $$|F|=C$$, where $$|\cdot|$$ is the sum-norm: $$F_1+F_2=C$$. We will return to this assumption later.

### Effectiveness and velocity

Systems tend to resist or amplify change. As mentioned above, it is not possible to change the system directly; instead, force must be applied and transformed into effect. How the applied force translates into actual change is called effectiveness. It depends on the magnitude and composition of the force, as well as on the current state of the system, especially the interplay between its functional and structural components.

For example, implementing a new feature on a messy codebase is difficult; in extreme cases it is practically impossible to add new features without breaking some other critical parts of the system, causing more harm than good. This can be considered a negative effect.

We model the transformation of force into effect by multiplying it with an effectiveness model, which we denote with $$E(S)$$ or simply $$E$$. Again, its functional and structural projections will be denoted with $$E_1$$ and $$E_2$$, respectively. Formally, the effect on the system can be expressed as $$E\circ F$$, where $$\circ$$ denotes pointwise multiplication.

Agile methodologies coined the term velocity roughly denoting the amount of estimated work a team can accomplish in a time interval of some fixed length. It is estimated by the team based on prior experience of working with the system and is usually measured in story points, which is a relative measure of complexity.

We generalize this notion of velocity (denoting it with $$V$$) by equating it to the actual effect a team produces on the system, that is: $$V=E\circ F$$.

### Technical debt and architectural runway

Before we proceed and put everything together, let us consider how the ideas expressed so far relate to a selection of other similar concepts in sustainable software development. Some metaphors have been proposed by different authors, but many of them essentially describe very similar things. We are going to consider two of them.

The term technical debt was introduced by Ward Cunningham in 1992. It is a metaphor used to describe the negative effect of postponed refactoring. Martin Fowler calls it cruft.

Though certainly illustrative, this notion has a weakness: there is generally no such thing as too little debt. Staying within the metaphor, we could perhaps call the opposite technical credit, but for some reason this term has never been used in the literature.

Architectural runway is a concept used in SAFe describing the existing code and infrastructure needed to implement features without massive refactoring. In essence, this is the opposite of technical debt.

It is remarkable that the SAFe methodology explicitly distinguishes development of new capabilities (features) from implementing architectural enablers (extending the architectural runway). However, the concept of architectural runway is still defined negatively (lack of necessity for refactoring). We will combine this and the concept of technical debt into one coherent concept.

An interesting observation is that some business stakeholders seem to have a blind spot for technical debt and credit. At best, they consider implementing architectural enablers a necessary evil, and developers sometimes even have to resort to begging to be allowed to perform outstanding refactorings. Structural integrity of the system is not always considered an asset that increases agility and economic viability of the project.

Still, in some cases it is indeed more critical to deliver as quickly as possible, consciously ignoring the long-term effect on sustainability. We will not cover these dynamics here, although we believe it would be a valuable extension to the theory.

### Growth

Let $$S$$ denote a software system under development. We model system growth as incrementation of its function or structure that is induced by applying a force $$F$$ and express it formally as:

$$\Delta S = E\circ F$$

where $$\Delta$$ denotes differentiation and $$E$$ denotes the effectiveness model. Many effectiveness models are conceivable, but from this point forward, we will focus on a simple yet sufficiently expressive one – the linear effectiveness model:

$$E = Q + R\cdot S$$

where $$Q \in \mathbb{R}^2$$ and $$R \in \mathbb{R}^{2\times 2}$$. In the simplest case:

$$Q=\left(\begin{array}{r}1\\1\end{array}\right)$$

$$R=\left(\begin{array}{rr}-1 & 1\\1 & -1\end{array}\right)$$

We call this the simple effectiveness model. It is equivalent to:

\begin{align*} V_1 &= (1 - S_1 + S_2)\cdot F_1 \\ V_2 &= (1 + S_1 - S_2)\cdot F_2 \end{align*}
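As a minimal sketch, the linear effectiveness model and the resulting velocity can be written in a few lines of Python. The function names `effectiveness` and `velocity` and the tuple representation of $$S$$ and $$F$$ are assumptions made for illustration; the values of `Q` and `R` are those of the simple effectiveness model.

```python
# Simple effectiveness model: E = Q + R * S
Q = (1.0, 1.0)
R = ((-1.0, 1.0),
     (1.0, -1.0))


def effectiveness(s, q=Q, r=R):
    """Return (E1, E2) for a system state s = (S1, S2)."""
    return tuple(q[i] + sum(r[i][j] * s[j] for j in range(2)) for i in range(2))


def velocity(s, f, q=Q, r=R):
    """Return V = E o F, the pointwise product of effectiveness and force."""
    e = effectiveness(s, q, r)
    return tuple(e[i] * f[i] for i in range(2))


# A system with more function than structure resists functional change
# (E1 < 1) and rewards structural work (E2 > 1).
print(effectiveness((0.5, 0.2)))
print(velocity((0.5, 0.2), (0.7, 0.3)))
```

Passing other values for `q` and `r` gives other linear effectiveness models, for example ones with empirically obtained coefficients.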

### Discussion

Why is the linear effectiveness model shaped like that? Here we are going to discuss the parts of the model in some detail and explain our reasoning for this structure.

$$Q$$ is the invariant component, corresponding to a linear transformation of force into effect, independent of the current state of the system ($$S$$). Of course, in reality things are not that simple. The interesting part is the $$R$$ matrix that models the influence of the individual components of $$S$$ on $$E$$. It will be explained next.

Note: in the simple effectiveness model we set the magnitude of every value to 1 to keep things simple, but this is not as important as the sign; positive numbers signify a supporting factor (increasing effectiveness), and negative numbers signify a constraining factor (decreasing effectiveness). Actual values should be obtained empirically.

$$r_{1,1}=-1$$. This is the change resistance of the system; it grows with the functional scope. The minus sign means that its effect is constraining (hence resistance). The sheer size of the system makes it more difficult to handle. This should be intuitively clear, since the larger the system, the more details must be taken into account and the greater the risk of breaking things.

$$r_{1,2}=1$$. This is the change support of the structure (technical credit, architectural runway). This number is positive; it has a supporting effect. The higher the structural integrity of the system, the easier the change. This should also be intuitively clear because a system with a clean and modular architecture allows for faster development and increases agility.

$$r_{2,1}=1$$. This is the functional driver for the system structure (design, architecture). The larger the functional scope of the system, the more straightforward it is to design a particular architecture fit for that purpose. In other words, the more information is available about the intended function of the system, the easier it is to design it. As explained before, structure must be guided by function, and this factor models that relationship.

$$r_{2,2}=-1$$. This is the churn overhead (refactoring resistance). As there is no such thing as the perfect structure for a system, trying to achieve perfection (creeping elegance) generates a lot of waste (churn). The negative sign means that this is a constraining factor that reduces effectiveness.

Plotting the simple linear effectiveness model yields the following picture:

Remember that this is a model (a thinking tool) and the real effectiveness function and resulting effectiveness vectors can look different in a real project. Nevertheless, the picture is illustrative.

As stated above, the effect of applying a force (effort) on a system is expressed as a simple differential equation. A software system under development can be thought of as a point being pushed through a field. When the applied force $$F$$ is multiplied with the effectiveness vector $$E$$, its effect can be amplifying or constraining (overhead, friction, churn). The following diagram illustrates this. Note that the resulting effect (shown to the right) is smaller in magnitude than the applied force (shown to the left) because all components of the effectiveness vector are less than 1:

The system can be pushed in different directions. For example, it is possible to extend the functionality of the system by implementing new features in a quick and dirty manner (this equates to moving strictly to the right in the diagram). However, if continued long enough, at some point the induced resistance will increase to a level at which further incrementation becomes essentially ineffective (the overhead of working with a mess becomes too high).

This condition can be resolved by “going north”, gradually increasing the structural integrity of the system. However, this maneuver has a natural limit too: refactoring indefinitely will result in a state where any additional refactoring becomes ineffective due to a high amount of churn. Remember that structure and function are interdependent.

Perhaps a more telling picture emerges when we decompose $$E(S)=Q+R\cdot S$$ into its additive components $$Q$$ (invariant effectiveness) and $$R\cdot S$$ (state dependent effectiveness) and plot only the $$R\cdot S$$ part:

Some interesting points can be made by analyzing this picture. First, there is an implicit ideal curve (a diagonal line going through the origin in the linear effectiveness model) on which the resistance is minimal and hence effectiveness and velocity are maximized. Assuming that there is a limit, we will call it the eigen velocity of the system ($$V_E$$). We will define it formally later when we discuss convergence.

Second, the further away the system is positioned from this ideal curve, the greater the imaginary attracting force pulling the system back to a balanced state. The development team senses this force by observing that there is significant waste and overhead in the process. The force points toward the nearest point on the ideal curve. This manifests itself in developers demanding more refactoring, or, when the imbalance is to the opposite side, in impatient product owners.

As we will show shortly, the challenge of maintaining sustainability lies in the fact that in practice the position and shape of the ideal curve (and the value of the system eigen velocity) are not known a priori and cannot be perceived or measured directly. This makes the control mechanisms described later necessary.

## Application

Up to this point we have been introducing the theory, but have not yet shown how it can be applied to describe and explain phenomena in practice and help us make better decisions. In this part we are going to provide that.

First, we are going to introduce the ideas of pressure and control. As mentioned previously, since we cannot know or observe system eigen velocity directly, we need a feedback mechanism for influencing development velocity indirectly, by changing other variables that we can control.

### Pressure

Control is essential for maintaining sustainable growth. A software development process out of control tends to lose momentum, has unpredictable performance and, since capacity costs money, has a higher risk of becoming economically unviable (and getting canceled).

But what should be controlled and by what means? There are many variables that can be influenced; the most obvious one is the functional scope: what and how many features should be developed next? In the agile model, this control function is shared between the product owners and the development team, who negotiate the intended change in scope for the next iteration in the planning procedure.

This approach works well if the condition holds that the team capacity is equal to the norm of the force (effort) applied to the system, that is: $$|F|=C$$. In other words, at any point in time, either the team is implementing a new feature or it is refactoring – there is no third alternative. Capacity is not being wasted.

With this assumption it is possible to introduce a notion of control based on the concept of pressure: the ratio of the functional component of the force to the overall capacity:

$$P=\frac{F_1}{C}\in[0,1]$$

Here, without loss of generality, we are ignoring the qualitative aspect of scope (what features to implement), assuming that features are ordered by priority and only the top prioritized features are selected for implementation, and focus only on the quantitative aspect (how much to implement). With this, $$F$$ can be defined as a function of pressure and capacity:

$$F=C\cdot\left(\begin{array}{c}P\\1-P\end{array}\right)$$

Pressure is a variable that can be directly and easily changed. For example, most teams maintain all remaining work in the backlog. Work items are often categorized into items that increase or change the functional scope of the system (user stories, features) and items that improve the structural integrity of the system (enablers, refactoring).

If the work items are tagged with estimated story points, pressure can be operationalized as the sum of story points estimated for user stories selected for development in the upcoming iteration divided by the total sum of story points of all work items scheduled for implementation in that iteration.
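The operationalization described above can be sketched as follows. The `WorkItem` type, its fields, and the sample backlog are hypothetical; they only stand in for whatever a team's backlog tool actually provides.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    title: str
    points: int        # estimated story points
    functional: bool   # True: user story/feature, False: enabler/refactoring


def pressure(items):
    """Share of story points in an iteration spent on functional work (P)."""
    total = sum(item.points for item in items)
    if total == 0:
        raise ValueError("the iteration contains no estimated work")
    return sum(item.points for item in items if item.functional) / total


def force(p, capacity):
    """Force F = C * (P, 1 - P) for a given pressure and capacity."""
    return capacity * p, capacity * (1 - p)


# Hypothetical iteration backlog
iteration = [
    WorkItem("Export report as PDF", 5, functional=True),
    WorkItem("Filter orders by date", 2, functional=True),
    WorkItem("Extract billing module", 3, functional=False),
]
p = pressure(iteration)
print(p, force(p, capacity=1.0))
```

With this backlog, 7 of 10 story points go into features, so the pressure is 70%.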

### Convergence

When pressure and capacity are kept constant, velocity converges to a limit over time. With:

$$V=E\circ F$$

$$F=C\cdot \left(\begin{array}{c}P\\1-P\end{array}\right)$$

we can study the behavior of the velocity limit as a function of pressure $$P$$ and capacity $$C$$. Assuming the simple effectiveness model yields:

$$\lim_{t\to\infty}V_1=2\cdot C \cdot P \cdot (1-P)$$
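One way to see where this limit comes from is to track the imbalance $$D=S_1-S_2$$ instead of the two components separately. The following sketch assumes that $$\Delta S$$ is read as a time derivative and that $$P$$ and $$C>0$$ are constant; the symbol $$D$$ is introduced here only for the derivation.

\begin{align*} \dot{S}_1 &= (1-D)\cdot C\cdot P \\ \dot{S}_2 &= (1+D)\cdot C\cdot (1-P) \\ \dot{D} &= \dot{S}_1-\dot{S}_2 = C\cdot(2P-1-D) \\ D &\to 2P-1 \quad (t\to\infty) \\ V_1 &= (1-D)\cdot C\cdot P \to 2\cdot C\cdot P\cdot(1-P) \end{align*}

The same substitution gives $$V_2=(1+D)\cdot C\cdot(1-P)\to 2\cdot C\cdot P\cdot(1-P)$$, so under these assumptions both components of velocity settle at the same value.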

With $$C=1$$, plotting functional velocity limit against pressure results in the following diagram:

There are some interesting points to be noted about the shape of this function. First, it is a typical U-curve (turned upside down), which implies that we are dealing with an optimization problem: sustainable performance cannot be achieved by setting pressure to one of the extreme values. Second, the curve is relatively flat at the optimum, meaning that it is unnecessary to hit the optimal value exactly; operating near the optimum is mostly good enough.

Furthermore, we define system eigen velocity $$V_E$$ as the optimum limit velocity that a team can generate. In the diagram above this corresponds to the point at $$P=0.5$$.
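The limit formula is easy to evaluate numerically. The sketch below assumes the simple effectiveness model; the function name `limit_velocity` and the grid of candidate pressures are illustrative choices.

```python
def limit_velocity(p, capacity=1.0):
    """Long-term functional velocity 2 * C * P * (1 - P) at constant pressure."""
    return 2 * capacity * p * (1 - p)


# Scan pressures 0%, 1%, ..., 100% and pick the one with the highest limit.
grid = [i / 100 for i in range(101)]
best_pressure = max(grid, key=limit_velocity)
eigen_velocity = limit_velocity(best_pressure)
print(best_pressure, eigen_velocity)  # prints: 0.5 0.5

# The curve is flat near the optimum and falls off toward the extremes.
for p in (0.3, 0.4, 0.6, 0.7, 0.9):
    print(f"P={p:.0%}  limit V1={limit_velocity(p):.0%}")
```

Moving from 50% to 60% pressure costs only two percentage points of long-term velocity, whereas 90% pressure cuts it to well under half of the optimum.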

### Control

We have already mentioned that effort is most effective when the actual velocity corresponds to the eigen velocity of the system. Unfortunately, it is not possible to know a priori what the eigen velocity of the system is, and it cannot be observed or measured directly. Hence we do not know whether we are operating efficiently or whether significant improvements are possible.

However, we can observe the effect of changing the pressure (let’s call it corrective action: increasing or decreasing pressure). This leads to the following observation: when changing pressure leads to an improvement of long-term velocity (limit), the corrective action is adequate, otherwise its effect is detrimental to our goals.
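To make this observation concrete, here is one possible feedback rule, sketched under strong assumptions: it is a plain hill-climbing loop, not a mechanism prescribed by the theory. It assumes the simple effectiveness model, a simulated system instead of a real project, and enough patience to let velocity settle after every change. The names `settle` and `tune_pressure`, the step size `delta`, and the number of rounds are all hypothetical.

```python
def settle(p, state, capacity=1.0, dt=0.1, steps=200):
    """Hold pressure p constant for a while; return (settled V1, new state)."""
    s1, s2 = state
    v1 = 0.0
    for _ in range(steps):
        e1, e2 = 1 - s1 + s2, 1 + s1 - s2           # simple effectiveness model
        v1, v2 = e1 * capacity * p, e2 * capacity * (1 - p)
        s1, s2 = s1 + dt * v1, s2 + dt * v2          # Euler step
    return v1, (s1, s2)


def tune_pressure(p, delta=0.05, rounds=12):
    """Change pressure step by step; reverse direction when velocity gets worse."""
    velocity, state = settle(p, (0.0, 0.0))
    direction = -1.0                                  # arbitrary first guess
    for _ in range(rounds):
        p = min(1.0, max(0.0, p + direction * delta))
        new_velocity, state = settle(p, state)
        if new_velocity < velocity:                   # corrective action was detrimental
            direction = -direction
        velocity = new_velocity
    return p, velocity


final_pressure, final_velocity = tune_pressure(0.7)
print(round(final_pressure, 2), round(final_velocity, 3))
```

The rule never needs to know the eigen velocity; it only compares settled velocities before and after each change. Its cost is hidden in `settle`: every comparison requires waiting until the long-term effect has manifested itself.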

### Responsiveness

Responsiveness is the delay between the introduction of a corrective action and the long-term manifestation of its effect. It depends primarily on the capacity of the team and the state of the system.

If this delay were zero, software process control would be easy. However, as we will see in the example in the next part, this is simply not the case. Raising the pressure always leads to short-term gains and lowering the pressure leads to short-term losses, but the nature of the long-term effect depends on the state of the system.

In the next section, we will discuss an example of how the theory can be applied by using a numerical simulation of the development process, assuming the simple effectiveness model and applying the simple Euler method.
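A simulation of this kind can be sketched in a few lines. The code below is not the simulation used for the figures in this article; the step size `dt`, the empty initial state, the capacity of 1 and the `schedule` format are assumptions, so the numbers it prints need not match the figures exactly.

```python
def simulate(schedule, steps, dt=0.1, capacity=1.0):
    """Euler integration of dS = E(S) o F under the simple effectiveness model.

    schedule: list of (start_step, pressure) pairs sorted by start_step;
              the first entry must start at step 0.
    Returns a list of (step, pressure, functional_velocity) tuples.
    """
    s1 = s2 = 0.0                      # assumed initial state: empty system
    history = []
    for step in range(steps):
        # The pressure in force is the latest schedule entry that has started.
        p = [value for start, value in schedule if start <= step][-1]
        f1, f2 = capacity * p, capacity * (1 - p)
        e1, e2 = 1 - s1 + s2, 1 + s1 - s2
        v1, v2 = e1 * f1, e2 * f2
        history.append((step, p, v1))
        s1 += dt * v1
        s2 += dt * v2
    return history


# Hypothetical schedule: start at 70% and raise the pressure to 90% at step 10.
schedule = [(0, 0.7), (10, 0.9)]
for step, p, v1 in simulate(schedule, steps=40):
    print(f"t={step:2d}  P={p:.0%}  V1={v1:.1%}")
```

Running it shows the pattern discussed in the text: every increase in pressure produces an immediate jump in functional velocity, followed by a decline toward a lower long-term level.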

## Example

Consider a development process assuming the simple effectiveness model with the initial pressure level set at 70% (P, the red line). The functional eigen velocity is at 50% (VE, the dotted line). Initially, because of the relatively high pressure, the actual functional velocity is higher than 50% (V1, the blue area), but due to accumulation of technical debt resulting from little time spent in maintaining the structural integrity of the system, functional velocity gradually decreases and eventually drops below 50%:

What action should be taken in order to stop further dropping of velocity? Could we stabilize the functional velocity at around 50%? This should be possible because this value corresponds to the functional eigen velocity of the system. There are two options, and though both are painful (the damage has already been done), only one of them is sustainable. The first one is to increase the pressure, the second is to reduce it. Let us consider what happens if we take the first one. Raising the pressure to 90% leads to this:

Although this results in a short spike of increased velocity, the long-term effect is devastating. In systems thinking, this condition is generally known as shifting the burden. Shortly after raising the pressure, the actual velocity drops to 40%, this time arriving at that point much sooner than before.

Now what? Raising the pressure again does not seem to be working, but leaving things as they are will lead to further deterioration of performance, eventually leading to a condition with the only available options being: kill the project, increase team capacity (throwing more people at the problem) or reduce pressure. We will not consider killing the project and adding more people to the team as viable alternatives and focus on the strategy of reducing pressure instead.

Let’s assume that at t=21 the project manager responsible for raising the pressure to 90% is fired and a new project manager has taken charge of the project. He is evaluating his options. What is he to do to save the project? The only viable option is to reduce pressure to a level that allows the system to regenerate. After securing support from his superior manager, he adjusts the pressure down to 60%:

Now two things can be observed: first, reducing pressure leads to an immediate and drastic drop in velocity (from 40% down to 25%). Ouch! The reason for this is simple to explain: even if the team starts to spend more of its capacity on refactoring of the codebase, it still takes time for the system to recover. Second, after some time, the system starts to regenerate. At t=31, velocity is back at 40%, but at this point with a cleaned up codebase, and the trend is upwards. Whew! Confident that his strategy has been working, the new project manager decides to lower the pressure again; at t=31 he lowers it further to 30%. This has the following effect:

The velocity curve exhibits similar behavior as before and eventually converges at 42%. Was dropping pressure to 30% a good decision? No, because at some point additional structural work leads to a high churn rate. Just leaving the pressure at 60% would have led the project to stabilize at 48% velocity (which is pretty good for 50% eigen velocity), but that level of performance cannot be attained by spending only 30% of capacity on implementing features and 70% on refactoring.

There are two interesting observations that can be made by analyzing the unfolding of this scenario. First, well-intended corrective actions caused the system to deteriorate in the first place. Without any corrective action the system would have sustainably performed at around 42% (the limit for a constant pressure of 70%), which is not optimal but no worse than the eventual state attained through active regulation.

Second, the effects of corrective actions did not manifest themselves immediately, but with considerable delay. This, together with the fact that the system eigen velocity cannot be known a priori, is why control is essential and why it should be grounded in a solid theory.

## Sustainable software development: Conclusion

We discussed a simple theory that can be used to model sustainable software design and control based on pressure in software development. There are many ways to extend and refine the presented theory. For example, tooling automation can be studied and modeled as an additional projection, and its effect represented in the effectiveness model accordingly. Experience suggests that spending some time on automation can slow down delivery initially, but minimize cumulative manual overhead – another form of waste – in the long run.

Another possibility is to model the interdependencies between delivery performance, team capacity and culture. Usually, only economically viable projects get funded and staffed in the long run. Successful products generate profits, which usually leads to allocating more resources to the project. However, it is an interesting question whether this dynamic results in a positive or negative effect on the overall performance and how established culture can amplify or hinder agility and development effectiveness in general. Sometimes prior success generates conditions for future failure.

Yet another extension can be to consider how effective delivery generates value, taking the cost of delay into account. In some circumstances short-term delivery speed is more important than long-term sustainability. For example, failing to quickly acquire market share, leaving it to the competitors, would put the project at a substantial disadvantage. Including this dynamic in the model would allow for more accurate decisions.

Finally, although we believe that the theory and the models presented in this article are solid and useful, validating them empirically remains to be done. We already provided some ideas on how to implement simple measurements for variables like pressure. However, measuring concepts like delivery performance or structural integrity requires careful selection of appropriate proxy variables. Work in this field has been conducted by other authors and it remains to be seen whether the methods they employed can be used to validate this theory.

### Acknowledgements

Many thanks to Maria Kremena for creating the beautiful illustrations.

## Andrey Skorikov

Andrey Skorikov is an IT consultant at codecentric AG in Karlsruhe. His focus is on software quality, test automation and continuous integration/delivery.
