I was recently invited to write an article based on my doctoral research for the Pragmatic Architect column of IEEE Software magazine. It turned into an article that was too large for one column, so we split it into two. The first, Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture, was published in the March/April edition of the magazine, and the second, Agility, Risk, and Uncertainty, Part 2: How Risk Impacts Agile Architecture, was published in the May/June edition.
If you don’t subscribe to this magazine, you can read part 1 in a previous blog post, and part 2 below.
Abstract: The amount of technical risk (and the underlying uncertainty) in a software development project can affect the amount of architecting that developers perform up-front. Software architects must determine the proper balance between risk and agility for their projects.
In part 1 [1], I introduced the findings of my research into the dilemma software architects face in agile environments: determining the effort to put into architecting up front, before development [2]. To reduce that effort as much as possible, agile architects try to design an agile architecture—that is, they use an agile process to create a modifiable, change-tolerant architecture. To do this, architects can use five tactics: keep designs simple, prove the architecture with code iteratively, use good design practices, delay decision making, and plan for options.
Here, I examine how risk (and the underlying uncertainty) affects the amount of architecting that developers must perform up-front.
Technical Risk
Along with benefits, agile architecture introduces technical risk, which relates to uncertainty about the technologies, the design, or the system itself. A major source of this uncertainty is a poorly designed architecture [3]. For example, decisions might have been made without sufficient effort and without evidence that they satisfy architecturally significant requirements (ASRs). Perhaps the decisions were made implicitly, resulting in an emergent architecture. Such decisions can lead to a system that fails to meet ASRs and, in extreme cases, fails completely by “dying the death of a thousand cuts” [4].
Preventing failure means more design effort to reduce uncertainty to a level that’s satisfactory to the team and stakeholders. The greater the risk and the wider its impact, the earlier it must be addressed, and the more detailed the architecture must be. Risk has always been a key driver of architecture design. For example, Tom Gilb’s 1988 book Principles of Software Engineering Management has often been quoted: “If you don’t actively attack the risks, they will actively attack you.” [5]
George Fairbanks proposed an approach in which the amount of architecture design is determined entirely by the need to reduce risk to a satisfactory level [6]. This reduction is done on the basis of either no up-front design, some yardstick measure based on a fixed proportion of time, a comprehensive set of design and documentation techniques, or an ad hoc approach that decides how much design each project requires. Fairbanks commented that although the ad hoc approach is the most common, it’s also highly subjective and provides no lessons on which to base decisions for future projects.
The classic definition of risk is the product of the probability of failure and the impact of that failure. My research participants stated that the main reason for a high probability of failure is system complexity. Complexity has three facets: scale (the number of things being considered), diversity (the number of different things), and connectivity (the number of relationships between the things) [7]. For agile architecture, complexity is caused by ASRs that perhaps conflict with each other, have a small solution space, or require compromises. Other sources of complexity are legacy systems (that are no longer being actively maintained) and the integration of multiple systems, an increasingly common requirement.
Other sources of failure include unique problems that haven’t yet been solved and the use of unknown or new and unproven technologies.
For critical systems, the cost of failure is high—perhaps people might be harmed or even lives lost. So, teams put in more architecture design effort to reduce risk. For example, compared to a team building a corporate website, a team building a medical system on which lives depend will require significantly more architectural effort and scrutiny to ensure that the ASRs are met before development starts.
Another consideration is the team’s and customer’s risk tolerance: a risk that’s acceptable to one team might be unacceptable to another. One team building a procurement website might deem the uptime to be of utmost importance. If a website visitor can’t sign up or make a purchase because the website failed, he or she will likely become lost to the business. In contrast, the stakeholders of a team building a competing system might be satisfied with a less robust system and possible financial loss if the system fails, if that means they can get their system live sooner (and start generating revenue earlier) for lower cost.
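As a rough illustration of the definition above (a sketch of my own for this post, not something taken from the research data), risk exposure and risk tolerance can be expressed in a few lines of Python. The names Risk and needs_upfront_design, and all of the numbers, are hypothetical; real probabilities and impacts are estimates that each team has to make for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A single technical risk; all figures here are illustrative assumptions."""
    name: str
    probability: float  # estimated probability of failure, 0.0 to 1.0
    impact: float       # estimated cost of that failure, in arbitrary units

    @property
    def exposure(self) -> float:
        # Classic definition: probability of failure times impact of failure.
        return self.probability * self.impact


def needs_upfront_design(risk: Risk, tolerance: float) -> bool:
    """Return True when the exposure exceeds what the team and stakeholders accept."""
    return risk.exposure > tolerance


outage = Risk("checkout outage", probability=0.25, impact=400.0)
print(outage.exposure)                                # 100.0
print(needs_upfront_design(outage, tolerance=50.0))   # True
print(needs_upfront_design(outage, tolerance=200.0))  # False
```

The same risk leads to different answers under the two tolerance values, which mirrors the two procurement-website teams: one treats the outage as unacceptable and designs for it up front, while the other accepts it in exchange for going live sooner.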
How Technical Risk Impacts the Agile Architecture
My research found that reducing risk negatively impacts a team’s ability to design an agile architecture. The more the team needs to reduce risk, the more architecture design is necessary, and the earlier decisions must be made. So, tactics that aim to keep designs simple and delay decision making become less effective.
The optimal amount of up-front architecture design is therefore a balance between risk and agility. If a team ensures that it does only enough up-front architecture design to reduce risk satisfactorily, it can maximize its use of agile-architecture tactics and hence its ability to react to changes in requirements and the environment. The more agile a team is, and the more agile the architecture they design is, the more requirements volatility they can manage.
This balance means that in a highly critical and highly complex system that requires more effort to mitigate risk, the architecture will be less agile and less able to adapt to change. So, the architecture will be less able to support a system with highly unstable requirements.
On the other hand, a noncritical business website with little risk will require nothing more than a small, highly emergent agile architecture that can successfully support a system with highly unstable and continuously evolving requirements. Such systems are often built using frameworks that provide a “precooked architecture” [8] that further reduces system complexity, sometimes to the point at which minimal architectural decisions must be made up front.
To manage highly unstable requirements, teams should use suitable frameworks (where possible) to reduce system complexity. They should also aim to improve their ability to design agile architectures, both by gaining experience and by striving for a more agile environment.
Addressing technical risk requires earlier and more detailed architecture designs, particularly in complex and critical systems. However, such designs reduce a team’s ability to use the five tactics I mentioned at the beginning of this article.
So, as I mentioned before, the right amount of up-front effort is just enough to reduce risk to a level that’s satisfactory to the team and stakeholders. This amount will affect the team’s ability to design an agile architecture, which in turn affects its ability to manage unstable requirements.
References
1. M. Waterman, “Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture,” IEEE Software, vol. 35, no. 2, 2018, pp. 99–101.
2. M. Waterman, J. Noble, and G. Allen, “How Much Up-Front? A Grounded Theory of Agile Architecture,” Proceedings of the 37th International Conference on Software Engineering (ICSE 15), 2015, pp. 347–357.
3. P. Kruchten, The Rational Unified Process: An Introduction, 3rd ed., Addison-Wesley Professional, 2004.
4. G. Booch, “The Defenestration of Superfluous Architectural Accoutrements,” IEEE Software, vol. 26, no. 4, 2009, pp. 7–8.
5. T. Gilb, Principles of Software Engineering Management, Addison-Wesley, 1988.
6. G. Fairbanks, Just Enough Software Architecture: A Risk-Driven Approach, Marshall & Brainerd, 2010.
7. P. Kruchten, “Complexity Made Simple,” Proceedings of Canadian Eng. Education Assoc. Conf., 2012.
8. P. Kruchten, H. Obbink, and J. Stafford, “The Past, Present, and Future of Software Architecture,” IEEE Software, vol. 23, no. 2, 2006, pp. 22–30.