08 May 2018

Agility, Risk, and Uncertainty, Part 2: How Risk Impacts Agile Architecture

I was recently invited to write an article based on my doctoral research for the Pragmatic Architect column of IEEE Software magazine. It turned into an article that was too large for one column, so we split it into two. The first, Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture, was published in the March/April edition of the magazine, and the second, Agility, Risk, and Uncertainty, Part 2: How Risk Impacts Agile Architecture, was published in the May/June edition.

If you don’t subscribe to this magazine, you can read part 1 in a previous blog post, and part 2 below.

Abstract: The amount of technical risk (and the underlying uncertainty) in a software development project can affect the amount of architecting that developers perform up-front. Software architects must determine the proper balance between risk and agility for their projects.

In part 1 [1], I introduced the findings of my research into the dilemma software architects face in agile environments: determining the effort to put into architecting up front, before development [2]. To reduce that effort as much as possible, agile architects try to design an agile architecture—that is, they use an agile process to create a modifiable, change-tolerant architecture. To do this, architects can use five tactics: keep designs simple, prove the architecture with code iteratively, use good design practices, delay decision making, and plan for options.

Here, I examine how risk (and the underlying uncertainty) affects the amount of architecting that developers must perform up-front.

Technical Risk

Along with benefits, agile architecture introduces technical risk, which relates to the uncertainty of technologies and design or the system itself. A major source of this uncertainty is a poorly designed architecture [3]. For example, decisions might have been made without sufficient effort and evidence that they satisfy architecturally significant requirements (ASRs). Perhaps the decisions were made implicitly, resulting in an emergent architecture. Such decisions can lead to a system that fails to meet ASRs and, in extreme cases, fails completely by “dying the death of a thousand cuts.” [4]

Preventing failure means more design effort to reduce uncertainty to a level that’s satisfactory to the team and stakeholders. The greater the risk and the wider its impact, the earlier it must be addressed, and the more detailed the architecture must be. Risk has always been a key driver of architecture design. For example, Tom Gilb’s 1988 book Principles of Software Engineering Management has often been quoted: “If you don’t actively attack the risks, they will actively attack you.” [5]

George Fairbanks proposed an approach in which the amount of architecture design is determined entirely by the need to reduce risk to a satisfactory level [6]. This reduction is done on the basis of either no up-front design, some yardstick measure based on a fixed proportion of time, a comprehensive set of design and documentation techniques, or an ad hoc approach that decides how much design each project requires. Fairbanks commented that although the ad hoc approach is the most common, it’s also highly subjective and provides no lessons on which to base decisions for future projects.

The classic definition of risk is the product of the probability of failure and the impact of that failure. My research participants stated that the main reason for a high probability of failure is system complexity. Complexity has three facets: scale (the number of things being considered), diversity (the number of different things), and connectivity (the number of relationships between the things) [7]. For agile architecture, complexity is caused by ASRs that perhaps conflict with each other, have a small solution space, or require compromises. Other sources of complexity are legacy systems (that are no longer being actively maintained) and the integration of multiple systems, an increasingly common requirement.
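The classic definition above can be made concrete with a minimal sketch. Everything in it is hypothetical and for illustration only: the Risk class, the rank_by_exposure helper, and the risk names and numbers are assumptions, not data from the research. It shows only how multiplying probability by impact gives a figure that a team could use to decide which risks to address first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """A hypothetical technical risk; names and numbers are illustrative only."""
    name: str
    probability: float  # estimated probability of failure, 0.0 to 1.0
    impact: float       # estimated cost of that failure, in arbitrary units

    @property
    def exposure(self) -> float:
        # Classic definition: risk = probability of failure x impact of failure.
        return self.probability * self.impact


def rank_by_exposure(risks):
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


# Invented examples loosely based on the sources of complexity named above.
risks = [
    Risk("integration with a legacy system", probability=0.5, impact=80),
    Risk("new and unproven technology", probability=0.25, impact=40),
    Risk("conflicting ASRs", probability=0.75, impact=100),
]

for risk in rank_by_exposure(risks):
    print(f"{risk.name}: exposure {risk.exposure:.1f}")
```

In practice both inputs are estimates, so a ranking like this is a prompt for discussion about where early architectural effort should go rather than a precise measurement.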

Other sources of failure include unique problems that haven’t yet been solved and the use of unknown or new and unproven technologies.

For critical systems, the cost of failure is high—perhaps people might be harmed or even lives lost. So, teams put in more architecture design effort to reduce risk. For example, compared to a team building a corporate website, a team building a medical system on which lives depend will require significantly more architectural effort and scrutiny to ensure that the ASRs are met before development starts.

Another consideration is the team’s and customer’s risk tolerance: a risk that’s acceptable to one team might be unacceptable to another. One team building a procurement website might deem the uptime to be of utmost importance. If a website visitor can’t sign up or make a purchase because the website failed, he or she will likely become lost to the business. In contrast, the stakeholders of a team building a competing system might be satisfied with a less robust system and possible financial loss if the system fails, if that means they can get their system live sooner (and start generating revenue earlier) for lower cost.

How Technical Risk Impacts the Agile Architecture

My research found that reducing risk negatively impacts a team’s ability to design an agile architecture. The more the team needs to reduce risk, the more architecture design is necessary, and the earlier decisions must be made. So, tactics that aim to keep designs simple and delay decision making become less effective.

The optimal amount of up-front architecture design is therefore a balance between risk and agility. If a team ensures that it does only enough up-front architecture design to reduce risk satisfactorily, it can maximize its use of agile-architecture tactics and hence its ability to react to changes in requirements and the environment. The more agile a team is, and the more agile the architecture it designs, the more requirements volatility it can manage.

This balance means that in a highly critical and highly complex system that requires more effort to mitigate risk, the architecture will be less agile and less able to adapt to change. So, the architecture will be less able to support a system with highly unstable requirements.

On the other hand, a noncritical business website with little risk will require nothing more than a small, highly emergent agile architecture that can successfully support a system with highly unstable and continuously evolving requirements. Such systems are often built using frameworks that provide a “precooked architecture” [8] that further reduces system complexity, sometimes to the point at which minimal architectural decisions must be made up front.

To manage highly unstable requirements, teams should use suitable frameworks (where possible) to reduce system complexity. They should also aim to improve their ability to design agile architectures through experience and striving for a more agile environment.

Addressing technical risk requires earlier and more detailed architecture designs, particularly in complex and critical systems. However, such designs reduce a team’s ability to use the five tactics I mentioned at the beginning of this article.

So, as I mentioned before, the right amount of up-front effort is just enough to reduce risk to a level that’s satisfactory to the team and stakeholders. This amount will affect the team’s ability to design an agile architecture, which in turn affects its ability to manage unstable requirements.

References

[1] M. Waterman, “Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture,” IEEE Software, vol. 35, no. 2, 2018, pp. 99–101.
[2] M. Waterman, J. Noble, and G. Allen, “How Much Up-Front? A Grounded Theory of Agile Architecture,” Proceedings of the 37th International Conference on Software Engineering (ICSE 15), 2015, pp. 347–357. (Also read preprint version)
[3] P. Kruchten, The Rational Unified Process: An Introduction, 3rd ed., Addison-Wesley Professional, 2004.
[4] G. Booch, “The Defenestration of Superfluous Architectural Accoutrements,” IEEE Software, vol. 26, no. 4, 2009, pp. 7–8.
[5] T. Gilb, Principles of Software Engineering Management, Addison-Wesley, 1988.
[6] G. Fairbanks, Just Enough Software Architecture: A Risk-Driven Approach, Marshall & Brainerd, 2010.
[7] P. Kruchten, “Complexity Made Simple,” Proceedings of Canadian Eng. Education Assoc. Conf., 2012.
[8] P. Kruchten, H. Obbink, and J. Stafford, “The Past, Present, and Future of Software Architecture,” IEEE Software, vol. 23, no. 2, 2006, pp. 22–30.

02 May 2018

Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture

I was recently invited to write an article based on my doctoral research for the Pragmatic Architect column of IEEE Software magazine, for a non-academic audience. It turned into an article that was too large for one column, so we split it into two. The first, Agility, Risk, and Uncertainty, Part 1: Designing an Agile Architecture, was published in the March/April edition of the magazine. If you don’t subscribe to this magazine, you can read the article below; part 2 (How Risk Impacts Agile Architecture) will follow shortly.

Abstract: Software architects in agile environments face the dilemma of determining how much effort goes into architecting up front, before development starts. This is an issue that agile methodologies and frameworks don’t address and that’s becoming more critical as agile development gets used for a wider range of problems. This article is the first of two that discuss findings of recent research based on the experiences of 44 agile practitioners, to help shed light on the problem.

Software architects in agile environments face the dilemma of determining how much effort goes into architecting up front, before development starts. This is an issue that agile methodologies and frameworks don’t address and that’s becoming more critical as agile development gets used for a wider range of problems. This article is the first of two that discuss findings of my recent research based on the experiences of 44 agile practitioners [1], to help shed light on the problem.

If architects (whether individual architects or the development team as a whole) spend too much time architecting up front, they’ll struggle to be agile. This is partly because they’re making architectural decisions too early—decisions that could later prove to be wrong or subject to change. It’s also because this excessive effort will delay development and the opportunity for early customer feedback.

On the other hand, too little architecting results in ad hoc decisions that might not support the system’s architecturally significant requirements—the requirements that impact the architecture. If the team is spending all its time fixing architectural problems that should have been thought through up front, it’s not delivering customer value. And, just as if the team spent too much time architecting, it will struggle to be agile.

Somewhere in between is the optimal level of architecting that maximizes the team’s architectural agility. How can we determine this optimal level? To answer that, we must realize that the optimal level depends highly on the team’s context. This context includes whether the requirements are stable, the amount of technical risk, how early the customer wants to start using the system (perhaps earning revenue from it), the team’s agility, how agile-friendly the team’s environment is (such as business and administration or project stakeholders), and the team’s architectural and technical experience.

The Agile Architecture

The context determines the team’s ability to design an agile architecture—one that supports the team’s agility and hence its ability to respond to change. I define agile architecture using two dimensions:
  • It has been designed using an agile process.
  • It’s modifiable and tolerant of change.
The former means that the architecture isn’t a static set of decisions—it evolves as the system evolves and as the team revisits architectural decisions. The latter means that the architecture can adapt more easily as the system changes.

Neither dimension by itself is necessarily sufficient for the architecture to be agile. An architecture can be designed using an agile process but not be particularly modifiable and tolerant of change. As Simon Brown said, “In my experience … teams are more focused on delivering functionality rather than looking after their architecture.”[2] An architecture can be designed to be modifiable and tolerant of change, but if it’s not designed using an agile process, it can’t fully welcome change (the second principle of the Agile Manifesto; agilemanifesto.org) as an intrinsic characteristic.

My research found that teams design agile architectures using five tactics:
  • Keep designs simple.
  • Prove the architecture with code iteratively.
  • Use good design practices.
  • Delay decision making.
  • Plan for options.
Each tactic affects the required responsiveness to change and the amount of up-front design differently. Table 1 summarizes these impacts.

Table 1. Tactics for designing agile architecture and their impacts.

| Tactic | Impact on responsiveness to change | Reduces up-front design effort? |
|---|---|---|
| Keep designs simple | Increases modifiability | Yes |
| Prove the architecture with code iteratively | Increases modifiability | Yes |
| Use good design practices | Increases modifiability | No |
| Delay decision making | Increases tolerance to change | Yes |
| Plan for options | Increases tolerance to change | No |

Keeping designs simple is all about designing only for what’s immediately required: no overengineering or gold-plating, no designing for what might be required (called YAGNI—You Ain’t Gonna Need It—in Extreme Programming [3]). Simplicity reduces the detail in the design and the up-front effort. It increases the ease with which the design can be modified because there’s less design to change. Of course, simplicity guarantees that the architecture will need to be updated later as the system grows and the requirements evolve. However, that’s desirable because the decisions are delayed until the team has the best understanding about how to implement them.

Proving the architecture with code iteratively means simultaneous design and development, which is particularly useful if uncertainty exists about whether the architecture will meet the requirements. Simultaneous design and coding lets the team prove the architecture with real code, rather than through analysis, and refine the design if necessary. This lets the team come up with the simplest solution that works. So, like the simplicity tactic, this increases modifiability and reduces the up-front effort.

Good design practices to improve modifiability include separation of concerns (for example, service-oriented architecture, microservices, and encapsulated modules) and using quality management tools to ensure good design at the class level. Although good design practices should be an architect’s goal no matter what development process he or she uses, they’re especially important in agile development because they reduce the effort required to change the design. Good design practices don’t reduce the up-front effort; indeed, they can increase it because they require extra discipline. However, the benefits they bring are worth that extra effort [2] because changes can be isolated and limited to small parts of the system.

Delaying decision making is related to simplicity: this tactic aims to reduce the impact of trying to predict requirements that aren’t fully understood yet. Decisions are left as late as possible without delaying development, which allows the team as much time as possible to understand the requirements. Delaying decisions makes the architectural decisions more tolerant of change: fewer decisions must be revisited as requirements change, and the design requires less change.

Planning for options means the team makes decisions that retain flexibility and don’t close off future options. The architecture is designed to be easily extended. Although planning for options is related to good design practices, it’s less about general modularity and encapsulation and more about understanding what might need changing later and ensuring that the design isn’t optimized so much that it makes those changes difficult. This tactic doesn’t reduce up-front effort (and might increase it if, for example, the team uses extra levels of indirection to make changes to a particular technology layer easier).
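As an illustration of the extra level of indirection mentioned above, here is a minimal sketch. All names in it (OrderStore, InMemoryOrderStore, Checkout) are hypothetical and are not taken from the research. The idea is that the business logic depends only on a small interface, so the storage technology behind it could be replaced later without touching that logic.

```python
from typing import Protocol


class OrderStore(Protocol):
    """Hypothetical persistence boundary; other code depends only on this."""

    def save(self, order_id: str, total: float) -> None: ...

    def total_for(self, order_id: str) -> float: ...


class InMemoryOrderStore:
    """Simplest store that works today; a database-backed one could replace it."""

    def __init__(self) -> None:
        self._totals: dict[str, float] = {}

    def save(self, order_id: str, total: float) -> None:
        self._totals[order_id] = total

    def total_for(self, order_id: str) -> float:
        return self._totals[order_id]


class Checkout:
    """Business logic that knows nothing about the storage technology."""

    def __init__(self, store: OrderStore) -> None:
        self._store = store

    def place_order(self, order_id: str, prices: list[float]) -> float:
        total = sum(prices)
        self._store.save(order_id, total)
        return total


store = InMemoryOrderStore()
checkout = Checkout(store)
checkout.place_order("A-1", [10.0, 5.5])
print(store.total_for("A-1"))
```

The interface is the extra up-front effort this tactic accepts: it costs a little more design now in exchange for keeping the choice of storage technology open.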

The participants in my research used some or all of these five tactics to produce smaller, simpler, and more agile architecture designs that grow and evolve as the participants’ understanding of the systems develops. Hence, these tactics support their teams’ ability to respond to change.

In tension with minimizing design effort and deferring decisions as long as possible is the architect’s need to reduce technical risk. Risk has a big impact on architects’ ability to design an agile architecture and thus how much up-front design they do. I’ll discuss this impact in Part 2.

References

[1] M. Waterman, J. Noble, and G. Allen, “How Much Up-Front? A Grounded Theory of Agile Architecture,” Proceedings of the 37th International Conference on Software Engineering (ICSE 15), 2015, pp. 347–357. (Also read preprint version)
[2] S. Brown, Software Architecture for Developers, Leanpub, 2013.
[3] M. Fowler, “Is Design Dead?,” Extreme Programming Examined, G. Succi and M. Marchesi, eds., Addison-Wesley Longman, 2001, pp. 3–17.