Thursday, September 28, 2023

Trifecta

Right from the beginning of my career, I have been bothered by the way we handle software development. As an industry, we have a huge problem with figuring out ‘who’ is responsible for ‘what’.

For decades, we’ve had endless methodologies, large and small, but all of them just seem to make poor tradeoffs between make-work and chaos. Neither is appealing.

As well, there are all sorts of other crazy processes and plenty of misconceptions floating around. Because of this, most projects are dumpster fires, which only adds to the stress, wastes energy, and ensures poor quality.

For me, whenever development has worked smoothly it has been because of strong personalities who subverted the enforced methodology. Strong, knowledgeable leadership works well.

Whenever the projects have been excessively painful, it has often been caused by confusion in the roles and responsibilities, which resulted in poor outcomes. Politics blossoms when the roles or rules are convoluted or vague. Focus gets misplaced, time gets wasted, and the quality plummets. It gets ugly.

It’s not that I have an answer, but after 30 years of working and 17 years of writing about it, I feel like I should at least lay down some basic principles.

So, here goes...

There are three primary areas for software. They are: a) the problem domain, b) the operational environment, and c) the development environment. Software (c) is a set of solutions (b) for some problems (a).

A system is a collection of similar solutions for a common problem domain.

There are two primary motivators for creating software: a) vertical and b) horizontal.

A vertical motivator is effectively a business-driven need for some software. The business either uses it, offers it as a service, or sells it.

A horizontal motivator is an infrastructural need for some software. It fills in missing parts of the puzzle that are disrupting either the operational or development flow.

Desired quality is a growing exponential curve, where low-quality throw-away code is to the left, then static, hardcoded, in-house development, then decent commercial products, then likely healthcare, aerospace, and NASA. To get to the next category is maybe 2x - 10x more work for each hop.
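To make the compounding concrete, here is a rough sketch of the arithmetic. It assumes each category named above is its own hop, which the text leaves loose for the last three, and it takes the 2x - 10x range at face value. The tier names and the function are only illustrative.

```python
# Quality categories from the paragraph above, cheapest first.
# Assumption: healthcare, aerospace, and NASA are treated as separate hops.
TIERS = [
    "throw-away",
    "in-house",
    "commercial",
    "healthcare",
    "aerospace",
    "NASA",
]


def effort_range(tier, low=2, high=10):
    """Return the (min, max) effort multiplier relative to throw-away code.

    Each hop to the next tier multiplies the work by somewhere between
    `low` and `high`, so the range compounds with the number of hops.
    """
    hops = TIERS.index(tier)
    return low ** hops, high ** hops


for tier in TIERS:
    lo, hi = effort_range(tier)
    print(f"{tier}: {lo}x to {hi}x the work of throw-away code")
```

Under those assumptions, a decent commercial product is two hops out, so it lands somewhere between 4x and 100x the work of throw-away code.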

The actual quality is the desired level plus the sum of all testing, which is also exponential. So to find the next diminishing set of less visible bugs is 2x - 10x more effort. There is an endless series of bug sets. Barely reasonable commercial quality is probably a 1:1 ratio of testing with coding.

The quality of the code itself is dependent on the design and the enforcement of good style and conventions. Messy code is buggy code. The quality of the design is dependent on the depth of the analysis. The overall results are a reflection of the understanding of the designers and coders. The problem domain is often vague and irrational but it has to be mapped to code which is precise and logical. That is a very tricky mapping.

Ultimately, while software is just instructions for a computer to run, its genesis is from people and it is all about people. It is a highly social occupation. Non-trivial software takes a team to build and a team to run.

So, for every system, we end up with three main players:
  • Domain Champion
  • Operations Manager
  • Lead Software Developer

The domain champion represents all of the users. They also represent some of the funding, since they are effectively paying for work to get done. They have a short-term agenda of making sure the software out there runs as expected and they are the ones that commission new features for it. They drive any non-technical analysis. They have or can get all of the answers necessary for the problem domain, which they need to understand deeply.

The operations manager is effectively the day-to-day ‘driver’ of the software. They set it up and offer it for others to use. They need to install, upgrade, and carefully monitor the software. They are the front line for dealing with any issues the users encounter. They offer access as a service.

The lead developer builds stuff. It is constructive. They should focus on figuring out the stuff that needs to be built and the best way to do it, given all of the domain and operational issues. The features are usually from the top down, but to get effective construction the code needs to be built from the bottom up. The persistence foundation should exist before the GUI, for example.

For most domain functionality, the champion is effectively responsible for making sure that the features meet the needs of the users. The lead makes sure the implementation of those features is reasonable. They do this by breaking those features down into lots of different pieces of functionality to implement. A champion may need the system to keep track of some critical data; the lead may implement this as a set of ETL feeds and some user screens.

If there are bugs in production the users should go directly to the operations manager. If the manager is unable to resolve the issues, then they would go to the lead developer, but it would only be for bugs that are brand new. If it's recurring, the manager already knows how to deal with it. The operations manager would know if the system is slow or overusing resources. Periodically they would provide feedback to the lead.
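The escalation rule is simple enough to write down. This is only a sketch of the path described above, and the `BugReport` fields and the `route` function are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass
class BugReport:
    summary: str
    seen_before: bool      # recurring issue that operations already knows
    resolved_by_ops: bool  # operations was able to resolve it themselves


def route(report):
    """Return who ends up handling a production bug.

    Users always start with the operations manager. The lead developer
    only gets pulled in for brand new bugs that operations cannot resolve.
    """
    if report.seen_before or report.resolved_by_ops:
        return "operations manager"
    return "lead developer"


print(route(BugReport("nightly feed stalls again", True, False)))
print(route(BugReport("new crash on save", False, False)))
```

The point of the rule is that the lead only sees the small set of issues that are both new and unresolved, which keeps them focused on building.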

If a project is infrastructure, cleanup, or reuse, it would be commissioned directly by the lead developer. They should be able to fund maintenance, proof of concept, new technologies, and reuse work on their own since the other parties have no reason to do so. The project will decay if someone doesn't do it.

The lead needs to constantly make the development process better, smoother, and more effective. They need to make sure the technology used is keeping up with the industry. Their primary focus is engineering, but they also need to be concerned with solution fit, and user issues like look and feel. They set the baseline for quality. If the interface is ugly or weird, it is their fault.

Like the champion, the operations manager would have their own system requirements. They set up and are responsible for the runtime, so they have a strong say in the technologies, configuration, security, performance, resource usage, monitoring, logging, etc. That covers all of the behavior and functionality they need to do their job. If they have lots of different systems, obviously having them consistent or aligned would be highly important to them. They would pick the OS and persistence, for example, but not the programming language. The dependencies used for integration would fall under their purview.

The process for completing the analysis needed to come up with a reasonable set of features is the responsibility of the champion. Any sort of business analyst would report to them. They would craft the high-level descriptions and required features. This would be used by the lead to get a design.

If the project is infrastructure, instead of the champion, it is the responsibility of the lead to do the analysis. Generally, the work is technical or about organization, although it could be reliant on generalities within the problem domain. The work might be combining a bunch of redundant software engines all together, to get reuse, for example.

Any sort of technical design is the lead’s responsibility, and if the organization is large, they likely need to coordinate the scope and designs with the firm’s architects and security officers. As well, the operational requirements would need to be followed. A design for an integrated system is not an independent silo; it has to fit with all of the other existing systems.

Architects would also be responsible for keeping the higher level organized. So, they wouldn’t allow the lead or champion to poach work from other teams.

The process of building stuff is up to the lead. They need to do it any which way, and in any order, that best suits them and their teams. They should feel comfortable with the processes they are using to turn analysis into deployment.

They do need to give time estimates, and if they miss them, detailed explanations of why they missed them. Leads need to learn to control the expectations of the champion and the users. They can’t promise two years of work in six months, for example. If development goes poorly or the system is unusable they are on the hook for it.

There should be a separate quality assurance department that would take the requirements from the champions, leads, and operations managers, and ensure that the things being delivered meet those specifications. They would also do performance and automated testing. With the specs and the delivery items, they would return a report on all of the deficiencies to all three parties. The lead and champion would then decide which issues to fix. Time and expected quality would drive those decisions.
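As a rough sketch, the QA report amounts to comparing what each party asked for against what was actually delivered. The requirement names and the `deficiency_report` function here are made up for illustration.

```python
def deficiency_report(requirements, delivered):
    """Map each party to the requirements of theirs that were not delivered."""
    done = set(delivered)
    return {
        party: sorted(set(items) - done)
        for party, items in requirements.items()
    }


# Hypothetical requirements from the three parties.
requirements = {
    "champion": ["track critical data", "monthly summary"],
    "operations": ["structured logging", "health check"],
    "lead": ["shared persistence layer"],
}
delivered = ["track critical data", "health check", "shared persistence layer"]

for party, missing in deficiency_report(requirements, delivered).items():
    print(party, "is missing:", missing)
```

The same report goes back to all three parties, and the lead and champion then decide which of the gaps are worth fixing.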

The items that were tested in QA are the items that are given directly to operations to install or upgrade. There are two release processes: the full one and the fast one. The operations manager schedules installations and patches at their own convenience and notifies the users when they are completed. The lead just queues up the almost-finished work for QA.

The lead has minimal interaction with operations. They might get pulled into net new bug issues, they get requirements for how the software should operate, and they may occasionally, with really tricky bugs, have to get direct special access to production in order to resolve problems. But they don’t monitor the system, and they aren’t the frontline for any issues. They need to focus on designing and building stuff.

The proportion of funding for the champion and for the lead defines the control of technical debt. If the system is unstable or development is slow, more funding needs to go into cleanup. The champion controls the priority of feature development, and the lead controls the priority of the underlying functionality. That may mean that a highly desired feature gets delayed until missing low-level functionality is ready. Building code out of order is expensive and hurts quality.
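One way to picture the ordering constraint is as a dependency graph, where a feature cannot be built until the functionality underneath it exists. This is just a sketch using Python's standard `graphlib` module (Python 3.9 or later). The work items echo the earlier examples, but the graph itself is an assumption.

```python
from graphlib import TopologicalSorter

# Each item maps to the lower-level functionality it depends on.
# The graph is hypothetical; the item names echo earlier examples.
DEPENDS_ON = {
    "user screens": {"persistence"},
    "ETL feeds": {"persistence"},
    "persistence": set(),
}


def build_order(depends_on):
    """Return an order in which every item comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


print(build_order(DEPENDS_ON))
```

However badly the champion wants the user screens, they still come out after persistence. The champion's priorities only decide the order among items whose foundations are already in place.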

So that’s it. It’s a combination of the best of all of the processes, methodologies, writing, books, arguments, and discussions that I’ve seen over the decades and in the companies that I have worked for directly or indirectly. It offsets some of the growing chaos that I’ve seen and puts back some of the forgotten knowledge.

All you need, really, is three people leading the work in sync with each other in well-defined roles. There are plenty of variations. For example, in pure software companies, there is a separate operations manager at each client. In some cases, the domain champion and lead are the same person, particularly when the domain is technical. Sometimes there are conflicting, overlapping champions pulling in different directions. So, as long as the basic structure is clear, the exact arrangement can be tweaked.
