Software Architecture: Solution Architecture

In Part 1 of the Software Architecture Series, we surveyed common architecture paradigms - monolithic, microservices, event-driven, and others. In Part 2, we covered the conventions and principles that guide both frontend and backend implementations. This third part zooms in further: how does a specific solution actually get designed for a specific business problem?

As a reminder, here are the four parts of the series:

Part 1: Architecture Paradigms: Common architecture paradigms such as monolithic, microservices, and event-driven architectures.
Part 2: Architecture Conventions and Principles: Best practices, principles like SOLID, and architectural patterns such as CQRS and the Repository Pattern.
Part 3: Solution Architecture: Designing a solution that meets business needs and scales with the organization.
Part 4: Enterprise Architecture: The broader perspective of system integrations, scalability, and the role of enterprise architecture in large organizations.

Solution architecture sits between high-level paradigms and ground-level implementation. The solution architect's job is to turn a business goal - "let customers self-serve refunds in under a minute" - into a concrete design that an engineering team can build, operate, and evolve.

Framing the Solution
- Non-Functional Requirements (NFRs)
- Capacity Planning
Designing the Solution
- Integration Design
- Tech-Stack Selection
- Architecture Decision Records (ADRs)
Conclusion
Appendix: Common Solution-Architecture Trade-Offs

Framing the Solution

Before any boxes get drawn, a solution architect needs to frame the problem. That means converting a business goal into measurable constraints, sizing the system to expected and peak load, and identifying the boundaries it must respect. Framing is mostly about asking the right questions before reaching for a pattern.

Non-Functional Requirements (NFRs)

NFRs describe how a system should behave rather than what it should do. Common categories include performance, availability, security, scalability, maintainability, and compliance. The right priorities depend on the domain: a banking platform weights consistency and auditability heavily, while a streaming service optimizes for latency and throughput.

[Figure: Relative NFR Priorities Across Domains]

Principles:

Quantify NFRs wherever possible (e.g., "99.95% monthly availability", "p95 latency under 200ms") instead of stating qualitative goals such as "fast" or "highly available".
Surface NFR conflicts early; strong consistency typically reduces availability under network partitions.
Tie each NFR to a stakeholder; orphan NFRs are the first to be dropped when scope tightens.
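
To make the first principle concrete, here is a minimal sketch of how the two example targets above could be turned into numbers a team can check. The function names, the 30-day month, and the nearest-rank percentile method are assumptions made for illustration, not part of any standard tooling.

```python
import math


def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime allowed in the window for a given availability target.

    Assumes a 30-day month by default; adjust `days` to match the SLA wording.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_target(samples_ms: list[float],
                         target_ms: float = 200.0,
                         pct: float = 95.0) -> bool:
    """True when the chosen percentile of observed latencies is under the target."""
    return percentile(samples_ms, pct) < target_ms


# "99.95% monthly availability" leaves roughly 21.6 minutes of downtime per 30 days.
print(round(downtime_budget_minutes(99.95), 1))

# Hypothetical latency samples: 19 fast requests and one slow outlier.
print(meets_latency_target([100.0] * 19 + [500.0]))
```

Seen this way, the availability target becomes a budget that operations can spend, and the latency target becomes a pass/fail check that can run against real measurements.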

Capacity Planning

Capacity planning estimates the resources a solution will need across expected and peak load. The goal is to size compute, storage, and network bandwidth so the system stays within its NFRs without permanently over-provisioning. Modern cloud-native solutions blend baseline capacity with elastic headroom rather than reserving fixed peak capacity year-round.

[Figure: Cost Growth: Right-Sized vs Over-Provisioned Capacity]

Principles:

Plan for growth in 6-12 month horizons; plans that reach further tend to commit budget to capacity that may never be used.
Stress-test capacity assumptions with synthetic load before locking in topology.
Prefer elastic headroom (auto-scaling, regional failover) for unpredictable spikes over fixed reserved capacity.
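
The blend of baseline capacity and elastic headroom described above can be sketched as a small sizing calculation. This is a back-of-the-envelope illustration under stated assumptions: the names `CapacityPlan` and `plan_capacity`, the per-instance throughput figure, and the target utilization are all hypothetical inputs that would normally come from load testing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    baseline_instances: int  # always-on capacity sized for expected load
    max_instances: int       # auto-scaling ceiling sized for peak load


def plan_capacity(expected_rps: float,
                  peak_rps: float,
                  rps_per_instance: float,
                  target_utilization: float = 0.5) -> CapacityPlan:
    """Size a baseline and an auto-scaling ceiling from request rates.

    `target_utilization` keeps each instance below saturation, so the
    usable throughput per instance is lower than its measured maximum.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    usable_rps = rps_per_instance * target_utilization
    baseline = max(1, math.ceil(expected_rps / usable_rps))
    ceiling = max(baseline, math.ceil(peak_rps / usable_rps))
    return CapacityPlan(baseline_instances=baseline, max_instances=ceiling)


# Hypothetical numbers: 1,200 rps expected, 5,000 rps at peak,
# 100 rps per instance measured under synthetic load.
plan = plan_capacity(expected_rps=1200, peak_rps=5000, rps_per_instance=100)
print(plan)
```

The gap between `baseline_instances` and `max_instances` is the elastic headroom: it is paid for only while a spike lasts, instead of being reserved year-round.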

Designing the Solution

Once the problem is framed, the design phase translates constraints into structure: how the solution integrates with the existing landscape, which technologies make up the stack, and how the resulting decisions are recorded so future teams understand the "why" behind the architecture.

Integration Design

Most solutions live inside an existing landscape: identity providers, payment gateways, data warehouses, partner APIs, and internal services. Integration design defines how the new solution communicates with these systems - synchronously, asynchronously, or via event streams - and where the boundaries of ownership lie.

Integration Style	Best For	Trade-Off
REST over HTTP	Public APIs, broad compatibility	Verbose payloads, request-response only
gRPC	Internal service-to-service, low latency	Tooling and HTTP/2 dependency
GraphQL	Aggregating multiple back-ends for a UI	Complex caching, harder field-level authZ
Async Messaging (queues)	Decoupled work, smoothing load spikes	Harder to reason about end-to-end flow
Event Streaming	Audit trails, replayable pipelines	Operational overhead, schema governance
Backend-for-Frontend (BFF)	Tailoring APIs to specific clients	More services to own and version

A good integration design is one you can change. Most production failures we trace back to overly tight coupling between systems that should never have shared a contract. - Solution Architect

Principles:

Prefer asynchronous communication at boundaries you don't fully control.
Treat integration contracts as products; version them, document them, and deprecate them deliberately.
Avoid distributed transactions where idempotency and compensation can do the same job.
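
The last principle is easier to picture with code. Below is a minimal sketch of both ideas: an idempotent processor that replays a stored result when the same request arrives twice, and a runner that undoes completed steps when a later step fails. The class and function names are hypothetical, and the in-memory dictionary stands in for what would be a durable store in a real system; the sketch is single-threaded and ignores concurrency.

```python
from typing import Callable


class IdempotentProcessor:
    """Runs a handler at most once per idempotency key and replays the stored result."""

    def __init__(self) -> None:
        # Assumption: an in-memory dict; production systems need a durable store.
        self._results: dict[str, object] = {}

    def process(self, key: str, handler: Callable[[], object]) -> object:
        if key in self._results:
            return self._results[key]  # duplicate delivery: no side effects repeated
        result = handler()
        self._results[key] = result
        return result


# Each step pairs an action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    completed: list[Callable[[], None]] = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()
        return False
    return True


# Usage: a retried refund request is only executed once.
processor = IdempotentProcessor()
executions: list[str] = []
processor.process("refund-42", lambda: executions.append("refund") or "ok")
processor.process("refund-42", lambda: executions.append("refund") or "ok")
print(executions)
```

Neither piece requires the participating systems to share a transaction coordinator, which is what keeps the integration loosely coupled.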

Tech-Stack Selection

Choosing a tech stack at solution scope is a constrained problem. The organization's standards, the team's existing skills, the operating model (in-house vs managed), and the lifetime of the system all narrow the field before any technical comparison begins. The architect's job is to make those constraints visible and pick within them, not against them.

Decision	Common Options	Primary Consideration
Language/Runtime	Java, Go, TypeScript, Python	Team skills, talent availability, operating cost
Persistence	Relational (Postgres), Document (Mongo), Wide-column (Cassandra), KV (Redis)	Access pattern, consistency requirements
Compute Model	VMs, Containers (K8s), Serverless	Operational maturity, latency tolerance
Messaging	Queues (SQS, RabbitMQ), Streaming (Kafka, Kinesis)	Ordering, replay, and throughput needs
Observability	OpenTelemetry-based stack, vendor APM	Coverage of logs, metrics, and traces together

Principles:

Default to organizational standards; deviate only when an NFR truly demands it.
Prefer boring, well-understood technology for the parts of the system that aren't the differentiator.
Account for total cost of ownership, not just license cost; hiring, operations, and migration costs dominate over five-year horizons.
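
A total-cost-of-ownership comparison can be as simple as the sketch below. The `StackOption` type, its cost categories, and every figure in the usage example are hypothetical and chosen only to show the arithmetic; real inputs would come from finance, hiring, and operations data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOption:
    name: str
    annual_license: float      # recurring license or subscription fees
    annual_operations: float   # recurring run cost: on-call, upgrades, hosting
    one_off_hiring: float      # recruiting and training for missing skills
    one_off_migration: float   # moving data and integrations onto the option

    def total_cost(self, years: int = 5) -> float:
        """Total cost of ownership over the given horizon."""
        recurring = (self.annual_license + self.annual_operations) * years
        return recurring + self.one_off_hiring + self.one_off_migration


# Illustrative figures only: the option with no license fee is not the cheapest
# once operations, hiring, and migration are counted.
options = [
    StackOption("Org-standard managed database", 40_000, 20_000, 0, 10_000),
    StackOption("Self-hosted alternative", 0, 70_000, 60_000, 50_000),
]
for option in sorted(options, key=lambda o: o.total_cost()):
    print(f"{option.name}: {option.total_cost():,.0f}")
```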

Architecture Decision Records (ADRs)

ADRs are short, dated documents that capture the context, options considered, and rationale behind each major architectural decision. Originally popularized by Michael Nygard, they have become a standard artifact in solution-architecture practice because they make decisions reviewable, reversible, and discoverable long after the original architect has moved on.

Section	Purpose
Title and Date	Stable identifier and chronology
Status	Proposed, Accepted, Deprecated, Superseded
Context	The forces, constraints, and stakeholders shaping the choice
Decision	The option selected, in active voice
Consequences	Trade-offs accepted, including what gets harder
Alternatives Considered	Options rejected and why

An ADR is the cheapest insurance policy in software architecture. The cost of writing one is an hour; the cost of not writing one shows up two years later when the team can't remember why they made the choice. - summary of ADR adoption guidance

Principles:

Write the ADR before, not after, the decision is acted on; the goal is to inform, not to justify.
Keep ADRs narrowly scoped: one decision per record, not a design document.
Store ADRs in the same repository as the code they govern, so they evolve with the system.
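
As a rough illustration of the sections listed in the table above, an ADR can be modeled as a small record that renders to a plain-text file stored next to the code. The `ADR` class, the numbering scheme, the file-naming convention, and the example decision are assumptions made for this sketch; teams commonly use hand-written templates instead.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass
class ADR:
    number: int                 # assumption: sequential number as stable identifier
    title: str
    decided_on: date
    status: Status
    context: str
    decision: str
    consequences: str
    alternatives: list[str] = field(default_factory=list)

    def filename(self) -> str:
        """File name such as 0007-use-postgres-for-refunds.txt (hypothetical convention)."""
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.txt"

    def render(self) -> str:
        """Render the record as plain text, one section per heading."""
        lines = [
            f"ADR {self.number:04d}: {self.title}",
            f"Date: {self.decided_on.isoformat()}",
            f"Status: {self.status.value}",
            "", "Context", self.context,
            "", "Decision", self.decision,
            "", "Consequences", self.consequences,
            "", "Alternatives Considered",
        ]
        lines += [f"- {alt}" for alt in self.alternatives]
        return "\n".join(lines)


# Hypothetical example record.
adr = ADR(
    number=7,
    title="Use Postgres for refunds",
    decided_on=date(2024, 1, 15),
    status=Status.PROPOSED,
    context="Refunds need transactional updates and an audit trail.",
    decision="We will store refund state in the existing Postgres cluster.",
    consequences="Schema migrations become part of every refund change.",
    alternatives=["Document store: rejected, weaker transactional guarantees"],
)
print(adr.filename())
print(adr.render())
```

Keeping the record this small reinforces the one-decision-per-record rule: anything that does not fit these fields probably belongs in a design document.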

Conclusion

Solution architecture is the layer where business problems become buildable systems. By framing NFRs and capacity carefully, designing integrations that respect ownership boundaries, picking a tech stack inside organizational constraints, and recording the resulting decisions in ADRs, a solution architect gives the engineering team a clear, defensible blueprint to build against.

In Part 4: Enterprise Architecture, we will widen the lens again: how individual solutions fit inside the business, data, application, and technology domains of an enterprise, how governance and capability mapping work in practice, and how frameworks like TOGAF help large organizations keep their portfolio of solutions coherent.

Appendix: Common Solution-Architecture Trade-Offs

NFR / Goal	Common Trade-Off	Typical Mitigation
Strong Consistency	Reduced availability under partition	Read replicas with bounded staleness, regional active-passive
Low Latency	Higher infra cost, harder caching	Edge compute, request collapsing, pre-computation
High Availability	More moving parts, complex failover	Multi-region active-active, chaos testing, SLA-aware retries
High Scalability	Operational and observability overhead	Auto-scaling with sane limits, load-based sharding
Strong Security	Slower development cycle	Shift-left security, policy-as-code, paved-road platforms
Cost Efficiency	Performance and resilience headroom	Right-sizing reviews, reserved + spot mix, FinOps practice
Maintainability	Slower initial delivery	Modular boundaries, ADRs, automated tests at the seams
