Software Architecture: Solution Architecture
In Part 1 of the Software Architecture Series, we surveyed common architecture paradigms - monolithic, microservices, event-driven, and others. In Part 2, we covered the conventions and principles that guide both frontend and backend implementations. This third part zooms in further: how does a specific solution actually get designed for a specific business problem?
As a reminder, here are the four parts of the series:
- Part 1: Architecture Paradigms: Common architecture paradigms such as monolithic, microservices, and event-driven architectures.
- Part 2: Architecture Conventions and Principles: Best practices, principles like SOLID, and architectural patterns such as CQRS and the Repository Pattern.
- Part 3: Solution Architecture: Designing a solution that meets business needs and scales with the organization.
- Part 4: Enterprise Architecture: The broader perspective of system integrations, scalability, and the role of enterprise architecture in large organizations.
Solution architecture sits between high-level paradigms and ground-level implementation. The solution architect's job is to turn a business goal - "let customers self-serve refunds in under a minute" - into a concrete design that an engineering team can build, operate, and evolve.
Table of Contents
- Framing the Solution
- Designing the Solution
- Conclusion
- Appendix: Common Solution-Architecture Trade-Offs
Framing the Solution
Before any boxes get drawn, a solution architect needs to frame the problem. That means converting a business goal into measurable constraints, sizing the system to expected and peak load, and identifying the boundaries it must respect. Framing is mostly about asking the right questions before reaching for a pattern.
Non-Functional Requirements (NFRs)
NFRs describe how a system should behave rather than what it should do. Common categories include performance, availability, security, scalability, maintainability, and compliance. The right priorities depend on the domain: a banking platform weights consistency and auditability heavily, while a streaming service optimizes for latency and throughput.
[Figure: Relative NFR priorities across domains]
Principles:
- Quantify NFRs wherever possible (e.g., "99.95% monthly availability", "p95 latency under 200ms") instead of qualitative goals.
- Surface NFR conflicts early; strong consistency typically reduces availability under network partitions.
- Tie each NFR to a stakeholder; orphan NFRs are the first to be dropped when scope tightens.
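To make the first principle concrete, here is a minimal Python sketch that turns two of the example NFRs into checks: a p95 latency target evaluated against measured samples, and the downtime budget implied by "99.95% monthly availability". The `LatencyNfr` type, its `owner` field, the nearest-rank percentile method, the 30-day month, and the sample values are illustrative assumptions, not a prescribed tooling choice.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyNfr:
    """A quantified latency NFR such as 'p95 latency under 200ms'."""
    percentile: float    # e.g. 95 for p95
    threshold_ms: float  # upper bound, exclusive
    owner: str           # hypothetical field: the stakeholder accountable for this NFR


def percentile_nearest_rank(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_latency_nfr(nfr: LatencyNfr, samples_ms: list[float]) -> bool:
    """True if the measured percentile is strictly below the threshold."""
    return percentile_nearest_rank(samples_ms, nfr.percentile) < nfr.threshold_ms


def downtime_budget_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    refunds_p95 = LatencyNfr(percentile=95, threshold_ms=200, owner="Refunds product owner")
    samples = [float(ms) for ms in range(100, 300, 2)]  # 100 synthetic samples: 100..298 ms
    print(meets_latency_nfr(refunds_p95, samples))
    print(round(downtime_budget_minutes(99.95), 1))
```

Here the synthetic samples miss the target (their p95 is 288 ms), and a 99.95% target over a 30-day month leaves roughly 21.6 minutes of downtime. Stating the budget in minutes tends to make trade-off conversations with stakeholders more concrete than the percentage alone.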
Capacity Planning
Capacity planning estimates the resources a solution will need across expected and peak load. The goal is to size compute, storage, and network bandwidth so the system stays within its NFRs without permanently over-provisioning. Modern cloud-native solutions blend baseline capacity with elastic headroom rather than reserving fixed peak capacity year-round.
[Figure: Cost growth, right-sized vs over-provisioned capacity]
Principles:
- Plan for growth in 6-12 month horizons; longer windows tend to over-commit and waste budget.
- Stress-test capacity assumptions with synthetic load before locking in topology.
- Prefer elastic headroom (auto-scaling, regional failover) for unpredictable spikes over fixed reserved capacity.
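A back-of-the-envelope model helps compare the two provisioning strategies. The sketch below assumes a hypothetical per-instance throughput, a target utilization that leaves headroom, and an hourly load profile. It compares instance-hours for fixed peak provisioning against a baseline floor plus per-hour elastic scaling. It ignores scale-up lag, cold starts, and pricing differences, so treat it as a sizing aid rather than a forecast.

```python
import math


def instances_needed(rps: float, rps_per_instance: float, target_utilization: float) -> int:
    """Instances needed to serve `rps` while each instance stays at or below target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(rps / (rps_per_instance * target_utilization))


def daily_instance_hours(
    hourly_rps: list[float],
    rps_per_instance: float,
    target_utilization: float,
    baseline_instances: int = 1,
) -> tuple[int, int]:
    """Return (fixed_peak, elastic) instance-hours for one hourly load profile.

    fixed_peak: provision for the busiest hour, all day.
    elastic: keep a baseline floor and scale each hour to that hour's load.
    """
    peak = instances_needed(max(hourly_rps), rps_per_instance, target_utilization)
    fixed_peak = peak * len(hourly_rps)
    elastic = sum(
        max(baseline_instances, instances_needed(rps, rps_per_instance, target_utilization))
        for rps in hourly_rps
    )
    return fixed_peak, elastic


if __name__ == "__main__":
    # Hypothetical profile: 20 quiet hours at 200 req/s, a 4-hour peak at 1,600 req/s.
    profile = [200.0] * 20 + [1600.0] * 4
    print(daily_instance_hours(profile, rps_per_instance=100, target_utilization=0.5))
```

With these made-up numbers (100 req/s per instance, 50% target utilization), fixed peak provisioning needs 768 instance-hours per day against 208 for the elastic plan. Raising `baseline_instances` buys warm capacity for sudden spikes at a visible, tunable cost.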
Designing the Solution
Once the problem is framed, the design phase translates constraints into structure: how the solution integrates with the existing landscape, which technologies make up the stack, and how the resulting decisions are recorded so future teams understand the "why" behind the architecture.
Integration Design
Most solutions live inside an existing landscape: identity providers, payment gateways, data warehouses, partner APIs, and internal services. Integration design defines how the new solution communicates with these systems - synchronously, asynchronously, or via event streams - and where the boundaries of ownership lie.
| Integration Style | Best For | Trade-Off |
|---|---|---|
| REST over HTTP | Public APIs, broad compatibility | Verbose payloads, request-response only |
| gRPC | Internal service-to-service, low latency | Tooling and HTTP/2 dependency |
| GraphQL | Aggregating multiple back-ends for a UI | Complex caching, harder field-level authZ |
| Async Messaging (queues) | Decoupled work, smoothing load spikes | Harder to reason about end-to-end flow |
| Event Streaming | Audit trails, replayable pipelines | Operational overhead, schema governance |
| Backend-for-Frontend (BFF) | Tailoring APIs to specific clients | More services to own and version |
A good integration design is one you can change. Most production failures we trace back to overly tight coupling between systems that should never have shared a contract. - Solution Architect
Principles:
- Prefer asynchronous communication at boundaries you don't fully control.
- Treat integration contracts as products; version them, document them, and deprecate them deliberately.
- Avoid distributed transactions where idempotency and compensation can do the same job.
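The last principle can be sketched with the refund example from the introduction. Below is a minimal, in-memory orchestration-style saga in Python: steps run in order, a failure triggers the compensations of already-completed steps in reverse, and an idempotency key makes a replayed request return the stored outcome instead of repeating side effects. The `RefundSaga` class and the step names are hypothetical. In production the outcome store and saga state would need to be durable, the compensations themselves idempotent, and a failing compensation would need retries or manual follow-up.

```python
from typing import Callable

# A step is (name, action, compensation); both callables take no arguments.
Step = tuple[str, Callable[[], None], Callable[[], None]]


class RefundSaga:
    """Orchestrates steps in order and compensates completed steps in reverse on failure."""

    def __init__(self) -> None:
        # In-memory stand-in for a durable store keyed by idempotency key.
        self._outcomes: dict[str, str] = {}
        self.log: list[str] = []

    def run(self, idempotency_key: str, steps: list[Step]) -> str:
        # A replayed request returns the recorded outcome without re-running side effects.
        if idempotency_key in self._outcomes:
            return self._outcomes[idempotency_key]

        completed: list[tuple[str, Callable[[], None]]] = []
        outcome = "completed"
        for name, action, compensate in steps:
            try:
                action()
            except Exception:
                self.log.append(f"failed:{name}")
                # Undo already-completed steps, most recent first.
                for done_name, undo in reversed(completed):
                    undo()
                    self.log.append(f"compensated:{done_name}")
                outcome = "compensated"
                break
            self.log.append(f"done:{name}")
            completed.append((name, compensate))

        self._outcomes[idempotency_key] = outcome
        return outcome


if __name__ == "__main__":
    def ok() -> None:
        pass

    def gateway_timeout() -> None:
        raise TimeoutError("payment gateway timed out")

    saga = RefundSaga()
    steps = [("reserve-funds", ok, ok), ("issue-refund", gateway_timeout, ok), ("notify", ok, ok)]
    print(saga.run("refund-42", steps))
    print(saga.run("refund-42", steps))  # replay: same outcome, no new side effects
    print(saga.log)
```

No cross-service lock or two-phase commit is involved; each service only needs a local transaction plus an undo action, which is what makes this style easier to operate across ownership boundaries.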
Tech-Stack Selection
Choosing a tech stack at solution scope is a constrained problem. The organization's standards, the team's existing skills, the operating model (in-house vs managed), and the lifetime of the system all narrow the field before any technical comparison begins. The architect's job is to make those constraints visible and pick within them, not against them.
| Decision | Common Options | Primary Consideration |
|---|---|---|
| Language/Runtime | Java, Go, TypeScript, Python | Team skills, talent availability, operating cost |
| Persistence | Relational (Postgres), Document (MongoDB), Wide-column (Cassandra), KV (Redis) | Access pattern, consistency requirements |
| Compute Model | VMs, Containers (K8s), Serverless | Operational maturity, latency tolerance |
| Messaging | Queues (SQS, RabbitMQ), Streaming (Kafka, Kinesis) | Ordering, replay, and throughput needs |
| Observability | OpenTelemetry-based stack, vendor APM | Coverage of logs, metrics, and traces together |
Principles:
- Default to organizational standards; deviate only when an NFR truly demands it.
- Prefer boring, well-understood technology for the parts of the system that aren't the differentiator.
- Account for total cost of ownership, not just license cost; hiring, operations, and migration costs often dominate over five-year horizons.
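One way to keep constraints visible is to apply them as a filter before any scoring happens. The Python sketch below treats organizational standards as a hard gate, lifted only by an explicit, NFR-justified exception, and ranks the surviving options with a weighted score. The criteria, weights, and scores are illustrative placeholders, not benchmark data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Option:
    name: str
    org_standard: bool  # is this on the organization's approved list?
    scores: dict[str, int] = field(default_factory=dict)  # criterion -> 1..5 (illustrative)


def rank_options(
    options: list[Option],
    weights: dict[str, float],
    allow_non_standard: bool = False,
) -> list[tuple[str, float]]:
    """Gate on organizational standards first, then rank survivors by weighted score."""
    eligible = [o for o in options if o.org_standard or allow_non_standard]
    scored = [
        (o.name, sum(w * o.scores.get(criterion, 0) for criterion, w in weights.items()))
        for o in eligible
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    weights = {"team_skills": 0.4, "access_pattern_fit": 0.4, "total_cost": 0.2}
    options = [
        Option("Postgres", org_standard=True,
               scores={"team_skills": 5, "access_pattern_fit": 4, "total_cost": 4}),
        Option("Cassandra", org_standard=False,
               scores={"team_skills": 2, "access_pattern_fit": 5, "total_cost": 2}),
    ]
    print(rank_options(options, weights))
    print(rank_options(options, weights, allow_non_standard=True))
```

With the default `allow_non_standard=False`, only the standard option is ranked at all. Flipping the flag is exactly the kind of deviation worth justifying against a specific NFR and recording in an ADR.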
Architecture Decision Records (ADRs)
ADRs are short, dated documents that capture the context, options considered, and rationale behind each major architectural decision. Originally popularized by Michael Nygard, they have become a standard artifact in solution-architecture practice because they make decisions reviewable, revisitable, and discoverable long after the original architect has moved on.
| Section | Purpose |
|---|---|
| Title and Date | Stable identifier and chronology |
| Status | Proposed, Accepted, Deprecated, Superseded |
| Context | The forces, constraints, and stakeholders shaping the choice |
| Decision | The option selected, in active voice |
| Consequences | Trade-offs accepted, including what gets harder |
| Alternatives Considered | Options rejected and why |
An ADR is the cheapest insurance policy in software architecture. The cost of writing one is an hour; the cost of not writing one shows up two years later when the team can't remember why they made the choice. - summary of ADR adoption guidance
Principles:
- Write the ADR before, not after, the decision is acted on; the goal is to inform, not to justify.
- Keep ADRs narrowly scoped: one decision per record, not a design document.
- Store ADRs in the same repository as the code they govern, so they evolve with the system.
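An ADR is mostly prose, but giving it a fixed structure makes the sections in the table above hard to skip. The Python sketch below models one record with the listed statuses, renders it to Markdown, and derives a zero-padded, numbered filename such as `0007-use-postgres-for-refunds.md`, a naming convention common in ADR tooling. The `Adr` class, its field names, the layout, and the example values are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass
class Adr:
    number: int
    title: str
    decided_on: date
    status: Status
    context: str
    decision: str
    consequences: str
    alternatives: list[str] = field(default_factory=list)
    superseded_by: Optional[int] = None

    def supersede_with(self, newer: "Adr") -> None:
        """Mark this record superseded; the old ADR is kept, not deleted."""
        self.status = Status.SUPERSEDED
        self.superseded_by = newer.number

    def filename(self) -> str:
        """Zero-padded number plus a slug of the title."""
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"{self.number:04d}-{slug}.md"

    def to_markdown(self) -> str:
        status = self.status.value
        if self.superseded_by is not None:
            status += f" by ADR-{self.superseded_by:04d}"
        alternatives = "\n".join(f"- {a}" for a in self.alternatives) or "- None recorded"
        sections = [
            f"# ADR-{self.number:04d}: {self.title}",
            f"Date: {self.decided_on.isoformat()}",
            f"## Status\n{status}",
            f"## Context\n{self.context}",
            f"## Decision\n{self.decision}",
            f"## Consequences\n{self.consequences}",
            f"## Alternatives Considered\n{alternatives}",
        ]
        return "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    adr = Adr(7, "Use Postgres for refunds", date(2024, 1, 15), Status.ACCEPTED,
              context="Refund records need transactional updates and audit queries.",
              decision="We will store refunds in the organization's standard Postgres cluster.",
              consequences="Cross-region writes get harder; reporting stays simple.",
              alternatives=["Cassandra: rejected, not an organizational standard"])
    print(adr.filename())
    print(adr.to_markdown())
```

Superseding marks the old record rather than rewriting it, so the history of why the system looks the way it does stays intact in the repository.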
Conclusion
Solution architecture is the layer where business problems become buildable systems. By framing NFRs and capacity carefully, designing integrations that respect ownership boundaries, picking a tech stack inside organizational constraints, and recording the resulting decisions in ADRs, a solution architect gives the engineering team a clear, defensible blueprint to build against.
In Part 4: Enterprise Architecture, we will widen the lens again: how individual solutions fit inside the business, data, application, and technology domains of an enterprise, how governance and capability mapping work in practice, and how frameworks like TOGAF help large organizations keep their portfolio of solutions coherent.
Appendix: Common Solution-Architecture Trade-Offs
| NFR / Goal | Common Trade-Off | Typical Mitigation |
|---|---|---|
| Strong Consistency | Reduced availability under partition | Read replicas with bounded staleness, regional active-passive |
| Low Latency | Higher infra cost, harder caching | Edge compute, request collapsing, pre-computation |
| High Availability | More moving parts, complex failover | Multi-region active-active, chaos testing, SLA-aware retries |
| High Scalability | Operational and observability overhead | Auto-scaling with sane limits, load-based sharding |
| Strong Security | Slower development cycle | Shift-left security, policy-as-code, paved-road platforms |
| Cost Efficiency | Performance and resilience headroom | Right-sizing reviews, reserved + spot mix, FinOps practice |
| Maintainability | Slower initial delivery | Modular boundaries, ADRs, automated tests at the seams |
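The "SLA-aware retries" mitigation can be read as retries that respect the caller's remaining latency budget. The Python sketch below assumes that reading: capped exponential backoff with full jitter, where a retry is attempted only if its delay fits inside the remaining budget; otherwise the original error is surfaced. Parameter names and defaults are hypothetical, and the injectable `clock`, `sleep`, and `rng` arguments exist only to make the behavior testable. Retries are only safe around idempotent operations.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_within_budget(
    call: Callable[[], T],
    budget_s: float,
    base_delay_s: float = 0.05,
    max_delay_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `call` with capped exponential backoff and full jitter.

    A retry happens only if its delay fits in what is left of `budget_s`;
    otherwise the last error is re-raised so the caller can fail fast.
    """
    if rng is None:
        rng = random.Random()
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            attempt += 1
            # Exponent is capped so the float never overflows on long retry runs.
            backoff = min(max_delay_s, base_delay_s * 2 ** min(attempt - 1, 32))
            delay = rng.uniform(0, backoff)  # full jitter: anywhere in [0, backoff]
            if delay >= deadline - clock():
                raise  # the next attempt would land past the budget
            sleep(delay)
```

In a request path, `budget_s` would typically be derived from the caller's remaining deadline rather than a fixed constant, so retries deep in a call chain cannot silently push the end-to-end latency past its target.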