Software Architecture: Solution Architecture

In Part 1 of the Software Architecture Series, we surveyed common architecture paradigms - monolithic, microservices, event-driven, and others. In Part 2, we covered the conventions and principles that guide both frontend and backend implementations. This third part zooms in further: how does a specific solution actually get designed for a specific business problem?

As a reminder, here are the four parts of the series:

  • Part 1: Architecture Paradigms: Common architecture paradigms such as monolithic, microservices, and event-driven architectures.
  • Part 2: Architecture Conventions and Principles: Best practices, principles like SOLID, and architectural patterns such as CQRS and the Repository Pattern.
  • Part 3: Solution Architecture: Designing a solution that meets business needs and scales with the organization.
  • Part 4: Enterprise Architecture: The broader perspective of system integrations, scalability, and the role of enterprise architecture in large organizations.

Solution architecture sits between high-level paradigms and ground-level implementation. The solution architect's job is to turn a business goal - "let customers self-serve refunds in under a minute" - into a concrete design that an engineering team can build, operate, and evolve.


Framing the Solution

Before any boxes get drawn, a solution architect needs to frame the problem. That means converting a business goal into measurable constraints, sizing the system to expected and peak load, and identifying the boundaries it must respect. Framing is mostly about asking the right questions before reaching for a pattern.

Non-Functional Requirements (NFRs)

NFRs describe how a system should behave rather than what it should do. Common categories include performance, availability, security, scalability, maintainability, and compliance. The right priorities depend on the domain: a banking platform weights consistency and auditability heavily, while a streaming service optimizes for latency and throughput.

[Figure: Relative NFR Priorities Across Domains]

Principles:

  • Quantify NFRs wherever possible (e.g., "99.95% monthly availability", "p95 latency under 200 ms") instead of stating qualitative goals.
  • Surface NFR conflicts early; for example, strong consistency typically reduces availability under network partitions.
  • Tie each NFR to a stakeholder; orphan NFRs are the first to be dropped when scope tightens.
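
To make the "quantify" principle concrete, here is a minimal sketch that turns the two example targets above into numbers a team can check. The function names, the 30-day month, and the nearest-rank percentile method are assumptions chosen for illustration, not part of any standard.

```python
def downtime_budget_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime a period can absorb at the given availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def p95(samples_ms: list[float]) -> float:
    """95th percentile of latency samples, using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_latency_nfr(samples_ms: list[float], limit_ms: float = 200.0) -> bool:
    """True when the p95 of the samples is strictly under the limit."""
    return p95(samples_ms) < limit_ms
```

Under these assumptions, a 99.95% monthly target leaves roughly 21.6 minutes of downtime in a 30-day month, which is a far easier number to negotiate with a stakeholder than "highly available".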

Capacity Planning

Capacity planning estimates the resources a solution will need across expected and peak load. The goal is to size compute, storage, and network bandwidth so the system stays within its NFRs without permanently over-provisioning. Modern cloud-native solutions blend baseline capacity with elastic headroom rather than reserving fixed peak capacity year-round.

[Figure: Cost Growth: Right-Sized vs Over-Provisioned Capacity]

Principles:

  • Plan for growth in 6-12 month horizons; longer windows tend to over-commit and waste budget.
  • Stress-test capacity assumptions with synthetic load before locking in topology.
  • Prefer elastic headroom (auto-scaling, regional failover) for unpredictable spikes over fixed reserved capacity.
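
The split between baseline capacity and elastic headroom can be sketched as simple arithmetic. The names below, the 75% target utilization, and the compound monthly growth model are illustrative assumptions; real sizing should come from load tests rather than a formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityPlan:
    baseline_instances: int  # always-on capacity sized for expected load
    peak_instances: int      # upper bound the auto-scaler may reach

    @property
    def elastic_headroom(self) -> int:
        """Instances that only need to exist during spikes."""
        return self.peak_instances - self.baseline_instances


def instances_needed(rps: float, rps_per_instance: float,
                     target_utilization: float = 0.75) -> int:
    """Instances required to serve `rps` while staying under target utilization."""
    if rps_per_instance <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("invalid sizing inputs")
    return max(1, math.ceil(rps / (rps_per_instance * target_utilization)))


def project_load(current_rps: float, monthly_growth: float, months: int) -> float:
    """Compound growth projection over a planning horizon (assumed model)."""
    return current_rps * (1 + monthly_growth) ** months


def plan_capacity(expected_rps: float, peak_rps: float,
                  rps_per_instance: float) -> CapacityPlan:
    return CapacityPlan(
        baseline_instances=instances_needed(expected_rps, rps_per_instance),
        peak_instances=instances_needed(peak_rps, rps_per_instance),
    )
```

With hypothetical inputs of 1,200 expected requests per second, 4,000 at peak, and 200 per instance, this yields a baseline of 8 instances and 19 instances of elastic headroom, rather than 27 reserved year-round.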

Designing the Solution

Once the problem is framed, the design phase translates constraints into structure: how the solution integrates with the existing landscape, which technologies make up the stack, and how the resulting decisions are recorded so future teams understand the "why" behind the architecture.

Integration Design

Most solutions live inside an existing landscape: identity providers, payment gateways, data warehouses, partner APIs, and internal services. Integration design defines how the new solution communicates with these systems - synchronously, asynchronously, or via event streams - and where the boundaries of ownership lie.

| Integration Style | Best For | Trade-Off |
| --- | --- | --- |
| REST over HTTP | Public APIs, broad compatibility | Verbose payloads, request-response only |
| gRPC | Internal service-to-service, low latency | Tooling and HTTP/2 dependency |
| GraphQL | Aggregating multiple back-ends for a UI | Complex caching, harder field-level authZ |
| Async Messaging (queues) | Decoupled work, smoothing load spikes | Harder to reason about end-to-end flow |
| Event Streaming | Audit trails, replayable pipelines | Operational overhead, schema governance |
| Backend-for-Frontend (BFF) | Tailoring APIs to specific clients | More services to own and version |

"A good integration design is one you can change. Most production failures we trace back to overly tight coupling between systems that should never have shared a contract." - Solution Architect

Principles:

  • Prefer asynchronous communication at boundaries you don't fully control.
  • Treat integration contracts as products; version them, document them, and deprecate them deliberately.
  • Avoid distributed transactions where idempotency and compensation can do the same job.
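
The last principle can be illustrated with a small sketch of idempotency keys plus compensating actions. `RefundLedger` is a hypothetical in-memory stand-in for a downstream payment system, and `run_with_compensation` is an assumed helper name; a production version would persist keys and handle failures inside the compensations themselves.

```python
from typing import Callable

# Each step pairs an action with the compensation that undoes it.
Step = tuple[Callable[[], None], Callable[[], None]]


class RefundLedger:
    """Hypothetical in-memory stand-in for a downstream payment system."""

    def __init__(self) -> None:
        self._receipts: dict[str, str] = {}
        self.total_refunded_cents = 0

    def refund(self, idempotency_key: str, amount_cents: int) -> str:
        # A repeated key returns the original receipt instead of paying twice.
        if idempotency_key in self._receipts:
            return self._receipts[idempotency_key]
        self.total_refunded_cents += amount_cents
        receipt = f"receipt-{len(self._receipts) + 1}"
        self._receipts[idempotency_key] = receipt
        return receipt


def run_with_compensation(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    undo_stack: list[Callable[[], None]] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(undo_stack):
                undo()
            return False
        undo_stack.append(compensate)
    return True
```

Because a retried `refund` call with the same key is harmless, the caller can retry freely after a timeout, and the compensation list restores consistency without a cross-system transaction.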

Tech-Stack Selection

Choosing a tech stack at solution scope is a constrained problem. The organization's standards, the team's existing skills, the operating model (in-house vs managed), and the lifetime of the system all narrow the field before any technical comparison begins. The architect's job is to make those constraints visible and pick within them, not against them.

| Decision | Common Options | Primary Consideration |
| --- | --- | --- |
| Language/Runtime | Java, Go, TypeScript, Python | Team skills, talent availability, operating cost |
| Persistence | Relational (Postgres), Document (Mongo), Wide-column (Cassandra), KV (Redis) | Access pattern, consistency requirements |
| Compute Model | VMs, Containers (K8s), Serverless | Operational maturity, latency tolerance |
| Messaging | Queues (SQS, RabbitMQ), Streaming (Kafka, Kinesis) | Ordering, replay, and throughput needs |
| Observability | OpenTelemetry-based stack, vendor APM | Coverage of logs, metrics, and traces together |

Principles:

  • Default to organizational standards; deviate only when an NFR truly demands it.
  • Prefer boring, well-understood technology for the parts of the system that aren't the differentiator.
  • Account for total cost of ownership, not just license cost; hiring, operations, and migration costs often dominate over a five-year horizon.
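
A rough way to compare options on total cost of ownership is to add one-off and recurring costs over the planning horizon. The sketch below assumes a flat annual cost model, and the `StackOption` fields are hypothetical categories; any figures fed into it are the reader's own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackOption:
    name: str
    annual_license: int      # licence or subscription fees per year
    annual_operations: int   # hosting, on-call, and upkeep per year
    annual_hiring: int       # recruiting and training premium per year
    one_off_migration: int   # cost of moving onto this option

    def tco(self, years: int = 5) -> int:
        """Total cost of ownership over the horizon, assuming flat annual costs."""
        recurring = self.annual_license + self.annual_operations + self.annual_hiring
        return self.one_off_migration + recurring * years


def cheapest(options: list[StackOption], years: int = 5) -> StackOption:
    """Option with the lowest TCO over the given horizon."""
    return min(options, key=lambda option: option.tco(years))
```

The point of the exercise is less the final number than making visible that an option with no licence fee can still lose once hiring and migration are counted.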

Architecture Decision Records (ADRs)

ADRs are short, dated documents that capture the context, options considered, and rationale behind each major architectural decision. Originally popularized by Michael Nygard, they have become a standard artifact in solution-architecture practice because they make decisions reviewable, reversible, and discoverable long after the original architect has moved on.

| Section | Purpose |
| --- | --- |
| Title and Date | Stable identifier and chronology |
| Status | Proposed, Accepted, Deprecated, Superseded |
| Context | The forces, constraints, and stakeholders shaping the choice |
| Decision | The option selected, in active voice |
| Consequences | Trade-offs accepted, including what gets harder |
| Alternatives Considered | Options rejected and why |

"An ADR is the cheapest insurance policy in software architecture. The cost of writing one is an hour; the cost of not writing one shows up two years later when the team can't remember why they made the choice." - summary of ADR adoption guidance

Principles:

  • Write the ADR before, not after, the decision is acted on; the goal is to inform, not to justify.
  • Keep ADRs narrowly scoped: one decision per record, not a design document.
  • Store ADRs in the same repository as the code they govern, so they evolve with the system.
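
The ADR sections listed above map naturally onto a small record type. The following is one possible sketch: the `ADR` class, the zero-padded numbering, and the `.txt` file naming are assumptions for illustration, not a prescribed ADR format.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


@dataclass
class ADR:
    number: int
    title: str
    decided_on: date
    status: Status
    context: str
    decision: str
    consequences: str
    alternatives: list[str] = field(default_factory=list)

    def filename(self) -> str:
        """Stable, sortable file name for storing the record next to the code."""
        slug = "-".join(self.title.lower().split())
        return f"{self.number:04d}-{slug}.txt"

    def render(self) -> str:
        """Plain-text rendering with one block per ADR section."""
        alternatives = "\n".join(f"- {alt}" for alt in self.alternatives)
        sections = [
            ("Status", self.status.value),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
            ("Alternatives Considered", alternatives or "None recorded"),
        ]
        header = f"ADR {self.number:04d}: {self.title} ({self.decided_on.isoformat()})"
        body = "\n\n".join(f"{name}\n{text}" for name, text in sections)
        return f"{header}\n\n{body}\n"
```

Keeping one record per file, named by number and title, makes the history easy to scan in a repository listing and lets a later ADR mark an earlier one as Superseded without rewriting it.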

Conclusion

Solution architecture is the layer where business problems become buildable systems. By framing NFRs and capacity carefully, designing integrations that respect ownership boundaries, picking a tech stack inside organizational constraints, and recording the resulting decisions in ADRs, a solution architect gives the engineering team a clear, defensible blueprint to build against.

In Part 4: Enterprise Architecture, we will widen the lens again: how individual solutions fit inside the business, data, application, and technology domains of an enterprise, how governance and capability mapping work in practice, and how frameworks like TOGAF help large organizations keep their portfolio of solutions coherent.


Appendix: Common Solution-Architecture Trade-Offs

| NFR / Goal | Common Trade-Off | Typical Mitigation |
| --- | --- | --- |
| Strong Consistency | Reduced availability under partition | Read replicas with bounded staleness, regional active-passive |
| Low Latency | Higher infra cost, harder caching | Edge compute, request collapsing, pre-computation |
| High Availability | More moving parts, complex failover | Multi-region active-active, chaos testing, SLA-aware retries |
| High Scalability | Operational and observability overhead | Auto-scaling with sane limits, load-based sharding |
| Strong Security | Slower development cycle | Shift-left security, policy-as-code, paved-road platforms |
| Cost Efficiency | Performance and resilience headroom | Right-sizing reviews, reserved + spot mix, FinOps practice |
| Maintainability | Slower initial delivery | Modular boundaries, ADRs, automated tests at the seams |