CQRS & Event Sourcing Architecture

Our Journey to Cloud Native Architecture for Low Latency Trading Systems

What are CQRS & Event Sourcing?

Command Query Responsibility Segregation (CQRS) and Event Sourcing are architectural patterns that we've adopted to build highly scalable, resilient, and performant financial systems. These patterns allow us to separate the write and read operations in our systems, optimizing each path for its specific requirements while maintaining consistency and reliability.

CQRS Architecture Diagram

The CQRS Pattern

CQRS separates the command (write) operations from the query (read) operations. This separation allows each side to be optimized independently. Commands can focus on validation, consistency, and business logic, while queries can be optimized for performance, often using specialized data stores or caches.
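
As a minimal sketch of that split, in Java (the interfaces and type names below are illustrative, not our production API), commands and queries travel through entirely separate handler types:

    // Commands mutate state; queries only read. All names are illustrative.
    record PlaceOrderCommand(String accountId, String symbol, long qty, long priceTicks) {}
    record OrderBookQuery(String symbol) {}
    record OrderBookView(String symbol, long bestBid, long bestAsk) {}

    interface CommandHandler {
        // Write side: validates the command, applies business rules,
        // and emits events on success.
        void handle(PlaceOrderCommand command);
    }

    interface QueryHandler {
        // Read side: served from a read-optimized store or cache,
        // never from the write model.
        OrderBookView handle(OrderBookQuery query);
    }

Because the two sides share no interface, each can change its storage, scaling, and deployment strategy without touching the other.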

Event Sourcing

Event Sourcing stores all changes to application state as a sequence of events. Instead of storing just the current state, we capture every action that has altered the state of the system. This provides a complete audit trail, enables complex event analysis, and allows systems to be reconstructed to any point in time.

Event Sourcing Diagram
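
A minimal in-memory sketch of the idea, again with illustrative names: state is never stored directly, only derived by folding over the event log, so replaying a prefix of the log reconstructs the state at any earlier point in time.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal in-memory event store for a single account ledger.
    class AccountLedger {
        sealed interface Event permits Deposited, Withdrawn {}
        record Deposited(long amount) implements Event {}
        record Withdrawn(long amount) implements Event {}

        private final List<Event> log = new ArrayList<>();

        // Every change to state is appended as an immutable event.
        void append(Event e) { log.add(e); }

        // Current balance is a fold over the full history; replaying only
        // the first N events reconstructs the balance as of that moment.
        long balance() {
            long b = 0;
            for (Event e : log) {
                if (e instanceof Deposited d) b += d.amount();
                else if (e instanceof Withdrawn w) b -= w.amount();
            }
            return b;
        }
    }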

Our Journey to Cloud Native

In 2017, we faced a significant challenge when using our low-latency HFT matching engine as the foundation for a full crypto exchange solution. This required retail components like payment gateways, KYC onboarding, and referral programs, with potential account opening requests in the thousands per hour.

"We clearly had to maintain our market-leading latency for HFT clients in Japan, but needed to combine this with the best of the 'Cloud Native' technologies to serve retail markets and achieve our scalability and flexibility goals."

From 2019 to 2022, we redefined and rebuilt our architectural approach, making nuanced decisions about which cloud-native technologies to adopt while preserving our critical low-latency capabilities.

Key Architecture Components

Event Stream Processing

We moved our core processing engines from shared memory to a fully event-sourced streaming model where any subscriber component can be instantly replaced by another, dramatically improving system reliability.
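
As an illustration of how subscriber replacement works, here is a sketch assuming a Kafka-style event log (the broker technology, topic, and group names are assumptions for this example, not a statement of our stack): every replica of a service joins the same consumer group, so when one instance dies the broker reassigns its partitions to a survivor.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class EventSubscriber {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
            // Every replica shares one group id: if this instance dies, the
            // broker rebalances its partitions to a surviving replica.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "risk-engine");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("order-events"));
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(100));
                    for (ConsumerRecord<String, String> rec : records) {
                        // Apply the event to this service's local projection of state.
                    }
                }
            }
        }
    }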

Microservice Aggregates

High-speed microservice aggregates act as query handlers in our CQRS model, supporting increased scale, reducing single points of failure, and bringing latency for aggregate data requests down to 5-10 ms.
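
A sketch of what such a query handler can look like, with illustrative names: the aggregate is maintained as an in-memory projection fed by the event stream, so answering a query is little more than a map lookup.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Read-side projection: updated asynchronously from the event stream,
    // queried directly from memory so aggregate requests avoid the write path.
    class PositionAggregateService {
        record Position(String account, String symbol, long netQty) {}

        private final Map<String, Position> byAccountSymbol = new ConcurrentHashMap<>();

        // Called by the event-stream subscriber for every fill event.
        void onFill(String account, String symbol, long signedQty) {
            byAccountSymbol.merge(account + "|" + symbol,
                    new Position(account, symbol, signedQty),
                    (old, add) -> new Position(account, symbol,
                            old.netQty() + add.netQty()));
        }

        // Query handler: a single map lookup, which is how single-digit
        // millisecond responses become achievable for aggregate data.
        Position query(String account, String symbol) {
            return byAccountSymbol.get(account + "|" + symbol);
        }
    }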

Service Mesh

Our service mesh allows any service to locate, monitor, or communicate with any other service. The Service Chassis provides common system-wide functionality such as distributed real-time data, transaction routing, and observability.
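
The sketch below shows the general shape of such a chassis in Java; the ServiceRegistry interface and its methods are hypothetical names for illustration, not the actual chassis API.

    import java.net.URI;
    import java.util.Map;
    import java.util.Optional;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical chassis interface -- method names are illustrative only.
    interface ServiceRegistry {
        void register(String serviceName, URI endpoint);  // self-registration at startup
        Optional<URI> locate(String serviceName);         // discovery for any-to-any calls
    }

    // Trivial in-memory stand-in, useful for local testing of mesh-aware services.
    class InMemoryRegistry implements ServiceRegistry {
        private final Map<String, URI> services = new ConcurrentHashMap<>();
        public void register(String name, URI endpoint) { services.put(name, endpoint); }
        public Optional<URI> locate(String name) {
            return Optional.ofNullable(services.get(name));
        }
    }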

Specialized Databases

We've broken down our monolithic database into microservice-specific databases: in-memory distributed databases for latency-sensitive services, PostgreSQL for configuration and archiving, and an InfluxDB time-series database for OHLC data.

Digital Twin Architecture

Digital Twin Observability

Our system monitoring is based on a digital twin approach to observability, using OpenTelemetry standards, service mesh, and self-registration.

  • Comprehensive Monitoring: Application service controls and integrated Docker controls in one console
  • Unified Management: Logstash and Elastic integration allows all support operations to be managed from one place
  • Instant Failover: Digital twin model powers cluster management and instant failover of stateful services
  • Future-Ready: Framework designed to integrate with AI for autonomous system monitoring
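
As a small illustration of the observability side, this sketch emits a trace span with the standard OpenTelemetry Java API; the tracer, span, and attribute names are placeholders:

    import io.opentelemetry.api.GlobalOpenTelemetry;
    import io.opentelemetry.api.trace.Span;
    import io.opentelemetry.api.trace.Tracer;
    import io.opentelemetry.context.Scope;

    class TracedHandler {
        private static final Tracer TRACER =
                GlobalOpenTelemetry.getTracer("matching-engine"); // name is illustrative

        void processOrder(String orderId) {
            Span span = TRACER.spanBuilder("process-order").startSpan();
            try (Scope scope = span.makeCurrent()) {
                span.setAttribute("order.id", orderId);
                // ... business logic; child spans created here join this trace,
                // and the exported spans feed the digital-twin view of the system.
            } finally {
                span.end();
            }
        }
    }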

Benefits of Our Architecture

Performance

  ✓ Ultra-low latency on critical paths
  ✓ Optimized for HFT environments
  ✓ 5-10 ms response times for aggregate queries
  ✓ Support for millions of transactions per day

Reliability

  ✓ Reduced single points of failure
  ✓ Instant component replacement
  ✓ Complete event history for auditing
  ✓ Transparent high availability failover

Scalability

  ✓ Elastic scaling for retail-facing components
  ✓ Handles high-volume account openings
  ✓ Efficient real-time price distribution
  ✓ Independent scaling of read and write operations

Development and Testing

Alongside our architectural changes, we've implemented modern development practices to ensure code quality and reliability:

DevOps and Testing

  • Shift-Left Testing: All components can be fully tested by developers using mock services (see the sketch below)
  • Behavior-Driven Development: Our TFCoverage framework supports end-to-end testing across multiple protocols
  • Automated Security: SAST and DAST analysis in our CI/CD pipelines reveals issues before deployment
  • Version Control: Structured dependency management allows customer releases to match specific dependencies
  • Containerization: Automated container deployment for consistent environments

These practices have resulted in a more engaged development team, dramatically improved testing coverage, and confident deployment cycles.
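
To make the shift-left point concrete, here is a minimal sketch of the style (JUnit 5; the interface and test names are invented for illustration): because collaborators are injected as interfaces, a developer can substitute a deterministic mock and run the test with no live services.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class OrderValidatorTest {
        // The pricing client is an interface, so tests can stand in a mock.
        interface PriceFeed { double lastPrice(String symbol); }

        static double notional(PriceFeed feed, String symbol, long qty) {
            return feed.lastPrice(symbol) * qty;
        }

        @Test
        void notionalUsesMockedPrice() {
            PriceFeed mockFeed = symbol -> 100.0;  // mock service with a fixed price
            assertEquals(500.0, notional(mockFeed, "BTC/JPY", 5), 1e-9);
        }
    }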

Breaking Down the Monolith

Microservice Database Architecture

A critical part of our journey was breaking down our monolithic database into microservice-specific databases. The monolith had grown into a bottleneck, both for scaling customers and for scaling our development teams: it was simply too large for anyone but the original senior developers to understand.

As part of the move to microservices, we restructured our database architecture to include:

  • In-memory distributed database for keeping metadata updated in real-time in latency-sensitive services
  • PostgreSQL database for configuration management, transaction management, and report handling
  • InfluxDB time series database for our real-time OHLC bar factory and optional integrations with cloud data lakes

This single change had a large positive impact on our development teams, as it suddenly became much easier to understand the full scope of each database, empowering teams to discuss improvements and innovate more quickly.
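
For the time-series side, here is a minimal sketch of writing one OHLC bar using the InfluxDB Java client; the URL, token, organization, bucket, and measurement names are placeholders for this example:

    import com.influxdb.client.InfluxDBClient;
    import com.influxdb.client.InfluxDBClientFactory;
    import com.influxdb.client.WriteApiBlocking;
    import com.influxdb.client.domain.WritePrecision;
    import com.influxdb.client.write.Point;
    import java.time.Instant;

    public class OhlcWriter {
        public static void main(String[] args) {
            // Connection details are placeholders for this sketch.
            try (InfluxDBClient client = InfluxDBClientFactory.create(
                    "http://localhost:8086", "my-token".toCharArray(), "my-org", "ohlc")) {
                WriteApiBlocking write = client.getWriteApiBlocking();
                Point bar = Point.measurement("ohlc_1m")
                        .addTag("symbol", "BTC/JPY")
                        .addField("open", 101.0)
                        .addField("high", 103.5)
                        .addField("low", 100.2)
                        .addField("close", 102.8)
                        .time(Instant.now(), WritePrecision.MS);
                write.writePoint(bar);
            }
        }
    }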

Time Management and Scheduling

We rebuilt our date management and service scheduling functionality to support:

  • Multiple calendars for exchanges operating on different calendars
  • Multiple time zones for FX exchanges operating in Japan on Chicago time
  • Multiple session types including daily sessions (futures markets), intra-day sessions (equity markets), and week-long sessions (crypto and FX markets)
  • "Time-Travel" to support both calendar and weekend exchange testing set in the future, and process re-runs and corrections in the past

We rebuilt our audit and time-travel handling to operate with the latest temporal database designs, while implementing our archive data service as a real-time listener on the event streams. This removed the need for a separate archive handling process at the end of every day.

Time Management Architecture
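
The time-travel idea can be sketched with nothing more than java.time: services read time exclusively through an injected Clock, so tests can run against a frozen future instant while production uses the exchange-local wall clock. The dates and zones below are illustrative.

    import java.time.Clock;
    import java.time.Instant;
    import java.time.LocalDateTime;
    import java.time.ZoneId;

    public class TradingClocks {
        public static void main(String[] args) {
            // Normal operation: exchange-local wall clock.
            Clock tokyo = Clock.system(ZoneId.of("Asia/Tokyo"));

            // "Time-travel" for weekend testing: freeze the clock at a future
            // session open so calendar and session logic behave as on that day.
            Clock futureOpen = Clock.fixed(
                    Instant.parse("2030-01-07T14:30:00Z"), ZoneId.of("America/Chicago"));

            // Services read time only through the injected Clock, so swapping
            // clocks re-runs past processes or rehearses future sessions.
            System.out.println(LocalDateTime.now(tokyo));
            System.out.println(LocalDateTime.now(futureOpen));
        }
    }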

From Batch to Stream Processing

Traditionally, financial systems have relied heavily on batch processing, particularly for end-of-day operations. While we have always had an extremely flexible batch processing engine, we are now replacing it with a stream processing engine:

Batch Processing

  • End-of-day processing
  • Scheduled imports/exports
  • Periodic reconciliation
  • Sequential processing

Stream Processing

  • Real-time processing
  • Continuous data flow
  • Immediate system-wide events
  • Parallel processing

This transition supports much higher volume and enables system-wide account status events, such as margin warnings and limit kill-switch events, which are essential for high-frequency traders who require immediate notification of position changes or risk breaches.
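
A compact sketch of the stream model (the record names and the 90% threshold are illustrative): each account snapshot is evaluated the instant it arrives, rather than in an end-of-day batch pass.

    import java.util.List;

    public class MarginMonitor {
        record AccountSnapshot(String account, double marginUsedPct) {}
        record MarginWarning(String account, double marginUsedPct) {}

        // Stream model: every event is evaluated the moment it arrives,
        // instead of waiting for a scheduled batch reconciliation run.
        static void onEvent(AccountSnapshot snap) {
            if (snap.marginUsedPct() >= 90.0) {
                emit(new MarginWarning(snap.account(), snap.marginUsedPct()));
            }
        }

        static void emit(MarginWarning w) {
            // In production this would publish a system-wide event; here we print.
            System.out.println("MARGIN WARNING: " + w);
        }

        public static void main(String[] args) {
            List.of(new AccountSnapshot("ACC-1", 42.0),
                    new AccountSnapshot("ACC-2", 95.5))
                .forEach(MarginMonitor::onEvent);
        }
    }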

Connectivity and Business Logic

With our new architecture in place, we've integrated significant functionality, including:

Market Connections

  • Multiple market makers
  • Exchange FIX gateways
  • ITCH and Glimpse exchange feeds
  • Tora equities connections

Core Engines

  • Cross-currency matching engine
  • Pricing engine
  • Hedging engine
  • Risk management engine

Digital Assets

  • 7 crypto exchange connections
  • Wallet integrations
  • Multi-asset support

This comprehensive connectivity framework ensures that our platform can interface with virtually any financial system while maintaining the performance and reliability expected in trading environments.

Frontend and User Experience

Our frontend architecture has undergone a similar transformation, moving from legacy frameworks to modern, component-based designs:

Micro-Frontend Architecture

Micro-Frontend Approach

We've moved from a monolithic frontend based on Sencha ExtJS to a modern micro-frontend architecture using ReactJS and Web Components. This approach provides:

  • Framework Agnostic: Core components built with Web Components standard for compatibility
  • FDC3 Integration: Support for the Financial Desktop Connectivity and Collaboration Consortium standard
  • Multi-Language Support: Built-in internationalization
  • Multi-Timezone Support: Configurable timezone handling for global trading

Our new grid component system supports dramatically higher load and volume, essential for displaying real-time market data and large trading positions.

Ready to explore how our architecture can power your financial systems?

Contact us to discuss how our CQRS and Event Sourcing architecture can provide the performance, reliability, and scalability your business needs.
