5 Steps to Implement Snapshots in Event Sourcing
Implementing snapshots in event sourcing enhances performance and scalability by reducing the need for event replay during state reconstruction.
Essential Designs Team | April 30, 2025

Snapshots speed up event-sourcing systems by removing the need to replay every event to rebuild an aggregate's state. They act as checkpoints that store the state at a specific point in time, which saves time, improves performance, and makes systems more scalable. Here's how to implement snapshots effectively:
- Set Snapshot Rules: Create snapshots based on event count, time intervals, or significant state changes.
- Design Storage: Use a structured schema with aggregate IDs, version numbers, timestamps, and state data. Choose storage options like event stores, document databases, or caches.
- Build Snapshot Logic: Integrate snapshot creation into command handlers and ensure proper error handling.
- Load Snapshots: Retrieve the latest snapshot, apply remaining events, and validate consistency.
- Test & Optimize: Measure performance, test edge cases, and monitor metrics like load times and storage usage.
Key Benefits:
- Faster state reconstruction
- Improved system performance
- Scalability for high-event systems
Snapshots are essential for managing long event histories efficiently. Follow these steps to build a robust snapshot system.
Step 1: Set Snapshot Creation Rules
To manage performance and resource use effectively, it's crucial to establish clear rules for creating snapshots. These rules form the groundwork for efficient snapshot use in event sourcing systems.
Event Number Triggers
Using event count-based triggers means creating snapshots after a set number of events. This approach ensures predictable resource use. Adjust the event count based on the size of your aggregates and how often they’re updated. For smaller aggregates with frequent updates, you may need snapshots more often. On the other hand, larger aggregates or those updated less frequently can use a more spaced-out schedule.
Time-Based Creation
Time-based snapshots happen on a fixed schedule, which helps maintain consistent resource management. This method works best for systems with a steady flow of events. Match the snapshot schedule to your system’s activity. High-traffic systems might need multiple snapshots daily, while low-traffic systems could manage with daily or even less frequent snapshots.
State Change Triggers
State change triggers create snapshots when key state transitions occur. This approach is particularly useful for systems where significant state changes are critical. Use this method in scenarios like:
- When major business rules are applied
- During significant state transitions
- After running complex calculations
- When multiple related events occur at once
For example, in an order processing system, snapshots could be created after key status updates like "payment received", "shipped", or "delivered", instead of minor updates.
| Trigger Type | Best For | Considerations |
| --- | --- | --- |
| Event Number | Systems with predictable event flow | Adjust thresholds based on aggregate size and event rate |
| Time-Based | Systems with steady event patterns | Sync intervals with system activity levels |
| State Change | Systems with key business processes | Trigger on major business events |
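The three trigger types can also be combined into a single policy check. The sketch below is a minimal illustration, not part of any specific framework; the `SnapshotPolicy` name and its default thresholds are assumptions you should tune for your own aggregates:

```csharp
using System;

// Hypothetical policy combining the three trigger types discussed above.
public class SnapshotPolicy
{
    public int EventCountThreshold { get; init; } = 100;
    public TimeSpan MaxAge { get; init; } = TimeSpan.FromHours(24);

    // Returns true when any configured trigger fires.
    public bool ShouldSnapshot(long eventsSinceLastSnapshot,
                               DateTime lastSnapshotUtc,
                               bool significantStateChange)
    {
        if (eventsSinceLastSnapshot >= EventCountThreshold) return true;  // event-count trigger
        if (DateTime.UtcNow - lastSnapshotUtc >= MaxAge) return true;     // time-based trigger
        return significantStateChange;                                    // state-change trigger
    }
}
```

For example, an order aggregate could pass `significantStateChange: true` when its status moves to "shipped", while still snapshotting every 100 events as a backstop.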
Step 2: Build Storage Structure
Setting up an efficient storage structure for snapshots is crucial for maintaining system performance. The design here lays the groundwork for effective snapshot management.
Data Structure
When defining the snapshot structure, include these key elements:
- Aggregate ID: A unique identifier for the entity.
- Version Number: A sequential counter to track state changes.
- Timestamp: The UTC time when the snapshot was created.
- State Data: The complete state of the aggregate at the snapshot's creation.
- Metadata: Additional details like creator ID or business context.
For serialization, choose a format based on your needs. Use JSON for better readability or Protocol Buffers if speed is a priority.
Storage Options
Your storage choice should align with your system's requirements. Here's a breakdown of common options:
| Storage Type | Best For | Impact on Performance |
| --- | --- | --- |
| Event Store | Tight integration with events | Medium read/write speed |
| Document DB | Complex state structures | Fast reads, medium write speed |
| Redis Cache | High-frequency access | Extremely fast, but limited persistence |
| Blob Storage | Large state objects | Slower writes, medium read speed |
Database Schema
A well-designed schema is essential for fast retrieval and scalability. Follow these guidelines:
- Primary Key: Combine the Aggregate ID and Version Number into a composite primary key. Index the Timestamp separately to speed up queries for the latest snapshot.
- Secondary Indexes: Add indexes for:
  - Retrieving the latest snapshot for each aggregate.
  - Querying snapshots within specific time ranges.
  - Filtering snapshots by state type.
- Partitioning Strategy: Use partitioning to improve performance:
  - By time range to distribute the load.
  - By aggregate type for quicker targeted queries.
  - By business domain for logical data separation.
These strategies ensure efficient writes for new snapshots and fast reads for rebuilding aggregate states.
Here’s an example SQL schema for PostgreSQL:
```sql
CREATE TABLE snapshots (
    aggregate_id   VARCHAR(36) NOT NULL,
    version_number BIGINT      NOT NULL,
    timestamp      TIMESTAMP   NOT NULL,
    state_data     JSONB       NOT NULL,
    metadata       JSONB,
    PRIMARY KEY (aggregate_id, version_number)
);

CREATE INDEX idx_snapshots_latest ON snapshots (aggregate_id, timestamp DESC);
```
This schema is optimized for quick snapshot retrieval and integrates seamlessly with an event sourcing system, ensuring scalability as your data grows.
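To support the time-range queries and latest-snapshot lookups described in the guidelines above, you might add an index and a retrieval query along these lines. This is a sketch: the column names match the schema above, but the index name and parameter placeholder (`$1`) are illustrative:

```sql
-- Supports the time-range query guideline
CREATE INDEX idx_snapshots_time_range ON snapshots (timestamp);

-- Fetch the latest snapshot for a single aggregate
SELECT state_data, version_number
FROM snapshots
WHERE aggregate_id = $1
ORDER BY version_number DESC
LIMIT 1;
```

Ordering by `version_number DESC` with `LIMIT 1` lets PostgreSQL answer this from the composite primary key without scanning older snapshots.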
Step 3: Code Snapshot Creation
Add snapshot creation directly into your command handler to maintain system performance and data consistency.
Command Handler Integration
To integrate snapshot logic, embed it into your command handling process. Here's an example in C#:
```csharp
public class OrderCommandHandler {
    private readonly ISnapshotStore _snapshotStore;
    private readonly IEventStore _eventStore;

    public OrderCommandHandler(ISnapshotStore snapshotStore, IEventStore eventStore) {
        _snapshotStore = snapshotStore;
        _eventStore = eventStore;
    }

    public async Task Handle(UpdateOrderCommand command) {
        var aggregate = await LoadAggregate(command.AggregateId);
        aggregate.ProcessCommand(command);
        await _eventStore.SaveEvents(aggregate.Id, aggregate.Version, aggregate.GetUncommittedEvents());

        if (ShouldCreateSnapshot(aggregate)) {
            await CreateAndSaveSnapshot(aggregate);
        }
    }

    private bool ShouldCreateSnapshot(OrderAggregate aggregate) {
        // Snapshot every 100 events, or when a significant state change occurs
        return aggregate.Version % 100 == 0 ||
               aggregate.HasSignificantStateChange();
    }
}
```
This ensures snapshots are created when specific conditions are met, such as after every 100 changes or significant state updates.
Data Conversion Methods
Once integrated, focus on converting aggregate states into a format suitable for snapshots. Use a serialization method that balances efficiency and data accuracy:
```csharp
public class SnapshotConverter {
    public SnapshotData Convert(IAggregateRoot aggregate) {
        return new SnapshotData {
            AggregateId = aggregate.Id,
            Version = aggregate.Version,
            Timestamp = DateTime.UtcNow,
            State = JsonSerializer.Serialize(aggregate,
                new JsonSerializerOptions {
                    WriteIndented = false,
                    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
                }),
            Metadata = CreateMetadata(aggregate)
        };
    }
}
```
This method ensures the aggregate's state is stored in a compact and consistent format.
Error Management
Proper error handling is critical for reliable snapshot creation. Below is a table summarizing strategies for common error types:
| Error Type | Handling Strategy | Recovery Action |
| --- | --- | --- |
| Storage Failures | Log and continue | Retry during the next trigger |
| Serialization Errors | Capture state details | Log diagnostic information |
| Concurrency Conflicts | Use optimistic locking | Resolve automatically or alert the team |
Here’s an example of how to implement error management in snapshot creation:
```csharp
public async Task CreateAndSaveSnapshot(IAggregateRoot aggregate) {
    try {
        var snapshot = _converter.Convert(aggregate);
        await _snapshotStore.Save(snapshot);
        _logger.LogInformation($"Snapshot created for {aggregate.Id} at version {aggregate.Version}");
    }
    catch (StorageException ex) {
        _logger.LogError($"Failed to save snapshot: {ex.Message}");
        _metrics.RecordSnapshotFailure(aggregate.Id);
        // Continue core operations without interruption
    }
    catch (Exception ex) {
        _logger.LogCritical($"Unexpected error during snapshot creation: {ex.Message}");
        await _alertingService.NotifyTeam(ex);
    }
}
```
By logging errors and continuing operations, you minimize disruptions while ensuring issues are traceable.
Monitoring and Validation
To maintain reliability, validate snapshot functionality early in development. Keep detailed logs of snapshot events to simplify troubleshooting and fine-tune system performance. Track important metrics such as creation time, storage usage, and error rates to identify and resolve bottlenecks quickly.
These practices help ensure your system remains efficient and resilient over time.
Step 4: Load and Apply Snapshots
This step focuses on restoring the state by loading snapshots and applying events.
Snapshot Retrieval
Loading snapshots efficiently is key to rebuilding the state.
Here’s an example in C#:
```csharp
public class SnapshotLoader {
    private readonly ISnapshotStore _snapshotStore;
    private readonly IEventStore _eventStore;

    public SnapshotLoader(ISnapshotStore snapshotStore, IEventStore eventStore) {
        _snapshotStore = snapshotStore;
        _eventStore = eventStore;
    }

    public async Task<OrderAggregate> LoadLatestState(Guid aggregateId) {
        var snapshot = await _snapshotStore.GetLatest(aggregateId);
        var aggregate = snapshot != null
            ? DeserializeSnapshot(snapshot)
            : new OrderAggregate(aggregateId);
        return aggregate;
    }

    private OrderAggregate DeserializeSnapshot(SnapshotData snapshot) {
        try {
            var state = JsonSerializer.Deserialize<OrderAggregate>(
                snapshot.State,
                new JsonSerializerOptions {
                    PropertyNameCaseInsensitive = true
                }
            );
            state.SetVersion(snapshot.Version);
            return state;
        }
        catch (JsonException ex) {
            // Rethrow with context so callers can fall back to a full event replay
            throw new InvalidOperationException(
                $"Failed to deserialize snapshot for aggregate {snapshot.AggregateId} at version {snapshot.Version}", ex);
        }
    }
}
```
Once the snapshot is loaded, apply the remaining events to fully reconstruct the state.
Event Application
To ensure the integrity of the process, verify that events are applied in the correct order:
```csharp
public async Task<OrderAggregate> ReconstructState(Guid aggregateId) {
    var aggregate = await LoadLatestState(aggregateId);
    var events = await _eventStore.GetEventsAfterVersion(
        aggregateId,
        aggregate.Version
    );

    foreach (var @event in events) {
        if (@event.Version != aggregate.Version + 1) {
            throw new VersionMismatchException(
                $"Expected version {aggregate.Version + 1}, got {@event.Version}"
            );
        }
        aggregate.Apply(@event);
    }
    return aggregate;
}
```
Data Consistency Checks
To maintain data integrity, perform consistency checks during the loading process.
| Check Type | Implementation | Recovery Action |
| --- | --- | --- |
| Version Alignment | Compare the snapshot version with the event stream | Rebuild from an earlier snapshot |
| Data Integrity | Validate a checksum of the snapshot data | Fall back to a full event replay |
| State Consistency | Verify business rules after reconstruction | Log and alert if inconsistencies detected |
Here’s an example of a consistency validator:
```csharp
public class ConsistencyValidator {
    public bool ValidateSnapshot(SnapshotData snapshot, IAggregateRoot aggregate) {
        return ValidateVersion(snapshot, aggregate) &&
               ValidateChecksum(snapshot) &&
               ValidateBusinessRules(aggregate);
    }

    private bool ValidateVersion(SnapshotData snapshot, IAggregateRoot aggregate) {
        return snapshot.Version <= aggregate.Version &&
               snapshot.Timestamp <= DateTime.UtcNow;
    }

    private bool ValidateChecksum(SnapshotData snapshot) {
        var calculated = ComputeChecksum(snapshot.State);
        return calculated == snapshot.Checksum;
    }

    private bool ValidateBusinessRules(IAggregateRoot aggregate) {
        return aggregate.ValidateInvariant();
    }

    private string ComputeChecksum(string state) {
        // SHA-256 over the serialized state, hex-encoded
        using var sha = System.Security.Cryptography.SHA256.Create();
        var hash = sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(state));
        return Convert.ToHexString(hash);
    }
}
```
Together, snapshot loading, event application, and consistency validation give you a reliable state-restoration process. Adjust these techniques as needed to fit your system's specific requirements.
Step 5: Test and Improve
Once you've implemented snapshot creation and retrieval, it's time to evaluate its performance and make necessary adjustments.
Speed Tests
Run detailed speed tests to measure how using snapshots impacts performance:
```csharp
public class SnapshotPerformanceTester
{
    public async Task<PerformanceMetrics> CompareReconstruction(Guid aggregateId)
    {
        var stopwatch = new Stopwatch();

        // Test without snapshot
        stopwatch.Start();
        var withoutSnapshot = await ReconstructFromEvents(aggregateId);
        var noSnapshotTime = stopwatch.ElapsedMilliseconds;

        // Test with snapshot
        stopwatch.Restart();
        var withSnapshot = await ReconstructFromSnapshot(aggregateId);
        var withSnapshotTime = stopwatch.ElapsedMilliseconds;

        return new PerformanceMetrics
        {
            EventReplayTime = noSnapshotTime,
            SnapshotLoadTime = withSnapshotTime,
            ImprovementPercent = CalculateImprovement(noSnapshotTime, withSnapshotTime)
        };
    }
}
```
Here are the key metrics to monitor:
- State Load Time: Should be under 500ms
- Memory Usage: Keep it below 256MB
- CPU Utilization: Aim for less than 30%
- I/O Operations: Limit to fewer than 50 operations per second
These measurements ensure that using snapshots consistently speeds up reconstruction.
Special Case Testing
It's important to test scenarios with extreme conditions, like long event chains:
```csharp
public class EdgeCaseTests
{
    private readonly ISnapshotStore _snapshotStore;
    private readonly SnapshotLoader _snapshotLoader;

    public async Task TestLongEventChain()
    {
        var aggregate = new OrderAggregate(Guid.NewGuid());
        for (int i = 0; i < 10000; i++)
        {
            await aggregate.ProcessCommand(new AddItemCommand());
        }

        // Create a snapshot, reload it, and assert the restored state matches
        var snapshot = await _snapshotStore.CreateSnapshot(aggregate);
        var restored = await _snapshotLoader.LoadLatestState(aggregate.Id);
        Assert.Equal(aggregate.Version, restored.Version);
    }
}
```
These tests help confirm that the snapshot system handles edge cases effectively. Afterward, keep an eye on system performance to catch issues early.
System Tracking
Track metrics to ensure your snapshot system operates within acceptable limits:
```csharp
public class SnapshotMetricsCollector
{
    private readonly IMetricsStore _metricsStore;
    // Tracked per aggregate in a real system; simplified to a single field here
    private long LastSnapshotVersion;

    public async Task TrackSnapshotCreation(SnapshotData snapshot)
    {
        await _metricsStore.Record(new SnapshotMetric
        {
            AggregateId = snapshot.AggregateId,
            Size = snapshot.State.Length,
            CreationTime = DateTime.UtcNow,
            EventsSinceLastSnapshot = snapshot.Version - LastSnapshotVersion
        });
        LastSnapshotVersion = snapshot.Version;
    }
}
```
Focus on maintaining these targets:
- Average Load Time: Under 1 second
- Snapshot Size: Below 1MB
- Events Between Snapshots: Fewer than 1,000
- Failed Load Rate: Less than 1%
Summary
Snapshots reduce event replay overhead and improve performance in event-sourcing systems. The five key steps are: setting clear rules for snapshot creation, designing an efficient storage setup, implementing snapshot generation, restoring states accurately, and testing the process regularly. Following them can significantly enhance system responsiveness and resource management.
An effective snapshot strategy should address key aspects like when to create snapshots, how to store them efficiently, and ensuring reliable state restoration.
Here are some practical tips:
- Use strategic intervals for snapshot creation.
- Keep snapshots small to save storage space.
- Build strong error-handling mechanisms for both creation and loading.
- Track system performance to ensure efficiency goals are met.
FAQs
How can I decide how often to create snapshots in an event-sourcing system?
Determining the optimal frequency for creating snapshots in an event-sourcing system depends on factors like performance, storage capacity, and system complexity. Snapshots are used to reduce the time it takes to rebuild an entity's state, so the key is to balance efficiency with resource usage.
Consider these guidelines:
- Event Count: Create a snapshot after a specific number of events, such as every 100 or 1,000 events, depending on your system's performance needs.
- Time Intervals: Use time-based triggers, such as daily or hourly snapshots, if your system processes a high volume of events.
- Performance Testing: Regularly test your system under load to determine the snapshot frequency that minimizes rebuild times without overloading storage.
Adjust these parameters as your system grows or changes to maintain optimal performance.
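As a rough illustration of the event-count guideline, you can back into a threshold from a latency budget. The helper below is a sketch; the method name and the numbers in the usage note are assumptions, not recommendations:

```csharp
using System;

public static class SnapshotFrequency
{
    // Given an average per-event replay cost and a target rebuild latency,
    // estimate how many events you can afford between snapshots.
    public static int EstimateEventThreshold(double msPerEvent, double targetRebuildMs)
    {
        if (msPerEvent <= 0) throw new ArgumentOutOfRangeException(nameof(msPerEvent));
        return Math.Max(1, (int)(targetRebuildMs / msPerEvent));
    }
}
```

For example, at 0.5 ms per replayed event and a 500 ms rebuild budget, this suggests snapshotting roughly every 1,000 events.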
How can I handle errors during snapshot creation and loading to maintain system reliability?
To ensure system reliability when creating and loading snapshots in event sourcing, follow these best practices:
- Validate data integrity: Always verify the completeness and accuracy of the snapshot before saving or loading it. This helps prevent corrupted data from impacting the system.
- Implement error recovery: Design fallback mechanisms to revert to the latest valid state if a snapshot fails. For instance, you can rebuild the state from past events if the snapshot is unusable.
- Log errors effectively: Maintain detailed logs of any issues encountered during snapshot operations. This helps with troubleshooting and ensures transparency in the system.
By combining robust error handling and recovery strategies, you can minimize disruptions and maintain the reliability of your event-sourced system.
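The fallback mechanism described above can be sketched as follows. This is a minimal illustration that reuses the `ISnapshotStore`, `IEventStore`, and `OrderAggregate` types from the earlier examples; the `ResilientLoader` class itself is hypothetical:

```csharp
public class ResilientLoader
{
    private readonly ISnapshotStore _snapshotStore;
    private readonly IEventStore _eventStore;
    private readonly ILogger _logger;

    public ResilientLoader(ISnapshotStore snapshotStore, IEventStore eventStore, ILogger logger)
    {
        _snapshotStore = snapshotStore;
        _eventStore = eventStore;
        _logger = logger;
    }

    public async Task<OrderAggregate> Load(Guid aggregateId)
    {
        try
        {
            var snapshot = await _snapshotStore.GetLatest(aggregateId);
            if (snapshot != null)
            {
                var aggregate = JsonSerializer.Deserialize<OrderAggregate>(snapshot.State);
                aggregate.SetVersion(snapshot.Version);

                // Apply only the events recorded after the snapshot
                foreach (var @event in await _eventStore.GetEventsAfterVersion(aggregateId, snapshot.Version))
                    aggregate.Apply(@event);
                return aggregate;
            }
        }
        catch (Exception ex)
        {
            // Snapshot is missing or corrupt: fall through to a full replay
            _logger.LogWarning($"Snapshot unusable for {aggregateId}, replaying full stream: {ex.Message}");
        }

        // Fallback: rebuild the state from the complete event history
        var fresh = new OrderAggregate(aggregateId);
        foreach (var @event in await _eventStore.GetEventsAfterVersion(aggregateId, 0))
            fresh.Apply(@event);
        return fresh;
    }
}
```

The key design point is that a bad snapshot degrades performance but never availability: the event stream remains the source of truth.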
What are the best ways to test and monitor snapshot performance in event sourcing systems to ensure efficiency?
To effectively test and monitor the performance of snapshots in event sourcing systems, start by validating their impact on system efficiency. Measure load times before and after implementing snapshots to ensure they reduce the time required to rebuild aggregate states. Use profiling tools to monitor memory usage and CPU performance during snapshot creation and retrieval.
Additionally, implement automated tests to verify the accuracy of snapshots. These tests should check that snapshots correctly represent the aggregate state at the time they were taken and that they integrate seamlessly with subsequent events. Regularly review metrics and logs to identify any bottlenecks or anomalies.
By consistently testing and monitoring, you can ensure that snapshots enhance system performance without introducing errors or inefficiencies.