Hidden caching features: the silent memory killers

Alan Turpin

In modern software development, caching is a primary strategy for performance optimization. While effective, its misapplication — particularly through abstracted or hidden mechanisms — can lead to unintended consequences, such as excessive memory consumption that can compromise application stability. This blog post explores a case study where an inappropriate caching layer within a data processing service caused significant and unnecessary memory overhead.

The double-edged sword of caching

Caching works on a simple principle: store frequently accessed data in fast-access memory to avoid expensive operations like database queries or complex calculations. But when caching mechanisms operate behind the scenes without proper visibility or control, they can become memory black holes that are difficult to detect and debug.
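As a minimal illustration of that principle (a hypothetical lookup, not code from the service discussed below), an in-memory cache is often little more than a dictionary that is consulted before the expensive operation:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical illustration of the caching principle: consult fast in-memory
// storage before falling back to an expensive lookup such as a database query.
public class CustomerNameCache
{
    private readonly ConcurrentDictionary<int, string> _cache = new();

    public string GetName(int customerId, Func<int, string> expensiveLookup)
    {
        // GetOrAdd only invokes the expensive lookup on a cache miss;
        // later calls for the same id are served straight from memory.
        return _cache.GetOrAdd(customerId, expensiveLookup);
    }
}
```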

Case study

This case study examines an email processing service built on AWS that exhibited unexpected memory behaviour. What appeared to be a straightforward integration with S3 revealed a hidden complexity that caused memory consumption to balloon, and the culprit wasn’t immediately obvious.

System architecture and workflow

The system is designed to process inbound emails using an architecture built on AWS. The process flow typically looks like this:

  1. Ingestion: AWS Simple Email Service (SES) receives an email.
  2. Storage: SES stores the email content as an object in an Amazon S3 bucket.
  3. Notification: A notification is published via Simple Notification Service (SNS) and sent to a Simple Queue Service (SQS) queue.
  4. Processing: A message in the SQS queue triggers a processing service to retrieve the corresponding email object from S3 for processing (a consumer sketch follows the diagram below).

System Architecture Diagram
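In outline, the processing step is a long poll on the SQS queue followed by a read of the referenced S3 object. The sketch below assumes the AWS SDK for .NET; the queue URL is a placeholder, and HandleNotificationAsync stands in for the service's actual handler rather than reproducing it.

```csharp
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

// Illustrative consumer loop (an assumed shape, not the service's actual code):
// each SQS message carries an SNS notification pointing at the email object
// that SES stored in S3.
var sqs = new AmazonSQSClient();
const string queueUrl = "https://sqs.eu-west-1.amazonaws.com/123456789012/inbound-email"; // placeholder

while (true)
{
    var response = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
    {
        QueueUrl = queueUrl,
        MaxNumberOfMessages = 10,
        WaitTimeSeconds = 20 // long polling keeps the loop cheap when the queue is idle
    });

    foreach (var message in response.Messages)
    {
        // HandleNotificationAsync is a hypothetical handler: it extracts the
        // S3 bucket and key from the SNS envelope, then retrieves and
        // processes the stored email (step 4 above).
        await HandleNotificationAsync(message.Body);

        await sqs.DeleteMessageAsync(queueUrl, message.ReceiptHandle);
    }
}

// Placeholder for the real processing step.
static Task HandleNotificationAsync(string notificationBody) => Task.CompletedTask;
```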

Anomalous memory profile

During operation, the processing service exhibited an anomalous memory profile. Its memory consumption rapidly increased to over 2GB and then plateaued — we had expected transient spikes that returned to a baseline after each email was processed. While the behaviour didn’t trigger an out-of-memory exception in this instance (the service had been provisioned with more resources than it actually needed), it did indicate a critical memory leak.

High memory usage on email processing service

Problem identification and analysis

The issue manifested during the S3 object retrieval step. The processing service utilised a shared internal library, S3StoreService, which was presumed to be a simple wrapper for S3 interactions. The standard pattern involved calling a StoreForBucket method to get an IStreamStore interface, from which the object was retrieved via a TryRetrieve call.
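The call site followed roughly the pattern below. The internal library's exact signatures aren't shown in this post, so the shapes of StoreForBucket and TryRetrieve here are assumptions for illustration, and ProcessEmailAsync is a hypothetical stand-in for the processing step.

```csharp
// Assumed shape of the retrieval pattern (the internal library's real
// signatures may differ): obtain an IStreamStore for the bucket, then
// retrieve the email object through it.
IStreamStore store = s3StoreService.StoreForBucket("inbound-email"); // placeholder bucket name

if (store.TryRetrieve(objectKey, out Stream emailStream))
{
    using (emailStream)
    {
        await ProcessEmailAsync(emailStream); // hypothetical processing step
    }
}
```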

Root cause

An investigation into the S3StoreService implementation revealed that it was a sophisticated abstraction employing a chain-of-responsibility pattern that introduced several layers of caching:

  • ExpiringStreamStore: Checked for the object in a local file system cache.
  • CachingStreamStore: If not found locally, this layer would check an in-memory cache. If the object was not in the cache, it would be retrieved from S3 and then added to the in-memory cache.
  • TracingStreamStore: Provided logging and metrics for the operations.

The root cause of the memory issue was the CachingStreamStore. In this particular workflow, each email is retrieved and processed exactly once. The caching mechanism was therefore retaining every processed email in memory, providing no performance benefit as the cached data was never accessed again. This led to continuous memory growth as new emails were processed.
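In sketch form, the problematic layer behaved roughly as follows. The type and interface names come from the description above, but the body is a reconstruction of the observed behaviour under those assumptions, not the library's actual source.

```csharp
using System.Collections.Concurrent;
using System.IO;

public interface IStreamStore
{
    bool TryRetrieve(string key, out Stream stream);
}

// Reconstruction of the observed behaviour (not the library's actual source).
// CachingStreamStore wraps an inner store (ultimately backed by S3) and keeps
// every retrieved object in an unbounded in-memory cache.
public class CachingStreamStore : IStreamStore
{
    private readonly IStreamStore _inner;
    private readonly ConcurrentDictionary<string, byte[]> _cache = new();

    public CachingStreamStore(IStreamStore inner) => _inner = inner;

    public bool TryRetrieve(string key, out Stream stream)
    {
        var bytes = _cache.GetOrAdd(key, k =>
        {
            // Cache miss: fetch from the inner store and keep the bytes around.
            _inner.TryRetrieve(k, out var innerStream);
            using (innerStream)
            {
                var buffer = new MemoryStream();
                innerStream.CopyTo(buffer);
                return buffer.ToArray();
            }
        });

        // For emails that are read exactly once, the cached copy is never
        // requested again, so memory grows with every message processed.
        stream = new MemoryStream(bytes);
        return true;
    }
}
```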

Resolution

Once the source of the memory consumption was traced to the unnecessary caching within the IStreamStore abstraction, the team was able to implement a direct fix.

The S3StoreService was bypassed in favour of using a native S3Client to interact directly with the S3 bucket. This modification eliminated the caching layers that were unsuitable for this single-read use case. Following this change, the service’s memory profile immediately returned to the expected behaviour: nominal baseline consumption with brief, transient spikes during active processing, fully resolving the issue.
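The replacement amounted to a plain, uncached read per message, along the lines of this sketch (assuming the AWS SDK for .NET, where the client class is AmazonS3Client; the bucket name, key and processing helper are placeholders):

```csharp
using Amazon.S3;

// Direct, uncached retrieval with the SDK's S3 client: the object is
// streamed, processed once, and released when the response is disposed.
var s3 = new AmazonS3Client();

using (var response = await s3.GetObjectAsync(bucketName, objectKey))
{
    await ProcessEmailAsync(response.ResponseStream); // hypothetical processing step
}
```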

Nominal memory usage on email processing service

An overactive appetite

The road to memory leaks is often paved with well-intentioned abstractions. We discovered our application’s memory had developed an insatiable appetite, a problem traced to a “smart” caching feature that was too clever by half. It diligently saved every piece of single-use data it touched, proving that even a presumed silver bullet can miss the target and hit your memory allocation instead.

We want to meet you

If you're thinking about joining us, we'd love to hear from you! Find out more about our interview process, onboarding and current openings.