Caching is a useful technique to improve performance or avoid overloading services. There are many solutions available,
and on AWS one of them is DynamoDB. Using DynamoDB as a cache to store data with a limited time to live (TTL) made sense in our use case.
We were already using the database for client data. Using it as a cache allowed us to limit the number
of services we needed expertise on. It is also reasonably fast and will scale way beyond our needs. In this blog I
will show how to implement a cache using DynamoDB and a lambda written in TypeScript. In a follow-up blog I’ll show
another technique to improve the performance of queries on largish datasets.
The source code for the examples in this blog can be found in https://github.com/jvermeir/dynamodb/tree/main/cache-blog.

I’ve used CDK to build the infrastructure needed for the cache service.
The infrastructure is defined in cache-blog/lib/cache-blog-stack.ts.

The important parts are:

    const cacheTable = new dynamoDB.Table(this, 'CacheTable'...

and

    const handler = new lambda.Function(this, 'CacheHandler', ...

The cache table has a partition key named id, so a record can be read very efficiently by its id.
The table definition also sets the timeToLiveAttribute property to ttl, which tells DynamoDB that the attribute named ttl holds
each record's expiry time. DynamoDB automatically removes a record some time after its ttl has passed.
There are no useful guarantees about when that cleanup happens, but we’ll come back to that later.
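The ttl attribute has to be a number holding the expiry moment as Unix epoch time in seconds. A minimal sketch of computing it when a record is written could look like this; the helper names ttlFromNow and makeCacheRecord and the value attribute are assumptions for illustration, not code from the repository.

```typescript
// Hypothetical helpers, not taken from the repository.
// DynamoDB TTL expects a Number attribute holding Unix epoch time in seconds,
// while Date.now() returns milliseconds.
function ttlFromNow(ttlSeconds: number, nowMs: number = Date.now()): number {
  if (!Number.isFinite(ttlSeconds) || ttlSeconds < 0) {
    throw new RangeError('ttlSeconds must be a non-negative finite number');
  }
  // Drop the milliseconds first, then add the lifetime in whole seconds.
  return Math.floor(nowMs / 1000) + Math.floor(ttlSeconds);
}

// Assumed record shape: 'id' and 'ttl' match the table definition,
// 'value' mirrors the field sent in the POST body.
interface CacheRecord {
  id: string;
  value: string;
  ttl: number;
}

function makeCacheRecord(
  id: string,
  value: string,
  ttlSeconds: number,
  nowMs: number = Date.now(),
): CacheRecord {
  return { id, value, ttl: ttlFromNow(ttlSeconds, nowMs) };
}
```

For example, makeCacheRecord('id1', '30', 300) produces a record that DynamoDB may remove once five minutes have passed.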
This code creates the table:
    const cacheTable = new dynamoDB.Table(this, 'CacheTable', {
      tableName: 'Cache',
      partitionKey: {
        name: 'id',
        type: dynamoDB.AttributeType.STRING,
      },
      billingMode: dynamoDB.BillingMode.PROVISIONED,
      timeToLiveAttribute: 'ttl',
      removalPolicy: cdk.RemovalPolicy.DESTROY,
    });


The lambda is defined with Node.js 18 as its runtime:

    const handler = new lambda.Function(this, 'CacheHandler', {
      runtime: lambda.Runtime.NODEJS_18_X,
      code: lambda.Code.fromAsset('lambda'),
      handler: 'cache.handler',
    });


Besides the table and the lambda, cache-blog-stack.ts also defines an API Gateway endpoint that provides HTTP access to the
cache lambda.

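To deploy the stack, first run:

cdk bootstrap

This only needs to be done once per AWS account and region: it provisions the resources CDK itself needs to deploy stacks
there. Then build and deploy the stack with:

npm run build && cdk deploy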

The stack is now created. The output shows the endpoint of the API Gateway used to access the cache lambda. It will
look like this:

CacheBlogStack
Deployment time: 22.37s
Outputs:
CacheBlogStack.CacheServicecacheapiEndpointDA49EE49 
  = https://0znqtkdtml.execute-api.eu-west-1.amazonaws.com/prod/

Now we can store and retrieve values from the cache like this:

curl -X POST https://0znqtkdtml.execute-api.eu-west-1.amazonaws.com/prod/ -H "Content-Type: application/json" -d '{ "id": "id1", "value":"30"}'

curl "https://0znqtkdtml.execute-api.eu-west-1.amazonaws.com/prod/?id=id1"
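The same two requests can be made from TypeScript. The sketch below only builds the URL and the fetch options, assuming they are sent with the global fetch available in Node.js 18; buildPostRequest and buildGetRequest are hypothetical names, and the endpoint is whatever cdk deploy printed. It uses Content-Type: application/json for the POST, the header that describes a JSON request body.

```typescript
// Hypothetical helpers that build the two curl requests shown above, ready to
// be passed to the global fetch available in Node.js 18.
interface CacheRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body?: string };
}

// Store a value: POST the JSON document { id, value } to the endpoint.
function buildPostRequest(endpoint: string, id: string, value: string): CacheRequest {
  return {
    url: new URL(endpoint).toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ id, value }),
    },
  };
}

// Retrieve a value: GET with the id as a query string parameter.
function buildGetRequest(endpoint: string, id: string): CacheRequest {
  const url = new URL(endpoint);
  // URLSearchParams encodes the id, so no shell-style escaping is needed.
  url.searchParams.set('id', id);
  return { url: url.toString(), init: { method: 'GET', headers: { Accept: 'application/json' } } };
}
```

A request built this way would then be sent with fetch(request.url, request.init).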

The value of id doesn’t really matter, but a pattern we found useful looks like <tableName>-<uuid>, e.g.
PersonalData-dbf275d5-d61b-488a-97fc-c2cb1832c852. This would store personal data, maybe retrieved from a
backend service, for a user account with id dbf275d5-d61b-488a-97fc-c2cb1832c852. In principle, the uuid value
would be enough, but we found we often want to store different types of data about a specific user account. In that
case the uuid would be the uuid of the account and the prefix of the cache key would identify the type of data in the
cache record.
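A small helper can keep this key format consistent across callers. This is a sketch that assumes a plain hyphen as separator and a standard UUID as account id; cacheKey and parseCacheKey are hypothetical names.

```typescript
// Hypothetical helpers for the <prefix>-<uuid> key pattern. The prefix names
// the kind of data (e.g. PersonalData), the uuid identifies the user account.
const SEPARATOR = '-'; // assumption: a plain hyphen joins prefix and uuid
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function cacheKey(prefix: string, accountId: string): string {
  // A separator inside the prefix would make the key impossible to split reliably.
  if (prefix.length === 0 || prefix.includes(SEPARATOR)) {
    throw new Error(`invalid prefix: "${prefix}"`);
  }
  if (!UUID_PATTERN.test(accountId)) {
    throw new Error(`not a uuid: "${accountId}"`);
  }
  return `${prefix}${SEPARATOR}${accountId}`;
}

// The uuid itself contains hyphens, so split only at the first separator.
function parseCacheKey(key: string): { prefix: string; accountId: string } {
  const index = key.indexOf(SEPARATOR);
  if (index <= 0) {
    throw new Error(`malformed cache key: "${key}"`);
  }
  return { prefix: key.slice(0, index), accountId: key.slice(index + 1) };
}
```

Rejecting a separator in the prefix is what makes splitting at the first hyphen safe, even though the uuid contains hyphens of its own.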

The lambda code is straightforward, implementing GET and POST methods.
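To make the shape of such a handler concrete, here is a minimal sketch assuming an API Gateway Lambda proxy integration, where the request arrives as httpMethod, queryStringParameters and body. The storage is hidden behind a hypothetical CacheStore interface so the routing can be tried with an in-memory map; in the real lambda it would be backed by DynamoDB. The status codes for a missing id, a miss and an unknown method are assumptions, not taken from the repository.

```typescript
// Minimal, hypothetical sketch of the GET/POST routing, assuming an API Gateway
// Lambda proxy integration. Only the event fields used here are typed.
interface ProxyEvent {
  httpMethod: string;
  queryStringParameters?: Record<string, string | undefined> | null;
  body?: string | null;
}

interface ProxyResult {
  statusCode: number;
  body: string;
}

// Hypothetical storage abstraction; the real lambda would use DynamoDB here.
interface CacheStore {
  get(id: string): Promise<string | undefined>;
  put(id: string, value: string): Promise<void>;
}

const json = (statusCode: number, payload: unknown): ProxyResult => ({
  statusCode,
  body: JSON.stringify(payload),
});

function makeHandler(store: CacheStore): (event: ProxyEvent) => Promise<ProxyResult> {
  return async (event) => {
    if (event.httpMethod === 'GET') {
      const id = event.queryStringParameters?.id;
      if (!id) return json(400, { error: 'missing id' });
      const value = await store.get(id);
      // A miss (absent or expired record) is reported as 404.
      return value === undefined ? json(404, { error: 'not found' }) : json(200, { id, value });
    }
    if (event.httpMethod === 'POST') {
      let parsed: unknown;
      try {
        parsed = JSON.parse(event.body ?? '');
      } catch (e) {
        return json(400, { error: 'body is not valid JSON' });
      }
      // A null or non-object body simply yields undefined fields.
      const { id, value } = (parsed ?? {}) as { id?: unknown; value?: unknown };
      if (typeof id !== 'string' || typeof value !== 'string') {
        return json(400, { error: 'id and value must be strings' });
      }
      await store.put(id, value);
      return json(200, { id });
    }
    return json(405, { error: `method ${event.httpMethod} not allowed` });
  };
}
```

Injecting the store keeps the HTTP routing separate from the database access, which is roughly the split the repository makes between the handler and cache-blog/lambda/database.ts.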

The GET method is interesting because it shows how to avoid a pitfall with the TTL attribute. The function getCacheItem
(const getCacheItem = async (id: string): … in cache-blog/lambda/database.ts) executes this query when retrieving the
value with the given id:

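    // Document client command: it accepts plain JavaScript values such as ':id': id.
    import { QueryCommand } from '@aws-sdk/lib-dynamodb';

    // TTL values are Unix epoch time in seconds, so :now must be in seconds too.
    const now = Math.floor(Date.now() / 1000);

    const queryCommand = new QueryCommand({
      TableName: CACHE_TABLE,
      ExpressionAttributeNames: {
        '#id': 'id',
        '#ttl': 'ttl',
      },
      KeyConditionExpression: '#id = :id',
      ExpressionAttributeValues: {
        ':id': id,
        ':now': now,
      },
      // Skip records whose ttl has passed but that DynamoDB has not deleted yet.
      FilterExpression: '#ttl > :now',
    });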


The important part is FilterExpression: '#ttl > :now'. This filter expression is necessary because DynamoDB doesn’t
guarantee that expired records are removed as soon as their TTL has passed. A record may therefore still be in the table after it has
actually expired, and without the filter the query would still return it. Note that TTL is expressed as Unix epoch time in seconds,
so :now must be the current time in seconds, not the milliseconds returned by Date.now().
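The effect of the filter can be illustrated without DynamoDB. The sketch below is an in-memory model, not the repository's code: the map plays the role of the table, an expired record can still be present in it, and readLive applies the same comparison as '#ttl > :now' with :now in epoch seconds. StoredItem, isLive and readLive are hypothetical names.

```typescript
// In-memory model, not DynamoDB code: the Map stands in for the table and may
// still contain records whose ttl has passed, just like a real table before
// the TTL background process has removed them.
interface StoredItem {
  id: string;
  value: string;
  ttl: number; // Unix epoch time in seconds
}

// Same comparison as FilterExpression '#ttl > :now'.
function isLive(item: StoredItem, nowSeconds: number): boolean {
  return item.ttl > nowSeconds;
}

// Look up by id (like the key condition), then apply the ttl filter.
function readLive(
  table: Map<string, StoredItem>,
  id: string,
  nowMs: number = Date.now(),
): StoredItem | undefined {
  const now = Math.floor(nowMs / 1000); // :now in seconds, matching the ttl unit
  const item = table.get(id);
  return item !== undefined && isLive(item, now) ? item : undefined;
}
```

The filter only hides stale records from readers; the record stays in the map, just as it stays in DynamoDB until the TTL process removes it. In DynamoDB a filter expression is also applied after the items are read, so a filtered-out record still consumes read capacity.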

Removing the infrastructure created for this blog can be done with CDK:

cdk destroy
Are you sure you want to delete: CacheBlogStack (y/n)? y
CacheBlogStack: destroying... [1/1]
CacheBlogStack: destroyed


Jan Vermeir
Developing software and infrastructure in teams, doing whatever it takes to get stable, safe and efficient systems in production.