What is it?
“A new capability that allows you to add your own code to process data retrieved from S3 before returning it to an application. S3 Object Lambda works with your existing applications and uses AWS Lambda functions to automatically process and transform your data as it is being retrieved from S3. The Lambda function is invoked inline with a standard S3 GET request, so you don’t need to change your application code.
In this way, you can easily present multiple views from the same dataset, and you can update the Lambda functions to modify these views at any time.” (Danilo Poccia)
AWS S3 Object Lambda lets you pipe a file through a Lambda invocation on read. It is a specialized capability and an exciting new release: your applications (or customers) read that file through a lens that you define.
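To make the "lens" idea concrete, here is a minimal sketch of what such a function might look like in Python. It assumes the event layout AWS documents for S3 Object Lambda (a getObjectContext entry carrying inputS3Url, outputRoute and outputToken) and boto3's write_get_object_response call. The upper-casing transform and the injectable s3 and fetch parameters are placeholders of ours, added only so the sketch can be run without AWS.

```python
import urllib.request


def transform(body: bytes) -> bytes:
    """Placeholder lens: upper-case the object. Swap in your own logic."""
    return body.upper()


def handler(event, context, s3=None, fetch=None):
    """Sketch of an S3 Object Lambda entry point.

    `s3` and `fetch` are hypothetical injection points for local testing;
    in a deployed function both would be left at their defaults.
    """
    ctx = event["getObjectContext"]

    # 1. Fetch the original object through the presigned URL in the event.
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return resp.read()
    original = fetch(ctx["inputS3Url"])

    # 2. Apply the transformation (the "lens").
    transformed = transform(original)

    # 3. Hand the transformed bytes back to whoever issued the GET.
    if s3 is None:
        import boto3  # imported lazily so the sketch runs without boto3
        s3 = boto3.client("s3")
    s3.write_get_object_response(
        Body=transformed,
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

The caller still issues an ordinary GET; only the body of transform changes when you want a different view of the same object.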
Why does this matter?
S3 Object Lambda is fantastic for a specific set of use cases. However, just because it’s there doesn’t mean you need it. In fact, our early work with the service suggests that you may not.
Let me explain.
In his piece introducing the service, Danilo mentions thumbnail generation. Now, do me a favor and please hold off before doing that.
Why hold off? Great question. I’m glad you asked.
Simply put, you risk spending more for little gain. This is a complex topic which we’ll cover more deeply in an upcoming piece.
When should I use it?
For now, here’s the core question:
When is it worth executing an S3 Object Lambda invocation for each reader vs. saving a copy of the file?
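One rough way to frame that question is as a break-even comparison. The sketch below is ours, not an AWS pricing model: every parameter is a placeholder you would fill in from your own bill, and it deliberately ignores request and data transfer charges.

```python
def transform_on_read_cost(reads_per_month: int, cost_per_invocation: float) -> float:
    """Monthly cost if every read triggers a transformation."""
    return reads_per_month * cost_per_invocation


def stored_copy_cost(
    variants: int,
    size_gb: float,
    storage_price_per_gb_month: float,
    cost_per_invocation: float,
    rebuilds_per_month: int = 1,
) -> float:
    """Monthly cost if each variant is built ahead of time and stored."""
    storage = variants * size_gb * storage_price_per_gb_month
    rebuilds = variants * rebuilds_per_month * cost_per_invocation
    return storage + rebuilds


def cheaper_option(on_read: float, stored: float) -> str:
    """Name the cheaper strategy; ties go to the stored copy."""
    return "transform-on-read" if on_read < stored else "stored-copy"


# Illustrative inputs only; none of these figures are real prices.
on_read = transform_on_read_cost(reads_per_month=1000, cost_per_invocation=0.5)
stored = stored_copy_cost(
    variants=2, size_gb=1.0, storage_price_per_gb_month=0.25, cost_per_invocation=0.5
)
print(cheaper_option(on_read, stored))
```

The shape of the result is what matters: many reads of a few variants favor stored copies, while many variants that are rarely read, or that go stale quickly, favor transforming on read.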
Let’s look at some standard use cases where S3 Object Lambda will shine:
- DRM. You have many readers, interested in small variations of the same thing.
- GTFS Traffic Feeds. You have highly dynamic data. By the time you make enough copies for all your read cases, they would be old.
- Variable Format File Types. You infrequently need to generate different formats of different file types (e.g. pdf/ebook/mobi/xml).
- Stripping PII from Audit Logs. The amount of data you need to read and process is significantly lower than the amount you have to write.
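For the audit log case, the transformation inside the Lambda function could be as small as a redaction pass. The sketch below is an illustration under simple assumptions: the log is plain text, and the only PII of interest is email addresses and IPv4 addresses. The patterns and the redact name are ours, and a real log format will likely need more careful rules.

```python
import re

# Deliberately simple patterns; real-world PII detection needs more care.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(log_text: str) -> str:
    """Replace email and IPv4 addresses with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", log_text)
    return IPV4.sub("[IP]", text)


line = "user=alice@example.com ip=10.0.0.12 action=login"
print(redact(line))  # user=[EMAIL] ip=[IP] action=login
```

The stored log stays intact for the readers who are allowed to see it; everyone else reads the redacted view.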
Conversely, here are a few use cases, also standard, where S3 Object Lambda may not be ideal:
- File Interaction with CLI. High-level S3 CLI commands are not supported in S3 Object Lambda. This rules it out if, for example, your flow relies on aws s3 cp s3://bucket/thing.
- Database Replacement. You can certainly run selects and filters on the file content, but we do not recommend it. Consider a database instead, or a different file partition strategy to optimize performance and minimize cost.
- Serving Website Images or Thumbnails. Best practice for these assets is to build them once for everybody. Even if you are building a simple site like placekitten.com, please resist the temptation. The chances are very high that you will end up investing more than you gain in this use case. If you really need a solution here, we recommend the use of CacheFly or CloudFront Lambda@Edge.
- Low Read Cardinality. In environments where there are many reads for the same asset (e.g. your company’s internal benefits documentation), we recommend the use of CacheFly or CloudFront Lambda@Edge.
- Frequently Requested / Long Lifetime (TTL) Files. A great example of this use case is DocuSign documents that you want to be able to reference for a very long time.
- Machine Learning Data Pipelines. For myriad reasons, we don’t recommend the use of S3 Object Lambda to filter fields or for field level access control for machine learning data pipelines.
- Cache. Whenever using a cache would make sense, use a cache!
With any new AWS service, or any cloud service for that matter, there is always a period of rapid evolution immediately after launch. We fully expect to push the edges of S3 Object Lambda as far as we can and determine just where the ideal use cases lie for our clients.
We’ll be back with more on this topic, so stay tuned.