Books on game design

Some time ago Austen Allred asked Twitter for game design book recommendations. Here are the crowdsourced suggestions:

As a bonus, folks also mentioned a Stanford lecture on How to design addictive games and the Game Maker’s Toolkit community (YouTube / Twitter).

Shopify custom session storage in SQL with Prisma

The easiest way to create a Shopify app is to use the CLI tool provided by Shopify. A simple shopify create command sets you up with a boilerplate-but-functional app and you’re ready to go.

One of the first things you’ll need to do is create a persistent session storage mechanism, since the boilerplate app only comes with in-memory storage. This gist shows one way to build it, using Prisma as the model layer. It’s pretty straightforward, but it took me a while to iron out some kinks. I hope someone else will find it useful.
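The core of it is implementing the three callbacks that Shopify.Session.CustomSessionStorage expects and backing them with Prisma. Here’s a minimal sketch, assuming a Prisma Session model with id and payload string columns:

import { PrismaClient } from "@prisma/client";
import Shopify from "@shopify/shopify-api";

const prisma = new PrismaClient();

async function storeCallback(session) {
  // Serialize the whole Session object into a single payload column
  await prisma.session.upsert({
    where: { id: session.id },
    create: { id: session.id, payload: JSON.stringify(session) },
    update: { payload: JSON.stringify(session) },
  });
  return true;
}

async function loadCallback(id) {
  const record = await prisma.session.findUnique({ where: { id } });
  if (!record) return undefined;
  // Depending on the library version you may need to re-hydrate this into a
  // proper Session instance (e.g. turn `expires` back into a Date) – one of
  // the kinks mentioned above
  return JSON.parse(record.payload);
}

async function deleteCallback(id) {
  await prisma.session.delete({ where: { id } });
  return true;
}

// Pass this as SESSION_STORAGE to Shopify.Context.initialize(...)
const sessionStorage = new Shopify.Session.CustomSessionStorage(
  storeCallback,
  loadCallback,
  deleteCallback
);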

The way the Shopify Node library handles sessions is still being improved – I’ll update the gist as new versions are released.

How to connect to Cloud SQL in Cloud Run using Prisma

This guide assumes that Cloud SQL is configured with a public IP (the default). There are essentially three steps to connecting to a Cloud SQL instance from Cloud Run using Prisma (a consolidated deploy command follows the list):

  1. Create an IAM service account with the Cloud SQL Client role and attach it to your Cloud Run service. When deploying from the command line, pass it via the --service-account parameter to gcloud run deploy.

  2. Make the Cloud SQL instance available to your Cloud Run service. That can be done in the web console. If you’re doing it in a deploy script, you’ll need the instance connection name, which is typically a colon-separated string like PROJECT_NAME:REGION:INSTANCE_NAME. Pass --add-cloudsql-instance INSTANCE_CONNECTION_NAME --update-env-vars INSTANCE_CONNECTION_NAME="INSTANCE_CONNECTION_NAME" to the gcloud run deploy command to make it happen.

  3. Finally, tell Prisma to use the socket made available by Cloud SQL Proxy in Cloud Run to connect to the database. To do so, add a host=/cloudsql/INSTANCE_CONNECTION_NAME URL param to the DATABASE_URL. The full URL will then look something like this:
    postgresql://username:password@localhost/db_name?host=/cloudsql/gcp_project:us-central1:db. If you’re using MySQL, you might want to use socket instead of host.
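Putting the flags together, the deploy command ends up looking something like this (the service name, image and service account are placeholders):

gcloud run deploy my-service \
  --image gcr.io/PROJECT_NAME/my-service \
  --service-account cloud-run-sql@PROJECT_NAME.iam.gserviceaccount.com \
  --add-cloudsql-instance PROJECT_NAME:us-central1:db \
  --update-env-vars INSTANCE_CONNECTION_NAME="PROJECT_NAME:us-central1:db"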

Next.js and Webpack 5

Recently, I ran into weird error messages when trying to get Next.js 10 and Webpack 5 to work:

Module parse failed: parser.isAsiPosition is not a function

Module not found: Can't resolve 'node_modules/next/node_modules/@babel/runtime/helpers/assertThisInitialized' in 'node_modules/next/dist/next-server/lib'

After hours of digging around, I found the solution: just add future: { webpack5: true } to the config in next.config.js and you’re done.
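The relevant part of next.config.js:

// next.config.js
module.exports = {
  future: {
    webpack5: true,
  },
};

Hope this helps.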

ELT vs. ETL

There are two basic paradigms for building a data processing pipeline: Extract-Transform-Load (ETL) and Extract-Load-Transform (ELT). ETL is still the default, but it has a lot of drawbacks, and it’s becoming obvious that building an ELT pipeline is the better approach.

First of all, there’s actually no such thing as a pure ETL pipeline. There will always be another Transform step after the data is loaded into the data warehouse, so you end up with an ETLT process, or two ETL pipelines joined together.

ETL pipelines are tricky to build correctly. There are subtleties with each integration that, if done wrong, can be costly. At best, you’ll lose time and money rebuilding it. At worst, you’ll lose data and produce incorrect analyses.

ETL pipelines are even trickier to operate. You need to test not just the code but also the data. You need to set up a good deployment and monitoring process. You want to log both success and error metrics. Don’t forget about alerting. Do your data engineers want to be on call? The list goes on and on.

ETL pipelines are also inherently inflexible. They need to be rigid to give the “most correct” data possible, but this also makes them harder to adapt. And adapt they must, as the world around them keeps changing. Whether it’s a new API version or a new business requirement, you’ll need to incorporate the change. To do so, a data engineer and a data analyst need to work in tandem.

Which leads to another problem with ETL, this time one of organizational design. Regardless of how you structure your data team, a data engineer will always have less skin in the game – they’ll never directly take the blame and lose credibility for wrong data in a BI dashboard. They’ll feel less responsible for, and hence less interested in, doing the meticulous work necessary. Needing a data engineer to change a pipeline also leads to a slower pace of development overall, a huge competitive disadvantage in today’s world.

All of this makes building and running an ETL process a slow, expensive, and complex undertaking. The truth is, the Extract and Load steps are undifferentiated heavy lifting – they are not specific to any company, yet every company needs to do them to have even a chance of getting insights from its data. So why do it at all when there’s a better alternative in the form of ELT?

Let someone with way more experience and expertise handle the EL so you can focus on the T.

You’ll get your data sooner, faster, and more reliably. You’ll save money on extra data engineers (my guesstimate is that with ETL, the data engineer to data analyst ratio is around 1:2, whereas with ELT it’s closer to 1:5). You’ll make your data analysts faster, more independent, and happier.

Standard ETL has been around for a long time, but its time has passed. With modern tools, there’s no reason not to do ELT. Ask yourself this – if you had to choose between a slow, error-prone, expensive way of achieving a goal and a fast, reliable, cheaper alternative, which one would you go for?

Hot-reloading node.js and TypeScript

TL;DR: Use tsc-watch

I’ve spent some time researching how to get hot-reloading working with Node and TypeScript. Due to the mess that is the JS dev ecosystem, finding a solid solution took me longer than I expected for this kind of task. Hopefully this post will save you that effort.

The best solution I found is using tsc-watch. Install it as a dev dependency and set tsc-watch --onSuccess 'node .' as your start script.
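In package.json, that translates to something like:

{
  "scripts": {
    "start": "tsc-watch --onSuccess \"node .\""
  }
}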

That’s it, happy hacking.

Two tips for writing CloudFormation templates

Here are two tips for writing more readable CloudFormation templates.

1) Use dot notation to access attributes

The Fn::GetAtt intrinsic function supports dot notation to refer to a resource’s attribute. It works with both the long Fn::GetAtt and the short !GetAtt forms of the syntax. Instead of the long, enumerated YAML array syntax:

Value:
  Fn::GetAtt:
    - MyResource
    - Arn

Use the one-liner version:

Value: !GetAtt MyResource.Arn

2) Use !Sub instead of !Join

When creating a string, using !Sub is often a much better option than using !Join. Here’s an example of building an ECR repository URI:

Value:
  Fn::Join:
    - '.'
    - - !Ref 'AWS::AccountId'
      - dkr
      - ecr
      - !Ref 'AWS::Region'
      - amazonaws.com
      - '/'
      - !Ref 'ServiceName'

With this approach, it’s hard to grasp what the final string looks like, it’s hard to write correctly, and it’s hard to debug. Now compare it to using !Sub:

Value: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ServiceName}'

A cool thing to notice is that there’s no need for an explicit !Ref on pseudo-parameters and template parameters – !Sub substitutes ${AWS::Region} and ${ServiceName} directly. This makes the whole construction so much nicer. If you need to access an attribute, you can use the dot notation mentioned above.
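For example, reusing the MyResource from the !GetAtt example above:

Value: !Sub 'The ARN is ${MyResource.Arn}'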

I hope you find these tips useful and apply them in your practice.

DynamoDB, NodeJS vs. Python and persistent connections

Recently, Yan Cui wrote an enlightening blog post about using keep-alive HTTP connections to significantly speed up DynamoDB operations. He gave an example of how to do it in NodeJS. I was curious how to do the same in Python.

To my surprise, I found out I did not have to do anything at all. DynamoDB keeps the connection open. See for yourself – using the CLI, run aws dynamodb list-tables --debug. Notice the response headers section, which looks something like this:

 Response headers:
 {'Server': 'Server', 
  'Date': 'Thu, 07 Mar 2019 19:42:55 GMT', 
  'Content-Type': 'application/x-amz-json-1.0', 
  'Content-Length': '328', 
  'Connection': 'keep-alive', 
  'x-amzn-RequestId': '38N9IJV176MACH027DNIRT5C53VV4KQNSO5AEMVJF66Q9ASUAAJG', 
  'x-amz-crc32': '2150813651'}

The Connection: keep-alive header is set by DynamoDB. Unless the client explicitly sets it to close, the connection will stay open – yet setting it to close is exactly what NodeJS does by default. Thank you to Stefano Buliani for providing additional visibility into this. This behaviour is inherited by the aws-sdk-js, which I think is a mistake, so I’ve opened a bug in the GitHub repo. Until it’s fixed, if you’re writing code in JS, be sure to follow Yan’s recommendation.
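For reference, Yan’s recommendation boils down to reusing TCP connections via a keep-alive agent. With the AWS SDK for JavaScript v2 that looks roughly like this:

const https = require('https');
const AWS = require('aws-sdk');

// Keep sockets open between requests instead of closing them after each call
const agent = new https.Agent({ keepAlive: true });
const dynamodb = new AWS.DynamoDB.DocumentClient({
  httpOptions: { agent },
});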

Connection: keep-alive vs. close in Python

I was still curious whether I could replicate Yan’s findings in Python. Here’s a log of repeatedly running a putItem operation using a vanilla boto3 DynamoDB client:

[Screenshot: putItem timings with the default boto3 client]

Except for the first one, most calls are sub-10 ms, since the connection is kept open.

However, when I explicitly added the Connection: close header, things looked a lot different:

[Screenshot: putItem timings with the Connection: close header]

Operations took at least 50 ms, often longer. This is in line with Yan’s findings.

Granted, my approach was not very rigorous.
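For the sake of replicability, here’s a rough sketch of such a timing loop (not the exact code; the table name and item shape are placeholders):

import time

import boto3

table = boto3.resource('dynamodb').Table('keep-alive-test')  # placeholder table with an 'id' partition key

for i in range(20):
    start = time.perf_counter()
    table.put_item(Item={'id': f'item-{i}', 'payload': 'x' * 100})
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f'put_item #{i}: {elapsed_ms:.1f} ms')

Feel free to run your own experiments and let me know what you found.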

Uselatest – a Cloudformation macro to always use the latest version of a Lambda Layer

One of the drawbacks of using a Lambda Layer is that you must declare it with its full version. This is a hassle: every time you update a Layer, you need to update its declaration in every stack to get the latest version. It would be much better if one could specify it only by its name (similar to the FunctionName when declaring an event source mapping). That is, instead of arn:aws:lambda:us-east-1:123456789012:layer:my-layer:24, just use my-layer.

I made a CloudFormation macro to do just that.

Uselatest scans through a CloudFormation template and replaces occurrences of Lambda Layers that are not fully qualified with the ARN of the latest available version of that Layer. This way you don’t have to think about updating a template after updating a Layer. The latest version will automatically get picked up during stack deployment. Magic. ✨

The macro works in all the places where you can declare a Layer. Check the Example section for more.
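To illustrate, a template using the macro looks something like this (assuming the macro is deployed under the name Uselatest; the resource names are made up – see the Example section in the repo for the authoritative version):

Transform: Uselatest
Resources:
  MyFunction:
    Type: AWS::Lambda::Function
    Properties:
      # ...other function properties...
      Layers:
        - my-layer   # rewritten to arn:aws:lambda:...:layer:my-layer:<latest version> at deploy time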

I wanted to make it available in the Serverless App Repo, but sadly, a CloudFormation macro is not a supported resource. You’ll have to build, package and deploy it yourself if you want to use it.

Unit testing AWS services in Python

Consider the following piece of code:

# models.py
import boto3

Table = boto3.resource('dynamodb').Table('foo')


def get_user(user_id):
    ddb_response = Table.get_item(Key={'id': user_id})
    return ddb_response.get('Item')

It’s a contrived example that just reads an item of data from a DynamoDB table. How would you write a unit test for the get_user function?

My favourite way to do so is to combine pytest fixtures and botocore’s Stubber:

# test_models.py
from botocore.stub import Stubber, ANY
import pytest

import models


@pytest.fixture(scope="function")
def ddb_stubber():
    ddb_stubber = Stubber(models.Table.meta.client)
    ddb_stubber.activate()
    yield ddb_stubber
    ddb_stubber.deactivate()


def test_user_exists(ddb_stubber):
    user_id = 'user123'
    get_item_params = {'TableName': ANY,
                       'Key': {'id': user_id}}
    get_item_response = {'Item': {'id': {'S': user_id},
                                  'name': {'S': 'Spam'}}}
    ddb_stubber.add_response('get_item', get_item_response, get_item_params)
    result = models.get_user(user_id)
    assert result.get('id') == user_id
    ddb_stubber.assert_no_pending_responses()


def test_user_missing(ddb_stubber):
    user_id = 'user123'
    get_item_params = {'TableName': ANY,
                       'Key': {'id': user_id}}
    get_item_response = {}
    ddb_stubber.add_response('get_item', get_item_response, get_item_params)
    result = models.get_user(user_id)
    assert result is None
    ddb_stubber.assert_no_pending_responses()

There are a couple of things to note here.

First, I’m using the wonderful scope functionality of pytest fixtures. This creates a new fixture for every test function execution, which is necessary for the Stubber to work correctly.

The Stubber needs to be created with the correct client. Since I’m using a DynamoDB Table instance in models.py, I have to access its client when creating the Stubber instance.

Notice also the “verbose” get_item_response structure in the first test. That’s because of how the DynamoDB client interacts with the DynamoDB API (needless to say, this is DynamoDB-specific). The Table resource is a layer of abstraction on top of this: it converts between DynamoDB types and Python types. However, it still uses the client underneath, so the stubbed response has to use the low-level structure.

Finally, it’s good practice to call assert_no_pending_responses to make sure the tested code actually made the call to the AWS service.

I really like this combination of pytest and Stubber. It’s a great match for writing correct and compact tests.

Does Lambda need timeout and memory size parameters?

Following my previous post on judging the serverlessness of a technology, I apply this criterion to AWS Lambda. I argue that the timeout and memory size configuration parameters are non-essential and should be made optional. The need to think about them makes Lambda less serverless than it could be.

On timeout

The way you naturally write a function is to have it finish as soon as possible. That’s just good engineering and good for business. Why then artificially limit its execution time?

The most common use of the timeout I hear about is when a Lambda calls some external API. In this scenario, it serves as a fail-safe in case the API takes too long to respond. A better approach is to implement a timeout on the API call itself, in code, and fail the Lambda gracefully if the API does not respond in time, instead of relying on the runtime to terminate your function. That’s also good engineering.
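In Python, for example, the call can be bounded explicitly and the failure handled inside the function (the URL is a placeholder):

import requests


def handler(event, context):
    try:
        # Bound the external call explicitly rather than relying on the Lambda
        # timeout to kill the whole invocation
        response = requests.get('https://api.example.com/resource', timeout=2)
    except requests.exceptions.Timeout:
        # Fail gracefully with a meaningful error instead of being terminated mid-flight
        return {'statusCode': 504, 'body': 'Upstream API timed out'}
    return {'statusCode': 200, 'body': response.text}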

So here’s my first #awswishlist entry: Make timeout optional and let functions run as long as they need to.

On memory size

I have two issues with the memory size parameter.

First of all, it’s a leaky abstraction of the underlying system. You don’t just specify how much memory your function gets, but also the CPU power. There’s a threshold where the Lambda container is assigned 2 vCPUs instead of 1. Last time I checked this was at 1024 MB, but there’s no way of knowing this unless you experiment with the platform. Since Lambda does not offer specialized CPU instances like EC2 does (yet?), it might not matter, but I worked on a data processing application where this came into play. Why not allow us to configure this directly? What if I need less memory but more vCPUs?

However, a more serious point of contention for me is that setting the memory size is an exercise in capacity planning. That’s something that should have gone away in the serverless world. You have to set it for the worst possible scenario, as there’s no “auto-scaling”. It really sucks when your application starts failing because a Lambda function suddenly needs 135 MB of memory to finish.

Hence here’s my second #awswishlist entry: Make memory size optional. Or provide “burst capacity” for those times a Lambda crosses the threshold.

Now I won’t pretend I understand all the complexities that are behind operating the Lambda platform and I imagine this is an impossible request, but one can dream.

And while I’m at it, a third #awswishlist item is: Publish memory consumed by a Lambda function as a metric to CloudWatch.

Closing remarks

I do see value in setting either of these parameters, but I think those are specialized cases. For the vast majority of code deployed on Lambda, the platform should take care of “just doing the right thing” and allow us, developers, to think less about the ops side.

Thinking less about servers

Even though serverless has been around for a couple of years now, there is no clear definition of what the term actually means. Leaving aside that it’s a misnomer to begin with, I think part of the confusion stems from the fact that it is being applied in two different ways. Serverless can either describe a quality of a technology (DynamoDB) or refer to an approach to building IT systems (a serverless chat-bot).

My way of judging the former is this:

The less you have to think about servers the more serverless a technology is. Furthermore, serverless is not a binary value but a spectrum.

Let me give an example. On a completely arbitrary scale from 1 to 10, I would rate DynamoDB with provisioned capacity as 8/10 serverless. It’s not fully serverless because I still need to think deeply about data access patterns, predict read and write load, and monitor utilization once my system is operational. However, with the recent announcement of on-demand pricing, I would rate DynamoDB 10/10. I don’t need to think about any of those aforementioned idiosyncrasies (burdens, really) of using the technology.

The second aspect of a serverless technology (and by extension a serverless system) is that you don’t pay for idle, except for data storage. Once again, if you need to think about something even when it’s not running (and you’re clearly going to think about your credit card bill), it is not serverless.

This is the promise of serverless. Once you start combining these technologies into systems, you can focus your thinking on building value and leave the operational cost to the technology provider.