In my day job, we’re using Lambda and Step Functions to create data processing pipelines. This combo works great for a lot of our use cases. However, for some specific long-running tasks (e.g. web scrapers), we “outsource” the computing from Lambda to Fargate.
This poses an issue – how to plug that part of the pipeline into the Step Function orchestrating it. Using an Activity does not work when the processing is distributed among multiple workers.
A solution I came up with is creating a gatekeeper loop in the Step Function, in which a Lambda function oversees the progress of the workers. This is how it looks:
The gatekeeper function (triggered by the GatekeeperState) checks whether the external workers have finished yet. This can be done by waiting until an SQS queue is empty, counting the number of objects in an S3 bucket or any other way of indicating that the processing can move on to the next state.
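As a minimal sketch of the SQS variant of that check: the helper name queue_is_empty is made up for illustration, and the client is passed in so the function can be tried without AWS access. In a Lambda function you would typically pass boto3.client("sqs"). Keep in mind that the counts SQS reports are approximate.

```python
def queue_is_empty(sqs_client, queue_url):
    """Return True when the queue reports no visible, in-flight or delayed messages.

    sqs_client is expected to behave like boto3.client("sqs").
    """
    attribute_names = [
        "ApproximateNumberOfMessages",
        "ApproximateNumberOfMessagesNotVisible",
        "ApproximateNumberOfMessagesDelayed",
    ]
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=attribute_names
    )
    attributes = response.get("Attributes", {})
    # SQS returns the counts as strings; a missing attribute is treated as 0.
    return all(int(attributes.get(name, 0)) == 0 for name in attribute_names)
```

Checking the in-flight and delayed counts as well avoids declaring the work done while a worker is still holding the last message.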
If the processing is not done yet, the gatekeeper function raises a NotReadyError. This is caught by the Retry block in the Step Function, which pauses the execution for a certain period of time, as defined by its parameters. Afterwards, the gatekeeper is called again.
Eventually, if the work is not done even after MaxAttempts retries, the ForceGatekeeperState is triggered. It adds a "force: true" parameter to the invocation event and calls the gatekeeper right back again. Notice that the gatekeeper function checks for this force parameter as the very first thing when executed. Since it’s present from the ForceGatekeeperState, it returns immediately and the Step Function moves on to the DoneState.
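A gatekeeper function following this description could look roughly like the sketch below. The workers_finished helper and the workers_done event key are placeholders invented for the example; a real function would put its own completion check there (an SQS or S3 lookup, for instance).

```python
class NotReadyError(Exception):
    """Raised while the external workers are still busy."""


def workers_finished(event):
    # Hypothetical placeholder: replace with a real check (SQS, S3, ...).
    return bool(event.get("workers_done"))


def handler(event, context):
    # Check the force flag first, so a forced run never waits on the workers.
    if event.get("force"):
        return event

    if not workers_finished(event):
        # The exception class name is what the Step Function matches on.
        raise NotReadyError("external workers have not finished yet")

    return event
```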
For our use case, it was better to have partial results than no results at all. That’s why the ForceGatekeeperState is present. You can also leave it out altogether and have the Step Function execution fail after MaxAttempts retries of the gatekeeper.
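To make the loop concrete, here is one way the state machine definition could be written, built as a Python dictionary and dumped to Amazon States Language JSON. This is a reconstruction from the description above, not the original definition: the Lambda ARN is a placeholder, the interval and attempt counts are arbitrary, and modelling ForceGatekeeperState as a Pass state is an assumption.

```python
import json

definition = {
    "StartAt": "GatekeeperState",
    "States": {
        "GatekeeperState": {
            "Type": "Task",
            # Placeholder ARN, fill in your own region, account and function.
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:gatekeeper",
            "Retry": [
                {
                    "ErrorEquals": ["NotReadyError"],
                    "IntervalSeconds": 60,  # pause between checks
                    "MaxAttempts": 30,      # give up waiting after this many retries
                    "BackoffRate": 1.0,     # keep the pause constant
                }
            ],
            # Once the retries are exhausted, fall through to the force state.
            # Drop this Catch block to let the execution fail instead.
            "Catch": [
                {
                    "ErrorEquals": ["NotReadyError"],
                    "ResultPath": "$.error",  # keep the original input
                    "Next": "ForceGatekeeperState",
                }
            ],
            "Next": "DoneState",
        },
        "ForceGatekeeperState": {
            "Type": "Pass",
            "Result": True,
            "ResultPath": "$.force",  # adds "force": true to the event
            "Next": "GatekeeperState",
        },
        "DoneState": {"Type": "Succeed"},
    },
}

definition_json = json.dumps(definition, indent=2)
```

The definition_json string is what you would hand to Step Functions when creating the state machine.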
The default way of creating a zip package that’s to be deployed to AWS Lambda is to place everything – your source code and any libraries you are using – in the service root directory and compress it. I don’t like this approach because the flat hierarchy can lead to naming conflicts, makes it harder to manage the packaging of isolated functions and creates a mess in the source directory.
What I do instead is install all dependencies into a lib directory (which is as simple as adding a pip install -r requirements.txt -t lib step to the deployment pipeline) and set the PYTHONPATH environment variable to /var/runtime:/var/task/lib when deploying the Lambda functions.
This works because the zip package is extracted into /var/task in the Lambda container. While it might seem like an unstable solution, I’ve been using this for over a year now without any problems.
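The packaging step can be sketched in a few lines of Python. The function names build_package and lambda_environment are made up for this example, and it assumes pip has already installed the dependencies into the lib directory inside the service directory.

```python
import os
import zipfile

# Value from the post: the runtime's own path first, then the bundled lib directory.
LAMBDA_PYTHONPATH = "/var/runtime:/var/task/lib"


def build_package(service_dir, zip_path):
    """Zip service_dir (source files plus lib/), keeping relative paths.

    zip_path should live outside service_dir so the archive does not
    end up containing itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(service_dir):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, service_dir))
    return zip_path


def lambda_environment():
    """Environment block to attach to the function configuration."""
    return {"Variables": {"PYTHONPATH": LAMBDA_PYTHONPATH}}
```

Because the archive keeps lib as a subdirectory, the dependencies land in /var/task/lib after extraction, which is exactly the path PYTHONPATH points to.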
TL;DR: I’m open-sourcing a continuous deployment pipeline built for AWS to automate the process of creating and deploying AWS Lambda functions and related infrastructure.
Because of my tinkering with Alexa, I wanted to have an automated way of deploying a new version of a Lambda function just via git push. Doing it manually is cumbersome. As of late, AWS offers all the tools necessary to do so. Their Code* family of services (CodeCommit, CodeBuild & CodePipeline) are the perfect building blocks to set up this process.
Furthermore, I also wanted to automate the necessary infrastructure and treat it as code. That’s where CloudFormation comes in. I didn’t have any prior knowledge of CloudFormation, so it was a great learning experience. I used this excellent template as a starting point and I want to thank the guys over at Cloudonaut for publishing it. Still, it took me a lot of time to grasp all the concepts of CFN and I went through a lot of trial and error to figure out how everything ties together.
In the end, I’m very happy with the result. This initial version is quite basic, but it works well. What makes it cool is that the pipeline is self-referencing, so any changes you make to it get automatically applied. You can read the details about how it works in the README.
I will be expanding its functionality, so feel free to star the repo on GitHub and follow along.
Writing on a mobile device, whether it’s a smartphone or a tablet, sucks. Because of the keyboard size and lack of physical feel, it’s just so easy to get it wrong. The situation gets worse when inputting passwords. By concealing input, one can’t check for typing errors.
I’ve long been a proponent of just showing the password field in plain text on mobile devices. There are multiple ways to go about it. You can have a toggle switch, a button that reveals the input for a limited time, or possibly automatically show plain text after the first unsuccessful login attempt.
The obvious concern is that of security, but I don’t think this is an insecure way. It is much easier to conceal a display of a mobile device from prying eyes. Furthermore, this approach leads to a higher success rate so there’s no need to type a password multiple times which would present more opportunity to steal it (I’ve seen people who actually whisper their passwords when typing them).
So even though I think it’s good UX, I unfortunately haven’t been able to convince anyone I’ve been building apps for to do this, nor have I seen it out in the wild. That is, until now.
I recently signed up for Mega. Their iOS client has this exact feature on the login screen:
I have to say the execution of it is not perfect (at first, I was confused about the actual meaning of the switch, and since I hadn’t typed anything into the password field yet, it didn’t seem to do anything), but it could be easily enhanced. I would like to see more apps adopting this pattern for password input, making it more user-friendly, less error-prone and still secure.
I’ve recently open-sourced a small but handy Objective-C library called MCEModelEditingProxy.
Oftentimes when presenting values from a model, you want to make them editable, yet store the changes back only after a confirmation from the user (e.g. pressing a Save button) rather than immediately. MCEModelEditingProxy does precisely that. It stands as a transparent layer between your model and your controller, intercepting writes to the model.
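The idea itself is language-independent, so here is a rough illustration of it in Python. To be clear, this is not the library's Objective-C API: the EditingProxy class and its commit and discard methods are invented for this sketch.

```python
class EditingProxy:
    """Buffers attribute writes until they are committed to the wrapped model."""

    def __init__(self, model):
        # Bypass our own __setattr__ so these two land on the proxy itself.
        object.__setattr__(self, "_model", model)
        object.__setattr__(self, "_pending", {})

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        pending = object.__getattribute__(self, "_pending")
        if name in pending:
            return pending[name]
        return getattr(object.__getattribute__(self, "_model"), name)

    def __setattr__(self, name, value):
        # Intercept the write instead of touching the model.
        self._pending[name] = value

    def commit(self):
        """Apply all buffered writes to the model (the Save button)."""
        for name, value in self._pending.items():
            setattr(self._model, name, value)
        self._pending.clear()

    def discard(self):
        """Throw away all buffered writes (the Cancel button)."""
        self._pending.clear()
```

A controller would bind its form fields to the proxy rather than the model, then call commit or discard depending on what the user chooses.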
If this sounds too abstract, check out the README in the project repo where you can find example use cases. I hope you’ll find the library useful and include it in your projects too.
I finally migrated my old blog. I don’t think WordPress will ever sell out, so it should be safe here, unlike on Posterous. Although I kept all past posts, I intend to blog more about technical topics here (relating to my job, my projects, Python, iOS and whatever else comes by). For the rest, I’ve set up a Tumblr. You can also just follow me on Twitter, as I will publish everything there as well.
I have a simple proposition for you. I will help you make sense out of the data you have. For free. There’s no catch. Give me your raw data, tell me what you want to learn from it and I’ll make it so.
If you’re wondering about my motivation, it’s easy. I want to sharpen my data analysis and dataviz skills. I know there are publicly available data sets I could use to do that, but I’m not particularly interested in working with those aimlessly. I need a purpose to be motivated.
Sounds intriguing? Great, get in touch.