AWS Lambda deployment pipeline

TL;DR: I’m open-sourcing a continuous deployment pipeline built for AWS to automate the process of creating and deploying AWS Lambda functions and related infrastructure.

Because of my tinkering with Alexa, I wanted to have an automated way of deploying a new version of a Lambda function just via git push. Doing it manually is cumbersome. As of late, AWS offers all the tools necessary to do so. Their Code* family of services (CodeCommit, CodeBuild & CodePipeline) are the perfect building blocks to set up this process.

Furthermore, I wanted to automate the necessary infrastructure and treat it as code. That’s where CloudFormation comes in. I didn’t have any prior knowledge of CloudFormation, so it was a great learning experience. I used this excellent template as a starting point, and I want to thank the guys over at Cloudonaut for publishing it. Still, it took me a lot of time to grasp all the concepts of CFN, and I went through a lot of trial and error to figure out how everything ties together.

In the end, I’m very happy with the result. This initial version is quite basic, but it works well. What makes it cool is that the pipeline is self-referencing, so any changes you make to it get automatically applied. You can read the details about how it works in the README.
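To make the "self-referencing" idea more concrete, here is a minimal Python sketch that emits the Stages section of an AWS::CodePipeline::Pipeline CloudFormation resource. This is not the template from the repo. It assumes three stages: a CodeCommit source, a CodeBuild build, and a CloudFormation deploy action that updates the very stack the pipeline is defined in. The repository, project, stack, artifact and template file names and the role ARN are all hypothetical. A complete template would also need the pipeline's own RoleArn and ArtifactStore, which are omitted here.

```python
import json
from typing import Dict, List


def action_type(category: str, provider: str) -> Dict[str, str]:
    """ActionTypeId for a first-party AWS action."""
    return {"Category": category, "Owner": "AWS", "Provider": provider, "Version": "1"}


def pipeline_stages(repo: str, branch: str, build_project: str,
                    stack_name: str, cfn_role_arn: str) -> List[dict]:
    """Stages of a hypothetical pipeline that redeploys its own stack."""
    source = {
        "Name": "Source",
        "Actions": [{
            "Name": "Checkout",
            "ActionTypeId": action_type("Source", "CodeCommit"),
            # A git push to this branch starts the pipeline.
            "Configuration": {"RepositoryName": repo, "BranchName": branch},
            "OutputArtifacts": [{"Name": "SourceOutput"}],
        }],
    }
    build = {
        "Name": "Build",
        "Actions": [{
            "Name": "Package",
            "ActionTypeId": action_type("Build", "CodeBuild"),
            "Configuration": {"ProjectName": build_project},
            "InputArtifacts": [{"Name": "SourceOutput"}],
            "OutputArtifacts": [{"Name": "BuildOutput"}],
        }],
    }
    deploy = {
        "Name": "Deploy",
        "Actions": [{
            "Name": "UpdateStack",
            "ActionTypeId": action_type("Deploy", "CloudFormation"),
            # Updating the stack that also defines this pipeline is what
            # makes changes to the pipeline apply themselves.
            "Configuration": {
                "ActionMode": "CREATE_UPDATE",
                "StackName": stack_name,
                "TemplatePath": "BuildOutput::template.yml",
                "Capabilities": "CAPABILITY_IAM",
                "RoleArn": cfn_role_arn,
            },
            "InputArtifacts": [{"Name": "BuildOutput"}],
        }],
    }
    return [source, build, deploy]


if __name__ == "__main__":
    print(json.dumps({"Stages": pipeline_stages(
        "lambda-functions", "master", "lambda-build", "lambda-pipeline",
        "arn:aws:iam::123456789012:role/cfn-deploy")}, indent=2))
```

Because the deploy stage updates its own stack, a commit that changes the pipeline definition is rolled out by the pipeline itself.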

I will be expanding its functionality, so feel free to star the repo on GitHub and follow along.

Plain text input of passwords on mobile

Writing on a mobile device, whether it’s a smartphone or a tablet, sucks. Because of the keyboard size and the lack of physical feedback, it’s just so easy to get it wrong. The situation gets worse when entering passwords: when the input is concealed, you can’t check for typing errors.

I’ve long been a proponent of just showing the password field in plain text on mobile devices. There are multiple ways to go about it. You can have a toggle switch, a button that reveals the input for a limited time or possibly automatically show plain text after first unsuccessful login.

The obvious concern is security, but I don’t think this approach is insecure. It is much easier to shield the display of a mobile device from prying eyes. Furthermore, this approach leads to a higher success rate, so there’s no need to type a password multiple times, which would present more opportunities to steal it (I’ve seen people who actually whisper their passwords when typing them).

So even though I think it’s good UX, I unfortunately haven’t been able to convince anyone I’ve been building apps for to do this, nor have I seen it out in the wild. That is, until now.

I recently signed up for Mega. Their iOS client has this exact feature on the login screen:


I have to say the execution is not perfect (at first, I was confused about what the switch actually meant, and since I hadn’t typed anything into the password field yet, toggling it didn’t show anything), but it could easily be improved. I would like to see more apps adopt this pattern for password input, making it more user-friendly and less error-prone while staying secure.


I’ve recently open-sourced a small but handy Objective-C library called MCEModelEditingProxy.

Often, when presenting values from a model, you want to make them editable but don’t want to store the changes immediately, only after the user confirms them (e.g. by pressing a Save button). MCEModelEditingProxy does precisely that. It stands as a transparent layer between your model and your controller, intercepting writes to the model.
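The pattern itself is language-agnostic. Below is a rough Python analogue of the idea. It is not the library's Objective-C API, and the EditingProxy class and its method names are hypothetical. Writes go into a buffer, reads return pending values before falling back to the model, and the model is only touched when commit() is called.

```python
class EditingProxy:
    """Python sketch of an editing proxy: buffer writes until commit()."""

    def __init__(self, model):
        # object.__setattr__ bypasses our own __setattr__ below.
        object.__setattr__(self, "_model", model)
        object.__setattr__(self, "_changes", {})

    def __getattr__(self, name):
        # Called only for names not found on the proxy itself,
        # so pending edits win over the model's current values.
        changes = object.__getattribute__(self, "_changes")
        if name in changes:
            return changes[name]
        return getattr(object.__getattribute__(self, "_model"), name)

    def __setattr__(self, name, value):
        # Intercept the write; the model stays untouched for now.
        self._changes[name] = value

    def has_changes(self):
        return bool(self._changes)

    def commit(self):
        """Apply all pending edits to the model (e.g. when Save is tapped)."""
        for name, value in self._changes.items():
            setattr(self._model, name, value)
        self._changes.clear()

    def discard(self):
        """Forget all pending edits (e.g. when Cancel is tapped)."""
        self._changes.clear()
```

A controller would bind its form fields to the proxy instead of the model. It would call commit() from the Save action and discard() from Cancel.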

If this sounds too abstract, check out the README in the project repo where you can find example use cases. I hope you’ll find the library useful and include it in your projects too.

Migrating blog from Posterous

I finally migrated my old blog. I don’t think WordPress will ever sell out, so it should be safe here, unlike on Posterous. Although I kept all past posts, I intend to blog more about technical topics here (relating to my job, my projects, Python, iOS and whatever else comes along). For the rest, I’ve set up a Tumblr. You can just follow me on Twitter, as I will publish everything there as well.

Let me help you with your data

I have a simple proposition for you. I will help you make sense out of the data you have. For free. There’s no catch. Give me your raw data, tell me what you want to learn from it and I’ll make it so.

If you’re wondering about my motivation, it’s easy. I want to sharpen my data analysis and dataviz skills. I know there are publicly available data sets I could use to do that, but I’m not particularly interested in working with those aimlessly. I need a purpose to be motivated.

Sounds intriguing? Great, get in touch.

Lessons learnt while building an HTTP API for a mobile service

Alert.Us launched yesterday. It is a child monitoring application we at Wildfuse helped to develop. I was responsible for designing and building the RESTful API and the whole server-side of the product. It was a great learning experience. Here are the main points I took away:

Distributed systems are hard. This was the first distributed system I built. Compared to what other people are building it’s almost a toy, but it was still hard. Even though I knew about the fallacies of distributed computing and the CAP theorem, it’s difficult to get it right the first time. This recent article sums it up nicely, and I have to confess I felt a bit stupid after reading it.

When building a distributed system, you’ll need all the help you can get. We used AWS extensively for the whole project and I believe it was a major success factor. I have to point out DynamoDB and SQS in particular. These components just work. As my ops chops are not that great, I welcomed the fact that I could just enable them and forget about them. Isn’t this the holy grail of cloud computing?

Distributed systems are fun. They didn’t teach distributed computing at the university when I was attending, and I doubt they do now. You have to learn it yourself, make mistakes and fix them, and read white papers and other people’s experiences. Learning this and working with new technologies is the fun part, but finally getting it right is even more so. Unfortunately, I don’t know of a good “centralized” source of information on this topic; most of it is scattered all over the Internet. For now, I’m using Prismatic to follow the news, but if you know of a good source, please share.

A central element of the app is an activity feed. Here are a couple of tips that might be useful if you’re building one into your product too:

• Make the data structure representing a feed event as flexible and extensible as possible. In our case, the project requirements changed during development, as usual, and some of the changes also affected the feed. Be prepared.
• As server and client development happen independently, the client apps should have a default way (a fallback) to present new event types that are, from their point of view, essentially unknown.
• There should be a “beginning of time” event in every feed. It can be the date of birth of a user or the date when a user signed up. It can be used to display a welcome message. It’s also a good way to indicate that there’s nothing more in the past, so the clients can stop paginating.
• Having a feed means having events ordered by time, but time is a bitch. Consumers of this API are mobile apps. Mobile means the delivery of an HTTP request may be somewhat late, both ways. Furthermore, in our case, the apps have a “freeze” feature: if there is no network connection at a given moment, they won’t fail but will hold back the events until the connection is restored. All these factors raise a lot of questions. Do you trust the event timestamp from the mobile clients? What if the event is too far in the past? Do you process it the “normal” way, process it without “side effects” (such as sending a push notification), or discard it altogether? And how far in the past is too far: 10 minutes, 1 hour? What if the event timestamp is in the future? Maybe your system is different in that you can always attach a timestamp on the server; if so, good for you. I came to the conclusion that a good enough solution is to trust mobile clients, but if they report an event with a timestamp in the future, replace it with the current time. This handles most of the cases as intended.
• One thing I learnt about only after the feed was done was Activity Streams. I’ll leave it here for reference. It’s definitely something worth checking out.
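The timestamp rule from the feed tips above takes only a few lines of code. This is a minimal sketch. It assumes client timestamps have already been parsed into timezone-aware datetime objects, and the function name is hypothetical. The "how far in the past is too far" question is deliberately left open, as in the post.

```python
from datetime import datetime, timezone
from typing import Optional


def effective_timestamp(client_ts: datetime, now: Optional[datetime] = None) -> datetime:
    """Trust the client's event time unless it lies in the future."""
    if client_ts.tzinfo is None:
        # Comparing naive and aware datetimes raises TypeError; fail clearly instead.
        raise ValueError("client timestamp must carry a UTC offset")
    if now is None:
        now = datetime.now(timezone.utc)
    # Late events (e.g. sent after a "freeze") keep their original time;
    # events from the future are clamped to the server's current time.
    return now if client_ts > now else client_ts
```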

Speaking of time, represent time in RFC 3339 format and keep it in UTC. Always, everywhere. This will save you a lot of headaches. RFC 3339 is a subset of ISO 8601 for use on the Internet, which makes it friendlier. Follow the “Be strict in what you send, but generous in what you receive” rule: accept time with any timezone offset, but send out time in UTC.
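In Python, the standard library is enough for this rule. Here is a sketch with hypothetical function names. It accepts an RFC 3339 timestamp with any offset, normalizes it to UTC, and always sends it back with a "Z" suffix. A trailing "Z" is swapped for "+00:00" before parsing, because datetime.fromisoformat() only accepts "Z" from Python 3.11 on. fromisoformat() does not enforce RFC 3339 exactly, so treat this as a convenience rather than a strict validator.

```python
from datetime import datetime, timezone


def parse_rfc3339(value: str) -> datetime:
    """Accept an RFC 3339 timestamp with any offset; return it in UTC."""
    if value[-1:] in ("Z", "z"):
        # fromisoformat() only understands a trailing "Z" from Python 3.11 on.
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp without a UTC offset: {value!r}")
    return parsed.astimezone(timezone.utc)


def format_rfc3339(moment: datetime) -> str:
    """Always send UTC with a "Z" suffix (fractional seconds are dropped here)."""
    if moment.tzinfo is None:
        raise ValueError("refusing to guess the timezone of a naive datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

For example, a client sending 2013-04-30T23:00:00-05:00 gets 2013-05-01T04:00:00Z back.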

Do errors the right way.

Log like crazy. The bare minimum you should log is errors (d’oh), requests, responses, response times, all interactions with 3rd-party integration points (especially response codes and response bodies!) and system state transitions, but feel free to add to this list. You can never log too much. Log in a machine-readable format; JSON works pretty well. Ideally, all logs should be publicly accessible and searchable. If you don’t do any kind of analysis on your logs, you can safely delete them after ~7 days. This timeframe is enough to answer even late questions about what went wrong.
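As one way to get machine-readable logs, here is a small formatter for Python's standard logging module that writes one JSON object per line. The output field names are assumptions for this sketch. Custom values such as status codes or response times are expected to arrive through the extra argument of the logging calls.

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Format each log record as one line of JSON."""

    converter = time.gmtime  # timestamps in UTC

    # Attribute names every LogRecord has; anything else came in via `extra`.
    _STANDARD = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy custom fields such as status codes or response times.
        for key, value in vars(record).items():
            if key not in self._STANDARD and key not in payload:
                payload[key] = value
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)
```

Attach it with handler.setFormatter(JsonFormatter()). A call such as log.info("request handled", extra={"status": 200, "duration_ms": 12}) then produces a single JSON line that includes both extra fields.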

The same rules that apply to logging apply to measuring. The more you measure, the better. What gets measured gets managed, right?

Read Release It!

Always keep the API documentation up to date. In my opinion, a feature isn’t done until its documentation is written. I followed this rule closely. We even had an agreement that if there was an implemented but undocumented feature of the API or a mistake in the docs, I owed a drink to the dev who found it. I never had to buy one. I got into the habit of updating the docs as the first thing after a commit. Accept it as part of the work. It pays dividends: I saved a lot of time answering questions from the client developers just by pointing them to the docs. We kept them on the GitHub wiki and it worked quite well, but services like might work even better.

Even if you have the best documentation, some information will still get lost. For example, one developer may come up with an enhancement that other platforms may benefit from too, or a change request comes in. A central message board of some kind with this info aggregated would help. We didn’t have one (hence some information got “lost” or didn’t reach everyone involved in time), but it seems to me the Stripe way of CCing every email to email groups would work well for us here. Have you encountered this problem during your career? I would love to hear how others are approaching it.

Help client developers as much as possible. Sometimes I feel frontend devs have it even harder. They have to face the crazy demands of multiple parties and ever-changing feature requests. I know, I’ve been there too. The last thing they want to face is a half-assed API. Ease the pain of development for your fellow comrades; they’ll love you for it.

Utilize the capabilities of HTTP to the fullest. I’m still amazed how well HTTP works for “modern day” use cases. Sure, it’s not perfect, but it goes a long way. Be pragmatic rather than dogmatic about using HTTP. Tip #1: use the User-Agent header to identify the device type, operating system and client version/build number (e.g. “iOS 5.0; App v0.82 (1554)”). It helps when debugging. Tip #2: use Accept-Language to determine the locale of the resource. Works like a charm.
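Tip #2 needs a little parsing, because Accept-Language carries quality values (e.g. "sk-SK,sk;q=0.8,en;q=0.6"). Here is a minimal sketch that picks the best match from the locales an API supports. The function name and the fallback from a regional tag to its primary language are assumptions, and wildcards are simply ignored.

```python
from typing import Sequence


def pick_locale(header: str, supported: Sequence[str], default: str) -> str:
    """Pick the supported locale the client prefers most, per Accept-Language."""
    candidates = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag or tag == "*":
            continue  # wildcards are ignored in this sketch
        quality = 1.0
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort key: highest q first, then the order the client sent.
            candidates.append((-quality, position, tag))
    lookup = {locale.lower(): locale for locale in supported}
    for _, _, tag in sorted(candidates):
        if tag in lookup:
            return lookup[tag]
        primary = tag.split("-")[0]  # fall back from "sk-sk" to "sk"
        if primary in lookup:
            return lookup[primary]
    return default
```

With supported locales ["en", "sk"], the header above resolves to "sk". A header with no usable entry falls back to the default.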

If you want to learn more about the design of HTTP APIs, I strongly recommend following the api-craft group and people like @johnsheehan, @kinlane, @mamund or @mikekelly85 on Twitter; surely, you’ll find others. If you know of any “distributed systems” people on Twitter I can follow, please let me know, either in the comments or by tweeting me.

HTTP APIs and errors – doing it the right way

Errors are the third-class citizens of any interface; nobody wants to deal with them. Not enough care goes into crafting proper errors, but this is a mistake. It’s when things break that your API consumers need the most help. Proper error handling distinguishes good APIs from great ones. If you follow the recommendations outlined in this blog post, you’ll be one important step closer to having a great API.

The basics

Since we’re dealing with HTTP here, be sure to take advantage of it. Always respond with a 4xx or 5xx status code when an error occurs. If you’re sending 200 OK, you’re doing it wrong. Make sure the payload is of the same content type as the rest of your API, whether you’re using JSON, XML, HTML or any other format; otherwise you’ll create a headache for API consumers. Be sure you send the appropriate Content-Type header (optionally respecting the Accept header). Most of this is covered in this great training video from Layer7.

Error payload

A title, a description and an error code are the bare minimum you should send. The title and description are components of the error message, which I write about in the next section.

As mentioned earlier, HTTP already provides us with a set of error codes, and for simpler APIs these work great and are sufficient. If you need more granular reporting, add another field to the payload that holds an application-specific error code. Have a document online with a detailed description of every application error code, and add a URL pointing to it to the error payload (h/t to John Sheehan for bringing this up in the comments). This greatly helps when debugging less obvious error states.

Hence, a good error coming from an HTTP API might be similar to this:
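Here is a minimal Python sketch of how such a response could be assembled, assuming JSON is the API's format. The field names (status, code, title, description, more_info), the error code 40012 and the documentation URL are hypothetical placeholders, not a standard.

```python
import json
from typing import Dict, Tuple

DOCS_BASE = "https://api.example.com/docs/errors/"  # hypothetical docs location


def error_response(status: int, code: int, title: str,
                   description: str) -> Tuple[int, Dict[str, str], str]:
    """Build (status, headers, body) for an error in a JSON API."""
    if not 400 <= status <= 599:
        raise ValueError("errors must use a 4xx or 5xx status code")
    payload = {
        "error": {
            "status": status,                   # repeats the HTTP status
            "code": code,                       # application-specific code
            "title": title,
            "description": description,
            "more_info": f"{DOCS_BASE}{code}",  # page describing this code
        }
    }
    # Same content type as the successful responses of this (assumed) JSON API.
    headers = {"Content-Type": "application/json; charset=utf-8"}
    return status, headers, json.dumps(payload)


status, headers, body = error_response(
    400, 40012, "Invalid birth date",
    "The birth_date field must be a date such as 2012-05-01 (RFC 3339 full-date).")
```

The range check enforces the "never send 200 OK for an error" rule at the point where errors are built.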


Anatomy of a good error message

A good error message should be as specific as possible. Don’t use general phrases such as “Something went wrong” or “Error occurred”; those only state the obvious. An error message should help the user recover from the error. Provide hints about what went wrong: whether something is missing, malformed or conflicting. You’ll get bonus points for giving instructions on how to actually fix the error.

Also, keep in mind who the intended audience is. Most API errors will be seen by developers, so you can “talk” to them in more technical terms. Some, however, may bubble up to regular app users. In that case, send the title and description localized according to the Accept-Language header.

Finally, don’t be afraid to write the error message in a human way; it will appear less frightening. UPDATE: Check out this great answer on StackExchange.


It goes without saying that all errors should be logged. I found it useful to log the full HTTP request and the application stack trace. Have this log available online, and add a new field to the error payload containing a link to the logged error. The API consumers can then paste this link into their bug reports. It makes debugging and cooperation between developers much easier.
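Here is one way to implement this, sketched with the standard library. Every error gets a random ID, the request and the stack trace are logged under that ID, and a link built from the ID goes into the error payload. The LOG_BASE URL, the log_error name and the error_id/error_log field names are hypothetical.

```python
import logging
import traceback
import uuid
from typing import Dict, Mapping

LOG_BASE = "https://api.example.com/errors/"  # hypothetical log viewer URL
log = logging.getLogger("api.errors")


def log_error(exc: BaseException, method: str, path: str,
              headers: Mapping[str, str], body: str) -> Dict[str, str]:
    """Log a failed request with its stack trace; return fields for the payload."""
    error_id = uuid.uuid4().hex
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    # One entry holds everything needed to investigate the problem later.
    log.error("error_id=%s request=%s %s headers=%r body=%r\n%s",
              error_id, method, path, dict(headers), body, stack)
    # The client can paste this link into its bug report.
    return {"error_id": error_id, "error_log": LOG_BASE + error_id}
```

The returned fields would be merged into the error payload next to the title, description and code.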


Error states are an important part of every interface. Treat them as such; don’t ignore them. Users of your API will praise you.

Online lectures: Coursera

The fourth and final part in the mini-series about online education.

Like tens of thousands of others, I took Machine Learning, taught by Coursera co-founder Andrew Ng, as my first class on Coursera. It was clear from the beginning that it was a hit.

Professor Ng knows very well how to teach. The lectures were well structured and interesting, and the principles of each topic were communicated effectively. Coursera doesn’t make the same mistake as Udacity; lectures are usually 10–14 minutes long, with an occasional in-lecture quiz. Assignments for the ML course were stimulating and rewarding. Not that taking the ML course will make you a full-fledged data analyst (although some might argue otherwise), but you’ll learn the basics and get pointers to what to learn next.

It’s the same story with the rest of the Coursera courses. They are fun and engaging, you’ll learn a great deal, and you’ll know where to look next if you want to dive deeper into a particular subject.

From a technical standpoint, Coursera is not as advanced as Udacity (e.g. rewinding a lecture video is a pain), but they’ve created a great platform for online education. Plus, they are tailoring it to each course’s needs.

This, together with perfect content, partnerships with top universities around the world and a broad choice of topics, makes Coursera my online education platform of choice.

For some interesting behind-the-scenes info, watch this TED talk by Daphne Koller, Coursera co-founder.

Online lectures: Udacity

This is the third part of a short series about my experience with online education.

I took two Udacity courses: Design of Computer Programs and Intro to Statistics.

There are things I like about Udacity and things I don’t. Unfortunately, the ones I don’t like are the important ones for an online education platform.

The good thing is that there are wise people building the platform. Technically, it is very good. I like that they use the YouTube player, and I like the smooth transition between video playback and in-lecture quizzes. In some programming courses, you have a Python interpreter embedded directly in the browser window. They have also built relationships with potential employers of Udacity “graduates”. All of this is smart and helpful, yet it doesn’t solve the main problem.

Udacity is not a good place to learn.

In my opinion, this is mainly because of the format of the “lectures”: 2-minute videos are just too short and, as crazy as it might sound, I had trouble keeping my attention focused precisely because of this. Two minutes is not enough to pass on any principle. You try to keep the pieces in your head, but the constant video switching sucks. It interrupts your train of thought.

It also doesn’t help that you can’t easily see the code or examples written previously, so you get stuck thinking “Why is it this way? What did the lecturer mean? How is it supposed to work?”. If you are not thinking exactly the same way as the lecturer, you’re going to have trouble following him.

The 2-minute videos are at the heart of Udacity, as you can hear from Peter Norvig in this TED talk. Still, I hope they change this and also improve on the other problems I’ve encountered. Until they do, I’ll prefer sites like Coursera.

Python pre-commit hook

This is a pre-commit hook I use in my Python projects.



Never mind my weak bash-fu; in the end, the script does what I want it to, namely the following three things:

• First, it checks that I haven’t forgotten to add a new module to the requirements.txt file. Most of the time this works like a charm with virtualenv and pip. The only drawback is installing modules in local experimental branches: these modules are not necessary in upstream branches, so they don’t belong in requirements.txt yet. When you switch back and want to commit in an upstream branch, the pre-commit hook fails. However, this is easily avoided by using the --no-verify option of git commit.
• Second, it runs pyflakes on all the .py files in the repository. If there’s something pyflakes doesn’t like, the pre-commit hook fails and shows the output of pyflakes. One case is ignored: using the _ (underscore) function from the gettext module, because gettext.install() makes it available everywhere as a builtin. Pyflakes documentation is non-existent and I guess there’s no way to create a configuration profile for it, so I had to resort to this hack.
• Finally, since I have deployed code with forgotten set_trace() calls a couple of times, the third thing the script does is check for these and print the file and line number of any it encounters.
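The hook described above is a Bash script. For readers who prefer Python, here is a rough sketch of the first and third checks. It is not the author's script, and all function names are hypothetical. The requirements comparison only looks at normalized package names, skipping comments and pip options such as -e. The pyflakes step would stay a separate command.

```python
import re
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

SET_TRACE = re.compile(r"\bset_trace\(\)")


def package_name(line: str) -> str:
    """'Django==1.4.2' -> 'django' (PEP 503-style normalized name)."""
    name = re.split(r"[<>=!~\[;@\s]", line.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def _names(lines: Iterable[str]) -> List[str]:
    # Skip blanks, comments and pip options such as "-e" or "-r".
    return [package_name(l) for l in lines if l.strip() and l.strip()[0] not in "#-"]


def missing_requirements(frozen: Iterable[str], required: Iterable[str]) -> List[str]:
    """Installed packages (as printed by `pip freeze`) absent from requirements.txt."""
    listed = set(_names(required))
    return sorted(name for name in _names(frozen) if name not in listed)


def find_set_trace(source: str) -> List[Tuple[int, str]]:
    """(line number, line) for every set_trace() call in a file's source."""
    return [(number, line.strip())
            for number, line in enumerate(source.splitlines(), start=1)
            if SET_TRACE.search(line)]


def main() -> int:
    """Run both checks; a non-zero result makes git abort the commit."""
    frozen = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                            capture_output=True, text=True, check=True).stdout
    required = Path("requirements.txt").read_text(encoding="utf-8")
    problems = [f"not in requirements.txt: {name}"
                for name in missing_requirements(frozen.splitlines(), required.splitlines())]
    for path in Path(".").rglob("*.py"):
        for number, line in find_set_trace(path.read_text(encoding="utf-8")):
            problems.append(f"{path}:{number}: {line}")
    if problems:
        print("\n".join(problems))
    return 1 if problems else 0
```

Saved as the hook file, the script would end with sys.exit(main()). As with the Bash version, git commit --no-verify skips it.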

I keep this file as part of the repository and make a symbolic link to it in .git/hooks/pre-commit. Make sure the file is executable.

Do you have similar stuff in your VCS hooks? Is there anything I could improve in mine? I’ll be glad to see your tips in the comments.

Getting the cellular network information on your iPhone

I got this neat little trick from my colleague Petr. Dialing *3001#12345#* on your iPhone and pressing Call launches a hidden app called Field Test. In it, you’ll find a lot of detailed information about your network. You can disable Wi-Fi to see even more data.


To be honest, I don’t even know what half of those values mean, but you can easily Google them. The EF-ICCID value in SIM Info can be useful even for non-developers. It’s the ID of your SIM card, the one your operator often asks for. This way, you don’t have to take the card out of the device.

Server-side verification of Google Play subscriptions

TL;DR To programmatically verify Google Play subscriptions, you have to use the OAuth 2.0 for web server applications auth flow, not service accounts. This is a massive FUBAR on Google’s side and makes the life of developers very painful.

Lately, I’ve been working on the backend part of an upcoming app we’re developing for one of our clients. This app offers monthly and yearly subscriptions, so I had to implement a check that the recurring payment happened: that the credit card got billed and the app provider got the money. Of course, for multiple reasons, this has to be done server-side, completely automatically and without any intervention from the app user or provider.

Google provides an API called android-publisher for this. To use any API from Google, you first have to enable it in the Console and then authenticate with it. The authentication is done via OAuth 2.0. As Google offers API access to many of their services, which are used on different occasions, they also offer different OAuth 2.0 authentication flows.

The flow/mechanism for server-to-server communication is called Service accounts in Google terminology. This is precisely what I needed. However, for reasons beyond my understanding, this is not the one used for the android-publisher API. Instead, they chose the Web server applications flow, which for this use case is absurd.

(Sidenote: When we started to build the aforementioned app, recurring transactions were not even available for Android. We planned to use PayPal, as we did for the BlackBerry version. However, during development, Google introduced subscriptions for Android, which made us happy.

I started reading the docs and implementing the whole auth and check code, but it didn’t work; I was getting a “This developer account does not own the application.” HTTP 401 error. Googling for this didn’t help: at that time, the only search results were two Stack Overflow questions that were just a couple of hours old. I would swear the docs at that time said to use Service accounts for authentication and Google changed it later. I had to re-read the docs from the beginning to debug this infuriating error.)

Using the Web server applications flow is ridiculous because it involves human interaction. At least once, you (in this case, our client!) need to press an “Allow” button in your web browser. Palm, meet face.

Here are the instructions you need to follow to achieve automated subscription verification. The code is in Python, but it’s easy to adapt.

First of all, in the Console, you need to create a Client ID for web applications. You can use http://localhost as the redirect hostname. As you’ll see in a minute, it doesn’t matter much. What you mostly need are the Client ID and Client secret.

Next, fire up the Python REPL and enter this:



Use the Client ID and Client secret you obtained from the Console. This piece of code will give you an authentication URL; by default, it will contain the access_type=offline parameter. This is very important, so make sure it’s there. Open the URL in your browser and log in with the Google account you will be using to publish the Android application. After a successful login and authorization, you’ll be redirected to localhost in your browser. Unless you’re running a web server locally, this will probably fail, but it doesn’t matter. The address you are redirected to will contain a code parameter. Copy its value and go back to the REPL again:
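The post's code uses oauth2client. To show what happens underneath, here is a standard-library sketch of the same step. It builds a consent URL that includes access_type=offline and extracts the code parameter from the localhost URL you were redirected to. The authorization endpoint and the prompt=consent parameter come from Google's current OAuth 2.0 documentation rather than from the original code. The function names and the client ID are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTH_URI = "https://accounts.google.com/o/oauth2/v2/auth"
SCOPE = "https://www.googleapis.com/auth/androidpublisher"


def authorization_url(client_id: str, redirect_uri: str = "http://localhost") -> str:
    """Consent URL that asks for offline access, i.e. a refresh token."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": SCOPE,
        "access_type": "offline",  # without this, no refresh token is issued
        "prompt": "consent",       # ask again so a refresh token is re-issued
    }
    return f"{AUTH_URI}?{urlencode(params)}"


def code_from_redirect(redirected_to: str) -> str:
    """Extract the one-time authorization code from the redirect URL."""
    query = parse_qs(urlparse(redirected_to).query)
    if "error" in query:
        raise RuntimeError(f"authorization failed: {query['error'][0]}")
    return query["code"][0]


print(authorization_url("YOUR_CLIENT_ID.apps.googleusercontent.com"))
```

The code is single-use and short-lived, so exchange it for tokens right away.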


Finally, you’ve got an instance of the oauth2client.client.OAuth2Credentials class. It contains a couple of properties, but the only really interesting one is refresh_token. Store the refresh token in your server configuration; you can use it indefinitely, that is, until someone revokes access to the API. Then you would have to go through this whole process again.

Basically, thanks to this refresh token, you will be able to obtain a new access token on each call to the API. To do that, you create an instance of OAuth2Credentials and use it to authorize an httplib2.Http object:
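Under the hood, the refresh step is a plain HTTP POST. This standard-library sketch shows it with hypothetical function names. refresh_request() only builds the request, so you can inspect it without network access. fetch_access_token() sends it, which needs real credentials. The token endpoint is the one Google documents today and may differ from the one oauth2client used at the time. The JSON response contains an access_token, which is sent as "Authorization: Bearer <token>" on android-publisher calls.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_URI = "https://oauth2.googleapis.com/token"


def refresh_request(client_id: str, client_secret: str, refresh_token: str) -> Request:
    """Build the POST that trades a stored refresh token for a new access token."""
    body = urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return Request(TOKEN_URI, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


def fetch_access_token(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Send the request and return the short-lived access token (network required)."""
    with urlopen(refresh_request(client_id, client_secret, refresh_token)) as response:
        return json.loads(response.read().decode("utf-8"))["access_token"]
```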


You can now build a service and call the API's purchases get method.

The following gist summarizes the whole blog post:


As long as API access isn’t revoked, you should be fine using this method.

In praise of the future

We’re living in the most exciting era of mankind. The scientific progress of the last hundred years is just astonishing. This makes me happy.

The World Wide Web is just a little over 20 years old, but now all of man’s knowledge is available to anyone with a connection to it. Thank you, Sir Tim Berners-Lee. We have autonomous cars and planes; a little over 100 years ago, man wasn’t able to fly at all. We have built a large underground tunnel to smash particles into each other at very high speeds and discovered the Higgs boson. We have put a nuclear-powered rover on Mars.

With this amount of recent progress, can you imagine what’s waiting for us in the future? I hope for a squad of on-demand robots that will print a house according to my personal design. If I can’t have a holiday in space in 30 years, I will be disappointed. Oh, and please, someone, bring back public supersonic flights.