No posts for ages, sorry. I thought this was excellent in many ways: https://www.youtube.com/watch?v=X0tjziAQfNQ
Category: Videos
Fault tolerance
Introduction
I am slowly working my way through the 300+ back issues of the podcast Software Engineering Radio. I've reached a couple of excellent episodes on fault tolerance with Bob Hanmer. I recommend listening to them, even if (like me) you don't have to worry about this kind of thing …
Psychology, not technology, is the key to Google’s reliability
An excellent video by a Google Site Reliability Engineer, from Goto Conference 2017. What I liked in particular were three key points: being honest that trying to have operations act as border guards, who vet code changes against an increasingly long checklist before they go live, is a path to failure and frustration. Agreeing …
Trying to not get too ranty about documenting software architecture
This article gives my thoughts on a video about documenting software architecture: https://www.youtube.com/watch?v=kv8XedJTEww A summary of the video is: domains other than software architecture, e.g. maps or electrical circuits, do a good job of capturing useful and important information and communicating it well, mostly through pictures. Software architecture does …
How to mess up A/B tests
An excellent talk about A/B tests from someone who knows: Martin Goodson. My favourite part is an A/B test that found a 2.5% improvement in (sales) conversions between the two versions of the software being tested. Unfortunately there was a bug in the A/B testing framework, such that the old version was being tested …
Designing your system for when it fails (which it will)
A couple of excellent related videos from Goto Conference 2017. Some highlights are below. Metrics are better than nothing, but some context will make them much more useful. (My queue is filling up: is that because more things are arriving than I'd expect, or because things are leaving more slowly?) Alerts and logs are better than …
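The queue question can be made concrete: a queue's length alone doesn't say why it is growing, but recording arrival and departure rates alongside it, and comparing them with a normal baseline, does. A minimal sketch, assuming a hypothetical `QueueStats` record that counts enqueues and dequeues over a fixed window; the names and the windowing approach are illustrative, not taken from the videos.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Hypothetical counters for one fixed measurement window."""
    window_seconds: float
    arrivals: int = 0
    departures: int = 0

    def arrival_rate(self) -> float:
        return self.arrivals / self.window_seconds

    def departure_rate(self) -> float:
        return self.departures / self.window_seconds


def explain_growth(current: QueueStats, baseline: QueueStats) -> str:
    """Attribute queue growth to faster arrivals, slower departures, or both."""
    if current.arrivals <= current.departures:
        return "queue not growing"
    causes = []
    if current.arrival_rate() > baseline.arrival_rate():
        causes.append("more arrivals than usual")
    if current.departure_rate() < baseline.departure_rate():
        causes.append("slower departures than usual")
    return " and ".join(causes) if causes else "growing, but rates match baseline"


if __name__ == "__main__":
    normal = QueueStats(60, arrivals=600, departures=600)
    now = QueueStats(60, arrivals=600, departures=300)
    print(explain_growth(now, normal))
```

Here the same queue growth produces a different alert message depending on which rate changed, which is the kind of context that turns a bare metric into something actionable.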
Here be dragons: testing your error handling code
Who tests the error-handling parts of their code? You might want to start after watching this very interesting video from Goto Conference 2016. Among other things, the speaker summarises a paper that investigates catastrophic failures in distributed systems such as MapReduce and Cassandra. 58% of the catastrophic failures could have been prevented by testing …
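One way to test error-handling code is to force the failure so that the `except` branch actually runs. A minimal sketch using `unittest.mock` from Python's standard library: a mock's `side_effect` raises an exception on demand. The `save_with_retry` helper and its retry policy are hypothetical examples, not taken from the video or the paper.

```python
from unittest import mock


def save_with_retry(store, record, attempts=3):
    """Hypothetical helper: retry store.save on IOError, then give up cleanly."""
    for attempt in range(1, attempts + 1):
        try:
            store.save(record)
            return True
        except IOError:
            if attempt == attempts:
                # The error-handling path: report failure instead of crashing.
                return False
    return False


def test_gives_up_after_repeated_failures():
    store = mock.Mock()
    store.save.side_effect = IOError("disk full")  # every call fails
    assert save_with_retry(store, {"id": 1}) is False
    assert store.save.call_count == 3


def test_recovers_after_one_failure():
    store = mock.Mock()
    store.save.side_effect = [IOError("transient"), None]  # fail once, then succeed
    assert save_with_retry(store, {"id": 1}) is True
    assert store.save.call_count == 2


if __name__ == "__main__":
    test_gives_up_after_repeated_failures()
    test_recovers_after_one_failure()
    print("error-handling tests passed")
```

The test functions follow pytest's naming convention, so a test runner can collect them; the point is that the failure path gets exercised on every run rather than first in production.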
Getting your System into Production, and Keeping it There
An excellent video by Eoin Woods from Goto Conference 2015, covering the grown-up part of software: how not to get called in the middle of the night. https://www.youtube.com/watch?v=CgWVsTkYfUM
Statistics Without the Agonising Pain, and Statistics for Hackers
John Rauser, a data scientist at Pinterest, has an excellent video called Statistics Without the Agonising Pain. In less than 12 minutes it explains a useful stats term (statistical significance) to people who can code but don't know stats, and it does this very well! https://www.youtube.com/watch?v=5Dnw46eC-0o Another video along similar lines is by Jake Vanderplas. It builds on …
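For coders, one simulation-based way to check statistical significance is a permutation (shuffle) test: if the group labels don't matter, shuffling them should often produce a difference as large as the one actually observed. A minimal sketch, assuming two lists of numeric measurements; the function name and defaults are illustrative, and this is not presented as the exact method used in either video.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often randomly relabelled data gives a difference in
    means at least as large as the observed one (a two-sided p-value)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the labels are meaningless
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles


if __name__ == "__main__":
    a = [12, 15, 14, 10, 13, 16, 11]
    b = [9, 8, 11, 10, 7, 9, 10]
    print(permutation_p_value(a, b))
```

A small result means shuffled labels rarely reproduce the observed gap, so the difference is unlikely to be chance alone; a result near 1 means it easily could be.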