A year or two ago, I was talking to Nancy about my interest in Continuous Integration (CI). She put me onto this book called Continuous Delivery (CD). It’s a little dated. The technologies have changed, but all the principles still apply. If you are interested in learning more about CI or CD, this is a good place to start.
Overall this book has some very useful information and some very solid theory backed up by a lot of real-world examples. However, I would be remiss if I didn’t mention that this thing reads like a textbook. If you have trouble falling asleep, 30 mins of this book every night will definitely do the trick. Don’t let that dissuade you; it is still worth reading. Just don’t try to read it right before bed if you ever expect to finish.
CI versus CD
The terms CI (Continuous Integration) and CD (Continuous Delivery, sometimes Deployment) get thrown around together so often that developers often just use the combined term for them: CI/CD. What is the difference?
Continuous Integration has to do with how quickly a developer’s changes are integrated (merged) with everyone else’s changes and ready to be released. CI proponents prefer “trunk-based” development, where there is only one main branch and everyone commits to it at least once a day so that all the changes can be integrated. This requires a set of tests to make sure everything works, and ideally some automation. The antithesis of this is long-lived feature branches, where changes only get merged in at the end, when the feature is finished. Large merges are hard, so CI favors many smaller merges instead.
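The “many smaller merges” argument can be sketched with a toy model. This is purely illustrative, not from the book: assume any pair of changes that have not yet been integrated might conflict with each other, so the number of potential conflicts grows roughly quadratically with how long changes sit unmerged.

```python
# Toy model: potential pairwise conflicts among unintegrated changes.
# The quadratic growth is the point here, not the exact numbers.
def potential_conflicts(unmerged_changes: int) -> int:
    """Each pair of unmerged changes might conflict: n choose 2."""
    return unmerged_changes * (unmerged_changes - 1) // 2

# Trunk-based: each of 10 changes is integrated as it is made,
# so at most 1 change is ever waiting to merge.
trunk_based = 10 * potential_conflicts(1)

# Long-lived feature branch: all 10 changes merge at the end, together.
feature_branch = potential_conflicts(10)

print(trunk_based, feature_branch)  # 0 45
```

Ten small merges each face essentially no conflict surface, while one big merge of the same ten changes has 45 pairs that could clash.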
Continuous Delivery is related to CI. CD is concerned with how long it takes from the time a developer fixes a bug or adds a feature until it ends up in the end users’ hands. CI is a component of CD, but CD is much broader. The idea is that deployments can be challenging, and the best way to deal with that is to automate them and do them often. Again, many smaller deployments/releases as opposed to one large one.
There is a lot in this book. Here are just a few of my takeaways.
Part of the key to making all this work and having it pay for itself is quick feedback. The main issue is the amount of time that elapses between when a developer writes some code that introduces a problem and when they realize there is a problem. We want that to be as short as possible. We want developers to be able to fix things before they move on to something else, because once they have moved on, that context switching is costly.
If It Hurts Do It More Often
One of the main themes of the book is that integration and releases should be done often. If we put them off, they become very painful: the number of changes that are potential sources of problems is much greater, and therefore it is harder to track down and fix a problem.
CI is a key pillar of CD, and it goes along with the short feedback loop. It doesn’t make sense to waste time building or deploying code that is broken. If we find that out at the beginning by running unit tests, we can skip the rest of the build and deployment steps, so having a quick CI phase (the book calls it the commit phase) is really useful. The CI idea of having one main line of development is also crucial, so that we always know exactly which branch to build from. It really helps to keep things organized.
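That fail-fast ordering can be sketched in a few lines. The stage names and the stub stages below are hypothetical stand-ins for real unit tests, builds, and deployments; the point is that cheap checks run first and a failure stops everything after it.

```python
# Sketch of a fail-fast pipeline: the quick commit phase runs first,
# and later, slower stages never run if anything before them fails.
def run_pipeline(stages):
    """Run stages in order; stop at (and report) the first failure."""
    for name, stage in stages:
        if not stage():
            return f"failed at: {name}"
    return "pipeline passed"

# Stub stages standing in for real work; each returns pass/fail.
stages = [
    ("commit phase (unit tests)", lambda: True),   # fast feedback first
    ("build", lambda: False),                      # pretend the build breaks
    ("deploy", lambda: True),                      # never reached
]

print(run_pipeline(stages))  # failed at: build
```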
One big component of CI that I have been working on and thinking about a lot lately is managing the configuration of machines/environments. This includes the development, build, test, and deployment environments. The problem is that small differences in configuration can cause lots of problems that are hard to detect. This is the “works on my machine” problem. The book advocates automated control of these machines and storing their configuration in source code control. Ideally you would use a script kept in source code control to create these environments on demand. Then, if you needed a new dependency, instead of installing it manually you would add it to the script.
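A minimal sketch of that idea, with entirely hypothetical tool names and versions: the environment is declared as data in a file that lives in source control, and a script derives the install steps from it rather than anyone installing things by hand.

```python
# Sketch of environment-as-code. The dependency names, versions, and
# install actions are made-up placeholders; in practice this file lives
# in source control, and adding a dependency means editing REQUIRED.
REQUIRED = {
    "toolA": "1.2.0",   # hypothetical dependency, pinned version
    "toolB": "3.4.1",
}

def provision(installed: dict) -> list:
    """Return the install actions needed to match the declared config."""
    actions = []
    for name, version in REQUIRED.items():
        if installed.get(name) != version:
            actions.append(f"install {name}=={version}")
    return actions

# A fresh machine has nothing installed; the script derives the steps.
print(provision({}))              # ['install toolA==1.2.0', 'install toolB==3.4.1']
print(provision(dict(REQUIRED)))  # [] -- machine already matches
```

Because the same declaration drives every machine, two environments built from it cannot silently drift apart, which is exactly the “works on my machine” failure mode.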
In LabVIEW, this has always been a pain. The current solution is to use Virtual Machines, which, while they work, are slightly clunky. I have also investigated Docker. It looks promising for CI (drivers are a bit of a problem), but for development, where you need a GUI, Docker doesn’t cut it. I also looked at Vagrant, which looks promising, specifically for development environments. The most promising thing I have seen so far is JKI’s Project Dragon. Combining Project Dragon with Vagrant for development environments and Docker for CI may be the best solution.
Automating deployment is all about reducing risk. Often there are a lot of steps, so automating them ensures the process is repeatable. The other point is to always use the same scripts for development, testing, and production. That way the scripts get a good workout and the bugs are fixed before you deploy to production.
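A sketch of the “same script everywhere” idea, with made-up server names and the real work elided: one deploy routine is shared by every environment, and only its parameters differ.

```python
# Sketch of a single deploy routine used for dev, test, and production.
# The target names are hypothetical; real steps (copy the artifact,
# restart the service, run smoke tests) would replace the comment.
TARGETS = {
    "dev": "dev-server",
    "test": "test-server",
    "prod": "prod-server",
}

def deploy(environment: str) -> str:
    """Run the same steps everywhere; only the target differs."""
    target = TARGETS[environment]   # unknown environment fails loudly
    # ...identical deployment steps would run here for every target...
    return f"deployed to {target}"

# By the time prod runs, the identical code path has already been
# exercised against dev and test.
for env in ("dev", "test", "prod"):
    print(deploy(env))
```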
Building only once is about two things. The first is simply saving time: we want fast feedback, so taking time to build something that already exists is wasteful. The more important reason is that every time we build, we run the risk of ending up with a slightly different product. What if one machine has a different version of some dependency or driver? What if something in the compiled object cache gets corrupted? The book advocates building once per pipeline and storing the result in an artifact repository to be used by subsequent steps. This follows Fab’s advice of always building on the build machine. Although if you are following the book strictly, you would also monitor the configuration of that build machine and programmatically create or verify it to make sure the process is 100% repeatable.
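The build-once flow can be sketched like this. The in-memory dict is a hypothetical stand-in for a real artifact repository (Artifactory, a file share, etc.); the point is that the artifact is produced exactly once, stored under its content hash, and later stages fetch and verify those exact bytes instead of rebuilding.

```python
# Sketch of "build once": the commit phase publishes one artifact
# keyed by its hash; every later stage fetches that same artifact.
import hashlib

artifact_repo = {}  # stand-in for a real artifact repository

def publish(build_output: bytes) -> str:
    """Store the artifact once, keyed by its content hash."""
    digest = hashlib.sha256(build_output).hexdigest()
    artifact_repo[digest] = build_output
    return digest

def fetch(digest: str) -> bytes:
    """Later pipeline stages fetch and verify -- they never rebuild."""
    data = artifact_repo[digest]
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("corrupted artifact")
    return data

key = publish(b"compiled app v1")        # built exactly once
assert fetch(key) == b"compiled app v1"  # deploy stage reuses the same bytes
```

The content hash doubles as a tamper check, so a corrupted or substituted artifact is caught at fetch time rather than shipped.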
How Can I Incorporate These Ideas?
If you want some help incorporating these ideas into your process, then let’s sit down and talk about how we can help you.