Christmas season is a fun time for programmers. You become the de facto tech support person in the family. You do a lot of googling, a lot of reading, a lot of configuring, a lot of troubleshooting and, of course, a lot of restarting. But have you ever wondered why you need to restart systems when they malfunction? Believe it or not, it is all because of bad programming and invalid state.
Any system can be modeled as a state machine: a finite number of states, plus transitions that move the system from one state to another. To put this into a working example, a state machine is like a road system. The points of interest are your states. Where you are now is your current state. The act of driving your car to your next location is a transition. When you arrive at your next location, it becomes your current state.
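To make the road analogy concrete, here is a minimal sketch of such a state machine in Python. The place names, the ROADS table and the drive function are all made up for illustration; they are not from any real library.

```python
from enum import Enum


class Place(Enum):
    """The points of interest, i.e. the states (hypothetical names)."""
    HOME = "home"
    OFFICE = "office"
    STORE = "store"


# The roads: which places can be reached directly from each place.
ROADS = {
    Place.HOME: {Place.OFFICE, Place.STORE},
    Place.OFFICE: {Place.HOME},
    Place.STORE: {Place.HOME},
}


def drive(current: Place, destination: Place) -> Place:
    """A transition: move along a road and return the new current state."""
    if destination not in ROADS[current]:
        raise ValueError(f"no road from {current.value} to {destination.value}")
    return destination


current = Place.HOME                   # where you are now
current = drive(current, Place.STORE)  # arriving makes STORE the current state
```

Because drive only follows roads listed in ROADS, the car can only ever end up at a known point of interest.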
Invalid states are states the system did not anticipate, and they cause it to act in unexpected ways. They are often, if not always, caused by badly written transitions. So, imagine your bad driving caused your car to take a right and jump off a bridge. The bottom of a river is probably not a point of interest, but your car is there now and you're probably causing all sorts of problems like roadblocks and traffic.
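Here is a rough sketch of how a badly written transition ends up in the river. The Car class, its methods and the place names are hypothetical, invented only to illustrate the idea.

```python
# The only places the rest of the program knows how to deal with.
ROUTES = {
    "home": ["office", "store"],
    "office": ["home"],
    "store": ["home"],
}


class Car:
    def __init__(self):
        self.location = "home"

    def drive(self, destination):
        # Badly written transition: it accepts any destination at all,
        # without checking that it is a known place.
        self.location = destination

    def next_stops(self):
        # This code assumes the location is always a known place.
        return ROUTES[self.location]


car = Car()
car.drive("river bottom")  # nothing stops us from going off the bridge

try:
    car.next_stops()
    crashed = False
except KeyError:
    # The failure shows up here, far away from the transition that caused it.
    crashed = True
```

Note that the bug is in drive, but the error only surfaces later in next_stops. That distance between cause and symptom is a big part of why invalid states are so unpleasant to debug.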
How restarting helps
A restart works by putting the system back into a known state, in this case the startup state, and running from there. Technically, restarting does not fix the problem, and it does not guarantee that it will not happen again. So imagine a tow truck pulling your car out of the water and putting it back on the road. It does not guarantee that you will not jump off the bridge again, but your car is now back on the road and driveable.
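In code, the tow truck might look something like the sketch below. Again, Car, restart and the place names are hypothetical and only serve the analogy.

```python
START = "home"  # the known startup state
KNOWN_PLACES = {"home", "office", "store"}


class Car:
    def __init__(self):
        self.location = START

    def drive(self, destination):
        # The buggy transition: any destination is accepted.
        self.location = destination

    def restart(self):
        # The tow truck: throw away the current state, go back to the start.
        self.location = START


car = Car()
car.drive("river bottom")
stuck = car.location not in KNOWN_PLACES      # True: we are in an invalid state

car.restart()
recovered = car.location in KNOWN_PLACES      # True: back on the road

car.drive("river bottom")                     # the bug in drive is untouched
stuck_again = car.location not in KNOWN_PLACES  # True: off the bridge again
```

The restart only replaces the state. The faulty transition is still there, waiting for the next wrong turn.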
The most common, management-friendly and marketing-appealing approach to preventing invalid states is to just write more tests and test more. But I personally don't like this approach, primarily because it is a form of "guard-rail programming". Imagine driving through the mountains with a locked steering wheel and having the guard-rails direct you to your destination. A gap in the guard-rails would be deadly.
A better way to guarantee safety is through how the code is written: type safety, statelessness, referential transparency, one-way data flow, and generally making code operate in a manner that is easy to reason about. Take the wheel and drive the car, with or without the guard-rails.
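As one possible illustration of these ideas, the sketch below combines an enum for the states, an immutable value for the car, and a pure transition function. All names are hypothetical. Keep in mind that Python does not enforce type hints at runtime, so the type-safety part assumes you run a static type checker over the code.

```python
from dataclasses import dataclass
from enum import Enum


class Place(Enum):
    """Every valid state is listed here; there is no member for the river."""
    HOME = "home"
    OFFICE = "office"
    STORE = "store"


ROADS = {
    Place.HOME: {Place.OFFICE, Place.STORE},
    Place.OFFICE: {Place.HOME},
    Place.STORE: {Place.HOME},
}


@dataclass(frozen=True)  # frozen: a Car value cannot be modified in place
class Car:
    location: Place


def drive(car: Car, destination: Place) -> Car:
    """Pure transition: same inputs, same output, and no hidden state.

    It returns a new Car instead of changing the one passed in, and it
    refuses any move that is not a road in ROADS.
    """
    if destination not in ROADS[car.location]:
        raise ValueError(
            f"no road from {car.location.value} to {destination.value}"
        )
    return Car(location=destination)


at_home = Car(Place.HOME)
at_store = drive(at_home, Place.STORE)
```

Since drive never mutates anything, at_home is still at home after the call, and calling drive again with the same arguments gives an equal result. Any destination that is not a Place is something a type checker can flag before the program runs, and any move without a road is rejected at the transition itself instead of blowing up somewhere else later.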
If you are a programmer and you encounter a system that requires a restart due to a malfunction, take a second to think about how you write your code. You could prevent the next restart, and the next unhappy customer. That customer may just be you.