You’re a new CIO. It’s 9:00 a.m. on a Wednesday and you’re in an emergency Zoom meeting with IT operations leaders. The faces on the screen are somber, and it’s clear why when they explain the purpose of the meeting.
It seems that all of IT ops, which was initially budgeted at $10 million for this fiscal year, is now looking at a $4 million overrun due to the unanticipated cost of the operations personnel and tools needed to operate the new batch of applications and databases that just moved to a public cloud.
What happened? It’s likely they hit a “cloudops wall,” meaning that the cost of operating their systems in the cloud was underestimated by 20% to 30%. They assumed that, at most, the cost of operating the same systems in the cloud would be about 10% more than on premises. Indeed, the industry told them that operations cost would likely be reduced.
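For readers who want to check the numbers, here is a rough sketch of the arithmetic behind the scenario. It uses only the $10 million budget and $4 million overrun from the example above. Measuring the shortfall against actual spend (rather than against the original budget) is an assumption on my part, but it is one reading under which the figures fall inside the 20% to 30% range. The variable names are purely illustrative.

```python
# Figures taken from the scenario above, in US dollars.
budget = 10_000_000   # original IT ops budget for the fiscal year
overrun = 4_000_000   # unanticipated cloudops cost

# What IT ops will actually spend if nothing changes.
actual = budget + overrun

# Two ways to express the same miss.
overrun_vs_budget = overrun / budget     # overrun as a share of the original budget
shortfall_vs_actual = overrun / actual   # assumption: underestimate as a share of real cost

print(f"Actual spend: ${actual:,}")
print(f"Overrun relative to budget: {overrun_vs_budget:.1%}")
print(f"Underestimate relative to actual spend: {shortfall_vs_actual:.1%}")
```

Either way you slice it, the miss is large enough that it can't be absorbed as a rounding error in next quarter's forecast.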
The reality is that a few things are occurring right now.
First, the pandemic pushed many enterprises to migrate their next tranche of systems to the cloud. These are the systems they avoided at first because they were more complex and not as well designed. Moreover, these systems are interacting in new ways, such as a cloud-based database that now consumes data from a traditional data center rather than both living in the same data center.
Second, since there is a “need for speed” in moving to the cloud, many of the pragmatic steps have been compressed or skipped. Refactoring applications to leverage cloud-native services or containerizing some of the migrating systems has been pushed off in favor of cheaper and faster lift-and-shift processes that leave the systems underoptimized.
Finally, and most important, nobody in the company has done cloudops for these types of systems yet. For example, moving mainframe-based systems to a public cloud is much different from migrating LAMP (Linux, Apache, MySQL, and PHP) stacks, which are more modern. This lack of skills turns much of the planning into guesswork. This time they guessed wrong by 20% to 30%.
There are a few ways to fix the cloudops wall that enterprises are hitting now.
First, there needs to be more focus on refactoring or fixing systems as they move to the cloud. I often say, “Crap on premises moved to the cloud is just crap in the cloud.” Systems that get even more complicated and costly to operate in the cloud need to be fixed or improved as you move them.
It’s simple math for me. If you skip improving the systems, then you need to budget more for cloudops. Or you can improve the systems as they migrate, such as by refactoring to cloud-native services, and gain cloudops efficiencies and thus lower costs. It’s a clear trade-off.
Second, leverage the right cloudops tools to ensure that all operations that can be automated are automated. Most enterprises that hit a cloudops wall have underoptimized operations automation. They carry forward their ops practices from on premises to the cloud and end up adapting already inefficient processes and tools to systems that have become more complex.
The problem with the cloudops wall is that enterprises don’t understand why they’re hitting it. This is not a matter of systems in the cloud being more costly to operate than originally thought. This is about a lack of planning and a lack of willingness to improve systems before moving to the cloud. It’s also about knowing how to leverage the correct cloudops tools in the right ways.
Perhaps this is another example of pay now versus pay a whole lot more later. I’ve found that the former is always a better choice in the world of cloud computing.