When moving data from traditional on-premises systems to public clouds, deciding what to do with that data should be the primary focus. Instead, many enterprises simply replicate their current data technology, governance, and security on their cloud provider's platform, re-platforming the data without rethinking how it is stored and used.
There are many old and new approaches to storing and using data. From oldest to newest, we have data warehouses, data lakes, data lakehouses, and data mesh, as well as hybrid approaches that combine several of them. These are good concepts to understand, but they have perhaps confused those who are simply looking for pragmatic ways to move their existing data to the cloud.
Moreover, each of these approaches comes with its own technology stack, such as data warehouse databases, object storage, master data management, and data virtualization. All are handy tools for solving most transactional and analytical data needs and should be understood as well.
What are the more pragmatic approaches to dealing with data moving to the cloud? Here are three to start with.
First, fix your data as it moves to the cloud. Just as we purge our junk before a move, data within most enterprises needs updating, if not a complete overhaul. The problem is that most enterprises blow the budget on the migration itself and have little or no funding left for changes and upgrades to the data design and technology. Fixing the data could mean redesigning schemas, adding metadata management and data governance, or adopting new database models (for example, moving from SQL to NoSQL).
The reality is that if you don't take the time to fix the data during the move, you're likely to migrate it twice: first lifting and shifting the data to platform and database analogs on the public cloud, and then migrating again later to new schemas, databases, and database models.
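To make the "fix the data during the move" idea concrete, here is a minimal sketch of one common remodeling step: folding normalized, SQL-style parent and child rows into a single NoSQL-style document per parent. The table and field names (customers, orders, and so on) are invented for illustration, not taken from any particular system.

```python
# Hypothetical example: remodel normalized relational rows into
# denormalized documents while the data is already in motion, rather
# than lifting and shifting the old schema as-is.

def to_documents(customers, orders):
    """Group child order rows under their parent customer to form documents."""
    orders_by_customer = {}
    for order in orders:
        orders_by_customer.setdefault(order["customer_id"], []).append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
    return [
        {
            "_id": c["customer_id"],
            "name": c["name"],
            # Embedding orders lets one document read replace a SQL join.
            "orders": orders_by_customer.get(c["customer_id"], []),
        }
        for c in customers
    ]

customers = [{"customer_id": 1, "name": "Acme"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 99.5},
    {"order_id": 11, "customer_id": 1, "total": 12.0},
]
docs = to_documents(customers, orders)
print(docs[0]["name"], len(docs[0]["orders"]))  # Acme 2
```

The point is not the specific target model; it is that this kind of transformation is far cheaper to do once, in the migration pipeline, than as a second migration after the lift and shift.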
Second, weaponize data virtualization if needed. Data virtualization tools let you create a database structure that exists only in software, layered over several back-end physical databases. This is older technology that has been modernized for the cloud, and it lets you work around issues with physical database designs without forcing physical changes to the back-end databases.
The value is that the abstraction layer provides a view of the data better aligned with how applications and users want to see and consume it. Also, you're not forced to fix issues with the physical databases. If you think this is kicking the database reengineering can down the road, you're right.
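A toy sketch can show what "a database structure that exists only in software" means. This is not any real virtualization product; it uses two SQLite in-memory databases (with invented schemas) to stand in for two physical back ends, and a plain function as the virtual view that unifies them without touching either one.

```python
import sqlite3

# Two "physical" back ends with different, mismatched schemas.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE cust (cust_id INTEGER, cust_name TEXT)")
legacy.execute("INSERT INTO cust VALUES (1, 'Acme')")

modern = sqlite3.connect(":memory:")
modern.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
modern.execute("INSERT INTO customers VALUES (2, 'Globex')")

def virtual_customers():
    """A software-only 'customers' view: one logical schema over both
    back ends, with no physical change to either database."""
    rows = [
        {"id": i, "name": n}
        for i, n in legacy.execute("SELECT cust_id, cust_name FROM cust")
    ]
    rows += [
        {"id": i, "name": n}
        for i, n in modern.execute("SELECT id, name FROM customers")
    ]
    return rows

print(virtual_customers())
```

Applications query the unified view and never learn that `cust_name` and `name` live in different systems, which is exactly the workaround, and exactly the reengineering debt, described above.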
Finally, create or augment your database road map. Most enterprises have a vision and a plan for their databases in the cloud, but it is rarely written down, and it rarely captures agreements with developers, operations teams, security teams, and other stakeholders.
There should be a detailed road map for database technology both in and out of the cloud. It should cover maturation of the databases, migration to new technology, and planning for data security and governance: anything that should occur in the next five years to improve the way data is stored and consumed by both transactional and analytical systems.
This is where the approaches listed above are helpful; certainly data mesh and others should be considered. Look at the best practices and the emerging architectural patterns. However, don’t get lost in the technology. This is a fit-for-purpose exercise.
Data is the most important asset a company owns, but it’s not often treated like a first-class citizen of enterprise IT. It’s about time that changes.