Imagine you just finished a Data Migration (DM) project, everything went smoothly, the data were loaded into the new system with a minimum amount of issues, inherent sometimes to such complex projects, the users started to use the new system, everybody seemed to be satisfied, and a few weeks later within the company rumors propagate with the speed of light – “the migrated data are wrong”, “the new system can’t be used” , “IT did a bad job”, “we have to get back to the previous system”, and so on. The panic propagates, a few heads fall, the business tries to revert to the old system but there’s lot of new data available in the new system, and it’s not so trivial to move the data back to the old, in the meantime other rumors appear, and… it’s just a scenario but this could happen to any company if not the appropriate measures were taken at the right time. What could help a company when something like this happens?! A good Plan B aka a good Migration Fallback Plan/Policy, but that’s something nobody would like to do except extreme situations.
A common approach to any type of projects as well to a DM project is to identify and mitigate the risks before or during the project. That’s something I started to do a few days ago, to prepare a list with the risks associated with DM projects. For this exercise I tried to remember what things went wrong in previous similar projects I worked on and to figure out what else could go wrong. Some online resources helped me to refresh my memory too, and I think I found also two or three things I haven’t really thought about. My attempt was primarily focused on this type of problem mentioned above – minimizing the risks of not having the right data when the new system goes live. Before jumping into the thematic I would like to sketch the bigger picture, as I perceive it.
Having the right data when a system goes live primarily means having good Data Quality (DQ) in the target system after the data were migrated! As a DM is the best exemplification of the GIGO (Garbage-In Garbage-Out) principle, in order to have good DQ in the target is important to handle DQ latest during the DM project. That’s essential and common sense – you can’t expect to have good data in the new system when there’s lot of garbage in the old. So, a DM and a Risk Management for such a project should be built around this. In fact not having a DQ initiative or project in a DM project is one of the most important risks a company can take. Maybe in small DM, a DQ initiative isn’t necessary, though when the data are important for your company, DQ is a must! In addition DQ assessments have to be performed in alignment with the new system, and not the old. Even if the data have good quality into the old system, the quality of your data after DM will be judged in corroboration with the new system. This is a requirement that can be easily overlooked and its implications misunderstood!
Many think that DQ is one time activity, we do it for a DM project and we’ll have quality data and never have to care about their quality anymore. Totally false! DQ has to be part of a broader strategy, call it Data Governance, Master Data Management, Data Management or any other initiative in which data plays an important role. DQ is an on going, iterative and consolidated effort, it doesn’t end after DM but continues for the whole data life-cycle, as long the data have value for an organization. It doesn’t help if the data have high quality when the data are migrated and a few weeks or months later the overall quality and trust in data decreased considerably.
Keeping an acceptable level of DQ must be an organization’s strategy, and must be built a culture toward DQ. People need to be aware of the importance of having good quality data, and especially the consequences of having bad quality data. DQ doesn’t concern only the owners or stewards of data, or the people working with data, it concerns the whole organization because decisions are made based on those data, processes are changed and improved, an organization’s performance is often judged based on data. The quality of data is a matter of perception, on how users see the quality of data in corroboration with the needs they have, and the needs change over time. Primarily being aware what good DQ means and which are an organization’s needs in respect to data, it’s also a way of minimizing the negative perception of data, of gaining trust in data and some solid basis on which decisions can be made. Secondarily, these organizational data needs need to be addressed in a DM, they are the success factors upon which the success of a DM project is judged.
For sure considerable costs are associated with DQ initiatives and everything related to data which doesn’t always represent a direct cost component in the products or services handled by an organization. Considering that not all data have the same importance for an organization, it makes sense to prioritize the DQ effort as a whole and the data cleaning needs in particular, the focus should be the data with the highest impact and with time to tackle data with lower and lower impact. It must be found equilibrium between the DQ costs and the value of data. Most probably is important to spend resources on raising people’s awareness in respect to DQ early rather than cleaning retroactively data later. It also make sense to invest in tools that help to clean data using automated or semi-automated methods, though some manual/visual control need to be in place too.
DQ and the way the problems associated with it are tackled depends more on an organization’s internal kitchen – people, partners, organization, strategy, maturity, culture, geography, infrastructure, methodologies used, etc. What it matters is how the various negative and important aspects of an organization are aligned in order to take advantage of one of the most important assets an organization has is its data! For this is important to adopt methodologies that support DQ, align them and tweak them as requested, in order to make most of your data! But before or while doing that remember that a DM is an organization’s opportunity to change the quality of its data and its data strategy!