Wednesday, March 02, 2016

Self-Service BI


    According to Gartner, the world's leading information technology research and advisory company, Self-Service BI (aka self-service analytics, ad-hoc analysis, personal analytics), for short SSBI, is a “form of business intelligence (BI) in which line-of-business professionals are enabled and encouraged to perform queries and generate reports on their own, with nominal IT support” [1].

    Reading between the lines, SSBI presumes the existence of an infrastructure made of tools to support it (aka self-service BI tools), direct or indirect access to row data and/or data models for the users, and the skillset needed in order to work with data and answer to business problems/questions.

A Little History

     The concept of self-service is not new, it just got “rebranded” and transformed into a business opportunity. The need for business users to perform ad-hoc analyses was always there in organizations, especially in the ones not having the right infrastructure for harnessing their data. Even since the 90s with the appearance of products like MS Excel or MS Access in many organizations users were forced by the state of art to learn how to use such products in order to get the answers they needed from the data. Users started building personal solutions, many of them temporary, intended to fill the reporting gaps organizations had. With a little effort and relatively small investment users had the possibility of playing with the data, understanding the data, identifying and solving problems in the business. They acquired thus a certain level of business expertise and data awareness becoming valuable resources in the organization.

     With time such solutions grew in scope and data volume, gained broader visibility and reached deeper in organizations, some of them becoming team, departmental or cross-departmental solutions. What grows uncontrolled with time starts to have negative impact on the environment. First tools’ management became a problem because the solutions needed to be backed-up and maintained regularly, then other problems started to surface: security of data, inefficient data processing as increasing volumes of data were processed on local computers and transferred over the network, data and effort were duplicated, different versions of reality existed as different numbers were reported, numbers that were reflecting different definitions, knowledge about the business or data-analysis skillsets. The management needed a more consolidated and standardized effort in order to address these problems. Organizations were forced or embraced the idea of investing money in modern BI solutions, in more powerful servers capable of handling a larger amount of requests, in flexible data models that facilitate data consumption, in data quality initiatives. Thus through various projects a considerable number of such solutions were converted into more standardized and performant BI solutions, the IT department being in control of the changes and new requests.

Back to Present

    With IT in control of the reporting requirements the business is forced to rely on the rapidity with which IT is able to address new requirements. Some organizations acquired internal resources in order to build reports and afferent infrastructure in-house, others created partnerships with vendors, or approached a combination of the two. As the volume of requirements isn’t uniform over time, the business has to wait several days between the time a requirement was addressed to IT and a solution was provided. In business terms a few of days of waiting for data can equate with the loss of an opportunity, a decision taken too late, decision that could have broader impact.

     A few years ago things started to change when the ad-hoc analysis concept was rebranded as self-service and surfaced as trend. This time vendors like Qlik, Tableau, MicroStrategy or Microsoft, some of the main SSBI vendors, are offering easy to use and rich in functionality tools for data integration, visualization and discovery, tools that reflect the advances made in graphics, data storage and processing technologies (e.g. in-memory databases, parallel processing). With just a few drag-and-drops users are able to display details, aggregate data, identify trends and correlations between data. In addition the tools can make use of the existing data models available in data warehouses, data marts and other types of data repositories, including the rich set of open data available on the web.

Looking at the Future

   Like its predecessors SSBI seems to address primarily data analysts and data-aware business users, however in time is expected to be adopted by more organizations and become more mature where already adopted. Of course, some of the problems from the early days more likely will resurface though through governance, better architectures and tools, integration with other BI capabilities, trainings and awareness most of the problems will be overcome. More likely there will be also organizations in which SSBI will fail. In the end each organization will need to find by itself the value of SSBI.

[1] Gartner (2016) Self-Service Analytics [Online] Available from:
] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from:

Saturday, February 27, 2016

2¢ on BI Myths: Business Intelligence is Complex


    While looking over “Business Intelligence Concepts and Platform Capabilities” Coursera MOOC resources for Module 2 I run into two similar articles from Solutions Review, respectively Information Age. What caught my attention was the easiness with which the complexity of BI “myth” is approached in both columns.

    According to the two sources the capabilities of nowadays BI tools “enabled business users to easily identify and present trends in an impactful way” [1], and “do not require an expert at the helm” [2]. It became thus simpler for users to independently query data and create interactive reports and presentations [2]. In both columns one can read between the lines that the simplicity of using BI tools is equivalent with negating the complexity of BI, which from my point of view is false. In fact here are regarded especially the self-service BI tools, in trend nowadays, that allow users to easily perform ad-hoc analysis with a minimal involvement from IT. Self-service BI is only a subset of what BI for organizations means, and just a capability from the many BI capabilities an organization needs in theory, even if some organizations might use it extensively.

Beyond the Surface

    A BI tool is not a BI solution per se, even if many generic BI solutions for different systems are available out of the box. This is one of the biggest confusion managers, users and unfortunately also BI professionals make. A BI tool offers the technological basis for creating a BI infrastructure, though it comes with no guarantees. It takes a well-defined IT and business strategy, one or more successful projects, skillful developers and users in order to harness the BI investment.

    On the other side it’s also true that organizations can obtain results also from less, though BI doesn’t equates with any ad-hoc analysis performed by users, even if they use BI tools for this purpose. BI is not only about tools, reporting and revealing trends in the data. BI often implies a holistic knowledge about the business and certain data awareness, without which users will start aggregating and comparing apples with pears and wonder why they taste and look different.

    If everything were so simple then why so many BI projects fail to deliver what’s expected? Why so many managers complain that they don’t have the data they need, when they need them? Sure maybe the problem lies in over-complexifying the whole BI landscape and treating everything from a high-level, though that’s more likely not it.

It’s a Teamwork Knowledge Game

    BI is or needs to be monitoring and problem solving oriented. This requires a deep understanding about processes and business. There are business users and also BI professionals who don’t have the knowledge one needs in order to approach a business problem. One can see that from the premises they have, the questions they raise, the data they consider, the models they build, and the results.

    From a BI professional’s perspective, even if one has a broad knowledge about various businesses, one often lacks the insight in a given business. BI professionals can seldom provide adequate BI solutions without input and feedback from the business. Some BI professionals rely too much on their knowledge, same as the business sometimes expects a maximum output from BI professionals by providing a minimum of input.

    Considering the business users, quite often their focus and knowledge cover only the data boundaries of their department, while many problems extend over those boundaries. They know facts that are not necessarily reflected in the data. Even if they are closer to the data than other parties, they still lack some data-awareness (including statistical awareness) in order to approach problems.

    Somebody was saying ironically when talking about users’ data and problem solving skills - “not everybody is a Bill Gates or Steve Jobs”. Continuing the idea, one can’t expect users to act as such. For sure there are many business users who are better problem solvers than BI consultants, though on the other side one can’t expect that the average business user will have the same skillset as an experienced BI consultant. This is in fact one of the problems of self-service BI. Probably with time and effort organization will develop such resources, though some help from BI professionals will be still needed. Without a good cooperation between the business and BI professionals an organization might not have the hoped results when investing in BI

More on Complexity

    The complexity arises when one tries to make more with the data, especially the data found in raw form. Usually the complexity of raw data can be addressed by building a logical or physical model that allows easier consumption of data. Here is the point where the users find themselves overwhelmed, because for this is required a good knowledge of the physical data model and its semantics, the technical knowledge to build models and the skills to reengineer the logic available in the source systems. These are the themes BI professionals are supposed to excel in. Talking about models, they are the most difficult to build because they reflect various segments of the business, they reflect a breakdown of the complexity. It’s also the point where many BI projects fail as the built models don’t reflect the reality or aren’t capable to answer to business questions.

    Coming back to the two columns, I have to point out that the complexity of a subject or domain can’t be judged based on how easy is to approach basic tasks. The complexity lies typically when one goes beyond the basics, when one dives into details. In case of BI its complexity starts when one attempts mixing various technologies and knowledge domains to model and solve daily business problems in an integrated, holistic, aligned, consistent and cost-effective manner. The more the technologies, the knowledge domains and constraints one has to consider, the more complex the BI landscape and solutions become.

    On the other side this doesn’t mean that the BI infrastructure can’t be simplified, that BI can’t rely heavily or exclusively on self-service BI solutions. However for each strategy there are advantages and disadvantages and one more likely has to consider both sides of the coin in the process. And self-service BI has its own trade-offs, weaknesses that can be transformed in strengths with time.


   When one considers nowadays BI tools capabilities, ad-hoc analyses are relatively easy to perform and can lead to results, though such analyses don’t equate with BI and the simplicity with which they are performed don’t necessarily imply that BI is simple as a whole. When one considers the complexity of nowadays businesses, the more one dives in various problems a business has, the more complex the BI landscape seems. In the end it’s in each organization powers to simplify and harmonize its BI infrastructure to a degree that its business goals aren’t affected negatively.

[1] Information Age (2015) 5 Myths about Intelligence, by Ben Rossi, [Online] Available from: 
[2] SolutionsReview (2015) Top 5 Business Intelligence Myths Revealed, by Timothy King, [Online] Available from:
[3] Gartner (2016) Magic Quadrant for Business Intelligence and Analytics Platforms, by Josh Parenteau, Rita L. Sallam, Cindi Howson, Joao Tapadinhas, Kurt Schlegel, Thomas W. Oestreich [Online] Available from: 
[4] Coursera (2016) Business Intelligence Concepts, Tools, and Applications MOOC, led by Jahangir Karimi, University of Colorado, [Online] Available from:

Friday, May 29, 2015

Keeping Current or the Quest to Lifelong Learning for IT Professionals


    The pace with which technologies and the business changes becomes faster and faster. If 5-10 years back a vendor needed 3-5 years before coming with a new edition of a product, nowadays each 1-2 years a new edition is released. The release cycles become shorter and shorter, vendors having to keep up with the changing technological trends. Changing trends allow other vendors to enter the market with new products, increasing thus the competition and the need for responsiveness from other vendors. On one side the new tools/editions bring new functionality which mainly address technical and business requirements. On the other side existing tools functionality gets deprecated and superset by other. Knowledge doesn’t resume only to the use of tools, but also in the methodologies, procedures, best practices or processes used to make most of the respective products. Evermore, the value of some tools increases when mixed, flexible infrastructures relying on the right mix of tools working together.

    For an IT person keeping current with the advances in technologies is a major requirement. First of all because knowing modern technologies is a ticket for a good and/or better paid job. Secondly because many organizations try to incorporate in their IT infrastructure modern tools that would allow them increase the ROI and achieve further benefits. Thirdly because, as I’d like to believe, most of the IT professionals are eager to learn new things, keep up with the novelty. Being an adept of the continuous learning philosophy is also a way to keep the brain challenged, other type of challenge than the one we meet in daily tasks.

Knowledge Sources

    Face-to-face or computer-based trainings (CBTs) are the old-fashioned ways of keeping up-to-date with the advances in technologies though paradoxically not all organizations afford to train their IT employees. Despite of affordable CBTs, face-to-face trainings are quite expensive for the average IT person, therefore the IT professional has to reorient himself to other sources of knowledge. Fortunately many important Vendors like Microsoft or IBM provide in one form or another through Knowledge Bases (KB), tutorials, forums, presentations and Blogs a wide range of resources that could be used for learning. Similar resources exist also from similar parties, directly or indirectly interested in growing the knowledge pool.

    Nowadays reading a book or following a course it isn’t anymore a requirement for learning a subject. Blogs, tutorials, articles and other types of similar material can help more. Through their subject-oriented focus, they can bring some clarity in a small unit of time. Often they come with references to further materials, bring fresh perspectives, and are months or even years ahead books or courses. Important professionals in the field can be followed on blogs, Twitter, LinkedIn, You Tube and other social media platforms. Seeing in what topics they are interested in, how they code, what they think, maybe how they think, some even share their expertize ad-hoc when asked, all of this can help an IT professional considerably if he knows how to take advantage of these modern facilities.

    MOOCs start to approach IT topics, and further topics that can become handy for an IT professional. Most of them are free or a small fee is required for some of them, especially if participants’ identity needs to be verified. Such courses are a valuable resource of information. The participant can see how such a course is structured, what topics are approached, and what’s the minimal knowledge base required; the material is almost the same as in a normal university course, and in the end it’s not the piece of paper with the testimonial that’s important, but the change in perspective we obtained by taking the course. In addition the MOOC participant can interact with people with similar hobbies, collaborate with them on projects, and why not, something useful can come out of it. Through MOOCs or direct Vendor initiatives, free or freeware versions of software is available. Sometimes the whole functionality is available for personal use. The professional is therefore no more dependent on the software he can use only at work. New possibilities open for the person who wants to learn.

Maximizing the Knowledge Value

    Despite the considerable numbers of knowledge resources, for an IT professional the most important part of his experience comes from hand-on experience acquired on the job. If the knowledge is not rooted in hand-on experience, his knowledge remains purely theoretical, with minimal value. Therefore in order to maximize the value of his learning, an IT professional has to attempt using his knowledge as much and soon as possible in praxis. One way to increase the value of experience is to be involved in projects dealing with new technologies or challenges that would allow a professional to further extend his knowledge base. Sometimes we can choose such projects or gain exposure to the technologies, though other times no such opportunities can be sized or identified.

    Probably an IT professional can use in his daily duties 10-30% of what he learned. This percentage can be however increased by involving himself in other types of personal or collective (open source or work) projects. This would allow exploring the subjects from other perspective. Considering that many projects involve overtime, many professionals have also a rich personal life, it looks difficult to do that, though not impossible.

    Even if not on a regular basis achievable, a professional can allocate 1-3 hours on a weekly basis from his working time for learning something new. It can be something that would help directly or indirectly his organization, though sometimes it pays off to learn technologies that have nothing to do with the actual job. Somebody may argue that the respective hours are not “billable”, are a waste of time and other resources, that the technologies are not available, that there’s lot of due tasks, etc. With a little benevolence and with the right argumentation also such criticism can be silenced. The arguments can be for example based on the fact that a skilled professional can be with time more productive, a small investment in knowledge can have later a bigger benefit for both parties – employee and employer. An older study was showing that when IT professionals was given some freedom to approach personal projects at work, and use some time for their own benefit, the value they bring for an organization increased. There are companies like Google who made from this type of work a philosophy.

    A professional can also allocate 1-3 hours from his free time while commuting or other similar activities. Reading something before going to bed or as relaxation after work can prove to be a good shut-down for the brain from the daily problems. Where there’s interest in learning something new a person will find the time, no matter how busy his schedule is. It’s important however to do that on a regular basis, and with time the hours and knowledge accumulate.

    It’s also important to have a focused effort that will bring some kind of benefit. Learning just for the sake of learning brings little value on investment for a person if it’s not adequately focused. For sure it’s interesting and fun to browse through different topics, it’s even recommended to do so occasionally, though on the long run if a person wants to increase the value of his knowledge, he needs somehow to focus the knowledge within a given direction and apply that knowledge.

    Direction we obtain by choosing a career or learning path, and focusing on the direct or indirect related topics that belong to that path. Focusing on the subjects related to a career path allows us to build our knowledge further on existing knowledge, understanding a topic fully. On the other side focusing on other areas of applicability not directly linked with our professional work can broaden our perspective by looking at one topic from another’s topic perspective. This can be achieved for example by joining the knowledge base of a hobby we have with the one of our professional work. In certain configurations new opportunities for joint growth can be identified.

    The value of knowledge increases primarily when it’s used in day-to-day scenarios (a form of learning by doing). It would be useful for example for a professional to start a project that can bring some kind of benefit. It can be something simple like building a web page or a full website, an application that processes data, a solution based on a mix of technologies, etc. Such a project would allow simulating to some degree day-to-day situations, when the professional is forced to used and question some aspects, to deal with some situations that can’t be found in textbook or other learning material. If such a project can bring a material benefit, the value of knowledge increases even more.

    Another way to integrate the accumulated knowledge is through blogging and problem-solving. Topic or problem-oriented blogging can allow externalizing a person’s knowledge (aka tacit knowledge), putting knowledge in new contexts into a small focused unit of work, doing some research and see how other think about the same topic/problem, getting feedback, correcting or improving some aspects. It’s also a way of documenting the various problems identified while learning or performing a task. Blogging helps a person to improve his writing communication skills, his vocabulary and with a little more effort can be also a visit card for his professional experience.

    Trying to apply new knowledge in hand-on trainings, tutorials or by writing a few lines of code to test functionality and its applicability, same as structuring new learned material into notes in the form of text or knowledge maps (e.g. concept maps, mind maps, causal maps, diagrams, etc.) allow learners to actively learn the new concepts, increasing overall material’s retention. Even if notes and knowledge maps don’t apply the learned material directly, they offer a new way of structuring the content and resources for further enrichment and review. Applied individually, but especially when combined, the different types of active learning help as well maximize the value of knowledge with a minimum of effort.


    The bottom line – given the fast pace with which new technologies enter the market and the business environment evolves, an IT professional has to keep himself up-to-date with nowadays technologies. He has now more means than ever to do that – affordable computer-based training, tutorials, blogs, articles, videos, forums, studies, MOOC and other type of learning material allow IT professionals to approach a wide range of topics. Through active, focused, sustainable and hand-on learning we can maximize the value of knowledge, and in the end depends of each of us how we use the available resources to make most of our learning experience.

Saturday, November 10, 2012

Data Quality: Data Migration’s Perspective – Part I: A Bird’s-Eye View

    Imagine you just finished a Data Migration (DM) project, everything went smoothly, the data were loaded into the new system with a minimum amount of issues, inherent sometimes to such complex projects, the users started to use the new system, everybody seemed to be satisfied, and a few weeks later within the company rumors propagate with the speed of light – “the migrated data are wrong”, “the new system can’t be used” , “IT did a bad job”, “we have to get back to the previous system”, and so on. The panic propagates, a few heads fall, the business tries to revert to the old system but there’s lot of new data available in the new system, and it’s not so trivial to move the data back to the old, in the meantime other rumors appear, and… it’s just a scenario but this could happen to any company if not the appropriate measures were taken at the right time. What could help a company when something like this happens?! A good Plan B aka a good Migration Fallback Plan/Policy, but that’s something nobody would like to do except extreme situations.

    A common approach to any type of projects as well to a DM project is to identify and mitigate the risks before or during the project. That’s something I started to do a few days ago, to prepare a list with the risks associated with DM projects. For this exercise I tried to remember what things went wrong in previous similar projects I worked on and to figure out what else could go wrong. Some online resources helped me to refresh my memory too, and I think I found also two or three things I haven’t really thought about. My attempt was primarily focused on this type of problem mentioned above – minimizing the risks of not having the right data when the new system goes live. Before jumping into the thematic I would like to sketch the bigger picture, as I perceive it.

    Having the right data when a system goes live primarily means having good Data Quality (DQ) in the target system after the data were migrated! As a DM is the best exemplification of the GIGO (Garbage-In Garbage-Out) principle, in order to have good DQ in the target is important to handle DQ latest during the DM project. That’s essential and common sense – you can’t expect to have good data in the new system when there’s lot of garbage in the old. So, a DM and a Risk Management for such a project should be built around this. In fact not having a DQ initiative or project in a DM project is one of the most important risks a company can take. Maybe in small DM, a DQ initiative isn’t necessary, though when the data are important for your company, DQ is a must! In addition DQ assessments have to be performed in alignment with the new system, and not the old. Even if the data have good quality into the old system, the quality of your data after DM will be judged in corroboration with the new system. This is a requirement that can be easily overlooked and its implications misunderstood!

    Many think that DQ is one time activity, we do it for a DM project and we’ll have quality data and never have to care about their quality anymore. Totally false! DQ has to be part of a broader strategy, call it Data Governance, Master Data Management, Data Management or any other initiative in which data plays an important role. DQ is an on going, iterative and consolidated effort, it doesn’t end after DM but continues for the whole data life-cycle, as long the data have value for an organization. It doesn’t help if the data have high quality when the data are migrated and a few weeks or months later the overall quality and trust in data decreased considerably.

    Keeping an acceptable level of DQ must be an organization’s strategy, and must be built a culture toward DQ. People need to be aware of the importance of having good quality data, and especially the consequences of having bad quality data. DQ doesn’t concern only the owners or stewards of data, or the people working with data, it concerns the whole organization because decisions are made based on those data, processes are changed and improved, an organization’s performance is often judged based on data. The quality of data is a matter of perception, on how users see the quality of data in corroboration with the needs they have, and the needs change over time. Primarily being aware what good DQ means and which are an organization’s needs in respect to data, it’s also a way of minimizing the negative perception of data, of gaining trust in data and some solid basis on which decisions can be made. Secondarily, these organizational data needs need to be addressed in a DM, they are the success factors upon which the success of a DM project is judged.

    For sure considerable costs are associated with DQ initiatives and everything related to data which doesn’t always represent a direct cost component in the products or services handled by an organization. Considering that not all data have the same importance for an organization, it makes sense to prioritize the DQ effort as a whole and the data cleaning needs in particular, the focus should be the data with the highest impact and with time to tackle data with lower and lower impact. It must be found equilibrium between the DQ costs and the value of data. Most probably is important to spend resources on raising people’s awareness in respect to DQ early rather than cleaning retroactively data later. It also make sense to invest in tools that help to clean data using automated or semi-automated methods, though some manual/visual control need to be in place too.

    DQ and the way the problems associated with it are tackled depends more on an organization’s internal kitchen – people, partners, organization, strategy, maturity, culture, geography, infrastructure, methodologies used, etc. What it matters is how the various negative and important aspects of an organization are aligned in order to take advantage of one of the most important assets an organization has is its data! For this is important to adopt methodologies that support DQ, align them and tweak them as requested, in order to make most of your data! But before or while doing that remember that a DM is an organization’s opportunity to change the quality of its data and its data strategy!

Thursday, October 04, 2012

Business Rules – An Introduction

    "Business rules" seems to be a recurring theme these days – developers, DBAs, architects, business analysts, IT and non-IT professionals talk about the necessity to enforce them in data and semantic models, information systems, processes, departments or whole organizations. They seem to affect the important layers of an organization. In fact the same business rule can affect multiple levels either directly, or indirectly through the hierarchical or networked structure of causality it belongs to. When considered all the business rules, the overall picture can become very complex. The fact that there are multiple levels of interconnected layers, with applications and implications at macro or micro level, makes the complexity to fight back because in order to solve business-specific problems often you have to go at least one level above the level where the problems were defined, or to simplify the problems to a level of detail that allows to tackled.

    The Business Rules Group defines a business rule as "a statement that defines or constrains some aspect of the business" [1], definition which seems to be closer to the vocabulary of IT people. Ronald G. Ross, in his book Principles of the Business Rule Approach, defines it as "a directive intended to influence or guide business behavior" [2], definition closer to the vocabulary of HR people. In fact the two definitions are kind of similar, highlighting the constrictor or guiding role of business rules. They raise also an important question – can everything that is catalogued as constraint or guidelines considered as a business rule? In theory yes, practically there are constraints and guidelines that have different impact on the business, so depending on context they need to be considered or not. What to consider is itself an art, which adds up to the art of problem solving.

    Besides identification, neither the definition nor management of business rules seems easy tasks. R.G. Ross considers that business rules need to be written and made explicit, expressed in plain language, independent of procedures and workflows, built on facts, motivated by identifiable and important business factors, accessible to authorized parties, specific, single sourced, managed, specified by those people who have relevant knowledge, and they should guide or influence behavior in desired ways [2]. This summarizes the various aspects that need to be considered when defining and managing business rules. Many organization seems to be challenged by this, and it can be challenging when lacks business management maturity.

    Many business rules exist already in functional and technical specifications written for the various software products built on request, in documentation of purchases software, in processes, procedures, standards, internal defined and external enforced policies, in the daily activities and knowledge exchanged or hold by workers. Sure, the formulations existing in such resources need to be enhanced and aggregated in order to be brought at the status of business rule. And here comes the difficulty, as iterative work needs to be performed in order to bring them to the level indicated by R.G Ross. For sure Ross’ specifications are idealistic, though they offer a “framework” for defining business rules. In what concerns their management, there is a lot to be done within an organization, as this aspect needs to be integrated with other activities and strategies existing in an organization.

    Often, when an important initiative, better said project, starts within an organization, then is felt in particular the lack of up-front defined and understood business rules. Such events trigger the identification and elicitation of business rules; they are addressed in documentation and remain buried in there. It is also true that it’s difficult to build a business case for further processing of business rules. An argument could be the costs associated from decisional mistakes taken by not knowing the existing rules, though that’s something difficult to quantify and make visible in an organization. In the end, most probably an organization will recognize the value of business rules when it reached a certain level of maturity.

[1] Business Rules Group (2000) Defining Business Rules - What Are They Really? [Online] Available from:
[2] Ronald G. Ross (2003) Principles of the Business Rule Approach. Addison Wesley. ISBN: 0-201-78893-4.

Saturday, June 30, 2012

Data Migration – An Introduction


    Basically, Data Migration is the movement of data from one IS (Information System), the legacy system, to a new IS, the target system, supposed to replace entirely or partially the legacy system. In the best scenario there are no differences between the two IS or the differences are minimal, negligible. In the worst scenario, there are multiple legacy systems used as source, and even multiple target systems, with important differences between them, differences that can even be translated in incompatibilities at multiple levels. Such architectures can span geographies, departments, organizations or industries; can involve a multitude of vendors, generations of systems, network types, different regulations, etc. In many Data Migrations the overall picture can be really complex, though for the sake of simplicity it’s enough to focus on the simplest scenario in which there is a single source and a single target system, with some differences between them. Abstraction can be made also of the fact that many migrations are parts of bigger projects, for example ERP implementations or any other type of applications migrations.

    Data Migration is quite a complex topic, for many appearing like a black box in which data come in and data come out. That’s valid for the typical user as well for the IT professionals who haven’t been involved in Data Migration projects. There are many books on topics that are tangent to Data Migration – Data Management, Data Quality, Data Integration or Data Warehousing. Excepting some presentations available on the Web, a few methodologies exposed by important companies, one or two books, and a few blogs, there isn’t much material available on Data Migration. The “trend” is also a reflection of the low importance given to Data Migration as subject, even if many professionals working in the field warn about the considerable impact a Data Migration can have on a project in particular, and on business in general.

    Approaching a topic like Data Migration can be, upon case, a complex task, however with a little intuition and some guidance its complexity falls apart. Often, when exploring such a topic, of help can be the 5W1H technique or its extended forms. The technique resumes to searching for answers to the “what”, “where”, “why”, “how”, “when”, “who” and “with what” questions. In case of Data Migration the questions are formulated as: what (data) to migrate, where to migrate, why to migrate, how to migrate, when to migrate, who migrates and with what to migrate?

Why to migrate?

    A Data Migration occurs as follow up of a need – an old system exists in place and can’t cope anymore with business’ growth, a company made an acquisition and the systems need to be consolidated, or the organization decided to change its infrastructure, the processes, the business model in order address nowadays business requirements like flexibility, availability, manageability, automation, cost cuts, etc. In other words a Data Migration occurs as a need for change, and it can be itself a change in what concerns technical infrastructure, process, procedures, data flow, ways of doing business. A migration has quite an impact on the business, so here is an entitled question: does it really makes sense to migrate? Why not start from 0 with the new system?!

    The migration can be a 0 point for an organization, though unless a company is starting anew, there are some data laying there in the old system(s) that need to be further available - for example open Purchase Orders that need to be fulfilled, Invoices that need to be paid, a catalog with all the Products and the available stock, information about Customers, what they bought, what they browsed or what they want to buy for Christmas, etc. At least some of the data need to be made available in one form and another also within the new architecture, if not the new system.

    The availability of old data can be solved by keeping the old system(s) in place, functional, even if the system won’t be fed with new data, or maybe it will. Keeping a system alive involves additional costs for maintaining the infrastructure – software and hardware licenses, consultants, administrators and other people responsible for the optimal work of such a system. This can become with time quite an unnecessary burden. It can be an acceptable choice for some organizations, but unlikely as best/good practice. And even if the system is kept, more likely there will be data that need to be available also in the new system. Can be discussed also about integration of the two systems, but again, does it make sense? The bottom line is that in multiple scenarios a Data Migration can prove to be the optimal solution for an organization.

What data to migrate?

    Even if it looks like a silly question, it can be one of most complex questions to answer. In theory is needed to migrate all the data, but are really needed all the data? Typically in a database can be found historical data not used anymore by the business, obsolete data marked or not for deletion, garbage data entered by mistake or remained after incomplete deletions, all these having low or no value for the business. Hopefully there are also “good data”, quintessential for the business. Somebody would say “what a hack, why do we need to philosophize so much, let’s migrate all the data!”. The decision can be understandable, though what if the percentage of “good data” is quite small in comparison with the total volume of data which can measure a few terabytes?! Sure, nowadays data centers can handle without problems terabytes of data, though there are some factors to be considered – it can be quite a challenge to migrate so many data, the volume of data affects also the performance of databases in particular, and IS in general, and a more natural reason – why store something that has minimal value for you?!

    It makes sense to migrate only the data that have value for an organization, but what data are needed then? Normally this starts by understanding what entities the business deals with and which are the attributes that characterizes them. Many of the entities can be met in organization’s daily activity, and maybe are already defined in organization’s glossary or Data Dictionary, so a review of the available inventory might do. If not, more effort needs to be spent for this purpose; activities specific to Data Discovery, Data Categorization, Data Definition or Data Profiling tasks can help after case to fill the understanding gaps. Except categorization the others are not all necessary, same as the analysis can be deep enough to serve the purpose.

    A first categorization was made above when data were considered as valuable, not valuable or in between. A second categorization can be made based on data’s usage: obsolete (not used anymore or marked for deletion), new (not used and recently entered), historical (data used in the past) and actual (data in use). A third categorization can be made on the status of the entities they represent, status that can be associated to the phase of the process the entity represent (e.g. active, inactive, open, invoices, closed, blocked, etc.). There can be considered other meaningful categorizations as long they prove to be important in identifying the useful data.

    An important categorization in migrations, in particular, and Data Management, in general, is to split data in master data, transaction data and setup data. Master data are data are data that change only seldom and have a long life (until become obsolete), are referenced through all the system, and are vital to an organization through their meaning (e.g. Customers, Suppliers, Products, Assets, Employees, Accounts, etc.). Transaction data in exchange are data that change often and have a relatively short life, typically are referenced by other transactions and can be associated with documents or movements through the system (e.g. Purchase Orders, Sales Orders, Invoices, Receipts, Assets Movements, etc.). Setup data are data used to configure a system (e.g. Transaction Types, Document Types, Roles, Permissions, etc.). This categorization deserves the full attention, because each of the three elements needs a different handling approach in migration or Data Management.

    Based on the identified categories can be considered some rough migration rules in deciding what data (actually records) to migrate, for example: - master data, unless they become obsolete, and open transactions are often considered to be migrated entirely; - historical transaction data spanning a few years back can be migrated in case they are needed in the process; - master data referenced by transaction data migrated need to be migrated too - setup data are entered manually - historical data are archived. There can be also exceptions from the rules, so such possible scenarios need to be considered too.

    Each entity is defined by multiple attributes (also called properties, dimensions). They need to go through a similar “categorization” process. In deciding what attributes to migrate is important to consider especially their role in defining the entity. Some of them define uniquely an entity (e.g. Customer Number, Product Number, Serial Number), physical characteristics of the entity (e.g. color, weight, height), categorize the entity (e.g. Category, Type) or its status (e.g. Active, Blocked, Invoiced), imply various events (e.g. Creation Date, Delivery Date, Invoice Date), and so on. It looks like another type of categorization, and it is, though it’s more difficult to create some rough rules based on it, because in the end the business dictates which Attributes are needed. In fact, most of the Attributes used (with distinct not null values) in the legacy system are more likely needed also in the new system, unless the process changed considerably, or the business is supposed to change also its model.

Where to migrate the data?

    When the Data Migration subject is brought on the table, a decision was already made about the target system. So the “where” question is partially answered, however it addresses only the peak of the iceberg. It shows that an iceberg lies there, in front of us, though under the deep of the waters there is something more, lot of questions and issues that need to be addressed. Like the source, the target needs to be further detailed in entities and their attributes; the targeted processes and procedures need to be considered together with the constraints imposed by the new system. It’s actually needed to identify the data requirements for the new systems and corroborate them with the requirements of the old system. Mapping the entities and attributes available in the two systems, process known as Data Mapping, can offer a good overview of what lays ahead, what similarities and gaps exist. There will be attributes that are available in the legacy but not in the target system, and therefore the target system needs to be extended or the data associated with the respective attributes can be left out. From the opposed perspective, there can be mandatory attributes in the target system which are not available in the organization, and therefore the associated data must be collected and/or made available for the migration. There can be cases when the data are not available in the legacy system but distributed in various other external or internal sources, so there can be an option to migrate or integrate the respective data, extend the processes to accommodate such scenarios, etc.

    Only when the mapping of data is ready and the various related questions addressed, the “where” question is fully answered. Given the continuous changes done to the target system that may still happen a few days before Go Live, Data Mapping can remain a hot topic until then.

With what to migrate?

    This question addresses the mix of tools used to migrate the data, and by extension the whole architecture developed for this purpose. As many experts point out, there is no general solution for such an approach because each migration is challenged by different requirements and architectures. ETL (Extract, Transform, Load) and Data Integration tools were mainly designed for this kind of purposes – moving data between data sources – therefore more likely the whole Data Migration architecture will be built around such a tool. In addition is needed to be addressed topics like assessment and reporting of Data Quality, Data Cleaning, Data Enrichment, Data Backup or Data Security. They will technically ensure that the data are migrated within intended level of quality and security.

    For each of these topics are available one or more tools on the market. The challenge is to find the right mixture for the overall architecture, to make them work together in an efficient and effective manner. One of the problems such tools have is that they look to the Data Migration or similar problems from their own perspective, making them hard to integrate with other tools. Given the increasing need for Data Migration, more likely exist there tools that cover most of its requirements, each with its own advantages and disadvantages. Starting with a new tool can prove to be quite challenge in itself. Many recommend following a methodology and using tools that already proved their capabilities in other projects. That’s a good approach, though need to be considered also costs, available resources, effort to build the infrastructure, the learning curve, etc. For some migrations MS Excel or Access will do, for others a more complex framework is needed. Keep in mind that there is no perfect architecture, just the architecture that will drive you to achieve your targets.

How to migrate the data?

    “How” refers mainly to the migration approach, steps, methodologies, processes and procedures used to migrate the data. Secondly, and not less important, it refers to how the mix of tools is used for migration – in other words the implementation. Despite the huge variety of tools and means of achieving the target, there can be depicted some generalities for each of these topics.

    Migration approach refers to the overall strategy considered for a migration – typically on whether the data are migrated all together, the new system becoming functional and replacing the legacy system (the big-bang migration), or the data are migrated in phases, the legacy and target systems functioning in parallel for a certain amount of time (the phased-out migration). Can be met other variations of migration approaches, under various denominations. It’s important to know the advantages and disadvantages of both or all approaches, especially in what concerns their application in your organization.

    “Steps” is just a misnomer for the actual Project Plan in which are considered the different phases and activities of such a project. In a general Data Migration project, can be discussed about Data Discovery, Data Definition, Data Collection, Data Consolidation, Data Mapping, Data Conversion, Data Transformation, Data Quality Assessment, Data Cleaning, Data Storage, etc. Some of these steps can be considered as standalone processes, sometimes being already part of the processes’ landscape existing in an organization. Other steps are just simple activities. Both types of steps share some important characteristics – they can be highly iterative and complex, are owned by the business, the IT functioning as facilitator, each of them depends on the input from other steps, and require continuous feedback, etc.

    A Data Migration is (should be) managed as any other IT project, and therefore can be discussed about project-specific methodologies like PMBOK, Prince2 or PRISM. Many of the before mentioned steps come with their luggage of methodologies too. In addition, considering that IT functions as a service, could be considered service-specific methodologies like ITIL, ISO/IEC, Six Sigma, etc.

    The actual implementation of all these depends entirely on the project’s scope, the knowledge of all those involved, the constraints met and the resources available for such a project. Many of the IT-specific problems and situations are specific across all IT projects.

Who will migrate the data?

    There is no Data Migration project that can be done without the business, the de facto owner of such a project and its output. There is lot of input needed from the business, its continuous involvement through the various stages is necessary for the whole duration. Unless the Data Migration resumes to a rudimentary tool like Excel and can be handled without too much expertise, a Data Migration needs technical resources that can elicit the requirements, translate them in technical requirements, built the infrastructure and maybe migrate the data. It entirely depends on the overall architecture and methodology what people are involved. In the best case scenario the migration will resume to one person pushing a button and the data flow as magic from source to the target system. In reality, multiple people will have to take care of migration, pushing some magic buttons in a chain of parallel and even redundant steps, monitoring and validating the process. Data owners, data stewards, data custodians, data architects, database administrators, migration and quality assurance specialists, developers, consultants and many other people can be involved, each of them playing their role.

When to migrate the data?

    Intuitively, data are or should be migrated when the target system is ready to receive the new data, thus when the development was finished, the system tested, and all the preparation for Data Migration were made. The statement is valid for any type of migration. How such a date or dates are calculated when a project starts is in itself kind of science or just a matter of needs. There are projects in which the dates for each milestone or phase are calculated back from a desired Go Live date, or projects in which the Go Live is calculated incrementally based on the steps to be performed. For dates’ calculation can be used also benchmarking from the field. The bottom line is that the data must be migrated on time for the Go Live and with a minimum disruption for the business.


    Whether standalone or as subproject of another project, a Data Migration can be or become quite a complex thematic that, through its outcomes, affects the business considerably. In the above paragraphs were considered some of the important aspects of such a project, the focus being more on figuring out what a migration implies rather than a detailed exploration. It’s also a mental exercise and an invitation into the thematic.

Sunday, June 03, 2012

Data Migration – What is Data Migration?

    If you are working in a data-centric business it’s almost impossible for the average worker not to have heard this term, even tangentially. Considering the meaning of “migration” - the act or process of moving from one place to another - the intuition might even tell what data migration is about: the process of moving data from one place to another. It’s pretty basic, isn’t it? Now as data are moved over and over again between various places, for example the various layers of an applications, between databases, between media storage devices, and so on, we need some precision in defining the term because not all these can be considered as data migration examples. Actually we can talk about data copying or data movement without speaking of data migration. So, what is data migration? Here are a few takes on defining data migration:

    “process of transferring data from one platform or operating system to another” (Babylon)

   "Data migration is the process of transferring data between storage types, formats, or computer systems." (Wikipedia)

    "Data migration is the movement of legacy data to new media and technologies as the older ones are displaced." (Toolbox)

    “The purpose of data migration is to transfer existing data to the new environment.” (Talend)

    “Data Migration is the process of moving data from one or more sources into a target application” (Utopia Inc.)

    “[…] is the one off selection, preparation and transportation of appropriate data, of the right quality, to the right place, at the right time.(J. Morris)

    Resuming the above definitions, data migration can be defined as “the process of selecting, assessing, converting, preparing, validating and moving data from one or more information systems to another system”. The definition isn’t at all perfect, first of all because some of the terms need further explanation, secondly because any of the steps may be skip or other important steps can be identified in the process, and thirdly because further clarifications are needed. Anyway, it offers some precision, and at least for this reason, could be preferred to the above definitions.

    So, resuming, data migration supposes the movement of data from one or more information systems, referred as source systems, to another one, the target system. Typically the new system replaces the old systems, they being retired, or they can continue to be used with reduced scope, for example for reporting purposes or . Even if performed in stages, the movement is typically one time activity, so everything has to be perfect. That’s the purpose of the other steps – to minimize the risks of something going wrong. The choice of steps and their complexity depends on the type of information systems involved, on the degree of resemblance between source and target, business needs, etc.

    As mentioned above, not everything that involves data movement can be considered as data migration. For example data integration involves the movement and combination of data from various information systems in order to provide a unified view. Data synchronization involves the movement of data in order to reflect the changes of data in one information system into another, when data from the two systems need to be consistent. Data mirroring involves the synchronization of data, though it involves an exact copy of the data, the mirroring occurring continuously in real time. Data backup involves the movement/copy of data at a given point in time for eventual restore in case of data loss. Data transfer refers to the movement of row data between the layers of information systems. To make things even fuzzier, these types of data movements can be considered in a data migration too, as data need to be locally integrated, synchronized, transferred, mirrored or back up. Data migration is overall a complex thematic.