
Introduction

What is data governance?

Citing from this great video:

Making sure your organization's data is in the right condition needed to succeed with business initiatives and business operations

I like that definition because the first association many people have with "data governance" is blocking access to data for everyone except a senior database administrator or the architect who happened to be first on the project. But the purpose of data governance as defined above is quite the opposite. Data needs to be available to fuel the business, yet still in a managed (governed) way.

What problems does data governance solve?

Any organization, regardless of size, will quickly face huge problems if anyone in the company can read and write any dataset. Data governance practices are needed to establish a sustainable, business-efficient, and secure balance between data locked in silos at one end of the spectrum and data anarchy at the other.

Legacy architectures

In my experience, data locking is a typical problem in older parts of the IT landscape that are based mainly on monolithic but still business-critical systems, with database schemas of thousands of tables and columns whose meaning is unclear. Locking is not only a technical access problem (long waiting times to get access approved or a connection established) but, more often, an inability to interpret that data outside the codebase of complicated legacy systems.

Data governance can help by providing secure ways of working to share data with consumers often running on a modern tech stack, aligning on data semantics and sources of truth, and supporting data quality management.

Modern architectures

Problems with data are not limited to legacy systems. Modern, cloud-based, distributed, heterogeneous architectures have their own challenges. Individual services often optimize data only for their own needs, locking valuable information or processing transient events without capturing durable facts. These environments frequently suffer from data duplication and inconsistent information across services, with no clear source of truth. Defining bounded contexts in evolving systems is difficult, and attempts to synchronize data across services are hard to implement reliably in a distributed setting.

Data governance can help by defining clear ownership, well-defined contracts, and aligned data flows across different services.
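To make "clear ownership" and "well-defined contracts" more concrete, here is a minimal sketch of a data contract expressed in code. Everything in it is an illustrative assumption rather than a standard: the `DataContract` and `FieldSpec` classes, the `customer` dataset, the `crm-team` owner, and the field names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field a producing service promises to deliver (hypothetical)."""
    name: str
    dtype: type
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    """A dataset, its accountable owner, and the fields consumers can rely on."""
    dataset: str
    owner: str      # team accountable for the dataset
    version: str
    fields: tuple[FieldSpec, ...]

    def violations(self, record: dict) -> list[str]:
        """Return human-readable contract violations for a single record."""
        problems = []
        for spec in self.fields:
            value = record.get(spec.name)
            if value is None:
                if spec.required:
                    problems.append(f"missing required field: {spec.name}")
                continue
            if not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
        return problems


# Hypothetical contract for a customer dataset owned by a CRM team.
customer_contract = DataContract(
    dataset="customer",
    owner="crm-team",
    version="1.0",
    fields=(
        FieldSpec("customer_id", str),
        FieldSpec("email", str),
        FieldSpec("segment", str, required=False),
    ),
)

print(customer_contract.violations({"customer_id": 42}))
```

A producing service could run such a check before publishing records, and consumers could read the same definition to learn which fields they may depend on. In practice the contract would more likely live in a schema registry or catalog than in application code.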

Roles

Collaboration between business and IT is crucial to ensure that data is indeed in the right condition to support business initiatives.

Data Steward

  • Business role
  • Has detailed knowledge of what data is needed
  • Evaluates the quality of data
  • Takes action on data quality issues
  • Works daily on projects that need and collect data
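As a rough illustration of what "evaluating the quality of data" can look like once IT provides tooling, the sketch below counts missing values and duplicate keys in a small set of records. The function name `quality_report`, its parameters, and the sample records are hypothetical; real quality rules would come from the Data Steward's knowledge of the business.

```python
def quality_report(records, key, required):
    """Summarize basic quality indicators for a list of dict records.

    key      -- name of the field expected to be unique per record
    required -- names of fields that should never be empty
    """
    # Count records where a required field is absent, None, or an empty string.
    missing = {
        col: sum(1 for r in records if r.get(col) in (None, ""))
        for col in required
    }

    # Count records whose key value was already seen earlier in the list.
    seen = set()
    duplicate_keys = 0
    for r in records:
        k = r.get(key)
        if k in seen:
            duplicate_keys += 1
        seen.add(k)

    return {
        "rows": len(records),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical sample data with one duplicate id and two empty emails.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": ""},
    {"id": 2},
]
report = quality_report(sample, key="id", required=["email"])
print(report)
```

A report like this gives the Data Steward something concrete to act on, for example by asking the producing team to fix the duplicate records at the source.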

Data stewards are found, not made

Typically, there are already people in the business performing these activities.

Data Owner

  • Business role
  • Executive-level, policy-level
  • Has the final say on policies: who can access what
  • Resolves conflicts on definitions: what is a customer, what is revenue, etc.
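The "who can access what" decision can be captured as an explicit policy that tooling then enforces. The following is a minimal sketch under assumed names: `AccessPolicy`, the `revenue` dataset, and the role names are hypothetical and not tied to any particular product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Access rules for one dataset, decided by its Data Owner (hypothetical)."""
    dataset: str
    owner: str
    # Maps a role name to the set of actions that role may perform.
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, role: str, action: str) -> None:
        """Record the owner's decision to allow an action for a role."""
        self.grants.setdefault(role, set()).add(action)

    def is_allowed(self, role: str, action: str) -> bool:
        """Deny by default: only explicitly granted actions are allowed."""
        return action in self.grants.get(role, set())


# Hypothetical policy: analysts may read revenue data, only finance may write it.
policy = AccessPolicy(dataset="revenue", owner="cfo-office")
policy.grant("analyst", "read")
policy.grant("finance", "read")
policy.grant("finance", "write")

print(policy.is_allowed("analyst", "read"))
print(policy.is_allowed("analyst", "write"))
```

The deny-by-default choice keeps data governed, while the explicit grants make it clear to consumers how they can get access instead of leaving data locked.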

IT

  • Helps navigate all the systems that produce and consume data
  • Enables Data Stewards with the right tools and technologies for data quality management, master data management, data access control, etc.

Different roles from the IT organization may be involved to support those needs at different stages. In particular, when Data Governance practices are not yet established, ideally there should be a central role (working cross-product, cross-domain), such as Enterprise Architecture, to establish ways of working and define necessary capabilities.

Then, capabilities for data governance would be implemented by platform-level data teams. This may involve establishing generic tools like DataHub or Amazon DataZone.

Individual datasets will be contributed and consumed by relevant teams, according to the enterprise-wide data flow strategy, technologies used, and business-aligned sources of truth for data.

Working backwards

Data Governance should be established starting from business needs and then working backwards to the capabilities and datasets needed to support them.

In practice, it is often the IT organization that first needs to educate the business about the value of better data management and how it can impact the business. From a business perspective, it is not always obvious that problems such as poor operational performance, slow delivery, and a slow pace of innovation are linked to data governance problems.

Especially in the AI era, data is true gold: fuel for innovation and a real competitive advantage. But this holds only if the data is available (e.g., not locked in legacy, siloed systems), is of good quality (can be understood and provides reliable information), and can be securely processed (following compliance rules and minimizing the risk of data breaches).

Sources