Data Lineage

Lead Designer • 2021 - present • IBM

Trust your data with visibility into your data pipeline

Overview

Data Lineage is an interactive visualization of your data supply chain that enables enterprise teams to understand how data flows across complex systems — from source to consumption — with confidence and clarity.

Over several years, this work evolved from deprecating a legacy lineage system, to rebuilding a business-first lineage experience, to integrating and eventually unifying Manta’s advanced technical lineage through acquisition. The result is a scalable, automated, and explorable lineage system that supports both business and technical users, improves trust in data, and enables critical workflows such as impact analysis, governance, and compliance.

Impact

Single, interactive interface

Visualize enterprise-wide data in a customizable graph that supports both technical and non-technical users, enabling flexible exploration across complex data ecosystems.

Integrated and enriched with business context

Understand your data through business metadata embedded directly in the lineage graph, and quickly access lineage from anywhere via platform integrations.

Wide spectrum of users and use cases

Leverage lineage across a wide range of users — from data providers to data consumers — and support use cases such as impact analysis, root cause analysis, and compliance.

Automated lineage scanning

With more than 75 technology scanners, automated lineage scanning continuously ingests metadata from multiple data sources to deliver comprehensive, up-to-date lineage.

Delivery

GA release

IBM Cloud Pak for Data 5.1 on SaaS and on premises

Recognition

OTAA 2024

Outstanding Technical Achievement Award for Data Lineage

DUX review score

B- (Good)

Highest-scoring sections: usability, onboarding, and use (2024)

800 hours of manual effort reduced to just 7 hours for cloud dependency mapping

Large North American Bank

Audit reporting cycles shortened from weeks to less than a day

Large North American Bank

Saved over $2 million by eliminating the need to hire 35+ professionals

Leading healthcare company

Context

Problem

Relying on data without clear visibility into its origins and transformations exposes organizations to unpredictable risk, costly errors, and compliance failures.

Legacy lineage tools attempted to increase transparency into data flows but consistently fell short. They produced dense, uncontrolled visualizations that were difficult to interpret, performed poorly at enterprise scale, and were highly technical, making them inaccessible to business users who needed them for compliance reporting and data discovery.

At the same time, organizations were forced to manually assemble lineage information from disparate sources. This labor-intensive process was slow, error-prone, and difficult to maintain, further undermining trust in data and increasing exposure to regulatory and operational risk.

Together, fragmented lineage data and unusable analysis tools left enterprises spending excessive time and effort to achieve basic compliance — while still lacking confidence in how their data was being used.

Users

Data engineers understanding upstream and downstream effects

Data scientists validating transformations and diagnosing issues

Compliance officers tracing data usage for regulatory and AI governance

Data analysts conducting impact analysis and reporting

Data stewards understanding at a high level how their systems process and manage data

Use cases

  1. Impact analysis & change management: Data engineers and data stewards understand downstream impact before making changes to data pipelines, schemas, or systems.

  2. Root cause analysis & troubleshooting: Data engineers and data scientists trace issues back to their upstream source, reducing time spent diagnosing data quality and pipeline failures.

  3. Data trust & validation: Data analysts and compliance officers gain visibility into how data was sourced and transformed, to assess whether data is compliant and fit for use.
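To make the first two use cases concrete, here is a minimal sketch of how impact analysis and root cause analysis can be expressed as graph traversals. It is not the product's implementation: the asset names, the EDGES list, and the reachable helper are all hypothetical and exist only for illustration. Impact analysis walks the edges downstream from an asset, and root cause analysis walks them upstream.

```python
from collections import deque

# Hypothetical lineage edges (source asset, target asset); illustrative only.
EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("erp.orders", "warehouse.fact_orders"),
    ("warehouse.dim_customer", "reports.churn_dashboard"),
    ("warehouse.fact_orders", "reports.churn_dashboard"),
]


def build_adjacency(edges, reverse=False):
    """Map each asset to its direct neighbors, optionally flipping edge direction."""
    adjacency = {}
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, set()).add(target)
    return adjacency


def reachable(edges, start, upstream=False):
    """Return every asset reachable from start (excluding start itself).

    upstream=False follows data flow (impact analysis);
    upstream=True follows it backwards (root cause analysis).
    """
    adjacency = build_adjacency(edges, reverse=upstream)
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Impact analysis: what is affected if staging.customers changes?
downstream = reachable(EDGES, "staging.customers")

# Root cause analysis: what feeds the churn dashboard?
upstream = reachable(EDGES, "reports.churn_dashboard", upstream=True)
```

A breadth-first walk is used here only because it is simple to read. A real lineage system at enterprise scale would likely need paging, filtering, and cycle handling beyond this sketch.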

My role

As the lead product designer, I led the multi-year evolution of data lineage ingestion and visualization, balancing long-term system thinking with pragmatic delivery through organizational and technical change:

  • Led UX design across multiple lineage initiatives over several years

  • Partnered closely with engineering on performance and scalability constraints

  • Collaborated with research to validate mental models and interaction patterns

  • Helped align design decisions across acquisitions

Project stakeholders

  • Product Management

  • Engineering

  • UX Design

  • UX Research

  • Content Design

  • Enterprise clients and users

Method

Research → Concept → Launch (2022) → Acquisition → Concept → Iteration → Launch (2024) → Iteration

2021

Build & Introduce

Began to transition away from the legacy Information Governance Catalog (IGC) lineage by:

  • Building a new business data lineage in Watson Knowledge Catalog, later renamed IBM Knowledge Catalog (IKC)

  • Introducing Manta, the leading independent data lineage vendor at the time, as an OEM in IKC

2022

Launch & Integrate

Designed and launched the IKC business data lineage experience (a new, summarized, business user-first lineage model) in June with Cloud Pak for Data (CPD) 4.5.

Integrated Manta further for automated scanning and advanced visualization for technical users.

2023

Enhance & Acquire

Enhanced the IKC lineage experience with support for more lineage metadata.

In December, IBM acquired Manta with the vision to merge the strengths of Manta’s technical lineage with IKC’s business data lineage in a new, unified experience.

2024

Rebuild & Launch

Designed a unified, performant lineage visualization for business and technical users within 10 months: 6 sprints, 2 design milestone reviews, and 1 DUX review.

Launched in October with CPD 5.1. The GA product:

  • Integrated business context from IKC

  • Leveraged automated scanning from Manta

2025

Strengthen & Scale

Scaled Data Lineage for enterprise use by strengthening automated ingestion and deepening analytical capabilities:

  • More complete lineage with expanded data source scanner support, agent management, and alias assignment

  • Deeper analysis with column-level lineage and historical lineage
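As a rough illustration of how column-level lineage relates to the summarized views described elsewhere in this case study, the sketch below rolls hypothetical column-level edges up into table-level edges. The column names, the COLUMN_EDGES list, and the roll_up helper are assumptions made for this example and do not reflect how the product stores or aggregates lineage.

```python
# Hypothetical column-level edges in "table.column" form; illustrative only.
COLUMN_EDGES = [
    ("customers.first_name", "dim_customer.full_name"),
    ("customers.last_name", "dim_customer.full_name"),
    ("dim_customer.full_name", "dim_customer.display_name"),
    ("dim_customer.full_name", "churn_report.customer"),
]


def table_of(column):
    """Return the table part of a 'table.column' identifier."""
    return column.rsplit(".", 1)[0]


def roll_up(column_edges):
    """Collapse column-level edges into a sorted list of unique table-level edges.

    Edges that stay within one table are dropped, since they add no
    information at the table level.
    """
    table_edges = set()
    for source, target in column_edges:
        source_table, target_table = table_of(source), table_of(target)
        if source_table != target_table:
            table_edges.add((source_table, target_table))
    return sorted(table_edges)


table_edges = roll_up(COLUMN_EDGES)
```

The same idea could be applied again to roll tables up into schemas or systems, which is one way to think about moving between detailed and high-level views of the same lineage.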

Research and validation

IKC Business Data Lineage

We performed 2 phases of research:

  • Competitive analysis of lineage tools (Project Gemini)

  • Foundational research to understand expectations via 5 sponsor user interviews with data analysts and business users

Our research revealed that legacy lineage visualizations like the IGC lineage were overwhelming, slow to render, difficult to analyze, and inaccessible to business users. What users needed was greater flexibility and customizability.

Flexibility
Users need to move fluidly between levels of detail

Enterprise users shift between high-level and detailed views depending on context, rather than staying at one level.

Performance and scale are table stakes for trust

Slow rendering, static diagrams, and limited scalability undermined confidence in lineage.

“It’s important to have the flexibility to shuffle between different view levels and get a view from them quickly.”

Data analyst

Customizability
Lineage must be configurable to match organizational needs

Users expected to be able to adapt lineage views to reflect how their organization defines and reasons about data.

Business users need tailored, simpler views

Research highlighted a strong need to translate technical lineage into business-friendly representations.

“It depends a lot on the context of why you’re launching that view. That’s why it’s important to customize it.”

Business analyst / consultant

These insights directly informed the design of the new business lineage experience in IKC, which started from high-level summarized views, incorporated business metadata into the lineage graph, and was tightly integrated with the IKC platform.

When shown the new UI, those same sponsor users expressed excitement to start using it.

“This is solid. The evolution is early. We’ll complain once we have our hands on it.”

Sponsor User

Data Lineage (IBM Manta Data Lineage)

We performed 2 phases of research:

  • Secondary research to understand key personas

  • Concept testing to identify core pain points

Overall, data engineers and data analysts gave the concept a business value rating of 4 / 4. Both user groups found the end-to-end flows — from the landing page into the lineage viewer — easy to comprehend and convenient to navigate.

Additionally, users provided feedback about their expectations by referencing prior experience with other tools and familiar interaction patterns, such as right-click actions. This informed our next design enhancements.

4/4

Business value rating

9

Clients

12

Findings

“This is more interactive than Collibra - When you hover you know and you can easily manage on the side menu. In Collibra, you have to drag and drop and have to do multiple steps.”

Data Engineer

Constraints and complexity

  • Enterprise scale and performance with thousands of assets and relationships

  • Heterogeneous data ecosystems of various technology types and environments to support and visualize

  • Automated vs. manual ingestion reliability with gaps that must be identified and reconciled

  • Business and technical mental models that vary widely among users who all engage with the same interface

  • Legacy system deprecation that required tactful transition and parity planning to avoid major disruptions

  • Organizational change and acquisition that required robust alignment across differing visions, design systems, technical constraints, and ways of working

  • Incremental delivery across fast release cycles, which felt like sprinting a marathon

  • Regulatory and compliance requirements that demand accurate lineage reports across time and additional metadata in the graph

Outcome

  • Successfully designed and delivered lineage through multiple organizational transitions, including OEM integration and acquisition

  • Helped establish a model approach for post-acquisition design integration, later shared across other M&A teams

  • Elevated lineage from a specialist, technical tool to a shared enterprise capability

Delivery

IKC Business Data Lineage launched with CPD 4.5 in June 2022, replacing legacy IGC lineage with a clearer, business-first model.

Manta OEM integration enabled early access to advanced technical lineage and accelerated deprecation of legacy tooling.

Data Lineage (IBM Manta Data Lineage) launched GA with CPD 5.1, on cloud in October 2024 and on premises in December 2024.

UX Quality

DUX Review (Oct 2024):

  • Overall score: B- (Good)

  • Strengths: Usability, Get started, Use

  • Identified accessibility as the primary improvement area

Recognition

IKC Business Data Lineage

3 international design awards

  • Red Dot

  • iF

  • German Design Award

Data Lineage (IBM Manta Data Lineage)

Outstanding Technical Achievement Award (OTAA) — Nov 2024

“The flow of information… in which the lineage is generated and viewed is making sense to me. I like the different places where we can configure and the user has control at all stages.”

Data Engineer