Kaavio Musings

From Manual Enrichment to Market-Ready Content

Written by Derek Gregg | 2026

Customer Story

One of Canada's largest integrated distributors of plumbing, HVAC, and waterworks products for the construction industry

600k+ SKUs

2,000+ Suppliers

300+ Locations

A mission-critical process that couldn't scale

As customer expectations have shifted towards self-service, speed, and accuracy, our customer has invested in digital infrastructure and experiences. But they found themselves constrained by a problem that nearly every large technical distributor eventually hits: product data had become a bottleneck for growth.

They invested in improving product data through a variety of strategies: supplier engagement, data syndication, data brokers, software, and more. They paired these strategies with a content team that is experienced, thoughtful, and deeply familiar with their categories, but progress remained slow.

"A large portion of content comes from manual effort"
— Senior Product Analyst

The costs were concrete: catalog enrichment relied on spreadsheets, manual labor, and outsourced web scraping.

Poor product content didn't just affect e-commerce; it also limited their ability to support counter sales, showroom teams, inside sales, quoting, and more.

"The ecommerce system is downstream from our PIM, but the PIM also feeds other systems. All of that depends on enriched data. Product data has really become the lifeblood of the company."
— IT Consultant

Product content is a workflow problem, not a one-time data project

As they evaluated their options, one realization became unavoidable: product content was not a data acquisition problem. It was a human workflow problem.

Each product required interpretation. Teammates had to research the product, determine which attributes mattered, normalize values, and resolve ambiguity. Repeating that work across hundreds of thousands of SKUs, suppliers, and product categories made the process time-consuming and difficult to scale through manual effort alone.

The core challenge was scale. They needed to dramatically increase the amount and quality of product content they produced, without simply hiring a much larger team or forcing suppliers to conform to yet another rigid data standard.

"A lot of this has been done by hand with best effort. We need something that can understand which attributes matter for which products, and do that at scale."
— IT Consultant

To make this workflow viable long-term, they identified a set of concrete requirements:

  • Increase output 2-3x without adding headcount. The existing enrichment team was already operating at capacity. Any approach that scaled linearly with people was not viable.
  • Normalize attributes across heterogeneous categories. The system needed to handle category-specific complexity without requiring custom configuration or manual setup for each new product type.
  • Operate effectively on poor, partial, and unstructured inputs. Many suppliers lacked clean feeds, consistent schemas, or modern digital catalogs. The system needed to work directly with PDFs, fragmented web content, and incomplete internal records, without waiting for supplier compliance.
  • Produce high-confidence structured data usable across the business. Enriched data needed to support not just e-commerce, but search and filtering, pricing logic, internal sales and showroom tools, and emerging automation initiatives such as email-to-order workflows.

Taken together, these requirements ruled out incremental fixes. Hiring more analysts, relying more heavily on suppliers, or purchasing another static data source would not change the underlying constraint.

What they needed was a fundamentally different approach to product content, one that could scale human judgment without scaling human effort.

Building an AI-native, learning system for managing product content

Rather than treating product content as something to outsource or "turn on," we worked together to design a workflow that reflected how their catalog actually functioned, and how it needed to scale.

The collaboration focused on turning messy, fragmented inputs into trusted product understanding, without disrupting the customer's existing systems or forcing suppliers to change behavior.

Product truth lives across many sources

The customer brought deep domain knowledge about how their products were documented in the real world. Kaavio brought tooling to work with that reality at scale.

Together, the teams accepted a critical constraint: for many technical products, the most accurate source of truth is still a PDF spec sheet, not a clean data feed.

The workflow intentionally combined:

  • Supplier-provided structured data (when available)
  • Manufacturer websites
  • PDF catalogs and technical documentation
  • The customer's existing internal product records

Rather than declaring a single "golden source," the system evaluated all sources together, allowing facts to emerge through synthesis.
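To make the idea of synthesis across sources concrete, here is a minimal sketch. It is not Kaavio's implementation: the source names, the attribute names, and the simple majority-vote rule are all assumptions chosen for illustration. It shows one way several sources can be weighed together, with the level of agreement kept alongside each value rather than trusting one source outright.

```python
from collections import Counter, defaultdict


def synthesize(observations):
    """Pick, per attribute, the value most observations agree on.

    `observations` is a list of (source, attribute, value) tuples.
    Majority voting is an illustrative assumption, not the vendor's method.
    """
    by_attribute = defaultdict(list)
    for source, attribute, value in observations:
        # Light normalization so "Brass" and "brass " count as the same value.
        by_attribute[attribute].append((source, value.strip().lower()))

    result = {}
    for attribute, pairs in by_attribute.items():
        counts = Counter(value for _, value in pairs)
        value, votes = counts.most_common(1)[0]
        result[attribute] = {
            "value": value,
            # Share of observations that support the chosen value.
            "agreement": votes / len(pairs),
            "sources": sorted(s for s, v in pairs if v == value),
        }
    return result


# Hypothetical observations for a single SKU.
observations = [
    ("supplier_feed", "connection_size", "3/4 in"),
    ("spec_sheet_pdf", "connection_size", "3/4 in"),
    ("internal_record", "connection_size", "1/2 in"),
    ("spec_sheet_pdf", "material", "Brass"),
]
facts = synthesize(observations)
```

In this sketch, a value supported by only some of the sources keeps a lower agreement score, which a downstream step could use to decide what needs a closer look.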

Use the customer's existing catalog as the learning backbone

A key decision was to start from the customer's catalog as it existed, rather than asking the team to define rules.

With access to the full catalog, Kaavio's system could:

  • Detect which attributes were already populated for different product categories
  • Learn patterns from the customer's prior enrichment work
  • Reinforce consistency based on what had historically worked inside the customer organization

"With the full catalog, we can use the customer's own data, which massively improves consistency and data quality."
— Stephen Perkins, Lead Engineer at Kaavio

This turned the customer's past enrichment effort into an asset.
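One of the capabilities listed in this section, detecting which attributes are already populated for each product category, can be sketched in a few lines. The record layout, the category and attribute names, and the 50% threshold below are hypothetical; the source does not describe how Kaavio's system does this.

```python
from collections import defaultdict


def attribute_fill_rates(catalog):
    """Return {category: {attribute: share of products with a non-empty value}}."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for product in catalog:
        category = product["category"]
        totals[category] += 1
        for attribute, value in product["attributes"].items():
            if value not in (None, ""):
                filled[category][attribute] += 1
    return {
        category: {attr: count / totals[category] for attr, count in attrs.items()}
        for category, attrs in filled.items()
    }


def expected_attributes(catalog, threshold=0.5):
    """Attributes populated for at least `threshold` of a category's products."""
    rates = attribute_fill_rates(catalog)
    return {
        category: sorted(attr for attr, rate in attrs.items() if rate >= threshold)
        for category, attrs in rates.items()
    }


# Hypothetical catalog records.
catalog = [
    {"category": "ball_valves", "attributes": {"connection_size": "3/4 in", "material": "brass"}},
    {"category": "ball_valves", "attributes": {"connection_size": "1/2 in", "material": ""}},
    {"category": "ball_valves", "attributes": {"connection_size": "1 in", "pressure_rating": "600 psi"}},
    {"category": "furnaces", "attributes": {"btu_input": "80000"}},
]
expected = expected_attributes(catalog)
```

The point of the sketch is that the existing catalog already encodes which attributes matter for which categories, so that knowledge can be read from the data instead of being written down as rules.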

Create a shared review and deployment loop

Rather than handing off raw output, the teams designed a review-and-approval loop that fit the customer's operational reality.

  • Kaavio handled large-scale research and synthesis, and flagged content that needed additional review.
  • The customer's team reviewed changes in batches, focusing attention where judgment mattered.

This ensured:

  • Human accountability without human bottlenecks
  • Confidence in data used across the organization
  • A workflow that could run continuously, not episodically
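As a rough illustration of such a loop, the sketch below orders proposed changes so that the least certain ones come first, marks the ones that need attention, and groups them into review batches. The confidence score, the 0.8 threshold, the batch size, and the field names are all assumptions; the source does not say how content is flagged.

```python
def build_review_batches(proposals, threshold=0.8, batch_size=2):
    """Group proposed changes into batches, lowest confidence first.

    Each proposal is a dict with at least "sku" and "confidence" keys
    (hypothetical fields). Items below `threshold` are marked for attention.
    """
    ordered = sorted(proposals, key=lambda p: p["confidence"])
    marked = [
        {**p, "needs_attention": p["confidence"] < threshold} for p in ordered
    ]
    return [marked[i:i + batch_size] for i in range(0, len(marked), batch_size)]


# Hypothetical proposed changes.
proposals = [
    {"sku": "A100", "attribute": "material", "confidence": 0.95},
    {"sku": "B200", "attribute": "connection_size", "confidence": 0.40},
    {"sku": "C300", "attribute": "pressure_rating", "confidence": 0.70},
]
batches = build_review_batches(proposals)
```

Ordering by confidence is one simple way to put reviewer time where judgment matters most while every change still passes through a human-approved batch.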

From manual enrichment to market-ready content at scale

While the initiative is still in early stages, the customer team has already seen clear, qualitative shifts in how product content work gets done, and what is now possible at scale.

Early impact on workflow and team leverage

Before Kaavio, enrichment work required analysts to manually locate, interpret, and enter product information, often repeating research that already existed somewhere else.

In early reviews of Kaavio-generated output, the customer team saw a dramatic change in how much work was required before human review even began.

"When we were reviewing the sheets...the amount of data we didn't have to go and get, and that was clean, correct and in the sheet — astronomical. So have you saved us time? Yes. You've moved us over into a quality check position instead of a doing position."
— Senior Product Analyst

This marked a fundamental shift in the role of the customer's product data team:

  • Less time spent searching for information
  • More time spent validating, correcting, and applying judgment
  • A workflow designed to scale without linear headcount growth

Early signals on data quality and consistency

Another early indicator came from data accuracy and edge cases. As they pushed the system with difficult products and incomplete inputs, the team saw fewer breakdowns than expected.

"We are trying very hard to stump you, and we have not yet been able to do that."
— Product & Pricing Department Manager

This reinforced a key insight from the collaboration: using the customer's existing catalog as context significantly improved consistency and reduced rework, even across complex and heterogeneous categories.

A foundation for broader impact

Although e-commerce was the initial driver, early results confirmed that improvements to product content flow through the organization.

Because enriched data feeds the customer's PIM and downstream systems, the team now sees a clear path toward:

  • Increasing e-commerce's contribution to revenue
  • Improving SEO and product discoverability
  • Expanding automation across sales and operations
  • Supporting deeper, category-specific attributes without adding operational complexity
  • Scaling enrichment beyond the initial vendor set

As they plan for future growth, product data is no longer viewed as the primary constraint. It has become a strategic asset, supported by a workflow that can evolve over time.