
Anchored Engagement · Case Study

Catalog Intelligence Platform

Industry: B2B Marketplace · Catalog AI

Scope: AI Engineering · Data Quality · Pipeline

Stack: Hybrid Classification · Embedding Search · LLM Ensemble

Background

This case study describes a catalog intelligence capability anchored to TrueLeaf Tech's engagement with a major B2B marketplace operator in India. The client connects millions of buyers and suppliers across an enormous range of product categories, so the integrity of its catalog is effectively the integrity of the business.

Anchored to a confidential client engagement. The B2B marketplace operator whose work this case study draws from operates at a scale that makes catalog management one of the most consequential engineering problems in the business. Specific implementation details have been generalised to honour the engagement's confidentiality.

The problem we were solving

B2B marketplaces face a particular flavour of catalog challenge. Supplier-provided listings are inconsistent in quality. Product naming conventions vary widely. Categorisation is imprecise. Duplicates proliferate. Search relevance suffers, buyer experience degrades, and the marketplace's ability to make accurate matches between buyers and suppliers — the central function of the business — erodes over time.

The traditional approach to this problem is heavy on manual review and rule-based cleanup. This works at small scale and breaks down completely at the catalog sizes typical of large B2B marketplaces. The brief was to build an AI-augmented catalog intelligence layer that could classify, normalise, deduplicate, and enrich listings at the rate they were being added, with quality high enough that downstream search and matching genuinely improved.

What we built

The classification and normalisation layer

Every new listing enters the system as raw, supplier-provided text and attribute data. The first stage of the pipeline classifies it into the marketplace's taxonomy — often several levels deep — and normalises its attributes against a canonical schema for that category. This is harder than it sounds because the canonical schema for "industrial pumps" looks nothing like the canonical schema for "office furniture," and the classifier needs to handle both correctly.

We built the classification layer as a hybrid system: a fast rule-based pre-pass that handles the easy cases, an embedding-based similarity pass that handles the medium cases, and an LLM-based deep classifier that handles the genuinely ambiguous cases. Each pass is faster and cheaper than the one after it, so the system applies them in order and escalates only when the cheaper pass cannot produce a confident answer. The disciplines behind this approach are described in more detail in our writing on retrieval pipelines.
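The escalation logic can be sketched in a few lines of Python. This is a minimal illustration, not the engagement's implementation: the tier functions, the `EXEMPLARS` vocabulary, the category names, and every threshold are hypothetical, and the "embedding" tier is a token-overlap stand-in for a real vector similarity lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Classification:
    category: str
    confidence: float
    tier: str  # name of the pass that produced the result


# A tier maps listing text to (category, confidence), or None to abstain.
Tier = Callable[[str], Optional[Tuple[str, float]]]


def rule_tier(text: str) -> Optional[Tuple[str, float]]:
    # Hypothetical keyword rule for an unambiguous case.
    if "centrifugal pump" in text.lower():
        return ("industrial-pumps", 0.99)
    return None


# Hypothetical per-category vocabulary standing in for an embedding index.
EXEMPLARS = {
    "industrial-pumps": {"pump", "impeller", "flow", "submersible"},
    "office-furniture": {"desk", "chair", "ergonomic", "cabinet"},
}


def embedding_tier(text: str) -> Optional[Tuple[str, float]]:
    # Jaccard overlap as a cheap proxy for embedding similarity.
    tokens = set(text.lower().split())
    best, score = None, 0.0
    for category, vocab in EXEMPLARS.items():
        overlap = len(tokens & vocab) / len(tokens | vocab)
        if overlap > score:
            best, score = category, overlap
    return (best, score) if best is not None else None


def llm_tier(text: str) -> Optional[Tuple[str, float]]:
    # Placeholder for a model call; returns a fixed low-confidence guess.
    return ("uncategorised", 0.5)


# Ordered cheapest first: (name, tier function, minimum confidence to accept).
TIERS = [
    ("rules", rule_tier, 0.95),
    ("embedding", embedding_tier, 0.30),
    ("llm", llm_tier, 0.70),
]


def classify(text: str, tiers: Sequence = TIERS) -> Optional[Classification]:
    """Return the first confident answer, or None to route to human review."""
    for name, tier, min_confidence in tiers:
        result = tier(text)
        if result is not None and result[1] >= min_confidence:
            return Classification(result[0], result[1], name)
    return None
```

Recording which tier answered makes it possible to track how much traffic reaches the expensive pass, which is the main cost lever in a design like this.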

The deduplication layer

B2B marketplaces accumulate duplicates faster than almost any other type of catalog. The same product is listed by multiple suppliers, often with slightly different names, slightly different specifications, and meaningfully different prices. The deduplication challenge is to recognise that these listings are the same product without collapsing distinct products that happen to share many attributes.

The architecture uses a multi-signal matching approach: text similarity, attribute similarity, image similarity, and seller behaviour patterns all contribute to a confidence score. High-confidence matches are auto-deduplicated. Low-confidence candidates flow into a human review queue. The model is calibrated continuously against the outcomes of human review, so the system gets sharper over time.
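One simple way to combine the four signals is a weighted sum with routing thresholds, sketched below. The weights, the two thresholds, and the three-way routing (including a "keep separate" outcome below the review band) are assumptions made for illustration; a production system might use a learned model instead of fixed weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairSignals:
    """Similarity signals for a candidate pair, each scaled to [0, 1]."""
    text: float
    attributes: float
    image: float
    seller: float


# Illustrative weights (sum to 1.0); not taken from the engagement.
WEIGHTS = {"text": 0.35, "attributes": 0.35, "image": 0.20, "seller": 0.10}

AUTO_MERGE_THRESHOLD = 0.90  # hypothetical
REVIEW_THRESHOLD = 0.60      # hypothetical


def match_confidence(signals: PairSignals) -> float:
    # Weighted sum of the individual similarity signals.
    return sum(weight * getattr(signals, name) for name, weight in WEIGHTS.items())


def route(signals: PairSignals) -> str:
    score = match_confidence(signals)
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "keep_separate"
```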

The enrichment layer

Beyond classification and deduplication, the system enriches listings with structured data that the supplier did not originally provide. Product specifications are inferred from descriptions. Use-case keywords are derived from category context. Compatible-with relationships are predicted from product taxonomy. Each enrichment is tagged with its confidence and source, so downstream systems can decide how much weight to give it.
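As a sketch of what a confidence-and-source tag might look like, the Python below infers one specification from a description and lets a consumer filter by confidence. The `Enrichment` record, the regular expression, the `description_regex` source label, and the confidence value of 0.8 are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Enrichment:
    field: str
    value: str
    confidence: float  # how much the producer trusts the value
    source: str        # which extractor produced it


def infer_power_rating(description: str) -> List[Enrichment]:
    # Look for patterns such as "5 HP" or "2.2kW" in free text.
    match = re.search(r"(\d+(?:\.\d+)?)\s*(hp|kw)\b", description, re.IGNORECASE)
    if match is None:
        return []
    value = f"{match.group(1)} {match.group(2).lower()}"
    return [Enrichment("power_rating", value, 0.8, "description_regex")]


def accept(enrichments: List[Enrichment], min_confidence: float) -> Dict[str, str]:
    """Keep only the enrichments a downstream consumer is willing to trust."""
    return {e.field: e.value for e in enrichments if e.confidence >= min_confidence}
```

A search index might call `accept` with a lower bar than a specification table shown to buyers, which is the point of carrying the confidence alongside the value.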

The catalog is not what suppliers send you. It is what your platform extracts from what suppliers send you. The extraction is the product.

Engineering trade-offs we made

Precision versus recall in deduplication

The deduplication system has to balance two opposing failure modes. If it is too aggressive, it merges distinct products and degrades search results for both. If it is too conservative, it leaves obvious duplicates in the catalog. We tuned for precision, accepting more missed duplicates in exchange for fewer false merges, because the cost of an incorrect merge is much higher than the cost of a missed duplicate, and missed duplicates can be caught on the next pass.
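The effect of the auto-merge threshold on both failure modes can be seen with a small worked example. The labelled pairs below are invented purely for illustration; they are not data from the engagement.

```python
from typing import List, Tuple


def precision_recall(scored_pairs: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """scored_pairs holds (confidence, is_true_duplicate) for labelled candidate pairs."""
    tp = sum(1 for score, dup in scored_pairs if score >= threshold and dup)
    fp = sum(1 for score, dup in scored_pairs if score >= threshold and not dup)
    fn = sum(1 for score, dup in scored_pairs if score < threshold and dup)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Invented labelled pairs: (match confidence, reviewer says "same product").
LABELLED = [
    (0.97, True), (0.93, True), (0.88, False), (0.85, True),
    (0.72, True), (0.65, False), (0.40, False),
]

strict = precision_recall(LABELLED, 0.90)  # (1.0, 0.5): no false merges, half the duplicates missed
loose = precision_recall(LABELLED, 0.70)   # (0.8, 1.0): all duplicates caught, one false merge
```

On this toy data, raising the threshold from 0.70 to 0.90 removes the false merge at the cost of missing two true duplicates, which is the direction of the trade described above.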

Real-time enrichment versus batch enrichment

An obvious design question was whether enrichment should happen at listing creation time or in periodic batches. Real-time enrichment keeps data fresher but adds variable latency to the listing creation flow. Batch enrichment is operationally simpler but creates a window during which listings are visibly under-enriched. We chose a hybrid: a fast first-pass enrichment at creation time, followed by a more comprehensive batch enrichment within a few hours.
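A minimal sketch of the hybrid flow, assuming an in-memory store and queue: the creation path does only cheap work and enqueues the listing, and a separate batch job upgrades it later. The function names, the `enrichment_level` field, and the toy enrichment logic are hypothetical.

```python
from collections import deque
from typing import Deque, Dict

pending_batch: Deque[int] = deque()  # listing ids awaiting the comprehensive pass


def fast_enrich(listing: dict) -> dict:
    # Cheap, bounded-latency work only: normalise whitespace and case in the title.
    return {"title_normalised": " ".join(listing["title"].split()).lower()}


def full_enrich(listing: dict) -> dict:
    # Stand-in for the slower, more comprehensive enrichment.
    enriched = fast_enrich(listing)
    enriched["keywords"] = sorted(set(enriched["title_normalised"].split()))
    return enriched


def on_listing_created(listing: dict, store: Dict[int, dict]) -> dict:
    """Creation path: first-pass enrichment now, comprehensive enrichment later."""
    record = {**listing, "enrichment": fast_enrich(listing), "enrichment_level": "first_pass"}
    store[record["id"]] = record
    pending_batch.append(record["id"])
    return record


def run_batch(store: Dict[int, dict]) -> int:
    """Batch path: drain the queue and upgrade each listing in place."""
    processed = 0
    while pending_batch:
        record = store[pending_batch.popleft()]
        record["enrichment"] = full_enrich(record)
        record["enrichment_level"] = "comprehensive"
        processed += 1
    return processed
```

Storing the enrichment level on the record lets downstream consumers tell a first-pass listing from a fully enriched one during the window between the two passes.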

Single model versus ensemble

For the highest-stakes classifications, we use an ensemble of models rather than a single one, with explicit handling for disagreement cases. The ensemble approach is more expensive and more complex, but it produces more reliable outputs and — crucially — clearer signals about when the system is uncertain. The uncertainty signal is what makes the human review queue useful.
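One way to turn disagreement into an explicit uncertainty signal is a majority vote with a minimum agreement level, sketched below. The voting rule and the 0.6 default are assumptions for illustration; the source does not specify how the ensemble resolves disagreement.

```python
from collections import Counter
from typing import List


def ensemble_vote(predictions: List[str], min_agreement: float = 0.6) -> dict:
    """Combine one category label per model; flag the listing when models disagree."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(predictions)
    if agreement >= min_agreement:
        return {"category": label, "agreement": agreement, "needs_review": False}
    # Too little agreement: surface the competing labels to the review queue.
    return {
        "category": None,
        "agreement": agreement,
        "needs_review": True,
        "candidates": sorted(counts),
    }
```

With three models, two matching labels clear the default bar, while a three-way split is sent to review along with the labels the reviewer needs to choose between.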

What we learned

How this informs our client work

The patterns described here — tiered classification, multi-signal matching, calibrated enrichment, and confidence-rated outputs — apply to any large-scale data quality problem. We have used variations on these patterns for retail catalog management, supplier data integration, and similar workloads.

If you are building or operating a system where data quality at scale is the constraint, get in touch. The engineering patterns travel well across industries, and the underlying disciplines — particularly the discipline of treating confidence as a first-class output — are widely applicable.

Frequently asked questions

How accurate is the classification layer?

Classification accuracy varies with category complexity. In production, the hybrid approach (rules for easy cases, embeddings for medium cases, LLMs for ambiguous cases) is more accurate than single-model approaches at a substantially lower cost per listing. Specific accuracy figures depend on the catalog and on the calibration of the human review loop.

Can the deduplication system be tuned for different precision-recall trade-offs?

Yes. The matching system produces a confidence score per candidate pair, and the threshold for auto-merge can be tuned per category to match the business's risk tolerance. Categories where false merges are particularly costly can be tuned more conservatively; categories where missed duplicates are particularly visible can be tuned more aggressively.
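In configuration terms, per-category tuning can be as simple as a default threshold with overrides. The category names and values below are hypothetical.

```python
DEFAULT_AUTO_MERGE = 0.90

# Hypothetical overrides: stricter where a false merge is costly,
# looser where missed duplicates are highly visible.
CATEGORY_AUTO_MERGE = {
    "industrial-pumps": 0.97,
    "office-stationery": 0.85,
}


def should_auto_merge(category: str, confidence: float) -> bool:
    # Fall back to the default threshold for categories without an override.
    return confidence >= CATEGORY_AUTO_MERGE.get(category, DEFAULT_AUTO_MERGE)
```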

How does the system handle catalog drift over time?

The human review queue is the primary mechanism for handling drift. New product types, new supplier behaviours, and new edge cases surface through low-confidence cases that go to human review. The outcomes of human review feed back into the models, keeping the system calibrated as the underlying catalog evolves.
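A simple way to check whether the system is still calibrated is to bucket reviewed cases by model confidence and compare each bucket with the rate at which reviewers confirmed the model's decision. The sketch below assumes review outcomes arrive as (confidence, confirmed) pairs; the function name and bucketing scheme are illustrative, not the engagement's method.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def reliability_table(
    reviewed: Iterable[Tuple[float, bool]], bins: int = 5
) -> List[Tuple[float, float, float, int]]:
    """Return (bucket_low, bucket_high, observed_confirmation_rate, count) rows."""
    buckets = defaultdict(list)
    for confidence, confirmed in reviewed:
        # Clamp so that a confidence of exactly 1.0 falls in the top bucket.
        index = min(int(confidence * bins), bins - 1)
        buckets[index].append(confirmed)
    return [
        (i / bins, (i + 1) / bins, sum(outcomes) / len(outcomes), len(outcomes))
        for i, outcomes in sorted(buckets.items())
    ]
```

If a bucket's observed rate sits well below its confidence range, the model is overconfident there, which is one concrete signal that drift has set in and recalibration or retraining is due.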

Can this approach be applied to non-marketplace catalogs?

Yes. The underlying patterns apply to any large-scale catalog problem — retail product management, supplier data integration, content classification, document categorisation. The specific tuning differs, but the architecture and the engineering disciplines transfer well across catalog types.
