
Marketplace Scale · Case Study

Product content intelligence at marketplace scale

How we built the attribute-intelligence and multilingual localisation pipeline that powers product discovery across India's largest marketplaces.

Clients

Flipkart · Myntra

Scope

Content Ops · Localisation · AI Pipeline

Scale

Tens of millions of SKUs · 8+ Indian languages

Why product content is the hardest infrastructure problem in commerce

Every product on a marketplace has to do four distinct jobs simultaneously — and failing at any one of them costs the business money, rankings, or both. The catalogue at Flipkart and Myntra covers hundreds of categories, tens of millions of active SKUs, and supplier-submitted content ranging from detailed and accurate to sparse and incorrect. Across all of it, every listing has to power faceted search, generate rankable pages, convert browsers into buyers, and increasingly be retrievable by AI-powered shopping agents.

Manual content operations can hold quality at one or two of those jobs. They cannot hold all four simultaneously at the volumes and language requirements of India's largest marketplaces. The brief was to build the pipeline that does.

We don't write product descriptions. We build the attribute intelligence that makes a product findable by a filter, a search engine, a shopper, and an AI.

The four jobs of a product attribute

The insight that shapes everything we built: a product attribute is not a data field. It is a piece of structured intelligence that serves multiple consumers at once — search infrastructure, ranking algorithms, customer interfaces, and generative AI. Understanding which attributes serve which jobs is what separates catalogue intelligence from content production.

01 / Search Filtering

Faceted navigation

Attributes power the filter panels that let a shopper narrow a category in seconds. For fashion, that means fabric, fit, occasion, neckline, sleeve type, colour family, and a dozen more. Missing or wrong attributes make the filter rail useless — a customer filtering for "cotton" never sees a correctly made product filed as "natural fibre."

02 / SEO

Search engine ranking

Structured attribute data builds the rich, differentiated pages that rank for long-tail queries. "Blue cotton kurta for women under ₹800" resolves to a specific listing only if that listing's attributes are complete, accurate, and correctly mapped to the terms shoppers actually use.

03 / Customer Decision

Confidence to purchase

Attributes written to serve a shopper — not just a database — reduce return rates, increase add-to-cart conversion, and lift review scores. Fabric composition and care instructions help a real person decide. Internal SKU codes do not. Executing this distinction at tens of millions of SKUs is the hard part.

04 / AI Discovery

LLM and agent retrievability

This is the job most catalogues fail today. LLMs and shopping agents retrieve and recommend products based on how well-structured and well-described they are. A product with sparse attributes is effectively invisible to agentic commerce. Attribute intelligence built for AI discovery is the forward-looking moat — and most content agencies are not yet selling it.

The challenge at Flipkart and Myntra scale

Flipkart and Myntra together operate one of the densest and most heterogeneous catalogue environments in commerce. Fashion at Myntra requires a fundamentally different attribute schema from electronics at Flipkart. A kurta needs fit, fabric, occasion, neckline, sleeve type, embellishment, care instruction, and target demographic. A laptop needs processor generation, RAM capacity, display resolution, battery life, and operating system. A kitchen appliance needs material, wattage, capacity, certifications, and warranty terms. The pipeline has to model all of them, correctly and simultaneously.

The supplier problem compounds this. Content submitted by vendors ranges from detailed and accurate to bare-minimum and wrong. Fashion brands often submit strong lifestyle photography with weak attribute data. Electronics suppliers often submit strong spec sheets with no lifestyle context. The pipeline had to extract value from both, fill what was missing, and flag what was unresolvable — without routing every listing through a human editor.

Then the language requirement: every product needs to work across India's major languages — Hindi, Bengali, Tamil, Telugu, Kannada, Marathi, Gujarati, and Malayalam — not just as translated copy, but as locally idiomatic content where cultural references in fashion, food, and lifestyle categories differ meaningfully between regions.

8+ Indian languages · <15% of listings to human review · ~$1 per product at scale · 4×+ throughput vs. manual

What we built

The attribute model — category by category, not one-size-fits-all

The foundation of the system is a structured attribute schema for each product category, built from three inputs: the marketplace's existing taxonomy, analysis of which attributes actually drive search conversion and filter usage in that category, and the four-jobs test: does this attribute serve search, SEO, customer decisions, and AI retrieval, or is it dead weight?

For fashion — Myntra's core — schemas carry 30–80 attributes per category depending on complexity. A saree schema includes the obvious fields (fabric, colour, occasion) but also the attributes that drive real filtering and discovery: weave type, border style, blouse piece inclusion, regional style (Banarasi, Kanjivaram, Chanderi), and occasion-specific tags that regional customers actually search for. Getting that schema right is not a content problem. It is a product intelligence problem.
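As an illustration only, a category schema of this kind could be represented as below. This is a minimal Python sketch, not the production model: the names `AttributeSpec`, `prune_dead_weight`, and `validate`, the example fabric values, and the reading of the four-jobs test as "serves at least one job" are all assumptions made for the example.

```python
from dataclasses import dataclass

# The four jobs named in this case study.
JOBS = ("search", "seo", "decision", "ai")


@dataclass(frozen=True)
class AttributeSpec:
    name: str
    allowed_values: tuple[str, ...] = ()  # empty tuple means free-form
    jobs: frozenset = frozenset()         # which of the four jobs it serves

    def passes_four_jobs_test(self) -> bool:
        # Assumption: an attribute that serves no job is "dead weight".
        return len(self.jobs) > 0


# Illustrative fragment of a saree schema (a real schema carries far more fields).
SAREE_SCHEMA = [
    AttributeSpec("fabric", ("silk", "cotton", "georgette"), frozenset(JOBS)),
    AttributeSpec("regional_style", ("Banarasi", "Kanjivaram", "Chanderi"),
                  frozenset({"search", "seo", "ai"})),
    AttributeSpec("blouse_piece_included", ("yes", "no"),
                  frozenset({"search", "decision"})),
    AttributeSpec("internal_sku_code"),  # serves none of the four jobs
]


def prune_dead_weight(schema):
    """Keep only attributes that pass the four-jobs test."""
    return [spec for spec in schema if spec.passes_four_jobs_test()]


def validate(schema, listing):
    """Return the names of attributes that are missing or outside the allowed range."""
    problems = []
    for spec in schema:
        value = listing.get(spec.name)
        if value is None or (spec.allowed_values and value not in spec.allowed_values):
            problems.append(spec.name)
    return problems
```

Because every attribute has a closed range of values, a listing can be checked mechanically: `validate(prune_dead_weight(SAREE_SCHEMA), {"fabric": "silk"})` reports the two attributes that are still missing.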

The attribute model is also what makes localisation coherent at scale. You cannot translate a vague description into eight accurate regional language versions. You can localise structured attributes into idiomatic regional content, because each attribute has a defined semantic meaning and a clear range of values — and that definition travels across languages in a way that free-form prose does not.

The content generation and extraction pipeline

01 / Ingest and normalise

Supplier-submitted content — images, descriptions, spec sheets, and raw attribute data — enters the pipeline and is normalised against the category schema. Values are standardised, synonyms resolved, obvious errors flagged immediately.

02 / Attribute extraction

The system extracts all required attributes from the normalised input in layers: rule-based pattern matching for structured fields, embedding-similarity matching for semi-structured fields, and LLM extraction for unstructured prose and image analysis. Each extracted attribute carries a confidence score.

03 / Generative fill

Attributes not extracted with sufficient confidence are filled by the LLM generation layer. The generator receives the category schema, the extracted attributes, the product images, and examples of correctly completed listings in the same category. It generates candidate values with reasoning; the confidence model scores them; low-confidence fills are queued for human review.

04 / Multilingual localisation

Completed attribute sets are localised across target languages. The localisation layer runs translation memory first: TM hits above the confidence threshold bypass LLM generation entirely. For the remaining segments, the LLM generates from the structured attribute set rather than from the English prose, which produces more accurate and idiomatic output in category-specific language.

05 / Quality routing: humans on exception only

Every listing exits the pipeline with a quality confidence score. Listings above threshold publish automatically. Listings below threshold enter a prioritised human review queue. Corrections feed back into the extraction and generation models — each human intervention improves future accuracy rather than simply consuming capacity.

06 / Publish and measure

Published listings are tracked for downstream performance: search appearance rate, filter-click rate, page rank position, and add-to-cart conversion. Category-level performance signals feed back into the attribute model review cycle — attributes that consistently correlate with conversion lift are elevated; those with no signal are reviewed for removal.
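Steps 01 and 02 above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our production code: the synonym table, the regex rule, and the fixed confidence scores are invented for the example, fuzzy string matching from the standard library stands in for embedding similarity, and the LLM layer is a placeholder that extracts nothing.

```python
import re
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Optional

# Hypothetical synonym table: supplier wording -> canonical schema wording.
SYNONYMS = {"pure cotton": "cotton", "half sleeves": "short sleeve"}

# Hypothetical rule layer: one regex per structured field.
RULES = {"fabric": re.compile(r"\b(cotton|silk|linen)\b")}


@dataclass
class Extracted:
    name: str
    value: str
    confidence: float  # illustrative fixed scores, not a trained confidence model
    layer: str


def normalise(raw: str) -> str:
    """Step 01: lowercase, collapse whitespace, resolve known synonyms."""
    text = re.sub(r"\s+", " ", raw.strip().lower())
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    return text


def rule_layer(name: str, text: str) -> Optional[Extracted]:
    pattern = RULES.get(name)
    match = pattern.search(text) if pattern else None
    return Extracted(name, match.group(1), 0.95, "rule") if match else None


def similarity_layer(name: str, text: str, allowed: list) -> Optional[Extracted]:
    # Stand-in for embedding similarity: fuzzy match each token against allowed values.
    for token in text.split():
        hit = get_close_matches(token, allowed, n=1, cutoff=0.8)
        if hit:
            return Extracted(name, hit[0], 0.70, "similarity")
    return None


def llm_layer(name: str, text: str) -> Optional[Extracted]:
    # Placeholder: a real LLM call would go here; this sketch extracts nothing.
    return None


def extract(name: str, text: str, allowed: list, min_confidence: float = 0.6):
    """Step 02: try the layers in the order listed above; keep the first confident result."""
    layers = (
        lambda: rule_layer(name, text),
        lambda: similarity_layer(name, text, allowed),
        lambda: llm_layer(name, text),
    )
    for layer in layers:
        result = layer()
        if result is not None and result.confidence >= min_confidence:
            return result
    return None  # nothing confident enough: left for generative fill or review
```

An attribute that comes back as `None` is the case step 03 handles: it goes to the generation layer and, if still low-confidence, to the review queue.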

The multilingual engine in depth

The localisation engine is the part of the system most often misunderstood from the outside. It is not a translation pipeline with an AI layer added. The architecture is fundamentally different from the "write English, translate everything" model that most localisation operations use, and that difference is what makes 8+ language coverage sustainable at this scale.

The key design choice: localised content is generated from the structured attribute set, not translated from English prose. For fashion, this means the Hindi product description is built from the same attribute values (fabric: silk, occasion: wedding, weave: handloom, regional style: Banarasi) that the English description was built from, but the Hindi generator produces idiomatic Hindi commerce copy, not translated English commerce copy. This eliminates an entire class of translation errors: those arising from English idioms, grammatical structures, and cultural references that do not transfer cleanly.

Translation memory is used as a first pass, not a fallback. Segment-level TM matching runs against the entire accumulated TM before any LLM generation is attempted. High-confidence TM hits bypass generation entirely, and this is what keeps the system cost-sustainable. Most short, structured segments in a large catalogue (product type names, care instruction boilerplate, certifications) achieve high TM coverage quickly, and those segments are essentially free to localise on subsequent passes.
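The TM-first ordering can be illustrated with a short Python sketch. Everything named here is an assumption for the example: `tm_lookup` and `localise` are hypothetical helpers, the 0.9 threshold is arbitrary, string similarity from the standard library stands in for a real TM matcher, and `fake_generator` is a stub where the LLM generator would sit.

```python
from difflib import SequenceMatcher
from typing import Optional


def tm_lookup(segment: str, tm: dict, threshold: float = 0.9) -> Optional[str]:
    """Return the stored translation of the closest TM source, if it clears the threshold."""
    best_score, best_target = 0.0, None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, target
    return best_target if best_score >= threshold else None


def localise(segments, attributes, tm, generate):
    """TM first; call the generator only for segments the TM cannot cover."""
    output, generated = [], 0
    for segment in segments:
        target = tm_lookup(segment, tm)
        if target is None:
            # The generator works from the structured attributes, not English prose.
            target = generate(segment, attributes)
            generated += 1
        output.append(target)
    return output, generated


def fake_generator(segment: str, attributes: dict) -> str:
    # Stub standing in for the LLM generation layer.
    return f"[generated hi copy: {segment}]"


hindi_tm = {"Dry clean only": "केवल ड्राई क्लीन"}
attrs = {"fabric": "silk", "occasion": "wedding", "weave": "handloom",
         "regional_style": "Banarasi"}
result, llm_calls = localise(["dry clean only", "description"], attrs,
                             hindi_tm, fake_generator)
```

In this run the care-instruction segment is served from the TM and only the description reaches the generator, so `llm_calls` is 1. The cost argument in the text follows from that ratio: the more boilerplate the TM covers, the fewer generation calls per listing.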

The quality routing system

The human review queue is not a fallback for when the AI fails. It is a calibration loop that keeps the AI accurate as the underlying catalogue evolves. Catalogue composition drifts: new categories launch, new supplier types enter, new product trends create attribute vocabulary the current models have not seen. Without the queue, these drifts produce slowly degrading output quality that is hard to detect and hard to correct.

The queue is prioritised by business impact, not volume. A low-confidence flag on a top-selling product in a high-margin category is reviewed before a long-tail SKU. The prioritisation logic uses sales rank, category margin, and strategic importance — ensuring that finite human review capacity is applied where it produces the highest return.
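A minimal sketch of this routing, assuming a single publish threshold and a simple additive impact score. The 0.85 threshold, the weights in `impact`, and the field names are hypothetical; the text above only says that sales rank, category margin, and strategic importance feed the prioritisation.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Listing:
    sku: str
    confidence: float       # quality confidence score on exit from the pipeline
    sales_rank: int         # 1 = best seller
    category_margin: float  # 0.0 to 1.0
    strategic: bool = False


def impact(listing: Listing) -> float:
    # Hypothetical weighting of the three signals named in the text.
    bonus = 0.5 if listing.strategic else 0.0
    return 1.0 / listing.sales_rank + listing.category_margin + bonus


def route(listings, threshold: float = 0.85):
    """Publish confident listings; order the rest for review by business impact."""
    published, queue = [], []
    for listing in listings:
        if listing.confidence >= threshold:
            published.append(listing.sku)
        else:
            # heapq is a min-heap, so negate the score to pop highest impact first.
            heapq.heappush(queue, (-impact(listing), listing.sku))
    review_order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
    return published, review_order


published, review_order = route([
    Listing("KURTA-001", confidence=0.95, sales_rank=120, category_margin=0.3),
    Listing("LONGTAIL-42", confidence=0.60, sales_rank=50000, category_margin=0.1),
    Listing("SAREE-TOP3", confidence=0.70, sales_rank=3, category_margin=0.4),
])
```

Here the top-selling saree is reviewed before the long-tail SKU even though both fell below the threshold, which is the behaviour the prioritisation is meant to guarantee.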

On the numbers above. The scale metrics shown — throughput multiple, cost per product, human review rate — are directional figures representative of our engagement at this scale. They reflect the order-of-magnitude gains that the AI-native pipeline produces versus a manual content operations model. Specific engagement metrics are commercially sensitive and not published.

The portability question — and what it means for other markets

One of the key design disciplines we developed is the separation between the attribute model (which is category- and market-specific) and the pipeline (which is generic). The pipeline does not need to know that it is processing fashion for Myntra versus electronics for Flipkart. It receives an attribute schema, a confidence model, and source materials, and it produces structured output. The category-specific intelligence lives in the schema, not the pipeline.
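The separation can be pictured as a pipeline function that takes the schema and the confidence model as arguments. The sketch below is deliberately naive and entirely illustrative: `run_pipeline`, the two toy schemas, and the substring matching are assumptions, and the constant confidence function stands in for a real model.

```python
from typing import Callable, Mapping

Schema = Mapping[str, tuple]                   # attribute name -> allowed values
ConfidenceModel = Callable[[str, str], float]  # (attribute, value) -> score


def run_pipeline(schema: Schema, confidence: ConfidenceModel, source: str,
                 threshold: float = 0.8) -> dict:
    """Generic pipeline: knows nothing about the category beyond the schema it is given."""
    text = source.lower()
    accepted, flagged = {}, []
    for name, allowed in schema.items():
        # Naive extraction for the sketch: first allowed value found in the text.
        value = next((v for v in allowed if v in text), None)
        if value is not None and confidence(name, value) >= threshold:
            accepted[name] = value
        else:
            flagged.append(name)
    return {"attributes": accepted, "needs_review": flagged}


# Category-specific intelligence lives in the schemas, not in run_pipeline.
FASHION = {"fabric": ("cotton", "silk"), "sleeve_type": ("sleeveless", "long sleeve")}
LAPTOP = {"ram": ("8 gb", "16 gb"), "os": ("windows", "linux")}


def constant_confidence(name: str, value: str) -> float:
    return 0.9  # stand-in for a real confidence model


fashion_out = run_pipeline(FASHION, constant_confidence, "Sleeveless cotton kurta")
laptop_out = run_pipeline(LAPTOP, constant_confidence, "Laptop with 16 GB RAM")
```

The same function handles both inputs; only the schema argument changes, and the laptop listing comes back with `os` flagged for review because the source text never mentions it.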

This separation is what makes the approach portable. The same pipeline architecture that works for fashion at Myntra works for home goods at a value retailer, for industrial components on a B2B marketplace, or for any other catalogue where structured product data is the constraint. The attribute schemas change; the pipeline does not.

The same portability applies to language. The pipeline is not Hindi-specific. It is an architecture for generating structured localised content at scale, and the language is a parameter. That is why the same approach we built for Indian commerce applies directly to Arabic ecommerce, French retail, or any language market where catalogue volume and content quality are simultaneous constraints.

What we learned that informed everything after this

Work with us

Building in a language or market where catalogue quality is the constraint?

The architecture described in this case study — attribute-intelligence extraction, multilingual generation from structured data, quality routing to humans on exception only — is the same pattern we apply to new commerce contexts. If you are running content or localisation operations at scale and want to understand what an AI-native pipeline underneath your existing team and tools looks like, we can walk through it in a half-hour.


Frequently asked questions

How many languages does the pipeline support?

The pipeline covers 8+ Indian languages including Hindi, Bengali, Tamil, Telugu, Kannada, Marathi, Gujarati, and Malayalam. The architecture is language-agnostic and extends to Arabic and other languages using the same sequence: translation-memory-first matching, LLM generation on low-match segments, and human post-edit on low-confidence output.

What is the cost per product at marketplace scale?

At scale, the AI-native pipeline delivers full multilingual content in the $0.80–$1.20 per product range, depending on category complexity and the number of target languages. The human review component is applied exception-only — typically under 15% of listings — which is what makes this cost sustainable at tens of millions of SKUs.

Does this replace existing translation memory or content workflows?

No. The pipeline is designed to run on top of existing TM and editorial assets. Translation memory is the first pass: if a segment matches within the confidence threshold, it is used directly. LLM generation fills only the gaps, and humans review only what the system flags as low confidence. Existing linguistic assets do more work per editor-hour, not less.

Can this approach work for Arabic and MENA localisation?

Yes. Arabic presents specific challenges — right-to-left rendering, MSA vs. dialect decisions, character encoding in structured fields, and category-specific cultural vocabulary (particularly in fashion and food). We have applied the same attribute-intelligence approach to MENA commerce contexts, and the underlying architecture transfers directly.

What makes this different from a standard translation or content agency?

A translation agency produces localised text. This pipeline produces structured product intelligence that happens to be available in multiple languages. The attribute layer is what powers search filtering, SEO, and AI discovery — not just customer-facing copy. You can have perfectly translated descriptions that still rank poorly because the attributes are incomplete. The attribute model is built first; localised content is derived from it.
