AI for Categories and Filters: How GPT Improves Catalog Taxonomy

Author: WebGoodPeople

The Real Problem with Large Catalogs

In a large e-commerce catalog, the same product can appear under dozens of names: "Steel pipes", "Pipes, steel", and "St. pipes" can all coexist in one system. Add to that categories with a single product, empty filter attribute values, and hundreds of SKUs with no category at all. This is not a hypothetical scenario; it is what we see on almost every project with a catalog of 3,000+ SKUs.

The consequences are real: faceted search breaks down, users get zero results for products that physically exist, the recommendation engine misfires, and SEO suffers from duplicates and empty pages. Manually correcting 8,000 items means months of content work with a high error rate.

Three Concrete Tasks AI Solves

a) Normalizing product names and attributes. GPT receives a batch of 50–100 products along with a prompt that defines the taxonomy rules: which categories exist, how attributes should be named, what values are acceptable. The output is normalized names and suggested property values. No magic — this is essentially structured classification against defined rules.
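
As a rough illustration of the batching step, the sketch below splits a product list into batches and assembles one prompt per batch. The rule set, field names, prompt wording, and the helpers `chunk` and `build_prompt` are assumptions made for this example, not the prompt used on a real project.

```python
import json
from typing import Iterator

# Hypothetical taxonomy rules; a real project would load these from its own catalog.
TAXONOMY_RULES = {
    "categories": ["Steel pipes", "Fasteners", "Anchors"],
    "attributes": {"Material": ["steel", "brass", "plastic"]},
}


def chunk(items: list, size: int = 50) -> Iterator[list]:
    """Yield consecutive batches of at most `size` products."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(batch: list, rules: dict = TAXONOMY_RULES) -> str:
    """Combine the taxonomy rules and one batch into a single prompt string."""
    return "\n".join([
        "Normalize each product name and fill attributes using ONLY these rules.",
        "Rules (JSON): " + json.dumps(rules, ensure_ascii=False),
        "Return a JSON array of objects with keys: id, name, category, attributes.",
        "Products (JSON): " + json.dumps(batch, ensure_ascii=False),
    ])


# Example: 120 products become batches of 50, 50, and 20.
products = [{"id": i, "name": f"St. pipes {i}"} for i in range(120)]
batches = list(chunk(products, size=50))
prompt = build_prompt(batches[0])
```

Keeping the rules in the prompt as structured JSON, rather than free text, makes it easier to check later that every returned category and attribute value belongs to the defined set.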

b) Assigning categories to uncategorized products. Products without a category are common after migrating from 1C or loading a supplier price list. GPT analyzes the name, description, and existing characteristics, then suggests a category from a fixed list. This works well when the rules and examples in the prompt are clearly defined.
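
Because the category must come from a fixed list, any suggestion outside that list can be rejected before it is stored. A minimal sketch of that check, assuming a hypothetical `ALLOWED_CATEGORIES` list and a helper named `accept_category`:

```python
from typing import Optional

# Hypothetical fixed category list for illustration.
ALLOWED_CATEGORIES = ["Steel pipes", "Fasteners", "Anchors"]


def accept_category(suggestion: str, allowed: list = ALLOWED_CATEGORIES) -> Optional[str]:
    """Return the canonical category name, or None if the suggestion is not in the fixed list."""
    # Compare case-insensitively, ignoring surrounding whitespace.
    lookup = {name.casefold(): name for name in allowed}
    return lookup.get(suggestion.strip().casefold())


accepted = accept_category("  steel pipes ")   # matches the canonical "Steel pipes"
rejected = accept_category("Pipes, steel")     # not in the list, so it is rejected
```

A rejected suggestion would simply leave the product in the uncategorized queue for the next batch or for manual handling.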

c) Generating missing attribute values from description text. If a product has a detailed description but the "Material" or "Precision Class" property is empty, GPT extracts the value from the text. This is cheaper than writing a custom parser for each product type, and more useful than leaving the field blank.
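
One inexpensive safeguard for extracted values is to accept a value only if it literally occurs in the description it was supposedly taken from. This check is an assumption added for illustration, not something the workflow above requires; it is deliberately conservative and will also reject correct values that the model paraphrased. The helper names and the `_AI_DRAFT` suffix are hypothetical.

```python
def is_grounded(value: str, description: str) -> bool:
    """True if the extracted value literally occurs in the description (case-insensitive)."""
    value = value.strip()
    return bool(value) and value.casefold() in description.casefold()


def draft_attribute(product: dict, attribute: str, extracted: str) -> dict:
    """Return a draft update only when the live field is empty and the value is grounded."""
    if product.get(attribute):
        return {}  # never overwrite a value that already exists
    if not is_grounded(extracted, product.get("description", "")):
        return {}  # the value cannot be found in the source text
    return {f"{attribute}_AI_DRAFT": extracted.strip()}


product = {
    "description": "Hex bolt, stainless steel, precision class A",
    "Material": "",
}
update = draft_attribute(product, "Material", "Stainless steel")
```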

Architecture: Batch Processing Only

A critical point: AI taxonomy enrichment is not a real-time process. Embedding an OpenAI call at the moment a product is saved is a bad idea: it adds latency, introduces unpredictable errors, and accumulates incorrect data with no rollback path.

The right approach: a Bitrix agent runs on a schedule, selects 50–100 products missing property X (e.g., "AI Category" or "Draft Material"), sends the batch to OpenAI, receives the response, and writes the result to a draft property, not the live one. Then comes manual review.
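
The flow of one scheduled run can be sketched as follows. In Bitrix the agent itself would be written in PHP; Python is used here only to show the control flow. The in-memory `catalog`, the `classify` callable (standing in for the OpenAI request), and the function names are assumptions for this example.

```python
from typing import Callable

BATCH_SIZE = 50
DRAFT_FIELD = "CATEGORY_AI_DRAFT"  # draft property; the live "category" field is never written here


def select_batch(catalog: list, limit: int = BATCH_SIZE) -> list:
    """Pick products that have neither a live category nor a draft value yet."""
    pending = [p for p in catalog if not p.get("category") and not p.get(DRAFT_FIELD)]
    return pending[:limit]


def run_batch(catalog: list, classify: Callable[[list], dict]) -> int:
    """Process one batch and return how many draft values were written."""
    batch = select_batch(catalog)
    if not batch:
        return 0
    try:
        suggestions = classify(batch)  # hypothetical: one API call, returns {product_id: category}
    except Exception:
        return 0  # on failure nothing is written; the next scheduled run retries
    written = 0
    for product in batch:
        suggestion = suggestions.get(product["id"])
        if suggestion:
            product[DRAFT_FIELD] = suggestion
            written += 1
    return written


def fake_classify(batch: list) -> dict:
    """Stub used instead of a real API call."""
    return {p["id"]: "Fasteners" for p in batch}


catalog = [
    {"id": 1, "name": "Bolt M8", "category": ""},
    {"id": 2, "name": "Anchor 10x100", "category": "Anchors"},
    {"id": 3, "name": "Nut M8", "category": ""},
]
written = run_batch(catalog, fake_classify)
```

Because the selection skips products that already carry a draft value, a second run over the same data does nothing, which keeps the job safe to repeat.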

Real Example: Hardware Catalog, 8,000 SKUs

On one project — a construction hardware catalog — the starting state looked like this:

  • 34% of products had no category assigned
  • Filter attribute coverage was 58%
  • Over 120 name variations for approximately 30 real categories

Over 6 weeks of batch AI classification with a review gate, the results were:

  • Auto-assignment accuracy: 91% (verified on a 500-product sample)
  • Filter attribute coverage rose from 58% to 89%
  • Category name variations reduced to 31 (matching the actual category count)

These numbers did not come from automation alone. They depended on the review gate: a weekly sample check before values were promoted to production.
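
An accuracy figure measured on a sample carries some sampling uncertainty. As a rough worked example, the snippet below applies the standard normal-approximation interval for a proportion to the 91% measured on 500 products. It assumes the 500 products were drawn at random, which the figures above do not state; the function name is hypothetical.

```python
import math


def accuracy_interval(accuracy: float, sample_size: int, z: float = 1.96) -> tuple:
    """Normal-approximation confidence interval for a proportion measured on a sample."""
    margin = z * math.sqrt(accuracy * (1 - accuracy) / sample_size)
    return accuracy - margin, accuracy + margin


# 91% accuracy on a 500-product sample, 95% confidence level (z = 1.96).
low, high = accuracy_interval(0.91, 500)
print(f"{low:.3f} to {high:.3f}")  # prints "0.885 to 0.935"
```

Under that assumption, the true accuracy plausibly lies between roughly 88.5% and 93.5%.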

The Review Gate Is Not Optional

AI makes mistakes. Not often, but systematically in specific patterns: rare categories, non-standard products, ambiguous names. An "AI assigns → auto-publishes" workflow is not acceptable for a production catalog.

The working model: the AI value lands in a draft Bitrix property (e.g., CATEGORY_AI_DRAFT). Once a week, a content manager or analyst reviews 20–30 random items from the new batch. If accuracy exceeds the threshold (typically 90%), the entire batch is promoted to the live property via script. If not, the batch goes to manual correction.
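
The gate described above can be sketched in a few functions: draw a random review sample, decide from the reviewer's verdicts, and promote the batch only on a passing decision. The function names and the `category` live field are assumptions for this example; the `CATEGORY_AI_DRAFT` property and the 90% threshold come from the text.

```python
import random
from typing import Optional


def sample_for_review(batch: list, k: int = 30, seed: Optional[int] = None) -> list:
    """Draw up to k random products from the batch for manual review."""
    rng = random.Random(seed)
    return rng.sample(batch, min(k, len(batch)))


def gate_decision(review_results: list, threshold: float = 0.90) -> str:
    """Return 'promote' if the share of correct items exceeds the threshold, else 'manual'."""
    if not review_results:
        return "manual"
    accuracy = sum(review_results) / len(review_results)
    return "promote" if accuracy > threshold else "manual"


def promote(batch: list, draft_field: str = "CATEGORY_AI_DRAFT", live_field: str = "category") -> int:
    """Move draft values into the live field for the whole batch; return the count."""
    promoted = 0
    for product in batch:
        if product.get(draft_field):
            product[live_field] = product.pop(draft_field)
            promoted += 1
    return promoted


batch = [{"id": i, "category": "", "CATEGORY_AI_DRAFT": "Fasteners"} for i in range(80)]
to_review = sample_for_review(batch, k=30, seed=1)

# Reviewer verdicts: True means the draft value was correct.
decision = gate_decision([True] * 28 + [False] * 2)
if decision == "promote":
    promoted = promote(batch)
```

With a 30-item sample, each single error moves the measured accuracy by about 3.3 percentage points, so the gate is coarse: 28 correct out of 30 passes a 90% threshold, while 27 out of 30 does not exceed it.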

This process can be partially automated: compute a confidence score for each GPT response and auto-promote only high-confidence assignments. Manual sample review remains necessary even then, at least once a month.
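
How the confidence score is computed is not specified here. One possible approach, shown purely as an assumption, is to request token log probabilities for the answer and take their geometric mean; a self-reported score from the model is another option. The 0.95 cutoff, the `confidence` field, and the function names below are hypothetical.

```python
import math


def confidence_from_logprobs(token_logprobs: list) -> float:
    """Geometric-mean token probability of the answer: exp(mean of log probabilities)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(assignments: list, min_confidence: float = 0.95) -> tuple:
    """Split assignments into auto-promotable and manual-review lists."""
    auto, manual = [], []
    for item in assignments:
        target = auto if item["confidence"] >= min_confidence else manual
        target.append(item)
    return auto, manual


assignments = [
    {"id": 1, "category": "Fasteners", "confidence": confidence_from_logprobs([-0.01, -0.02])},
    {"id": 2, "category": "Anchors", "confidence": confidence_from_logprobs([-0.7, -0.9])},
]
auto, manual = route(assignments)
```

Whatever the scoring method, the cutoff is worth calibrating against reviewed samples, since a model can be confidently wrong in exactly the patterns listed above: rare categories, non-standard products, and ambiguous names.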

What This Enables

  • Faceted search works correctly — attribute filters return real results, not empty pages
  • Zero-results rate drops — users find products even when their query does not exactly match the name
  • Recommendations improve — the engine sees correct categories and attributes, not noise
  • SEO improves — category pages get real content, duplicates are eliminated

If you have a catalog of 2,000+ SKUs with historical data from 1C or supplier feeds, AI taxonomy enrichment pays for itself quickly — not as a replacement for your content team, but as a tool that handles routine classification and frees people for work that actually requires expertise.

Want to explore how this applies to your catalog? Learn more about AI integration — or get in touch to discuss a pilot on your data.
