IKEA US - CommerceTXT Dataset

30,511 IKEA US products in CommerceTXT v1.0.1 format - a token-optimized, human-readable alternative to JSON for e-commerce data.

Dataset Statistics

Metric          Value
Products        30,511
Categories      632
Format          CommerceTXT v1.0.1
Data Date       2025-07-15
Token Savings   24% vs JSON
Tokens Saved    3.6M

What is CommerceTXT?

CommerceTXT is a lightweight, text-based protocol designed for AI/LLM consumption of e-commerce data. It eliminates JSON overhead while maintaining structure and readability.

Key Benefits:

- 24% fewer tokens than JSON (3.6M saved including catalog structure)
- Human-readable - easy to debug and version control
- AI-optimized - clean format for RAG and LLM processing
- Structured - parseable with simple rules

Dataset Structure

    ikea-us-commercetxt/
    ├── commerce.txt          # Root with @CATALOG (632 categories)
    ├── products/             # 30,511 files organized by category
    │   ├── frames/
    │   │   ├── 00263858.txt
    │   │   └── ...
    │   ├── tables-and-desks/
    │   │   └── ...
    │   └── ... (632 category folders)
    └── categories/           # 632 category index files
        ├── frames.txt
        ├── tables-and-desks.txt
        └── ...
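The directory layout above can be walked with the standard library alone. The following is a minimal sketch, assuming the dataset has been downloaded to a local folder with the `products/<category>/<sku>.txt` layout shown; the function name `count_products_by_category` and the local path are illustrative and not part of the dataset or of any CommerceTXT tooling.

```python
from collections import Counter
from pathlib import Path


def count_products_by_category(root):
    """Count product .txt files under root/products/<category>/."""
    counts = Counter()
    products_dir = Path(root) / "products"
    # Each product file sits exactly one level below products/.
    for product_file in products_dir.glob("*/*.txt"):
        # The parent folder name is the category slug, e.g. "frames".
        counts[product_file.parent.name] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical local checkout path; adjust to where the files live.
    counts = count_products_by_category("ikea-us-commercetxt")
    print(f"{len(counts)} categories, {sum(counts.values())} products")
```

On a complete copy of the dataset, the totals should line up with the statistics listed above (632 categories, 30,511 products), which makes this a quick sanity check after downloading.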
Usage

Load with datasets library

    from datasets import load_dataset

    dataset = load_dataset("tsazan/ikea-us-commercetxt")
    commerce_txt = dataset['train'][0]['commerce.txt']
    product_files = dataset['train'][0]['products']

Direct file access

    # Product names contain non-ASCII characters (e.g. "KNOPPÄNG"),
    # so the files are read explicitly as UTF-8 (assumed encoding).
    with open("commerce.txt", encoding="utf-8") as f:
        catalog = f.read()
    print(catalog)

    with open("products/frames/00263858.txt", encoding="utf-8") as f:
        product = f.read()
    print(product)

    with open("categories/frames.txt", encoding="utf-8") as f:
        category = f.read()
    print(category)

Parse with CommerceTXT parser

    from commercetxt import parse_file

    result = parse_file("products/frames/00263858.txt")
    product = result.directives.get('PRODUCT', {})
    offer = result.directives.get('OFFER', {})

    print(f"Product: {product.get('Name')}")
    print(f"Price: ${offer.get('Price')}")
    print(f"Brand: {product.get('Brand')}")

File Format Example

    # @PRODUCT
    Name: KNOPPÄNG frame, black
    SKU: 00263858
    Brand: IKEA
    LastUpdate...
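To illustrate the "parseable with simple rules" claim, here is a rough sketch of reading the format without the `commercetxt` package. It is not the official parser. It assumes only what the visible fragment of the example shows: a line starting with `# @` opens a section, and `Key: Value` lines that follow belong to that section. The function name `parse_sections` is hypothetical, and the real specification may define rules this sketch does not handle.

```python
def parse_sections(text):
    """Group 'Key: Value' lines under the most recent '# @SECTION' line."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("# @"):
            # Assumed rule: "# @NAME" starts (or resumes) a section.
            current = sections.setdefault(line[3:].strip(), {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return sections


sample = """# @PRODUCT
Name: KNOPPÄNG frame, black
SKU: 00263858
Brand: IKEA
"""

parsed = parse_sections(sample)
print(parsed["PRODUCT"]["Name"])
```

Splitting on the first colon keeps values such as URLs or timestamps intact. For real use, the `parse_file` function shown in the Usage section is the safer choice.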