In an era of globalized supply chains, manufacturers face a deceptively difficult question: where should we produce? The answer depends on a dizzying array of factors — labor costs, energy prices, currency rates, tariffs, social contributions, overhead factors, and commodity prices — all of which change constantly and vary dramatically by region. Benchmark Costing is a data-driven intelligence platform that collects, normalizes, and analyzes global economic and manufacturing cost data, enabling businesses to make informed sourcing and production decisions backed by real numbers.

The Problem

Manufacturing cost analysis today is a fragmented, manual process. Companies trying to compare production costs across countries must gather data from dozens of sources — the European Central Bank for currency rates, Eurostat for labor statistics, the Bureau of Labor Statistics for US data, Destatis for German figures, and commodity exchanges like the LME and COMEX. Each source has its own format, update frequency, and access method. The result is spreadsheet hell: outdated numbers, inconsistent comparisons, and decisions made on gut feeling rather than data.

Budget 40% of Phase 1 time for data quality, because "it's just downloading some CSVs" is the most dangerous underestimation in data engineering.

Beyond the data collection challenge, there is no standardized way to calculate total landed cost that accounts for all variables: wages, social contributions, energy costs, logistics, currency fluctuations, and overhead. Each company reinvents this wheel, often poorly.

The Solution

Benchmark Costing is a comprehensive platform built on a Python/FastAPI backend with TimescaleDB for time-series storage, Apache Airflow for data pipeline orchestration, and a Next.js frontend for interactive visualization. It automates the entire cycle from data collection to cost calculation to decision support.

The platform provides:

  • Automated Data Collection from 15+ authoritative sources (ECB, Federal Reserve, Eurostat, BLS, Destatis, LME, COMEX, and more)
  • Cost Calculators for total landed cost, machine hour rates, and sourcing analysis
  • Time-Series Intelligence with historical trends, forecasting, and anomaly detection
  • Regional Benchmarking across countries and industries
  • MCP Integration for AI-assisted analysis via the Model Context Protocol

Key Features

Comprehensive Data Pipeline

Apache Airflow orchestrates daily data collection DAGs that pull from authoritative sources. Each collector implements retry logic, data validation via Great Expectations, and incremental updates with weekly full reconciliation. The system tracks data lineage from source to calculation.
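The collector pattern described above can be sketched as follows. This is an illustrative simplification, not the platform's actual code: the function names are hypothetical, real collectors fetch over httpx, and validation runs through Great Expectations suites rather than a hand-rolled check.

```python
import time
from typing import Callable

def with_retries(fetch: Callable[[], list[dict]], attempts: int = 3,
                 backoff: float = 1.0) -> list[dict]:
    """Retry a flaky fetch with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fetch()
        except IOError:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

def validate_rates(rows: list[dict]) -> list[dict]:
    """Range check in the spirit of a Great Expectations suite:
    every currency-rate row must be positive and carry a date."""
    for row in rows:
        if "date" not in row or row["rate"] <= 0:
            raise ValueError(f"bad row: {row}")
    return rows
```

In the real pipeline each Airflow task wraps one source this way, so a failed validation poisons only that task's run, not the whole DAG.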

Cost Calculation Engine

A hybrid calculation engine pre-computes common scenarios covering 80% of queries while handling edge cases in real time. Materialized views are refreshed every 6 hours. Calculators cover:

  • Total landed cost with multi-region comparison
  • Machine hour rates with depreciation and shift models
  • Sourcing analysis with risk scoring
  • What-if scenario analysis with parameter sweeps
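The shape of the landed-cost calculation can be sketched like this. The field names and formula structure are illustrative assumptions, not the platform's actual model, which also folds in tariffs and commodity inputs:

```python
from dataclasses import dataclass

@dataclass
class RegionInputs:
    # All figures per unit produced, in local currency unless noted.
    wage_cost: float        # direct labor
    social_rate: float      # employer social contributions, fraction of wages
    energy_cost: float
    logistics_cost: float
    overhead_rate: float    # overhead factor applied to direct costs
    fx_to_eur: float        # local currency -> EUR conversion rate

def landed_cost_eur(r: RegionInputs) -> float:
    """Total landed cost per unit in EUR: direct costs plus social
    contributions and overhead, converted at the current FX rate."""
    labor = r.wage_cost * (1 + r.social_rate)
    direct = labor + r.energy_cost + r.logistics_cost
    return direct * (1 + r.overhead_rate) * r.fx_to_eur

def compare(regions: dict[str, RegionInputs]) -> list[tuple[str, float]]:
    """Multi-region comparison: rank regions from cheapest to priciest."""
    return sorted(((name, landed_cost_eur(r)) for name, r in regions.items()),
                  key=lambda pair: pair[1])
```

What-if analysis then amounts to a parameter sweep: vary one field (say, `fx_to_eur`) across a range and recompute the ranking.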

Material and Machinery Database

The platform catalogs 8,500+ plastic grades, 1,200+ steel types, and 3,000+ alloys with properties, pricing, and supplier information. A machinery catalog tracks equipment costs, depreciation schedules, and capacity data for machine hour rate calculations.
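A machine hour rate calculation in the spirit of the one the machinery catalog feeds might look like this. The cost components (linear depreciation, imputed interest on half the purchase price, maintenance, floor space, energy) and all default shift parameters are illustrative assumptions:

```python
def machine_hour_rate(
    purchase_price: float,        # EUR
    useful_life_years: float,
    interest_rate: float,         # fraction, on average tied-up capital
    maintenance_per_year: float,  # EUR
    energy_kw: float,             # connected load
    energy_price_kwh: float,      # EUR/kWh
    floor_space_cost_year: float, # EUR
    shifts_per_day: int = 2,
    hours_per_shift: float = 8.0,
    working_days: int = 250,
    utilization: float = 0.85,
) -> float:
    """Fixed costs spread over productive hours, plus variable energy cost."""
    productive_hours = shifts_per_day * hours_per_shift * working_days * utilization
    depreciation = purchase_price / useful_life_years
    interest = (purchase_price / 2) * interest_rate
    fixed_per_hour = (depreciation + interest + maintenance_per_year
                      + floor_space_cost_year) / productive_hours
    variable_per_hour = energy_kw * energy_price_kwh
    return fixed_per_hour + variable_per_hour
```

The shift model matters: moving from two shifts to three spreads the same fixed costs over more hours, which is exactly the lever the calculator's shift models expose.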

Interactive Data Explorer

The Next.js frontend offers multi-dimensional filtering, time-series charts built with Recharts and D3.js, correlation heatmaps, geographic visualizations with choropleth maps, and customizable dashboards. Data can be exported as CSV, Excel, or PDF.

Technology Stack

| Layer | Technology | Purpose |
|---|---|---|
| Backend | Python 3.11+, FastAPI, SQLAlchemy 2.0, Pydantic v2 | API server and business logic |
| Database | TimescaleDB 2.13+ (PostgreSQL 15) | Time-series data with compression |
| Cache | Redis 7.2+ with RedisJSON | Response caching with hybrid TTL |
| Data Pipeline | Apache Airflow 2.8+, httpx, BeautifulSoup4 | ETL orchestration and collection |
| Frontend | Next.js 14+, TypeScript 5.3+, Shadcn/UI | Interactive dashboards and calculators |
| Charts | Recharts, D3.js | Data visualization |
| ML (Phase 3) | Scikit-learn, Prophet, MLflow | Forecasting and anomaly detection |
| Infrastructure | Terraform, Docker, Kubernetes, GitHub Actions | IaC, containers, CI/CD |
| Monitoring | Prometheus, Grafana, ELK Stack | Observability |

Architecture

Benchmark Costing follows a layered architecture with clear separation of concerns. At the base sits TimescaleDB, chosen for its SQL compatibility, excellent compression (2-20x), and continuous aggregates for pre-computed rollups. The time-series data is partitioned monthly with automated retention policies and archival to object storage.

The Data Collection Layer runs as Airflow DAGs with isolated task failures — one broken source never takes down the entire pipeline. Collectors implement incremental updates daily and full reconciliation weekly. Data quality is enforced through Great Expectations with range checks, consistency validation, and anomaly detection.

The API Layer (FastAPI) provides RESTful endpoints with OAuth 2.0/JWT authentication, role-based access control, cursor-based pagination, and rate limiting. A GraphQL API and WebSocket support for real-time updates are on the roadmap.

The Caching Layer uses a hybrid strategy: 1-hour TTL for most data combined with event-driven invalidation for critical paths like currency rates. Redis with RedisJSON supports complex cached objects.
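The hybrid strategy combines two expiry paths: a passive TTL and an active invalidation triggered by events such as a fresh ECB rate ingest. An in-memory sketch of the pattern (the platform itself does this in Redis, where TTL is handled natively):

```python
import time

class HybridCache:
    """TTL cache with event-driven invalidation; illustrative only."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic(), value)

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]   # lazy TTL expiry on read
            return None
        return value

    def invalidate_prefix(self, prefix: str) -> None:
        """Event-driven path: e.g. drop all 'fx:' keys when new rates land."""
        for key in [k for k in self._store if k.startswith(prefix)]:
            del self._store[key]
```

Critical paths like currency rates rely on `invalidate_prefix` firing from the ingestion pipeline, so stale values never survive until the TTL would have caught them.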

The MCP Connector exposes tools like query_regional_data, get_labor_costs, search_materials, calculate_machine_hour_rate, and cost_comparison, enabling AI assistants to perform sophisticated manufacturing cost analysis through natural language.
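Conceptually, each MCP tool is a named, typed function the assistant can invoke by name. The registry below is a deliberate simplification, not the MCP SDK: the stub data and dispatch mechanism are hypothetical, and the real connector speaks the Model Context Protocol over a proper server.

```python
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function under its own name as a callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_labor_costs(country: str, year: int) -> dict:
    # Hypothetical stub; the real tool queries TimescaleDB.
    sample = {("DE", 2024): 43.4, ("PL", 2024): 14.5}  # EUR/hour, illustrative
    return {"country": country, "year": year,
            "eur_per_hour": sample.get((country, year))}

def dispatch(name: str, **kwargs):
    """What happens when an assistant calls a tool by name."""
    return TOOLS[name](**kwargs)
```

From the assistant's side, "compare German and Polish labor costs" decomposes into two `get_labor_costs` calls followed by reasoning over the structured results.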

benchmark-costing/
├── backend/           # FastAPI application
│   ├── api/           # Routers, models, schemas, services
│   └── calculators/   # Cost calculation engines
├── data-collectors/   # Airflow DAGs + source connectors
│   ├── airflow/       # DAGs, plugins, config
│   └── collectors/    # ECB, Eurostat, BLS, etc.
├── frontend/          # Next.js with Shadcn/UI
├── mcp-connector/     # MCP server + tools
├── ml/                # Forecasting models (Phase 3)
└── infrastructure/    # Terraform, K8s, Docker

Data Sources

| Source | Data | Frequency |
|---|---|---|
| European Central Bank | Currency exchange rates | Daily |
| Federal Reserve (FRED) | US economic indicators | Daily |
| Eurostat | EU labor costs, statistics | Weekly |
| Destatis | German labor and industry data | Weekly |
| Bureau of Labor Statistics | US CPI, PPI, employment cost | Weekly |
| LME / COMEX | Commodity and metal prices | Daily |
| ENTSO-E | European energy prices | Daily |

Looking Ahead

The platform’s four-phase roadmap spans 12 months, progressing from foundation (data collection, core API, basic UI) through expansion (15+ data sources, MCP connector, advanced calculators) to intelligence (ML-powered forecasting with Prophet and LSTM, anomaly detection, recommendation engine) and finally optimization (performance tuning, SOC 2 compliance, client SDKs). With an estimated 150+ tasks and 200 person-weeks of effort, Benchmark Costing aims to become the definitive platform for global manufacturing cost intelligence — turning a chaotic landscape of scattered data into clear, actionable insights that drive better sourcing decisions.