Projects That Run in Production
Every project listed here was delivered to a real client, runs on real infrastructure, and handles real data. No concept projects. No internal experiments.
CASE / 01
Automotive Data Platform for Swedish E-commerce
- CLIENT TYPE: Swedish Automotive E-commerce Platform
- INDUSTRY: Automotive / E-commerce / Retail
- SERVICES: Data Engineering, Web Scraping, ETL Pipeline
- STACK: Python, Apache Airflow, PostgreSQL, Scrapy, PrestaShop, TecDoc, Docker
THE CHALLENGE
A Swedish automotive parts retailer needed to integrate over 5 million parts records from TecDoc and multiple supplier feeds into their PrestaShop storefront. Manual catalog management was causing product gaps, pricing errors, and inventory mismatches that were costing them sales. Existing processes involved overnight manual batch uploads that frequently failed without notification.
WHAT WE BUILT
- A full TecDoc integration pipeline extracting parts data, vehicle compatibility mappings, and brand hierarchies across 149 automotive brands
- Automated supplier feed ingestion from multiple European parts distributors, with schema normalization and conflict resolution
- A data enrichment layer matching TecDoc article numbers to supplier stock and pricing data
- A custom PrestaShop PHP plugin for real-time catalog sync from the PostgreSQL warehouse
- Apache Airflow DAGs for scheduled pipeline execution with retry logic, failure alerting, and full task logging
- Data quality checks at every transformation stage that catch unexpected nulls and schema drift before records reach the storefront
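The data quality checks above act as a gate between transformation stages. The following is a minimal sketch of such a gate in plain Python, not the delivered code: the record layout (`article_number`, `brand`, `price_sek`, `stock`), the `validate_batch` helper, and the 5 percent null-rate limit for optional fields are all hypothetical, and in the described system a check like this would run as one of the Airflow tasks.

```python
from dataclasses import dataclass, field

# Hypothetical normalized record layout; the real schema is not described.
EXPECTED_SCHEMA = {
    "article_number": str,
    "brand": str,
    "price_sek": (int, float),
    "stock": int,
}
REQUIRED_FIELDS = {"article_number", "brand"}  # must never be null
MAX_NULL_RATE = 0.05  # illustrative limit for optional fields


@dataclass
class QualityReport:
    missing_columns: set = field(default_factory=set)
    unexpected_columns: set = field(default_factory=set)
    type_errors: list = field(default_factory=list)
    null_rates: dict = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # Any schema drift or type error fails the whole batch.
        if self.missing_columns or self.unexpected_columns or self.type_errors:
            return False
        # Required fields tolerate no nulls; optional ones up to MAX_NULL_RATE.
        for name, rate in self.null_rates.items():
            limit = 0.0 if name in REQUIRED_FIELDS else MAX_NULL_RATE
            if rate > limit:
                return False
        return True


def validate_batch(records: list) -> QualityReport:
    """Check one batch for schema drift, wrong types, and null anomalies."""
    report = QualityReport()
    if not records:
        # An empty batch is treated as a failure (every column missing).
        report.missing_columns = set(EXPECTED_SCHEMA)
        return report
    seen = set().union(*(r.keys() for r in records))
    report.missing_columns = set(EXPECTED_SCHEMA) - seen
    report.unexpected_columns = seen - set(EXPECTED_SCHEMA)
    null_counts = dict.fromkeys(EXPECTED_SCHEMA, 0)
    for i, record in enumerate(records):
        for name, expected_type in EXPECTED_SCHEMA.items():
            value = record.get(name)
            if value is None:
                null_counts[name] += 1
            elif not isinstance(value, expected_type):
                report.type_errors.append((i, name, type(value).__name__))
    report.null_rates = {n: c / len(records) for n, c in null_counts.items()}
    return report


good = [{"article_number": "A-1001", "brand": "BRAND-X", "price_sek": 449.0, "stock": 12}]
drifted = [{"article_number": "A-1001", "brand": None, "price": 449.0, "stock": 12}]
print(validate_batch(good).passed)     # True
print(validate_batch(drifted).passed)  # False: "price_sek" renamed, null brand
```

A batch that fails would be held back and reported instead of reaching the storefront. In the example, a supplier renaming `price_sek` to `price` shows up as both a missing and an unexpected column, which is how schema drift typically looks from the consumer side.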
THE OUTCOME
- 5 million+ parts records fully integrated and searchable by vehicle registration number
- 149 brands covered with complete vehicle compatibility data
- Catalog updates moved from an overnight manual batch to sub-hourly automated runs
- Zero silent failures: every pipeline error surfaces immediately via Slack alerting
- Client team operates the system independently with runbooks and observability dashboards
CASE / 02
Medallion Architecture Pipeline for Healthcare Claims Data
- CLIENT TYPE: Healthcare Data Platform
- INDUSTRY: Healthcare / Data Engineering
- SERVICES: Data Engineering, Data Architecture, ETL Pipeline
- STACK: Python, Apache Airflow, PostgreSQL, dbt, Docker
THE CHALLENGE
A healthcare data platform needed to process large volumes of CMS Medicare claims data across inpatient, outpatient, carrier, and prescription drug domains. The existing pipeline was a collection of unorchestrated scripts with no data quality enforcement, no observability, and no clear separation between raw and analytical data. Data engineers spent significant time debugging failures that were discovered only when downstream reports broke.
WHAT WE BUILT
- A three-layer medallion architecture with clear separation between raw ingestion, transformation, and analytical output
- Bronze layer: raw file ingestion with schema validation, audit logging, and idempotent load patterns
- Silver layer: standardized transformations, deduplication, business rule enforcement, and data type normalization
- Gold layer: clean, aggregated analytical tables structured for direct consumption by reporting and AI systems
- Apache Airflow DAGs with task-level logging, SLA monitoring, and failure alerting across all three layers
- dbt models for declarative, version-controlled transformations with automated testing
- Full data lineage tracking showing the path from raw source file to analytical table for every record
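The bronze and silver layers above rest on two properties: loading the same raw file twice changes nothing, and silver can be rebuilt from bronze at any time. The sketch below shows both under stated assumptions: SQLite stands in for PostgreSQL so the example runs anywhere, and the table names, the three-column CSV layout, and the `load_bronze` and `build_silver` helpers are hypothetical. In the described stack, the silver step would more likely be a dbt model.

```python
import csv
import hashlib
import io
import sqlite3

# SQLite stands in for PostgreSQL so the sketch runs anywhere; the
# rowid-based dedup in build_silver is SQLite-specific.
SCHEMA = """
CREATE TABLE IF NOT EXISTS load_audit (      -- one row per raw file ever loaded
    file_sha256 TEXT PRIMARY KEY,
    file_name   TEXT NOT NULL,
    row_count   INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS bronze_claims (   -- raw values, stored as text
    file_sha256 TEXT NOT NULL,
    line_no     INTEGER NOT NULL,
    claim_id    TEXT,
    claim_type  TEXT,
    amount      TEXT,
    PRIMARY KEY (file_sha256, line_no)
);
CREATE TABLE IF NOT EXISTS silver_claims (   -- typed, deduplicated, normalized
    claim_id   TEXT PRIMARY KEY,
    claim_type TEXT NOT NULL,
    amount     REAL NOT NULL
);
"""
EXPECTED_HEADER = ["claim_id", "claim_type", "amount"]  # hypothetical file layout


def load_bronze(conn: sqlite3.Connection, file_name: str, content: bytes) -> int:
    """Load one raw file into bronze; re-loading identical bytes is a no-op."""
    digest = hashlib.sha256(content).hexdigest()
    if conn.execute("SELECT 1 FROM load_audit WHERE file_sha256 = ?", (digest,)).fetchone():
        return 0  # already loaded: idempotent skip
    reader = csv.reader(io.StringIO(content.decode("utf-8")))
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"schema validation failed for {file_name}: {header}")
    rows = [(digest, i, *row) for i, row in enumerate(reader, start=1)]
    with conn:  # one transaction: data and audit row land together or not at all
        conn.executemany("INSERT INTO bronze_claims VALUES (?, ?, ?, ?, ?)", rows)
        conn.execute("INSERT INTO load_audit VALUES (?, ?, ?)", (digest, file_name, len(rows)))
    return len(rows)


def build_silver(conn: sqlite3.Connection) -> None:
    """Rebuild silver from bronze, keeping the latest-loaded row per claim."""
    with conn:
        conn.execute("DELETE FROM silver_claims")
        conn.execute(
            """
            INSERT INTO silver_claims (claim_id, claim_type, amount)
            SELECT TRIM(claim_id), UPPER(TRIM(claim_type)), CAST(amount AS REAL)
            FROM bronze_claims
            WHERE rowid IN (SELECT MAX(rowid) FROM bronze_claims GROUP BY TRIM(claim_id))
            """
        )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
batch = b"claim_id,claim_type,amount\nC1,inpatient,100.50\nC2,outpatient,20\n"
correction = b"claim_id,claim_type,amount\nC1,inpatient,120.00\n"
print(load_bronze(conn, "claims_batch.csv", batch))     # 2
print(load_bronze(conn, "claims_batch.csv", batch))     # 0 (same bytes, skipped)
print(load_bronze(conn, "claims_fix.csv", correction))  # 1
build_silver(conn)
print(conn.execute("SELECT * FROM silver_claims ORDER BY claim_id").fetchall())
# [('C1', 'INPATIENT', 120.0), ('C2', 'OUTPATIENT', 20.0)]
```

Keying the audit table on a SHA-256 of the file contents, rather than on the file name, means a re-delivered file with identical bytes is skipped, while a corrected file is loaded and wins in silver because it is the latest-loaded row for that claim.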
THE OUTCOME
- Zero undetected pipeline failures since deployment: every error surfaces with context before downstream impact
- Time the data engineering team spends on debugging reduced by over 60 percent
- New claim types added to the pipeline without restructuring existing DAGs
- Complete data lineage from raw source to analytical layer for governance and audit requirements
CASE / 03
RAG Knowledge Pipeline over Legal Document Corpus
- CLIENT TYPE: Legal Technology Platform
- INDUSTRY: Legal / AI / Knowledge Management
- SERVICES: RAG Systems, AI Development, Data Engineering
- STACK: Python, LangChain, OpenAI API, pgvector, PostgreSQL, FastAPI, Docker
THE CHALLENGE
A legal technology platform needed to enable semantic search and accurate question answering over a large corpus of legal documents. Users needed to query complex legal content in natural language and receive precise, cited answers without the hallucination risk of a general-purpose LLM. The document corpus was large, varied in format, and required careful chunking to preserve legal context across section boundaries.
WHAT WE BUILT
- A full document ingestion pipeline handling PDF, HTML, and structured text formats with layout-aware parsing
- A chunking strategy with overlap and section boundary preservation to maintain legal context
- Embedding generation using OpenAI text-embedding-3-large with batch processing for the full corpus
- pgvector integration in PostgreSQL for vector storage, cosine similarity search, and metadata filtering by document type and date range
- LangChain retrieval chain with custom prompt templates engineered for legal accuracy and mandatory citation formatting
- A ground truth evaluation framework testing retrieval precision and answer groundedness across hundreds of real queries
- FastAPI deployment with JWT authentication and rate limiting for production use
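The chunking strategy above (overlap plus section boundary preservation) can be illustrated with a small word-window sketch. It is a simplification under stated assumptions: the heading pattern (lines starting with `Section` or `Article`), the `split_sections` and `chunk_document` helpers, and the word-based window sizes are hypothetical; a production version would more likely count tokens and reuse the section structure produced by the layout-aware parser.

```python
import re
from dataclasses import dataclass

# Hypothetical heading format, e.g. "Section 4.2 Termination" or "Article 7 Liability".
HEADING = re.compile(r"^(?:Section|Article)\s+\d[\d.]*\b.*$", re.MULTILINE)


@dataclass
class Chunk:
    section: str  # heading kept as metadata, e.g. for filtering or citations
    text: str     # heading + window text, i.e. what would be embedded


def split_sections(document: str) -> list:
    """Split on headings so that no chunk can span two sections."""
    matches = list(HEADING.finditer(document))
    if not matches:
        return [("", document.strip())]
    sections = []
    preamble = document[: matches[0].start()].strip()
    if preamble:
        sections.append(("", preamble))
    for match, nxt in zip(matches, matches[1:] + [None]):
        end = nxt.start() if nxt else len(document)
        sections.append((match.group(0).strip(), document[match.end() : end].strip()))
    return sections


def chunk_document(document: str, max_words: int = 200, overlap: int = 40) -> list:
    """Slide a word window over each section; neighbours share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    step = max_words - overlap
    chunks = []
    for heading, body in split_sections(document):
        words = body.split()
        if not words:
            continue
        # Stop once a window reaches the end of the section: any later start
        # would only repeat words already covered by the previous overlap.
        for start in range(0, max(len(words) - overlap, 1), step):
            window = " ".join(words[start : start + max_words])
            text = f"{heading}\n{window}" if heading else window
            chunks.append(Chunk(section=heading, text=text))
    return chunks


doc = (
    "Section 1 Definitions\n"
    + " ".join(f"a{i}" for i in range(1, 11))
    + "\nSection 2 Termination\nb1 b2 b3"
)
for chunk in chunk_document(doc, max_words=4, overlap=1):
    print(repr(chunk.text))
```

Repeating the section heading in each chunk keeps the legal context attached to the text that gets embedded, and the overlap means a passage of up to `overlap` words that straddles a window edge still appears whole in the next chunk.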
THE OUTCOME
- Retrieval precision above 90 percent on the ground truth evaluation set
- Every answer includes citations linking directly to the source document and paragraph
- Average query response time under two seconds including retrieval and generation
- Fully modular architecture: swap embedding model, vector store, or LLM without rewriting the pipeline
CASE / 04
AI Automation Pipeline for Business Operations
- CLIENT TYPE: Business Operations Platform
- INDUSTRY: Operations / AI Automation
- SERVICES: AI Automation, Agent Development, Workflow Engineering
- STACK: n8n, Python, LangChain, OpenAI API, PostgreSQL, Slack API, Docker
THE CHALLENGE
A business operations team was spending multiple hours per day on a multi-step data handling process that involved fetching data from external sources, applying classification logic, writing results to a database, and notifying relevant stakeholders. The process was manual, error-prone, and delayed decisions because the data was always hours behind.
WHAT WE BUILT
- An end-to-end n8n workflow replacing the entire manual process with a fully autonomous pipeline
- Custom Python function nodes for data transformation and business logic beyond what n8n's built-in nodes support
- LangChain integration for LLM-powered classification and decision steps within the workflow
- Webhook and scheduled polling triggers replacing manual process initiation
- PostgreSQL write-back with structured output and a full audit trail of every automated decision
- Slack notifications for both successful runs and failures, with full context for rapid resolution
- Error handling, retry logic, and circuit breaker patterns preventing cascading failures
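The retry and circuit breaker combination in the last item can be sketched in plain Python. This is an illustration of the general pattern, not the client workflow: the `CircuitBreaker` class, the `retry` helper, the thresholds, and the backoff values are hypothetical, and the sketch is single-threaded.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is short-circuited instead of reaching a failing dependency."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call
    (half-open) once `reset_timeout` seconds have passed. Not thread-safe."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; call skipped")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed half-open trial, or reaching the threshold, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        self.opened_at = None
        return result


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.5,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry transient failures with exponential backoff; never retry an open circuit."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except CircuitOpenError:
            raise  # fail fast: the breaker already decided the dependency is down
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Wrapping the breaker inside the retry loop, as in `retry(lambda: breaker.call(fetch_source))` with `fetch_source` standing in for any external call, means each retry counts toward the failure threshold. Once the circuit opens, `retry` stops immediately instead of repeatedly hitting an upstream that is already failing, which is what keeps one broken source from cascading through the workflow.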
THE OUTCOME
- Manual processing time reduced by over 80 percent
- Data freshness improved from hours to minutes
- Zero missed processing cycles since deployment
- The same automation pattern is now reused across three additional workflows for the same client
Ready to build something that actually works in production?
Tell us about your data challenge. We will respond within 24 hours with a clear assessment and a practical plan.