
Company registration data is the backbone of business intelligence, compliance, and due diligence in the DACH region. Yet accessing this data programmatically from the official German Handelsregister and Austrian ZVR (Zentrales Vereinsregister) remains a significant technical challenge. The Vereinsregister Universal Crawler is a high-performance web crawler with a modern control interface that automates the extraction of company registration documents, shareholder lists, and structured business data at scale.

The Problem

The German Handelsregister (commercial register) and Austrian ZVR contain critical information about every registered company: founding documents, current officers, registered addresses, financial filings, shareholder lists, and chronological change histories. This data is essential for:

Due diligence — Verifying company details before business relationships
Compliance (KYC/AML) — Know Your Customer and Anti-Money Laundering checks
Market research — Analyzing company formations, dissolutions, and ownership structures
Legal proceedings — Obtaining official company documents for court cases

However, the official register portals are designed for manual, one-at-a-time lookups. They offer no bulk download API, impose session-based access controls, and present data through complex multi-step web interfaces. For anyone needing data on hundreds or thousands of companies, manual extraction is simply not feasible.

"We needed shareholder lists for 5,000 companies across 20 German courts. Manual download would have taken months. The crawler did it in days."

The Solution

The Vereinsregister Universal Crawler is a Node.js/TypeScript application that automates the entire document retrieval process. It navigates the register portals programmatically, handles session management and pagination, and downloads documents in parallel across multiple courts. A modern single-page web UI provides real-time monitoring, analytics, and control over the crawling process.

Key Features

Multi-Country Support — Crawl both German Handelsregister and Austrian ZVR registers
Multi-Court Parallel Processing — Crawl up to 5 courts simultaneously, each with its own worker pool
7 Document Types — Download SI (structured data), AD (current printout), CD (chronological history), HD (historical document), DK (document archive), UT (entity carrier), and GL (shareholder lists)
Gesellschafterliste (Shareholder List) Support — Specialized extraction of shareholder lists via DK document tree navigation
Court Explorer — Browse courts by German state, view register number ranges, and queue multiple courts for bulk processing
Document Browser — Search and browse downloaded documents with filters and pagination
Company Search — Full-text search across all collected company data
On-Demand Retrieval — Retrieve specific documents by court and register number without running a full crawl
Real-Time Monitoring — Live progress tracking, log streaming, and run history with statistics
Analytics Dashboard — Visualize document coverage, crawl performance, and trends with Chart.js
Proxy Rotation — Built-in proxy support with health tracking and success rate monitoring
Notifications — Optional Discord and Slack webhooks for crawl status updates

Technology Stack

Component	Technology	Purpose
Runtime	Node.js + TypeScript	High-performance async I/O with type safety
Web Server	Express	REST API and static file serving
Database	SQLite (better-sqlite3, WAL mode)	High-performance persistent storage
Frontend	Vanilla JS SPA	Single-page application with multiple views
Charts	Chart.js	Analytics and performance visualizations
Parsers	XJustiz, Austria, CD parsers	Document-specific XML and data parsing
Proxy	Custom ProxyManager	Rotation, health tracking, success rates
Notifications	Discord/Slack webhooks	Crawl status alerts

Architecture

The crawler follows a service-oriented architecture with clear separation between crawling, parsing, storage, and presentation concerns.

Crawl Orchestration Layer

The CrawlerManager coordinates multi-court parallel processing. When a bulk crawl is initiated, it distributes work across up to 5 concurrent court crawlers. Each court crawler is managed by the CrawlerService, which handles the actual scraping logic, including session management, pagination, and retry handling.
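
The cap of five concurrent court crawlers can be illustrated with a small worker-pool pattern. This is a minimal sketch, not the CrawlerManager's actual code: the names runWithLimit and CourtTask are hypothetical, and each task stands in for one court crawl.

```typescript
// Hypothetical sketch: runWithLimit and CourtTask are illustrative names,
// not part of the crawler's real API.
type CourtTask<T> = () => Promise<T>;

async function runWithLimit<T>(tasks: CourtTask<T>[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task that has not been claimed yet

  // Each worker keeps claiming the next unclaimed task until none remain,
  // so at most `limit` tasks are in flight at any time.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Calling runWithLimit(courtTasks, 5) would process a queue of any length while never running more than five court crawls at once. In this sketch a single rejected task rejects the whole batch; a real orchestrator would more likely catch errors per court and hand them to retry handling.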

Scraper Layer

The HandelsregisterScraper implements the protocol for interacting with the German register portal. It handles the multi-step document retrieval process: searching by register number, navigating document trees, and downloading individual documents. For shareholder lists (GL), it performs specialized DK tree navigation to locate and extract the most recent Gesellschafterliste.
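
To make the DK tree navigation concrete, the sketch below walks a document tree and picks the newest shareholder list. The node shape (DkNode with label, date, and children), the label pattern, and the ISO date format are all assumptions made for illustration; the portal's real tree structure is not described here and may differ.

```typescript
// Hypothetical node shape for a DK document tree; the real portal's
// structure is not specified in this article.
interface DkNode {
  label: string;
  date?: string; // assumed ISO format "YYYY-MM-DD" on document leaves
  children?: DkNode[];
}

// Assumed label pattern for shareholder list entries.
const SHAREHOLDER_LIST_PATTERN = /liste der gesellschafter|gesellschafterliste/i;

function findLatestShareholderList(root: DkNode): DkNode | undefined {
  let latest: DkNode | undefined;
  let latestDate = "";
  const stack: DkNode[] = [root];

  // Iterative depth-first traversal over the whole tree.
  while (stack.length > 0) {
    const node = stack.pop();
    if (!node) break;
    const isLeaf = !node.children || node.children.length === 0;
    if (isLeaf && node.date && SHAREHOLDER_LIST_PATTERN.test(node.label)) {
      // ISO dates sort chronologically when compared as strings.
      if (node.date > latestDate) {
        latest = node;
        latestDate = node.date;
      }
    }
    if (node.children) stack.push(...node.children);
  }
  return latest;
}
```

The function returns undefined when no matching leaf exists, which lets a caller distinguish "no shareholder list filed" from a download failure.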

Parser Layer

Three specialized parsers handle different document formats:

XJustizParser — Parses German XJustiz XML structured data (SI documents)
AustriaParser — Handles Austrian register document formats
CDParser — Processes chronological document histories

Storage Layer

SQLite runs in WAL (Write-Ahead Logging) mode, which lets readers continue while a write is in progress, so the UI and API can query the database during active crawls without blocking the crawler. The Database service manages all data persistence, while RunLogger tracks crawl runs with detailed statistics for historical analysis.

Proxy Management

The ProxyManager implements proxy rotation with health tracking. Each proxy's success rate is monitored, and unhealthy proxies are automatically removed from the rotation pool. This helps keep crawls running reliably during extended multi-day operations.
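
Rotation with health tracking could look roughly like the sketch below. It is not the ProxyManager's actual code: the class name ProxyPool, the 50% success-rate threshold, and the minimum of five samples before a proxy can be judged unhealthy are assumptions chosen for illustration.

```typescript
// Hypothetical sketch; thresholds and names are assumptions, not the
// ProxyManager's real configuration.
interface ProxyStats {
  url: string;
  successes: number;
  failures: number;
}

class ProxyPool {
  private readonly proxies: ProxyStats[];
  private readonly minSuccessRate: number;
  private readonly minSamples: number;
  private cursor = 0;

  constructor(urls: string[], minSuccessRate = 0.5, minSamples = 5) {
    this.proxies = urls.map((url) => ({ url, successes: 0, failures: 0 }));
    this.minSuccessRate = minSuccessRate;
    this.minSamples = minSamples;
  }

  // A proxy counts as healthy until it has enough samples to be judged.
  private isHealthy(p: ProxyStats): boolean {
    const total = p.successes + p.failures;
    if (total < this.minSamples) return true;
    return p.successes / total >= this.minSuccessRate;
  }

  // Round-robin over the proxies that are currently healthy.
  next(): string | undefined {
    const healthy = this.proxies.filter((p) => this.isHealthy(p));
    if (healthy.length === 0) return undefined;
    const proxy = healthy[this.cursor % healthy.length];
    this.cursor++;
    return proxy.url;
  }

  // Record the outcome of a request made through the given proxy.
  report(url: string, ok: boolean): void {
    const proxy = this.proxies.find((p) => p.url === url);
    if (!proxy) return;
    if (ok) proxy.successes++;
    else proxy.failures++;
  }
}
```

A caller would request next() before each download and call report() afterwards. When next() returns undefined, every proxy has dropped below the threshold and the crawl should pause or alert rather than continue.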

API and Frontend

The Express server exposes a comprehensive REST API covering status monitoring, crawl control, document browsing, on-demand retrieval, search, and analytics. The frontend is a vanilla JavaScript SPA that communicates exclusively through this API, providing real-time dashboards, document browsers, court explorers, and analytics views.

Supported Document Types

Code	Name	Format	Description
SI	Strukturierte Inhalte	XML	Structured company data in XJustiz format
AD	Aktueller Abdruck	PDF	Current official register printout
CD	Chronologischer Abdruck	PDF	Complete chronological change history
HD	Historischer Abdruck	PDF	Historical document snapshot
DK	Dokumentenansicht	Tree	Document archive with navigation
UT	Unternehmensträger	XML	Entity carrier information
GL	Gesellschafterliste	PDF	Most recent shareholder list
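
In a TypeScript codebase, the table above maps naturally onto a typed lookup. The following is a sketch of one possible encoding; the identifiers DocumentCode, DOCUMENT_TYPES, isDocumentCode, and codesByFormat are hypothetical and not taken from the crawler's source.

```typescript
// Hypothetical encoding of the document type table; identifier names are
// illustrative only.
type DocumentCode = "SI" | "AD" | "CD" | "HD" | "DK" | "UT" | "GL";
type DocumentFormat = "XML" | "PDF" | "Tree";

interface DocumentTypeInfo {
  name: string;
  format: DocumentFormat;
  description: string;
}

const DOCUMENT_TYPES: Record<DocumentCode, DocumentTypeInfo> = {
  SI: { name: "Strukturierte Inhalte", format: "XML", description: "Structured company data in XJustiz format" },
  AD: { name: "Aktueller Abdruck", format: "PDF", description: "Current official register printout" },
  CD: { name: "Chronologischer Abdruck", format: "PDF", description: "Complete chronological change history" },
  HD: { name: "Historischer Abdruck", format: "PDF", description: "Historical document snapshot" },
  DK: { name: "Dokumentenansicht", format: "Tree", description: "Document archive with navigation" },
  UT: { name: "Unternehmensträger", format: "XML", description: "Entity carrier information" },
  GL: { name: "Gesellschafterliste", format: "PDF", description: "Most recent shareholder list" },
};

// Type guard for validating user input such as an API query parameter.
function isDocumentCode(value: string): value is DocumentCode {
  return Object.prototype.hasOwnProperty.call(DOCUMENT_TYPES, value);
}

// List all codes that are delivered in a given format.
function codesByFormat(format: DocumentFormat): DocumentCode[] {
  return (Object.keys(DOCUMENT_TYPES) as DocumentCode[]).filter(
    (code) => DOCUMENT_TYPES[code].format === format,
  );
}
```

Because the record is keyed by the DocumentCode union, the compiler flags a missing or misspelled code if a document type is added later.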

Deployment

Deployment requires only Node.js and npm. After installing dependencies and configuring the .env file with proxy settings and notification webhooks, the server starts with npm start and is accessible on the configured port (default 9399). The application is self-contained with no external database dependencies.
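
As an illustration of how such settings might be read at startup, the sketch below parses a port, a proxy list, and webhook URLs from environment values. The variable names PORT, PROXY_LIST, DISCORD_WEBHOOK_URL, and SLACK_WEBHOOK_URL are assumptions, since the actual .env keys are not documented here; only the default port 9399 comes from the description above.

```typescript
// Hypothetical config loader; the environment variable names are assumed.
interface CrawlerConfig {
  port: number;
  proxies: string[];
  discordWebhook: string | undefined;
  slackWebhook: string | undefined;
}

function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  const rawPort = env["PORT"];
  const port = rawPort ? Number(rawPort) : 9399; // documented default port
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }

  // Comma-separated proxy URLs; blank entries are ignored.
  const proxies = (env["PROXY_LIST"] ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  return {
    port,
    proxies,
    // Webhooks are optional; empty strings are treated as unset.
    discordWebhook: env["DISCORD_WEBHOOK_URL"] || undefined,
    slackWebhook: env["SLACK_WEBHOOK_URL"] || undefined,
  };
}
```

In a Node.js process this would typically be called as loadConfig(process.env) after the .env file has been loaded. Taking the environment as a parameter keeps the function easy to test.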

Conclusion

The Vereinsregister Universal Crawler transforms what would be months of manual document retrieval into an automated, monitored, and scalable operation. With multi-court parallel processing, 7 document types including shareholder lists, intelligent proxy rotation, and a comprehensive analytics dashboard, it provides enterprise-grade access to German and Austrian business register data. For compliance teams, legal departments, and business intelligence operations, it is an indispensable tool for large-scale register data extraction.

