Company registration data is the backbone of business intelligence, compliance, and due diligence in the DACH region. Yet accessing this data programmatically from the official German Handelsregister and Austrian ZVR (Zentrales Vereinsregister) remains a significant technical challenge. The Vereinsregister Universal Crawler is a high-performance web crawler with a modern control interface that automates the extraction of company registration documents, shareholder lists, and structured business data at scale.
The Problem
The German Handelsregister (commercial register) and Austrian ZVR contain critical information about every registered company: founding documents, current officers, registered addresses, financial filings, shareholder lists, and chronological change histories. This data is essential for:
- Due diligence — Verifying company details before business relationships
- Compliance (KYC/AML) — Know Your Customer and Anti-Money Laundering checks
- Market research — Analyzing company formations, dissolutions, and ownership structures
- Legal proceedings — Obtaining official company documents for court cases
However, the official register portals are designed for manual, one-at-a-time lookups. They offer no bulk download API, impose session-based access controls, and present data through complex multi-step web interfaces. For anyone needing data on hundreds or thousands of companies, manual extraction is simply not feasible.
"We needed shareholder lists for 5,000 companies across 20 German courts. Manual download would have taken months. The crawler did it in days."
The Solution
The Vereinsregister Universal Crawler is a Node.js/TypeScript application that automates the entire document retrieval process. It navigates the register portals programmatically, handles session management and pagination, and downloads documents in parallel across multiple courts. A modern single-page web UI provides real-time monitoring, analytics, and control over the crawling process.
Key Features
- Multi-Country Support — Crawl both German Handelsregister and Austrian ZVR registers
- Multi-Court Parallel Processing — Crawl up to 5 courts simultaneously, each with its own worker pool
- 7 Document Types — Download SI (structured data), AD (current printout), CD (chronological history), HD (historical document), DK (document archive), UT (entity carrier), and GL (shareholder lists)
- Gesellschafterliste (Shareholder List) Support — Specialized extraction of shareholder lists via DK document tree navigation
- Court Explorer — Browse courts by German state, view register number ranges, and queue multiple courts for bulk processing
- Document Browser — Search and browse downloaded documents with filters and pagination
- Company Search — Full-text search across all collected company data
- On-Demand Retrieval — Retrieve specific documents by court and register number without running a full crawl
- Real-Time Monitoring — Live progress tracking, log streaming, and run history with statistics
- Analytics Dashboard — Visualize document coverage, crawl performance, and trends with Chart.js
- Proxy Rotation — Built-in proxy support with health tracking and success rate monitoring
- Notifications — Optional Discord and Slack webhooks for crawl status updates
Technology Stack
| Component | Technology | Purpose |
|---|---|---|
| Runtime | Node.js + TypeScript | High-performance async I/O with type safety |
| Web Server | Express | REST API and static file serving |
| Database | SQLite (better-sqlite3, WAL mode) | High-performance persistent storage |
| Frontend | Vanilla JS SPA | Single-page application with multiple views |
| Charts | Chart.js | Analytics and performance visualizations |
| Parsers | XJustiz, Austria, CD parsers | Document-specific XML and data parsing |
| Proxy | Custom ProxyManager | Rotation, health tracking, success rates |
| Notifications | Discord/Slack webhooks | Crawl status alerts |
Architecture
The crawler follows a service-oriented architecture with clear separation between crawling, parsing, storage, and presentation concerns.
Crawl Orchestration Layer
The CrawlerManager coordinates multi-court parallel processing. When a bulk crawl is initiated, it distributes work across up to 5 concurrent court crawlers. Each court crawler is managed by the CrawlerService, which handles the actual scraping logic, including session management, pagination, and retry handling.
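As a rough illustration of the five-court limit, the sketch below caps the number of courts being crawled at once. The names `runWithLimit` and the `worker` callback are hypothetical and are not taken from the project's code; the real CrawlerManager may schedule work quite differently.

```typescript
// Hypothetical sketch: process a queue with at most `limit` items in flight.
// `runWithLimit` is an illustrative name, not part of the crawler's actual API.
async function runWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each lane keeps claiming items until the queue is empty.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const item = items[index] as T;
      results[index] = await worker(item);
    }
  }

  // Start at most `limit` lanes; they share the same queue.
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

Calling `runWithLimit(courts, 5, crawlCourt)` with a hypothetical `crawlCourt` function would keep five courts busy until the queue drains. In this sketch a single failing court rejects the whole batch, so a real crawler would likely catch and record errors per court instead.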
Scraper Layer
The HandelsregisterScraper implements the protocol for interacting with the German register portal. It handles the multi-step document retrieval process: searching by register number, navigating document trees, and downloading individual documents. For shareholder lists (GL), it performs specialized DK tree navigation to locate and extract the most recent Gesellschafterliste.
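The "most recent Gesellschafterliste" step can be pictured as a walk over the DK document tree. The sketch below is only an assumption about how such a selection might work: the `DkNode` shape, the ISO `date` field, and the label matching are all hypothetical, and the portal's real tree structure may differ.

```typescript
// Hypothetical shape of a DK document tree node; the portal's real structure may differ.
interface DkNode {
  label: string;
  date?: string; // assumed ISO date (YYYY-MM-DD) on leaf documents
  children?: DkNode[];
}

// Depth-first walk that keeps the newest leaf whose label mentions a shareholder list.
function findLatestShareholderList(root: DkNode): DkNode | undefined {
  let latest: DkNode | undefined;
  const stack: DkNode[] = [root];

  while (stack.length > 0) {
    const node = stack.pop() as DkNode;

    // Folders are expanded; only leaves are treated as documents.
    if (node.children && node.children.length > 0) {
      stack.push(...node.children);
      continue;
    }

    const isList = node.label.toLowerCase().includes("gesellschafter");
    const newestSoFar = latest?.date ?? "";
    // ISO dates compare correctly as plain strings.
    if (isList && node.date !== undefined && node.date > newestSoFar) {
      latest = node;
    }
  }
  return latest;
}
```

The function returns `undefined` when the tree holds no dated shareholder list, which lets the caller record the company as having no GL document rather than failing the whole retrieval.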
Parser Layer
Three specialized parsers handle different document formats:
- XJustizParser: Parses German XJustiz XML structured data (SI documents)
- AustriaParser: Handles Austrian register document formats
- CDParser: Processes chronological document histories
Storage Layer
SQLite with WAL (Write-Ahead Logging) mode provides high-performance concurrent reads during active crawls. The Database service manages all data persistence, while RunLogger tracks crawl runs with detailed statistics for historical analysis.
Proxy Management
The ProxyManager implements proxy rotation with health tracking. Each proxy’s success rate is monitored, and unhealthy proxies are automatically removed from the rotation pool. This keeps a crawl running when individual proxies fail, which matters most during extended multi-day operations.
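A minimal sketch of rotation with health tracking might look like the following. The class name `ProxyPool`, the 50% success threshold, and the five-attempt minimum are assumptions chosen for illustration; the actual ProxyManager's rules are not described here.

```typescript
// Hypothetical sketch of proxy rotation with health tracking.
// Thresholds and names are assumptions, not the project's real ProxyManager.
interface ProxyStats {
  url: string;
  successes: number;
  failures: number;
}

class ProxyPool {
  private proxies: ProxyStats[];
  private cursor = 0;
  private minRate: number;
  private minAttempts: number;

  constructor(urls: string[], minRate = 0.5, minAttempts = 5) {
    this.proxies = urls.map((url) => ({ url, successes: 0, failures: 0 }));
    this.minRate = minRate;
    this.minAttempts = minAttempts;
  }

  // Round-robin over the proxies that are still considered healthy.
  next(): string | undefined {
    const healthy = this.proxies.filter((p) => this.isHealthy(p));
    if (healthy.length === 0) return undefined;
    const proxy = healthy[this.cursor % healthy.length] as ProxyStats;
    this.cursor++;
    return proxy.url;
  }

  // Record the outcome of one request made through `url`.
  report(url: string, ok: boolean): void {
    const proxy = this.proxies.find((p) => p.url === url);
    if (!proxy) return;
    if (ok) proxy.successes++;
    else proxy.failures++;
  }

  // A proxy is dropped from rotation once enough attempts show a low success rate.
  private isHealthy(p: ProxyStats): boolean {
    const attempts = p.successes + p.failures;
    if (attempts < this.minAttempts) return true; // too little data to judge
    return p.successes / attempts >= this.minRate;
  }
}
```

Requiring a minimum number of attempts before judging a proxy avoids discarding one after a single unlucky timeout. This sketch never readmits a dropped proxy; a production pool would probably retry unhealthy proxies after a cooldown.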
API and Frontend
The Express server exposes a comprehensive REST API covering status monitoring, crawl control, document browsing, on-demand retrieval, search, and analytics. The frontend is a vanilla JavaScript SPA that communicates exclusively through this API, providing real-time dashboards, document browsers, court explorers, and analytics views.
Supported Document Types
| Code | Name | Format | Description |
|---|---|---|---|
| SI | Strukturierte Inhalte | XML | Structured company data in XJustiz format |
| AD | Aktueller Abdruck | | Current official register printout |
| CD | Chronologischer Abdruck | | Complete chronological change history |
| HD | Historischer Abdruck | | Historical document snapshot |
| DK | Dokumentenansicht | Tree | Document archive with navigation |
| UT | Unternehmensträger | XML | Entity carrier information |
| GL | Gesellschafterliste | | Most recent shareholder list |
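In TypeScript, the seven codes lend themselves to a closed set of string literals, so a request for an unknown document type is rejected before any download starts. The sketch below is an illustration only; `DOCUMENT_TYPES`, `isDocumentType`, and `parseDocumentTypes` are hypothetical names, not identifiers from the project.

```typescript
// Hypothetical sketch: the document codes from the table as a typed lookup.
const DOCUMENT_TYPES = {
  SI: "Strukturierte Inhalte",
  AD: "Aktueller Abdruck",
  CD: "Chronologischer Abdruck",
  HD: "Historischer Abdruck",
  DK: "Dokumentenansicht",
  UT: "Unternehmensträger",
  GL: "Gesellschafterliste",
} as const;

type DocumentType = keyof typeof DOCUMENT_TYPES;

// Type guard: true only for the seven known codes.
function isDocumentType(code: string): code is DocumentType {
  return Object.prototype.hasOwnProperty.call(DOCUMENT_TYPES, code);
}

// Turn user input such as "si, gl" into a validated list of codes.
function parseDocumentTypes(input: string): DocumentType[] {
  const codes = input
    .split(",")
    .map((c) => c.trim().toUpperCase())
    .filter((c) => c.length > 0);

  const result: DocumentType[] = [];
  for (const code of codes) {
    if (!isDocumentType(code)) {
      throw new Error(`Unknown document type: ${code}`);
    }
    result.push(code);
  }
  return result;
}
```

With this in place, `parseDocumentTypes("si, gl")` yields the codes `SI` and `GL`, while an input containing an unrecognized code throws instead of silently requesting a document that does not exist.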
Deployment
Deployment requires only Node.js and npm. After installing dependencies and configuring the .env file with proxy settings and notification webhooks, the server starts with npm start and is accessible on the configured port (default 9399). The application is self-contained with no external database dependencies.
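To make the configuration step concrete, the sketch below reads settings from an environment map and falls back to port 9399. Only the default port comes from the description above; the variable names `PORT`, `PROXY_LIST`, `DISCORD_WEBHOOK_URL`, and `SLACK_WEBHOOK_URL` are assumptions, so check the project's own .env template for the real keys.

```typescript
// Hypothetical sketch of configuration loading; the variable names are assumptions.
interface CrawlerConfig {
  port: number;
  proxies: string[];
  discordWebhook: string | undefined;
  slackWebhook: string | undefined;
}

const DEFAULT_PORT = 9399; // default port stated in the deployment notes

function loadConfig(env: Record<string, string | undefined>): CrawlerConfig {
  // Fall back to the default when PORT is missing or not a valid TCP port.
  const parsed = Number.parseInt(env["PORT"] ?? "", 10);
  const port =
    Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : DEFAULT_PORT;

  // Assumed format: a comma-separated list of proxy URLs.
  const proxies = (env["PROXY_LIST"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  return {
    port,
    proxies,
    discordWebhook: env["DISCORD_WEBHOOK_URL"],
    slackWebhook: env["SLACK_WEBHOOK_URL"],
  };
}
```

In a Node.js process this would be called as `loadConfig(process.env)` after the .env file has been loaded. Taking the environment as a parameter keeps the function easy to test without touching real environment variables.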
Conclusion
The Vereinsregister Universal Crawler transforms what would be months of manual document retrieval into an automated, monitored, and scalable operation. With multi-court parallel processing, 7 document types including shareholder lists, intelligent proxy rotation, and a comprehensive analytics dashboard, it provides enterprise-grade access to German and Austrian business register data. For compliance teams, legal departments, and business intelligence operations, it is an indispensable tool for large-scale register data extraction.