How I Built TideWatch: Real-Time Maritime Threat Detection for the Baltic Sea
Four major incidents in two years. Nord Stream pipelines sabotaged. Balticconnector gas pipeline severed. C-Lion1 and BCS East-West telecom cables cut within 18 hours. EstLink 2 power link and four telecom cables destroyed on Christmas Day 2024. Each time, the public trail was there — AIS gaps, satellite passes, news reporting — but the fusion work happened manually, after the fact.
I decided to build the system that should have existed before these incidents. TideWatch is a real-time grey-zone maritime threat detection platform for the Baltic Sea. It fuses live AIS vessel tracking, Sentinel-1 SAR satellite imagery, and OSINT into a single analyst console — with an AI-powered analyst that can answer questions about any vessel, alert, or incident with full evidence provenance.
This article is the technical deep-dive into how I built it, what architectural decisions mattered, and what I learned about building AI systems for security-critical applications.
The Problem: Manual Fusion After the Fact
Every Baltic cable incident followed the same pattern. A vessel did something suspicious — dragged an anchor, went dark on AIS, loitered near critical infrastructure. The evidence was scattered across multiple data sources: AIS tracking platforms, satellite imagery archives, news reports, government statements. Analysts and journalists pieced it together days or weeks later.
The Yi Peng 3 incident in November 2024 is the clearest example. The bulk carrier dragged its anchor across two subsea cables — BCS East-West (Sweden-Lithuania) and C-Lion1 (Finland-Germany) — within 18 hours. The AIS track showed the vessel's path crossing both cable routes. Satellite imagery confirmed the timeline. But this reconstruction happened after the cables were already severed.
The question I asked: what if the fusion happened in real-time? What if an analyst console could show a vessel approaching a cable route, flag the anomalous behavior, cross-reference it with satellite imagery, and generate an alert — before the damage occurs?
That is what TideWatch does.
Architecture: Ingest, Fuse, Detect, Explain
TideWatch has four layers, each deliberately constrained to open or commercially licensable data. No classified sources. Every alert has defensible provenance.
Layer 1: Live AIS Fusion
AIS (Automatic Identification System) is the maritime equivalent of aircraft transponders. Vessels broadcast their position, speed, heading, and identity. TideWatch ingests this data in real-time via AISStream WebSocket, covering the full Baltic area of interest (AOI).
The raw AIS stream is noisy — duplicate messages, out-of-order timestamps, position jumps from GPS errors. The ingestion pipeline normalizes, deduplicates, and spatially indexes every message into PostgreSQL with PostGIS. A vessel position appears on the analyst console less than 5 seconds after broadcast.
The critical design decision: storing raw AIS messages alongside processed tracks. When an anomaly is detected, the analyst can drill down to the exact raw messages that triggered it. No black boxes.
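To make the cleanup step concrete, here is a minimal sketch of the dedup-and-sanity-check pass over a single vessel's messages. The `AisMessage` shape and the 50-knot plausibility cap are illustrative, not TideWatch's actual pipeline:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class AisMessage:
    mmsi: int    # vessel identity
    ts: float    # broadcast time, epoch seconds
    lat: float
    lon: float
    sog: float   # speed over ground, knots

def _haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def clean_track(messages, max_speed_kn=50.0):
    """Time-order one vessel's messages, drop exact duplicates, and
    discard points whose implied speed from the previous fix is
    physically impossible (the GPS-jump case)."""
    seen, out = set(), []
    for m in sorted(messages, key=lambda m: m.ts):
        key = (m.mmsi, m.ts)
        if key in seen:
            continue            # duplicate broadcast
        seen.add(key)
        if out:
            prev = out[-1]
            dt_h = (m.ts - prev.ts) / 3600.0
            if dt_h > 0:
                implied_kn = _haversine_km(prev.lat, prev.lon, m.lat, m.lon) / 1.852 / dt_h
                if implied_kn > max_speed_kn:
                    continue    # position jump: no ship moves this fast
        out.append(m)
    return out
```

The cleaned track is what gets spatially indexed; the raw messages are stored alongside it untouched.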
Layer 2: SAR Dark-Vessel Detection
AIS has a fundamental weakness: vessels can turn it off. A ship approaching critical infrastructure with malicious intent will likely go dark — exactly when you most need to track it.
Sentinel-1 SAR (Synthetic Aperture Radar) satellites solve this. SAR works day and night, through clouds, and detects vessel-sized objects on the water surface regardless of whether they are broadcasting AIS. TideWatch processes Sentinel-1 imagery through the Copernicus Process API, applying CFAR (Constant False Alarm Rate) amplitude detection to identify vessel-like objects.
The fusion reconciler then matches SAR detections against the AIS database. If the satellite sees a vessel-sized object at a location where no AIS-broadcasting vessel exists, that is a dark-vessel alert — the most operationally significant type of detection in grey-zone maritime surveillance.
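A stripped-down version of that matching logic might look like the following. The tolerances and the flat tuple formats are illustrative; a production reconciler would interpolate each AIS track to the SAR acquisition time rather than compare raw fixes:

```python
import math

def _haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def reconcile(sar_detections, ais_positions, max_km=1.0, max_dt_s=600):
    """Return SAR detections with no AIS fix nearby in space and time.

    sar_detections: list of (ts, lat, lon) vessel-like objects from CFAR
    ais_positions:  list of (mmsi, ts, lat, lon) broadcast fixes
    """
    dark = []
    for det_ts, det_lat, det_lon in sar_detections:
        matched = any(
            abs(det_ts - ts) <= max_dt_s
            and _haversine_km(det_lat, det_lon, lat, lon) <= max_km
            for _, ts, lat, lon in ais_positions
        )
        if not matched:
            dark.append((det_ts, det_lat, det_lon))  # dark-vessel candidate
    return dark
```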
This is where the architecture gets interesting. SAR imagery has limited temporal resolution — Sentinel-1 revisit time over the Baltic is measured in days, not minutes. So dark-vessel detection is not continuous. It provides snapshots that, when fused with continuous AIS tracking, create a more complete picture than either source alone.
Layer 3: Anomaly Detectors
Raw position data becomes operationally useful through anomaly detection. TideWatch runs multiple detectors on rolling time windows:
Loitering near cables: A vessel that reduces speed and circles within a defined proximity of a subsea cable or pipeline route triggers a loitering alert. The threshold is configurable — different cable operators have different risk tolerances.
AIS-off inside zone: A vessel that was broadcasting AIS, enters a monitored zone, and stops broadcasting triggers an immediate alert. This is the pattern seen in multiple Baltic incidents — vessels going dark near critical infrastructure.
Route deviation: A vessel whose actual track deviates significantly from its declared route or typical pattern for its vessel type. A bulk carrier that suddenly changes course toward a cable route when its AIS destination says otherwise.
Rendezvous detection: Two vessels meeting at sea in an unusual location or pattern. Relevant for ship-to-ship transfers and coordinated operations.
Every anomaly is explainable — the detector outputs a human-readable explanation of why the alert was raised, with references to the specific data points that triggered it. And every alert is score-ranked, so the analyst feed stays quiet. A system that generates hundreds of false positives per day is worse than no system at all.
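As one concrete example, a loitering detector in this style might look like the sketch below. The thresholds and the flat tuple format are illustrative, and a real detector would measure distance to the full cable geometry rather than a single representative point:

```python
import math

def _haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def detect_loitering(track, cable_point, radius_km=5.0,
                     max_speed_kn=2.0, min_minutes=30):
    """Sustained low speed near a cable -> explainable alert, else None.

    track: time-ordered (ts, lat, lon, sog_kn) tuples for one vessel.
    """
    slow_near = [
        p for p in track
        if p[3] <= max_speed_kn
        and _haversine_km(p[1], p[2], cable_point[0], cable_point[1]) <= radius_km
    ]
    if not slow_near:
        return None
    duration_min = (slow_near[-1][0] - slow_near[0][0]) / 60.0
    if duration_min < min_minutes:
        return None
    return {
        "detector": "loitering_near_cable",
        "explanation": (
            f"Vessel held <= {max_speed_kn} kn within {radius_km} km "
            f"of the cable for {duration_min:.0f} minutes"
        ),
        "evidence": slow_near,   # the raw points that triggered the alert
    }
```

Note that the alert carries its own explanation and the exact data points behind it — the rule-based, explainable shape described above.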
Layer 4: Natural-Language Analyst
This is where the AI architecture I have built across multiple production systems comes together in a new domain.
The TideWatch analyst is an LLM-powered interface that answers natural-language questions about vessels, alerts, and incidents. But unlike a generic chatbot, it operates under strict constraints:
Typed tools only. The LLM does not have free-form access to the database. It can only call predefined tools — get_vessel_history, get_alerts_in_area, get_cable_proximity, get_sar_detections. Each tool returns structured data with source IDs. This eliminates the possibility of hallucinated vessel names, fabricated positions, or invented incidents.
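The shape of that constraint can be sketched in a few lines. The registry-and-dispatch pattern below is a simplified illustration, and the stubbed tool body stands in for the real PostGIS-backed query:

```python
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as a callable tool; nothing else is reachable."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def get_vessel_history(mmsi: int, hours: int = 24) -> dict:
    # Stub: the real tool queries the PostGIS track store and returns
    # structured data with source IDs for provenance.
    return {"mmsi": mmsi, "hours": hours, "positions": [], "source_ids": []}

def dispatch(name: str, **kwargs) -> dict:
    """The only path from the LLM to data: a named, registered tool."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**kwargs)
```

The LLM emits a tool name and arguments; `dispatch` refuses anything outside the registry, so free-form database access is structurally impossible.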
Evidence grounding. Every claim in the analyst's response links back to a raw data ID — an AIS message, a SAR detection, an OSINT article. The analyst cannot state "Vessel X was near Cable Y at time Z" without the underlying data supporting it. This is not optional in a security context. A hallucinated vessel name in a threat assessment could trigger a diplomatic incident.
RAG over structured data. The knowledge base is not a collection of documents — it is the live spatiotemporal index of tracks, detections, infrastructure, alerts, and OSINT. The retrieval is spatial and temporal, not just semantic. "What vessels were within 5km of C-Lion1 in the last 48 hours?" requires a PostGIS query, not a vector similarity search.
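That example question maps naturally onto a PostGIS proximity query. A sketch, with illustrative table and column names rather than TideWatch's actual schema:

```python
# Hypothetical schema: ais_positions(mmsi, ts, geom), cables(name, route).
# ST_DWithin on geography types takes its distance in metres.
PROXIMITY_QUERY = """
SELECT DISTINCT a.mmsi
FROM ais_positions AS a
JOIN cables AS c ON c.name = %(cable)s
WHERE a.ts >= now() - interval '48 hours'
  AND ST_DWithin(a.geom::geography, c.route::geography, %(meters)s);
"""

params = {"cable": "C-Lion1", "meters": 5_000}

# With a live psycopg connection this would run as, e.g.:
#   cur.execute(PROXIMITY_QUERY, params)
```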
This architecture draws directly from the lessons I documented in building production RAG systems — particularly the principle that retrieval quality determines output quality. In maritime surveillance, retrieval is geospatial, and the consequences of poor retrieval are not just wrong answers but potentially wrong security decisions.
The Technology Stack
| Component | Technology | Why |
|---|---|---|
| Database | PostgreSQL + PostGIS + pgvector | Spatiotemporal indexing for vessel tracks, geospatial queries for proximity detection, vector storage for OSINT embeddings |
| AIS ingestion | AISStream WebSocket | Real-time vessel positions, open data |
| SAR processing | Sentinel-1 via Copernicus Process API | Free SAR imagery, global coverage |
| Anomaly detection | Python workers on rolling windows | Configurable detectors, score-ranked output |
| Fusion reconciler | Custom Python service | SAR × AIS matching for dark-vessel detection |
| LLM analyst | RAG over typed tools | Evidence-grounded, no hallucination possible |
| Frontend | MapLibre + deck.gl | High-performance geospatial visualization |
| Infrastructure | Docker containers | Reproducible deployment |
The choice of PostgreSQL with PostGIS as the core data store deserves explanation. Maritime data is fundamentally spatiotemporal — every data point has a location and a time. PostGIS provides the spatial indexing and query capabilities (ST_DWithin, ST_Intersects, spatial joins) that make real-time proximity detection possible. Adding pgvector to the same database allows OSINT embeddings to live alongside structured vessel data, enabling hybrid queries that combine spatial, temporal, and semantic search.
I considered dedicated time-series databases (TimescaleDB, InfluxDB) and dedicated vector databases (Pinecone, Qdrant). The decision to keep everything in PostgreSQL was deliberate — operational simplicity. One database to backup, monitor, and maintain. For the current scale (Baltic Sea, hundreds of vessels), PostgreSQL handles the load comfortably. If TideWatch expands to global coverage, the architecture may need to evolve.
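One example of the hybrid queries this single-database choice enables, sketched with illustrative table names: a single round trip that filters OSINT articles spatially with ST_DWithin and ranks the survivors by pgvector's `<->` distance operator.

```python
# Hypothetical schema: osint_articles(title, published_at, geom, embedding).
HYBRID_QUERY = """
SELECT o.title, o.published_at
FROM osint_articles AS o
WHERE ST_DWithin(o.geom::geography,
                 ST_MakePoint(%(lon)s, %(lat)s)::geography,
                 %(meters)s)
  AND o.published_at >= now() - interval '7 days'
ORDER BY o.embedding <-> %(query_vec)s::vector
LIMIT 10;
"""
```

With separate spatial and vector stores, the same question would require two queries and an application-side join.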
Design Decisions That Mattered
Narrow Scope by Design
TideWatch is deliberately narrow: Baltic Sea, subsea-cable and suspicious-vessel threat model. Not global. Not all maritime threats. Not piracy, smuggling, or illegal fishing.
This constraint is the single most important architectural decision. A system that tries to detect every type of maritime anomaly globally will drown in false positives and never achieve the detection quality needed for operational use. By constraining the geography (Baltic Sea) and the threat model (grey-zone attacks on critical infrastructure), every component can be optimized for that specific mission.
The anomaly detectors are tuned for Baltic vessel traffic patterns. The cable and pipeline database covers Baltic infrastructure specifically. The OSINT sources are curated for Baltic security reporting. This narrow focus is what makes the system operationally useful rather than academically interesting.
Open Data Only
Every data source in TideWatch is either open (AIS via AISStream, Sentinel-1 via Copernicus) or commercially licensable. No classified data. No government-restricted sources.
This is a deliberate choice with significant implications:
Defensible provenance. Every alert can be traced back to publicly available data. An analyst can share a TideWatch alert with a partner nation, a cable operator, or a journalist without classification concerns.
Reproducibility. Anyone with the same data sources can verify TideWatch's detections. This builds trust in a domain where trust is scarce.
Scalability. Open data does not require security clearances, SCIFs, or classified networks. TideWatch can run on commercial cloud infrastructure and be accessed by analysts without special access.
The trade-off is coverage. Classified sources — signals intelligence, submarine detection, classified satellite imagery — provide capabilities that open data cannot match. TideWatch does not replace classified systems. It provides a complementary layer that operates in the unclassified space, accessible to a broader set of analysts and decision-makers.
No Hallucinated Vessel Data
In a generic chatbot, a hallucination is an inconvenience. In a maritime threat assessment, a hallucinated vessel name could trigger a diplomatic incident. A fabricated position could misdirect a coast guard patrol. An invented timeline could undermine an investigation.
The typed-tools architecture eliminates this category of risk entirely. The LLM cannot generate vessel data — it can only query for it through predefined tools that return real data from the database. If the data does not exist, the analyst says so. It cannot invent a vessel that was not there.
This is the same principle I applied in building production AI agents — autonomous systems need hard constraints on what they can and cannot do. In TideWatch, the constraint is absolute: the analyst can reason about data, but it cannot create data.
Lessons Learned
1. Fusion Is the Hard Part
Individual data sources are relatively straightforward. AIS ingestion is a solved problem. SAR processing has well-documented algorithms. OSINT collection is routine. The hard part is fusion — combining data from sources with different temporal resolutions, spatial accuracies, and reliability characteristics into a coherent operational picture.
AIS updates every few seconds. SAR imagery arrives every few days. OSINT articles appear irregularly. Fusing these into a single timeline that an analyst can query requires careful handling of temporal uncertainty, spatial matching tolerances, and confidence scoring.
2. False Positive Management Is Everything
A maritime anomaly detection system that generates 500 alerts per day is useless. Analysts will ignore it within a week. The system must be quiet — surfacing only the alerts that genuinely warrant attention.
This required extensive tuning of detection thresholds, score-ranking algorithms, and deduplication logic. A vessel that loiters near a cable route might be a fishing boat, a vessel waiting for a pilot, or a ship with engine trouble. The system needs enough context to distinguish routine behavior from genuinely suspicious activity.
The current approach uses layered scoring: proximity to infrastructure × behavioral anomaly score × vessel type risk factor × time-of-day weighting. Only alerts above a configurable threshold reach the analyst feed.
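In code, that layered scoring reduces to a small multiplicative function. The weights and threshold below are illustrative placeholders, not TideWatch's production values:

```python
THRESHOLD = 0.5  # configurable per deployment

def alert_score(proximity_km: float, anomaly_score: float,
                vessel_risk: float, hour_utc: int) -> float:
    """Layered score: proximity x behavior x vessel risk x time of day."""
    proximity = max(0.0, 1.0 - proximity_km / 10.0)  # 1.0 at the asset, 0.0 at 10 km
    night = 1.25 if hour_utc < 6 or hour_utc >= 22 else 1.0  # night moves weigh higher
    return proximity * anomaly_score * vessel_risk * night

def should_alert(score: float) -> bool:
    """Only scores above the threshold reach the analyst feed."""
    return score >= THRESHOLD
```

Because the factors multiply, a vessel must be close, behaving anomalously, and of a risky type at once — any factor near zero keeps the alert out of the feed.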
3. Explainability Is Non-Negotiable
In security applications, "the AI detected an anomaly" is not sufficient. The analyst needs to understand why the alert was raised, what data supports it, and how confident the system is. Every TideWatch alert includes:
- The specific detector that triggered it
- The data points that contributed to the score
- The confidence level
- Links to the raw underlying data
This explainability requirement shaped the entire architecture. It is why the anomaly detectors are rule-based rather than black-box ML models. It is why the LLM analyst uses typed tools rather than free-form database access. It is why every data point has a provenance chain back to its source.
4. Scope Discipline Predicts Delivery
The temptation in building a maritime surveillance system is to boil the ocean — detect everything, everywhere, for everyone. TideWatch resists this. Baltic Sea. Subsea cables. Suspicious vessels. That is the scope. Everything else is Phase 2 or Phase 3.
This discipline is what allowed a single developer to build a functional system. A global maritime surveillance platform would require a team of dozens and years of development. A Baltic cable-threat detection system can be built, tested, and demonstrated by one person with the right architecture.
What TideWatch Does Not Do
Honest framing matters more than over-claiming:
- TideWatch does not replace classified intelligence systems. It operates in the unclassified space with open data.
- TideWatch does not provide continuous dark-vessel detection. SAR satellite revisit times limit temporal coverage.
- TideWatch does not predict attacks. It detects anomalous behavior that may warrant investigation.
- TideWatch does not make decisions. It provides evidence-grounded information to human analysts who make decisions.
What Comes Next
The roadmap is phased by design:
Phase 2 adds red/blue agent-based wargaming (reinforcement learning for course-of-action ranking), partner pilots with cable operators and coastal agencies, and allied platform integration pathways.
Phase 3 expands beyond the Baltic — North Sea, Eastern Mediterranean, Black Sea — with edge deployment on patrol assets and multi-tenant organizational access.
But Phase 1 must prove the pipeline works on a constrained AOI before expanding. Scope discipline is the single biggest predictor of delivery in security technology projects.
The Broader Lesson
TideWatch exists because grey-zone warfare has created a new category of threat that falls between traditional military intelligence and civilian law enforcement. The attacks on Baltic infrastructure are not conventional military operations — they use commercial vessels, plausible deniability, and the vastness of the maritime domain to avoid attribution.
Detecting these threats requires fusing data sources that traditionally live in separate silos — maritime traffic data, satellite imagery, open-source intelligence. And it requires making that fused picture accessible to analysts who may not have access to classified systems.
The technology to do this exists. AIS data is openly available. Sentinel-1 SAR imagery is free through Copernicus. LLMs can serve as evidence-grounded analyst interfaces. PostgreSQL with PostGIS can handle the spatiotemporal data model. The missing piece was not technology — it was integration.
TideWatch is that integration. Narrow by design. Open by principle. Built to prove that real-time grey-zone maritime threat detection is possible with open data and modern AI architecture.
TideWatch is live at tidewatchlabs.com. If you are working on maritime security, critical infrastructure protection, or OSINT fusion and want to discuss the architecture or explore collaboration, get in touch.
FAQ
What is TideWatch?
TideWatch is a real-time grey-zone maritime threat detection platform for the Baltic Sea. It fuses live AIS vessel tracking, Sentinel-1 SAR satellite imagery, and OSINT into a single analyst console with an AI-powered natural-language analyst. It detects dark vessels, loitering near subsea cables, and anomalous vessel behavior using only open or commercially licensable data.
How does TideWatch detect dark vessels?
TideWatch uses Sentinel-1 SAR satellite imagery processed with CFAR amplitude detection to identify vessel-sized objects on the water surface. These SAR detections are then cross-referenced against the live AIS database. If a satellite sees a vessel-sized object where no AIS-broadcasting vessel exists, it generates a dark-vessel alert.
Does TideWatch use classified data?
No. TideWatch operates exclusively on open or commercially licensable data — AIS via AISStream, Sentinel-1 SAR via Copernicus, and curated OSINT sources. This ensures defensible provenance for every alert and allows the system to operate without security clearances or classified networks.
How does the AI analyst avoid hallucinating vessel data?
The LLM analyst operates through typed tools only — predefined functions like get_vessel_history and get_alerts_in_area that return real data from the database. The AI cannot generate vessel data, fabricate positions, or invent incidents. Every claim links back to a raw data ID with full provenance.