Twenty-five percent of the world's seaborne oil and twenty percent of its LNG pass through a strait that is 33 kilometres wide at its narrowest point, and when something goes wrong there, the whole world feels it within hours. I wanted to build a system that could see it happening in real time.
The premise of HormuzWatch is this: aggregate AIS vessel tracking, news feeds, and market data for the Strait of Hormuz, run it through an Apache Flink streaming pipeline, and use Claude AI to produce structured intelligence outputs. The frontend is built on React with Mapbox GL for live vessel rendering, the backend on FastAPI and Python 3.12, and the streaming layer on Apache Kafka via Aiven. The full project page has a detailed breakdown of the architecture, including a live Flink DAG screenshot from Ververica Cloud.
What actually happens at the strait
The Strait of Hormuz has been a geopolitical flashpoint for decades. The IRGC has repeatedly used vessel seizure as a diplomatic instrument: as retaliation, as leverage, and as a demonstration of its ability to threaten the global energy supply from a single chokepoint. The incident history that HormuzWatch tracks starts with Iran's 2011 closure threat, which caused tanker war-risk insurance to spike 200% in 48 hours. It continues through the peak of EU sanctions in 2012, when dark AIS events near Qeshm and Larak Islands increased 340% versus baseline as the shadow fleet began to emerge, and the 2019 tanker attacks on the MT Front Altair and Kokuka Courageous, which prompted Lloyd's of London to suspend war-risk cover and moved Brent crude 4% in 24 hours. It runs through the Houthi escalation in 2024, which shifted compliant tanker routing while driving sanctioned vessel density in Hormuz upward.
Running alongside the overt incidents is a quieter and more systematic operation: hundreds of tankers carrying sanctioned Iranian and Russian crude that operate outside the conventional maritime system, conducting ship-to-ship transfers in international waters under cover of darkness to obscure cargo origins, with transponders either turned off entirely or broadcasting false position data to place the vessel somewhere it is not. Lloyd's List Intelligence recorded a significant uptick in these spoofing incidents in 2025, with the Hormuz region among the most active areas globally.
The 2026 crisis, which began when the strait was effectively closed to merchant traffic following the US-Israeli air campaign against Iran, demonstrated at extreme scale what was always the latent risk: Brent crude surpassed $100 per barrel on March 8 and eventually reached $126, Dubai crude hit a record $166, Gulf Arab states cut combined production by over 10 million barrels per day, and by mid-April more than 600 loaded tankers were stranded inside the Persian Gulf, in what economists have called the largest disruption to global energy supply since the 1970s. HormuzWatch was built for the normal operating environment rather than this scenario, but the signals the system was designed to detect were exactly the early indicators that preceded it.
Why real-time is not optional
The speed at which a strait incident propagates into global markets is not measured in hours: when tankers in the Gulf of Oman were struck in June 2019, Brent was up 4% before most analysts had filed their first note, and when the Stena Impero was seized the following month, oil prices moved within minutes of the first reports. The information advantage window between an event occurring and markets pricing it in is narrow and shrinking, and conventional intelligence pipelines, the kind that ingest news and produce reports on human timescales, close that window from the wrong end.
The same is true for risk assessment. A single dark AIS event near Qeshm Island at 0600 means almost nothing in isolation, but the same event correlated with a Reuters headline about new sanctions enforcement that appeared 14 minutes earlier, a 2% uptick in shipping insurance premiums overnight, and three other vessels in the same grid cell going dark in the preceding 48 hours means something specific. That correlation requires a system watching all of those streams simultaneously and holding state across them, something no human analyst running manual queries can do in time for the information to be actionable.
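The headline-then-dark-event example above is, at its simplest, a lookup over a time window. The sketch below is plain Python rather than Flink code, and the function name, the 30-minute window, and the sample headlines are all assumptions made for illustration; the post does not say what window the real pipeline used.

```python
from bisect import bisect_left, bisect_right


def correlated_headlines(event_ts, headlines, window_s=30 * 60):
    """Return headlines published within window_s seconds before event_ts.

    headlines: list of (ts, text) tuples sorted by ts (epoch seconds).
    The 30-minute default is a hypothetical value, not the project's setting.
    """
    times = [ts for ts, _ in headlines]
    lo = bisect_left(times, event_ts - window_s)   # first headline inside the window
    hi = bisect_right(times, event_ts)             # last headline at or before the event
    return headlines[lo:hi]


# Invented sample data: a dark AIS event at 06:00 and two earlier headlines.
dark_event_ts = 6 * 3600
sample_headlines = [
    (10_000, "Unrelated shipping story"),
    (dark_event_ts - 14 * 60, "New sanctions enforcement announced"),
]
matches = correlated_headlines(dark_event_ts, sample_headlines)
```

Only the headline from 14 minutes before the event falls inside the window. A streaming engine does the same thing continuously, and for every stream at once, which is the part a manual query cannot keep up with.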
Why Flink, and why it matters
Most data pipelines are batch, running on a schedule, and for most business intelligence that is entirely sufficient. Recognizing a pattern that forms across independent event streams, with variable and unpredictable gaps between the signals, requires something fundamentally different: a stateful streaming engine that processes events as they arrive and never stops watching.
Apache Flink keeps a continuously updated model of the world in memory, so that when the fourteenth AIS position update for a vessel arrives, the system already knows its last 13 positions, its heading and speed, its flag state, whether its MMSI matches any sanctions list, and what the current risk score is for its grid cell, evaluating every new event in the context of everything it has already seen.
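To make the idea of per-vessel state concrete, here is a minimal plain-Python stand-in for what Flink would hold in keyed state. It is a sketch, not the project's code: the class names, the field names, and the 13-position history length (taken from the example above) are assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Position:
    ts: float          # epoch seconds
    lat: float
    lon: float
    speed_kn: float
    heading_deg: float


@dataclass
class VesselState:
    # Hypothetical per-vessel record; Flink would hold this in keyed state.
    history: deque = field(default_factory=lambda: deque(maxlen=13))
    flag: Optional[str] = None
    sanctioned: bool = False


class VesselTracker:
    def __init__(self, sanctioned_mmsis):
        self._sanctioned = set(sanctioned_mmsis)
        self._state = defaultdict(VesselState)   # keyed by MMSI

    def on_position(self, mmsi, pos, flag=None):
        """Return the context known before this update, then record the update."""
        state = self._state[mmsi]
        prior = list(state.history)              # what we knew when the event arrived
        state.history.append(pos)                # oldest entry drops off at maxlen
        if flag is not None:
            state.flag = flag
        state.sanctioned = mmsi in self._sanctioned
        return prior, state
```

Calling `on_position` for the fourteenth time on the same MMSI returns the previous 13 positions as `prior`, which is the context a downstream detector would evaluate the new event against.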
The Flink job ran on Ververica Cloud and had six main operators. The DarkAISDetector flagged vessels whose transponder went silent after broadcasting a position inside the strait's high-risk zones, using configurable silence thresholds calibrated to vessel type and historical behaviour. The SanctionsScreener matched incoming MMSIs against a maintained list of IMO-sanctioned vessels and known shadow fleet identifiers. The NewsAISCorrelator used a sliding time window to match news events mentioning specific vessel names, flag states, or geographic coordinates against recent AIS anomalies. The MultiSignalCorrelator was the most complex piece: a Complex Event Processing rule engine that fired when a configurable combination of detectors triggered within a defined time window, producing a scored intelligence event with a significance value between 0 and 100. The RiskHeatmapAggregator maintained a spatial grid over the strait and updated cell-level risk scores continuously. The MarketDataJoiner enriched outbound events with the current Brent price, relevant shipping indices, and Polymarket contract prices for Iran-related risk events.
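As an illustration of the first operator, the sketch below shows one way a silence-threshold detector could work. It is plain Python, not the Flink operator itself (in Flink this kind of logic is typically written as a keyed process function with timers), and the threshold values, vessel types, and method names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical silence thresholds in seconds, per vessel type.
SILENCE_THRESHOLDS = {"tanker": 30 * 60, "cargo": 45 * 60}
DEFAULT_THRESHOLD = 60 * 60


@dataclass
class LastSeen:
    ts: float
    in_high_risk_zone: bool
    vessel_type: str


class DarkAISDetector:
    def __init__(self):
        self._last_seen = {}     # MMSI -> LastSeen
        self._flagged = set()    # MMSIs already reported as dark

    def on_position(self, mmsi, ts, in_high_risk_zone, vessel_type):
        """Record a broadcast; a vessel that transmits again is no longer dark."""
        self._last_seen[mmsi] = LastSeen(ts, in_high_risk_zone, vessel_type)
        self._flagged.discard(mmsi)

    def on_tick(self, now):
        """Return MMSIs that have newly exceeded their silence threshold."""
        newly_dark = []
        for mmsi, seen in self._last_seen.items():
            # Only vessels last heard inside a high-risk zone are candidates.
            if not seen.in_high_risk_zone or mmsi in self._flagged:
                continue
            threshold = SILENCE_THRESHOLDS.get(seen.vessel_type, DEFAULT_THRESHOLD)
            if now - seen.ts > threshold:
                self._flagged.add(mmsi)
                newly_dark.append(mmsi)
        return newly_dark
```

Each vessel is reported once per silence, and a fresh position report re-arms the detector for that MMSI. Calibrating thresholds to historical behaviour, as described above, would replace the fixed lookup table with per-vessel values.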
Each operator fed the next, building a composite risk picture. A dark AIS event triggered the sanctions screener, and if the vessel matched, the significance score rose. A correlated news event firing within the configured time window raised it further, and prior elevated activity in the grid cell raised it further still. When the composite score crossed the threshold, the event went to Claude for synthesis.
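The escalation described above can be sketched as an additive score. The weights and the threshold below are invented for illustration and chosen only so that the maximum is 100; the post does not publish the real scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    dark_ais: bool
    sanctions_match: bool
    correlated_news: bool
    cell_risk: float   # prior grid-cell risk, normalised to 0.0..1.0


# Hypothetical weights; they sum to 100 so the score stays in the 0..100 range.
WEIGHTS = {"dark_ais": 30, "sanctions_match": 30, "correlated_news": 25, "cell_risk": 15}
SYNTHESIS_THRESHOLD = 70   # hypothetical cut-off for sending an event to the LLM


def significance(s: Signals) -> float:
    """Composite score: the dark AIS event is the trigger, other signals add to it."""
    if not s.dark_ais:
        return 0.0
    score = float(WEIGHTS["dark_ais"])
    if s.sanctions_match:
        score += WEIGHTS["sanctions_match"]
    if s.correlated_news:
        score += WEIGHTS["correlated_news"]
    score += WEIGHTS["cell_risk"] * min(max(s.cell_risk, 0.0), 1.0)
    return score


def should_synthesize(s: Signals) -> bool:
    return significance(s) >= SYNTHESIS_THRESHOLD
```

With these assumed weights, a dark AIS event alone scores 30 and stays below the threshold, while the same event on a sanctioned vessel with a correlated headline scores 85 and would be forwarded.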
The AI cost problem
Running Claude continuously on a live event stream is expensive if you are not careful. Early tests produced briefings every few minutes, far too often for anything short of a genuinely significant event, and the cost adds up quickly at 2025 model prices. The solution was a four-layer control system: a significance scorer that filters out events below a configurable threshold, a cooldown timer that prevents consecutive briefings within a minimum interval, a deduplication layer that suppresses near-identical events, and a hard budget cap that limits daily spend regardless of what else happens.
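The four layers can be sketched as a single gate that every candidate event passes through before any model call is made. This is an assumed design, not the project's code: the class name, the default values, the per-call cost estimate, and the use of a string fingerprint for deduplication are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BriefingGate:
    # All defaults are hypothetical placeholders.
    min_score: float = 70.0
    cooldown_s: float = 600.0
    dedup_window_s: float = 3600.0
    daily_budget_usd: float = 0.15
    est_cost_usd: float = 0.01
    _last_ts: Optional[float] = field(default=None, init=False)
    _seen: dict = field(default_factory=dict, init=False)
    _spent: float = field(default=0.0, init=False)
    _day: Optional[int] = field(default=None, init=False)

    def allow(self, ts: float, score: float, fingerprint: str) -> bool:
        """Return True if a briefing may be generated for this event."""
        day = int(ts // 86400)
        if day != self._day:                       # new day: reset the spend counter
            self._day, self._spent = day, 0.0
        if score < self.min_score:                 # layer 1: significance
            return False
        if self._last_ts is not None and ts - self._last_ts < self.cooldown_s:
            return False                           # layer 2: cooldown
        last = self._seen.get(fingerprint)
        if last is not None and ts - last < self.dedup_window_s:
            return False                           # layer 3: deduplication
        if self._spent + self.est_cost_usd > self.daily_budget_usd:
            return False                           # layer 4: hard budget cap
        self._last_ts = ts
        self._seen[fingerprint] = ts
        self._spent += self.est_cost_usd
        return True
```

The checks run cheapest-first, and state is only updated when a briefing is actually allowed, so a rejected event never consumes cooldown or budget. Note that the cooldown and the budget cap can both delay a briefing, so any latency target has to be set with those two values in mind.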
The result was a daily cost of between $0.02 and $0.15, while maintaining a guarantee that any genuinely significant event would trigger a briefing within two minutes. The significance scorer was the hardest part to calibrate: set the threshold too high and real events get filtered out, too low and you burn budget on noise, and calibrating it required using historical incident data from 2019 to 2024 as ground truth.
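The calibration trade-off can be illustrated with a simple threshold sweep over labelled events. The helper names and the five sample events below are invented for the example; they are not the project's historical data, and the real calibration may have used a different method.

```python
def sweep_thresholds(labelled, thresholds):
    """Compute (threshold, precision, recall) for each candidate threshold.

    labelled: list of (score, was_real_incident) pairs.
    """
    rows = []
    for t in thresholds:
        tp = sum(1 for s, real in labelled if s >= t and real)
        fp = sum(1 for s, real in labelled if s >= t and not real)
        fn = sum(1 for s, real in labelled if s < t and real)
        precision = tp / (tp + fp) if tp + fp else 1.0
        recall = tp / (tp + fn) if tp + fn else 1.0
        rows.append((t, precision, recall))
    return rows


def pick_threshold(labelled, thresholds, min_recall=1.0):
    """Highest threshold that still catches at least min_recall of real incidents."""
    ok = [t for t, _, r in sweep_thresholds(labelled, thresholds) if r >= min_recall]
    return max(ok) if ok else min(thresholds)


# Invented sample: two real incidents and three noise events.
sample = [(90, True), (75, True), (72, False), (60, False), (40, False)]
chosen = pick_threshold(sample, [50, 60, 70, 80])
```

On this toy data the sweep picks 70: raising the threshold to 80 would drop a real incident, while lowering it to 60 would let an extra noise event through and spend budget on it.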
What the system actually produces
The intelligence output is grounded in live Flink-processed state: actual vessel positions, actual risk scores, actual market prices at the moment the triggering event fired. One real output from the live system reported a risk score of 82/100, driven by a dark AIS detection on MMSI 423XXXXXX correlated with a Reuters item about U.S. maximum pressure on Iranian oil exports, with Brent up $1.40 on the session. The significance score was high enough to trigger a full Claude synthesis, and the output came from what the pipeline observed in the last processing window rather than from anything a language model knew about the strait in general.
The natural language query interface worked the same way, with answers grounded in live state rather than training data: how many vessels went dark in the last 48 hours, where they are relative to the shipping lanes, what the current risk score is for the northern approach to the strait, how today's AIS coverage compares to the 30-day baseline.
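One way to ground such answers is to compute the facts from pipeline state first and hand the model only that summary. The sketch below assumes a simple event and grid-cell layout; the function name, the dictionary keys, and the cell identifiers are hypothetical, and the post does not describe how the context was actually passed to Claude.

```python
def build_query_context(dark_events, cell_scores, now, window_s=48 * 3600):
    """Summarise live state for a question about recent dark activity.

    dark_events: list of dicts with "mmsi", "ts" (epoch seconds) and "cell".
    cell_scores: dict mapping a grid-cell id to its current risk score.
    """
    recent = [e for e in dark_events if now - window_s <= e["ts"] <= now]
    cells = sorted({e["cell"] for e in recent})
    return {
        "dark_vessel_count": len({e["mmsi"] for e in recent}),   # distinct vessels
        "cells": cells,
        "cell_risk": {c: cell_scores.get(c, 0) for c in cells},
    }


# Invented sample state: vessel A went dark twice recently, vessel B long ago.
events = [
    {"mmsi": "A", "ts": 100_000, "cell": "n1"},
    {"mmsi": "A", "ts": 150_000, "cell": "n1"},
    {"mmsi": "B", "ts": 10, "cell": "s2"},
]
context = build_query_context(events, {"n1": 82, "s2": 10}, now=200_000)
```

The resulting dictionary counts one dark vessel in cell `n1` and would be serialised into the prompt, so the model phrases an answer from observed numbers rather than recalling them.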
Current status
HormuzWatch is dormant, partly because running a 24/7 Flink cluster with live AIS feeds has real infrastructure costs, and partly because the 2026 crisis has made those feeds significantly more expensive and harder to access. The source is on GitHub and the full project page includes architecture diagrams, a live Flink DAG screenshot, and a walkthrough of every pipeline operator. The architecture, stateful multi-signal correlation feeding a cost-controlled AI synthesis layer, is the transferable piece. The strait is just the use case that made it legible.