Machine Learning for Pest Pathway Prediction: A Cautious Look at What's Possible
The dirty secret of biosecurity is that we mostly find out about pathways after a pest has used them. Brown marmorated stink bug taught us about used vehicles from particular ports. Khapra beetle taught us about a long tail of innocuous-looking household goods. Varroa destructor taught us, again, that bees move with people in ways our surveillance was not built to catch.
The promise of machine learning in this space is straightforward: instead of waiting for the next pathway to reveal itself the hard way, use everything we already know about trade, climate, host availability, and historical interceptions to predict where the highest-risk movement is happening right now.
It’s a real promise. It’s also more complicated than the press releases suggest.
What the models are actually doing
A pathway prediction model typically pulls together a few categories of data.
Trade data — what’s coming into the country, from where, in what volumes, in what packaging. This is reasonably well captured by customs and biosecurity declarations, though completeness varies and the granularity is rougher than people assume.
Pest distribution and biology — where the pest is known to occur, what hosts it prefers, what conditions support establishment. This is built from CABI distribution data, EPPO records, scientific literature, and global pest reporting networks.
Climate and host data — where in Australia conditions and host availability would support establishment. CSIRO and the Bureau of Meteorology have made a lot of this data accessible, and groundwork by researchers at the University of Queensland and ANU has been quietly important.
Historical interception data — what’s been caught at the border, where, in what season, on what commodities. This is the closest thing to ground truth the models have.
The model output, in its useful form, is a ranked list of pathway-pest combinations with estimated probability and consequence weights. That feeds prioritisation of inspection effort, intelligence work, and pre-border partnerships.
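A minimal sketch of what such a ranked list could look like in code. It assumes the simplest possible scoring rule, probability multiplied by a consequence weight, which is an illustrative choice rather than any agency's method. The class name, field names, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathwayRisk:
    pathway: str
    pest: str
    probability: float  # estimated probability for this pathway-pest pair, 0..1
    consequence: float  # relative consequence weight (unitless, invented scale)

    @property
    def score(self) -> float:
        # Expected-consequence score: probability times consequence weight.
        return self.probability * self.consequence


def rank_risks(risks: list[PathwayRisk]) -> list[PathwayRisk]:
    """Return the combinations sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Illustrative numbers only; none of these values come from real data.
risks = [
    PathwayRisk("household goods", "khapra beetle", 0.02, 100.0),
    PathwayRisk("used vehicles", "brown marmorated stink bug", 0.05, 80.0),
    PathwayRisk("e-commerce parcels", "hypothetical pest", 0.10, 30.0),
]

ranked = rank_risks(risks)
for r in ranked:
    print(f"{r.pathway:20s} {r.pest:28s} score={r.score:.2f}")
```

A real system would carry uncertainty around both numbers rather than a single point estimate, but the shape of the output (a sorted list that drives prioritisation) is the same.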
Where it’s adding real value
The federal department has been quietly running pathway models for a while now. The earliest production uses were unglamorous — better seasonal alerts for high-risk commodities, better targeting of random inspection effort, refinement of declared-cargo intervention rates. The wins are measured in interception rates per inspection hour, and they’ve been meaningful.
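For concreteness, the metric mentioned above is simple arithmetic. The sketch below uses a hypothetical helper and invented counts to compare a baseline period with a model-targeted one at the same inspection effort.

```python
def interceptions_per_hour(interceptions: int, inspection_hours: float) -> float:
    """Interceptions divided by inspection hours spent to find them."""
    if inspection_hours <= 0:
        raise ValueError("inspection_hours must be positive")
    return interceptions / inspection_hours


# Invented figures: same 400 hours of effort, different targeting.
baseline = interceptions_per_hour(12, 400.0)
targeted = interceptions_per_hour(18, 400.0)

# Relative improvement of the targeted period over the baseline.
relative_uplift = targeted / baseline - 1.0

print(f"baseline={baseline:.3f}/h targeted={targeted:.3f}/h uplift={relative_uplift:.0%}")
```

The metric only counts what was found, so it says nothing about what went past uninspected.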
The more interesting work is happening on emerging pathways: e-commerce parcels, second-hand vehicle imports, used machinery, and biofouling on vessels arriving from poorly monitored regions. These are pathways where traditional intelligence is thin and where models that fuse weak signals from many sources can identify risk clusters humans would miss.
Several state agencies are also running their own work — particularly Queensland and NSW, where surveillance for fall armyworm, varroa, and tropical fruit fly has been a multi-year focus. The Department of Agriculture, Fisheries and Forestry has published useful summaries of where the science is up to. Some of the recent work on myrtle rust spread modelling in eastern Australia has demonstrably improved early warning to nursery and revegetation operators.
Where the limitations bite
The biggest limitation is that a model is only as good as the worst data feeding it. Trade data has gaps. Pest distribution data is uneven across the world: we know much more about pests in OECD countries than about pests in the source regions that actually matter. Historical interception data has selection bias built into it: we found what we looked for, on the pathways we were already inspecting.
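The selection bias in interception data can be shown with a small worked example. The sketch below assumes two hypothetical pathways with invented volumes and contamination rates, and assumes every inspection detects contamination when it is present. Raw interception counts favour the heavily inspected pathway even though the rarely inspected one is three times as contaminated.

```python
# Invented numbers for two hypothetical pathways.
pathways = {
    "A (heavily inspected)": {
        "consignments": 10_000, "contamination": 0.01, "inspected_share": 0.50,
    },
    "B (rarely inspected)": {
        "consignments": 10_000, "contamination": 0.03, "inspected_share": 0.02,
    },
}


def inspections(p: dict) -> float:
    """Number of consignments actually inspected."""
    return p["consignments"] * p["inspected_share"]


def expected_interceptions(p: dict) -> float:
    """Expected finds, assuming perfect detection on every inspection."""
    return inspections(p) * p["contamination"]


def rate_per_inspection(p: dict) -> float:
    """Finds normalised by inspection effort instead of raw counts."""
    return expected_interceptions(p) / inspections(p)


raw_counts = {name: expected_interceptions(p) for name, p in pathways.items()}
rates = {name: rate_per_inspection(p) for name, p in pathways.items()}

for name in pathways:
    print(f"{name}: interceptions={raw_counts[name]:.0f}, per inspection={rates[name]:.3f}")
```

A model trained on the raw counts would learn that pathway A is the problem. Normalising by effort helps here, but only for pathways that received some inspection in the first place.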
The result is that models can become very good at predicting yesterday’s pathway risks. They’re much less good at the unprecedented event — the pest that hadn’t been intercepted before, the pathway that hadn’t been used before, the host shift that wasn’t documented in the literature. Black swans are exactly what models trained on historical data struggle with, and the costliest biosecurity events tend to be black swans.
A second limitation is operational. Even when a model outputs a clear signal (say, increased risk of a particular pest on a particular commodity from a particular origin), the resourcing to act is finite. Inspection effort is constrained by people, dock space, and laboratory throughput. A model that tells you to inspect a further 8% of the incoming containers from one origin only helps if you can actually deploy that inspection capacity. Most of the time you’re reallocating from somewhere else, which means accepting elevated risk in a different pathway. Models don’t solve that tradeoff; they sharpen it.
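The reallocation tradeoff can be made concrete with a toy calculation. The sketch below assumes a fixed budget of 1,000 inspections, two hypothetical pathways with invented arrival volumes and contamination rates, and perfect detection on inspection. Moving 400 inspections (8% of one origin's 5,000 containers) to that origin has to come out of the other pathway.

```python
def expected_missed(arrivals: int, contamination: float, inspections: int) -> float:
    """Expected contaminated consignments that are never inspected."""
    inspected = min(inspections, arrivals)
    return (arrivals - inspected) * contamination


# Invented volumes and contamination rates.
pathways = {
    "origin_x_containers": {"arrivals": 5000, "contamination": 0.02},
    "used_machinery": {"arrivals": 2000, "contamination": 0.01},
}

# Same total budget before and after; only the split changes.
before = {"origin_x_containers": 500, "used_machinery": 500}
after = {"origin_x_containers": 900, "used_machinery": 100}
assert sum(before.values()) == sum(after.values()) == 1000


def missed_by_pathway(allocation: dict) -> dict:
    return {
        name: expected_missed(p["arrivals"], p["contamination"], allocation[name])
        for name, p in pathways.items()
    }


missed_before = missed_by_pathway(before)
missed_after = missed_by_pathway(after)

for name in pathways:
    print(f"{name}: missed before={missed_before[name]:.1f}, after={missed_after[name]:.1f}")
```

Under these invented numbers, expected misses on the origin X containers fall from 90 to 82 while expected misses on used machinery rise from 15 to 19. The total improves, but the risk has moved rather than disappeared.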
The third limitation is the harder one to talk about: false confidence. A well-presented model output looks authoritative. It can make decision-makers comfortable with shifting resources away from pathways the model rated low-risk. When the model is wrong, and on a low-probability, high-consequence event it will be wrong, the cost falls on the very pathways that lost inspection effort. Good biosecurity practice still requires an irreducible spread of inspection effort, even on pathways the model has deprioritised.
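One way to encode that principle is to give every pathway a guaranteed minimum share of inspection effort and let the model distribute only the remainder. The sketch below is an illustration under assumed names and numbers: the function, the 5% floor, and the scores are all hypothetical, and proportional allocation is just one of several possible rules.

```python
def allocate_with_floor(
    scores: dict[str, float], budget: float, floor_share: float = 0.05
) -> dict[str, float]:
    """Split budget across pathways: a fixed floor each, the rest by model score."""
    n = len(scores)
    if n == 0:
        return {}
    if floor_share < 0 or floor_share * n > 1:
        raise ValueError("floor_share must be between 0 and 1/len(scores)")

    floor = budget * floor_share
    remaining = budget - floor * n
    total_score = sum(scores.values())

    allocation = {}
    for name, score in scores.items():
        if total_score > 0:
            model_part = remaining * score / total_score
        else:
            # No signal from the model: spread the remainder evenly.
            model_part = remaining / n
        allocation[name] = floor + model_part
    return allocation


# Invented scores; the last pathway is one the model rates as zero risk.
scores = {"used vehicles": 4.0, "e-commerce parcels": 3.0, "bulk grain": 0.0}
allocation = allocate_with_floor(scores, budget=1000.0, floor_share=0.05)

for name, hours in allocation.items():
    print(f"{name:20s} {hours:7.1f}")
```

The pathway scored at zero still receives its floor, so a model error there has at least some chance of being caught by routine inspection.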
The role of outside expertise
A handful of state agencies and several research bodies have brought in outside ML expertise to build their pathway models. The good engagements have a few things in common. The team building the model includes biosecurity practitioners as primary stakeholders, not just as consumers of the output. The model’s training data and assumptions are documented well enough that another scientist could replicate the work. The deployment plan includes ongoing performance monitoring, not just an initial validation.
Several Australian consultancies have done credible work in this space. I’ve seen reasonable engagements from groups including the Team400 AI consultancy and a few others working at the intersection of operational data and government science. The pattern is the same as it is in other domains: the technical work is secondary to the operational integration. A great model that nobody acts on is worth nothing.
What I’d argue for
Machine learning is useful in biosecurity pathway analysis. It’s not a substitute for good surveillance or for pre-border trade work. It’s a tool that lets a thin biosecurity workforce make sharper decisions about where to put scarce attention.
Two things matter more than the modelling itself. First, the underlying data — the trade declarations, the interception records, the pest distribution information — needs ongoing investment. A model trained on poor data will give confident bad answers. Second, the operational culture has to absorb model outputs as one input among many, not as a substitute for biosecurity judgement.
Get those right and the modelling earns its keep. Get them wrong and you’ll have a beautifully presented dashboard and a worse biosecurity outcome than inspection officers achieved ten years ago with experience and a checklist. The technology doesn’t care which.