TAPAS.network | 12 June 2025 | Special Item | Tom van Vuren and Philippe Perret
In this Special Item, Tom van Vuren values traditional transport data and Philippe Perret looks at wider use of real-time transport data. The authors refer to a recent DfT report estimating the potential of integrated network digital twins; we include a summary.
Valuing traditional transport data – Tom van Vuren
DATA is the new oil. Remember that? I wasn’t sure at the time if this was meant to say that data was going to be the most valuable commodity in the world, or whether it was going to grease the wheels of an integrated, interconnected transport system and ultimately society.
The quote is attributed to Clive Humby, a British mathematician, who apparently coined it in 2006. And although data is a coveted resource, just as oil was in the 1960s and 70s, unlike oil it is not a finite resource – if anything, we are drowning in it!
But what makes the phrase particularly relevant in a transport context is that, despite its abundance, or probably because of its abundance, data (like oil) is of little value in its raw form. It needs processing to release its value. Has Humby’s quote delivered on its promise? What is the value of data now, almost 20 years later? How can we even estimate that value?
A good place to start is the 2019 PwC report entitled Putting a Value on Data. The report refers to work done by the World Economic Forum (WEF: The value of data, 2017), which estimated that the value of data globally would be $3 trillion by 2020. I haven’t been able to establish if the prediction has come true, but what I did find was that the total amount of data stored in 2025 is almost 200 zettabytes (a zettabyte is 10^21 bytes). By the way – the total world GDP in 2023 was estimated at just over $100 trillion, so in WEF terms data would have represented 3% of that – somewhat hard to believe.
The WEF document argues that in 2006 the most valuable firms in the world were oil and energy companies, but that ten years later the list was dominated by data firms like Alphabet, Apple, Facebook, Amazon and Microsoft. In 2025, almost ten years later again, NVIDIA and Tesla have joined the ranks, and AI will inevitably upset these rankings again in the decade to come – and make the question of the value of data, and who owns it, even more pertinent.
The PwC report identifies three possible approaches to data valuation: from the cost approach (what was the expense in collecting the data), via a market approach (what others are willing to pay) to an income approach (which is the value generated or the costs saved by using the data). Now I don’t want to speculate on the profit margin of survey companies and general data providers, so let’s start estimating the value of data at its market value.
My uneducated guess is that in the UK we spend, as a profession, somewhere between £10 million and £25 million on transport data per year. This is data collected purely for the intended purpose of transport planning. Disagree with me in the comments, if you have a better value (which is important as all further calculations build on this value). I exclude permanent surveys carried out by the Department for Transport, such as the National Travel Survey, or the 8,000 manually collected traffic counts used in their Road Traffic Statistics. I also exclude data that is collected for purely operational reasons, such as ANPR cameras, and big, passively collected data (i.e. not originally intended for transport planning purposes). But I include all transport data that is paid for specifically, including for example, travel patterns obtained from mobile network data, such as through Amey’s CitiLogik product. Is £10 million to £25 million a reasonable starting point?
Moving on to the income approach – in other words, the value generated or the costs saved – we can separate out three components:
- The direct benefits to the project or policy for which the data was collected, and for which the data are used to build a business case for funding
- The indirect benefits of subsequent projects and policies that can and will be assessed using the same data, including those possible interventions that are not taken forward for funding
- The wider benefits of the data being used more broadly in policy development and decision-making
Direct benefits
I don’t think it’s unreasonable to assume a medium to high value for money, with a benefit to cost ratio of around two, for all data collected specifically for a transport project or policy, leading to a minimum value of annually collected transport planning data of around £20-£50 million. And this may in itself be a sufficient return, but such data is often usefully repurposed, for example when embedded in a transport model used for a wider range of projects and policies beyond those originally intended.
Indirect benefits
Based on my experience with, for example, PRISM in the West Midlands, but also with a nod to National Highways’ Regional Transport Models, a factor of five is probably not an unreasonable estimate for these indirect benefits. In other words, subsequent projects and policies that benefit from the originally collected data (whether they are ultimately progressed or not) would have a combined value of perhaps twice, and up to ten times, that of the original project. You can challenge that – in the case of the Lower Thames Crossing you are probably looking at the lower end of that range, but for more generic strategic models developed for major conurbations, towards the higher end. The assumed average factor of five would lead to £40-£250 million of indirect benefits over the lifetime of the data’s validity.
Wider benefits
Although it’s relatively straightforward to assign a value to the use of data embedded in models for subsequent, sometimes unintended projects, not all value is released through the use of data in models. I would argue that actually most value comes from ad-hoc usage of data to inform policy and decision-making, sometimes even by organisations that were not responsible for the original data collection. Think of local mode share data to support a BSIP, local traffic counts to support or oppose a housing development, or the origins and destinations of city centre trips to inform a parking policy. Let’s double those benefits for these more general planning purposes – leading to a total value of transport data collected annually of £80-£500 million, with a mid-point of around £300 million annually: at least the same order of magnitude as the recent DfT estimates of the potential of integrated network digital twins.
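For readers who want to follow the arithmetic, the chain of assumptions above can be written out explicitly. Every figure below is the article’s own rough estimate, not a measured value – a minimal sketch, in Python:

```python
# Back-of-envelope valuation of annually collected UK transport planning data.
# All figures are in £ million and all inputs are the article's own assumptions.

spend_low, spend_high = 10, 25   # guessed annual UK spend on transport planning data
bcr = 2                          # assumed benefit-cost ratio: spend -> direct benefits

direct = (spend_low * bcr, spend_high * bcr)   # (20, 50): direct benefits

# Indirect multiplier for reuse in subsequent projects: x2 at the low end,
# x5 (the assumed average) at the high end
indirect = (direct[0] * 2, direct[1] * 5)      # (40, 250): with indirect benefits

# Ad-hoc wider policy uses are assumed to double the total again
total = (indirect[0] * 2, indirect[1] * 2)     # (80, 500): total annual value
midpoint = sum(total) / 2                      # 290 -> "around £300 million annually"

print(direct, indirect, total, midpoint)
```

Changing any one assumption simply scales the result, which is why the starting estimate of annual spend matters so much.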
Yes, I admit, these are rough calculations, based on assumptions that you can easily disagree with. And I am happy for you to share your own estimates in the comments. Why is this value important? For me there are at least two reasons. The first is that, despite the increase in open data, and particularly passively collected big data (remember those 200 zettabytes?), there remains a need for, and value in, data specifically collected for transport planning purposes; in many cases it is actually essential to validate and correct those big data sources that were not collected for transport planning in the first instance. The second is that this data can only release its true value (a factor of around 20 according to my estimates) if it is shared, made widely available, and used for a wide range of originally unintended purposes. A report for Transport for London (Assessing the value of TfL’s open data and digital partnerships, 2017) estimated the Gross Value Added from the use of TfL data by tech companies, directly and across the supply chain and wider economy, at between £12 million and £15 million per annum. This can and should be repeated elsewhere. And some good examples are emerging, for example those described on this webpage of Local Authority transport data sharing case studies: https://www.gov.uk/guidance/local-authority-transport-data-sharing-case-studies.
We don’t start this journey from scratch. The Department for Transport published its Transport Data Strategy in 2023. Full of good intentions, worth another read, but also in need of the suggested actions being followed up to release that data value.
And the greatest data value may well lie not in its use in transport planning, but in operational improvements. The Road Haulage Association estimates that congestion costs the UK economy more than £30 billion per year, and the National Audit Office estimated that in 2006-07 almost 800,000 incidents caused 14 million minutes of delay to rail journeys costing a minimum of £1 billion in terms of time lost to passengers. This must easily have doubled or trebled since then. Being able to use the annually collected data to reduce these costs by even 1% would lead to additionally released value of more than £300 million, doubling my estimates above.
Which is an excellent segue back to data being the new oil, and releasing its full value through careful, dedicated and imaginative processing. Re-using transport data, and particularly big data, for operational purposes triggers a whole different set of considerations. The accompanying blog by Philippe Perret delves deeper.
Extending the argument to big data – Philippe Perret
IN HIS associated blog above, Tom van Vuren discusses the value of data, referring to the 2006 Clive Humby quote that data would be the new oil. Tom limits his discussion to traditionally collected data sources. In this article I want to look at two associated issues: what big data in transport is, and how to extract value from it – just as the value of oil has to be extracted by processing. First, what exactly is big data in transport? It is an expansive concept that encompasses a variety of data sources – everything from ticket sales and induction loops to mobile phone data and even CCTV feeds. Interestingly, some of these data streams are collected passively or for non-transport purposes, but they nonetheless hold significant value for transport planning when analysed effectively.
The key to unlocking this value lies in processing. Much like oil, raw data needs to be refined to extract actionable insights. The challenge here is not just collecting the data but figuring out how to filter, combine, and analyse it in a way that informs better decisions and solutions. Done right, this process doesn't just help us understand transport systems better; it helps us improve them, making them smarter, more efficient, and more user-focused.
Quite often big data is characterised by the four Vs:
Volume
Variety
Velocity
Veracity
We have covered variety and take volume as a given, so that leaves velocity, or how quickly data is gathered and used, and veracity, which is all about its accuracy and trustworthiness.
Both are critical to the definition of use cases… And they may be more connected than initially thought. Why is that, you ask? In real life, data can very rarely be used directly… It needs to be stored, cleaned and processed to be used to good effect. Raw data is like crude oil.
The nature of big data is that it keeps being generated, but it is quite ephemeral: if not captured and stored, or immediately processed, it is lost. And there is a lot of it!
This brings us to the next point: it takes time and effort (cost!) to handle big data – even more so when cleaning (removing worthless data and outliers), inference (applying algorithms to produce valuable outputs from point data) and enrichment (either long-term analysis of the data itself to create new insights, or blending with other datasets) are required to extract value from it. As an example, mobile network data from BT produces about 25 billion geospatial data points a day. This data needs to be captured live and stored – generating 2TB of data daily – before any post-processing is undertaken…
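A quick back-of-envelope check puts those numbers in perspective. The 25 billion points and 2TB per day are as quoted above; everything derived from them is simple arithmetic:

```python
# Rough scale of the BT mobile network data feed quoted above:
# ~25 billion geospatial points and ~2 TB of raw storage per day.
points_per_day = 25e9
bytes_per_day = 2e12          # 2 TB (decimal terabytes)
seconds_per_day = 86_400

bytes_per_point = bytes_per_day / points_per_day               # ~80 bytes per raw point
points_per_second = points_per_day / seconds_per_day           # ~289,000 points per second
ingest_mb_per_second = bytes_per_day / seconds_per_day / 1e6   # ~23 MB/s, sustained all day

print(bytes_per_point, points_per_second, ingest_mb_per_second)
```

Roughly 80 bytes per point and a sustained ingest of around 23 MB/s, around the clock – a useful reminder that the cost sits in the pipeline, not just in the storage.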
What happens when we set this against use cases in transport – the income value, as Tom describes it? Transport can, broadly speaking, be split into:
Operations
Monitoring
Planning and forecasting
Trending
Operations require near-immediate access to data, with latency as close to real time as possible (truly real-time data does not exist, I would argue) – think induction loops and other detectors for signal operations.
Data even a few seconds or minutes late loses much of its usefulness, and hence its value. This means that data coverage and processing must be kept to a minimum to retain minimal latency.
Wider-coverage datasets such as GPS or mobile network data might be too slow for that, though perhaps their use in a fast-paced, low-latency environment has not yet been fully explored. They are slower because of the data volumes handled and the processing required to turn information into useful insights through inference, and that lag means less direct relevance to network management, which affects their perceived value.
Global datasets are powerful for monitoring and planning, especially when enriched. For example, mobile network data can reveal long-term trends like home and work locations. This insight can enhance quicker, more current data, helping to understand trip purposes in real time. Interestingly, slower, long-term analytics can support faster decision-making using the same data.
These large datasets go beyond movement, adding context by combining information – for example weblog data (about 1.7TB generated a day) with mobile network data to identify EV users and their travel patterns. This is especially valuable as modelling and appraisal now consider societal benefits and equity, not just cost-benefit ratios. It also helps track global behaviour changes without costly, time-consuming travel surveys. However, processing and aggregating such vast data, often to ensure privacy and simplicity, is complex and typically requires expensive cloud-based solutions.
In a sense, the perceived value of data varies with its latency, as shown in the figure below.
Real-time data has a high value, which drops quickly as data is seen as arriving too late for meaningful intervention, and then gradually increases again as latency goes beyond the hour, as the data gets enriched and use cases change. But maybe, like oil, we should think about using data wisely. This is why slightly different approaches may be worth considering. Two of these spring to mind.
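The shape just described can be sketched as a purely illustrative function. The breakpoints and levels below are my assumptions, chosen only to reproduce the curve in words – high value in near real time, a trough once the data arrives too late to act on, then a gradual recovery as enrichment kicks in – not values from any dataset:

```python
def perceived_value(latency_minutes: float) -> float:
    """Illustrative perceived value (0-1) of transport data against latency.

    All breakpoints and levels are assumed, for sketching the curve only.
    """
    if latency_minutes <= 1:
        # Near real time: directly actionable for operations
        return 1.0
    if latency_minutes <= 60:
        # Too late to intervene, not yet enriched: value falls to 0.2 at the hour
        return 1.0 - 0.8 * (latency_minutes - 1) / 59
    # Beyond the hour: value recovers slowly as enrichment and new use cases apply
    return min(0.8, 0.2 + 0.1 * (latency_minutes / 60 - 1))
```

The useful point is not the exact numbers but the non-monotonic shape: minutes-late data sits in the trough, while well-enriched, much older data climbs back up for monitoring and planning purposes.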
First the blending of near real time data (the few minutes late version) and modelling, leading to short term predictions, as the delay could be outweighed by additional information from enrichment. Some software providers have started this journey, but adoption is slow.
Second, the reuse of data – how real-time data could be repurposed and blended with longer-latency but enriched data. This is where, potentially, the concept of digital twins could play its part. But this would require more than just data storage: it is about processing valuable insights on the fly and collating them in a meaningful way, building a picture over time such that most real-time, monitoring and trending information is there for the different purposes and users.
The possibilities for big data in transport are immense. From identifying bottlenecks on highways to predicting passenger demand on trains, the potential applications can transform everything from public transit to city planning. The promise? A system that works better not just for planners but for the people who rely on it every day. But this requires a change in mindset, looking at data and its uses in a more holistic way, moving away from a very project-based mindset… This returns us neatly to Tom’s accompanying blog, in which he discusses exactly that point: like oil, the true value of data is extracted through multiple, and often unintended, uses.
DfT research paper highlights £850m benefits of digital twins
THE REPORT on the benefits of digital twins in an integrated transport network was produced for the Department for Transport by a consortium led by Arup and published last autumn (2024).
The research paper, titled Integrated network management digital twin: economic benefits analysis, shows that digital twins can benefit the economy, environment, and public.
It defines a Digital Twin as a digital representation or model of a physical system such as a road network, vehicle or tunnel, which can be used to influence or control its operation. DTs can also be used to test decisions and how different actions could affect transport in the UK, it says.
In the study, digital twins were used to model systems such as road networks, vehicles and tunnels to show how they can benefit network management systems.
It estimates the overall benefits of using an integrated network management DT at approximately £850 million (present value, in discounted 2010 prices) over a 10-year appraisal period.
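To illustrate what a present value over a 10-year appraisal period means in practice: UK transport appraisal conventionally discounts future benefits at the HM Treasury Green Book rate of 3.5% a year. The flat £100m-a-year benefit stream below is a made-up placeholder – the report’s actual year-by-year profile is not given in this summary:

```python
def present_value(annual_benefits, discount_rate=0.035):
    """Discount a stream of annual benefits (year 1 onwards) back to year 0.

    discount_rate defaults to the Green Book standard rate of 3.5%.
    """
    return sum(b / (1 + discount_rate) ** t
               for t, b in enumerate(annual_benefits, start=1))

# A placeholder flat £100m/year over a 10-year appraisal period:
pv = present_value([100] * 10)
print(round(pv, 1))   # roughly £832m in year-0 prices
```

The point of the exercise is simply that an £850m present value implies a substantially larger undiscounted benefit stream spread across the appraisal period.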
The Department for Transport’s (DfT) Transport Research and Innovation Board identified integrated network management as the highest priority use case. The research was conducted on behalf of the DfT by a consortium led by Arup, including the Connected Places Catapult and the Digital Twin Hub.
The paper identifies that by linking different network management systems together – for example, air and road traffic management – connected digital twins can enable seamless integration of various transport modes, optimising journeys across sea, road networks, active travel, bus, tram and rail systems, and air traffic, and across geographical boundaries. This integration can lead to improved situational awareness; reduced journey times, congestion and emissions; and enhanced overall transport efficiency, the paper says.
It also anticipates promoting better efficiencies with these systems, saying digital twins offer the potential for a comprehensive, real-time view of the transport network, allowing for rapid identification of incidents and swift, coordinated responses. This capability enhances safety, minimises disruptions and enables more effective collaboration between government agencies and emergency services during crises.
Another dimension is supporting better responses to incidents or emergencies, enabling data sharing and co-operation with adjacent sectors such as energy, and supporting the growth of innovations including AI.
Ryan Hood, digital highways leader at Arup, said: “This research underscores the potential of digital twins to optimise the performance of our transport networks. By harnessing data and technology to support a collaborative, multi-modal approach, we can create a safer, more efficient, integrated, and sustainable transport system. This technology not only promises significant economic benefits but also paves the way for a more resilient, predictive, and responsive system that can adapt to the evolving needs of our society.”
Tom van Vuren is Policy Director at the Transport Planning Society, Head of Digital Transport at Amey and a Visiting Professor at the Institute for Transport Studies at the University of Leeds.
Philippe Perret is Mobility Insights Technical Manager at BT Active Intelligence.