Using xmlstarlet to extract DATEX2 locations
The National Traffic Information Service (NTIS) provides live information on Highways England’s road network. This includes Motorway Incident Detection and Automatic Signalling (MIDAS) data, which comes from around 6,000 sites and amounts to around 37 GB of live data a day. In this article I describe how I extract the locations of those sites from DATEX2 into a CSV file that can be easily imported.
The NTIS Network and Asset Model is updated approximately every fortnight and can be downloaded from the NTIS Subscriber Portal.
Uncompressed, it is roughly 600 MB of DATEX2 XML files:
-rw-rw-r-- 1 owen owen 386M Oct 5 20:31 NTISModel-PredefinedLocations-2021-10-05-v15.0.xml
-rw-rw-r-- 1 owen owen 27M Oct 5 20:31 NTISModel-VMSTables-2021-10-05-v15.0.xml
-rw-rw-r-- 1 owen owen 148M Oct 5 20:31 NTISModel-MeasurementSites-2021-10-05-v15.0.xml
Each file contains locations for different data sources. The d2lm:feedDescription of the DATEX2 document describes which locations are included:
<d2lm:value lang="en">Includes: Link shapes (NTIS_Link_Shape_*)</d2lm:value>
<d2lm:value lang="en">Includes: Network Node-to-Node Links (NTIS_Network_Links)</d2lm:value>
<d2lm:value lang="en">Includes: HATRIS Sections (NTIS_HATRIS_Section_*)</d2lm:value>
<d2lm:value lang="en">Includes: Network Nodes (NTIS_Network_Nodes)</d2lm:value>
<d2lm:value lang="en">Includes: Alternate Routes (NTIS_Alternate_Route_*)</d2lm:value>
Each of these datasets is within a <d2lm:measurementSiteTable/> XML element.
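For readers who would rather inspect the feed description from a script, the following is a minimal Python sketch using only the standard library. The `SAMPLE` document is a trimmed, hypothetical stand-in for a model file, and the nesting around `d2lm:feedDescription` is an assumption, which is why the lookup searches descendants instead of a fixed path. The namespace URI is the one used by the xmlstarlet commands in this article.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used by the xmlstarlet commands in this article.
NS = {"d2lm": "http://datex2.eu/schema/2/2_0"}

# Hypothetical, trimmed stand-in for a model file. The elements that wrap
# feedDescription here are an assumption, not copied from a real file.
SAMPLE = """\
<d2lm:d2LogicalModel xmlns:d2lm="http://datex2.eu/schema/2/2_0">
  <d2lm:payloadPublication>
    <d2lm:feedDescription>
      <d2lm:values>
        <d2lm:value lang="en">Includes: Network Nodes (NTIS_Network_Nodes)</d2lm:value>
        <d2lm:value lang="en">Includes: Network Node-to-Node Links (NTIS_Network_Links)</d2lm:value>
      </d2lm:values>
    </d2lm:feedDescription>
  </d2lm:payloadPublication>
</d2lm:d2LogicalModel>
"""


def feed_descriptions(root):
    """Return the text of every value found beneath any feedDescription."""
    return [
        value.text
        for value in root.findall(".//d2lm:feedDescription//d2lm:value", NS)
    ]


root = ET.fromstring(SAMPLE)
for line in feed_descriptions(root):
    print(line)
```

For a real model file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`, bearing in mind that this loads the whole document into memory.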
Each MIDAS update consists of a <d2lm:siteMeasurements/> element that includes a <d2lm:measurementSiteReference/>. The id attribute of <d2lm:measurementSiteReference/> corresponds to a <d2lm:measurementSiteRecord/> in NTISModel-MeasurementSites-2021-10-05-v15.0.xml.
For example, if the MIDAS update contains:
<d2lm:measurementSiteReference targetClass="MeasurementSiteRecord" version="14.12" id="21DDFF56CA1D4021B72013BABC0ACB75"/>
Then the location 21DDFF56CA1D4021B72013BABC0ACB75 found in NTISModel-MeasurementSites-2021-10-05-v15.0.xml is:
<d2lm:measurementSiteRecord version="15.0" id="21DDFF56CA1D4021B72013BABC0ACB75">
<d2lm:measurementEquipmentReference/>
<d2lm:measurementEquipmentTypeUsed/>
<d2lm:measurementSiteIdentification>M25/7106B</d2lm:measurementSiteIdentification>
<d2lm:measurementSpecificCharacteristics/>
<d2lm:measurementSiteLocation/>
</d2lm:measurementSiteRecord>
The <d2lm:measurementSiteLocation/> in <d2lm:measurementSiteRecord/> includes a latitude and longitude coordinate:
<d2lm:measurementSiteLocation xsi:type="d2lm:Point">
<d2lm:locationForDisplay>
<d2lm:latitude>51.4690869860309</d2lm:latitude>
<d2lm:longitude>-0.507936222455313</d2lm:longitude>
</d2lm:locationForDisplay>
<d2lm:pointAlongLinearElement>
<d2lm:linearElement xsi:type="d2lm:LinearElementByCode">
<d2lm:linearElementReferenceModel>NTIS_Network_Links</d2lm:linearElementReferenceModel>
<d2lm:linearElementReferenceModelVersion>15.0</d2lm:linearElementReferenceModelVersion>
<d2lm:linearElementIdentifier>200045820</d2lm:linearElementIdentifier>
</d2lm:linearElement>
<d2lm:distanceAlongLinearElement xsi:type="d2lm:DistanceFromLinearElementStart">
<d2lm:distanceAlong>562</d2lm:distanceAlong>
</d2lm:distanceAlongLinearElement>
</d2lm:pointAlongLinearElement>
</d2lm:measurementSiteLocation>
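The lookup described above, from the id in a MIDAS update to the coordinates in the model, can be sketched in Python as follows. This is an illustration under assumptions: `UPDATE` and `MODEL` are trimmed stand-ins built from the snippets in this article (with the `xsi:type` attributes and unrelated children removed), `locate_site` is a hypothetical helper name, and every record is assumed to carry a `locationForDisplay`.

```python
import xml.etree.ElementTree as ET

NS = {"d2lm": "http://datex2.eu/schema/2/2_0"}
D2LM = "{http://datex2.eu/schema/2/2_0}"

# Trimmed stand-in for a MIDAS update (values from the example above).
UPDATE = """\
<d2lm:siteMeasurements xmlns:d2lm="http://datex2.eu/schema/2/2_0">
  <d2lm:measurementSiteReference targetClass="MeasurementSiteRecord"
      version="14.12" id="21DDFF56CA1D4021B72013BABC0ACB75"/>
</d2lm:siteMeasurements>
"""

# Trimmed stand-in for the MeasurementSites model file.
MODEL = """\
<d2lm:measurementSiteTable xmlns:d2lm="http://datex2.eu/schema/2/2_0"
    version="15.0" id="NTIS_MIDAS_Measurement_Sites">
  <d2lm:measurementSiteRecord version="15.0" id="21DDFF56CA1D4021B72013BABC0ACB75">
    <d2lm:measurementSiteIdentification>M25/7106B</d2lm:measurementSiteIdentification>
    <d2lm:measurementSiteLocation>
      <d2lm:locationForDisplay>
        <d2lm:latitude>51.4690869860309</d2lm:latitude>
        <d2lm:longitude>-0.507936222455313</d2lm:longitude>
      </d2lm:locationForDisplay>
    </d2lm:measurementSiteLocation>
  </d2lm:measurementSiteRecord>
</d2lm:measurementSiteTable>
"""

DISPLAY = "d2lm:measurementSiteLocation/d2lm:locationForDisplay/"


def locate_site(model_root, site_id):
    """Return (identification, latitude, longitude) for site_id, or None."""
    for record in model_root.iter(D2LM + "measurementSiteRecord"):
        if record.get("id") != site_id:
            continue
        name = record.findtext("d2lm:measurementSiteIdentification", namespaces=NS)
        # Assumes the record has a locationForDisplay with both coordinates.
        lat = float(record.findtext(DISPLAY + "d2lm:latitude", namespaces=NS))
        lon = float(record.findtext(DISPLAY + "d2lm:longitude", namespaces=NS))
        return name, lat, lon
    return None


# Take the site id from the update, then resolve it against the model.
reference = ET.fromstring(UPDATE).find("d2lm:measurementSiteReference", NS)
site = locate_site(ET.fromstring(MODEL), reference.get("id"))
print(site)
```

Scanning the model for every update would be slow with around 6,000 sites, so in practice it makes more sense to extract all locations once, which is what the rest of this article does.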
We know that the MIDAS measurement sites are in the <d2lm:measurementSiteTable version="15.0" id="NTIS_MIDAS_Measurement_Sites"> element.
Listing the id of every measurementSiteTable:
$ xmlstarlet sel -T -N d2lm=http://datex2.eu/schema/2/2_0 -t \
-m "//d2lm:measurementSiteTable" \
-v '@id' -n \
NTISModel-MeasurementSites-2021-10-05-v15.0.f.xml
NTIS_TAME_Measurement_Sites
NTIS_MIDAS_Measurement_Sites
NTIS_TMU_Measurement_Sites
Listing the id of the first 10 measurementSiteRecord elements:
$ xmlstarlet sel -T -N d2lm=http://datex2.eu/schema/2/2_0 -t \
-m "//d2lm:measurementSiteTable[@id='NTIS_MIDAS_Measurement_Sites']/d2lm:measurementSiteRecord" \
-v '@id' -n \
NTISModel-MeasurementSites-2021-10-05-v15.0.f.xml | \
head -n 10
BAAE334664FB4C86B7DDC8A62FE05324
742A1048CE6B4AFBA75068CC61101807
1EAA9F6E29904E4FB6FEAF7F23AE0FF5
888DF183CD7F4E20A01BF0DA3C5F3F29
8A69B5B782BE4652A48182D2253CBCF3
CD5716CEEEE745A2A15B53E38AE27C90
4E367C6BD8314E8082A8C2B2D432BAD7
820CB0C74FFF4AC5AB1CC7E8047A76AE
78C668D94FA3477782AB0E110C8F69D1
FFD9A5B1735342499DDF4618DF2E9D62
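Because the model file is well over 100 MB, a streaming parser is one way to read the first few records from a script without holding the whole document in memory. The sketch below uses `xml.etree.ElementTree.iterparse` for that. `iter_record_ids` is a hypothetical helper name, and `SAMPLE` is a tiny stand-in document in which the TAME site id is invented for illustration.

```python
import io
import itertools
import xml.etree.ElementTree as ET

D2LM = "{http://datex2.eu/schema/2/2_0}"

# Tiny stand-in for the model file; the TAME record id is made up.
SAMPLE = b"""\
<d2lm:d2LogicalModel xmlns:d2lm="http://datex2.eu/schema/2/2_0">
  <d2lm:measurementSiteTable id="NTIS_TAME_Measurement_Sites">
    <d2lm:measurementSiteRecord id="HYPOTHETICAL_TAME_SITE"/>
  </d2lm:measurementSiteTable>
  <d2lm:measurementSiteTable id="NTIS_MIDAS_Measurement_Sites">
    <d2lm:measurementSiteRecord id="BAAE334664FB4C86B7DDC8A62FE05324"/>
    <d2lm:measurementSiteRecord id="742A1048CE6B4AFBA75068CC61101807"/>
    <d2lm:measurementSiteRecord id="1EAA9F6E29904E4FB6FEAF7F23AE0FF5"/>
  </d2lm:measurementSiteTable>
</d2lm:d2LogicalModel>
"""


def iter_record_ids(source, table_id="NTIS_MIDAS_Measurement_Sites"):
    """Yield the id of each measurementSiteRecord inside the named table."""
    in_table = False
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if elem.tag == D2LM + "measurementSiteTable":
            # Attributes are already available on the "start" event.
            in_table = event == "start" and elem.get("id") == table_id
            if event == "end":
                elem.clear()
        elif event == "end" and elem.tag == D2LM + "measurementSiteRecord":
            site_id = elem.get("id")
            elem.clear()  # drop the record's children to keep memory bounded
            if in_table:
                yield site_id


# Equivalent in spirit to piping the xmlstarlet output through head.
for site_id in itertools.islice(iter_record_ids(io.BytesIO(SAMPLE)), 2):
    print(site_id)
```

`iterparse` also accepts a file name, so the `io.BytesIO(SAMPLE)` argument could be replaced with the path to the real model file.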
Extract the latitude and longitude of each site and export as CSV:
$ xmlstarlet sel -T -N d2lm=http://datex2.eu/schema/2/2_0 \
-t -m "//d2lm:measurementSiteTable[@id='NTIS_MIDAS_Measurement_Sites']/d2lm:measurementSiteRecord" \
-v '@id' -o ',' \
-v "d2lm:measurementSiteIdentification" -o ',' \
-m "d2lm:measurementSiteLocation/d2lm:locationForDisplay" \
-v "d2lm:latitude" -o ',' \
-v "d2lm:longitude" -n \
NTISModel-MeasurementSites-2021-10-05-v15.0.f.xml | \
head -n 10
BAAE334664FB4C86B7DDC8A62FE05324,M25/4632M,51.3139308901129,-0.330020531095879
742A1048CE6B4AFBA75068CC61101807,M25/4633B,51.3136236566364,-0.3299168939259
1EAA9F6E29904E4FB6FEAF7F23AE0FF5,M25/4577B,51.2851570516136,-0.268955814584181
888DF183CD7F4E20A01BF0DA3C5F3F29,M5/7138B,52.4497699676406,-2.01467866743872
8A69B5B782BE4652A48182D2253CBCF3,M42/6195B,52.3617730140793,-1.94706769596303
CD5716CEEEE745A2A15B53E38AE27C90,M42/6194A,52.3619261486532,-1.94772833610851
4E367C6BD8314E8082A8C2B2D432BAD7,M42/6195K,52.3621142579688,-1.94625960371606
820CB0C74FFF4AC5AB1CC7E8047A76AE,M42/6184A,52.3607801001229,-1.95915434933599
78C668D94FA3477782AB0E110C8F69D1,M25/4440B,51.2578273395404,-0.09683703087205
FFD9A5B1735342499DDF4618DF2E9D62,M25/4262B,51.2923678345806,0.142686623586954
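As an illustration of importing the result, here is a minimal Python sketch that loads the CSV into a dictionary keyed by site id. The command above writes no header row, so the column order (id, identification, latitude, longitude) is taken from the command itself. `load_sites` is a hypothetical helper name, and `CSV_TEXT` holds the first two rows of the output shown above.

```python
import csv
import io

# First two rows of the xmlstarlet output above (no header row).
CSV_TEXT = """\
BAAE334664FB4C86B7DDC8A62FE05324,M25/4632M,51.3139308901129,-0.330020531095879
742A1048CE6B4AFBA75068CC61101807,M25/4633B,51.3136236566364,-0.3299168939259
"""


def load_sites(lines):
    """Map each site id to (identification, latitude, longitude)."""
    sites = {}
    for row in csv.reader(lines):
        if not row:
            continue  # tolerate blank lines
        site_id, name, lat, lon = row
        sites[site_id] = (name, float(lat), float(lon))
    return sites


sites = load_sites(io.StringIO(CSV_TEXT))
print(sites["BAAE334664FB4C86B7DDC8A62FE05324"])
```

With the command's output redirected to a file, `load_sites(open(path, newline=""))` would read it the same way; the site id from a MIDAS update can then be resolved with a single dictionary lookup.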