PublicTransport Engine 0.10 rc2
Add Support for new Service Providers

XML file structure

To add support for a new service provider, you need to create an "accessor info XML" file. This XML file describes where to download the data and how to parse it. The filename starts with the country code, "international" or "unknown", followed by "_" and a short name of the service provider, e.g. "de_db.xml", "ch_sbb.xml", "sk_atlas.xml", "international_flightstats.xml". Parsing can be done in a separate script.
There is also a tool called TimetableMate, which you can download from the publictransport page on kde-look.org. It is a small IDE for creating timetable accessors for the publictransport data engine. It has a GUI to edit an accessor's settings and automatically generates an XML file from them (and vice versa). It also features script editing, syntax checking, code completion (a little like in KDevelop, but only for publictransport-specific items), automatic tests, a web page viewer, etc.
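
The file naming rule described above can be sketched as a small check. This is only an illustration: the function name isValidAccessorFileName is hypothetical, and the pattern assumes two-letter lowercase country codes and short names made of lowercase letters, digits or underscores, which the engine itself may not require.

```javascript
// Hypothetical helper, not part of the data engine.
// Assumption: country codes are two lowercase letters and the short name
// consists of lowercase letters, digits or underscores.
function isValidAccessorFileName(fileName) {
    return /^(?:[a-z]{2}|international|unknown)_[a-z0-9_]+\.xml$/.test(fileName);
}

console.log(isValidAccessorFileName("de_db.xml"));                      // true
console.log(isValidAccessorFileName("international_flightstats.xml")); // true
console.log(isValidAccessorFileName("db.xml"));                         // false
```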

Here is an overview of the allowed tags in the XML file (the third column states whether a tag is required):

Tag, Parent Tag, Optional? (first line of each entry)

Description (second line of each entry)

<accessorInfo> Root Required

This is the root item.

<xml_file_version type="publictransport" version="1.0" /> <accessorInfo> Not used

This tag is currently not used by the data engine. You should still add it to be sure that the XML is parsed correctly. The version is the version of the XML file structure; the current version is 1.0.

<name> <accessorInfo> Required

The name of the accessor. If it provides data for international stops, it should begin with "International"; if it is specific to a country or city, it should begin with the name of that country or city. The name should be followed by a short URL of the service provider.

<author> <accessorInfo> Required

Contains information about the author of this accessor info xml.

<fullname> <author> Required

The full name of the author of this accessor info xml.

<short> <author> (Optional)

A short name for the author of this accessor info xml (e.g. the initials).

<email> <author> (Optional)

The email address of the author of this accessor info xml.

<version> <accessorInfo> Required

The version of this accessor info xml, should start with "1.0".

<type> <accessorInfo> Required

Can be either HTML or XML.

<cities> <accessorInfo> (Optional)

A list of cities the service provider has data for (with surrounding <city>-tags).

<city> <cities> (Optional)

A city in the list of cities (<cities>). Can have an attribute "replaceWith", to replace city names with values used by the service provider.

<description> <accessorInfo> Required

A description of the service provider / accessor. You don't need to list the features supported by the accessor here; the feature list is generated automatically.

<url> <accessorInfo> Required

A URL to the service provider's home page.

<shortUrl> <accessorInfo> Required

A short version of the url, used as link text.

<rawUrls> <accessorInfo> Required

Contains the used "raw urls". A raw url is a string with placeholders that are replaced with values to get a real url.

<departures> <rawUrls> Required

A raw url (in a CDATA section) to a page containing a departure / arrival list. The following substrings are replaced by current values: {stop} (the stop name), {type} (arr or dep for arrivals or departures), {time} (the time of the first departure / arrival), {maxCount} (maximum number of departures / arrivals).

<journeys> <rawUrls> (Optional)

A raw url (in a CDATA section) to a page containing a journey list. The following substrings are replaced by current values: {startStop} (the name of the stop where the journey starts), {targetStop} (the name of the stop where the journey ends), {time} (the time of the first journey), {maxCount} (maximum number of journeys).

<stopSuggestions> <rawUrls> (Optional)

A raw url (in a CDATA section) to a page containing a list of stop suggestions. Normally this tag isn't needed, because the url is the same as the url to the departure list: when the stop name is ambiguous, the service provider can show a page containing a list of stop suggestions instead of departures. You may want to use this tag if you parse XML files for departure lists but get the stop suggestions from an HTML page, or if there is a special url only for stop suggestions.

<script> <accessorInfo> Required, if no regExps are set

Contains the filename of the script to be used to parse timetable documents. The script must be in the same directory as the XML file. Always use HTML as the type when using a script; the script can parse XML documents as well.

<changelog> <accessorInfo> (Optional)

Contains changelog entries for this accessor.

<entry> <changelog> (Optional)

Contains a changelog entry for this accessor. The entry description is read from the contents of the <entry> tag. Attributes "since" (the accessor version where this change was applied) and "released_with" (the publictransport data engine version this accessor was first released with) can be added.

<sessionKey> <accessorInfo> (Optional)

Contains information about usage of session keys, if required by the service provider.

<url> <sessionKey> Required

Contains the URL to a document containing a session key.

<putInto> <sessionKey> (Optional)

The name of the place where the session key is put in requests (currently only "CustomHeader" is supported). May contain a "data" attribute, which for "CustomHeader" is used as the name of the custom HTTP header.
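
The placeholder substitution described for the raw urls can be illustrated with a short sketch. The function fillRawUrl and the example.com address are made up for this illustration, and encoding the values with encodeURIComponent is an assumption; how the data engine encodes values internally is not described here.

```javascript
// Hypothetical helper, not the engine's implementation.
// Replaces every {placeholder} in a raw url with the matching value.
function fillRawUrl(rawUrl, values) {
    return rawUrl.replace(/\{(\w+)\}/g, function (match, name) {
        // Unknown placeholders are left untouched
        return Object.prototype.hasOwnProperty.call(values, name)
            ? encodeURIComponent(values[name])   // assumed encoding
            : match;
    });
}

// Made-up raw url, for illustration only
var rawUrl = "http://example.com/timetable?stop={stop}&type={type}&time={time}&max={maxCount}";
console.log(fillRawUrl(rawUrl, { stop: "Main Station", type: "dep", time: "12:30", maxCount: 20 }));
// http://example.com/timetable?stop=Main%20Station&type=dep&time=12%3A30&max=20
```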



Script file structure

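Scripts are executed using Kross, which supports JavaScript, Python and Ruby. JavaScript is tested; the other languages may also work. The data engine calls functions with special names when needed. The example script in "A Simple Parsing-Script" below defines three of them: usedTimetableInformations, parseTimetable and parsePossibleStops.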
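
A minimal skeleton of such a script might look like the following sketch. The three function names are taken from the example script in this documentation; the timetableData and result objects are stand-ins written for this illustration (the engine normally provides them, and their shape here is only inferred from how the example script uses them). All values are made up.

```javascript
// Stand-ins for objects the data engine normally provides to the script.
// Their shape is inferred from the example script, not from the engine's API.
var collected = [];
var timetableData = {
    values: {},
    clear: function () { this.values = {}; },
    set: function (key, value) { this.values[key] = value; }
};
var result = {
    addData: function (data) { collected.push(Object.assign({}, data.values)); },
    hasData: function () { return collected.length > 0; }
};

// Judging from the example script, this returns the names of the
// additional kinds of information the script provides.
function usedTimetableInformations() {
    return [ 'Delay', 'Platform' ];
}

// Receives the downloaded departure / arrival document
function parseTimetable( html ) {
    // A real script would extract these values from html
    timetableData.clear();
    timetableData.set( 'TransportLine', 'S1' );
    timetableData.set( 'Target', 'Example Target' );
    timetableData.set( 'DepartureHour', 12 );
    timetableData.set( 'DepartureMinute', 30 );
    result.addData( timetableData );
}

// Receives the downloaded stop suggestion document
function parsePossibleStops( html ) {
    timetableData.clear();
    timetableData.set( 'StopName', 'Example Stop' );
    result.addData( timetableData );
    return result.hasData();
}

parseTimetable( "" );
console.log( collected.length ); // 1
```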



Accessor Examples


A Simple Accessor

Here is an example of a simple accessor info xml of an accessor which uses a script to parse data from the service provider:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

<accessorInfo fileVersion="1.0" version="1.1" type="HTML">

<name lang="en">National Society of French Railways (SNCF)</name>
<name lang="fr">Société Nationale des Chemins de fer Français (SNCF)</name>

<description lang="en">This service provider works for trains and some buses in France.</description>
<description lang="de">Dieser Service-Provider liefert Ergebnisse für Züge und einige Busse in Frankreich.</description>

<author> <fullname>Friedrich Pülz</fullname> <short>fpuelz</short> <email>fpuelz@gmx.de</email> </author>
<useSeperateCityValue>false</useSeperateCityValue>

<url>http://www.gares-en-mouvement.com/</url>
<shortUrl>www.gares-en-mouvement.com</shortUrl>
<!-- <credit>Société Nationale des Chemins de fer Français (SNCF)</credit> -->

<rawUrls>
<!--     <departures><![CDATA[http://www.gares-en-mouvement.com/infostempsreel-depart-en-1-{stop}-0.html]]></departures> -->
<!--     <departures><![CDATA[http://www.gares-en-mouvement.com/infos_temps_reel.php?gare={stop}&tab=dep&langue=en]]></departures> -->
    <departures><![CDATA[http://www.gares-en-mouvement.com/en/{stop}/horaires-temps-reel/dep/]]></departures>
    
<!--     <stopSuggestions><![CDATA[http://www.gares-en-mouvement.com/include/completion.php?q={stop}&limit=10]]></stopSuggestions> -->
<!--     <stopSuggestions><![CDATA[http://www.gares-en-mouvement.com/accueil-en-1.html]]></stopSuggestions> -->
    <stopSuggestions><![CDATA[http://www.gares-en-mouvement.com/index.php]]></stopSuggestions>
</rawUrls>

<script>fr_gares.js</script>

<changelog>
    <entry since="1.1" releasedWith="0.10">Updated to new website layout, making it pass the unit test.</entry>
	<entry since="1.0" releasedWith="0.6.3">Initial version.</entry>
</changelog>

</accessorInfo>


A Simple Parsing-Script

This is an example of a script used to parse data from the service provider (actually the one used by the XML from the last section).

/** Accessor for gares-en-mouvement.com (France).
* © 2011, Friedrich Pülz */

function usedTimetableInformations() {
    return [ 'TypeOfVehicle', 'Operator', 'Platform', 'Delay', 'StopID' ];
}

function parseTimetable( html ) {
    // Find block of departures
    var pos = html.search( /<table [^>]*?class="tab_horaires_tps_reel"[^>]*>/i );
    if ( pos == -1 ) {
        helper.error("Result table not found!", html);
        return;
    }
    var end = html.indexOf( '</table>', pos + 1 );
    var str = html.substr( pos, end - pos );

    var tbody = helper.extractBlock( str, "<tbody>", "</tbody>" );

    // Initialize regular expressions (compile them only once)
    var departuresRegExp = /<tr class="[^"]*?">([\s\S]*?)<\/tr>/ig;
    var columnsRegExp = /<td[^>]*?>([\s\S]*?)<\/td>/ig;
    var typeOfVehicleRegExp = /<img src="[^"]*"\s+alt="([^"]+)"\s*\/>/i;
    var timeRegExp = /(\d{2})<abbr title="(?:Time|heure)">h<\/abbr>(\d{2})/i;
    var delayRegExp = /Delay\s*:\s*(\d+)/i;

    // Declare loop variables so they do not leak into the global scope
    var departure, column, delayArr;

    // Go through all departure blocks
    while ( (departure = departuresRegExp.exec(str)) ) {
        departure = departure[1];

        // Get column contents
        var columns = [];
        while ( (column = columnsRegExp.exec(departure)) ) {
            column = column[1];
            columns.push( column );
        }
        if ( columns.length < 6 ) {
            helper.error("Too few columns found in a departure row (" + columns.length + ")!", departure);
            continue; // Too few columns
        }

        // Parse time column
        var timeValues = timeRegExp.exec( columns[2] );
        if ( timeValues == null || timeValues.length != 3 ) {
            helper.error("Unexpected string in time column!", columns[2]);
            continue; // Unexpected string in time column
        }
        var hour = timeValues[1];
        var minute = timeValues[2];

        // <img src="images/tvs/30bleu_clair.gif" alt="TGV"/>

        // Parse vehicle type column
        var vehicleTypeValues = typeOfVehicleRegExp.exec( columns[0] );
        if ( vehicleTypeValues == null ) { //|| vehicleTypeValues.length < 3 ) {
            helper.error("Unexpected string in type of vehicle column!", columns[0]);
            continue; // Unexpected string in vehicle type column
        }
        var operator = vehicleTypeValues[1];
        var typeOfVehicle = "ICE"; //vehicleTypeValues[2]; TODO
        // "intercity and regional train" => "express"
        // "high-speed train" => "highspeed train"
        if ( operator == null ) {
            operator = "";
        }

        // Parse delay column (-1 means no delay information)
        var delay = -1;
        if ( (delayArr = delayRegExp.exec(columns[4])) ) {
            delay = parseInt( delayArr[1], 10 );
            if ( isNaN(delay) ) { // "delay == NaN" would always be false
                delay = -1; // Error while parsing
            }
        }

        var transportLine = helper.trim( helper.stripTags(columns[1]) );
        var targetString = helper.camelCase( helper.trim(helper.stripTags(columns[3])) );
        var platformString = helper.trim( helper.stripTags(columns[5]) );

        // Add departure
        timetableData.clear();
        timetableData.set( 'TransportLine', transportLine );
        timetableData.set( 'TypeOfVehicle', typeOfVehicle );
        timetableData.set( 'Target', targetString );
        timetableData.set( 'DepartureHour', hour );
        timetableData.set( 'DepartureMinute', minute );
        timetableData.set( 'Delay', delay );
        timetableData.set( 'Operator', operator );
        timetableData.set( 'Platform', platformString );
        result.addData( timetableData );
    }
}

function parsePossibleStops( html ) {
    // Find block of stops
    var pos = html.search( /<div id="liste_gare"[^>]*>/i );
    if ( pos == -1 ) {
        helper.error("Stop list element not found!", html);
        return;
    }
    var str = html.substr( pos );

    // Initialize regular expressions (compile them only once)
    var stopRegExp = /<li>\s*<a href="[^"]*?\/fr\/([^"]+)\/accueil[^"]*">([^<]+)<\/a>\s*<\/li>/ig;

    // Declare the loop variable so it does not leak into the global scope
    var stop;

    // Go through all stop options
    while ( (stop = stopRegExp.exec(str)) ) {
        var stopID = stop[1];
        var stopName = stop[2];

        // Add stop
        timetableData.clear();
        timetableData.set( 'StopID', stopID );
        timetableData.set( 'StopName', stopName );
        result.addData( timetableData );
    }

    return result.hasData();
}
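
The regular expressions used by the script above can be tried outside the data engine, for example with Node.js. The sketch below copies three of the expressions from the script; the input snippets are made up to match them and are not taken from the real website.

```javascript
// Regular expressions copied from the example script
var timeRegExp = /(\d{2})<abbr title="(?:Time|heure)">h<\/abbr>(\d{2})/i;
var delayRegExp = /Delay\s*:\s*(\d+)/i;
var stopRegExp = /<li>\s*<a href="[^"]*?\/fr\/([^"]+)\/accueil[^"]*">([^<]+)<\/a>\s*<\/li>/ig;

// Made-up snippets, shaped to match the expressions above
var timeCell = '14<abbr title="Time">h</abbr>05';
var delayCell = 'Delay : 10 min';
var stopList = '<li><a href="/fr/frabc/accueil/">Example Gare</a></li>';

// Time column: hour and minute end up in groups 1 and 2
var time = timeRegExp.exec( timeCell );
console.log( time[1] + ":" + time[2] ); // 14:05

// Delay column: -1 stands for "no delay information", as in the script
var delayMatch = delayRegExp.exec( delayCell );
var delay = delayMatch ? parseInt( delayMatch[1], 10 ) : -1;
console.log( delay ); // 10

// Stop list: group 1 is the stop ID, group 2 the stop name
var stopMatch = stopRegExp.exec( stopList );
console.log( stopMatch[1], stopMatch[2] ); // frabc Example Gare
```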