Category Archives: open data

TEC and their open data policy

Last year, at Open Belgium 2014, TEC announced they would be doing Open Data. In May, the first data in BLTAC format was released, and not much later we had converted it to GTFS (thanks to OpenOV): the de facto standard that the worldwide open transport community can use. TEC’s reasoning was simple: we don’t have the means to follow up on everything, so just use the data and that should be the end of it.

This year, we’ve had the honour to welcome TEC again as one of the speakers in our Open Belgium 2015 edition. This year it was different, as the keynote was given by Thomas Hermine from NextRide: a start-up that reuses data from the TEC to create a user-friendly application that indicates when the next bus will pass your stop.

As the only transport company in Belgium without an app of its own, TEC still has no plans to build one. Instead, they are announcing the “TEC approved application” label: you can now apply to have your app listed among the official TEC apps.

Today, TEC is releasing data files. They are not easy to interpret. iRail will soon release an API that makes it easy for all developers to build an app without having to write server-side code. Hopefully, a lot more apps will apply for TEC’s label in the near future!

The first “TEC approved app” label, awarded to the app NextRide, was announced during Open Belgium 2015

Mapping public transport in Belgium

Open Street Map is a crowdsourced map that can be reused by anyone. You can, for example, look up the Brussels North station and see how all bus stops, platforms and railway tracks are shown with very good precision.

The Brussels North Station in Open Street Map

This is not a coincidence. We have a very active Open Street Map Belgium community (just like iRail, it is part of Open Knowledge Belgium) that is working on integrating existing datasets with Open Street Map and correcting errors when they see them. Polyglot wrote a very nice article that explains the workflow in more detail.

Belgian volunteers add the data of NMBS/SNCB in the maps of Open Street Map

If you would like to start reusing the open transport data within Open Street Map for your project, you can always get in touch with the people at osm.be. A nice place to start is downloading a dump of the entire Open Street Map here: http://wiki.openstreetmap.org/wiki/Planet.osm

Pieter

E-gov awards Agoria

Does your project deserve an e-gov award in 2014?

This year, Agoria ICT is once again organizing the e-gov Awards to celebrate the most deserving computerization projects of the public services.
In addition to awards for innovation, user-friendliness, cost-effectiveness and cooperation, this edition of the e-gov Awards will for the first time also include an award for the use of open data.

The ultimate goal, of course, remains to reward projects that save time and resources for both the government and the user (citizen or business).
The administration is modernizing first and foremost to offer citizens, businesses and civil servants an efficient service.

As in previous years, Agoria invites all operational e-gov projects to compete for the e-gov Awards.

This year, six e-gov Awards will therefore be presented, and once again every project is eligible for every award.

So be sure to take part, and encourage your colleagues and/or staff to submit their candidacy for the e-gov Awards 2014.

The coveted prizes will be officially presented this year during the e-gov gala evening, which takes place on Thursday 4 December 2014.

How to participate?

As in other years, participating is very easy!
At www.egovawards.be you will find the registration form and all useful information, such as the competition rules.

Please note: the deadline for submitting your entry is Friday 3 October!

We hope the jury will be able to crown you as one of the winners at our Award evening.

Good luck!

Opening up transit data

In this TEDxGhent talk, Pieter Colpaert explains his PhD in 3 minutes. The talk also gives 3 simple reasons why it’s better to open up transit information than for each party to keep it to itself:

  1. We don’t want a separate app for each and every transit agency.
  2. We don’t want the transit agencies themselves to decide which questions we can ask our smartphone once the data is shared among transit agencies.
  3. We don’t want only 1 private company to build the only application: there are way too many users for this 1 company to come up with an app that works for everyone.

Opening up the data for (new) companies to enrich their services or products and opening up the data for the Web so that smart clients can do their own reasoning over the data, is what we see as the next big innovation in transport information.

 

From data model to HTTP

When creating a data model, you already inherently decide which questions are going to be asked. In most cases a data model is not created by data experts who cost a lot of money, but by a civil servant who decided to start a new spreadsheet, by an application developer who needed access to certain data, or by someone who simply needs an answer to a couple of questions.

An example? Well, here’s a small spreadsheet which has been initiated by Antti Poikola which collects app competitions on top of Open (Government) Data:

The CSV version of this dataset can be found here

Antti chose a pragmatic approach to the data model: we can see a contest name, the link to the contest, the year(s) in which the contest is held, the country, the city or region, the level, the organization that organizes the event and finally the theme of the contest. The questions that can be answered with this dataset are already visible in the data: which events are organized in a certain year? Which events are organized around the theme of transport? And so forth.

It gets a bit more complex when we ask combined questions, such as: which events were organized in 2010 around transport? You can quickly answer these by answering the 2 separate questions and taking the intersection, by applying 2 filters in your spreadsheet program, or by using a query language such as SQL and doing a SELECT with a WHERE clause.

SELECT * FROM contests WHERE year = 2010 AND theme = 'transport';
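As a minimal sketch of that combined question, the Python snippet below loads a few rows into an in-memory SQLite table and filters on year and theme at once. The table name `contests`, the helper `contests_for` and the sample rows are all invented for illustration; the real spreadsheet has more columns and different entries.

```python
import sqlite3

# Hypothetical sample rows (name, year, theme); not taken from the real spreadsheet.
ROWS = [
    ("Contest A", 2010, "transport"),
    ("Contest B", 2010, "health"),
    ("Contest C", 2011, "transport"),
]


def contests_for(year, theme, rows=ROWS):
    """Return the names of contests that match both the year and the theme."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contests (name TEXT, year INTEGER, theme TEXT)")
    conn.executemany("INSERT INTO contests VALUES (?, ?, ?)", rows)
    cursor = conn.execute(
        "SELECT name FROM contests WHERE year = ? AND theme = ? ORDER BY name",
        (year, theme),
    )
    names = [name for (name,) in cursor]
    conn.close()
    return names


if __name__ == "__main__":
    print(contests_for(2010, "transport"))
```

The two conditions in the WHERE clause play the same role as the two filter actions in a spreadsheet program.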

Open Data on the Web

Data as such is nothing more than a collection of facts or statements. These statements can be structured in various ways: hidden in full text documents, structured in tables or structured using graphs. Just like language and conversations, data and data exchange can suffer from a lot of problems, such as being misinterpreted, misrepresenting reality, or not being understandable to third parties.

On the Web we don’t talk to only one other person: e.g., we publish blog posts which we want to be read by an audience that is as large as possible. That’s probably why I chose to write this blog post in English.

Open Data is the name we give to a dataset that is openly licensed. That means everyone is free to use, reuse and redistribute the data. In my work today I focus on an even smaller subset: Open Data where publishers want to see the data used by an audience of people and machines that is as large as possible. That’s why I will talk about “Open Data on the Web” and not just about data that had to be opened by some law and is available if you browse for it very carefully. “Open Data on the Web” is not something I have invented: the W3C, a standardization body, has a working group about it.

In order to explain how targeting an audience that is as large as possible works with data, I need to explain a bit more about how I see data. In datasets, we still use human language to express facts. For instance, in the spreadsheet above, you could recognize a fact as a row in the table: it describes one event with several properties. The table is data, because it is a collection of facts. What do I find the most workable form the data can be in? Well, the maximum number of atomic facts that can be extracted from a file.

Data in its most fundamental form

The most concise way to express a fact using language is with three words: a subject, a predicate and an object. A list of these triples lets you represent any dataset in its most fundamental form.

For example, the spreadsheet above could become:

<http://webarchive.nationalarchives.gov.uk/20100402134053/showusabetterway.com/> name "Show us a better way" .
<http://webarchive.nationalarchives.gov.uk/20100402134053/showusabetterway.com/> year 2008 .
<http://webarchive.nationalarchives.gov.uk/20100402134053/showusabetterway.com/> country "UK" .
# And so forth...

This is the most fundamental way to represent this dataset and it makes studying it a lot easier. For instance, what are the identifiers and words used within this dataset? And how well can they be understood by machines that are crawling the Web? We can see the words and compare facts much more quickly, without having to worry about the serialization of the data (XML, JSON, CSV…).
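To make the step from table to triples concrete, here is a small Python sketch that turns each non-empty cell of a CSV file into one (subject, predicate, object) triple. The column names (`link`, `name`, `year`, `country`) and the choice of the link column as the subject are assumptions made for this example; the actual spreadsheet may label its columns differently.

```python
import csv
import io

# One row of the spreadsheet, inlined as CSV text. The column names are assumed.
CSV_TEXT = """link,name,year,country
http://webarchive.nationalarchives.gov.uk/20100402134053/showusabetterway.com/,Show us a better way,2008,UK
"""


def rows_to_triples(csv_text, subject_column="link"):
    """Turn every non-empty cell into a (subject, predicate, object) triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row[subject_column]
        for predicate, obj in row.items():
            # The subject column identifies the row, so it is not a fact itself.
            if predicate != subject_column and obj:
                triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in rows_to_triples(CSV_TEXT):
        print(triple)
```

Every column header becomes a predicate, and every cell becomes an object (read as a string here), which is why the words chosen for the headers matter so much for reuse.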

We have used this triple structure as the starting point for quantifying the interoperability of datasets published to the Web, in an article for the journal Computer. The article is going to be published in October of 2014 (you can always request a preprint version by sending me an e-mail):

[bibtex file=refs.bib key=colpaert_computer_2014]

In the paper we have added a section about Linked Open Data as the next logical step. If you want a machine to understand what “year”, “country”, “UK” and “name” mean in your context, the easiest thing to do is to create a lookup service that shows more information about how to interpret the data. So instead of using words, which can be very ambiguous, we are going to use HTTP URIs: third parties can look up more information about the term using a browser, and machines can request more machine-readable information. Furthermore, URIs cannot be ambiguous, as there is one party in charge of maintaining the meaning of every URI.
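As a rough illustration of swapping ambiguous words for HTTP URIs, the sketch below formats a triple as an N-Triples-style line in which the predicate is a full URI. The vocabulary base `http://example.org/vocab#`, the subject URI in the test call and the helper name `to_uri_triple` are placeholders; a real dataset would point to a vocabulary that someone actually maintains.

```python
# Hypothetical vocabulary base; a real dataset would reuse a maintained vocabulary.
VOCAB = "http://example.org/vocab#"


def to_uri_triple(subject, predicate, obj, vocab=VOCAB):
    """Format one triple as an N-Triples-style line with the predicate as an HTTP URI."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = str(obj).replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{vocab}{predicate}> "{escaped}" .'


if __name__ == "__main__":
    print(to_uri_triple("http://example.org/contest/1", "year", 2008))
```

A client that does not know the term can now follow the predicate URI to find out what it means.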

For instance, we have introduced a number of HTTP URIs to describe opening hours on the Web. You can read the paper, visit the website with more explanation or cite it as follows:

[bibtex file=refs.bib key=pc_openinghours]

From data model to HTTP

Now comes a very difficult question: do we need to publish all data using Linked Data techniques and this triple structure? Did the creator of this spreadsheet make the wrong decision and should he have gone for a Linked Data architecture instead?

Well, for the time being: certainly not. It is way too expensive and way too difficult to start doing that, while the return is not high enough. What is the goal of this spreadsheet? To create a collaborative list of app competitions, which will answer questions such as when and what theme. The creator succeeded in this goal using a collaborative spreadsheet.

Yet, there are datasets that are much more difficult to publish than a spreadsheet. Take for instance all the businesses in Belgium. This data is maintained in the Belgian Crossroads Bank for Enterprises. The things people want to use this dataset for have grown beyond imagination, and thus the government service has chosen to publish data dumps from time to time, which you can download as a zip archive. This makes the dataset accessible over HTTP, but not queryable over the Web (or HTTP). You can only ask questions of the dataset once you have downloaded it and stored it in your own local datastore.

Yet, Paul Hermans and Tom Geudens have created something really interesting as a hobby project: they republish the data directly through HTTP, mapping the data to a URI structure. Now you can get an overview of every company in Belgium at a URI. For instance, the Open Knowledge Foundation Belgium can be found here: http://data.kbodata.be/organisation/0845_419_930#id, and you can also get a JSON representation of the data about this company.

A benefit of Paul and Tom’s work mapping the data to these triples is that, using Linked Data Fragments, small questions can be answered. For instance: “What is the preferred label of http://data.kbodata.be/organisation/0845_419_930#id?” The answer to this question is available at: http://data.kbodata.be/fragments?subject=http://data.kbodata.be/organisation/0845_419_930%23id&predicate=http://www.w3.org/2004/02/skos/core%23prefLabel.
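The fragment URL above can be assembled programmatically. The Python sketch below only builds the URL and does not send a request; the parameter names `subject` and `predicate` are copied from the URL in this post, while the helper name `fragment_url` is an assumption. Whether the server is still online is not something this snippet checks.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example URL in this post.
FRAGMENTS_ENDPOINT = "http://data.kbodata.be/fragments"


def fragment_url(subject=None, predicate=None, endpoint=FRAGMENTS_ENDPOINT):
    """Build the URL of the fragment that matches a subject and/or predicate."""
    params = {}
    if subject is not None:
        params["subject"] = subject
    if predicate is not None:
        params["predicate"] = predicate
    # Keep ':' and '/' readable, but percent-encode '#' as %23 so it is not
    # treated as the start of a URL fragment identifier.
    query = urlencode(params, safe=":/")
    return f"{endpoint}?{query}" if query else endpoint


if __name__ == "__main__":
    print(fragment_url(
        subject="http://data.kbodata.be/organisation/0845_419_930#id",
        predicate="http://www.w3.org/2004/02/skos/core#prefLabel",
    ))
```

Leaving out one of the two arguments turns it into a wildcard, which is what makes these small questions composable.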

When we want to solve difficult questions on the Web, our programs can divide the difficult question into simple questions, fire them at the right servers on the Web and assemble the response for you. You can check out a demo of this at client.linkeddatafragments.org or you can read the paper:
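The idea of splitting a difficult question into simple ones can be sketched in a few lines of Python. This is not the algorithm used by the Linked Data Fragments client; it is a simplified illustration in which `match` stands in for one simple request to a server and `subjects_with` chains several of them. The triples and identifiers are invented.

```python
# Hypothetical triples standing in for data held by a remote server.
TRIPLES = [
    ("contest:a", "year", "2010"),
    ("contest:a", "theme", "transport"),
    ("contest:b", "year", "2010"),
    ("contest:b", "theme", "health"),
    ("contest:c", "year", "2011"),
    ("contest:c", "theme", "transport"),
]


def match(subject=None, predicate=None, obj=None, triples=TRIPLES):
    """Answer one simple question: all triples fitting a pattern (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def subjects_with(conditions):
    """Answer a combined question by chaining simple pattern lookups."""
    conditions = list(conditions)
    if not conditions:
        return []
    first_predicate, first_obj = conditions[0]
    # First simple question: which subjects satisfy the first condition?
    candidates = [s for (s, _, _) in match(predicate=first_predicate, obj=first_obj)]
    # Follow-up questions: keep the candidates that satisfy every other condition.
    for predicate, obj in conditions[1:]:
        candidates = [
            s for s in candidates if match(subject=s, predicate=predicate, obj=obj)
        ]
    return candidates


if __name__ == "__main__":
    print(subjects_with([("year", "2010"), ("theme", "transport")]))
```

The server only ever has to answer single-pattern questions; the combination happens on the client.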

[bibtex file=refs.bib key=verborgh_iswc_2014]

Make your data discoverable

Finally, whether your data is stored within a PDF file, a spreadsheet or opened up using Linked Data Fragments, you still need to advertise your data if you want to maximize its reuse. This is typically done using an Open Data Portal, of which I have outlined the essentials in a paper called “The 5 stars of Open Data Portals”:

[bibtex file=refs.bib key=colpaert20135]

This blog post is part of the deliverables for an Open Data project for the Flemish Government (EWI). If you want help in setting up an Open Data policy, if you want help convincing your colleagues, or if you want to organize an event, contact me at our not-for-profit organization, Open Knowledge Belgium. If you want technical help to publish your data to the Web, contact me at Ghent University. We are always eager to have a chat.

 

TEC opens up data

It’s a great day for Open Data in Belgium. After all the legal hassle with NMBS/SNCB, we can finally have peace of mind, as the Walloon bus company declares they don’t understand why it’s so difficult. That being said, TEC has officially opened up its timetables, which you can find on the global data portal:

http://datahub.io/dataset/tec

Our friends from OpenOV were kind enough to create a transformation server from the BLTAC format to GTFS in around 4 hours. Thank you!

Happy coding :)

Fork the open transport data manifest

Transportation is a major contemporary issue, which has a direct impact on economic strength, environmental sustainability, and social equity. Accordingly, transport data – largely produced and/or gathered by public sector organisations or semi-private entities, quite often decentrally – represents one of the most valuable sources of (Public Sector) Information (PSI, also called ‘Open Data’), a key policy area for many, including the European Commission.

This is the first paragraph of the Open Transport Data Manifest. During the OKFestival in Helsinki, the open transport working group at OKFN organised a PSI sectoral meeting on transport data. The manifest is a summary of the entire day, during which all kinds of stakeholders from over 15 different countries were invited to help think about how to inform European travelers as well as possible.

The first slide of the manifest. More on GitHub

The manifest document was then transformed into an infographic by Miet Claes and Michael Vanderpoorten (both members of iRail). The infographic is also available as slides. Because we want everyone to be able to help with the manifest, and to reuse this work, we have published the source files on GitHub. We invite you to fork the infographic and use it for your own slides on open transport.

Help send this infographic to the policy makers who need to see it and let us know on the mailing list.

Pieter

Finally Open Data, for one day

It goes without saying that NMBS/SNCB (not to be confused with Infrabel or NMBS/SNCB Holding) and iRail haven’t been the best of friends over the past 4 years. But we do not really worry about it: they are even fighting with Infrabel, the railway infrastructure manager, over who owns the data*, and even the biggest organizations and research groups in Belgium scrape the data from their website.

Bad news! On the 3rd of October, the railway people are on strike. As usual with these strikes, no one understands exactly why, and if we dare to ask why, we are told that commuters are too selfish.

But we have good news as well. Thanks to the strike, we can now publish a dataset covering 1 full day. We have created a GTFS (General Transit Feed Specification) feed which contains all the schedules for the 3rd of October. It passed all the tests and was published on the official GTFS Data Exchange website.

And we have a call for participation! If you are a developer, a graphic designer, an artist, a producer, an author, basically anything, wouldn’t you like to participate in our 3rd of October take-over iRail.be contest? Make a website, send us the code, and get published on iRail.be on the 3rd of October!

* For your information, transport data is not copyrightable. Therefore, there is no such thing as a transport data owner. The railway companies should realize that they are responsible for getting people from one place to another and back again. This includes providing people with the correct data, something SNCB/NMBS has always refused to do.