Author Archives: pietercolpaert

Accepted for a session at the Open Belgium conference

We’re accepted for a 90-minute session on Open Transport and iRail at the Open Belgium 2015 conference in Namur!

The conference is in 2.5 months, so we still have some time to set everything up, yet I would like some direction: what would you like to discuss? Who do you want me to invite? You can send suggestions by e-mail or reply on Twitter to the tweet announcing this post.



Open Belgium is a yearly conference, first organised in 2014. It’s a get-together of Open Knowledge experts and enthusiasts from across the country to discuss the state of play. In 2015, Open Knowledge Belgium is organising this conference in Namur on the 23rd of February. You can secure tickets through the website (note that contributors to a session get free access to the conference).

Open Ghent

Open Ghent is an event where we invite everyone to the Lakenhal next to the Belfry in Ghent. Join us in this UNESCO World Heritage-protected hall and find out how Ghent, which is also home to the headquarters of Open Knowledge Belgium, has been a pioneer in opening up its data and is still innovating to this day.

The location for this event is no coincidence either. Artoria, a company specialised in historical research and concessionaire of the Belfry, was one of the partners of open Summer of code 2014. Together with two #oSoc14 students, they created an application for the Belfry in Ghent, based on open-source Augmented Reality engines and available on GitHub. During this event, Artoria wants to test the application and gather feedback. But that’s not all: the developers of We Open Data will present the latest update and the features of The DataTank, an open-source RESTful data management system. Come join us at this free event.


14:00: Introduction on Open Data in Gent

14:25: Short pitch on open Summer of code 2014 and 2015 by Pieter-Jan Pauwels

14:30: Short presentation on how the ‘Belfort app’ became a reality

14:45: Visit to the Belfry + feedback on the application

15:30: Presentation of The DataTank and how they want to work with the community in the future.

15:45: Networking drink

Register here:

Open Knowledge Belgium presented

Open Knowledge Belgium is an umbrella organisation for Open Knowledge initiatives in Belgium. Our most important asset is our working groups. Open Knowledge Belgium supports these working groups, each organised around a certain open subject in a loose setting, by helping them submit projects, bringing them together at a yearly conference, providing them with technical tools to e.g. open up data, putting them in contact with ambitious students during open Summer of code, and so forth.

Each of these working groups has a representative on the general Open Knowledge Belgium board of directors. And we have a full-time community manager.

How are we funded? Find out in this presentation:

This presentation was given on the 25th of September at the European citizen science conference at the European Commission, as well as at Waag Society in Amsterdam before members of the Apps for Europe project.

Launching the Linked Open Transport vocabularies

How far do you live from work?

Did you answer this question in minutes or in kilometers? Many answer in minutes. Now imagine how a machine would have to work out the answer to such a question for you: it would need a lot of data.

In some cases, e.g. for Amsterdam, that data is already sufficiently available as open data: OpenStreetMap and the openOV initiative in The Netherlands help. Yet to achieve this system, we still face a huge job in integrating these datasets on one machine. What if we could advance the state of the art and use Semantic Web/Linked Data technologies to facilitate all this?

This is what I need for my PhD as well. So we have started creating four vocabularies: one for transit feeds or time schedules, one for categorising transport datasets, one for road traffic events, and one for real-time arrivals and departures of public transport.

The first of these vocabularies has now been released: the Linked GTFS vocabulary. You can help build these vocabularies at our GitHub repository, or you can just dig in and start using our terms. You can now browse them at our Linked Open Vocabularies project:
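To give a flavour of what using these terms could look like, here is a minimal sketch in plain Python, representing triples as tuples. The gtfs: namespace URI and the stop identifier below are assumptions of this example; check the GitHub repository for the authoritative terms.

```python
# A sketch, not the authoritative vocabulary: the gtfs: namespace URI and
# the example stop URI below are assumptions for illustration only.
GTFS = "http://vocab.gtfs.org/terms#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

stop = "http://example.org/stops/gent-sint-pieters"

# Triples describing one stop, as (subject, predicate, object).
triples = [
    (stop, RDF_TYPE, GTFS + "Stop"),
    (stop, GTFS + "code", "GSP"),
]

for s, p, o in triples:
    print(s, p, o)
```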

iRail at ISWC2014

ISWC is the international conference on the Semantic Web, where I’m presenting the recent development of iRail towards a hypermedia API. I’m hoping to gather feedback on our way of working, and I hope it can set an example for the development of Linked Data interfaces in route planning. You can find the slides below:

You can test iRail as a linked data interface by running some command line commands:

# List all stations the Belgian railway company (NMBS) operates:
curl -H "accept: application/json"
# Show all departures for a certain stop:
curl -H "accept: application/json"

E-gov Awards by Agoria

Does your project deserve an e-gov Award in 2014?

This year, Agoria ICT is once again organising the e-gov Awards to honour the most deserving IT projects of public government services.
Besides awards for innovation, user-friendliness, profitability and collaboration, this edition of the e-gov Awards will for the first time also include an award for the use of open data.

The ultimate goal, of course, remains to reward projects that save time and resources for both the government and the user (citizen or business).
The administration modernises first and foremost to offer citizens, companies and civil servants an efficient service.

As in previous years, Agoria invites all operational e-gov projects to compete for the e-gov Awards.

This year, six e-gov Awards will thus be granted, and once again all projects are eligible for every award.

So do take part, and be sure to encourage your colleagues and/or staff to submit their candidacy for the e-gov Awards 2014.

The coveted prizes will be officially presented this year during the e-gov gala evening on Thursday 4 December 2014.

How to participate?

As in previous years, participating is very easy!
You will find the registration form and all useful information, such as the competition rules, online.

Note: the deadline for submitting your dossier is Friday 3 October!

We hope the jury will crown you as one of the winners at our Awards evening.

Good luck!

Opening up transit data

In this TEDxGhent talk, Pieter Colpaert explains his PhD in three minutes. At the same time, it gives three simple reasons why it’s better to open up transit information than for each party to keep it to themselves:

  1. We don’t want a separate app for each and every transit agency.
  2. We don’t want the transit agencies themselves to decide which questions we can ask our smartphones, which is what happens when the data is not shared beyond the transit agencies.
  3. We don’t want only one private company to build the only application: there are far too many users for this one company to come up with an app that works for everyone.

Opening up the data for (new) companies to enrich their services or products, and opening it up to the Web so that smart clients can do their own reasoning over the data, is what we see as the next big innovation in transport information.


From data model to HTTP

When creating a data model, you already inherently decide which questions are going to be asked. In most cases a data model is not created by expensive data experts, but by a civil servant who decided to start a new spreadsheet, by an application developer who needed access to certain data, or by someone who simply needed an answer to a couple of questions.

An example? Here’s a small spreadsheet, initiated by Antti Poikola, which collects app competitions on top of Open (Government) Data:

The CSV version of this dataset can be found here

Antti chose a pragmatic approach to the data model: we can see a contest name, the link to the contest, the year(s) in which the contest is held, the country, the city or region, the level, the organisation that runs the contest, and finally the theme of the contest. The questions that can be answered with this dataset are already visible in the data: which events are organised in a certain year? Which events are organised around the theme of transport? And so forth.

It gets a bit more complex when we ask combined questions, such as: which events were organised in 2010 around transport? You can quickly answer these by answering the two separate questions and taking the intersection, by doing two filter actions in your spreadsheet program, or by using a query language such as SQL and doing a SELECT with a WHERE clause:

SELECT * FROM csv_file WHERE year = 2010 AND theme = 'transport';
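The intersection approach can be sketched in a few lines of Python; the contest rows below are made up for illustration:

```python
# Made-up rows in the shape of the spreadsheet above.
contests = [
    {"name": "Contest A", "year": 2008, "theme": "government"},
    {"name": "Contest B", "year": 2010, "theme": "transport"},
    {"name": "Contest C", "year": 2010, "theme": "health"},
    {"name": "Contest D", "year": 2011, "theme": "transport"},
]

# Answer the two simple questions separately...
in_2010 = {c["name"] for c in contests if c["year"] == 2010}
about_transport = {c["name"] for c in contests if c["theme"] == "transport"}

# ...and take the intersection to answer the combined question.
print(in_2010 & about_transport)  # → {'Contest B'}
```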

Open Data on the Web

Data as such is nothing more than a collection of facts or statements. These statements can be structured in various ways: hidden in full-text documents, structured in tables, or structured using graphs. Just like language and conversations, data and data exchange suffer from problems such as misinterpretation, misrepresentation of reality, being incomprehensible to third parties, and so on.

On the Web we don’t talk to only one other person: we publish blog posts, for example, which we want to be read by as large an audience as possible. That’s probably why I chose to write this blog post in English.

Open Data is the name we give to a dataset that is openly licensed: everyone is free to use, reuse and redistribute the data. In my work today I focus on an even smaller subset: Open Data whose publishers want the data to be used by as large an audience of people and machines as possible. That’s why I will talk about “Open Data on the Web” and not just about data that had to be opened by some law and is only available if you browse for it very carefully. “Open Data on the Web” is not something I invented; the W3C standardization body has a working group on it.

To explain how targeting as large an audience as possible with data works, I need to explain a bit more about how I see data. In datasets, we still use human language to express facts. For instance, in the spreadsheet above, you can recognize a fact as a row in the table: it describes one event with several properties. The table is data, because it is a collection of facts. What do I find the most workable form for data? The one from which the maximum number of atomic facts can be extracted.

Data in its most fundamental form

The most concise way to express a fact in language is with three words: a subject, a predicate and an object. Creating a list of these triples lets you represent any dataset in its most fundamental form.

For example, the spreadsheet above could become:

<> name "Show us a better way" .
<> year 2008 .
<> country "UK" .
# And so forth...

This is the most fundamental way to represent this dataset, and it makes studying it a lot easier. For instance, which identifiers and words are used within this dataset? And how well can it be understood by machines crawling the Web? We can see the words and compare facts much more quickly, without having to worry about the serialization of the data (XML, JSON, CSV…).
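As a sketch of how such a spreadsheet row decomposes into atomic triples, here the column names become predicates and the first cell the subject (a simplification; in practice you would also pick proper identifiers):

```python
import csv
import io

# A one-row extract in the shape of the spreadsheet above.
raw = """name,year,country
Show us a better way,2008,UK
"""

reader = csv.DictReader(io.StringIO(raw))
triples = []
for row in reader:
    subject = row["name"]  # simplification: use the name as the subject
    for predicate, obj in row.items():
        if predicate != "name":
            triples.append((subject, predicate, obj))

for t in triples:
    print(t)
# → ('Show us a better way', 'year', '2008')
#   ('Show us a better way', 'country', 'UK')
```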

We have used this triple structure as the starting point to quantify the interoperability of datasets published on the Web, in the journal Computer. The article is going to be published in October 2014 (you can always request a preprint version by sending me an e-mail):

[bibtex file=refs.bib key=colpaert_computer_2014]

In the paper we added a section about Linked Open Data as the next logical step. When you want a machine to understand what “year”, “country”, “UK” and “name” mean in your context, the easiest thing to do is to create a look-up service that shows more information about how to interpret the data. So instead of using words, which can be very ambiguous, we use HTTP URIs: third parties can look up more information about a term using a browser, and machines can request more machine-readable information. Furthermore, URIs cannot be ambiguous, as there is one party in charge of maintaining the meaning of every URI.
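A minimal sketch of the difference, in plain Python; the URIs below are illustrative placeholders, not the real vocabulary terms:

```python
# Ambiguous, word-based triple: "country" and "UK" mean whatever the
# reader guesses they mean.
word_triple = ("Show us a better way", "country", "UK")

# The same fact with HTTP URIs: every term can be looked up over HTTP,
# and exactly one party maintains its meaning. These namespaces and
# identifiers are made up for illustration.
EX = "http://example.org/contests/"
VOCAB = "http://example.org/vocab#"
uri_triple = (
    EX + "show-us-a-better-way",
    VOCAB + "country",
    "http://example.org/id/UK",
)

print(uri_triple)
```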

For instance, we have introduced a number of HTTP URIs to describe opening hours on the Web. You can read the paper, visit the website for more explanation, or cite it as follows:

[bibtex file=refs.bib key=pc_openinghours]

From data model to HTTP

Now comes a very difficult question: do we need to publish all data using Linked Data techniques and this triple structure? Did the creator of this spreadsheet make the wrong decision, and should he have gone for a Linked Data architecture instead?

Well, for the time being: certainly not. It is far too expensive and far too difficult to start doing that, while the return is not high enough. What is the goal of this spreadsheet? To create a collaborative list of app competitions, able to answer questions such as when a contest takes place and around which theme. The creator succeeded in this goal using a collaborative spreadsheet.

Yet there are datasets that are much more difficult to publish than a spreadsheet. Take for instance all the businesses in Belgium, assembled in the Belgian Crossroads Bank for Enterprises. The things people want to use this dataset for have grown beyond imagination, and thus the government service has chosen to publish data dumps from time to time, which you can download as a zip archive. This makes the dataset accessible over HTTP, but not queryable over the Web (or HTTP): you can only ask questions of the dataset once you have downloaded it and stored it in your own local datastore.

Yet Paul Hermans and Tom Geudens have created something really interesting as a hobby project: republishing the data directly over HTTP, mapping the data to a URI structure. Now you can look up every company in Belgium at a URI; the Open Knowledge Foundation Belgium, for instance, has its own page, and you can also get a JSON representation of the data about this company.

A benefit of Paul and Tom’s mapping work to these triples is that, using Linked Data Fragments, small questions can be answered. For instance: “What is the preferred label of this id?” The answer to this question is available at:

When we want to solve a difficult question on the Web, our programs can divide the difficult question into simple questions, fire them at the right servers on the Web, and assemble the response for you. You can check out a demo of this, or read the paper:

[bibtex file=refs.bib key=verborgh_iswc_2014]
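As a toy sketch of the idea, with the whole server reduced to an in-memory function and made-up identifiers: each Linked Data Fragments server answers one simple triple-pattern question, and a client combines such answers.

```python
# A made-up dataset that a "server" publishes, queryable by triple pattern.
DATASET = [
    ("ex:okfnbe", "skos:prefLabel", "Open Knowledge Belgium"),
    ("ex:okfnbe", "ex:status", "active"),
    ("ex:acme", "skos:prefLabel", "ACME"),
]

def fragment(subject=None, predicate=None, obj=None):
    """Return all triples matching a pattern (None is a wildcard), like a
    Linked Data Fragments server answering one simple question."""
    return [
        (s, p, o) for (s, p, o) in DATASET
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# A client answers "what is the preferred label of ex:okfnbe?" by firing
# one simple pattern at the server and reading off the object.
matches = fragment(subject="ex:okfnbe", predicate="skos:prefLabel")
print(matches[0][2])  # → Open Knowledge Belgium
```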

Make your data discoverable

Finally, whether your data is stored in a PDF file, in a spreadsheet, or opened up using Linked Data Fragments, you still need to advertise your data if you want to maximize its reuse. This is typically done using an Open Data Portal, of which I have outlined the essentials in a paper called “The 5 stars of Open Data Portals”:

[bibtex file=refs.bib key=colpaert20135]

This blog post is part of the deliverables for an Open Data project for the Flemish Government (EWI). If you want help setting up an Open Data policy, convincing your colleagues, or organising an event, contact me at our not-for-profit organisation, Open Knowledge Belgium. If you want technical help publishing your data on the Web, contact me at Ghent University; we are always eager to have a chat.


Submit your app written on top of the iRail API

The iRail APIs have been around for almost four years now. Did you build something on top of our APIs? We would love to know! Fill it out in our public Google Drive spreadsheet. You can find the current results over here:

We are looking for new ways to finance the maintenance and further development of a national API for mobility in Belgium. For that, we first need some numbers showing that our API makes an impact. Next, we will keep you informed about further development, as we will start streamlining the interfaces into one URI space.

Looking forward to your responses!