Author Archives: pietercolpaert

I do not want your Open Data API, I’d rather scrape your website

This is an answer to a question I get too often: why don’t you like APIs for Open Data?

What is not to like about APIs? Today, if you want to be cool, you have an API so that third parties can integrate with basically anything. While there is a lot of uncertainty about what an API exactly is, we can agree it is something third-party developers can use in their programming code to do something. APIs exist to talk to your IoT devices, APIs exist to talk to other people’s software routines and libraries, and APIs exist to exchange data between different parties. And truly, I love this idea!

But when publishing Open Data, what people mean by “I want an API” usually comes down to one of two things:

  1. The developer wants small fragments of the data, retrievable as JSON or XML (I know, you probably want JSON, but hey, government and enterprise people read this blog too!), instead of having to download a data dump or scrape HTML pages.
  2. The developer is lazy (in most cases a good property of a developer) and wants the data publisher to pay for a free service that powers their app.

For the first point: great, you are advocating open web standards such as JSON! Yet what you are really asking for is a whole new channel, which requires new funding just to set it up: new servers, new consultants programming HTTP interfaces, new ETLs, and so on. Because these APIs often end up as a second, inferior channel, some thought leaders started preaching “API first!”: the website itself should use this API to show the data. The core idea is good: document all your resources and work on a decent HTTP URI strategy. But why do you still need a separate API? Your HTML pages are also resources within that HTTP URI strategy, so you could just as well return JSON for the same page using the HTTP Accept header, or annotate your HTML pages with machine-readable snippets.
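To make that concrete, here is a minimal sketch (not a prescription) of what content negotiation on a single URI could look like, using ASP.NET Core; the /stations/{id} route, the station object and the HTML snippet are made up for illustration.

```csharp
// Minimal sketch (ASP.NET Core): one URI, two representations.
// The /stations/{id} route, the station object and the HTML snippet are hypothetical.
var app = WebApplication.Create(args);

app.MapGet("/stations/{id}", (HttpRequest request, string id) =>
{
    // Hypothetical resource; a real publisher would look this up in its own data store.
    var station = new { id, name = "Gent-Sint-Pieters" };

    // Same URI, different representation depending on the Accept header.
    if (request.Headers["Accept"].ToString().Contains("application/json"))
        return Results.Json(station);

    // Default: the human-readable page, ideally annotated with machine-readable snippets.
    return Results.Content($"<html><body><h1>{station.name}</h1></body></html>", "text/html");
});

app.Run();
```

A browser then gets the HTML page, while a client sending Accept: application/json to the same URI gets the JSON representation of the same resource.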

For the second point, I will need a new chapter:

Services vs. data publishing

When you are creating a production ready app, you do not want the government to host e.g., your full text search service. Can you imagine Google relying on the full text search of your government to give you search results? Of course not:

  • you cannot rely on the government to provide reasonable uptime for a service that becomes harder and harder to keep online as your user base grows
  • there are many different parameters to a full text search query, and if you want to innovate, you need to be in control of the query execution algorithm
  • you cannot run combined full text search queries across different governmental websites: you can only query each server separately, and integrating the result sets will be troublesome.

Other examples of such services, besides full text search, are geo-queries, exposing a query language such as GraphQL, SPARQL or SQL, route planning, geocoding, and so on.

So, if you want to use data from the government in your production apps or your next start-up, you will want data access which allows you to replicate the entire dataset on your own machines. That is why we need the government to publish their data licensed under e.g., a CC0 license.
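As a rough illustration of that “replicate, then query locally” pattern, the sketch below keeps a local copy of an openly licensed dump and refreshes it when it is older than a day; the dump URL, local path and refresh window are placeholder choices, not part of any official publishing setup.

```csharp
// Sketch: keep a local copy of an openly licensed data dump and refresh it when stale.
// The dump URL and the 24-hour refresh window are placeholder choices for illustration.
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class DumpMirror
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<string> EnsureLocalCopy(string dumpUrl, string localPath)
    {
        var stale = !File.Exists(localPath) ||
                    DateTime.UtcNow - File.GetLastWriteTimeUtc(localPath) > TimeSpan.FromHours(24);
        if (stale)
        {
            // Download the full dump once; all further queries run against this local file.
            var bytes = await Http.GetByteArrayAsync(dumpUrl);
            await File.WriteAllBytesAsync(localPath, bytes);
        }
        return localPath;
    }
}
```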

But what does it mean to publish your data? What is the distinction between publishing your data and a data service? At iMinds, we like to visualize this as an axis as follows:

Data publishing vs. an API

A data dump is certainly data publishing, yet it has many drawbacks: when publishing data that changes often, having to update the dump on your server every second, let’s say, is a bit much. That’s why I think a more Web-ish approach would not hurt: small documents (JSON, HTML, whatever) that link together into a big Web of knowledge.

An interesting way of working towards that is a resource-oriented approach, where you first identify all the resources in your dataset using a global identifier (such as a URI). Then, you create documents of data (identified by e.g., a URL) which say something about these resources. The documents can be structured just like you would structure your website, and links lead you from one document to the other. This way, programmers can write source code that follows links (the idea of hypermedia!) to answer more difficult questions than those answered in a single document. And the documents themselves are small enough to contain rapidly changing data.
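To show what “following links” can look like in code, here is a small sketch of a client that walks such a Web of documents; the “items” and “next” member names are assumptions made for the example, as a real hypermedia client would discover these controls from the responses themselves.

```csharp
// Sketch: walk a paged, linked collection of JSON documents.
// The "items" and "next" member names are assumptions for illustration only.
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class LinkFollower
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<List<JsonElement>> CollectAsync(string startUrl, int maxPages = 10)
    {
        var results = new List<JsonElement>();
        string? url = startUrl;

        for (var page = 0; url != null && page < maxPages; page++)
        {
            using var doc = JsonDocument.Parse(await Http.GetStringAsync(url));
            var root = doc.RootElement;

            if (root.TryGetProperty("items", out var items))
                foreach (var item in items.EnumerateArray())
                    results.Add(item.Clone()); // Clone so the element outlives the parsed document.

            // Follow the link to the next document, if the server advertises one.
            url = root.TryGetProperty("next", out var next) ? next.GetString() : null;
        }
        return results;
    }
}
```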

Examples

Setting the bad example

The Europeana API: well documented by Ruben Verborgh in his blog post “The Lie of the API”.

Setting the good example

Check out schema.org! It is a way to annotate your HTML pages with rich snippets. This way, scraping the website to generate a data dump becomes easier, and the entire Web becomes more structured as websites are annotated with similar properties.
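As a rough sketch of how such annotations make scraping easier, the snippet below pulls the JSON-LD blocks out of an HTML page; the string matching on the script tag is deliberately naive, and a production scraper would use a proper HTML parser.

```csharp
// Rough sketch: extract schema.org JSON-LD blocks from an HTML page.
// The string matching on the <script> tag is naive; a real scraper would use an HTML parser.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;

class RichSnippetScraper
{
    const string Marker = "<script type=\"application/ld+json\">";
    static readonly HttpClient Http = new HttpClient();

    public static async Task<List<JsonDocument>> ExtractAsync(string pageUrl)
    {
        var html = await Http.GetStringAsync(pageUrl);
        var snippets = new List<JsonDocument>();

        var start = html.IndexOf(Marker, StringComparison.OrdinalIgnoreCase);
        while (start >= 0)
        {
            var begin = start + Marker.Length;
            var end = html.IndexOf("</script>", begin, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break;

            // Each block is plain JSON-LD describing resources on the page.
            snippets.Add(JsonDocument.Parse(html.Substring(begin, end - begin)));
            start = html.IndexOf(Marker, end, StringComparison.OrdinalIgnoreCase);
        }
        return snippets;
    }
}
```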

Check out Linked Data Fragments. It is a way to publish your data in fragments, which lets clients answer very complicated questions by asking many small questions and downloading parts of the dataset just in time. This is the true power of the Web: combining resources to solve difficult questions.

Check out the website of the city of Ghent, which provides rich snippets with Linked Data.

Check out Linked Connections: it is Linked Data Fragments applied to route planning.

Huh? So what are you doing with api.iRail.be? Is that not a good example?

Indeed: it is not a good example of how to publish Open Data. We also never said it is the goal of api.iRail.be to publish the data: after all, we are not the data owner. We have created a service, available for free to everyone, which enables everyone to calculate routes on the Belgian railway network and display them in various apps, such as Railer and BeTrains. We do this as a non-profit project that wants to make the transport experience in Belgium better. For the data dumps themselves, head to gtfs.irail.be. In the same vein, I applaud any initiative of open data reusers offering free services to hobby developers, but you will hear me complain when a data publisher spends its time on this instead of raising its own data quality.

 

Always have a seat in the train

With our API’s query logs, a ground truth provided by train specialists in Belgium, and user feedback, we can show you how busy your train will be. The only thing lacking to realize this feature, which the Dutch have already had for years, is your support! Go to spitsgids.be and get a seat on your next journey.

The start of a new era

Time flies! But we have great news:

We can now officially close the era in which iRail was an advocacy group for Open Transport Data, and instead become something we have always wanted to be: a Living Lab (and/or hackerspace) working towards a better transport experience for travelers in Belgium. The next steps will entail better collaboration with players like TreinTramBus, iMinds, the European Passenger Federation and everyone who can help us reach our goal. A first initiative will be announced next Monday, the 18th of April! We are quite psyched about this.

TreinTramBus and iRail have already been working together in the past, analyzing the query logs we have published in collaboration with iMinds’ Data Science Lab (my current employer). The research that has been carried out was presented at the WWW2016 conference in Canada, during the USEWOD workshop. The full paper can be downloaded here.

January updates

We are only two weeks into 2016 and it seems NMBS/SNCB is keeping its promise: a presentation announced they now have an internal innovation project on Open Data. More interestingly for us, that presentation, given yesterday at the Digital Agenda Belgium event, acknowledged they were wrong in 2010 when they sent Yeri a cease and desist letter. While people on Twitter responded moderately (“How long is this going to take?”, “How about real-time data?”), we believe this is a huge step forward: it is the first time NMBS/SNCB acknowledges we were on the right side of history. As far as iRail is concerned: we would love to start collaborating with a clean slate :)

 

It doesn’t stop there: today we are featured in De Standaard with a piece on how NMBS should make better use of their data:

http://standaard.be/cnt/dmf20160112_02064462

Train offer vs. Travel Intentions on average on a Thursday and on December 24th

Indeed: the query log files of iRail can be used as an indication of travel intention. You can now also reuse the log files yourself:
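As a hedged sketch of what such reuse could look like, the code below counts queries per day from a log file; it assumes the logs are newline-delimited JSON with a querytime field per query, which is an assumption about the format rather than a documented schema, so adapt it to the actual files.

```csharp
// Sketch: derive a daily travel-intention signal from a query log file.
// Assumes newline-delimited JSON with a "querytime" timestamp per query; this is an
// assumption about the log format, not a documented schema.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

class QueryLogStats
{
    public static Dictionary<DateTime, int> QueriesPerDay(string logPath)
    {
        var perDay = new Dictionary<DateTime, int>();
        foreach (var line in File.ReadLines(logPath))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            using var doc = JsonDocument.Parse(line);
            if (!doc.RootElement.TryGetProperty("querytime", out var ts)) continue;

            var day = DateTime.Parse(ts.GetString()!).Date;
            perDay[day] = perDay.TryGetValue(day, out var n) ? n + 1 : 1;
        }
        return perDay;
    }
}
```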

TrainTracker

This is a guest blog post by Pepijn Mores, a student at HELMo Saint-Marie. In this blog post he shows how easy it is to create an app to fulfil his own transportation needs, which we think is very interesting: it is not always the big companies that use the data from the SNCB. Would you like to write a blog post here yourself? Contact us!

For the final project of my C# course at HELMo Saint-Marie, I had the opportunity to choose my very own topic. Only two requirements needed to be met: it had to be developed in C# (obviously) and it had to involve a technology we didn’t study in class. The number of possibilities was enormous, so I decided to search for a subject that would actually be helpful to me in some way. Because I return home every weekend and the train takes about 1.5 hours, I came up with the idea to make a PC application that would look up the train tables and check whether I had enough time for a sandwich at Panos at one of the changes. It was the perfect solution: I could look up the trains while attending my last course on Friday afternoon and plan my trip home whilst sitting in class. So that’s where the API of iRail came in.

[TrainTracker screenshot 1]

I studied the structure of the XML file the API returned and noticed the large amount of information it contained. I expected it to be just a list of stations with arrival and departure times, but instead I was surprised by information like the exact coordinates of the train stations and the code of the vehicle. I was stunned by the potential this data had. But because I had to meet a tight deadline and exams were coming up, I decided to stick to my original idea.

[TrainTracker screenshot 2]

Transforming the data from the XML into my own models was done using the Language Integrated Query (LINQ) feature in C#. After the transformation, the user can view the data in Windows Forms and access the connection details (with a note on whether there is time for a little snack).
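For readers who want to try something similar, here is a small sketch of that LINQ-to-XML step; the element names used below (connection, departure, arrival, station, time) are assumptions about the response layout, so check the iRail API documentation for the exact schema.

```csharp
// Sketch of the LINQ-to-XML step: map the API's XML into simple model objects.
// The element names (connection, departure, arrival, station, time) are assumptions
// about the response layout; consult the iRail API docs for the exact schema.
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

record ConnectionModel(string FromStation, string ToStation, string DepartureTime, string ArrivalTime);

class ConnectionParser
{
    static readonly HttpClient Http = new HttpClient();

    public static async Task<List<ConnectionModel>> LoadAsync(string url)
    {
        var xml = XDocument.Parse(await Http.GetStringAsync(url));

        return (from c in xml.Descendants("connection")
                let dep = c.Element("departure")
                let arr = c.Element("arrival")
                where dep != null && arr != null
                select new ConnectionModel(
                    (string?)dep.Element("station") ?? "",
                    (string?)arr.Element("station") ?? "",
                    (string?)dep.Element("time") ?? "",
                    (string?)arr.Element("time") ?? ""))
               .ToList();
    }
}
```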

I designed 3 simple views: a setup view, a list view with all connections and a detailed view. The setup view takes the basic details needed to perform the request, the list view gives the user an overview of the upcoming connections, and the detail view shows the exact time and platform of departure and arrival.

The main thing I learned by developing this little application is not how much fun it is to work with LINQ or C#, nor how to design a beautiful user interface (I am sorry if the bright red hurts your eyes). What I did learn is that data on (public) transportation should be widely available and used in projects. Thanks to iRail, I recognised the opportunities and challenges around public transport and open data in Belgium. Public transport data carries huge potential and has inspired me to work on another train-related subject in the future (GTFS, yes I am looking at you).

[TrainTracker screenshot 3]

NMBS/SNCB announces data sharing

Today, the Belgian railway company announces a data sharing program for third party developers. They will start sharing the planned schedules (“static data”). We are cautiously excited that this is the first step towards a real open data policy.

You can request your own 1-on-1 contract via http://www.belgianrail.be/nl/klantendienst/infodiensten-reistools/public-data.aspx. We hope to make our own version, linked to our identifiers, available through http://gtfs.irail.be soon!

[More updates later today]

What does this mean for the end-users?

Companies like Google, Nokia, Microsoft and others can now start reusing the data if they negotiate a one-on-one contract. Starting today, you will see third-party route planners work with NMBS/SNCB data! Android and Google Maps users will be especially excited, as they will now also see trains suggested for their destination instead of only buses when selecting public transit.

Why isn’t this open data yet?

Open Data means that there is no discrimination in who can reuse the data and no restriction on how it can be used and redistributed. For now, you can only access the data after signing a 1-on-1 contract. We think it is a good thing that NMBS/SNCB is testing the water before the law that will oblige them to do real open data comes into effect in 2016. We look forward to working together, and iRail is also in the process of requesting a license. We do, however, want to be able to republish the data once we have added our own data to the dataset, which currently isn’t allowed by default.

Why is iRail (cautiously) happy?

We once were forced to stop building applications using the data of NMBS. The original posts from 2010 can still be found in our archives. Today, the first steps towards Open Data have been taken and we are sure they will not regret it. We will slowly migrate our servers to make use of this official dataset. We look forward to seeing you build awesome new things with this high quality data!

Contact Pieter: +32 486 74 71 22

– The iRail Team

Linked Connections: demo paper accepted at ISWC2015

I will host a demo at ISWC2015 on Linked Connections:

Ever since public transit agencies have found their way to the Web, they inform travelers using route planning software made available on their website. These travelers also need to be informed about other modes of transport, for which they have to consult other websites, or for which they have to ask the transit agency’s server maintainer to implement new functionalities. In this demo, we introduce an affordable publishing method for transit data, called Linked Connections, that can be used for intermodal route planning, by allowing user agents to execute the route planning algorithm. We publish paged documents containing a stream of hops between transit stops sorted by departure time. Using these documents, clients are able to perform intermodal route planning in a reasonable time. Furthermore, such clients are fully in charge of the algorithm, and can now also route in different ways by integrating datasets of a user’s choice. When visiting our demo, conference attendees will be able to calculate intermodal routes by querying the Web of data using their phone’s browser, without expensive server infrastructure.
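To give an idea of the kind of work such a client takes over from the server, here is a simplified sketch of an earliest-arrival scan over a departure-time-sorted list of connections; fetching and paging the published connection documents is left out, and the Connection record is a simplification of the real data model.

```csharp
// Simplified sketch of the client-side query step: an earliest-arrival scan over
// connections sorted by departure time. Fetching and paging the published documents
// is left out, and the Connection record is a simplification of the real data model.
using System;
using System.Collections.Generic;

record Connection(string DepartureStop, string ArrivalStop, DateTime Departure, DateTime Arrival);

class EarliestArrival
{
    // connections must be sorted by Departure, as the published pages already are.
    public static DateTime? Scan(IEnumerable<Connection> connections,
                                 string from, string to, DateTime departAfter)
    {
        var earliest = new Dictionary<string, DateTime> { [from] = departAfter };

        foreach (var c in connections)
        {
            // The connection is reachable if we can be at its departure stop in time.
            if (earliest.TryGetValue(c.DepartureStop, out var reach) && reach <= c.Departure)
            {
                if (!earliest.TryGetValue(c.ArrivalStop, out var best) || c.Arrival < best)
                    earliest[c.ArrivalStop] = c.Arrival;
            }
        }
        return earliest.TryGetValue(to, out var arrival) ? arrival : (DateTime?)null;
    }
}
```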

Read the full text


August updates

Hi all,

It has been since April that we last posted a community update. Back then, we announced our migration to new servers, that we would have two students at open Summer of code 2015, and that we would start building a GTFS ourselves, as the law now permits us to do so. NMBS/SNCB will open up their data by the end of this year… but why wait if their data is already on their website?

Launch of GTFS and GTFS-RT for the SNCB

Check out http://gtfs.irail.be! It now contains data from NMBS/SNCB for 2015, including our first realtime feed for the Belgian railway company (huge news for techies)! This feed contains updates such as train delays and notifications for the entire network. You can discover what is in this GTFS-RT feed by checking out our new dashboard at http://analytics.irail.be.
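If you want to get your hands dirty with the dump itself, the sketch below lists stations from the standard stops.txt file inside a downloaded GTFS zip; the local file path is a placeholder and the comma splitting is naive (it ignores quoted fields), so treat it as a starting point.

```csharp
// Sketch: list stations from the stops.txt file inside a downloaded GTFS zip.
// The zip path is a placeholder and the comma split is naive (no quoted-field handling).
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

class GtfsStops
{
    public static IEnumerable<(string Id, string Name)> Read(string gtfsZipPath)
    {
        using var zip = ZipFile.OpenRead(gtfsZipPath);
        var entry = zip.GetEntry("stops.txt")
            ?? throw new FileNotFoundException("stops.txt not found in GTFS feed");

        using var reader = new StreamReader(entry.Open());
        var header = reader.ReadLine()!.Split(',').ToList();
        int idCol = header.IndexOf("stop_id"), nameCol = header.IndexOf("stop_name");

        var stops = new List<(string, string)>();
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            var cols = line.Split(',');
            if (cols.Length > Math.Max(idCol, nameCol))
                stops.Add((cols[idCol], cols[nameCol]));
        }
        return stops;
    }
}
```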

2/3 of the iRail team at open Summer of code 2015’s final event

The project was developed during open Summer of code 2015 by Brecht Van de Vyvere and Tim Tijssens. A report on their 3-week journey can be read over here: http://open.summerofcode.be/?s=gtfs.

Nice GTFS bla bla… but what does it mean for me?

If you are not technical, you might still be delighted with this news: you can now do intermodal route planning across the whole of Belgium with apps such as CityMapper and Ally.

Furthermore, web service creators such as the folks at Navitia.io have implemented our feeds in their web service! As a developer, you can now use their API to get route planning advice from A to B across Europe.

New apps on top of our API

RailerApp

I know, this one was also announced in the April updates, but hey… they now added support for the Apple Watch!

Apple Watch with NMBS/SNCB info

Trainfo

A new Android app hit the market. Let us know what you think!

Our servers

We are moving away from Apache and installing nginx and Varnish for cacheability. We used to have Varnish installed in front of Apache a while back. Then we decided to put HTTPS on all our domains. Now, we are going to make our servers faster again by installing Varnish. This will be done by the lovely people of Skyscrapers (who gladly gave me a spot to work when I was in Antwerp last week).

Green light for the Belgian federal Open Data strategy

When we first read the results of the 2014 Open Data Index, we said we had big expectations for 2015. We couldn’t be more right: today, the federal ministerial council has, on the recommendation of Minister De Croo and Secretary of State Francken, approved an ambitious federal open data strategy. Open by default: an important step to embed Belgium in the global digital ecosystem.

The highlights of what has been decided:

  • State owned companies are included: the same strategy applies to e.g., Proximus, bpost or the Belgian railway company SNCB/NMBS.
  • “Comply or explain”: all datasets and finalised documents have to be opened up by default. When something is kept private or is available under a non-open license, the data owner is obliged to provide an explanation.
  • The default license is CC0 or licenses with no restrictions. This license has proven in the past to be the best open data license. More information on using CC0 for open data here.
  • Data should be provided in a machine-readable fashion. This is great news for app developers, yet it’s even more exciting for developers of machines that automatically discover datasets and are able to reuse datasets without human intervention.
  • All government services will have to appoint an open data champion, who will be the contact point for the datasets within that organisation.

Open Knowledge Belgium, the board and its community members, are unanimous: we couldn’t be more excited. The strategy gives us the needed policy guidelines to build further on a more open Belgium.

“Filed Away” by Mark Crossfield is licensed under CC BY-SA 2.0.


Open & Agile Smart Cities meeting

I presented Open Knowledge Belgium at the Open & Agile Smart Cities meeting in Brussels today. The meeting was more focused on Open Data than on other kinds of Open. Here are the slides:

If you give a presentation on Open Knowledge Belgium yourself, feel free to reuse the slides.

Pieter

