Category Archives: open data

I do not want your Open Data API, I’d rather scrape your website

This is an answer to a question I get too often: why don’t you like APIs for Open Data?

What is not to like about APIs? Today, if you want to be cool, you have an API for third parties to integrate with basically anything. While there is a lot of uncertainty about what exactly an API is, we can agree it is something third party developers can use in their code to get something done. APIs exist to talk to your IoT devices, APIs exist to call other people’s software routines and libraries, and APIs exist to exchange data between different parties. And truly, I love this idea!

But when publishing Open Data, what people mean by “I want an API” can only be two things:

  1. The developer wants small fragments of the data which can be retrieved in JSON or XML (I know, you probably want JSON, but hey, government and enterprise people are reading this blog too!) instead of having to download a data dump or scrape HTML pages.
  2. The developer is lazy (in most cases this is a good property of a developer) and (s)he wants the data publisher to pay for a free service for their app.

For the first point: great that you are advocating for open web standards such as JSON! Yet what you are now advocating for is a whole new channel, which requires new funding just to set it up: new servers, new consultants programming HTTP interfaces, new ETLs, etc. Because these APIs often end up as a second, inferior channel, some thought leaders started preaching “API first!”: the website itself should use this API to show the data. The core idea is good: document all your resources and work on a decent HTTP URI strategy. However, why do you still need a separate API? Your HTML pages are also resources that are part of this HTTP URI strategy, so you could just as well return JSON for the same page based on the HTTP Accept header, or annotate your HTML pages with machine-readable snippets.
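To make the Accept header idea a bit more concrete, here is a minimal sketch in Python of one URL serving both HTML and JSON. The station record, the example.org URI and the function names are made up for illustration, and a real server would probably rely on the content negotiation of its web framework instead.

```python
import json
from html import escape

# Hypothetical resource: one record from an imagined open dataset.
STATION = {"id": "http://example.org/stations/ghent", "name": "Gent-Sint-Pieters"}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    weighted = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        quality = 1.0  # a media type without q parameter has q=1
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        weighted.append((quality, media_type))
    # sorted() is stable, so equal q-values keep their order in the header.
    ordered = sorted(weighted, key=lambda item: -item[0])
    return [media_type for quality, media_type in ordered if quality > 0]


def render(resource, accept_header):
    """Serve the same resource as JSON or HTML, depending on the Accept header."""
    html_page = "<h1>{}</h1>".format(escape(resource["name"]))
    for media_type in parse_accept(accept_header):
        if media_type == "application/json":
            return "application/json", json.dumps(resource)
        if media_type in ("text/html", "*/*"):
            return "text/html", html_page
    # Nothing we can serve was requested: fall back to the human-readable page.
    return "text/html", html_page
```

A browser sending `Accept: text/html,...` gets the page, while a script sending `Accept: application/json` gets the data from the very same URL, so no second channel has to be maintained.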

For the second point, I will need a new chapter:

Services vs. data publishing

When you are creating a production ready app, you do not want the government to host e.g., your full text search service. Can you imagine Google relying on the full text search of your government to give you search results? Of course not:

  • you cannot rely on the government to provide you a reasonable uptime for this service that will be harder and harder to keep online if your user base grows
  • there are many different parameters to a full text search query, and if you want to innovate, you need to be in control of the query execution algorithm
  • you cannot ask combined full text search queries over different governmental websites: you can only ask the servers separately, but integrating both result sets will be troublesome.

Full text search is not the only example of such a service: think of geo-queries, exposing a query language such as GraphQL, SPARQL or SQL, route planning, geocoding, or …

So, if you want to use data from the government in your production apps or your next start-up, you will want data access which allows you to replicate the entire dataset on your own machines. That is why we need the government to publish their data licensed under e.g., a CC0 license.

But what does it mean to publish your data? What is the distinction between publishing your data and a data service? At iMinds, we like to visualize this as an axis as follows:

Data publishing vs. an API

A data dump is certainly data publishing, yet there are many drawbacks to publishing a data dump: when the data changes often, having to regenerate the dump on your server every, let’s say, second is a bit much. That’s why I think a more Web-ish approach would not hurt: small documents (JSON, HTML, whatever) that link together into a big Web of knowledge. An interesting way of working towards that is a resource-oriented approach, where you first identify all the resources in your dataset using a global identifier (such as a URI). Then, you create documents of data (identified by e.g., a URL) which say something about these resources. The documents can be structured just like you would structure your website, and links direct you from one document to the other. This way, programmers can write source code that follows links (the idea of hypermedia!) to answer more difficult questions than those answered in a single document. And the documents themselves are small enough to contain rapidly changing data.
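As a rough sketch of what “source code that follows links” can look like, the Python below walks a chain of small documents to answer a question that no single document answers. The URLs, the `next` link name and the departure records are all hypothetical, and a dictionary stands in for real HTTP requests to keep the example self-contained.

```python
# Hypothetical documents, keyed by URL. A real client would fetch these
# over HTTP; a dict keeps the sketch self-contained.
DOCUMENTS = {
    "http://example.org/departures?page=1": {
        "departures": [
            {"train": "IC 1", "delay": 0},
            {"train": "IC 2", "delay": 300},
        ],
        "next": "http://example.org/departures?page=2",
    },
    "http://example.org/departures?page=2": {
        "departures": [{"train": "IC 3", "delay": 120}],
        "next": None,
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns a parsed JSON document."""
    return DOCUMENTS[url]


def delayed_trains(start_url):
    """Follow 'next' links from document to document, collecting delayed trains."""
    url = start_url
    seen = set()
    delayed = []
    while url and url not in seen:
        seen.add(url)  # guard against link cycles
        document = fetch(url)
        delayed += [d["train"] for d in document["departures"] if d["delay"] > 0]
        url = document.get("next")
    return delayed


print(delayed_trains("http://example.org/departures?page=1"))
```

The client only needs to know the first URL; everything else it discovers by following links, which is what lets the publisher keep each document small and cheap to regenerate.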

Examples

Setting the bad example

The Europeana API: well documented by Ruben Verborgh in his blog post The Lie of the API.

Setting the good example

Check out schema.org! It is a way to annotate your HTML pages with rich snippets. This makes it easier to scrape the website to generate a data dump, and the entire Web becomes more structured as websites are annotated with similar properties.
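To show why annotated pages are pleasant to scrape, here is a small Python sketch that pulls schema.org data out of a page. It assumes the page embeds its annotations as JSON-LD in a `<script type="application/ld+json">` element, which is only one of the syntaxes schema.org supports (Microdata and RDFa are others). The sample page and the helper names are invented for the example.

```python
import json
from html.parser import HTMLParser

# Hypothetical page with one schema.org annotation embedded as JSON-LD.
PAGE = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "TrainStation", "name": "Gent-Sint-Pieters"}
</script></head><body><h1>Gent-Sint-Pieters</h1></body></html>"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every JSON-LD script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def extract_jsonld(html_text):
    """Return all JSON-LD objects found in an HTML document."""
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.items


print(extract_jsonld(PAGE))
```

No site-specific CSS selectors are needed: any page annotated this way can be read with the same few lines.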

Check out Linked Data Fragments. It is a way to publish your data in fragments: clients download parts of the dataset just in time by asking many small questions, and can still answer very complicated questions that way. This is the true power of the Web: combining resources to solve difficult questions.

Check out the website of the city of Ghent, which provides rich snippets with Linked Data.

Check out Linked Connections: it is Linked Data Fragments applied to route planning.

Huh? So what are you doing with api.iRail.be? Is that not a good example?

Indeed: it is not a good example of how to publish Open Data. We also never said that the goal of api.iRail.be is to publish the data: after all, we are not the data owner. We have created a service, available for free to everyone, which enables anyone to calculate routes on the Belgian railway network and display them in various apps, such as Railer and BeTrains. We do this as a non-profit project that wants to make our transport experience in Belgium better. For the data dumps themselves, head to gtfs.irail.be. By the same logic, I applaud any initiative of open data reusers offering free services to hobby developers, but you will hear me complain when a data publisher spends its time on this instead of raising its own data quality.

Always have a seat in the train

With our API’s query logs, with a ground truth provided by train specialists in Belgium, and with user feedback, we can show you how busy your train is. The only thing we still need to realize this feature, which the Dutch have had for years, is your support! Go to spitsgids.be and get a seat on your next journey.

The start of a new era

Time flies! But we have great news:

We can now officially close the era in which iRail was an advocacy group for Open Transport Data, and instead become something we have always wanted to be: a Living Lab (and/or hackerspace) that works towards a better transport experience for travelers in Belgium. The next steps will entail a closer collaboration with players like TreinTramBus, iMinds, the European Passenger Federation and everyone who can help us reach our goal. A first thing is going to be announced next Monday, the 18th of April! We are quite psyched about this.

TreinTramBus and iRail have already been working together in the past, analyzing the query logs we have published in collaboration with iMinds’ Data Science Lab (my current employer). The research that has been carried out was presented at the WWW2016 conference in Canada, during the USEWOD workshop. The full paper can be downloaded here.

January updates

We are only two weeks into 2016 and it seems NMBS/SNCB is keeping its promise: a presentation announced that they now have an internal innovation project on Open Data. More interestingly for us, the presentation, given yesterday at the Digital Agenda Belgium event, acknowledged that they were wrong to send Yeri a cease and desist letter in 2010. While people on Twitter responded moderately (“How long is this going to take?”, “How about real-time data?”), we believe this is a huge step forward: it is the first time NMBS/SNCB acknowledges we were on the right side of history. As far as iRail is concerned: we would love to start collaborating with a clean slate :)

It doesn’t stop there: today, we are featured in De Standaard with a piece on how NMBS should make better use of their data:

http://standaard.be/cnt/dmf20160112_02064462

Train offer vs. Travel Intentions on average on a Thursday and on December 24th

Indeed: the query log files of iRail can be used as an indication of travel intention. You can now also reuse the log files yourself:

NMBS/SNCB announces data sharing

Today, the Belgian railway company announces a data sharing program for third party developers. They will start sharing the planned schedules (“static data”). We are cautiously excited that this is the first step towards a real open data policy.

You can request your own 1-on-1 contract via http://www.belgianrail.be/nl/klantendienst/infodiensten-reistools/public-data.aspx. We hope to make our own version, linked to our identifiers, available through http://gtfs.irail.be soon!

[More updates later today]

What does this mean for the end-users?

Companies like Google, Nokia, Microsoft and others can now start reusing the data if they negotiate a one on one contract. Starting today, you will see route planners from third parties work with NMBS/SNCB! Android and Google Maps users will be very excited as they now will also see trains suggested to their destination instead of only buses when selecting public transit.

Why isn’t this open data yet?

Open Data means that there’s no discrimination in who can reuse the data and that there are no restrictions on how it can be used and redistributed. For now, you can access the data after signing a 1-on-1 contract. We think it’s a good thing NMBS/SNCB is testing the waters before the law that will oblige NMBS/SNCB to do real open data comes into force in 2016. We look forward to working together, and iRail is also in the process of requesting a license. However, we want to be able to republish the data once we have added our own data to the dataset, which isn’t allowed by default right now.

Why is iRail (cautiously) happy?

We once were forced to stop building applications using the data of NMBS. The original posts from 2010 can still be found in our archives. Today, the first steps towards Open Data have been taken and we are sure they will not regret it. We will slowly migrate our servers to make use of this official dataset. We look forward to seeing you build awesome new things with this high quality data!

Contact Pieter: +32 486 74 71 22

– The iRail Team

August updates

Hi all,

We haven’t done a community update since April. Back then, we announced our migration to new servers, that we would have 2 students at open Summer of code 2015, and that we would start building a GTFS ourselves, as the law would now permit us to do so. NMBS/SNCB will open up their data by the end of this year… but why wait if their data is already on their website?

Launch of GTFS and GTFS-RT for the SNCB

Check out http://gtfs.irail.be! It now contains data from NMBS/SNCB for 2015, including our first realtime feed for the Belgian railway company (huge news for techies)! This feed contains updates such as train delays and notifications for the entire network. You can discover what’s in this GTFS-RT feed by checking out our new dashboard at http://analytics.irail.be.
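For the techies wondering what is inside such a feed: a static GTFS feed is a zip archive of CSV text files, one of which is stops.txt. The Python sketch below reads that file using only the standard library. To stay self-contained it builds a tiny in-memory archive with made-up stop IDs and approximate coordinates instead of downloading anything, so treat the sample data as an assumption rather than as the contents of our feed.

```python
import csv
import io
import zipfile


def read_stops(gtfs_zip):
    """Return the rows of stops.txt from a GTFS archive as dictionaries."""
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open("stops.txt") as raw:
            # utf-8-sig also strips a byte order mark if the file has one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def sample_feed():
    """Build a tiny in-memory feed with made-up stops, standing in for a download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Gent-Sint-Pieters,51.0357,3.7108\n"
            "S2,Brussel-Zuid,50.8357,4.3365\n",
        )
    buffer.seek(0)
    return buffer


for stop in read_stops(sample_feed()):
    print(stop["stop_id"], stop["stop_name"], stop["stop_lat"], stop["stop_lon"])
```

`read_stops` accepts a path as well as a file-like object, so the same function should work on a feed you have saved to disk.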

2/3d of the iRail team at open summer of code 2015's final event

The project has been developed during open Summer of code 2015 by Brecht Van de Vyvere and Tim Tijssens. A report on their 3 week journey can be read over here: http://open.summerofcode.be/?s=gtfs.

Nice GTFS bla bla… but what does it mean for me?

If you are not technical, you might still be delighted by this news: you can now do intermodal route planning across the whole of Belgium with apps such as CityMapper and Ally.

Furthermore, web service creators such as the people at Navitia.io have implemented our feeds in their web service! As a developer, you can now use their API to get route planning advice from A to B across Europe.

New apps on top of our API

RailerApp

I know, this one was also announced in the April updates, but hey… they now added support for the Apple Watch!

apple watch with NMBS/SNCB info

Trainfo

A new Android app hit the market. Let us know what you think!

Our servers

We are moving away from Apache and are now installing nginx and Varnish for cacheability. We used to have Varnish installed in front of Apache a while back. Then we decided to put HTTPS on all our domains. Now, we’re going to make our servers faster again by installing Varnish. This will be done by the lovely people of SkyScrapers (who gladly gave me a spot to work when I was in Antwerp last week).

Green light for the Belgian federal Open Data strategy

When we first read the results of the 2014 Open Data Index, we said we had big expectations for 2015. We couldn’t be more right: today, the federal ministerial council has, by recommendation of Minister De Croo and Secretary of State Francken, approved an ambitious federal open data strategy. Open by default: an important step to embed Belgium into the digital global ecosystem.
The highlights of what has been decided:

  • State owned companies are included: the same strategy applies to e.g., Proximus, bpost or the Belgian railway company SNCB/NMBS.
  • “Comply or explain”: all datasets and finalised documents have to be opened up by default. When something is kept private or is available under a non-open license, the data owner is obliged to provide an explanation.
  • The default license is CC0 or licenses with no restrictions. This license has proven in the past to be the best open data license. More information on using CC0 for open data here.
  • Data should be provided in a machine-readable fashion. This is great news for app developers, yet it’s even more exciting for developers of machines that automatically discover datasets and are able to reuse datasets without human intervention.
  • All government services will have to appoint an open data champion, which is the contact point for the datasets within that organisation.

Open Knowledge Belgium, the board and its community members, are unanimous: we couldn’t be more excited. The strategy gives us the needed policy guidelines to build further on a more open Belgium.

“Filed Away” by Mark Crossfield is licensed under CC BY-SA 2.0

A law for Open Transport Data in Belgium

Today, the minister of the Digital Agenda, Alexander De Croo, announced that a law will be created before the summer that will oblige NMBS/SNCB to open up their data.

UPDATE: More info on the PSI directive and this tweet can be found here:

http://decroo.belgium.be/nl/dienstregeling-nmbs-moet-beschikbaar-worden-als-open-data

April updates

RailTime is gone! Many users perceived it as an app that offered a better user experience than the official NMBS/SNCB app (an app bought from a German service provider and adapted for Belgium). The app was shut down in an effort by NMBS/SNCB to streamline its communication channels: “it would be ridiculous to have to maintain 2 apps”.

Best alternatives for RailTime: 2 iRail based apps are mentioned: BeTrains and RailerApp

On this occasion, De Morgen wrote an article. The article not only mentions our efforts, but also recommends two alternatives for RailTime to its readers: BeTrains (an Android app built on top of the iRail API) and RailerApp (an iPhone app built on top of the iRail API). We couldn’t be more proud! In any case, it caused Railer installs to skyrocket:

Railer app installs skyrocketing after RailTime disappearing (installs per day)

De Morgen concluded that we need to strive for Open Data, something we have been asking for since 2010. With the press agreeing, minister Galant agreeing (cf. her 2015 policy note, in which she mentions open data) and our API now handling an estimated average of 300.000 requests per day, we hope that NMBS/SNCB will take steps towards opening up the data we value. In any case, we will be keeping a close eye on our mailbox.

With RailTime disappearing, we got a lot of questions about whether the API would keep working. We answered these questions with a clear “yes”. Yet, we hadn’t taken into account that the API endpoint providing information about specific trains worked on top of the RailTime mobile website instead of the NMBS/SNCB one (which used to be unable to provide this kind of information). Luckily, @brechtvdv saved the day and wrote a new scraper for code that had been left unchanged since 2010. Thank you Brecht!

And fixing the vehicle info bug wasn’t the only thing @brechtvdv did. He also started working on a plug-in for The DataTank that automatically adds GTFS files. This will help start-ups and developers get started with their own private API on top of the data of De Lijn, MIVB/STIB and TEC. The code isn’t perfect yet, so we hope to receive feedback, issues and pull requests from everyone!

One day server load on http://irail.be - monitored on the new servers by SkyScrapers

In the meantime, we have also done a server migration together with SkyScrapers (they have been part of our community for 3 years now and have provided managed hosting ever since). We are now running the latest version of Ubuntu, which gives us PHP 5.4+! This is a huge relief for a lot of our coders: PHP 5.4 introduces nicer syntax which was previously unsupported. Furthermore, a lot of Composer packages dropped support for PHP 5.3 a while ago. With our shiny new servers in place, I have defined a couple of issues which will make contributing to the API a lot easier and less painful.

Gitter

Gitter.im is now a service we use for our real-time collaborations on the entire iRail project

 

In order to work together more efficiently as a community, we’re now on Gitter. It’s an instant messaging tool which integrates with GitHub. We hope to talk to you there! Oh, and if you really want to use IRC instead, there is an IRC bridge available at https://irc.gitter.im/

In order to fix all these issues, we’re looking for funding to pay for a team during open Summer of code 2015. Summer of code started out as iRail Summer of code back in 2011. It’s where we provide students with the training to take on open source and/or open data projects. This year we would like to have an iRail team. A team costs €6000. Does anyone have an idea how to raise such funds? If you’re willing to co-fund these students, please do contact me! pieter@iRail.be