Jeroen's blog

Wikibase DataModel 1.0

I’m happy to announce the 1.0 release of Wikibase DataModel. Wikibase DataModel is the canonical PHP implementation of the Data Model at the heart of the Wikibase software.

This is a big release which has been some time in the making, even though many additions have been split off and included in previous releases. The highlights are as follows:

Removal of the (de)serialization code

The entities and value objects in Wikibase DataModel used to have toArray and newFromArray methods. This caused several problems, such as a pile of static code, a dependency on configuration (which was done via global state), and making support for an arbitrary array format part of the responsibilities of the objects. These methods have been removed entirely. (De)serialization is now done via the dedicated serialization components (one for the public format and one for the internal format), which were released some time back.

In earlier versions of Wikibase DataModel, the Item and Property classes stored their array representation internally, rather than holding fields for the value objects they contain. While this was not visible via the getters and setters, which dealt with those value objects, it was exposed in the constructors. As of DataModel 1.0, the constructors take the value objects rather than the array representation.
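To make the constructor change concrete, here is a minimal sketch in Python rather than PHP, assuming a drastically simplified model. The names `ItemId`, `Fingerprint`, `Item`, `ItemSerializer` and the array layout are hypothetical and are not the actual Wikibase DataModel API. The point it shows: the entity is built from value objects and knows nothing about arrays, while a separate serializer owns the array format.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ItemId:
    """Hypothetical value object for an item identifier such as "Q42"."""
    serialization: str


@dataclass
class Fingerprint:
    """Hypothetical value object holding labels keyed by language code."""
    labels: Dict[str, str] = field(default_factory=dict)


class Item:
    """The constructor takes value objects, not an array representation."""

    def __init__(self, item_id: ItemId, fingerprint: Fingerprint) -> None:
        self._id = item_id
        self._fingerprint = fingerprint

    def get_id(self) -> ItemId:
        return self._id

    def get_fingerprint(self) -> Fingerprint:
        return self._fingerprint


class ItemSerializer:
    """Knowledge of the (hypothetical) array format lives here, outside the entity."""

    def serialize(self, item: Item) -> dict:
        return {
            "id": item.get_id().serialization,
            "labels": dict(item.get_fingerprint().labels),
        }


item = Item(ItemId("Q42"), Fingerprint({"en": "Douglas Adams"}))
serialized = ItemSerializer().serialize(item)
```

Swapping in a different array format then means writing another serializer, without touching `Item`.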

Deprecation of Entity

Type hinting against the Entity class has been deprecated. This was announced on the Wikidata tech list some time back. While the class is still there, most methods it defines have been deprecated, and some have been removed. The EntityDocument interface, introduced in version 0.8.2, can be used instead. As part of this cleanup, Item now only accepts statements, rather than all claims as it wrongly did before.
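A minimal Python sketch of both ideas, under stated assumptions: `EntityDocument` is modelled as an abstract base class, and `Statement` is modelled as a subclass of `Claim` that adds a rank. All names and methods here are hypothetical stand-ins for the PHP classes, not their real signatures. Calling code type hints against the interface, and `Item` rejects plain claims.

```python
from abc import ABC, abstractmethod
from typing import List


class EntityDocument(ABC):
    """Hypothetical stand-in for the EntityDocument interface."""

    @abstractmethod
    def get_type(self) -> str: ...


class Claim:
    """A claim about a property (snaks and qualifiers omitted in this sketch)."""

    def __init__(self, property_id: str) -> None:
        self.property_id = property_id


class Statement(Claim):
    """Assumed here: a statement is a claim that also carries a rank."""

    def __init__(self, property_id: str, rank: str = "normal") -> None:
        super().__init__(property_id)
        self.rank = rank


class Item(EntityDocument):
    def __init__(self) -> None:
        self._statements: List[Statement] = []

    def get_type(self) -> str:
        return "item"

    def add_statement(self, statement: Statement) -> None:
        # Plain claims are rejected: an item only holds full statements.
        if not isinstance(statement, Statement):
            raise TypeError("Item only accepts Statement instances")
        self._statements.append(statement)

    def get_statements(self) -> List[Statement]:
        return list(self._statements)


def describe(entity: EntityDocument) -> str:
    # Depends on the interface, not on a concrete Entity base class.
    return entity.get_type()
```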


Many more changes and additions were made. You can view the full list of changes affecting users of the component in the release notes.

Posted in Programming, Software

SoCraTes 2014


Last week I attended SoCraTes 2014, the 4th International Software Craftsmanship and Testing Conference in Germany.

Since this was the first time I went there, I did not really know what to expect, and was slightly apprehensive. It turns out there was no need for that: the conference was the most fun and interesting one I’ve been to recently, and definitely the most motivating as well.

What made it so great? Probably the nicest aspect of the conference was the people attending. Basically everyone there was passionate about what they were doing, interested in learning more, open-minded, and respectful of others. This, combined with a schedule composed purely of sessions that people proposed at the start of each day, made the whole atmosphere very different from that of your typical commercial conference. Apart from attending sessions on various topics, I also did some pair programming and played my first set of beach volleyball games at a software conference.

I’m definitely going back next year!

Posted in Events, Programming

Doctrine QueryBuilder table alias

The Doctrine project, best known for its Object Relational Mapper, also includes a database abstraction layer, used by the ORM. This abstraction layer is called DBAL, for DataBase Abstraction Layer.

Soon after I started using DBAL in some Wikibase components, I got annoyed at how single-table selects needed to be created. The QueryBuilder has a “from” method, in which one needs to specify the table name. The annoying bit was that one also had to specify a table alias, and then use this alias in the select and where calls as well.


With an hour of free time and several commits, I made the alias parameter optional, so the same query can now be built without specifying an alias.
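The idea behind the change can be illustrated with a toy single-table query builder, sketched in Python. This is not Doctrine's implementation or API: the class, its methods and the generated SQL format are assumptions for illustration, and the method is named `from_` only because `from` is a Python keyword. With an alias, every column reference has to repeat it; without one, columns are referenced by their bare names.

```python
from typing import List, Optional


class QueryBuilder:
    """Hypothetical single-table SELECT builder; not Doctrine DBAL's code."""

    def __init__(self) -> None:
        self._columns: List[str] = []
        self._table: Optional[str] = None
        self._alias: Optional[str] = None
        self._conditions: List[str] = []

    def select(self, *columns: str) -> "QueryBuilder":
        self._columns.extend(columns)
        return self

    def from_(self, table: str, alias: Optional[str] = None) -> "QueryBuilder":
        # The alias is optional: leaving it out means no prefixes are needed.
        self._table = table
        self._alias = alias
        return self

    def where(self, condition: str) -> "QueryBuilder":
        self._conditions.append(condition)
        return self

    def get_sql(self) -> str:
        if self._table is None:
            raise ValueError("from_() must be called before get_sql()")
        source = self._table if self._alias is None else f"{self._table} {self._alias}"
        sql = f"SELECT {', '.join(self._columns)} FROM {source}"
        if self._conditions:
            sql += " WHERE " + " AND ".join(self._conditions)
        return sql


# Before: the alias has to be declared and then repeated everywhere.
with_alias = QueryBuilder().select("u.id", "u.name").from_("users", "u").where("u.id = ?")
# After: for a single-table select, the alias can simply be left out.
without_alias = QueryBuilder().select("id", "name").from_("users").where("id = ?")
```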


This change will be part of the upcoming Doctrine DBAL 2.5.

Posted in Programming, Software

Semantic MediaWiki 2.0 RC3

I am happy to announce that the third release candidate for Semantic MediaWiki 2.0 is now available.

Semantic MediaWiki 2.0 is the next big release, bringing new features and many enhancements, most notably vastly improved SPARQL store support, including a brand new connector for Jena Fuseki.

The target date for the actual 2.0 release is August 3rd. This release candidate is meant to gather feedback and to give you a peek at 2.0 already. If you find any issues, please report them on our issue tracker.

Upgrading instructions

If you are using SMW via Composer, update the version in your composer.json to “~2.0@rc” and run “composer update”. If you were running dev versions of the 1.9 series using “~1.9@dev” or similar, switch to “~2.0@dev”. Note that several extensions to SMW, such as Semantic Maps and Semantic Result Formats, do not yet have a stable release that can be installed together with SMW 2.x. If you are also running those, you will likely have to switch them to a development version.
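What “~2.0@rc” accepts can be shown with a deliberately simplified Python model of two Composer features: the tilde operator (“~2.0” means at least 2.0 and below 3.0, “~2.0.1” means at least 2.0.1 and below 2.1) and the “@rc” stability flag, which lets release candidates through even though the default minimum stability is “stable”. The functions are hypothetical; Composer's real constraint handling (version normalization, dev branches, and so on) is considerably more involved.

```python
import re
from typing import Tuple

# Composer's stability levels, from least to most stable.
STABILITY_ORDER = ["dev", "alpha", "beta", "rc", "stable"]


def parse_version(version: str) -> Tuple[Tuple[int, ...], str]:
    """Split e.g. "2.0.0-RC3" into ((2, 0, 0), "rc"). Deliberately simplified."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:-([A-Za-z]+)\d*)?", version)
    if match is None:
        raise ValueError(f"unsupported version string: {version}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    stability = (match.group(2) or "stable").lower()
    return numbers, stability


def satisfies_tilde(version: str, constraint: str) -> bool:
    """Check a version against a constraint like "~2.0@rc" (simplified model)."""
    base, _, flag = constraint.lstrip("~").partition("@")
    minimum_stability = flag.lower() if flag else "stable"
    lower = tuple(int(part) for part in base.split("."))
    if len(lower) < 2:
        raise ValueError("this sketch expects at least major.minor, e.g. ~2.0")
    # ~X.Y allows anything below (X+1).0; ~X.Y.Z allows anything below X.(Y+1).
    upper = lower[:-2] + (lower[-2] + 1,)
    numbers, stability = parse_version(version)
    # Pad short versions with zeros so "2" compares like "2.0".
    padded = numbers + (0,) * max(0, len(lower) - len(numbers))
    # Pre-release suffixes are ignored for the range check, so 2.0.0-RC3 is in ~2.0.
    in_range = lower <= padded and padded[: len(upper)] < upper
    stable_enough = STABILITY_ORDER.index(stability) >= STABILITY_ORDER.index(minimum_stability)
    return in_range and stable_enough
```

In this model, `satisfies_tilde("2.0.0-RC3", "~2.0@rc")` holds, while the same release candidate is rejected by a plain “~2.0”, which is why the flag is needed during the RC phase.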

You can also download the SMW 2.0 RC3 tarball.

More detailed upgrading instructions will be made available for the 2.0 release.

Posted in News, Software

Some fun with iterators

Sometimes you need to loop over a big pile of stuff and execute an action for each item. In the Wikibase software, this for instance occurs when we want to rebuild or refresh a part of our secondary persistence, or when we want to create an export of sorts.

Historically, we’ve created CLI scripts that built and executed calls to the database to get the data. As we’ve been paying more attention to separating concerns, such scripts have evolved into fetching the data via some persistence service. That avoids binding to a specific implementation to some extent. However, it still leads to the script knowing about the particular persistence service. In other words, while it might not know the database layout, or that MySQL is used, it still knows about the persistence service via the signature of its interface. And it’s entirely possible that you want to use the code with a different source of the things being iterated over, one for which the persistence interface is not suitable.

All the code doing the iteration and invoking the task needs to know is that there is a collection of a particular type it can loop over. This is what the Iterator interface is for. If you have the iteration code use an Iterator, you can implement and test most of your code without having the fetching part in place: you can simply feed in an ArrayIterator. This also demonstrates that the script no longer knows whether the data is already there or whether (part of) it still needs to be retrieved.
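The same decoupling, sketched in Python: the iteration protocol plays the role of PHP's Iterator interface, and `iter()` over a list stands in for an ArrayIterator. The function names are hypothetical. The task runner only loops; it cannot tell whether its items come from memory or from a lazily queried store.

```python
from typing import Callable, Iterable, Iterator, List


def run_for_each(items: Iterable[str], task: Callable[[str], None]) -> int:
    """Apply task to every item and return how many were processed."""
    count = 0
    for item in items:
        task(item)
        count += 1
    return count


def ids_from_database() -> Iterator[str]:
    """Hypothetical production source; real code would fetch rows lazily here."""
    yield from ("Q1", "Q2", "Q3")


processed: List[str] = []
# In a test, an in-memory iterator replaces the database-backed source.
run_for_each(iter(["Q42", "Q64"]), processed.append)
```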


When iterating over a big or expensive-to-retrieve set of data, one often wants to apply batching. Having to create a batching iterator every time is annoying though, and putting the iteration logic together with the actual retrieval code is not very nice. After having done this a few times, I realized that part could be abstracted out, and created a tiny new library: BatchingIterator. You can read how to use it in its documentation.
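The general technique can be sketched as a Python generator. This is not the BatchingIterator library's API: the `iterate_in_batches` name and the `fetch_batch(offset, limit)` callback are assumptions for illustration. Items are yielded one at a time, while retrieval happens `batch_size` items at a time, and a short batch signals the end of the data.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iterate_in_batches(
    fetch_batch: Callable[[int, int], List[T]], batch_size: int = 100
) -> Iterator[T]:
    """Yield items one at a time while fetching them batch_size at a time.

    fetch_batch(offset, limit) is a hypothetical retrieval callback that returns
    at most `limit` items starting at `offset`.
    """
    # Checked when iteration starts, because this function is a generator.
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        yield from batch
        # A short (or empty) batch means the source has run out of data.
        if len(batch) < batch_size:
            return
        offset += len(batch)
```

Because the result is an ordinary iterator, it can be passed straight to code that only expects something to loop over, keeping the retrieval logic out of the loop body.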

Posted in Programming

Component design

This week I gave a presentation titled “Component design: Getting cohesion and coupling right” at Wikimedia Deutschland.


Components are a level of organization, in between classes and layers. They are an
important mechanism in avoiding monolithic designs and big balls of mud. While
everyone can recognize a component when they see one, many people are unaware of
the basic principles that guide good component design.

This presentation is aimed at developers. It is suitable both for people new to the field
and those with many years of experience.

The topics covered include:

* What is a component?
* Which things go together?
* How do components relate to each other?
* How are common problems avoided?

You can view the slides, or look at the source.


Posted in Programming

Empower people around the world with microloans

In 2011, I created an account on a website that facilitates microloans.

The basic idea is that you lend a small amount, typically 25 USD. The loans are made to poor people who want to borrow money so they can invest in income sources, for instance new farm animals, stock for their shop, or machinery. That’s a lot more effective than giving someone money that they use to buy food, without attempting to change their situation. What’s more, since these are loans, you get your money back most of the time. Once you get it back, you can make a new loan.

My first loans were made on October 15, when I put in 100 USD. Not long after, I figured I had quite a bit of money in my bank account that could just as well be put to good use, so I put in another 6000 USD. Since then I’ve made 1034 loans, to people in 56 different countries, lending a total of 26000 USD.


I’ve lost 136.48 USD over all these loans, which I will never get back. In addition, the money has lost value due to inflation. Still, that is several hundred dollars that helped over a thousand people.

If that’s not enough to convince you that you should do the same, consider this entirely selfish reasoning. It’s good to split up your wealth, so you don’t end up with problems if something crashes. You can have money in different currencies, though that’s all still fiat. You can also put things into precious metals, though that comes with its own hassle. You can put money in stocks and whatnot, though that requires knowledge and can be quite risky. Then there are currencies, though that is even more risky. You can have money at multiple banks, in case one fails. Nowadays interest rates at most banks are below the inflation rate, so you lose money while supporting the bankers, and you risk losing it all when they fuck up. While a bank can fail, I cannot think of many events that would make me lose the money in these microloans, since it’s incredibly distributed. So while there is some loss, I’m more confident that I will still own (most of) this money in a year than I am about the money in any of my bank accounts.

So, go ahead, create an account, and start lending.


Posted in Uncategorized

StarCraft arcade

These are replays of some of my favourite StarCraft 2 arcade games.


Hero battle, 3v3 (or [1-3]v[1-3]), ~30 min, ~90 APM

Mineralz and the THING

Survival (against human player), 7v1ish, ~40min

Desert Strike HotS

Tug of war, 3v3 (or [1-3]v[1-3]), ~30 min, ~10 APM

Tya’s Zerg Defence

Survival, 3 player, ~30 min

Star Battle

Hero battle, 6v6 (or [1-6]v[1-6]), ~40 min, ~90 APM

Posted in Gaming

Wikibase and Doctrine DBAL


When I started writing this blog post, I realized an introduction to the query components was due first. You can find it in my last blog post: The Wikidata phase3 software components.

In that post I described how the SQLStore uses a database abstraction layer to avoid binding itself to a particular relational database. The Wikibase software as a whole already uses a database abstraction layer, namely the one MediaWiki provides. Even though the main Wikibase applications, Wikibase Client and Wikibase Repo, depend on MediaWiki, the SQLStore does not. That rules out direct usage of the MediaWiki database abstraction layer.

Nevertheless, the MediaWiki database abstraction layer has served us reasonably well, and the team is familiar with it. Furthermore, using it does not introduce a new dependency that WMF operations might get mad about. So we created a new set of interfaces for database interaction, much in line with those of the MediaWiki database abstraction layer, though without a bunch of the design issues the latter suffers from. We then created thin implementations of these interfaces that delegated to the MediaWiki database abstraction layer. This inversion of control made it possible to use the MW abstraction layer in the SQLStore without having the SQLStore know about MediaWiki.

These interfaces were put in a new component called Wikibase Database. I covered the creation of Wikibase Database in an earlier blog post.
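The inversion of control described above can be sketched in Python as follows. Every name is hypothetical: `QueryInterface` stands in for the store-owned interfaces, `FakeMediaWikiDatabase` for the MediaWiki layer (its `select()` signature is an assumption, not MediaWiki's real one), and `EntityStore` plays the role of the SQLStore. The store depends only on the interface; only the wiring code knows about both sides.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]


class QueryInterface(ABC):
    """Interface owned by the store side; no MediaWiki types appear in it."""

    @abstractmethod
    def select_rows(self, table: str, conditions: Dict[str, Any]) -> List[Row]: ...


class FakeMediaWikiDatabase:
    """Stand-in for the wrapped MediaWiki layer; this select() signature is assumed."""

    def __init__(self, rows: Dict[str, List[Row]]) -> None:
        self._rows = rows

    def select(self, table: str, fields: str, conditions: Dict[str, Any]) -> List[Row]:
        # `fields` is ignored in this fake; every matching row is returned whole.
        return [
            row for row in self._rows.get(table, [])
            if all(row.get(key) == value for key, value in conditions.items())
        ]


class MediaWikiQueryAdapter(QueryInterface):
    """Thin implementation of the interface that delegates to the wrapped layer."""

    def __init__(self, database: FakeMediaWikiDatabase) -> None:
        self._database = database

    def select_rows(self, table: str, conditions: Dict[str, Any]) -> List[Row]:
        return self._database.select(table, "*", conditions)


class EntityStore:
    """Plays the role of the SQLStore: it only ever sees QueryInterface."""

    def __init__(self, queries: QueryInterface) -> None:
        self._queries = queries

    def find_entity_ids(self, property_id: str) -> List[str]:
        rows = self._queries.select_rows("claims", {"property": property_id})
        return [row["entity"] for row in rows]


# Wiring happens at the application edge, the only place that knows both sides.
store = EntityStore(MediaWikiQueryAdapter(FakeMediaWikiDatabase({
    "claims": [
        {"entity": "Q1", "property": "P31"},
        {"entity": "Q2", "property": "P17"},
    ],
})))
```

Replacing the MediaWiki-backed adapter with another implementation of `QueryInterface` requires no change to `EntityStore`.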

The SQLStore does not have a fully fixed schema. One can configure which types of data are supported, and generally each type of data has its own table. Furthermore, one can provide additional types of data that the core SQLStore does not support. Hence building the tables with manually written SQL in .sql files, as done in MediaWiki, does not seem convenient at all. Semantic MediaWiki deals with this problem by dynamically generating a declarative definition of the schema and then translating that into SQL. We went with the same approach for the SQLStore, though we took a lot of care not to repeat the spaghetti implementation approach of SMW.

This means we needed some way to represent tables, columns, indexes, etc. in PHP. Furthermore, we’d need to be able to turn this PHP representation into SQL compatible with the database in use. Unfortunately, the MediaWiki database abstraction layer can do neither of those.

What we ended up doing was creating schema representation objects and adding additional interfaces to Wikibase Database, and then creating implementations of those for MySQL and SQLite. Needless to say, writing such SQL generation code is rather tedious and quite hard to test. We knew that beforehand, and spent quite some time looking around for existing solutions we’d be able to delegate to. Unfortunately, none were found.
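A minimal Python sketch of the general approach: tables are described declaratively, one table per configured data type, and a small generator turns the description into SQLite DDL. The `Column`/`Table` classes, the abstract type names, the type mapping and the table naming are all assumptions of this sketch, not the Wikibase Database or SQLStore code. Running the output against an in-memory SQLite database checks that the generated SQL is valid.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Column:
    name: str
    type_name: str  # abstract type such as "integer", "string" or "float"
    nullable: bool = False


@dataclass
class Table:
    name: str
    columns: List[Column]
    indexes: List[List[str]] = field(default_factory=list)


# Mapping from abstract column types to SQLite types (an assumption of this sketch).
SQLITE_TYPES = {"integer": "INTEGER", "string": "TEXT", "float": "REAL"}


def sqlite_ddl(table: Table) -> List[str]:
    """Translate one declarative table definition into SQLite statements."""
    columns = ", ".join(
        f"{c.name} {SQLITE_TYPES[c.type_name]}{'' if c.nullable else ' NOT NULL'}"
        for c in table.columns
    )
    statements = [f"CREATE TABLE {table.name} ({columns})"]
    for i, index_columns in enumerate(table.indexes):
        statements.append(
            f"CREATE INDEX {table.name}_idx_{i} ON {table.name} ({', '.join(index_columns)})"
        )
    return statements


def table_for_data_type(data_type: str, value_type: str) -> Table:
    """Each configured data type gets its own (hypothetically named) table."""
    return Table(
        name=f"values_{data_type}",
        columns=[
            Column("entity_id", "string"),
            Column("property_id", "string"),
            Column("value", value_type),
        ],
        indexes=[["entity_id"], ["property_id", "value"]],
    )


connection = sqlite3.connect(":memory:")
for table in [table_for_data_type("quantity", "float"), table_for_data_type("string", "string")]:
    for statement in sqlite_ddl(table):
        connection.execute(statement)
```

Supporting MySQL as well would mean a second generator reading the same `Table` objects, which is exactly the tedious, hard-to-test part the post mentions.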

Several months later I somehow ended up on the architecture page of Doctrine DBAL. After reading through the basic introduction there, I thought “oh wow, this sounds very similar to what we did in Wikibase Database”. So I read through the more detailed docs, and slowly came to the realization that DBAL is exactly what we had been looking for before. Making the facepalm even worse, I had actually read through the basic Doctrine docs and quickly looked at its source when doing the initial research. Based on that I had concluded that it came with all this unneeded ORM stuff, not realizing Doctrine itself was built on an independent DBAL.


I’ve now migrated the SQLStore to use Doctrine DBAL rather than Wikibase Database. The interfaces of the two abstraction layers are extremely similar, and the structure of the schema definition objects is essentially identical.


Of course Doctrine DBAL has many interfaces that Wikibase Database does not have. For instance, it can compare two schemas with each other, and turn the diff into SQL queries. That’s something that we will very likely have use for at a later point. It also has a nice query builder, support for more databases and a much more solid implementation overall. And perhaps most important of all, it does not need to be maintained by us. That’s one big project liability traded for another liability orders of magnitude less significant :)
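To give an idea of what comparing two schemas and turning the diff into SQL involves, here is a heavily reduced Python sketch. It is not Doctrine's schema Comparator: schemas are plain dictionaries of table name to column types, and only added tables, added columns and dropped tables are handled (no column removals, type changes or indexes). The migration is applied to an in-memory SQLite database.

```python
import sqlite3
from typing import Dict, List

# A schema here is simply {table name: {column name: SQL type}} (simplification).
Schema = Dict[str, Dict[str, str]]


def diff_to_sql(old: Schema, new: Schema) -> List[str]:
    """Return statements that migrate `old` to `new` (additions and drops only)."""
    statements: List[str] = []
    for table, columns in new.items():
        if table not in old:
            column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
            statements.append(f"CREATE TABLE {table} ({column_sql})")
            continue
        for name, sql_type in columns.items():
            if name not in old[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
    for table in old:
        if table not in new:
            statements.append(f"DROP TABLE {table}")
    return statements


old = {"items": {"id": "TEXT"}, "legacy": {"x": "INTEGER"}}
new = {"items": {"id": "TEXT", "label": "TEXT"}, "values_time": {"entity_id": "TEXT", "value": "TEXT"}}

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id TEXT)")
connection.execute("CREATE TABLE legacy (x INTEGER)")
for statement in diff_to_sql(old, new):
    connection.execute(statement)
```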

Moving the SQLStore from Wikibase Database to Doctrine DBAL removed about 1000 lines of code and made Wikibase Database itself (8000 lines of code) obsolete. Granted, this refactoring also tackled a design problem that caused unneeded complexity, which contributed to those 1000 lines. It’s always great when you can remove so much code while retaining all functionality. In fact, some TODOs got tackled along the way and some bugs got fixed. Oh and, we can now run the integration tests on an in-memory SQLite database \o/

So far my experience with Doctrine DBAL has been almost entirely positive. Many thanks to the authors of this software.

Posted in Programming

The Wikidata phase3 software components

Work on the long-awaited query functionality for the Wikidata project has been going on for several months. Since queries are a feature set completely disjoint from the existing functionality, we decided to put them into a new component of the Wikibase software. This component is called Wikibase Query. WB Query is a MediaWiki extension that depends on the WB Repo extension. It’s responsible for providing query functionality bindings in the MediaWiki UI and web API.

The actual execution of queries is done via a component WB Query depends on: Wikibase QueryEngine. WB QueryEngine does not depend on MediaWiki and simply defines a high level interface to storing and querying Entity data. Entities are stored using ->insertEntity(Entity $entity) and queries are run with ->getMatchingEntities(a query object). The queries are defined using the Ask language, which I blogged about before.
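The shape of that interface can be sketched in Python, under clear simplifications: `insert_entity` and `get_matching_entities` mirror `->insertEntity` and `->getMatchingEntities`, but the entity model and the `PropertyValueQuery` object are hypothetical stand-ins (real queries are Ask language objects), and the in-memory engine is a toy, not the SQLStore. The value of the interface is that callers never learn which implementation answers them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    entity_id: str
    claims: Dict[str, str]  # property id -> value, heavily simplified


@dataclass(frozen=True)
class PropertyValueQuery:
    """Hypothetical stand-in for an Ask query: "property P has value V"."""
    property_id: str
    value: str


class QueryEngine(ABC):
    @abstractmethod
    def insert_entity(self, entity: Entity) -> None: ...

    @abstractmethod
    def get_matching_entities(self, query: PropertyValueQuery) -> List[str]: ...


class InMemoryQueryEngine(QueryEngine):
    """Toy implementation; an SQL-backed store would implement the same interface."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}

    def insert_entity(self, entity: Entity) -> None:
        self._entities[entity.entity_id] = entity

    def get_matching_entities(self, query: PropertyValueQuery) -> List[str]:
        return sorted(
            entity_id
            for entity_id, entity in self._entities.items()
            if entity.claims.get(query.property_id) == query.value
        )
```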

The git repo that contains WB QueryEngine also comes with our first, and currently only, implementation of the WB QueryEngine interfaces. This implementation is called SQLStore and, as its name suggests, it works with relational databases. If you are familiar with Semantic MediaWiki, especially its internals, this should sound very familiar. The SQLStore is meant to allow us to bootstrap the functionality relatively quickly, and to defer the decision of which other storage and query technologies we want to switch to until a point where we are better informed. If we create such an alternative implementation, the SQLStore will remain as a simpler-to-set-up basic implementation, useful to developers and to wikis that do not operate at large scale.

The SQLStore uses a database abstraction layer to not bind itself to a particular relational database. We’ve just made big changes to the database abstraction layer used, which is the topic of my next blog post: Wikibase and Doctrine DBAL.

Posted in Programming