Jeroens blog

Semantic MediaWiki 2.0 RC3

I am happy to announce the third release candidate for Semantic MediaWiki 2.0 is now available.

Semantic MediaWiki 2.0 is the next big release, which brings new features and many enhancements, most notably vastly improved SPARQL store support, including a brand new connector for Jena Fuseki.

The target for the actual 2.0 release is August 3rd. This release candidate is meant to gather feedback and to provide you with a peek at 2.0 already. If you find any issues, please report them on our issue tracker.

Upgrading instructions

If you are using SMW via Composer, update the version in your composer.json to “~2.0@rc” and run “composer update”. If you were running dev versions of the 1.9 series using “~1.9@dev” or similar, switch to “~2.0@dev”. Note that several extensions to SMW, such as Semantic Maps and Semantic Result Formats, do not yet have a stable release that can be installed together with SMW 2.x. If you are also running those, you will likely have to switch them to a development version.
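As a rough sketch, and assuming SMW is required under the package name mediawiki/semantic-media-wiki (check your own composer.json for the exact name used there), the relevant part of the file would look something like this:

```json
{
    "require": {
        "mediawiki/semantic-media-wiki": "~2.0@rc"
    }
}
```

After changing the version, run “composer update” in the directory that contains composer.json.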

You can also download the SMW 2.0 RC3 tarball.

More detailed upgrading instructions will be made available for the 2.0 release.

Posted in News, Software

Some fun with iterators

Sometimes you need to loop over a big pile of stuff and execute an action for each item. In the Wikibase software, this for instance occurs when we want to rebuild or refresh a part of our secondary persistence, or when we want to create an export of sorts.

Historically we’ve created CLI scripts that built and executed calls to the database to get the data. As we’ve been paying more attention to separating concerns, such scripts have evolved into fetching the data via some persistence service. That avoids binding to a specific implementation to some extent. However, it still leads to the script knowing about the particular persistence service. In other words, while it might not know the database layout, or that MySQL is used, it still knows things via the signature of the interface. And it’s entirely possible you want to use the code with a different source of the things being iterated over, one in a format the persistence interface is not suitable for.

All the code doing the iteration and invocation of the task needs to know is that there is a collection of a particular type it can loop over. This is what the Iterator interface is for. If you have the iteration code use an Iterator, you can implement and test most of your code without having the fetching part in place. You can simply feed in an ArrayIterator. This also demonstrates the script no longer knows if the data is already there or if (part of it) still needs to be retrieved.


When iterating over a big or expensive to retrieve set of data, one often wants to apply batching. Having to create an iterator every time is annoying though, and putting the iteration logic together with the actual retrieval code is not very nice. After having done this a few times, I realized that part could be abstracted out, and created a tiny new library: BatchingIterator. You can read how to use it.
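To make the idea concrete, here is a minimal sketch, written in Python rather than PHP and not using the API of the BatchingIterator library. All names in it (batching_iterator, run_task, fetch_from_fake_table) are made up for illustration. The task code only sees something it can loop over, while the batching logic lives in one small reusable function.

```python
from typing import Callable, Iterable, Iterator, List


def batching_iterator(
    fetch_batch: Callable[[int, int], List[str]],
    batch_size: int,
) -> Iterator[str]:
    """Yield items one by one while fetching them batch_size at a time."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return  # an empty batch signals the end of the data
        yield from batch
        offset += len(batch)


def run_task(items: Iterable[str]) -> List[str]:
    """The 'script': it only knows that it can loop over something."""
    return [item.upper() for item in items]


# Stand-in for a database table or a persistence service (hypothetical).
FAKE_TABLE = ["q1", "q2", "q3", "q4", "q5"]
fetch_calls = []


def fetch_from_fake_table(offset: int, limit: int) -> List[str]:
    fetch_calls.append((offset, limit))
    return FAKE_TABLE[offset:offset + limit]


# The same task code works with a batched source and with a plain list.
from_batches = run_task(batching_iterator(fetch_from_fake_table, batch_size=2))
from_list = run_task(["q1", "q2"])
```

The plain list in the last line plays the role that ArrayIterator plays in PHP: it lets you test the task without having any fetching code in place.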

Posted in Programming

Component design

This week I gave a presentation titled “Component design: Getting cohesion and coupling right” at Wikimedia Deutschland.


Components are a level of organization, in between classes and layers. They are an
important mechanism in avoiding monolithic designs and big balls of mud. While
everyone can recognize a component when they see one, many people are unaware of
the basic principles that guide good component design.

This presentation is aimed at developers. It is suitable both for people new to the field
and those with many years of experience.

The topics covered include:

* What is a component?
* Which things go together?
* How do components relate to each other?
* How are common problems avoided?

You can view the slides, or look at the source.


Posted in Programming

Empower people around the world with microloans

In 2011, I created an account on a website that facilitates microloans.

The basic idea is that you lend a small amount, typically 25 USD. The loans are made to poor people who want to borrow money so they can invest in income sources, for instance new farm animals, stock for their shop, or machinery. That’s a lot more effective than giving someone some money that they use to buy food, without attempting to change their situation. What’s more, you get your money back most of the time, since they are loans. So once you get it back, you can make a new loan.

My first loans were on October 15, when I put in 100 USD. Not long after, I figured I had quite some money in my bank account that could just as well be put to good use. So I put in another 6000 USD. Since then I’ve made 1034 loans, to people in 56 different countries, with a total of 26000 USD lent.


I’ve lost 136.48 USD over all these loans, which I will never get back. In addition, the money has lost value due to inflation. Still, that is several hundred dollars that helped over a thousand people.

If that’s not enough to convince you that you should do the same, consider this entirely selfish reasoning. It’s good to split up your wealth, so you don’t end up with problems if something crashes. You can have money in different currencies, though that’s all still fiat. You can also put things into precious metals, though that comes with its own hassle. You can put money in stocks and whatnot, though that requires knowledge and can be quite risky. Then there are currencies, though that is even more risky. You can have money at multiple banks, in case one fails. Nowadays interest rates at most banks are below the inflation rate, so you lose money, while supporting the bankers and risking losing it all when they fuck up. While a bank can fail, I cannot think of many events that would make me lose the money in these microloans, since it’s incredibly distributed. So while there is some loss, I’m more confident I will still own (most of) this money in a year than the money in any of my bank accounts.

So, go ahead, create an account, and start lending.


Posted in Uncategorized

StarCraft arcade

These are replays of some of my favourite StarCraft 2 arcade games.


Hero battle, 3v3 (or [1-3]v[1-3]), ~30 min, ~90 APM

Mineralz and the THING

Survival (against human player), 7v1ish, ~40min

Desert Strike HotS

Tug of war, 3v3 (or [1-3]v[1-3]), ~30 min, ~10 APM

Tya’s Zerg Defence

Survival, 3 player, ~30 min

Star Battle

Hero battle, 6v6 (or [1-6]v[1-6]), ~40 min, ~90 APM

Posted in Gaming

Wikibase and Doctrine DBAL


When I started writing this blog post, I realized some introduction to the query components was first due. You can find it in my last blog post: The Wikidata phase3 software components.

In that post I described how the SQLStore uses a database abstraction layer to not bind itself to a particular relational database. The Wikibase software as a whole is already using a database abstraction layer, namely the one MediaWiki provides. Even though the main Wikibase applications, Wikibase Client and Wikibase Repo, depend on MediaWiki, the SQLStore does not. That rules out direct usage of the MediaWiki database abstraction layer.

Nevertheless, the MediaWiki database abstraction layer has served us reasonably well, and the team is familiar with it. Furthermore, using it does not introduce a new dependency that WMF operations might get mad about. So what we did was create a new set of interfaces for database interaction, much in line with those of the MediaWiki database abstraction layer, though without a bunch of the design issues the latter suffers from. We then created thin implementations of these interfaces that delegated to the MediaWiki database abstraction layer. This inversion of control made it possible to use the MW abstraction layer in SQLStore, without having SQLStore know about MediaWiki.
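The shape of that arrangement can be sketched as follows. This is an illustration in Python, not the actual Wikibase Database code, and every name in it (TableInserter, LegacyFrameworkDatabase, FrameworkBackedInserter, Store) is hypothetical. The store owns a small interface, and a thin adapter implements it by delegating to the framework’s layer.

```python
from typing import Any, Dict, List, Protocol, Tuple


class TableInserter(Protocol):
    """Hypothetical interface owned by the store component."""

    def insert(self, table: str, row: Dict[str, Any]) -> None: ...


class LegacyFrameworkDatabase:
    """Stand-in for the framework's own abstraction layer (not a real API)."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, List[str], List[Any]]] = []

    def do_insert(self, table: str, columns: List[str], values: List[Any]) -> None:
        self.calls.append((table, columns, values))


class FrameworkBackedInserter:
    """Thin adapter: satisfies the store's interface by delegating."""

    def __init__(self, db: LegacyFrameworkDatabase) -> None:
        self._db = db

    def insert(self, table: str, row: Dict[str, Any]) -> None:
        self._db.do_insert(table, list(row.keys()), list(row.values()))


class Store:
    """Knows only TableInserter, never the framework behind it."""

    def __init__(self, inserter: TableInserter) -> None:
        self._inserter = inserter

    def store_label(self, entity_id: str, label: str) -> None:
        self._inserter.insert("labels", {"entity_id": entity_id, "label": label})


db = LegacyFrameworkDatabase()
Store(FrameworkBackedInserter(db)).store_label("Q42", "Douglas Adams")
```

Only the adapter imports anything from the framework, so the store can be used and tested without it.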

These interfaces were put in a new component called Wikibase Database. I covered the creation of Wikibase Database in an earlier blog post.

The SQLStore does not have a fully fixed schema. One can configure which types of data are supported, and generally each type of data has its own table. Furthermore, one can provide additional types of data that the core SQLStore does not support. Hence having the tables built with manually constructed SQL in sql files, as done in MediaWiki, does not seem convenient at all. Semantic MediaWiki deals with this problem by dynamically generating a declarative definition of the schema and then translating that into SQL. We went with the same approach for the SQLStore, though we took a lot of care to not repeat the spaghetti implementation approach of SMW.

This means we needed some way to represent tables, columns, indexes, etc. in PHP. Furthermore, we’d need to be able to turn this PHP representation into SQL compatible with the database in use. Unfortunately the MediaWiki database abstraction layer can do neither of those.
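A toy version of such a representation, sketched in Python rather than PHP, might look like the following. The Column and Table classes and the create_table_sql function are invented for this example and match neither the Wikibase Database interfaces nor any other library. Real code also has to deal with indexes, defaults, quoting and the differences between SQL dialects, which is where it gets tedious.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Column:
    name: str
    sql_type: str
    nullable: bool = True


@dataclass(frozen=True)
class Table:
    name: str
    columns: List[Column]


def create_table_sql(table: Table) -> str:
    """Translate the declarative definition into a CREATE TABLE statement."""
    parts = []
    for column in table.columns:
        definition = f"{column.name} {column.sql_type}"
        if not column.nullable:
            definition += " NOT NULL"
        parts.append(definition)
    column_list = ", ".join(parts)
    return f"CREATE TABLE {table.name} ({column_list})"


string_values = Table(
    name="string_values",
    columns=[
        Column("entity_id", "VARCHAR(32)", nullable=False),
        Column("value", "TEXT"),
    ],
)
sql = create_table_sql(string_values)
```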

What we ended up doing was creating schema representation objects and adding additional interfaces to Wikibase Database, and then creating implementations of those for MySQL and SQLite. Needless to say, writing such SQL generation code is rather tedious and quite hard to test. We knew that beforehand, and spent quite some time looking around for existing solutions we’d be able to delegate to. Unfortunately none were found.

Several months later I somehow ended up on the architecture page of Doctrine DBAL. After reading through the basic introduction there, I thought “oh wow, this sounds very similar to what we did in Wikibase Database”. So I read through the more detailed docs, and slowly came to the realization that DBAL is exactly what we had been looking for before. Making the facepalm even worse, I had actually read through the basic Doctrine docs and quickly looked at its source when doing the initial research. Based on that I had concluded that it came with all this unneeded ORM stuff, not realizing Doctrine itself was built on an independent DBAL.


I’ve now migrated the SQLStore to use Doctrine DBAL rather than Wikibase Database. The similarity between the interfaces of both abstraction layers is extremely high. The structure of the schema definition objects is essentially identical.


Of course Doctrine DBAL has many interfaces that Wikibase Database does not have. For instance, it can compare two schemas with each other, and turn the diff into SQL queries. That’s something that we will very likely have use for at a later point. It also has a nice query builder, support for more databases and a much more solid implementation overall. And perhaps most important of all, it does not need to be maintained by us. That’s one big project liability traded for another liability orders of magnitude less significant :)

The refactoring of the SQLStore away from Wikibase Database killed about 1000 lines of code and obsoleted Wikibase Database (8000 lines of code) itself. Granted, this refactoring also tackled a design problem that caused unneeded complexity, which contributed to these 1000 lines. It’s always great when you can remove so much code while retaining all functionality. In fact some TODOs got tackled along the way and some bugs got fixed. Oh and, we can now run the integration tests on an in-memory SQLite database \o/

So far my experience with Doctrine DBAL has been almost entirely positive. Many thanks to the authors of this software.

Posted in Programming

The Wikidata phase3 software components

Work on the long awaited query functionality for the Wikidata project has been going on for several months. Since queries are a completely disjoint feature set from the existing functionality, we decided to put them into a new component of the Wikibase software. This component is called Wikibase Query. WB Query is a MediaWiki extension that depends on the WB Repo extension. It’s responsible for providing query functionality bindings in the MediaWiki UI and web API.

The actual execution of queries is done via a component WB Query depends on: Wikibase QueryEngine. WB QueryEngine does not depend on MediaWiki and simply defines a high level interface to storing and querying Entity data. Entities are stored using ->insertEntity(Entity $entity) and queries are run with ->getMatchingEntities(a query object). The queries are defined using the Ask language, which I blogged about before.

The git repo that contains WB QueryEngine also comes with our first, and currently only, implementation of the WB QueryEngine interfaces. This implementation is called SQLStore, and as its name suggests it is an implementation that works with relational databases. If you are familiar with Semantic MediaWiki, especially its internals, this should sound very familiar. The SQLStore is meant to allow us to bootstrap the functionality relatively quickly, and to defer the decision of which other storage and query technologies we want to switch to until a point where we will be better informed. If we create such an alternate implementation, the SQLStore will remain as a basic implementation that is simpler to set up, useful to developers and to wikis that do not operate at scale.

The SQLStore uses a database abstraction layer to not bind itself to a particular relational database. We’ve just made big changes to the database abstraction layer used, which is the topic of my next blog post: Wikibase and Doctrine DBAL.

Posted in Programming

Nyan review

Code review is fun!

Posted in Programming

Wikibase DataModel: Entity v2

In a recent blog post I described the new Term classes introduced in Wikibase DataModel 0.7.3. That post also outlined plans for making some big changes to the Entity class and its derivatives. We have now taken the most difficult step in the process, which is already resulting in much nicer code.

As I’ve written about in the past, we had a good portion of technical debt related to our serialization code. Our Value Objects and Entities in DataModel had a public toArray method and a public static newFromArray. This brought with it global state in places, caused a lot of static code and forced an additional responsibility onto these objects, increasing their complexity. The new serialization components were created in part to address these issues.

I started with removing the old toArray and newFromArray code from DataModel, and in the process of doing so found that this went hand in hand with taking the next big step in breaking up the Entity hierarchy. The constructor of Entity (and its concrete derivatives) took an array with the internal serialization format as its only argument. They would hold this array internally and unstub specific parts when that data was requested in object form (i.e. by calling getSiteLinks). If all the serialization code got moved out of DataModel, then either this would need to go as well, or we’d need to make DataModel dependent on the new serialization components. Luckily the choice between introducing a cyclic dependency and removing some technical debt you need to get rid of in the near future anyway is an easy one.

Rather than taking this array in some storage-specific format, the constructors of the Entity derivatives now take a list of the objects they need.
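To illustrate the difference, here is a rough sketch in Python rather than PHP. The class and parameter names are made up for the example and are not the actual DataModel API. The point is only that the entity receives ready-made objects and has nothing left to unstub.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ItemId:
    serialization: str


@dataclass(frozen=True)
class SiteLink:
    site_id: str
    page_name: str


@dataclass
class Item:
    """Hypothetical entity that is handed objects, not a storage-format array."""

    id: Optional[ItemId] = None
    labels: Dict[str, str] = field(default_factory=dict)
    site_links: List[SiteLink] = field(default_factory=list)

    def get_site_links(self) -> List[SiteLink]:
        # Nothing to unstub: the objects were passed in at construction time.
        return self.site_links


item = Item(
    id=ItemId("Q42"),
    labels={"en": "Douglas Adams"},
    site_links=[SiteLink("enwiki", "Douglas Adams")],
)
```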

As you can imagine, this makes a lot of things in Entity simpler. My last post on DataModel included this plot:


This is what the same plot looks like on the development branch of the DataModel 1.0 release:


We nearly halved the complexity of our most complex class \o/. Some more stats: so far we changed 47 files with 768 additions and 2382 deletions. With these changes, our ScrutinizerCI quality rating went from 8.23 to 8.76. The release is definitely not done yet though, since the big changes already described make a lot of smaller cleanup possible. And we have an incentive to kill deprecated things in this release, since we’ll be following semver properly afterwards, and will have to bump to 2.x whenever we make a breaking change.

Entity has an equality method. This used to work by putting the array data the Entity held through a generic comparer object. Since we needed a replacement for this, I made most value objects in DataModel implement the Comparable interface. This was already made available in a 0.7.4 release. Now Entity simply delegates to the equals methods of the objects it holds, letting them decide how to compute equality for their type. This fixed quite some inconsistencies that could occur in the old code (depending on how exactly you set data in the first place), such as SiteLink badges incorrectly being compared in an order-dependent fashion.
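The delegation can be sketched like this, again in Python and with simplified, hypothetical classes rather than the real DataModel ones. Each object decides what equality means for its own type, so the site link can treat its badges as an unordered collection while the entity just asks its parts.

```python
from typing import Iterable, List


class SiteLink:
    def __init__(self, site_id: str, page_name: str, badges: Iterable[str]) -> None:
        self.site_id = site_id
        self.page_name = page_name
        self.badges: List[str] = list(badges)

    def equals(self, other: object) -> bool:
        return (
            isinstance(other, SiteLink)
            and self.site_id == other.site_id
            and self.page_name == other.page_name
            # Badge order carries no meaning, so compare them as sets.
            and set(self.badges) == set(other.badges)
        )


class Item:
    def __init__(self, site_links: Iterable[SiteLink]) -> None:
        self.site_links = {link.site_id: link for link in site_links}

    def equals(self, other: object) -> bool:
        if not isinstance(other, Item):
            return False
        if self.site_links.keys() != other.site_links.keys():
            return False
        # Delegate: every site link decides for itself whether it matches.
        return all(
            link.equals(other.site_links[site_id])
            for site_id, link in self.site_links.items()
        )


a = Item([SiteLink("enwiki", "Berlin", ["Q1", "Q2"])])
b = Item([SiteLink("enwiki", "Berlin", ["Q2", "Q1"])])
c = Item([SiteLink("enwiki", "Berlin", ["Q1"])])

same_despite_order = a.equals(b)
differs_on_content = a.equals(c)
naive_list_comparison = ["Q1", "Q2"] == ["Q2", "Q1"]
```

The last line shows the kind of order-dependent comparison that a generic array comparer ends up doing.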

Further splitting of Entity is on the roadmap, though perhaps not for the 1.0 release. For a list of 1.0 changes made so far, check the release notes.

Posted in Programming, Software

Desert Strike HotS: gas rush

Every now and then I play Desert Strike HotS, which is an arcade game on the StarCraft 2 platform.

It is a 3v3 tug of war game. You place units on your platform, and they all spawn periodically and head towards the other side of the map. The game ends when you kill the base of the enemy team.


You cannot control the units after they have spawned (with the exception of some abilities); they just engage the enemy units as the AI sees fit. That means placement of units can be almost as important as what units you choose to counter the enemy force.

To place units, you need minerals, which you get through a steady income. Every second or so you get some minerals. The team that controls the centre gets a 15% bonus income. You control the centre by having the battle take place on the other side of the centre line. A second way to increase your income is to “build a gas” (a refinery building). Doing so costs 100 minerals and pauses your income for some time. Each new gas you place pauses your income longer. Once the gas is finished, your income resumes, with a 10% bonus. You can build a maximum of 3 gas (plus an additional one in late game). Halfway between the centre line and the enemy base on each side, there is a tower (defensive structure) of the team on that side. Killing this enemy tower gets each player of the team 200 minerals.

Those are the basic game mechanics, in particular those related to economy. This post is on how it is a bad idea to rush building gas, something a lot of people are doing. And most of them don’t seem to understand why it is unwise, even while it’s so simple.

A 10% income bonus is nice. However, a 15% income bonus for both you and your two allies is clearly nicer. Quantifiably so: 4.5 times.
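The 4.5 comes from comparing both bonuses in units of one player’s base income. Writing I for that base income (a symbol introduced here, and assuming all three players earn the same base amount and ignoring how bonuses stack), the arithmetic is:

```latex
\[
\frac{\text{centre bonus for the team}}{\text{bonus from one gas}}
  = \frac{3 \times 0.15\,I}{1 \times 0.10\,I}
  = \frac{0.45}{0.10}
  = 4.5
\]
```

So one team holding the centre gains as much income as 4.5 gas would provide.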

Now this 4.5x advantage does not take into account two additional relevant factors. If the advantage is 4.5x, it means the enemy team has to build 4.5 gas to cancel out your centre control. That means all their players need to invest in a gas, and two even need to get their second gas (which is more expensive than the first). At that point you are even income-wise, assuming your team did not build any gas yet. Phrasing it like this makes it clear that your team can then build one as well, and be ahead again. The second factor is that having these gas does not give you a tactical advantage, while controlling the centre does. It means you are closer to their tower, and if you take it out, you get that 200 mineral bonus.

So clearly, the best investment you can make is securing the centre. And jeopardizing centre control for getting a gas is folly. One should only get gas during periods in which the centre is secure.

Unless your opponents don’t know how to play (hint: gas rush is a good indication), you will need all the minerals you have dedicated to controlling the centre. At the beginning of the game, you have few units. Every mineral makes a difference, which also makes it so much more important not to create a huge income gap by letting the enemy team control the centre. So it really is the worst possible time to get a gas. It’s a “long term investment” which ends up costing you in the short term, and even more in the long run.

I wonder if these gas rush noobs put their real money in savings accounts, and then go borrow money because they locked away too much. There definitely seems to be a correlation between gas rushing and not being very smart, as manifested by playing badly. So how do I know that these people are not simply new to this game, and simply don’t know the mechanics? Amazingly, a lot of them have played the game hundreds of times (which is a lot more than me). If after playing so many times, you still do not know which units to get, where to place them, and how gas rush correlates to stupidity and defeat, well, then you are probably indeed not the brightest person around. I’ve also asked many gas rushers why they think it’s a good idea, and most of them make it clear they indeed don’t get it, and are completely certain that the stupid thing they did the last 500 games is a great idea.

Time to end the rant and switch to some screenshots. And yeah, I did not know “gas” is spelled with only one “s” :)

Desert Strike HotS gas rush


And here you have a video of me playing the game, without anyone in my team gas rushing.

Posted in Gaming