Jeroen's blog

PHPCS and PHPMD: my experiences

PHPCS (PHP_CodeSniffer) detects violations of a specified coding standard. PHPMD (PHP Mess Detector) is a similar tool, though with more of a focus on metrics. In this post I’ll go over how I started using them, and what I learned in the process.

They are both very useful in that they can automatically detect a variety of issues, such as unused variables, misplaced brackets and overly long methods. These are things reviewers look out for during code review, yet that takes time and is quite error prone. I often find myself running PhpStorm’s “code analysis” to find newly introduced dead code that had gone undetected up to that point.

I’ve known about tools such as PHPCS and PHPMD for quite some time, though never got round to actually using them. Then some time ago, I stumbled across a recently created PHPCS ruleset for the MediaWiki coding style. Since we’re using the MediaWiki coding style in most of our PHP projects at WMDE (Wikimedia Deutschland), I started looking into how we could make use of this ruleset.

The first step I took was taking the already created ruleset for the MediaWiki coding style and trying it out against a small project that’s essentially done and sees very little development activity: the Diff library. Apart from a few misplaced spaces, which were quickly fixed, it worked right away – yay. Step two was to actually look at the ruleset in more detail.

I realized that several exceptions to the rules could be removed, since the Diff library is more strict about things than the MediaWiki software. After removing those, I started looking for other things that could easily be added to the ruleset. I ended up looking through the various so-called coding standards that ship with PHPCS, and picked rules matching the Diff guidelines from several of them, which were then added to my own ruleset.

One thing that I was keen to add was limitations on code complexity. PHPCS allows setting a maximum nesting level and an upper bound on the cyclomatic complexity of methods. That’s great, though it hardly covers all the metrics I want. You can still create a 10k ELOC class with 50 fields that couples to 100 classes without the tool shouting at you. I did some research on the other code analysis tools for PHP, and figured PHPMD was the best fit.
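For reference, the complexity-related PHPCS rules can be enabled in a ruleset file along these lines. The sniff names are the ones PHPCS ships with; the limit values are just an example, not the ones from my actual ruleset:

```xml
<?xml version="1.0"?>
<ruleset name="ExampleStandard">
    <!-- Fail when a method's cyclomatic complexity exceeds the limit -->
    <rule ref="Generic.Metrics.CyclomaticComplexity">
        <properties>
            <property name="complexity" value="10"/>
        </properties>
    </rule>

    <!-- Fail when control structures are nested too deeply -->
    <rule ref="Generic.Metrics.NestingLevel">
        <properties>
            <property name="nestingLevel" value="3"/>
        </properties>
    </rule>
</ruleset>
```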

As with PHPCS, I went through all the PHPMD rules to determine which ones were applicable to my use case, and put them in my rule file.
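A PHPMD rule file follows a similar pattern. A minimal sketch (the rule selection and threshold here are illustrative, not my actual configuration):

```xml
<?xml version="1.0"?>
<ruleset name="Example rules"
         xmlns="http://pmd.sf.net/ruleset/1.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://pmd.sf.net/ruleset/1.0.0 http://pmd.sf.net/ruleset_xml_schema.xsd">
    <!-- Pull in a whole rule category... -->
    <rule ref="rulesets/unusedcode.xml"/>

    <!-- ...or individual rules, with tweaked thresholds -->
    <rule ref="rulesets/codesize.xml/CyclomaticComplexity">
        <properties>
            <property name="reportLevel" value="10"/>
        </properties>
    </rule>
</ruleset>
```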

At this point I could validate the Diff code against both the PHPCS and PHPMD rules by running
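The commands looked roughly like this. The paths assume Composer-installed binaries and rule files named phpcs.xml and phpmd.xml, which is just a typical layout, not necessarily the one Diff uses:

```shell
vendor/bin/phpcs -p -s --standard=phpcs.xml src/ tests/
vendor/bin/phpmd src/ text phpmd.xml
```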

The -p option for PHPCS shows the progress with a pile of dots, much like PHPUnit does. Unfortunately PHPMD has no such feature. The -s option prints the names of the rules in the report. This is very useful when you do not know the rules yet, and are trying to find out which ones are applicable for your project. If the name of the rule is included, rather than just the error message, you don’t need to hunt for it before you’re able to disable it or change its configuration.

Next I looked into how to run this on Travis CI, so new violations would not get introduced. The most obvious thing to do is of course to simply look at how existing projects are doing this. In doing so, I ran into a Composer feature I had not used before: Composer scripts. Diff now has these scripts, which allows me to simply run “composer ci” to execute all tests and code style checks that are also run on Travis CI.
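A composer.json scripts section for this can look roughly like the following. The “ci” name matches what Diff uses; the other script names and commands are made up for illustration:

```json
{
    "scripts": {
        "ci": [
            "@test",
            "@cs"
        ],
        "test": "phpunit",
        "cs": [
            "vendor/bin/phpcs -p -s",
            "vendor/bin/phpmd src/ text phpmd.xml"
        ]
    }
}
```

Composer lets a script reference other scripts with the @-prefix, so “composer ci” simply runs the test and code style scripts in turn.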

With the basic work done for the Diff library, I proceeded to add the rulesets I created to the Wikibase DataModel component. This component is bigger, though still reasonably small, and sees more active development. Like Diff, it is quite clean and mostly adheres to the more strict and recent code style and complexity expectations we have at WMDE. This is why I figured Wikibase DataModel would be a good choice to test the rulesets and general setup against.

Right away several rules were found that had to be removed or configured differently, since they prohibited things we’re actually fine with, or in some cases even demand. (There are many rules in PHPCS that conflict with each other, so you will never be able to enable all rules.) An example of this is the PHPMD TooManyMethods rule, which I figured was just about public non-getter-setter methods. Apparently it includes private methods as well, which makes the rule a lot less useful in my opinion (see GitHub issue). Another thing I ran into is that we have test methods named like testGivenInvalidLanguage_exceptionIsThrown, which violates the CamelCase rule. I filed an issue for this, and have a preliminary pull request open as well.

I’ve now started using these tools for several projects, including some in which there were a lot of deviations from the standard. PHPCS comes with a “PHP Code Beautifier and Fixer” tool (phpcbf) that allows you to quickly fix whole classes of certain violations. For instance, your coding style might dictate a newline at the end of each file. This tool can automatically add one wherever it is missing, and remove additional ones where there is more than one.
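The fixer is invoked much like PHPCS itself; something along these lines (the ruleset file and path are placeholders):

```shell
vendor/bin/phpcbf --standard=phpcs.xml src/
```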

In legacy projects it can be hard to enable these tools and have them prevent regressions against your coding standard. If you have a lot of code violating your own standard, you need to fix it before you can enable the associated rules. This can be a lot of work, and not something you want to do in one go. You might not even want to do it at all in places, for instance in parts of your codebase that are not actively developed. You can simply omit rules that are presently violated, though that likely leaves you with a much reduced ruleset, forcing you to still pay a lot of attention to prevent new violations of the omitted rules from being added. Alternatively you can keep the whole ruleset, and simply not have your CI server use it. The danger with this is obvious: as with not running tests on your CI and accepting that some are broken, a lot of vigilance is needed to prevent regressions, especially in teams.

In one legacy project where I’ve introduced these tools, I went with a combined approach. This project has two rulesets for each of PHPCS and PHPMD: a strict one, and a reduced one that works with the legacy code. In this project there already was a division between old and new code. The old code was in one directory using a custom autoloader, while the new code was in src/, using the Composer PSR-4 autoloader. This made it trivial to run the strict rules against the code in src/, and new things added there, while running the more relaxed ones against the old directory.
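The CI invocation then simply runs each ruleset against its directory; schematically (the ruleset file names and the old-code directory name are placeholders, not the project’s actual ones):

```shell
vendor/bin/phpcs -p -s --standard=phpcs-strict.xml src/
vendor/bin/phpcs -p -s --standard=phpcs-legacy.xml old-code/
```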

Posted in Programming

Semantic MediaWiki news

It’s been a while since I last wrote about Semantic MediaWiki, even though several noteworthy things have happened since. In this post I’ll highlight some things that have happened since the 2.0 release.

Semantic MediaWiki 2.1

Semantic MediaWiki 2.1 is a minor release that adds several new features, many enhancements, addresses numerous issues and adds support for additional platforms. It does not contain any breaking changes.

The most notable new features provided by this new version are support for semantic queries in Special:Search, contextual help texts on edit pages, and the long-wished-for full PostgreSQL support. Substantial improvements to the SPARQL store support have also been made.

At least as important as the new features are the numerous bug fixes, performance tweaks and minor enhancements. I’m really pleased with how few regressions we’ve introduced in the past few releases. Combined with the many fixes we’re making, this means SMW keeps on getting more robust and polished.

SMWCon Fall 2014, Vienna

I attended SMWCon Fall 2014 in Vienna.

As usual I gave my “yearly” SMW talk on everything that happened since the conference a year before, what’s currently going on, and what plans we [people working on SMW] have.

The one other presentation I did was on the Wikibase software.

You can find videos and slides for most of the SMWCon Fall 2014 talks via the schedule. I can hardly cover all the interesting topics presented, so I will leave it at this little bit of self-promotion :)

New SMW extensions

MWJames somehow found the time to create two new SMW extensions, on top of all the work he’s been doing on SMW itself.

Semantic Breadcrumb Links is a Semantic MediaWiki extension to aid in-page navigation by building breadcrumb links from an attributive property filter. It uses a pattern match strategy to filter property usage (e.g. Has parent page) that ascribes the location of a page relative to its parent, and provides navigational help by generating a breadcrumb trail.

Semantic Interlanguage Links is a Semantic MediaWiki extension to create and manage interlanguage links. Using the INTERLANGUAGELINK parser function, it provides queryable annotations that connect pages with similar content in different languages, making them accessible via the sitelink navigation. You can see the extension in action in this video.

Several existing SMW extensions, such as Semantic Result Formats and Semantic Forms, also saw new releases. Perhaps most notable is Semantic Signup, which got a huge overhaul. This extension had essentially gone unmaintained for a few years, and has now been updated to work with the latest versions of MediaWiki, Semantic MediaWiki and PHP. A lot of its code was cleaned up and tests were added, making future maintenance easier and uncovering a number of bugs, which were fixed.

Up next

MWJames has been working on yet another new SMW extension, which will probably see its first release within a month. Stay tuned!

In a bit under 3 months, SMWCon Spring 2015 will be held in St. Louis, Missouri. If you have not done so yet, go ahead and register.

And of course there is Semantic MediaWiki 2.2, the next minor SMW release. At the time of writing this post, 238 changes have been made to 393 files by 9 different people. A lot of these are internal enhancements, many of which are preparation work needed to make bigger changes. There also was some focus on performance-related improvements. A few of the new features we have so far are template support in the #set parser function, additional options in the rebuildData maintenance script, and a new sorting option for the category result format. There currently is no target date for the 2.2 release, though 2 to 3 months from now is probably a fair estimate.

Note: This is just a selection of things that happened, and I might be forgetting about some important things :)

Posted in Uncategorized

Wikibase DataModel 1.0

I’m happy to announce the 1.0 release of Wikibase DataModel. Wikibase DataModel is the canonical PHP implementation of the Data Model at the heart of the Wikibase software.

This is a big release which has been some time in the making, even though many additions have been split off and included in previous releases. The highlights are as follows:

Removal of the (de)serialization code

The entities and value objects in Wikibase DataModel used to have toArray and newFromArray methods. This caused several problems, such as having a pile of static code, depending on configuration (which was done via global state), and adding support for an arbitrary array format to the responsibilities of the objects. This has been fully removed, and can now be done via the dedicated serialization components (public format, internal format), which were released some time back.

In earlier versions of Wikibase DataModel, the Item and Property classes internally contained the array representation, rather than fields holding the value objects that items and properties contain. While this was not visible via the getters and setters, which dealt with those value objects, it was exposed in the constructor. As of DataModel 1.0, the constructors take the value objects rather than the array representation.
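Schematically, the change looks like this. SiteLink and SiteLinkList are real DataModel classes, but the snippet simplifies the actual constructor signatures, so treat it as an illustration rather than the exact 1.0 API:

```php
// Before 1.0: the constructor took the internal array representation
$item = new Item( array(
	'links' => array( 'enwiki' => 'Berlin' )
) );

// As of 1.0: the constructor takes value objects
$item = new Item(
	new SiteLinkList( array( new SiteLink( 'enwiki', 'Berlin' ) ) )
);
```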

Deprecation of Entity

Type hinting against the Entity class has been deprecated. This was announced on the Wikidata tech list some time back. While the class is still there, most methods it defines have been deprecated, and some have been removed. A new EntityDocument interface has been introduced in version 0.8.2, which can be used instead. As part of the cleanup here, Item has been made to only accept statements, rather than all claims, as it wrongly did before.
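In practice this means changing type hints along these lines (the method name is made up for illustration):

```php
// Deprecated: binding to the concrete Entity class
public function saveEntity( Entity $entity ) { /* ... */ }

// Preferred: binding to the EntityDocument interface introduced in 0.8.2
public function saveEntity( EntityDocument $entity ) { /* ... */ }
```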


Many more changes and additions were made. You can view the full list of changes affecting users of the component in the release notes.

Posted in Programming, Software

SoCraTes 2014


Last week I attended SoCraTes 2014, the 4th International Software Craftsmanship and Testing Conference in Germany.

Since this was the first time I went there, I did not really know what to expect, and was slightly apprehensive. Turns out there was no need for that: the conference was the most fun and interesting one I’ve been to recently, and definitely the most motivating as well.

What made it so great? Probably the nicest aspect of the conference was the people attending. Basically everyone there was passionate about what they were doing, interested in learning more, open minded, and respectful of others. This, combined with the schedule being purely composed of sessions people proposed at the start of the day, made the whole atmosphere very different from that of your typical commercial conference. Apart from attending sessions on various topics, I also did some pair programming and played my first set of beach volleyball games at a software conference.

I’m definitely going back next year!

Posted in Events, Programming

Doctrine QueryBuilder table alias

The Doctrine project, best known for its Object Relational Mapper, also includes a database abstraction layer, used by the ORM. This abstraction layer is called DBAL, for DataBase Abstraction Layer.

Quickly after I started using DBAL in some Wikibase components, I got annoyed at how single-table selects needed to be created. The QueryBuilder has a “from” method, in which one needs to specify the table name. The annoying bit was that you also had to specify a table alias, and then use this alias in the select and where calls as well.

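The old form looked roughly like this (table and column names made up):

```php
$queryBuilder
	->select( 'u.name', 'u.email' )
	->from( 'users', 'u' )
	->where( 'u.id = :id' )
	->setParameter( ':id', $userId );
```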

In an hour of free time and several commits I made the parameter optional, enabling one to do the same without having to specify the alias:
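With the change, the same query can be built without the alias (again with made-up table and column names):

```php
$queryBuilder
	->select( 'name', 'email' )
	->from( 'users' )
	->where( 'id = :id' )
	->setParameter( ':id', $userId );
```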


This change will be part of the upcoming Doctrine DBAL 2.5.

Posted in Programming, Software

Semantic MediaWiki 2.0 RC3

I am happy to announce the third release candidate for Semantic MediaWiki 2.0 is now available.

Semantic MediaWiki 2.0 is the next big release, which brings new features and many enhancements. Most notable is vastly improved SPARQL store support, including a brand new connector for Jena Fuseki.

The target for the actual 2.0 release is August 3rd. This release candidate is meant to gather feedback and to provide you with a peek at 2.0 already. If you find any issues, please report them on our issue tracker.

Upgrading instructions

If you are using SMW via Composer, update the version in your composer.json to “~2.0@rc” and run “composer update”. If you were running dev versions of the 1.9 series using “~1.9@dev” or similar, switch to “~2.0@dev”. Note that several extensions to SMW, such as Semantic Maps and Semantic Result Formats, do not yet have a stable release that is installable together with SMW 2.x. If you are also running those, you will likely have to switch them to a development version.

You can also download the SMW 2.0 RC3 tarball.

More detailed upgrading instructions will be made available for the 2.0 release.

Posted in News, Software

Some fun with iterators

Sometimes you need to loop over a big pile of stuff and execute an action for each item. In the Wikibase software, this for instance occurs when we want to rebuild or refresh a part of our secondary persistence, or when we want to create an export of sorts.

Historically we’ve created CLI scripts that built and executed calls to the database to get the data. As we’ve been paying more attention to separating concerns, such scripts have evolved into fetching the data via some persistence service. That avoids binding to a specific implementation to some extent. However, it still leads to the script knowing about the particular persistence service. In other words, while it might not know the database layout, or that MySQL is used, it is still tied to the signature of that interface. And it’s entirely possible you want to use the code with a different source of the thing being iterated over, one in a format for which the persistence interface is not suitable.

All the code doing the iteration and invocation of the task needs to know is that there is a collection of a particular type it can loop over. This is what the Iterator interface is for. If you have the iteration code use an Iterator, you can implement and test most of your code without having the fetching part in place. You can simply feed in an ArrayIterator. This also means the script no longer knows whether the data is already there, or whether (part of) it still needs to be retrieved.
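A minimal sketch of what that looks like:

```php
// The task runner depends only on the Iterator interface
function runTask( Iterator $things, callable $task ) {
	foreach ( $things as $thing ) {
		$task( $thing );
	}
}

// In tests, or when the data is already in memory, an ArrayIterator suffices
runTask(
	new ArrayIterator( array( 'Q1', 'Q2', 'Q3' ) ),
	function ( $id ) {
		echo $id, "\n";
	}
);
```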


When iterating over a big or expensive-to-retrieve set of data, one often wants to apply batching. Having to create an iterator every time is annoying though, and putting the iteration logic together with the actual retrieval code is not very nice. After having done this a few times, I realized that part could be abstracted out, and created a tiny new library: BatchingIterator. You can read how to use it.
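The idea can be sketched in plain PHP with a generator. This is just the concept, not the actual BatchingIterator API:

```php
// Expose batched retrieval as one continuous iteration:
// $fetchBatch is any callable returning an array of items for a given
// offset and limit, and an empty array once the data runs out.
function iterateInBatches( callable $fetchBatch, $batchSize ) {
	$offset = 0;

	while ( true ) {
		$batch = $fetchBatch( $offset, $batchSize );

		if ( $batch === array() ) {
			return;
		}

		foreach ( $batch as $item ) {
			yield $item;
		}

		$offset += $batchSize;
	}
}
```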

Posted in Programming

Component design

This week I gave a presentation titled “Component design: Getting cohesion and coupling right” at Wikimedia Deutschland.


Components are a level of organization, in between classes and layers. They are an
important mechanism in avoiding monolithic designs and big balls of mud. While
everyone can recognize a component when they see one, many people are unaware of
the basic principles that guide good component design.

This presentation is aimed at developers. It is suitable both for people new to the field
and those with many years of experience.

The topics covered include:

* What is a component?
* Which things go together?
* How do components relate to each other?
* How are common problems avoided?

You can view the slides, or look at the source.


Posted in Programming

Empower people around the world with microloans

In 2011, I created an account on a website that facilitates microloans.

The basic idea is that you lend a small amount, typically 25 USD. The loans are made to poor people who want to borrow money so they can invest in income sources: for instance new farm animals, stock for their shop, or machinery. That’s a lot more effective than giving someone some money that they use to buy food, without attempting to change their situation. What’s more, you get your money back most of the time, since they are loans. So once you get it back, you can make a new loan.

My first loans were made on October 15, when I put in 100 USD. Not long after, I figured I had quite some money in my bank account that could just as well be put to good use. So I put in another 6000 USD. Since then I’ve made 1034 loans, to people in 56 different countries, with a total of 26000 USD lent.


I’ve lost 136.48 USD over all these loans, which I will never get back. In addition, the money has lost value due to inflation. Still, that is several hundred dollars that helped over a thousand people.

If that’s not enough to convince you that you should do the same, consider this entirely selfish reasoning. It’s good to split up your wealth, so you don’t end up with problems if something crashes. You can have money in different currencies, though that’s all still fiat. You can also put things into precious metals, though that comes with its own hassle. You can put money in stocks and whatnot, though that requires knowledge and can be quite risky. Then there are cryptocurrencies, though those are even more risky. You can have money at multiple banks, in case one fails. Nowadays interest rates at most banks are below the inflation rate, so you lose money, while supporting the bankers and risking losing it all when they fuck up. While a bank can fail, I cannot think of many events that would make me lose the money in these microloans, since it’s incredibly distributed. So while there is some loss, I’m more confident I will still own (most of) this money in a year than I am about the money in any of my bank accounts.

So, go ahead, create an account, and start lending.


Posted in Uncategorized

StarCraft arcade

These are replays of games on some of my favourite StarCraft 2 arcade maps.


Hero battle, 3v3 (or [1-3]v[1-3]), ~30 min, ~90 APM

Mineralz and the THING

Survival (against human player), 7v1ish, ~40min

Desert Strike HotS

Tug of war, 3v3 (or [1-3]v[1-3]), ~30 min, ~10 APM

Tya’s Zerg Defence

Survival, 3 player, ~30 min

Star Battle

Hero battle, 6v6 (or [1-6]v[1-6]), ~40 min, ~90 APM

Posted in Gaming