PHPCS (PHP Code Sniffer) detects violations against a specified coding standard. PHPMD (PHP Mess Detector) is a similar tool, though with more of a focus on metrics. In this post I’ll go over how I started using them, and what I learned in the process.
They are both very useful in that they can automatically detect a variety of things such as unused variables, misplaced brackets and overly long methods. These are things looked out for during code review, yet this takes time, and is quite error prone. I often find myself running PHPStorms “code analysis” to find newly introduced dead code which up till that point had gone undetected.
I’ve known about tools such as PHPCS and PHPMD for quite some time, though never got round to actually using them. Then some time ago, I stumbled across a recently created PHPCS ruleset for the MediaWiki coding style. Since we’re using the MediaWiki coding style in most of our PHP projects at WMDE (Wikimedia Deutchland), I started looking into how we could make use of this ruleset.
The first step I took was taking the already created ruleset for the MediaWiki coding style and trying it out against a small project that’s essentially done and sees very little development activity: the Diff library. Apart from a few misplaced spaces, which where quickly fixed, it worked right away – yay. Step two was to actually look at the ruleset in more detail.
I realized that several exceptions to the rules could be removed, since the Diff library is more strict about things than the MediaWiki software. After removing those, I started looking for other things that could easily be added to the ruleset. I ended up looking through the various PHPCS so called coding standards, and picked rules matching the Diff guidelines from several, which where then added to my own ruleset.
One thing that I was keen to add where limitations on code complexity. PHPCS allows setting a maximum nesting level, and upperbounding the Cyclomatic Complexity of methods. That’s great, though hardly covers all metrics I want. You can still create a 10k ELOC class with 50 fields that couples to 100 classes without the tool shouting at you. I did some research on the other code analysis tools for PHP, and figured PHPMD was the best fit.
At this point I could validate the Diff code against both the PHPCS and PHPMD rules by running
phpcs src/* tests/* --standard=phpcs.xml --extensions=php -sp
phpmd src/ text phpmd.xml
The -p option for PHPCS shows the progress with a pile of dots, much like done in PHPUnit. Unfortunately PHPMD has no such feature. The -s option sprints the names of the rules in the report. This is very useful when you do not know the rules yet, and are trying to find out which ones are applicable for your project. If the name of the rule is included, rather than the error message, you don’t need to hunt for it before you’re able to disable it or change its configuration.
Next I looked into how to run this on TravisCI, so new violations would not get introduced. The most obvious thing to do is of course to simply look at how existing projects are doing this. In doing so, I ran into a Composer feature I had not used before: Composer scripts. Diff now has these scripts, which allows me to simply run “composer ci” to run all tests and code style checks that are also run on TravisCI.
With the basic work done for the Diff library, I proceeded to add the rulesets I created to the Wikibase DataModel component. This component is bigger, though still reasonably small, and is more active development wise. Like Diff, it is quite clean and mostly adheres to the more strict and recent code style and complexity expectations we have at WMDE. This is why I figured Wikibase DataModel to be a good choice to test the rulsets and general setup against.
Right away several rules where found that had to be removed or configured differently, since they prohibited things we’re actually fine with, or in some cases demand. (There are many rules in PHPCS that conflict with each other, so you will never be able to enable all rules.) An example of this is the PHPMD TooManyMethods rule, which I figured was just about public non-getter-setter methods. Apparently it also includes private methods as well, which makes the rule a lot less useful in my opinion (see GitHub issue). Another thing I ran into is that we have test methods named like testGivenInvalidLanguage_exceptionIsThrown, which violates the CamelCase rule. I filed an issue for this, and have a preliminary pull request open as well.
I’ve now started using these tools for several projects, including some in which there where a lot of deviations from the standard. PHPCS comes with a “PHP Code Beautifier and Fixer” tool that allows you to quickly fix whole classes of certain violations. For instance, your coding style might dictate a line at the end of each file. This tool can automatically add one wherever one is missing, and remove additional ones where there is more than one.
In legacy projects it can be hard to enable these tools and have them prevent regressions against your coding standard. If you have a lot of code violating your own standard, you need to fix it before you can enable the associated rules. This can be a lot of work, and not something you want to do in one go. You might not even want to do it at all at places, for instance parts of your codebase that are not actively developed. You can simply omit rules that are presently violated, though that likely leaves you with a much reduced ruleset, forcing you to still pay a lot of attention to prevent new violations of the omitted rules from being added. Alternatively you can have the whole rulset, and simply not have your CI server use it. The danger with this is obvious, and as with not running tests on your CI and accepting some are broken, a lot of vigilance is needed to prevent regressions, especially in teams.
In one legacy project where I’ve introduced these tools, I went with a combined approach. This project has two rulsets for both PHPCS and PHPMD: a strict one, and a reduced one that works with the legacy code. In this project there already was a devision between old and new code. The old code was in one directory using a custom autoloader, while the new one was in src/, using the Composer PSR-4 autoloader. This made it trivial to run the strict rules against the code in src/, and new things added there, while running the more relaxed ones against the old directory.