grahl/ch

Refactoring complex Drupal modules

In 2016 I spent a significant amount of time on porting and refactoring the LDAP module for Drupal 8. This work is ongoing and lots of work still needs to be done for a proper 1.0 release.

Here’s what I encountered along the way and why Drupal 8 is such a massive step forward for complex modules like LDAP.

Good intentions

If you look at the 7.x codebase you can clearly see that the authors wanted to have a well-tested and supported module which worked as expected even though it contains a cornucopia of features. This was not easy in 7 and it shows.

It’s visible in the fact that unit tests are really integration tests (due to SimpleTest). It’s visible in the fact that integration tests use complex bootstrapping mechanisms which switch out significant amount of code to substitute it with test code in huge, monolithic test classes. Plus, there are multiple lines of test architectures and manual test instructions.

The most interesting part is that you see that the authors wanted to include object-oriented code but Drupal’s structure and the code’s legacy architecture prevented them from doing so.

First obstacle: Class loading

In 7 there is no real autoloading. There are solutions now but those were not available to the authors. What did they do?

They wrote a custom class loader in each module to fetch their helper classes.

function ldap_user_conf($type = NULL, $reset = FALSE) {
  static $ldap_user_conf;
  static $ldap_user_conf_admin;

  if ($type == 'admin' && ($reset || !is_object($ldap_user_conf_admin))) {
    ldap_servers_module_load_include('php', 'ldap_user', 'LdapUserConfAdmin.class');
    $ldap_user_conf_admin = new LdapUserConfAdmin();
  }
  elseif ($type != 'admin' && ($reset || !is_object($ldap_user_conf))) {
    ldap_servers_module_load_include('php', 'ldap_user', 'LdapUserConf.class');
    $ldap_user_conf = new LdapUserConf();
  }

  return ($type == 'admin') ? $ldap_user_conf_admin : $ldap_user_conf;
}

Note that this function not only switches between loading a subclass and its parent class but also contains a custom caching mechanism.

An additional problem was that these classes could not declare their dependencies so that the following was not uncommon and created code which is tied together only by wishful thinking and manual testing:

module_load_include('php', 'ldap_user', 'LdapUserConf.class');
module_load_include('inc', 'user', 'user.pages');

class LdapUserConfAdmin extends LdapUserConf {

The refactoring in the D8 branch allowed us to get rid of LdapUserConfAdmin and handle its tasks directly in a standard configuration form class. While we have not yet gotten rid of LdapUserConf, the beginning of the class exemplifies how autoloading has already made a huge difference:

namespace Drupal\ldap_user;

use Drupal\Component\Utility\Unicode;
use Drupal\ldap_servers\Entity\Server;
use Drupal\ldap_servers\TokenFunctions;
use Drupal\ldap_user\Exception\LdapBadParamsException;
use Drupal\user\Entity\User;
use Drupal\user\UserInterface;

class LdapUserConf {

Second obstacle: Class and configuration structure

Due to the inefficient mechanisms above it’s no wonder that classes weren’t separated into functional groupings but rather stuck to one huge class to handle everything, with a subclass for the administration form.

Additionally, the existing classes tried to provide an interface to the complex attributes required to configure a site with LDAP authentication, user mappings, etc., and thus, we see the following:

// Creation & default value for saveable data.
public $userConflictResolve = LDAP_USER_CONFLICT_RESOLVE_DEFAULT; 
 
// Complex data structure without saveable data.
public $provisionSidFromDirection = array(
  LDAP_USER_PROV_DIRECTION_TO_DRUPAL_USER => LDAP_USER_NO_SERVER_SID,
  LDAP_USER_PROV_DIRECTION_TO_LDAP_ENTRY => LDAP_USER_NO_SERVER_SID,
); 

function __construct() {
  // Loads all saveable fields via variable_get().
  $this->load(); 

Obviously, Drupal’s configuration storage can and should take of at least all saveable values. That conversion has already made a huge difference in terms of code complexity but required painstaking manual conversions due to subtle logic differences in many places.

Also note that once we no longer have to manually load the configuration values from the database the cache for LdapUserConf is not needed anymore. We could then remove a significant amount of code dealing with setting and resetting it.

Third obstacle: Global constants

While constants can in general be useful to denote, well, constant values, their usage in the LDAP module is often problematic:

define('LDAP_USER_EVENT_SYNCH_TO_DRUPAL_USER', 1);
define('LDAP_USER_EVENT_CREATE_DRUPAL_USER', 2);
define('LDAP_USER_EVENT_SYNCH_TO_LDAP_ENTRY', 3);
define('LDAP_USER_EVENT_CREATE_LDAP_ENTRY', 4);
define('LDAP_USER_EVENT_LDAP_ASSOCIATE_DRUPAL_ACCT', 5);

For one, their numeric values are a problem since it makes reading the configuration settings difficult and cryptic:

ldap_user_conf:
   drupalAcctProvisionTriggers:
    2: '2'
    1: '1'
  ldapEntryProvisionTriggers:
    6: 0
    7: 0
    8: 0
    3: 0
  manualAccountConflict: 3
  acctCreation: 4

More importantly though they are global in scope and need to be declared at the right time, which is also why they are not normally found in context but just in bulk in the your_module.module file. Thus, they also need to be manually loaded in some cases, e.g. LdapUserConf:

require_once('ldap_user.module');

For now we have stashed these constants as static values in a trait, that makes them much easier to read and maintain:

namespace Drupal\ldap_user;

trait LdapUserConfigurationValues {
  public static $eventCreateDrupalUser = 1;
  public static $eventSyncToDrupalUser = 2;
  public static $eventCreateLdapEntry = 3;
  public static $eventSyncToLdapEntry = 4;
  public static $eventLdapAssociateDrupalAccount = 5;

Now we can just call them with either depending on the context:

LdapUserConfigurationValues::$eventCreateDrupalUser
self::$eventCreateDrupalUser

These magic values are now put somewhere safe and core’s configuration storage does the rest. Ideally, we would not need even a fraction of these magic values, but it’s a decent interim solution which avoids complex update hooks to transform legacy states into new states.

Fourth obstacle: Unit testing

Once we have had those taken care of and all those other dependencies cleanly resolved, running unit tests became feasible and lots of pseudo-code for integration tests could be removed.

For example, here is TokenTests, where we now can easily mock a function buried deep within the LDAP module:

protected function setUp() {
  parent::setUp();
  /* Mocks the Server due to wrapper for ldap_explode_dn(). */
  $this->serverFactory = $this->getMockBuilder('\Drupal\ldap_servers\Entity\Server')
  ->disableOriginalConstructor()
  ->getMock();

  $this->serverFactory->expects($this->any())
  ->method('ldapExplodeDn')
  ->willReturn($this->data);

  $this->container = new ContainerBuilder();
  $this->container->set('ldap.servers', $this->serverFactory);
  \Drupal::setContainer($this->container);
  // ...
public function testTokenReplacement() {
  // ...
  $nameReplacement = $tokenHelper->tokenReplace($account->reveal(), '[property.name]', 'user_account'); // Test calling ldapExplodeDN()
  $this->assertEquals($ldap_entry['sAMAccountName'][0], $nameReplacement);

The process of writing that test also made the code itself more robust since it was necessary to encapsulate the call to the PHP function ldap_explode_dn() within a wrapper. Due to this wrapper we can now run the test even if the LDAP extension is not installed (such as on drupal.org).

Looking back

Knowing what I know now would I do it the same way again? Definitely not.

It’s important to note that a huge amount of work was already done by larowlan, queenvictoria, and many more to get a first port successfully running. With that already in place, the existing state required a step-by-step refactoring and that is why this approach was needed. It’s also why this journey and its results are likely relevant to other module maintainers with partial ports as well.

Here is how I would now approach a complex module without an existing Drupal 8 port:

  1. Start fresh. Drupal Console makes it so easy to set up the skeletons, just use those and put all the existing code into a legacy folder.
  2. Make a plan. Make a list of each file and function and what it should do and what it currently does. Then question each if it’s still useful and needed.
  3. Migrate functions not structure. Take the useful logic from the legacy folder but question whether the architecture of your files and classes really couldn’t be structured better.
  4. Profit.

If you are still unsure of how good code should look and how to get started with refactoring, I’d recommend Clean Code for an excellent introduction.