Tech Is Hard

Credibility = Talent x Years of experience + Proven hardcore accomplishment

Category Archives: PHP

Re-key Your PHP Array with array_reduce


How much PHP code is dedicated to looping through an array of arrays (rows), comparing a search value against each row’s “column” of that name? Tons. What about re-indexing the array using the search column, so we can directly access the rows by search value?

Here’s a reducing function:

function keyBy ($reduced, $row) {
    $keyname = $reduced ['key'];
    $reduced ['data'] [$row [$keyname]] = $row;
    return $reduced;
}

The most important thing to keep in mind is the desire to obtain one value accumulated from an array. Inside keyBy, $reduced holds that value. The reduction works by accepting the currently reduced value and returning it with any updates you want. keyBy is extremely powerful because it will let me re-index an array of arrays using any column. Since an initial value for $reduced is required, I decided to make use of that argument to pass an arbitrary key name. To prevent any clashes with the data keys, I separated ‘key’ from ‘data’ in $reduced.

To “reduce” an array of rows into a direct-access array, I call keyBy by passing it to array_reduce, with the initial argument indicating which key to index by.

$customers = array (
	array ('id'    => 121, 'first' => 'Jane',	'last'  => 'Refrain'),
	array ('id'    => 290, 'first' => 'Bill',	'last'  => 'Swizzle'),
	array ('id'    =>   1, 'first' => 'Fred',	'last'  => 'Founder')
);

$customersByLastName = array_reduce ($customers, "keyBy", array ('key' => 'last'));

print_r ($customersByLastName);
print $customersByLastName ['data']['Founder']['first'];

Array (
  [key] => last
  [data] => Array (
    [Refrain] => Array (
      [id] => 121
      [first] => Jane
      [last] => Refrain)

    [Swizzle] => Array (
      [id] => 290
      [first] => Bill
      [last] => Swizzle)

    [Founder] => Array (
      [id] => 1
      [first] => Fred
      [last] => Founder)
  )
)

Fred

Isn’t that an incredibly small amount of code that gets rid of a lot of code? If the array is searched more than once, it quickly becomes extremely efficient.
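Two variations on the same idea, offered as a sketch rather than a replacement for keyBy. On PHP 5.5 or later, the built-in array_column re-keys rows directly when null is passed as the column argument. The closure version (the name keyByColumn is my own, hypothetical) keeps the key name out of the accumulator, so the result needs no separate 'data' level.

```php
<?php
// Sample data mirroring the $customers array above.
$customers = array(
    array('id' => 121, 'first' => 'Jane', 'last' => 'Refrain'),
    array('id' => 290, 'first' => 'Bill', 'last' => 'Swizzle'),
    array('id' => 1,   'first' => 'Fred', 'last' => 'Founder'),
);

// Built-in route: null keeps each whole row, and the third
// argument names the column to index by.
$byLast = array_column($customers, null, 'last');
print $byLast['Founder']['first'] . "\n";

// Closure route (hypothetical helper): the key name travels in the
// closure via use(), so the reduced value is just the re-keyed rows.
function keyByColumn(array $rows, $keyname) {
    return array_reduce($rows, function ($reduced, $row) use ($keyname) {
        $reduced[$row[$keyname]] = $row;
        return $reduced;
    }, array());
}

$byId = keyByColumn($customers, 'id');
print $byId[290]['last'] . "\n";
```

Either way, rows that share a key value overwrite each other, so this only suits columns whose values are unique.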


World’s Most Efficient Implementation of Discrete Timers in PHP


Well it may not be THE most efficient, but it’s pretty close. I’ve seen a lot of timer code in my time, and it always looks too elaborate for what it needs to do. After all, if I’m using timers then I’m probably very concerned about how long things take — I don’t want to add much overhead to track it. I came up with this code during an important project to allow me to record how long certain functionality was taking.

I wanted something that would be as simple and lightweight as possible. In order to maintain discrete timers that can’t interfere with each other, many timer implementations require instantiating a new instance of some class. Instead, I opted for a static array in my timer function, with the index of the array as a unique timer “name”; this serves the same purpose. With this function, I can print out how long the overall script has been running, at any time. I can create a new timer (which also starts it running), print how long since that timer’s been running and optionally keep the timer’s value or reset it each time I access its current value.

We need an empty static array to hold timers. The first step is to get the current system time. By passing true to microtime(), we get a float representing the number of seconds for the current time. When no timer name (the index in our static array) is passed in, we just need to subtract the script start time from current and return that.

If there is a timer name passed, that’s when things get creative. We have to get the time value from the last call with this timer, or null if the timer hasn’t been used before. If we’re initializing the timer, set it to the current system time and return 0.0 (to force the type) for its value. If the timer exists and we’re resetting (which is the default) it, set it to the current system time.

Finally we return the difference between now and the last call.

I should note that when using elapsed() for the script’s overall execution time, the value is returned in seconds, otherwise the return value is milliseconds (prevents having extremely small numeric values).

/**
* Return the elapsed milliseconds since the timer was last reset.
*
* If a timer "name" is not specified, then returns the elapsed time since the
* current script started execution. This script timer can not be "reset".
* THIS DEFAULT SCRIPT TIMER RETURNS IN SECONDS
*
* Always "resets" the specified timer unless you specify false for the reset
* parameter. Of course, on the first call for a particular timer, it will always
* reset to the time of the call.
*
* examples:
* elapsed(__FUNCTION__); // using __FUNCTION__ or __METHOD__ makes it easy to be unique
* ...
* ...
* echo "It took " . elapsed(__FUNCTION__)/1000 . " seconds.";
*
* @param string $sTname Name of the timer
* @param boolean $bReset Do you want to reset this timer to 0?
* @return float Elapsed time since timer was last reset
*/
function elapsed($sTname = null, $bReset = true) {

    static $fTimers = array(); // To hold "now" from previous call
    $fNow = microtime(true); // Get "now" in seconds as a float

    if (is_null($sTname))
        return ($fNow - $_SERVER['REQUEST_TIME']);

    $fThen = isset($fTimers[$sTname]) ? $fTimers[$sTname] : null; // Copy over the start time, so we can update to "now"

    if (is_null($fThen) || $bReset) {
        $fTimers[$sTname] = $fNow;
        if (is_null($fThen))
            return 0.0;
    }
    return 1000 * ($fNow - $fThen);
}

printf (
    "Since script started %f Create 2 new timers 'fred' %f and 'alice' %f\n",
    elapsed(), elapsed('fred'), elapsed('alice'));

for ($i = 0; $i <100; $i++) {
    printf (
        "Since script started %f 'fred' gets reset %f, but 'alice' doesn't %f\n",
        elapsed(), elapsed('fred'), elapsed('alice', false));
}

Since script started 0.391466 Create 2 new timers 'fred' 0.000000 and 'alice' 0.000000
Since script started 0.391521 'fred' gets reset 0.041962, but 'alice' doesn't 0.037909
Since script started 0.391545 'fred' gets reset 0.021935, but 'alice' doesn't 0.056982
Since script started 0.391563 'fred' gets reset 0.019073, but 'alice' doesn't 0.075102
Since script started 0.391578 'fred' gets reset 0.015020, but 'alice' doesn't 0.090122
Since script started 0.391594 'fred' gets reset 0.015020, but 'alice' doesn't 0.104904
Since script started 0.391609 'fred' gets reset 0.015974, but 'alice' doesn't 0.119925
Since script started 0.391624 'fred' gets reset 0.015020, but 'alice' doesn't 0.134945
Since script started 0.391639 'fred' gets reset 0.015020, but 'alice' doesn't 0.149965
Since script started 0.391654 'fred' gets reset 0.015020, but 'alice' doesn't 0.164986
Since script started 0.391669 'fred' gets reset 0.014782, but 'alice' doesn't 0.180006
Since script started 0.391684 'fred' gets reset 0.015020, but 'alice' doesn't 0.195980
Since script started 0.391699 'fred' gets reset 0.015020, but 'alice' doesn't 0.211000

I think it’s convenient to use a pattern like

function foobar () {
    elapsed (__METHOD__);
    // lots of code
    $timeused = elapsed (__METHOD__);
}
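One possible refinement, sketched under assumptions of my own: $_SERVER['REQUEST_TIME'] holds whole seconds, while PHP 5.4 and later also provide $_SERVER['REQUEST_TIME_FLOAT'] with microsecond precision. The variant below (elapsedMs is a hypothetical name, not part of the original code) uses it and returns milliseconds in both cases, so callers never have to remember which unit they got.

```php
<?php
/**
 * Hypothetical variant of elapsed(): always returns milliseconds.
 */
function elapsedMs($name = null, $reset = true) {
    static $timers = array();      // start times, indexed by timer name
    $now = microtime(true);

    if ($name === null) {
        // Script timer: REQUEST_TIME_FLOAT (PHP 5.4+) keeps the fraction.
        return 1000 * ($now - $_SERVER['REQUEST_TIME_FLOAT']);
    }

    $then = isset($timers[$name]) ? $timers[$name] : null;
    if ($then === null || $reset) {
        $timers[$name] = $now;     // create the timer, or restart it
    }
    return $then === null ? 0.0 : 1000 * ($now - $then);
}

elapsedMs('work');                 // first call creates and starts the timer
usleep(20000);                     // stand-in for real work (about 20 ms)
printf("work took %.1f ms, script has run %.1f ms\n", elapsedMs('work'), elapsedMs());
```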

Populate a PHP Array in One Assignment


Imagine you are returning an array you need to populate with some values. This is typical:

    $myArray = array();
    $myArray['foo'] = $foo;
    $myArray['bar'] = $bar;

There are advantages to populating the array as a single assignment:

    $myArray = array(
        'foo' => $foo,
        'bar' => $bar
    );

Separating the assignment statements from the initialization (and the initialization to an empty array might be many lines earlier) makes it easier to corrupt the entries in the array; code might later be added in between the original lines. People may add things to $myArray without documenting it. And, just as in Where to Declare, I’ll have to scan more code to determine the effect of making changes to $myArray. The second way shows the _entire_ value of $myArray being set at once, instead of parts of it. It presents a visual of the return structure and makes it harder for $myArray to go awry. I don’t have to look any further to see what $myArray is at that point.
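As a small illustration, here is a function that returns its whole structure in one expression. The function name and keys are hypothetical, chosen only for this sketch.

```php
<?php
// Hypothetical function; the names are illustrative only.
function describeOrder($orderId, $customerName, array $items) {
    // The entire return structure is visible in one place.
    return array(
        'order_id' => $orderId,
        'customer' => $customerName,
        'count'    => count($items),
        'items'    => $items,
    );
}

print_r(describeOrder(7, 'Jane Refrain', array('widget', 'gadget')));
```

Anyone adding a new entry has exactly one place to put it, and a reader sees every key without scanning the rest of the function.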

Where to Declare


I think it’s common practice to declare (and sometimes initialize) all of a function’s local variables “at the top”, but I usually declare them close to where they’re referenced. Declaring at the top can lead to what I call functional striation, and it makes refactoring more difficult.

An example:

function innocent_looking_foo() {
    var A, B, C;
	
    do (something with A)
	
    do (something with B)
	
    do (something with C)
	
    return
}

Right from the outset, it looks like this function may not be very cohesive, but the truth is, one sees a lot of code that looks like this in the real world. Let’s say someone adds D and ‘somethingelse’ now has to be done with a few of the variables:

function innocent_looking_foo() {
    var A, B, C, D;
	
    do (something with A)
	
    do (something with B)
    do (somethingelse with B)
	
    do (something with C)

    do (something with D)
    do (somethingelse with D)
	
    return
}

or worse(?)

function innocent_looking_foo() {
    var A, B, C, D;
	
    do (something with A)
    do (something with B)
    do (something with C)
    do (something with D)

    do (somethingelse with B)
    do (somethingelse with D)
	
    return
}

In some shops, this kind of growth continues for years until no one remembers if we really need to do each thing to what. And we have people who add “do (anewthing with D)” in between A and B. Remember that the “do” pseudo statements are usually represented by more than one line of code in our function. If I am working on any part of this, I have to examine the preceding code, all the way to the top, for references to the variable I’m concerned with. There may be dozens or, let’s be honest, hundreds of lines where a variable can be corrupted. There is more chance that the order of execution matters, which is bad if we can avoid it.

Now read this:

function foo() {
    var A
    do (something with A)
	
    var B
    do (something with B)
    do (somethingelse with B)
	
    var C
    do (something with C)
	
    var D
    do (anewthing with D)
    do (something with D)
    do (somethingelse with D)	
	
    return
}

This sort of aligns the code with the variables it’s doing work to, and would be easier to refactor as it gets more complex. It’s easier to see repeating patterns and turn them into callable functions.

This guideline is probably even more applicable to variable initialization, irrespective of where it’s declared. Try not to separate the initialization from the first reference.
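The pseudocode above could look like this in PHP. This is only a sketch with a hypothetical function; each variable is initialized right where it is first used, so each small block could be extracted into its own function without hunting for declarations.

```php
<?php
// Hypothetical report function, for illustration only.
function summarize(array $prices) {
    $total = array_sum($prices);

    $count = count($prices);
    $average = $count > 0 ? $total / $count : 0.0;

    $highest = $count > 0 ? max($prices) : null;

    return array('total' => $total, 'average' => $average, 'highest' => $highest);
}

print_r(summarize(array(10, 20, 30)));
```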

PHP Property Manager: __pm class


class __pm {

    // internal meta-management

    /**
     * Unsets the indicated properties in client class
     *
     * @param  object  $z
     */
    static function init($z) {
        $oReflectClass = new ReflectionClass(get_class($z));

        // For each public property
        foreach ($oReflectClass->getProperties(ReflectionProperty::IS_PUBLIC) as $oReflectProperty) {
            if (!($sDocBlock = $oReflectProperty->getDocComment()) || false === strpos($sDocBlock, " @pm")) continue;

            $propertyInfo = self::getDocBlockInfo($sDocBlock, $pn = $oReflectProperty->getName());
            unset($z->{$pn});
        }
    }

    /**
     * Returns an array of info from a docblock
     *
     * @param  string $sDocComment  docblock
     * @param  string $pn           property name
     * @return array                parsed docblock
     */
    public static function getDocBlockInfo($sDocComment, $pn) {

        // get the "short description" from docblock (start in position 3)
        preg_match("/\.*\s\*\s*([^\n]*)/m", $sDocComment, $desc, 0, 3);
        $return = array('label' => isset($desc[1]) ? $desc[1] : $pn);

        preg_match_all("/[^@]+@(\S+)\s*(\S+)?\s*([^@]+)?\n/", $sDocComment, $matches, PREG_SET_ORDER, 3);
        foreach ($matches as $tokens) {
            // token name
            $token = $tokens[1];

            // if we only have the @token, then its value is true
            if (count($tokens) == 2) {
                $return[$token] = true;
            }
            // otherwise we need arguments to @token as the value
            else {
                array_shift($tokens); array_shift($tokens);
                $return[$token] = $tokens;
            }
        }
        return $return;
    }

    // property accessors

    /**
     * Public property accessor
     *
     * @param  mixed   $z
     * @param  string  $pn
     * @return mixed
     */
    static function get($z, $pn) {
        self::trace();
        return null;
    }

    /**
     * Public property accessor
     *
     * @param  object  $z
     * @param  string  $pn
     * @param  mixed   $pv
     * @return mixed   Passed value
     */
    static function set($z, $pn, $pv) {
        self::trace();
        return $pv;
    }

init should be called during construction of your object to set up everything for the properties in the calling class that are marked with @pm.

getDocBlockInfo extracts metadata about the property from the docblock and returns an array.  This metadata combined with other reflection will control how we set up this property. So far it looks like:

(
    [label] => Distribution Center
    [pm] => 1
    [var] => Array
        (
            [0] => DistCenter
        )
)
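To see the tag detection in isolation, here is a minimal sketch that uses Reflection to find which public properties carry an @pm tag. The Sample class is hypothetical, and the check is a plain substring test, simpler than the parsing done in getDocBlockInfo.

```php
<?php
// Hypothetical class used only for this illustration.
class Sample {
    public $id;

    /**
     * Distribution Center
     *
     * @pm
     * @var DistCenter
     */
    public $DistCenter;
}

$managed = array();
$reflect = new ReflectionClass('Sample');
foreach ($reflect->getProperties(ReflectionProperty::IS_PUBLIC) as $prop) {
    $doc = $prop->getDocComment();        // false when there is no docblock
    if ($doc !== false && strpos($doc, '@pm') !== false) {
        $managed[] = $prop->getName();    // only tagged properties are collected
    }
}

print_r($managed);
```

Note that getDocComment() depends on docblocks being retained at compile time; with opcache configured to discard comments it returns false for every property.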

trace() will log a formatted stack trace using trace_line().  (I’m only putting this functionality in __pm for convenience.)

    // -- logging

    /**
     *
     * Logs a trace to the point where called
     *
     * @param  mixed  String to print | TODO function that returns
     *                a string.
     *                (Arguments and return values are
     *                AUTOMATICALLY printed.)
     * @return void
     */
    static function trace() {

        $argc = count($argv = func_get_args());

        $backtrace = debug_backtrace();
        // default order is inside out, so this puts it in "top to bottom"
        $backtrace = array_reverse($backtrace);

        $indent = 0;

        // get rid of weird google analytic request var
        $request = $_REQUEST;
//      unset($request['XXX']); // get rid of anything you don't always want logged

        $file = isset($backtrace[0]['file']) ? $backtrace[0]['file'] : '';
        $description = "<?php {$file}" . ($argc && count($backtrace) < 2 ? " [{$backtrace[0]['line']}] - {$argv[0]}" : '');

        if (!empty($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest')
            $description = "AJAX request - " . $description;

        self::line("{$description} \$_REQUEST: " . self::prin_r($request, true));

        // get rid of the last stack entry for having called the log method, itself
        array_pop($backtrace);
        if (count($backtrace) > 0) {
            for ($i = 0, $stackCount = count($backtrace) - 1; $i < $stackCount; $i++) {
                self::line(self::trace_line($indent, $backtrace[$i]));
            }
            self::line($argc ? self::trace_line($indent, $backtrace[$i], $argv[0]) : self::trace_line($indent, $backtrace[$i]));
        }
    }

    /**
     * Returns a single entry.
     *
     * @param  integer  $indent
     * @param  array    $trace_entry
     * @param  string   $argv[2]
     * @return string
     */
    private static function trace_line(&$indent, $trace_entry) {

        $line = array_key_exists('line', $trace_entry) ? "[{$trace_entry['line']}] " : '';
        $verb = $trace_entry['function'];

        if (array_key_exists('class', $trace_entry)) {
            $classfile = $trace_entry['class'];
            $verb = "->" . $verb;
            if (array_key_exists('object', $trace_entry) and $object = $trace_entry['object']) {
                $id = property_exists($object, 'id') ? $object->id : false;
                // Assumed reconstruction: this expression was garbled in the source.
                $classname = get_class($object);
                $class = "<{$classname}" . ($id ? " #{$id}>" : '>');
                if ($classfile != $classname)
                    $class .= " {$classfile}";
                $verb = $class . $verb;
            }
            else
                $verb = $classfile . $verb;
        }
        else {
            $file = isset($trace_entry['file']) ? substr($trace_entry['file'], self::$_baseDirLength) : 'unknown file';
            $verb = "{$file} " . $verb;
        }
        $s = str_repeat(" ", $indent) . "{$line}{$verb}(" . self::prin_r($trace_entry['args']) . ")";
        $indent += strlen($line);

        if (count($argv = func_get_args()) > 2)
            return $s . "=" . self::prin_r($argv[2], true);
        else
            return $s;
    }

With a couple of useful output and formatting functions:

    static function line($string) {
        $line_start = date("m/d/Y H:i:s ");
        error_log("{$line_start}{$string}\n", 3, "__pm.log");
    }

    static function prin_r($arg) {
        $oneLine = preg_replace("/\s*\n\s*/" , ' ', print_r($arg, true));
        return gettype($arg) == 'object' ? '<'.$oneLine.'>' : $oneLine;
    }
}
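The core of trace() is reversing debug_backtrace() and dropping the frame for the logging call itself. A stripped-down sketch of just that step, with hypothetical function names:

```php
<?php
// Hypothetical helper: returns the names of the calling functions,
// outermost first, leaving out the helper's own frame.
function callChain() {
    // debug_backtrace() lists the innermost call first; reverse it.
    $frames = array_reverse(debug_backtrace());
    array_pop($frames);                   // drop the callChain() frame itself
    $names = array();
    foreach ($frames as $frame) {
        $names[] = $frame['function'];
    }
    return $names;
}

function inner() { return callChain(); }
function outer() { return inner(); }

print implode(' -> ', outer()) . "\n";
```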

PHP Property Manager: User and Model client classes


It will make things go faster to pack a lot of new things into this. User now looks like this:

require_once 'class.Model.php';

class User extends Model {

    public $id;

    /**
     * Distribution Center
     *
     * An object reference to $this User's distribution center object
     *
     * @pm
     * @var DistCenter
     */
    public $DistCenter;

    function __construct($id) {
        parent::__construct();
        $this->id = $id;
    }
}

Assuming that your class hierarchy might have a base class, I’ve added a base class of Model to User. (Remember, in my original implementation, the base class held all the property management, but I don’t want that restriction.) We’ve marked $DistCenter as a managed property with @pm and given it a type. Model should be doing most of the interfacing with __pm; it looks like:

require_once 'class.__pm.php';

class Model {

    function __construct() {
        __pm::init($this);
    }

    /**
     * Generic getter
     *
     * Called every time a @pm property is referenced
     *
     * @param  string  $prop  Name of object property
     * @return mixed          Value of property.
     */
    public function __get($prop) {
        try {
            return __pm::get($this, $prop);
        }
        catch (Exception $e) {
            error_log($e);
            throw $e;
        }
    }

    /**
     * Generic setter
     *
     * Called every time a @pm property of an object is set
     *
     * @param  string   $prop   Name of object property
     * @param  mixed    $value  Value to set object property to
     * @return mixed
     */
    function __set($prop, $value) {
        try {
            return __pm::set($this, $prop, $value);
        }
        catch (Exception $e) {
            error_log($e);
            throw $e;
        }
    }
}

By supplying __get and __set interceptor functions, your class hierarchy can make use of __pm class. I’m adding logging of any Exception that’s thrown by __pm.

What Happened to my ID?


So far, User looks like

class User {
   public $id;
   public $DistCenter;
   function __construct($id) {
      $this->id = $id;
      Foobar::init($this);
   }
}

Now Foobar is unsetting $id, which could be fixed by swapping the lines in User’s constructor. But I don’t feel comfortable that this is the only public property I’m going to want to protect and I think the programmer should explicitly declare which properties should be handled by Foobar. I also know I want to use PHPDoc in a disciplined way to make these classes self describing.

Let’s create one for User::DistCenter.

/**
 * Distribution Center
 *
 * An object reference to $this User's distribution center object
 *
 * @foo
 * @var DistCenter
 */

The first line, the short description, will be used for labels and error messages. @var is a standard tag and the type DistCenter will be our class name for distribution centers. @foo is going to be our special tag.

What Am I Modeling?


One reason I like working on legacy systems is that the current intent can be pretty much gleaned from the code and its behavior.  You have to assume it’s working for the most part and when you run into things that look in error, you determine whether there’s a non-obvious reason for it being that way, or it gets confirmed as a problem.  I’m going to use an example business model, but keep in mind that eventually we move all the generic functionality away from the business domain.  I think the example classes might be interesting to look at also.

The classes I was pretty clear on were regions, distribution centers and users. While there were separate data access modules and service layer modules, cohesion in the latter, a layer that should have been using powerful classes to do all the repetitive logic, was non-existent. There was absolutely no meaningful state in these classes; they were just a loose collection of related functions. Each function repeated many of the same data access calls. With more than one data method to do the same thing, I had to go check the arguments each time and make sure I was using the best available method. It was terribly granular code, which made people take shortcuts and make assumptions where it was convenient. Watching the data access log, I could also see that we queried for the same data items many times during a single page load. I’ve always believed that if you make the common things very easy to do, then you can focus on making your application code really clear and robust.

The 3 entities I listed are in a container relationship: Users are in a single DistCenter and a DistCenter is in a Region.  Starting with a User, instead of explicitly fetching a row in every method that needs to reference the user’s distribution center, I asked something like: “why can’t I instantiate a User with the id parameter and reference $myUser->DistCenter ?”

Let’s start with User, then.

class User {
   public $id;    // user's id and key everywhere
   public $DistCenter;    // reference to $this User's Center
   function __construct($id) {
      $this->id = $id;
   }
   function __get($p) {
   }
   function __set($p, $v) {
   }
}

But when I write

$User = new User($id);
$Center = $User->DistCenter;

__get doesn’t execute. I want my properties defined explicitly in the classes, but how can I make __get think they’re undefined? After trying null and other tricks, I found that I could unset() them during construction and get my desired result.

class User {
...
   function __construct($id) {
      $this->id = $id;
      __pm::init($this);
   }
}

class __pm {
   /**
    * Unsets the indicated properties in client class
    *
    * @param  object  $O
    */
   static function init($O) {
      $classType = get_class($O);
      $oReflectClass = new ReflectionClass($classType);

      // For each public property
      foreach ($oReflectClass->getProperties(ReflectionProperty::IS_PUBLIC) as $oReflectProperty) {
         $pName = $oReflectProperty->getName();
         unset($O->{$pName});
      }
   }
}

__pm is the new class to automate all the intelligent property handling. We will have to modify User’s inherent definition a little as we go along to let __pm work, and those changes will become the pattern for any class.
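The unset() trick is easy to verify on its own. The sketch below uses a hypothetical Lazy class, not the User or __pm code from this series. It shows that reading an unset, declared property reaches __get, and that assigning the property inside __get re-creates it, so later reads skip the interceptor. The string stands in for a real database fetch.

```php
<?php
// Hypothetical class for illustration; not the User or __pm classes.
class Lazy {
    public $id;
    public $DistCenter;
    public $loads = 0;        // counts how often the "expensive" load runs

    function __construct($id) {
        $this->id = $id;
        unset($this->DistCenter);     // declared, but now inaccessible
    }

    function __get($prop) {
        $this->loads++;
        // Stand-in for a database fetch; assigning re-creates the
        // property, so later reads bypass __get entirely.
        $this->$prop = "center-for-user-{$this->id}";
        return $this->$prop;
    }
}

$u = new Lazy(7);
print $u->DistCenter . "\n";   // first read goes through __get
print $u->DistCenter . "\n";   // second read hits the real property
print $u->loads . "\n";
```

This applies to untyped properties like the ones used here.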

Writing a Powerful PHP Property Manager Class


I’m going to recreate the process I went through in writing a side class that automates everything I could think of to intelligently support powerful business objects. I think it does an awesome job of balancing performance with “language conformity”: the idea that any specialized library or framework should not require you to configure anything to make it work. You shouldn’t have to put up with the disjointedness of “add this URL to that array” and “put the file in this particular directory”. No black box stuff, although sometimes using PHP interceptor functions, along with inheritance, can seem like magic.

“The only way to get a practical and high performing product is to fully exploit the available technology.”

After working with PHP for about 6 months in a generic “framework”, I decided that, as a way to learn OO PHP, I would try to write some classes that represented the business’ entities and hid a lot of the repeated data access in them. (The framework was homegrown but, just like other popular frameworks in its organization, it assumed a non-business-aware stance. While I am 100% behind generic implementation, it is clear that this type of thinking as a system structure leads to a non-cohesive codebase.)

Obviously I also wanted some way to only access the DB when necessary and to cache results during the server trip as object state. I also wanted centralized validation. Having each new method define its own validation rules, the way we were doing it, seemed insane to me. The set of types to validate was finite, so why keep defining what a “retailer_id” must look like, right? And with the rules being repeated and scattered, inconsistencies were guaranteed.

As bonus goals, I wanted auto-complete functionality to be available in supporting IDEs and I wanted to be able to generate detailed class documentation that really gave a picture of the business domain.

What I’m going to try here was originally implemented within the base class of all the business classes. I realized to make it useful it needed to be in an auxiliary class, but had only partially implemented it that way (but with a lot of new and improved features like memcache and validation built in). It was originally written using PHP 5.1.something, I think, so there might be more refinements that would make the interface cleaner. I always kept the magic to a minimum and only used Reflection for initialization (unlike Yii, for instance, which relies heavily on it — I have to think that’s not cheap).
