Tech Is Hard

Credibility = Talent x Years of experience + Proven hardcore accomplishment

Rebuilding existing systems


My favorite thing is to rewrite a legacy system. Not a “rewrite the whole system, flip a switch and we’re using it” rewrite, but an incremental process that might take 12 to 18 months before you have a new system, without ever having to migrate. I’m calling it “rebuilding” to keep with all the construction analogies we use, because it involves re-engineering, re-architecting and rewriting. We should admit that most legacy systems, if compared to a building, are like the Winchester house. “Remodeling” might be a better term, because you keep the system functioning the whole time it’s being rewritten. But in the process, we demolish entire rooms and floors, rebuilding new ones that make sense.

Sometimes I describe it as performing a complete organ transplant while keeping the patient alive. Actually the patient’s health keeps improving as we operate.

I’ve been doing it since the beginning of my programming career, because it’s in my nature to always see “what’s missing” from the current system. When that natural inclination is combined with the universal theme of “Don’t touch it if you don’t have to. Don’t fix it if it ain’t broke.”, a person learns how to incrementally shim new ways of doing things into a software system.

At no fewer than two companies, I completely changed the internal architecture of existing systems this way. Huge systems, some in assembler and C. During the process, the way new software was written changed a lot, with tools and libraries of [macros | functions | classes] to remove the drudgery, hide the housekeeping, bring new expressiveness and shrink the code.

Smaller code is better. As you see a larger and larger view of your software system and as the file system structure and classes begin to describe the process of your business system, dramatic gains are made. Applications behave consistently and users are happy. Error handling gets the attention it deserves (things go wrong a lot more than we like to think). As old PITAs go away, hidden inside pre-written code, we begin to think and write more elegantly. We start to imagine practical implementations where it seemed impossible before.

What’s been my secret weapon for doing this successfully? I admit that in my mind, rebuilding an existing system is easier than building something brand new. I have a running model of what it should do, or at least mostly do. The transformation becomes a process of changing, checking for regressions and, in the case of something that never worked right to begin with, validating the new behavior.

I always work with a library mentality. Frameworks that hem everything into an arbitrary predetermined structure are for saps, and they would require a top-down approach, which gets you back to trying to rewrite the system from scratch. You have to work from the inside out, bottom up, to change the existing system. Wherever you create something that will be used for new development, you have to bounce back and forth between the most efficient and maintainable layering of internal functionality and making the application-level interface radically simpler and/or more powerful than existing ways of doing the job.

To inject the new into an old system, we also have to be ready to write things that we will throw away, while still devoting full attention to their quality, lest we regress.

How to Easily and Consistently Round Up with Integer Division


I’ve encountered this in different languages for different purposes.  The first time was some assembler code that calculated how many total bytes it would take to hold an arbitrary number of bits.  The problem is it’s usually coded the way we think about the problem: divide, then if there’s a remainder, add 1 to the quotient.

A quick, easy way, without any special checking, is to add (divisor - 1) to the dividend before doing the division.  In the bits-to-bytes example, I changed the code to do it this way

howManyBytes = (howManyBits + 7)/8

Instead of this way

howManyBytes = howManyBits/8
if (howManyBytes*8 < howManyBits)
  howManyBytes ++

You’re guaranteed to get the rounded-up quotient without any extra work.
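The trick generalizes to any positive divisor. Here’s a minimal PHP sketch (the helper name is mine, just for illustration) using PHP 7’s intdiv() so the division stays integral:

function ceilDiv ($dividend, $divisor) {
    // Adding (divisor - 1) pushes any non-zero remainder up to the next multiple,
    // while exact multiples are left alone. Assumes a non-negative dividend and a positive divisor.
    return intdiv ($dividend + $divisor - 1, $divisor);
}

echo ceilDiv (17, 8), "\n"; // 3 -- 17 bits need 3 bytes
echo ceilDiv (16, 8), "\n"; // 2 -- exact multiples aren't rounded up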

Re-key Your PHP Array with array_reduce


How much PHP code is dedicated to looping through an array of arrays (rows), comparing a search value against each row’s “column” of that name? Tons. What about re-indexing the array using the search column, so we can directly access the rows by search value?
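For contrast, here’s a hedged sketch (the function and field names are just illustrative) of the kind of linear search this post sets out to eliminate:

function findRowByColumn (array $rows, $column, $value) {
    // Scan every row until the wanted column matches -- O(n) on every single lookup.
    foreach ($rows as $row) {
        if ($row [$column] === $value)
            return $row;
    }
    return null; // not found
}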

Here’s a reducing function:

function keyBy ($reduced, $row) {
    $keyname = $reduced ['key'];
    $reduced ['data'] [$row [$keyname]] = $row;
    return $reduced;
}

The most important thing to keep in mind is that the goal is to obtain one value accumulated from an array. Inside keyBy, $reduced holds that value. The reduction works by accepting the currently reduced value and returning it with any updates you want. keyBy is extremely powerful because it lets me re-index an array of arrays using any column. Since an initial value for $reduced is required, I decided to make use of that argument to pass an arbitrary key name. To prevent any clashes with the data keys, I separated ‘key’ from ‘data’ in $reduced.

To “reduce” an array of rows into a direct-access array, I call keyBy by passing it to array_reduce, with the initial argument indicating which key to index by.

$customers = array (
	array ('id'    => 121, 'first' => 'Jane',	'last'  => 'Refrain'),
	array ('id'    => 290, 'first' => 'Bill',	'last'  => 'Swizzle'),
	array ('id'    => 001, 'first' => 'Fred',	'last'  => 'Founder')
);

$customersByLastName = array_reduce ($customers, "keyBy", array ('key' => 'last'));

print_r ($customersByLastName);
print $customersByLastName ['data']['Founder']['first'];

Array (
  [key] => last
  [data] => Array (
    [Refrain] => Array (
      [id] => 121
      [first] => Jane
      [last] => Refrain)

    [Swizzle] => Array (
      [id] => 290
      [first] => Bill
      [last] => Swizzle)

    [Founder] => Array (
      [id] => 1
      [first] => Fred
      [last] => Founder)
  )
)

Fred

Isn’t that an incredibly small amount of code that gets rid of a lot of code? If the array is searched more than once, it quickly becomes extremely efficient.

World’s Most Efficient Implementation of Discrete Timers in PHP


Well it may not be THE most efficient, but it’s pretty close. I’ve seen a lot of timer code in my time, and it always looks too elaborate for what it needs to do. After all, if I’m using timers then I’m probably very concerned about how long things take — I don’t want to add much overhead to track it. I came up with this code during an important project to allow me to record how long certain functionality was taking.

I wanted something that would be as simple and lightweight as possible. In order to maintain discrete timers that can’t interfere with each other, many timer implementations require instantiating a new instance of some class. Instead, I opted for a static array in my timer function, with the index of the array as a unique timer “name”; this serves the same purpose. With this function, I can print out how long the overall script has been running, at any time. I can create a new timer (which also starts it running), print how long that timer has been running and optionally keep the timer’s value or reset it each time I access its current value.

We need an empty static array to hold timers. The first step is to get the current system time. By passing true to microtime(), we get a float representing the number of seconds for the current time. When no timer name (the index in our static array) is passed in, we just need to subtract the script start time from current and return that.

If there is a timer name passed, that’s when things get creative. We have to get the time value from the last call with this timer, or null if the timer hasn’t been used before. If we’re initializing the timer, set it to the current system time and return 0.0 (to force the type) for its value. If the timer exists and we’re resetting it (which is the default), set it to the current system time.

Finally we return the difference between now and the last call.

I should note that when using elapsed() for the script’s overall execution time, the value is returned in seconds; otherwise the return value is in milliseconds (which prevents having extremely small numeric values).

/**
* Return the elapsed milliseconds since the timer was last reset.
*
* If a timer "name" is not specified, then returns the elapsed time since the
* current script started execution. This script timer can not be "reset".
* THIS DEFAULT SCRIPT TIMER RETURNS IN SECONDS
*
* Always "resets" the specified timer unless you specify false for the reset
* parameter. Of course, on the first call for a particular timer, it will always
* reset to the time of the call.
*
* examples:
* elapsed(__FUNCTION__); // using __FUNCTION__ or __METHOD__ makes it easy to be unique
* ...
* ...
* echo "It took " . elapsed(__FUNCTION)/1000 . " seconds."
*
* @param string $sTname Name of the timer
* @param boolean $bReset Do you want to reset this timer to 0?
* @return float Elapsed time since timer was last reset
*/
function elapsed($sTname = null, $bReset = true) {

    static $fTimers = array(); // To hold "now" from previous call
    $fNow = microtime(true); // Get "now" in seconds as a float

    if (is_null($sTname))
        return ($fNow - $_SERVER['REQUEST_TIME']);

    $fThen = isset($fTimers[$sTname]) ? $fTimers[$sTname] : null; // Copy over the start time, so we can update to "now"

    if (is_null($fThen) || $bReset) {
        $fTimers[$sTname] = $fNow;
        if (is_null($fThen))
            return 0.0;
    }
    return 1000 * ($fNow - $fThen);
}

printf (
    "Since script started %f Create 2 new timers 'fred' %f and 'alice' %f\n",
    elapsed(), elapsed('fred'), elapsed('alice'));

for ($i = 0; $i < 100; $i++) {
    printf (
        "Since script started %f 'fred' gets reset %f, but 'alice' doesn't %f\n",
        elapsed(), elapsed('fred'), elapsed('alice', false));
}

Since script started 0.391466 Create 2 new timers 'fred' 0.000000 and 'alice' 0.000000
Since script started 0.391521 'fred' gets reset 0.041962, but 'alice' doesn't 0.037909
Since script started 0.391545 'fred' gets reset 0.021935, but 'alice' doesn't 0.056982
Since script started 0.391563 'fred' gets reset 0.019073, but 'alice' doesn't 0.075102
Since script started 0.391578 'fred' gets reset 0.015020, but 'alice' doesn't 0.090122
Since script started 0.391594 'fred' gets reset 0.015020, but 'alice' doesn't 0.104904
Since script started 0.391609 'fred' gets reset 0.015974, but 'alice' doesn't 0.119925
Since script started 0.391624 'fred' gets reset 0.015020, but 'alice' doesn't 0.134945
Since script started 0.391639 'fred' gets reset 0.015020, but 'alice' doesn't 0.149965
Since script started 0.391654 'fred' gets reset 0.015020, but 'alice' doesn't 0.164986
Since script started 0.391669 'fred' gets reset 0.014782, but 'alice' doesn't 0.180006
Since script started 0.391684 'fred' gets reset 0.015020, but 'alice' doesn't 0.195980
Since script started 0.391699 'fred' gets reset 0.015020, but 'alice' doesn't 0.211000

I think it’s convenient to use a pattern like

function foobar () {
    elapsed (__METHOD__);
    // lots of code
    $timeused = elapsed (__METHOD__);
}

Good Is As Good Does


If the technical situation in the environment is not getting better over time, then the team’s output is not good. If the output is not good, what are we doing it for? And can we call ourselves good if our output isn’t?

Making Mongoose Keep Its Promises on the Server


So far I really like the deferred/promise model. On the client it lets me get data as early as I want to, but defer using the response until it’s convenient; maybe not until inside some event handler. I can also place a call to .fail() for any promise in a convenient place, even before I make a call for data. The Deferred object coordinates calling any registered fail function(s) if anything errors — that’s right, you can specify multiple handlers for the failure — and all the done, then and always functions that have been registered, when appropriate.

To do the same thing on the server, I found jQuery Deferred for nodejs. After requiring it

var Deferred = require( "JQDeferred" );

and with a little function in my application

function promise (query) {
  var dfd = Deferred ();

  query .run (function (err, doc) {
    if (doc) {
      if (doc.length > 0 || Object.keys(doc).length > 0) {
        console .log ("found\n", util .inspect(query));
        dfd .resolve (doc);
      }
      else {
        console .log ("not found or no rows\n", util .inspect(query)); 
        dfd .resolve (null);
      }
    }
    else {
      console .log ("error or no data object\n", util .inspect(query));
      dfd.reject (err ? err : null);
    }
  });
  return dfd .promise ();
}

This works for any Mongoose method that returns a query, like find or findOne. It’ll need to change to fold in the rest of my calls. In one function, I need to get 3 rows to verify something before I take action. Using jQuery Deferred and this promise function, dealing with 3 asynchronous calls takes very little code:

var CloseAll = function (query, resp) {

  Deferred .when (
    promise (modelA .findOne ({modelAkey:query.body.A})),
    promise (modelB .findOne ({modelBkey:query.body.B})),
    promise (modelC .findOne ({modelCkey:query.body.C})))
    .fail (function (err)    { resp .send (_.extend (query.body, err), 500); })
    .done (function (dataA, dataB, dataC) {
      // now do stuff that requires all 3 things
    });
};

Pretty sweet.

Promises, promises. A concise pattern for getting and showing my JSON array with jQuery and Underscore


I’m forgetting the async/sync stuff from before in favor of thenables: Deferred and its more polite personality, the promise. There are just certain things that I think would be really tough to do with a callback architecture.

Normally, if I were showing a view of something I wouldn’t $.get() it from the server until my DOM is loaded, because I can’t do anything with the response until I’ve got elements to manipulate. But with jQuery’s implementation of AJAX calls, which return a promise, I can set it up so that I get data as soon as the Javascript is loaded and guarantee that I don’t try to move the data to DOM elements until it’s ready, too.

var mydata = $.get (window.location.origin + ":8080/foobar");

$( function () {
  mydata .done (showdata);
});

function showdata (data) {
  $( 'ul#something') .html (
      _.map (data, function (row) {
           return '<li>' +row.name +'</li>';
      }) .join ( '' ) );
}

If data looked like [{"name":"john"},{"name":"bob"},{"name":"fred"},{"name":"alice"}], then our <ul> now looks like

<ul id="something"><li>john</li><li>bob</li><li>fred</li><li>alice</li></ul>

I think that’s slick. showdata won’t get called until the DOM’s loaded and we have the data, but there’s no waiting before we ask the server for it. We use map() to output an array of HTML strings (instead of concatenating), then join those and put the output into $('ul#something'). There’s only one variable used, which is a very good thing, BTW; lots of variables is a code smell, IMO.

Another nice pattern I use:

Imagine this jQuery GET for some set of choices within a “topic”

var topic_data = $.get (
  window.location.origin + ':8080' + '/topic/' + some_criteria);
{"name":"foobar","choices":["bob","fred","alice"]}

There are also separate documents for a user’s choice for that topic, which are referenced by the “name” of the topic.  In other words, I had to get that topic to find the key to use for the next GET.  I need to render the topic_data in showTopicChoices regardless of what happens in that second GET.

topic_data .done (showTopicChoices, function (topic) {
  $.get (window.location.origin + ':8080' + '/user/topic/' + topic.name) 
    .done (overlayUserChoice);
 });

It’s a nice way to separate the rendering of the more general topic data from the overlay of the user’s own choice.

Model


I so hate how poorly MODEL is understood in Web development. A quote from a mongoose thread: “Try not to push too much down to the model – helpers on the model are generally to ensure the integrity of the model, rather than for enforcing business logic”.
But here’s the definition:

Finally, the model manages the behavior and data of the application domain, responds to requests for information about its state (usually from the view), and responds to instructions to change state (usually from the controller).[3]

“Application domain”.  That means business logic, if it relates to the state of the object.  Can’t you guys just rename your thingies data abstraction classes?  That’s what they are.

A virtual property, or a property with accessors, should be available in every type of access on the model. That’s how we hide the details of data representation in the model.
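The thread was about Mongoose, but the point is language-independent. Here’s a minimal PHP sketch, with made-up names, of the idea: the accessor lives on the model, so every consumer gets the same derived value no matter how the underlying data happens to be stored.

class Customer {
    private $row; // raw storage, e.g. array ('first' => 'Fred', 'last' => 'Founder')

    public function __construct (array $row) {
        $this->row = $row;
    }

    // A "virtual" property: callers never see how (or whether) the name is actually stored.
    public function fullName () {
        return $this->row ['first'] . ' ' . $this->row ['last'];
    }
}

$c = new Customer (array ('first' => 'Fred', 'last' => 'Founder'));
echo $c->fullName (); // Fred Founder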

Powerful jQuery Shorthand to Move Data Into DOM


I realize this is basic, though not obvious, jQuery stuff, but it’s powerful in the amount of work that can be done.  One could devise a fairly elaborate convention to make rendering driven largely by the markup.  Assume some JSON object like

var myJson = { "name": "Grant", "age": "ancient", "position": "seated" };

and elements like

<p myns:field="name"></p>
<p myns:field="age"></p>
<p myns:field="position"></p>

With this jQuery call I can set the text of all the elements without any knowledge of the model’s contents.

    $('[myns\\:field]') .text(function(idx, txt) {
      return myJson [$(this) .attr ('myns:field')];
    });

There’s a lot of potential; I’ll have to see if it’s worth it under varied scenarios.

How to do a Better Job at Meta-programming


Or

How I thought like a compiler 20 years ago to create a reporting system processing millions of 32k-byte, varying-length records on tape per hour against hundreds of user-defined report packages.

Imagine, if you will, a young lad in front of a 3270 terminal, a green cursor blinking in his face, eager to do the impossible.  A couple of years before (1988 through ’89) I had been a contractor for my now-employer and had written a set of assembler macros that could be used to define reports against a proprietary “database” of varying-length documents, up to 32k in length.  In order to pack as much sparse data as possible onto tape cartridges, each record contained fixed and varying sections and embedded bitmaps used to drive data access code.  My manager had been the original author (there aren’t too many people who could have pulled it off; in retrospect, “hats off”, Jeff and Phil).  Each database had its own “schema” as a file of assembler macros that was compiled to a load module of control blocks.

Complementing the low-level code was a set of application macros to locate or manipulate data in the record and to examine the schema.  They should have allowed schema-independent programming, but in reality, coding for every possible data type and configuration in which the application data might lie was prohibitive, so some assumptions were usually made: “order records and item records may occur multiple times, always”, “an order will always have an amount and a date field”.

When I had done the original reporting system, it had to work for all schemas, so that meant putting a lot of limitations on the possible reports.  We were really coding to meet a certain set of specific reports, but in a generic way.  It worked pretty well, but the code had to be modified a lot to accommodate “unforeseen” reporting needs.

When I was tasked with the next generation, I knew I wanted the report definitions to be in a “grammar”: some type of structured way to define a report that took into account nesting, ranging, counting and summing.  I also knew that it had to handle any schema and any “reasonable” report.  I shocked the team lead when I threw into the trash the stack of new reports we were to use as our requirements.  I said “let’s write a spreadsheet concept and forget the specifics.  We’ll check them from time to time to make sure what I’m doing satisfies, but I want my blinders off.”

Doing all these things would be hard even (maybe more so) with a modern database technology.  With our data store, I realized I had to think of something new, and I spent weeks drawing boxes and arrows and writing little tests.  When I was finished, my programs could handle a great variety of reports and process data faster than code that was schema-specific.  Marketing could now define their own reports to analyze the sea of demographic data we held.

Next: what a sample report’s data looked like.