Tech Is Hard

Credibility = Talent x Years of experience + Proven hardcore accomplishment

Category Archives: code

Software Developers in the Past were Much Better than Today (as a whole)


Can it be argued with? It seems like a statistical truth. Back in the 70s, 80s and part of the 90s, way before everyone and their brother was a “Web developer”, there were few enough people doing the work that the percentage who were very good (with various language, scientific, mathematical and electronics backgrounds) was MUCH, MUCH higher than it could possibly be now that MANY, MANY more people are doing it.

Let me put it another way: if the NFL decided to kick start a few thousand more football teams, do you think the quality of play would be the same as it is today?  Wouldn’t there be a lot of people playing that currently don’t have the skill?  There can only be so much supply for a highly-skilled profession.  People can’t just  choose to be good software engineers any more than they can go out and simply train hard enough to be a talented professional football player.

So the proportion of really good software people has gone from maybe 1 in 10 to probably 1 in 12,000, if you crunch the numbers.

An outcome of the same internet explosion that has outstripped the supply of top talent is that everyone can spout off. It seems, too, that the amount of self-promotion is inversely proportional to meaningful understanding or communication. So there’s often this large, single, near-unanimous, overwhelming, AND HORRIBLY WRONG opinion or impression of a concept and how to implement it.

I have to say that it’s possible to be self-taught and as good as the educated and highly experienced. But it’s very, very unlikely. The catch is, the self-taught don’t know enough to know how little they know in comparison. That is also logical: when one’s knowledge is limited, one has no way of seeing it.

I’m sorry.

XSL 1.0 str-tolower Template


We’ll add a couple variables and a template to the strings.xsl stylesheet.

<!-- translation for lowercase-->
<xsl:variable name="lcletters" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="ucletters" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

<xsl:template name="str-tolower">
  <xsl:param name="str"/>
  <xsl:value-of select="translate($str, $ucletters, $lcletters)"/>
</xsl:template>
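
As a quick illustration of how it gets used (the email element here is hypothetical, just to show the call):

<!-- Hypothetical usage: emit a lowercased copy of an element's text -->
<xsl:template match="email">
  <xsl:call-template name="str-tolower">
    <xsl:with-param name="str" select="."/>
  </xsl:call-template>
</xsl:template>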

And here’s a new UT to add to strings.test.xsl:

<xsl:call-template name="assert">
  <xsl:with-param name="expected" select="'@apple1'"/>
  <xsl:with-param name="actual">
    <xsl:call-template name="str-tolower">
      <xsl:with-param name="str" select="'@ApPlE1'"/>
    </xsl:call-template>
  </xsl:with-param>
</xsl:call-template>

Simple XSL 1.0 String Templates and Very Simple XSL Unit Testing


In order to reference external examples that evolve, I’m going to start with a stylesheet containing simple string functions and build on it.  There’s also a simple, but useful methodology for unit testing the XSL templates.

Dealing with strings is notoriously annoying in XSL, so in my xsl directory I have strings.xsl, a stylesheet to do repetitive and recursive stuff.  The first function we’ll need is str-replace.  I recently updated this template to be pretty short and sweet.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template name="str-replace">
  <xsl:param name="haystack"/>
  <xsl:param name="needle"/>
  <xsl:param name="repl" select="''"/>
  <xsl:choose>
    <xsl:when test="contains ($haystack, $needle)">
      <xsl:value-of select="substring-before ($haystack, $needle)"/>
      <xsl:value-of select="$repl"/>
      <xsl:call-template name="str-replace">
        <xsl:with-param name="haystack" select="substring-after ($haystack, $needle)"/>
        <xsl:with-param name="needle" select="$needle"/>
        <xsl:with-param name="repl" select="$repl"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$haystack"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<!-- translation for lowercase-->
<xsl:variable name="lcletters">abcdefghijklmnopqrstuvwxyz</xsl:variable>
<xsl:variable name="ucletters">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>

<xsl:template name="str-tolower">
  <xsl:param name="str"/>
  <xsl:value-of select="translate($str, $ucletters, $lcletters)"/>
</xsl:template>

</xsl:stylesheet>

Hopefully it’s a fairly clear template: as long as the $haystack contains $needle, concatenate the string before $needle, the replacement value to use, and the result of calling the template with what comes after the first $needle occurrence.  When there’s no $needle, then it just results in $haystack.
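
For example, replacing every “A” in “abcA-Axyz” with “BBB” (the same case the first unit test below checks) unrolls informally like this:

str-replace("abcA-Axyz", "A", "BBB")
  = "abc" + "BBB" + str-replace("-Axyz", "A", "BBB")
  = "abc" + "BBB" + "-" + "BBB" + str-replace("xyz", "A", "BBB")
  = "abcBBB-BBBxyz"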

I want to use what I’ve learned about unit testing to prevent regression in the future. It can be extremely hard to figure out which change caused regression in XSL; you may not notice the one scenario that triggers it for a while. I wrote strings.test.xsl to call templates in strings.xsl.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:import href="strings.xsl" />

<xsl:call-template name="assert">
  <xsl:with-param name="expected" select="'@apple1'"/>
  <xsl:with-param name="actual">
    <xsl:call-template name="str-tolower">
      <xsl:with-param name="str" select="'@ApPlE1'"/>
    </xsl:call-template>
  </xsl:with-param>
</xsl:call-template>

<xsl:template match="/">
  <xsl:call-template name="assert">
    <xsl:with-param name="expected" select="'abcBBB-BBBxyz'"/>
    <xsl:with-param name="actual">
      <xsl:call-template name="str-replace">
        <xsl:with-param name="haystack" select="'abcA-Axyz'"/>
        <xsl:with-param name="needle" select="'A'"/>
        <xsl:with-param name="repl" select="'BBB'"/>
      </xsl:call-template>
    </xsl:with-param>
  </xsl:call-template>

  <xsl:call-template name="assert">
    <xsl:with-param name="expected" select="'abc-xyz'"/>
      <xsl:with-param name="actual">
      <xsl:call-template name="str-replace">
        <xsl:with-param name="haystack" select="'aAbcA-AxyzA'"/>
        <xsl:with-param name="needle" select="'A'"/>
      </xsl:call-template>
    </xsl:with-param>
  </xsl:call-template>

  <xsl:call-template name="assert">
    <xsl:with-param name="expected" select="'eleven1'"/>
      <xsl:with-param name="actual">
        <xsl:call-template name="str-replace">
          <xsl:with-param name="haystack" select="'111'"/>
          <xsl:with-param name="needle" select="'11'"/>
        <xsl:with-param name="repl" select="'eleven'"/>
      </xsl:call-template>
    </xsl:with-param>
  </xsl:call-template>
</xsl:template>

<xsl:template name="assert">
  <xsl:param name="expected" select="'missing expected param'"/>
  <xsl:param name="actual" select="'missing actual param'"/>
  <xsl:if test="not ($actual = $expected)">
    <xsl:message terminate="yes">Expected <xsl:value-of select="$expected"/>; got <xsl:value-of select="$actual"/></xsl:message>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>

I run strings.test.xsl with any XML input document, because it doesn’t actually use the input XML. It might be interesting to come up with a unit-testing stylesheet that took the stylesheet to be tested as its input document, and used the document() function to do something introspective.

I admit there’s nothing really descriptive explaining what each test is for, but repeating the assert calls is really easy: I just change the template I’m calling or supply a new set of parameters for a new condition. It’s a brute-force method that works for now.
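
Just to sketch that introspection idea (this template is my own and assumes a separate stylesheet run with the stylesheet under test, e.g. strings.xsl, as the XML input), a match on the xsl:stylesheet element could enumerate the named templates that ought to have tests:

<!-- Sketch only: with strings.xsl as the input document, list its named templates -->
<xsl:template match="/xsl:stylesheet">
  <xsl:for-each select="xsl:template[@name]">
    <xsl:message>named template: <xsl:value-of select="@name"/></xsl:message>
  </xsl:for-each>
</xsl:template>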

Self-Contained Data Tables in XSL Stylesheets


I’m going to share a trick I’ve been using in my stylesheets for years, because every time I do it someone gets amazed.

Lookups often need to be performed during transformation of a document. Let’s use the example of taking month numbers and turning them into the standard 3-character abbreviations. Instead of hard-coding the values in a choose, or reading a separate XML document, add your own working data to the stylesheet with a namespace.

<grant:stuff>
    <grant:month-name month="01">Jan</grant:month-name>
    <grant:month-name month="02">Feb</grant:month-name>
    <grant:month-name month="03">Mar</grant:month-name>
    <grant:month-name month="04">Apr</grant:month-name>
    <grant:month-name month="05">May</grant:month-name>
    <grant:month-name month="06">Jun</grant:month-name>
    <grant:month-name month="07">Jul</grant:month-name>
    <grant:month-name month="08">Aug</grant:month-name>
    <grant:month-name month="09">Sep</grant:month-name>
    <grant:month-name month="10">Oct</grant:month-name>
    <grant:month-name month="11">Nov</grant:month-name>
    <grant:month-name month="12">Dec</grant:month-name>
</grant:stuff>

For this to work inside your XSLT, you have to add a namespace prefix declaration for it:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:grant="https://techishard.wordpress.com">

Then you use the document() function with an empty string argument, which refers back to the stylesheet itself:

<xsl:value-of select="document('')/xsl:stylesheet/grant:stuff/grant:month-name[@month = 10]"/>

That will get you:

Oct
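
To make the lookup reusable, it can go in a named template. This is just a sketch of my own (the name month-abbrev and the format-number zero-padding are not part of the trick itself):

<!-- Sketch: format-number zero-pads a numeric month so it matches the two-digit @month values -->
<xsl:template name="month-abbrev">
  <xsl:param name="month"/>
  <xsl:value-of select="document('')/xsl:stylesheet/grant:stuff/grant:month-name[@month = format-number($month, '00')]"/>
</xsl:template>

Calling it with either 3 or "03" for the month parameter lands on Mar.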

Populate a PHP Array in One Assignment


Imagine you are returning an array you need to populate with some values. This is typical:

    $myArray = array();
    $myArray['foo'] = $foo;
    $myArray['bar'] = $bar;

There are advantages to populating the array as a single assignment:

    $myArray = array(
        'foo' => $foo,
        'bar' => $bar
    );

Separating the assignment statements from the initialization (and the initialization to an empty array might be many lines earlier) makes it easier for the entries in the array to be corrupted; code might later be added in between the original assignments, and people may add things to $myArray without documenting it. And just like Where to Declare, I’ll have to scan more code to determine the effect of making changes to $myArray. The second way sets the _entire_ value of $myArray at once, instead of parts of it. It presents a visual of the return structure and makes it harder for $myArray to go awry. I don’t have to look any further to see what $myArray is at that point.
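
Taken a step further (my own variation on the snippet above), if the array is only being built in order to return it, the temporary variable can disappear entirely:

    return array(
        'foo' => $foo,
        'bar' => $bar
    );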

Where to Declare


I think it’s common practice to declare (and sometimes initialize) all of a function’s local variables “at the top”, but I usually declare them close to where they’re referenced. Declaring everything at the top can lead to what I call functional striation, and it makes refactoring more difficult.

An example:

function innocent_looking_foo() {
    var A, B, C;
	
    do (something with A)
	
    do (something with B)
	
    do (something with C)
	
    return
}

Right from the outset, it looks like this function may not be very cohesive, but the truth is, one sees a lot of code that looks like this in the real world. Let’s say someone adds D and ‘somethingelse’ now has to be done with a few of the variables:

function innocent_looking_foo() {
    var A, B, C, D;
	
    do (something with A)
	
    do (something with B)
    do (somethingelse with B)
	
    do (something with C)

    do (something with D)
    do (somethingelse with D)
	
    return
}

or worse(?)

function innocent_looking_foo() {
    var A, B, C, D;
	
    do (something with A)
    do (something with B)
    do (something with C)
    do (something with D)

    do (somethingelse with B)
    do (somethingelse with D)
	
    return
}

In some shops, this kind of growth continues for years until no one remembers whether we really need to do each thing, or to what. And we have people who add “do (anewthing with D)” in between A and B. Remember that the “do” pseudo-statements are usually represented by more than one line of code in our function. If I am working on any part of this, I have to examine the preceding code, all the way to the top, for references to the variable I’m concerned with. There may be dozens or, let’s be honest, hundreds of lines where a variable can be corrupted. There is also more chance that the order of execution matters, which is bad if we can avoid it.

Now read this:

function foo() {
    var A
    do (something with A)
	
    var B
    do (something with B)
    do (somethingelse with B)
	
    var C
    do (something with C)
	
    var D
    do (anewthing with D)
    do (something with D)
    do (somethingelse with D)	
	
    return
}

This aligns the code with the variables it does work on, and it would be easier to refactor as it gets more complex. It’s also easier to see repeating patterns and turn them into callable functions.

This guideline is probably even more applicable to variable initialization, irrespective of where it’s declared. Try not to separate the initialization from the first reference.

Coding for the Long Term


“Long term” here means this, as probably most things I say, may not be applicable to those who don’t have responsibility for evolving a system with a medium or large team of developers over years. In other words, it’s what I call “enterprise”.

Programming is made up largely of a series of decisions. What’s the next step in the process? What’s the next refinement to make? What’s the interface to my software? But the most frequent decisions are made as we write the code itself. How to represent the current state so that our code can be written to achieve the desired state. Do I use an array for this? Do I write a loop here or spend the time looking for a built in function? Do I need a “default” case in this switch?

How we represent our data, or state, will constrain the style of the code, at both a macro and a granular level of detail. The shape of the two together has an effect far beyond completing the task at hand. The decisions we make when coding should be influenced by this knowledge.

I’m not saying I know for sure, but when I’ve explained to others why I emphasized a particular way of doing something, they’d say “That makes a lot of sense. I never thought of it before”. Usually it’s just a discernment of which of two choices is better overall, and why.

So I’m going to humbly share some of that. Some of it may only apply to one language, but often the concepts are transferable.

REBOL/GnuCash: Listing the Transactions


With a couple of additions to the end-states block!, the end-element function now lists the transactions in my GnuCash file, with the account references (contained in the <trn:splits> element of each transaction) dereferenced:

>> do %gc2iam.r
3-Jan-2011/13:02:43-7:00 Hobby Lobby fix chair pic glass 
[[4106 100] ["Home Repair" "EXPENSE" 3b7ddd7ff3110b9bc94f6425bba8fc83] 
[-4106 100] ["Chase MC" "CREDIT" 74bee4c0f2e95ff570291a9d3ea3a3ba]]
12-Jun-2011/10:19:19-6:00 BEST carpet cleaning 
[[12000 100] ["Home Repair" "EXPENSE" 3b7ddd7ff3110b9bc94f6425bba8fc83] 
[-12000 100] ["Discover" "CREDIT" 74bee4c0f2e95ff570291a9d3ea3a3ba]]
10-Sep-2011/0:46:22-6:00 lunch meeting
[[200 100] ["Parking" "EXPENSE" b0b714a0163f90222c5a6769c78ca791] 
[-200 100] ["Cash" "CASH" 74bee4c0f2e95ff570291a9d3ea3a3ba]]
16-Mar-2011/21:02:47-6:00 Jenny's Market 
[[3255 100] ["Gas" "EXPENSE" b0b714a0163f90222c5a6769c78ca791] 
[-3255 100] ["Discover" "CREDIT" 74bee4c0f2e95ff570291a9d3ea3a3ba]]

Here’s the relevant part of the handler. I added a /local word to end-element, value, for some manipulation of the string that GnuCash stores, which is a fraction. For instance, $328.23 is represented as 32823/100. There is other currency metadata used to make conversions between these amounts and stock prices, for example. Right now I’m just going to be concerned with “normal” transactions like spending and transfers, but I don’t want to lose any precision until I know more, so I split the string on “/” and convert each component to an integer. I also changed the uses of to into make; the implication is that make will give me a new item of my desired type, while to technically converts the argument. Remembering the admonition to use copy when initializing from something that will change, I think make must be safer here.

trnAccts: copy []
guid:               name:               class: 
parent:             trnDate:            description:  
none

end-states: [
    "act:id"          [guid:           make word! content]
    "act:name"        [name:                 copy content]
    "act:type"        [class:                copy content]
    "act:parent"      [parent:         make word! content]
    "trn:description" [description:          copy content]
    "ts:date"         [trnDate:        make date! content]
    "split:value"     [
        value: split content #"/" 
        append/only trnAccts reduce [make integer! value/1 make integer! value/2]
    ] 
    "split:account"   [append trnAccts to word! content]

    "gnc:account"     [set guid reduce [name class parent]]
    "gnc:transaction" [
        print [trnDate description remold trnAccts]
        clear head trnAccts
    ]
]
end-element: func [
    ns-uri		[string! none!]  ns-prefix [string! none!]
    local-name	[string! none!]  q-name    [string!]
    /local value
][ 
    switch q-name end-states
    clear head content
]
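
As a quick console sanity check of the fraction handling (my own illustration, reusing the same split and make calls as the handler):

>> value: split "32823/100" #"/"
== ["32823" "100"]
>> reduce [make integer! value/1 make integer! value/2]
== [32823 100]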

REBOL/GnuCash: using REBOL words


This work is going through our GnuCash data again, but not trying to be anything generic. Instead we’re going to set REBOL words to values that we mine and see how that works as an automatic dereferencing setup. REBOL has a lot in common with Lisp, or so is my understanding of the matter, and I took a cue from Paul Graham:

Lisp’s symbol type is useful in manipulating databases of words, because it lets you test for equality by just comparing pointers. (If you find yourself using checksums to identify words, it’s symbols you mean to be using.)

We’re going to parse the XML with the statement gnucash/parse-xml read %2011.xml. So let’s look at this incarnation of the parser/handler:

gnucash: make xml-parse/parser [
    set-namespace-aware true
    handler: make xml-parse/xml-parse-handler [

        content: copy ""

        characters: func [ characters [string! none!] /local trimmedContent][
            if all [found? characters not empty? (trimmedContent: trim characters)][
                append content trimmedContent
            ]
        ]

        name: class: description: ""
        guid: parent: amount: act: none

        end-states: [
            "act:id"          [guid:   to word! content]
            "act:name"        [name:       copy content]
            "act:type"        [class:      copy content]
            "act:parent"      [parent: to word! content]
            "gnc:account"     [set guid reduce [name class parent]]
        ]

        end-element: func [
	    ns-uri	[string! none!]  ns-prefix [string! none!]
	    local-name	[string! none!]  q-name    [string!]
        ][
            switch q-name end-states
            clear head content
        ]
    ] ; handler
] ; gnucash

It’s a lot shorter. Granted, it’s specifically coded for certain elements, but it should be apparent from the definition of end-states how we could build it dynamically. end-states is just a block! that end-element uses as an argument to its switch function. And because of the schema, I can process everything in the end-element function. In other words I know the other elements in my end-states are all inside gnc:account. As end-element encounters any of the element names listed in end-states, it performs the corresponding block of actions. I used the fully qualified name, because many local-names are reused among GnuCash’s namespaces.

BTW, I really like how this works: if all [found? characters not empty? (trimmedContent: trim characters)]; we make sure we have a string and then trim whitespace, appending if there’s something left over.  Compare to your favorite language.

For the string! values, I have to copy the content, otherwise what’s being referenced keeps changing.  But for word! values I didn’t bother with copy; my deduction is that to word! gives me a copy of the value as a word! type.  And when we reach the end of “gnc:account” we know we have everything, so we set a word! for this account’s guid and set that word! to the value of this account with:
set guid reduce [name class parent]
If I pick one of the account guids from my file and probe for it, I can see it’s been set in REBOL:

>> probe get to word! "2775e139d4404298cf73e6316db71cbd"
["Taxable" "INCOME" 921eddc8eb3349d0a818ef6e52417b81]

(We have to use to word! and enter the value as a string so REBOL’s console doesn’t try to interpret it as some other type just based on the characters.)
Where it gets interesting is if I print the value in certain ways. The guid “parent” reference (3rd item in the block) gets dereferenced for me automatically. It’s a word and it has a value.

>> print remold get to word! "2775e139d4404298cf73e6316db71cbd"
["Taxable" "INCOME" ["Investment" "INCOME" e84eba6fde334896d99a24c62d1162d3]]

I think that’s pretty neat.

PIM using REBOL 2.1 more parsing


Since REBOL is new to me, I continue to experiment with the syntax to see what feels right in terms of the fundamental criteria: low coupling and high cohesion. I simplified the high-level rule and put some of the processing in the lower-level matching rules. I added some flexibility in the ordering of terms in the input, but the description has to come last; since it’s intended to allow any input, it would greedily match the date. I think the date format needs to change eventually. I’m ignoring the definitions of the lowest-level terms from here on (letter, digit, etc.) except to say that the rules for day and month now set local values:

    day:      [copy the-day [ [#"1" | #"2" | #"3"] [#"0" | #"1"] | 
                              [#"1" | #"2"] digit | digit ] ]
								
    month:    [copy the-month [ "jan" | "feb" | "mar" | "apr" | "may" | "jun" | 
		                "jul" | "aug" | "sep" | "oct" | "nov" | "dec" ] ]

It may not be proper style to be clearing things at the beginning of the sentence rule, but it works and is clear. I got rid of the subject handling we had before, and I’m trying out a handler concept for tags. All we do when the rule completes now is print the parsed terms.

sentence: [( the-date: now/date
             the-subject: copy ""
             the-desc: copy ""
             the-amount: none
             the-tags: copy [] )
						
    some [ [some ws] | ["on" some ws date] | subject | amount | copy the-desc desc] 
    (print [the-date the-desc the-subject the-amount the-tags])
]

I changed the main rule to use some around a choice, instead of a bunch of alternative choices. The terms are set by the individual rules like date and desc:

desc:    [some [tag | some name-char | ws]]
tag:     [copy a-tag [ #"#" some name-char ] (tag-handler a-tag append the-tags a-tag)]
subject: [copy the-subject [ #"@" letter any name-char ]]
amount:  [copy the-amount [ opt sign some digit opt [ "." 0 2 digit ] ] 
             (the-amount: to decimal! the-amount)]
date:    [[ day | month ] [ [some ws] | #"/" ] [ month | day ] (
             default: now
             the-date: either (current: to date! rejoin [the-day "-" the-month "-" default/year]) > (default/date + 200)
                 [to date! rejoin [ the-day "-" the-month "-" (default/year - 1) ]]
                 [current]
             clear the-day clear the-month
         )]

You can see the tag-handler called in the tag rule. We’ll get to that. The date rule calculates the year automatically; if it’s more than 200 days in advance, we assume last year. My default tag-handler is expecting the current tag, and the accumulated tags block.

tag-handler: func [
    tag [string!]
    tags [block!]
][
    print ["tag found" tag "/" tags "/"]
]

There are multiple ways in REBOL to supply that tag-handler, which we’ll look at as our needs become clearer. Run the rule with a reasonably human sentence:

>> parse/all "@visa on 19 Dec -230.21 #statefarm #insurance for #condo interior contents" iAm/sentence
tag found #statefarm / #statefarm /
tag found #insurance / #statefarm #insurance /
tag found #condo / #statefarm #insurance #condo /
19-Dec-2011 #statefarm #insurance for #condo interior contents @visa -230 #statefarm #insurance #condo
== true

Looking at it now, I might like moving the copy commands from the sub-rules back into sentence. I plan soon to import all my GnuCash transactions into a REBOL format I can query and add to. I’ll need to be able to query and edit them, and I’m hoping that it’s unnecessary to have a bunch of hierarchical expense and income accounts.

It hit me today that tagging allows “multiple inheritance” in contrast to hierarchies of organization.

Up until the last few years I’ve been doing my own taxes, and the only useful information I can recall needing is totals for certain categories of spending, which forces me to categorize each transaction under only the one applicable account. I’d rather tag it with meaningful attributes, among them #tax #deduction.

%d bloggers like this: