Tech Is Hard

Credibility = Talent x Years of experience + Proven hardcore accomplishment

REBOL interface to GnuCash: Working with the GnuCash XML

Now to take the file we created and start messing with it.

I found a GnuCash option to turn off compression of the data file, so I can focus on parsing the XML.

There are a few REBOL scripts that work with XML; I chose xml-parse.r. Initially I used its built-in block parsing methods to turn the XML into REBOL blocks I could work with, but I decided that converting the entire document into blocks first, just so I could run arbitrary REBOL series! functions against it, would be a huge waste. So I ended up writing a gnucash handler that looks for specific elements as they’re parsed and saves the child nodes of each as key/value pairs.
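
To see why a streaming handler avoids building the whole tree, here is a minimal standalone sketch that replays a hand-written list of parser events through the same start/characters/end logic the gnucash handler uses, so it runs without xml-parse.r or a GnuCash file. The event words `start`, `chars` and `end`, and the values "id-1" and "date-1", are hypothetical placeholders standing in for the callbacks xml-parse.r would fire.

```rebol
; Hypothetical event stream standing in for xml-parse.r callbacks.
; Each pair is an event word and the element name or text it carries.
events: [
    start "gnc:transaction"
    start "trn:id" chars "id-1" end "trn:id"
    start "ts:date" chars "date-1" end "ts:date"
    end "gnc:transaction"
]

subjects: ["gnc:transaction" []]  ; element name -> rowset to fill
row: none                         ; row being built, or none outside a subject
content: copy ""                  ; text seen since the last start event

foreach [event name] events [
    switch event [
        start [
            clear content
            ; a subject element opens a new, empty row in its rowset
            if rowset: select subjects name [append/only rowset row: copy []]
        ]
        chars [if row [append content name]]
        end [
            if row [
                ; closing the subject ends the row; closing a child stores name/text
                either select subjects name [row: none] [repend row [name copy content]]
            ]
        ]
    ]
]
probe subjects
```

Only the current row and one text buffer are held at any time; everything else in the event stream is discarded as soon as it has been seen.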

As a first try, I want to supply the source XML and a request/response block! that pairs each element name to collect with an empty block! to receive its rows:
>> gnucash/list read %2011.xml ["gnc:transaction" []]

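rebol [
  Title: "Process GnuCash data file"
  File: %parse-gc-xml.r
  Date: 13-Dec-2012
  Purpose: {
    interface to gnucash data
  }
]

do %xml-parse.r
gnucash: make xml-parse/parser [
    subjects: copy []    ; element name / rowset pairs being collected
    content: copy ""     ; text of the element currently being read
    row: none            ; row under construction, or none outside a subject
    set-namespace-aware false

    ; Parse the XML string and return the subjects block with each rowset filled
    list: func [ gc-xml [string!] subjects [block!] ][
        self/subjects: copy/deep subjects  ; deep copy so the caller's blocks stay untouched
        parse-xml gc-xml
        return self/subjects
    ]

    handler: make xml-parse/xml-parse-handler [

        ; Accumulate text only while inside a subject element
        characters: func [ characters [string! none!] ][
            if all [
                found? row
                found? characters
            ][
                append content characters
            ]
        ]

        ; Every element resets the text buffer; a subject element starts a new row
        start-element: func [
            ns-uri [string! none!]
            local-name [string! none!] q-name [string!]
            attr-list [block!]
            /local rowset
        ][
            clear head content
            if found? rowset: select subjects q-name [
                append/only rowset row: copy []
            ]
        ]

        ; Closing a subject ends the row; closing a child stores its name/text pair
        end-element: func [
            ns-uri [string! none!]
            local-name [string! none!] q-name [string!]
        ][
            if found? row [
                either select subjects q-name
                    [ row: none ][ repend row [q-name copy content] ]
            ]
        ]
    ] ; handler
] ; gnucash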

Try it out.
>> do %parse-gc-xml.r
>> trns: gnucash/list read %../2011.xml ["gnc:transaction" []]
== ["gnc:transaction" [["trn:id" "a1ddec4b4fe26e84745a8cddac018620" "cmdty:space" "ISO4217" "cmdty:id" "USD" "trn:currency" "" "ts:…
>> trns/1
== "gnc:transaction"
>> trns/2
== [["trn:id" "a1ddec4b4fe26e84745a8cddac018620" "cmdty:space" "ISO4217" "cmdty:id" "USD" "trn:currency" "" "ts:date" "2011-01-03 0…
>> trns/2/1
== ["trn:id" "a1ddec4b4fe26e84745a8cddac018620" "cmdty:space" "ISO4217" "cmdty:id" "USD" "trn:currency" "" "ts:date" "2011-01-03 00…
>> trns/2/2
== ["trn:id" "1849b854cd0ad3fbbddc94f3d661672a" "cmdty:space" "ISO4217" "cmdty:id" "USD" "trn:currency" "" "ts:date" "2011-05-20 00…

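So we can think of trns as holding “rows” of gnc:transaction fields. More precisely, its first entry is the element name and its second entry is the returned rowset: a block! with one row per transaction, each row a flat block! of name/value pairs. If I want the date from the second transaction, I can select it by name: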

>> select trns/2/2 "ts:date"
== "2011-05-20 00:00:00 -0600"
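
Since every row is a flat name/value block, the same `select` works inside a loop over the whole rowset. A minimal sketch, assuming a hand-built stand-in for the block gnucash/list returns; the IDs and dates here are hypothetical placeholders, not values from a real GnuCash file.

```rebol
; Hypothetical stand-in for the result of gnucash/list (placeholder values)
trns: ["gnc:transaction" [
    ["trn:id" "id-1" "ts:date" "date-1"]
    ["trn:id" "id-2" "ts:date" "date-2"]
]]

; Walk every row of the rowset and pull out one field by name
dates: copy []
foreach row second trns [
    append dates select row "ts:date"
]
probe dates
```

If a row lacks the field, `select` returns none and that none is appended, so `dates` stays aligned with the rows.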

We can collect multiple rowsets with one call by adding more element/block! pairs to the argument.

>> result: gnucash/list read %../2011.xml ["gnc:commodity" [] "gnc:account" []]
== ["gnc:commodity" [["cmdty:space" "ISO4217" "cmdty:id" "USD" "cmdty:get_quotes" "" "cmdty:quote_source" "currency" "cmdty:quote_t…
>> acts: find result "gnc:account"
== ["gnc:account" [["act:name" "Root Account" "act:id" "6197eee7c4c8c51c914cd3aa4114ef44" "act:type" "ROOT"] ["act:name" "Expenses"…
>> acts/1
== "gnc:account"
>> select acts/2/1 "act:name"
== "Root Account"
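
Because the result alternates element names and rowsets, it can be walked two entries at a time, and a row can be looked up by the value of one of its fields. A minimal sketch, assuming a hand-built stand-in for the two-rowset result with only a few fields filled in; `find-row` is a hypothetical helper, not part of xml-parse.r or the gnucash object.

```rebol
; Hypothetical stand-in for a two-rowset result (only a few fields shown)
result: [
    "gnc:commodity" [["cmdty:space" "ISO4217" "cmdty:id" "USD"]]
    "gnc:account" [
        ["act:name" "Root Account" "act:type" "ROOT"]
        ["act:name" "Expenses"]
    ]
]

; Report how many rows each requested element produced
foreach [name rows] result [
    print [name length? rows]
]

; Hypothetical helper: return the first row whose field equals value, or none
find-row: func [rows [block!] field [string!] value [string!]] [
    foreach row rows [
        if value = select row field [return row]
    ]
    none
]

expenses: find-row select result "gnc:account" "act:name" "Expenses"
probe expenses
```

Because `=` compares strings without regard to case, a lookup for "expenses" would match the same row.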