Tech Is Hard

Credibility = Talent x Years of experience + Proven hardcore accomplishment

PIM using REBOL 2

I’ve updated the parsing rules. I find the REBOL parse dialect to be a level above regular expressions. I know people who are regex Jedi masters, but it really isn’t comparing apples to oranges. In REBOL’s BNF-like parse rules, one can define a grammar. Recursive and self-referential productions are possible — just like in human language. The parse rules most often contain REBOL code, surrounded by (), to execute at important points.

sentence: [ 
    [subject | date | amount] some ws 
    [subject | date | amount] some ws 
    [subject | date | amount]
    (
        ; if this subject was seen before, run the code saved for it;
        ; otherwise save the subject with a code block to run next time
        either found? ctx: find subjects the-subject [
            do ctx/2
        ][
            append subjects compose/only/deep [ 
                (the-subject) [print ( reform ["!!!! found" the-subject] )] 
            ]
        ]
        print [ "DATE:" the-month the-day "ACCOUNT" the-subject "AMOUNT" the-amount ]
    )
]

subject: [copy the-subject [ #"@" letter any name-char ]]
date: 	 ["on" some ws copy the-day day some ws copy the-month month]
amount:  [copy the-amount [ opt sign any digit opt ["."] 0 2 digit ]]

day: 	 [	[#"1" | #"2" | #"3"] [#"0" | #"1"] | 
		[#"1" | #"2"] digit | 
		digit ]	; last alternative reconstructed: a single digit, as in "on 4 Jan"
month: 	[ "jan" | "feb" | "mar" | "apr" | "may" | "jun" | 
	  "jul" | "aug" | "sep" | "oct" | "nov" | "dec" ]

ws: 	   charset reduce [tab newline #" "]
sign: 	   [ #"+" | #"-" | none ]
name-char: [ letter | digit ]
letter:	   charset [#"A" - #"Z" #"a" - #"z"]
digit:	   charset [#"0" - #"9"]

(I don’t like the way I implemented allowing the terms in any order, but it can wait.)
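None of the rules above actually refers to itself, so here is a minimal sketch of a recursive production, with REBOL code in () run at the point of a match. The rule name expr and the word n are made up for this illustration and are not part of the PIM rules.

```rebol
digit: charset "0123456789"

; hypothetical grammar: an expression is a number,
; or an expression wrapped in parentheses
expr: [
    copy n some digit (print ["number:" n])
    | "(" expr ")"
]

print parse/all "((42))" expr   ; prints number: 42, then true
print parse/all "((42" expr     ; prints number: 42, then false (unbalanced)
```

Note that the code in () runs as soon as its part of the rule matches, even if the overall parse later fails, which is why the second call still prints the number.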

You can see copy in the definitions for subject, date and amount, where it sets a word (each beginning with the-) to the text matched by the rule that follows. At the end we process or collect the-subject. If the subject was already found, we execute the code saved for that subject. Otherwise I save it, along with some stupid code to print that it was found; this is the code that gets executed when we find the subject again later. The really cool thing about this (and it could be done in REBOL a number of ways) is that the code is built at run time, as specifically tailored as you need it to be.

form, reform, mold, remold, join, rejoin, reduce and compose still confuse me some, so I always have the language reference open :), but here we use compose to reduce (evaluate, sort of) the items in its argument that are surrounded by (). The /only refinement inserts a block result as a single block rather than splicing its contents in. The /deep refinement makes compose process the () inside nested blocks as well.

compose/only/deep [ (the-subject) [print ( reform ["!!!! found" the-subject] )] ]
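To see what each piece contributes, here is a small sketch that tries the functions and refinements one at a time. The value given to the-subject is just a stand-in, and the comments show what I would expect probe to print in REBOL 2.

```rebol
the-subject: "@visa"   ; stand-in value for illustration

probe reduce ["found" the-subject]          ; ["found" "@visa"]
probe reform ["found" the-subject]          ; "found @visa"
probe rejoin ["found" the-subject]          ; "found@visa"

; compose only evaluates what is inside ()
probe compose [print (the-subject)]         ; [print "@visa"]

; without /deep, () inside nested blocks are left alone
probe compose [a [b (the-subject)]]         ; [a [b (the-subject)]]
probe compose/deep [a [b (the-subject)]]    ; [a [b "@visa"]]

; without /only, a block result is spliced in; with /only it stays a block
probe compose [x (reduce [1 2])]            ; [x 1 2]
probe compose/only [x (reduce [1 2])]       ; [x [1 2]]
```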

I’ll use a journal block to save an audit trail of the statements being parsed and a block to hold the accumulated accounts/contexts:

journal:  copy [] ; use copy to initialize
subjects: copy []

and a set of sample statements:

samples: [
	"@discover on 11 Dec -13.22"
	"@discover -56.47 on 17 Dec"
	"on 13 Jan -1.20 @visa"
	"on 4 Jan @checking -1.01"
]

Now call parse with each of the statements, and see that we can gather the information we’ve set up rules for. The formatted lines show the values being copied to the correct words.

>> foreach statement samples [
[    either parse/all statement sentence [repend journal [ now statement ]][ print ["*** Couldn't parse" statement ]]
[    ]
DATE: Dec 11 ACCOUNT @discover AMOUNT -13.22
!!!! found @discover
DATE: Dec 17 ACCOUNT @discover AMOUNT -56.47
DATE: Jan 13 ACCOUNT @visa AMOUNT -1.20
DATE: Jan 4 ACCOUNT @checking AMOUNT -1.01

Print the journal where we saved every statement with a timestamp.

>> forskip journal 2 [print ["timestamp" journal/1 "statement" journal/2]]
timestamp 29-Dec-2011/21:07:48-7:00 statement @discover on 11 Dec -13.22
timestamp 29-Dec-2011/21:07:48-7:00 statement @discover -56.47 on 17 Dec
timestamp 29-Dec-2011/21:07:48-7:00 statement on 13 Jan -1.20 @visa
timestamp 29-Dec-2011/21:07:48-7:00 statement on 4 Jan @checking -1.01

The subjects block contains the subjects that were encountered and a small code block to execute for each.

>> probe subjects
== ["@discover" [print "!!!! found @discover"] "@visa" [print "!!!! found @visa"] "@checking" [print "!!!! found @checking"]]

Let’s look at some of the other ways we can build the tiny code block that gets appended to subjects. It could be coded:

compose/only/deep [ (the-subject) [print ( reduce ["!!!! found" the-subject] )] ]
>> probe subjects
== ["@discover" [print ["!!!! found" "@discover"]] "@visa" [print ["!!!! found" "@visa"]] "@checking" [print ["!!!! found" "@checki..

The runtime output looks the same, e.g. !!!! found @discover. But if the block will be executed a lot, having print join the two strings each time intuitively takes a little longer. Or it could have been coded as:

compose/only/deep [ (the-subject) [print  [ "!!!! found" (the-subject) ]] ]
>> probe subjects
== ["@discover" [print ["!!!! found" "@discover"]] "@visa" [print ["!!!! found" "@visa"]] "@checking" [print ["!!!! found" "@checki..

This creates the same block, and the same output, as the previous version.

The trick to picturing the evaluation is the parentheses. Some of my background languages make heavy use of macros, and that’s how I think of the (expression). What’s great to me is that the preprocessor language is the same REBOL command dialect I use for everything else. (REBOL script source is actually a dialect, too; there are no keywords.) Compose and related block-evaluating functions let me customize code blocks that will be executed many times, so I can eliminate any redundant conditionals.
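As a rough sketch of what eliminating a conditional can look like: the test below is evaluated once, while the block is being composed, and the block that gets run repeatedly contains no either at all. The words verbose, handler and count are hypothetical names for this example only.

```rebol
verbose: true   ; hypothetical flag, decided once at build time

; the either runs while composing; its block result is spliced in,
; and an empty block splices in nothing
handler: compose [
    (either verbose [[print "processing a statement"]] [[]])
    count: count + 1
]

count: 0
loop 3 handler   ; prints the message three times when verbose is true

probe handler    ; [print "processing a statement" count: count + 1]
probe count      ; 3
```

With verbose set to false before composing, handler would be just [count: count + 1].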

