The Trouble With 2 + 3
A comparison of Water's syntax with traditional arithmetic syntax
By Christopher Fry

Overview
Most programming languages use infix syntax with special characters that are neither letters nor numbers as the "operators", for example, “2 + 3”. For such a simple computation, it is hard to get more concise. This notation is easily understood by most people who use computers. So it is easy to understand why many language designers have been seduced by it.

Simple as “2 + 3” is out of context, overall it makes a general purpose programming language more complex and difficult to use. The issues are subtle, but nonetheless arise in every language. In designing Water, we did things differently from other languages. If we had not, there would be no need to bother with a new language. This article explains why Water has the syntax it does and the advantages Water has over existing popular languages. If you think you'd never get used to Water syntax or that we had a screw loose when we created it, this article is for you.

Water Syntax
First let's cover how Water performs 2 + 3.

2.<plus 3/>

This expression uses angle brackets, a slash, and a dot, as well as the digits. When you are programming, it is best if you can run the evaluator in your head as you are reading code. So here is how to do that with the above expression.

Let's choose some terminology to make it easier to describe. The whole expression is a "path" expression consisting of two parts, the "subject" of our call, that is, “2”, and the call itself, “<plus 3/>”.  The call has two parts, a method, "plus", and an argument, “3”. We bound the call with an open angle bracket and the closing sequence of />. This open and close sequence is XML's syntax as is the spelling out of "plus". The character “+” is not a valid character for tag names in XML.

Paths are evaluated left to right, as in the English language. So “2” is evaluated first. It is a literal and evaluates to 2. Other common literals in Water include "a string", true, false, and null.

Next consider “2.plus”. The dot is the "path separator" character used in Java, JavaScript and a few other languages. Our path has two path parts, 2 and <plus 3/>. Dot means "look up the symbol to my right in the object that is the result of evaluating the expression to my left". So the evaluator looks for the string "plus" as the name of a field in the object 2. It does not find it, but "lookup" means, if you do not immediately find it, then LOOK UP. So the evaluator gets the value out of the _parent field of 2, which is the integer object. As it happens, the integer object does not have a "plus" field either, but the parent of integer, that is, number, does. So “2.plus” returns the object that is the value of the field named "plus" in number.

That value happens to be a method object. In Water, unlike Java, everything is an object, including all numbers. We can use the expression “2.plus” to get a method and perhaps pass it into some other method as an argument. For example "sort" takes a method to customize how to sort. But, we don't want to just get the method, we want to CALL it so we embellish our getting of the method with angle brackets. These angle brackets also enclose the arguments. In this case, we have just one argument, “3”, which is evaluated to return 3.

OK now that we have evaluated the subject, the method, and the argument expressions, we can call our method. The subject is bound to the local variable "_subject". Each argument is bound to the corresponding parameter in the method definition, and then the code in the body of the method is executed in an environment with those local bindings. All methods in Water return an object.

There are no "void" methods, although some methods simply return null. That is one less special case you have to think about in Water as opposed to other languages like C and Java.
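The lookup-then-call sequence described above can be modeled in a few lines of Python. This is only a rough analogy, not Water's implementation: the Obj class, the lookup and call helpers, and the "value" field are all hypothetical names invented for the sketch. It shows a field search that climbs a parent chain (2, then integer, then number) and a call that binds the subject before running the method.

```python
class Obj:
    """A toy object: a dict of fields plus a link to a parent object."""

    def __init__(self, parent=None, **fields):
        self.parent = parent
        self.fields = dict(fields)

    def lookup(self, key):
        # Search this object first; if the key is missing, LOOK UP the parent chain.
        obj = self
        while obj is not None:
            if key in obj.fields:
                return obj.fields[key]
            obj = obj.parent
        raise KeyError(key)


def call(subject, method_name, *args):
    # Find the method by lookup, then pass the subject as the first argument
    # (a stand-in for the _subject binding described in the text).
    method = subject.lookup(method_name)
    return method(subject, *args)


# Hypothetical hierarchy mirroring the text: 2 -> integer -> number.
number = Obj(plus=lambda subject, other: subject.lookup("value") + other)
integer = Obj(parent=number)
two = Obj(parent=integer, value=2)

print(call(two, "plus", 3))  # 5
```

Note that neither two nor integer has a "plus" field of its own; the method is found only on number, two steps up the chain.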

Flexibility
Water has extremely flexible argument handling capabilities. One feature is the ability to have "optional" arguments and to supply default values right in the method definition itself. Although the subject is not a normal argument, it can be treated as optional. The object "plus" is what is called a "promoted" object so it can be looked up in the default environment even without looking it up through a number. So “<plus 2 3/>” is a valid call to “plus” and will return 5.

In addition to optional arguments, Water has rest arguments. Methods that are declared to take rest arguments can take any number of arguments. Those arguments are collected together in a vector and that vector is bound to the rest local variable so that it is available for use within the method body just like other arguments. Since plus can take rest arguments, we can make calls like 2.<plus 3 4 5 6/> or <plus 2 3 4 5 6/> for that matter. In most other languages you would have to express this as “2 + 3 + 4 + 5 + 6”.

Traditional syntax is not so advantageous on the "concise" dimension for these situations. The need to add up five numbers that way may be rare, but I've found the need to add three is not all that uncommon. But more powerful is the ability to pass in a whole vector of numbers. Water's syntax for passing vectors is “<plus rest=my_vec/>”. If we had previously executed <set my_vec=<vector 2 3 4 5 6/>/>, then <plus rest=my_vec/> would give us the same result as <plus 2 3 4 5 6/>. Not many languages offer this kind of capability, and those that do need some other method like "apply". In Water that is unnecessary, as are special methods designed to get the sum of a vector.

Now you might have noticed that <vector 2 3 4 5 6/> looks an awful lot like a call to plus. You're right, and you would also be right if you guessed that vector takes rest arguments as well. I will cover more of Water's internal consistency below.
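For readers who want a point of comparison, Python happens to have a similar facility, and a small sketch may make the idea of rest arguments concrete. This is an analogy only: the function name plus and the variable my_vec are borrowed from the text, and Python's star syntax is not Water's "rest=" keyword.

```python
def plus(*rest):
    # All positional arguments are collected into the tuple `rest`,
    # much as the text describes a vector bound to the rest variable.
    total = 0
    for n in rest:
        total += n
    return total


my_vec = [2, 3, 4, 5, 6]

print(plus(2, 3))            # 5
print(plus(2, 3, 4, 5, 6))   # 20
print(plus(*my_vec))         # 20, passing a whole sequence at once
```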

There is an additional flexibility that falls out of Water's simple execution rules. Since "plus" is just an expression that is executed, we can place any expression in there that would return a method.

Here is an example.

We can compute a random arithmetic operator with a call to random_field_value, which simply returns the value of a randomly chosen field out of the object that is the subject of the call. Let's build up to this with some smaller expressions. A vector is an object that has non-negative consecutive integers as the keys of its fields. Here are some examples of this.

<vector "zero" "one" "two"/>.0 returns "zero"
<vector "zero" "one" "two"/>.1 returns "one"
<vector "zero" "one" "two"/>.<get 1.<plus 1/>/> returns "two"

Note 1: Water does not use the special syntax [ ] for array referencing. It simply uses a call to the "get" method. Java tries hard to confuse you by using [ ] for referencing certain types of arrays, but a method call of "elementAt" for other kinds of arrays. In Water, “get” can be used not only on arrays but on any object when you want to compute the "key" to look up. When the key is a literal or a symbol, you don't need to "compute" it, so a straightforward foo.bar or foo.1 works. But these also work fine: foo.<get "bar"/> or foo.<get 1/>

Note 2: If we had tried to execute <vector "zero" "one" "two"/>.1.<plus 1/>, we would have effectively "one".<plus 1/> and would get an error since "one" is a string object that does not include a field named "plus". If you must know, in addition to a _parent field of string, "one" has the fields 0, 1 and 2. Think of a string as a vector of characters. "one".1 returns the character 'n'.

We can make a vector of arithmetic methods via the following.

<vector number.plus number.minus number.times number.divide/>

or, because each of those is "promoted",

<vector plus minus times divide/>

Now we can pick an arithmetic method at random:

<vector plus minus times divide/>.<random_field_value/>

Finally we can make our call with a random method like this:

2.<<vector plus minus times divide/>.<random_field_value/> 3/>

and we'll get back one of 5, -1, 6, or 0.6666667 depending on which operator was picked.

Silly example? Perhaps. But you can imagine a calculator application where the user chooses the arguments and the operator and passes them all to a method for the computation. In Water that would be something like:

<set meth=plus arg1=10 arg2=20/>
arg1.<<do meth/> arg2/>

We need to wrap a "do" around our "meth", otherwise the string "meth" will be looked up as a field key in arg1.
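The calculator idea rests on methods being ordinary values that can be stored, chosen, and passed around. A minimal Python sketch of the same idea follows, as an analogy rather than a translation of the Water code: the compute helper and the methods list are hypothetical, while operator and random are Python standard library modules.

```python
import operator
import random

# Four arithmetic functions held in a list, like the vector of methods above.
methods = [operator.add, operator.sub, operator.mul, operator.truediv]


def compute(meth, arg1, arg2):
    # The operator arrives as an argument and is simply called.
    return meth(arg1, arg2)


# Pick an operator at random and apply it to 2 and 3.
chosen = random.choice(methods)
result = compute(chosen, 2, 3)
print(result)  # one of 5, -1, 6 or 0.666..., depending on the choice

# The calculator case: the operator is chosen elsewhere and passed in.
print(compute(operator.add, 10, 20))  # 30
```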

One more feature of Water's flexible method calling facility is that you can choose to pass in "keyword" named arguments. We saw this with <plus rest=<vector 2 3 4/>/> above, but it applies generally to ANY argument. If we have a method <defmethod drive speed=55 direction="north"/>, we can call it as follows.

<drive 70 "south"/> or

<drive speed=70 direction="south"/>

Keywords let us change the order of the arguments as follows.

<drive direction="south" speed=70/>

Since both arguments are optional, <drive/> is legal and uses 55 and "north".

<drive 70/> lets us specify speed but takes the default direction. If we want to default speed but not direction, we have to use our keyword syntax:

<drive direction="south"/>
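Python's default and keyword arguments behave much like the drive example, so a short sketch can serve as a cross-check of the four call styles. The function below is a stand-in written for this comparison; returning a tuple is an assumption made only so the bound values can be inspected.

```python
def drive(speed=55, direction="north"):
    # Both parameters are optional; return them so the bindings are visible.
    return (speed, direction)


print(drive(70, "south"))                  # (70, 'south')
print(drive(direction="south", speed=70))  # (70, 'south'), order changed
print(drive())                             # (55, 'north'), both defaults
print(drive(70))                           # (70, 'north'), default direction
print(drive(direction="south"))            # (55, 'south'), default speed
```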

Precedence
Traditional math infix notation needs precedence rules, because the syntax is ambiguous when you have more than one operator.

Take 2 + 3 * 4 for example.

Do you multiply 5 times 4 as “left to right” processing would suggest, or do you add 2 to 12 per the "process inner parts first" rule? Before we answer this question, first examine a syntax like Java's. Java uses left to right processing for paths and for "stand alone" expressions just as Water does, for example  foo.bar. It also uses the "process inner parts first" rule for nested method calls such as foo(bar()), where bar() is executed first and its result is used in the call to foo.

So which rule should Java use for “2 + 3 * 4”? Should it process "left to right" or "inner parts first"? Actually neither. It chooses perhaps the worst of all worlds.

The rules are idiosyncratic based on the operator. To understand them, you have to know the precedence ordering of all the infix operators. How significant a burden is this? For Java, I don't know how to look up the documentation of ANY operator. Java suffers from "Javadoc", a documentation system that describes Java classes and their fields and methods. The only trouble is, Java is only half object oriented, so some of the most commonly used and important functionality is, to be kind, documented somewhere that I can't find in my dozen or so books on Java nor in the on-line help provided with my development environment. I can, however, locate the precedence table for C in "The C Programming Language", by Kernighan and Ritchie. Since Java was influenced heavily by C, it is probably just barely different enough from C to confuse you.

In case you have misplaced your copy, worn it out, or thrown it away because you never wanted a memory leak problem again, here is the table from page 53 of the second edition. Water, like Java but unlike C, has automatic garbage collection so you never have to remember to take out the trash.

Operators                                   Associativity
() [] -> .                                  left to right
! ~ ++ -- + - * & (type) sizeof             right to left
* / %                                       left to right
+ -                                         left to right
<< >>                                       left to right
< <= > >=                                   left to right
== !=                                       left to right
&                                           left to right
^                                           left to right
|                                           left to right
&&                                          left to right
||                                          left to right
?:                                          right to left
= += -= *= /= %= &= ^= |= <<= >>=           right to left

Surrounding this table are a couple of pages of dense text with exceptions and explanations about the precedence rules. You'll be relieved to know that you don't have to learn the order of the operators on each line, as they all have the same precedence. But you DO have to know the ordering of the lines themselves. The table includes 44 operators spread out over no fewer than 14 different precedence levels. Furthermore, the table does not even mention “{}”, which have their own idiosyncratic semantics in Java as well as C.

Here is the equivalent table for Water.

Operators                           Associativity
< />   <foo></foo>   <foo></>   .   left to right
=                                   left to right

Where < /> is one "operator" that is pretty much equivalent to (). Note that <foo/> and <foo></foo> are equivalent and part of XML syntax. Water allows the shortcut <foo></>, useful when you have a body and a long tag name. Although XML designed the naming of an end tag to improve readability, in many cases it actually inhibits readability because it makes writing code harder for two reasons. First, if you would be inclined to have a long, semantically meaningful name for a tag, you might reconsider knowing that you have to type it in twice for every element (call). Second, when you rename tags, a common practice while writing code, it is easy to rename one and not the other, or accidentally misspell one or the other and make your code inconsistent. Water lets the programmer choose either the descriptive, long ending, which I often use for elements with many-lined bodies, or the short form when the start and end of the call are close together. Most languages offer no such named closing, so you often end up writing little comments after a closing paren or curly brace. Water formalizes those "comments" as does XML but, unlike XML, does not FORCE you to use them.

Well, you say, you don't really HAVE to memorize the whole table, you can paste the table to your wall, or you can wrap parens around every infix expression. First, you already have a lot pasted to your wall, and in general the fewer memory aids you NEED, the better. Second, if you wrap parens around everything, then you are a long way towards what Water does in wrapping angle brackets around everything.

If you do wrap parens around every one of our operator calls, then the significant differences from Water are as follows.

1. Water uses English letters that can be easily pronounced.
2. Water has a consistent place for the "method" with respect to the parens.

Parens are particularly troublesome in Java and C because they imply not only precedence and not only "normal" function calling, but are also used for casting. So if you see some parens in such languages you really don't know what you've got until you study it further, and in order to do that you really need to know something about what's inside and just outside of the parens. Water avoids synonyms, especially really thorny ones like parens.

Before moving on, I should tell you how to express “2 + 3 * 4” in Water. There are two ways depending on which meaning you want.

2.<plus 3/>.<times 4/>  and

2.<plus 3.<times 4/>/>

The first performs 5 times 4, the second 2 plus 12.
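The two readings can be checked with a small Python sketch in which every operation is an explicit method call, so no precedence table is consulted. The Num wrapper and the unwrap helper are hypothetical, written only to mimic chained paths; they are not part of Water or of Python's standard library.

```python
class Num:
    """Hypothetical wrapper so calls can be chained like a Water path."""

    def __init__(self, value):
        self.value = value

    def plus(self, other):
        return Num(self.value + unwrap(other))

    def times(self, other):
        return Num(self.value * unwrap(other))


def unwrap(x):
    # Accept either a Num or a plain Python number.
    return x.value if isinstance(x, Num) else x


# Chained calls run left to right, like 2.<plus 3/>.<times 4/>
left_to_right = Num(2).plus(3).times(4).value

# A nested call is evaluated first, like 2.<plus 3.<times 4/>/>
inner_first = Num(2).plus(Num(3).times(4)).value

print(left_to_right)  # 20
print(inner_first)    # 14
print(2 + 3 * 4)      # 14, the infix form silently picks the second reading
```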

Consistency
Water is much more self-consistent than languages like Java or C. Water tries hard to eliminate the need to know complex rules with exceptions. This cuts down how much you need to know to accurately run the parser and evaluator in your head and thus cuts down on bugs, makes tool writing much easier and generally applicable, and requires less documentation. Those of you with years of indoctrination into the idiosyncrasies of math and traditional programming languages will probably have more difficulty unlearning the arcane syntaxes built up over the years, than you will in learning Water.

Parens, discussed above, are a good example. Water uses the XML equivalent of parens, angle brackets, for making calls. There are no "operators" so there is no need to use parens for precedence setting. You don't even have to know the concept of precedence or associativity to be a proficient Water programmer. There is no need to "cast" in Water, so casting syntax and semantics are similarly moot.

Even the spelling of symbol names is more consistent in Water. The language follows the XML conventions of an initial letter then either letters, digits, underscores or hyphens in the second through nth character positions. In most infix languages, you can use +, -, * etc. as the name of a symbol, but only if you use it by itself or in some other special symbol like -> or *=, another whole class of exception knowledge that is just moot in Water.

The big "consistency" point for 2 + 3 is that Water has ONE way to make a call. Java and C have two. Two aren't such a problem when they are isolated from each other. But they almost never are isolated in actual code. Remember that nearly all computations in a program produce "intermediate" results. That means they are results to pass to other methods to do further computations.

Here is an example:

return ((int)foo((2 + 3) * 4)) / 5;

We have a method named "foo" that is adjacent to, outside of, and to the right of an open paren. We have another method/operator + that is not adjacent to any paren, and not outside of the relevant parens. Then we have the * method/operator that doesn't really even HAVE parens associated with it. We have "int" which does have associated parens but it is not a method NOR an argument to a method. It is a type and we are doing a cast. What exactly are we casting? Well, I think it is the result of calling foo but it is kind of hard to tell. Then I wrapped parens around the cast-and-call to foo because I wanted to make sure the first "argument" to divide (which is spelled with a slash character and comes before the actual specification of the method itself) was the result of the cast-and-call-to-foo expression. Next there's "return", which can not be considered an infix operator because it is not between anything, and it is probably not considered to be a method either. (Gee, what "class" would it be in?) The "return" has a nebulous association with the expression to its right.

Finally we need to add a semicolon to finish the whole thing off. Why? Beats me. JavaScript expressions often don't require a semicolon. Semicolon is pretty much like a close paren except that there's no matching open paren. Even when there's a normal method call like foo(); you need a semicolon to end it. In fact, it is more accurate to compare 2.<plus 3/> with 2 + 3; rather than its unsemicoloned version EXCEPT when 2 + 3 is part of some other expression as in 2 + 3 * 4; in which case you don't put a semicolon after the 3, you put it after the 4. In Water there are no semicolons or their equivalent. 2.<plus 3/> is the same regardless of whether it's "standing alone" or whether it is used as part of another expression. Yet another exception you don't have to know about if you use Water.

Here's the equivalent to:

return ((int)foo((2 + 3) * 4)) / 5;  

in Water:

<foo 2.<plus 3/>.<times 4/>/>.<divide 5/>

- No "return" is necessary for most cases of its use in other languages. Water just returns the last expression executed in the body of a method. You can have an explicit return if you like. Guess what? It looks like other method calls, i.e. <return 27/>, and even takes an optional argument to say what to return from.
- No casting parens.
- No precedence parens.
- No semicolon.
- Only one syntax for calling a method.

I'm not sure this next syntactic misfeature is necessarily CAUSED by infix "operator" syntax but it is certainly highly correlated with it. That's the use of curly braces. What exactly do they mean in Java? Well I can tell you that you won't find out from Javadoc. They're used often in defining methods and classes. Oh you say, to wrap up the whole definition so it can be cleanly separated from other definitions? Well no, unfortunately, just to wrap up PART of the definition. So in Java you get definitions like:

public void foo(String a, Integer b) {

  some code;

}

In Water, the same thing is:

<defmethod foo a b>

   some code

</>

(No public, no void, no comma between params, no curly braces, no semicolon.)

Well that's almost the same thing. In Water declaring the types of method parameters is optional. If you want to declare them you can say

<defmethod foo a=required=string b=required=integer>

   some code

</>

and if you want the args to be optional, replace "required" with their default value.

Note that defining methods is one place where the XML syntax of having an "attributes" area and a "content" area comes in handy. But what I really want you to notice is that defining a method is itself just a method call that has the same syntax as calling any other method. There are no special keywords appearing before the call with their own idiosyncratic ordering and grouping. There is the extension to each parameter to allow a type declaration, but this is optional. You can specify the returned type of a Water method, but it is optional and looks just like another parameter, i.e. <defmethod foo return_type=string/>. Yes, I know. It means your favorite name for a parameter, "return_type", is reserved. Sorry. At least you don't need to hunt for some documentation to find out what it means. By the way, defmethod, like plus, takes "rest" arguments so you can specify as many parameters as you want.

The need for curly braces, their exact semantics and where they can be placed is complex in languages that have them. Water doesn't, but it does have "do". "do" in its simplest form wraps expressions that you want to become one expression for the purposes of syntax. The simple case of using "do" just executes all the forms it wraps and returns the value of the last one.

So for example:

2.<plus <do "hi mom".<print/> 3/>/>

has the same functionality as 2.<plus 3/> except there's a side effect of printing "hi mom". "do" is just another one of the Water methods that takes "rest" arguments like plus and vector.

- It is just another method.
- No special syntax and funny characters like curly braces.
- No complex rules on where you can use it. It can be used anywhere any other expression can go.
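The simple case of "do" is easy to imitate in Python, because Python also evaluates call arguments left to right before the call happens. The sketch below is an analogy under that assumption; the function named do is hypothetical and covers only the "run everything, return the last value" behavior described above.

```python
def do(*forms):
    # By the time we get here every argument has already been evaluated,
    # left to right, so any side effects have happened. Return the last value.
    return forms[-1] if forms else None


# print() runs for its side effect and yields None; 3 is the value of the do.
result = 2 + do(print("hi mom"), 3)
print(result)  # 5
```

Because do is an ordinary function here, it can appear anywhere an expression can, which is the point the list above makes about Water's version.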

Aesthetics and Hypertext
You may find the syntax of Water ugly, but more than three billion web pages on the Internet use that syntax for HTML. HTML has had more "code" written in it than any other formal language. Somehow its ugliness is not preventing productivity. Are angle brackets really uglier than the combination of parens, curly braces and semicolons? You be the judge. My larger concern is what do all those things really mean and how much do I have to know to fully understand them? A hard to understand, buggy program is an ugly program regardless of the syntax.

If you know XML, think of Water as just a bunch more tags. Like "plus" for instance. Water has a rather deep semantics with an extremely dynamic yet comparatively simple object system, so I don't want you to get the impression that it's a shallow language. But you'll have to see other documentation to learn about those aspects.

XML
Water's syntax is not just XML but some extensions to it known as Concise XML. You can code Water in pure XML but you won't want to. XML is too verbose, too ambiguous and too inflexible for serious code. Nonetheless, Concise XML is a very close cousin to XML. In fact, it is the way XML should have been designed in the first place, and perhaps some later version of the standard will grow to use Concise XML features. You can read more about XML in my companion paper: "The Trouble With XML".

The Importance of Math
Math is important. But few programs have more than 5% of the method calls as infix math operators. Even if the program is math at its core, usually user interface code, object handling, utility method calls and non-operator math calls dominate the code. So rather than supporting all the idiosyncrasies of math notation, Water was optimized for overall consistency and learnability. The integration of math with the rest of the code is often just as important as the math operator calls themselves, and then there's all the completely non-math majority of the code. I won't deny that there are a few programs that use math heavily. If you've got one of those, consider using FORTRAN or Mathematica instead of Water. But if your math needs to interface with the Web, probably your interface code will be a lot harder to do in traditional languages than in Water, and much harder than the difference between 2 + 3 and 2.<plus 3/>.

Psychology
A number of people have looked at 2.<plus 3/> and said "I could never get used to that." But those who have tried have gotten used to it. I can't guarantee that you'll be one of them, but you may not realize how adaptable you are. If you understand 2 + 3 and even a quarter of the exceptions it implies, you're plenty smart enough to understand 2.<plus 3/> and all the exceptions it doesn't.

Rigidity is not a productive trait for a programmer. Computers are new after all, and getting newer all the time. Thus we need new tools to deal with ever greater levels of complexity. Water is one such tool. Losing 2 + 3 is a small price to pay for what you gain in Water, only a small percentage of which was described in this article. Language design is about trade-offs but we've managed to make fewer than any language I know of. As you get more deeply into Water, you may realize that 2.<plus 3/> isn't such a bad idea after all.

I do, however, have a special word of advice to Computer Science instructors: keep your students away from Water or you won't get to fill up class hours on syntactic detail. You'll be forced to pad your curriculum with higher level semantics.

See the following also by Fry.
The Trouble With XML
Water Rationale
Java and Water Examples

© Copyright 2001-2003 Clear Methods, Inc. All rights reserved.