n-able-devel team mailing list archive

More on decorators that change the language parse...and other parsing stuff

To: "N-Able Devel" <n-able-devel@xxxxxxxxxxxxxxxxxxx>
From: "Arlo Belshee" <arlo@xxxxxxxxx>
Date: Mon, 6 Dec 2010 19:24:05 -0800
Importance: Normal
Metaprogramming and stuff. I want to make this a highly hackable language. But I also want to allow significant refactoring. So, I need to define some constraints.

The first, I think, is to have a very regular syntax. Currently, I’m thinking that the language supports only two statement forms: the nested block and the single-line statement. The first line of a nested block always ends in a colon. How the first line and the rest of a block are parsed, and how a whole single-line statement is parsed, depends on what is in context.

There are 4 types of attributes: @using(...) pulls in a language capability (or set of them). @not_using(...) removes a language capability. @with(...) pulls in a set of names & their in-language constructs. Anything else is a user-defined attribute. It is treated like a Python decorator: a function call is made at compile time, transforming the object decorated. There’s a simple one in the standard lib that just attaches metadata; many user-defined attributes would be implemented by referencing that.
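To make the "metadata attribute" idea concrete, here is a rough Python sketch of the analogy. The names metadata, deprecated, and __metadata__ are made up for illustration, and Python runs decorators at definition time rather than at compile time, so this only approximates what the language would do.

```python
def metadata(**entries):
    """Hypothetical standard-library attribute: attach metadata, change nothing else."""
    def apply(obj):
        # Copy any metadata already present so stacked attributes accumulate.
        merged = dict(getattr(obj, "__metadata__", {}))
        merged.update(entries)
        obj.__metadata__ = merged
        return obj
    return apply


def deprecated(reason):
    """A user-defined attribute implemented by referencing metadata()."""
    return metadata(deprecated=True, reason=reason)


@deprecated("use new_parse instead")
@metadata(author="arlo")
def old_parse(text):
    return text.split()
```

Because the attribute only attaches data and returns the same object, a tool can still see old_parse as an ordinary function, which is the property the refactoring support would depend on.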

I want to minimize the amount of other punctuation, and have each char mean exactly one thing – with all subparses then delegated to libraries & apps. For example, I’m thinking:

() has lots of meanings in most languages. E.g., in Python, it can mean a function call, the declaration of function args, the declaration of base classes, a tuple, or arithmetic order of operations (and perhaps more). I want to reduce these. Also, list comprehensions (generator comprehensions, ...) are a nice concept, but I’d like them to be extensible, rather than limited to the set the language developers thought of. Adding set comprehensions should not have been a language change to Python.
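For reference, a small Python snippet that uses parentheses in each of those roles (the class and variable names are arbitrary):

```python
class Base:
    pass


class Point(Base):                 # declaring base classes
    def __init__(self, x, y):      # declaring function arguments
        self.pair = (x, y)         # building a tuple


p = Point(1, 2)                    # calling
scaled = (p.pair[0] + p.pair[1]) * 2     # arithmetic grouping
squares = (n * n for n in p.pair)  # "and perhaps more": a generator expression
```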

So, I’m thinking of things like the following:

function call: a(b, c)
indexing & slicing: a[b to c by e]
tuples: a = (b, c) # or (a, b) = (c, d)
unary negate: -a
nested block: stuff:\n\tstatements\n
attribute: @foo(a, b)
initialization: init_type { stuff }

Any other punctuation is treated as a (possibly multi-character) binary operator. If it is not defined by the libraries, or is not syntactically valid as a binary op, then it is a syntax error. I’m not sure how to handle operator precedence. I’d like to ignore it as much as possible. I don’t think I’ll get away with that, but at least I’d like it to not be built into the language.
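A minimal Python sketch of that rule, under some assumptions: the set of core punctuation below is a guess (the list above is not final), the operators set stands in for whatever the libraries in scope define, and unary negate, strings, and precedence are left out entirely.

```python
import re

# Assumption: these characters are the punctuation reserved by the core syntax.
CORE = r"()\[\]{}@:#,"
TOKEN = re.compile(
    r"\s*(?:(?P<word>\w+)"                  # names and numbers
    r"|(?P<core>[" + CORE + r"])"           # one core punctuation character
    r"|(?P<op>[^\w\s" + CORE + r"]+))"      # a run of any other punctuation
)


def tokenize(text, operators):
    """Split text into (kind, value) pairs; undefined operator runs are errors."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        kind = match.lastgroup
        value = match.group(kind)
        if kind == "op" and value not in operators:
            raise SyntaxError(f"undefined binary operator {value!r}")
        tokens.append((kind, value))
        pos = match.end()
    return tokens


# Hypothetical operator set pulled in by some library.
OPERATORS = {"%", "==", "+"}
tokens = tokenize("x % 2 == 0", OPERATORS)
```

Note that this treats every run of non-core punctuation as one operator, so something like a=-b would be read as the single operator =-. Whether that is acceptable is one of the open questions.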

None of this punctuation is allowed in any other context. Thus, I can parse based on this (and \n, \t, and #) without reference to any of the other stuff. This should give me the basic structure of the program, and then allow recursion based on the language elements that the programmer pulled in to this scope.
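As a sketch of that first structural pass, here is some Python that recovers the block tree using only \n, \t, #, and the trailing colon. It assumes one tab per nesting level and ignores the possibility of # inside a string; parse_blocks and the node layout are invented for the example.

```python
def parse_blocks(source):
    """Build a tree of {"text", "children"} nodes from tab indentation."""
    root = []
    stack = [(-1, root)]  # (depth of the block header, its list of children)
    for raw in source.split("\n"):
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing space
        if not line:
            continue
        text = line.lstrip("\t")
        depth = len(line) - len(text)
        # Close every block whose header is at this depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        node = {"text": text, "children": []}
        stack[-1][1].append(node)
        if text.endswith(":"):
            # A trailing colon opens a nested block.
            stack.append((depth, node["children"]))
    return root


tree = parse_blocks("stuff:\n\ta(b, c)\n\tinner: # comment\n\t\tx\ndone\n")
```

Each node's text is still unparsed at this point. The second pass would hand it to whichever language elements the programmer pulled into that scope.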

Also, I like Python’s use of keywords for some common binary operators. Keeping that should be fairly easy.

Initialization is the general form of comprehensions. I’m not too sure about it. However, it covers both the cases of Anonymous Types in C# and the case of all the comprehensions in Python. init_type defines how it parses whatever’s in the curlies. Thus, these would all be potentially valid (depending on how list, set, multi_dict, and class were defined):

list { x for x in range(35) if x % 2 == 0 }
set { file.lines.select(l => l.replace("\{(.*?)\}", match => replacements[match])).where(l => l.contains("frog")) }
multi_dict { file.lines, keys= l => l.split() }
class { name=val, name2=val2 }
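One way to picture "init_type defines how it parses whatever’s in the curlies" is a registry of parsers keyed by name. The Python below is only a sketch: INIT_TYPES, init_type, initialize, and the "words" type are hypothetical, the "class" parser just builds a dict of strings, and nested curlies are handled crudely by stripping the outermost pair.

```python
INIT_TYPES = {}  # hypothetical registry: init_type name -> body parser


def init_type(name):
    """Register a function that parses the text between the curlies."""
    def register(parser):
        INIT_TYPES[name] = parser
        return parser
    return register


@init_type("class")
def parse_fields(body):
    # "name=val, name2=val2" -> {"name": "val", "name2": "val2"}
    fields = {}
    for item in body.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


@init_type("words")
def parse_words(body):
    return body.split()


def initialize(expression):
    """Dispatch 'init_type { stuff }' to the parser registered for init_type."""
    head, _, rest = expression.partition("{")
    name, rest = head.strip(), rest.rstrip()
    if name not in INIT_TYPES or not rest.endswith("}"):
        raise SyntaxError(f"cannot initialize: {expression!r}")
    return INIT_TYPES[name](rest[:-1].strip())


record = initialize("class { name=val, name2=val2 }")
```

The point of the registry is that adding a new kind of comprehension means registering another parser, not changing initialize or the core grammar.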

I also need to expose a rich post-compilation model to the tools. The tools should be able to detect and refactor function calls, classes, and so on, even when funny decorators are involved.

Arlo