This is an EBNF notation, used for defining grammars in the **Rel4tion**
project. It is based on several sources, especially the EBNF article in
Wikipedia and the notation used by W3C for XML and other languages.

In case you need to refer to it, you can call it *SGN*. It means "Some Grammar
Notation". Indeed calling a specific notation "some" is a paradox. That's why
the name was chosen :-)

# Highlighting

I'm working on a [[Vim syntax file|sgn.vim]]. It's not complete yet, but
useful.

The wiki itself doesn't yet highlight SGN, but I'll see if it's close enough to
e.g. EBNF. That would just be a workaround until I write a *highlight* file of
course.

# Using

When writing a full grammar definition for some language, create a file in the
wiki with `.sgn` extension. You can treat it as plain text, but SGN comments
may contain ikiwiki links, directives, etc. That page can then, if needed, be
inlined into other pages. Or just linked.

If writing just a small piece which doesn't need its own page, use the
[[/ikiwiki/directive/format]] directive. There's no "sgn" language right now,
and I haven't tested what happens if it is specified. The safe default for now is
either using a code block (i.e. indenting lines with a tab or 4 spaces) or a
`txt` snippet (the wiki can render pages from plain-text `.txt` files):

	\[[!format txt """
	nesting   = nestopen | nestclose
	nestopen  = "["
	nestclose = "]"
	"""]]

# Rules

The grammar is a list of rules of the form

	symbol = expression

The rule list may contain indentation. The indentation is there just
for readability, and doesn't add any meaning. It is a flat list of rules.

Both the alphabet of the grammar and the alphabet of the language it defines
are Unicode.

It is possible to specify symbol contexts, and context changes. These are used
by the parser (syntactic analyzer). A symbol's context is specified like this:

	context:symbol = expression

For the default context, just the `symbol` part is enough.
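
As a small illustration of the two forms, the sketch below puts two rules in a
context called `comment` and leaves a third one in the default context. The
context and symbol names are made up for this example and aren't taken from any
Rel4tion grammar:

	comment:end  = "*/"
	comment:star = "*"
	plus         = "+"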

Context change can be specified regardless of whether a rule symbol has a
specified context or not. It has the following form:

	context1:symbol = expression => context2
	-- or
	symbol = expression => context

The default context can be specified as `:`. For example:

	exp:closingparen = ")" => :

Sometimes the context change depends on more than just the rule. Maybe the
parser holds some information and decides based on it. You can either specify
context changes in the lexical structure or in the syntax definition. In the
lexical structure case, computed context changes can be denoted like this:

	symbol = expression ?=> context

Or a list of possible contexts can be given:

	symbol = expression => context1, context2, context3

Then you can use a comment to explain how the choice is made.
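
For instance, a rule with several possible target contexts could carry its
explanation like this. It's only a sketch: the symbol `name` and the contexts
`call` and `operand` are hypothetical, and the wording of the comment is
free-form.

	-- after a name, the parser moves to "call" if the next token is "(",
	-- and to "operand" otherwise
	name = [a-z]+ => call, operand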

The expression on the right side of the rule may be built using the following
forms:

	/ some text here /

> A free-form explanation of the match.

	"some text here" or 'some text here'

> Exactly matches the content of the string literal.

	\xN

> Matches the Unicode character whose number in hexadecimal is `N`.

	[0-9], [a-zA-Z], [\xM-\xN]

> Matches any character in the specified range(s), inclusive.

	[xyz], [\xM\xN\xP]

> Matches any character in the list.

	[AB]

> Matches A or B, where each is a range, a character list or a mix.

	[^A]

> Matches any character which the range/list/mix A doesn't match.

	X | Y

> Matches X or Y (alternation).

	X - Y

> Matches any string that matches X but not Y.

	X Y

> Matches X followed by Y (concatenation).

	X*

> Matches zero or more consecutive repetitions of X.

	X+

> Matches one or more consecutive repetitions of X. In other words it's the
> same as `X X*`.

	X?

> Matches X or the empty string, i.e. 0 or 1 occurrences of X.

	X #N

> Matches exactly N repetitions of X.

	X #M-N

> Matches between M and N repetitions of X, inclusive.

	!X

> Matches a string if it doesn't match X.

	( X )

> Matches X. Can be used for grouping to override precedence rules.

	-- some text here

> A comment. It isn't a meaningful part of the rule.
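
To see several of these forms working together, here is a small made-up grammar
fragment. None of the symbols come from an actual Rel4tion grammar. The
repetition forms are kept in rules of their own, because this page doesn't say
how tightly `#` binds compared to the other operators.

	-- a date such as 2014-03-09
	date    = year "-" month "-" day
	year    = digit #4
	month   = digit #2
	day     = digit #2
	digit   = [0-9]

	-- a name is any identifier which isn't a keyword
	name    = ident - keyword
	ident   = letter (letter | digit | "_")*
	letter  = [a-zA-Z]
	keyword = "if" | "else" | "while"

	-- an optional sign followed by 1 to 3 digits
	offset  = sign? digits
	sign    = "+" | "-"
	digits  = digit #1-3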

Order of precedence, highest to lowest:

1. `X*`, `X+`, `X?`, `!X`
2. `X Y`
3. `X | Y`, `X - Y`
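
As a sketch of how these levels combine (the symbol names are invented for the
example), the first two rules below should mean the same thing, while the third
uses parentheses to group differently:

	a = "x" | "y" "z"*
	b = "x" | ("y" ("z"*))
	c = ("x" | "y") "z"*

Here `a` and `b` match `x`, `y`, `yz`, `yzz` and so on, whereas `c` additionally
matches `xz`, `xzz` and so on.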

It's possible and sometimes very useful to indent rules. For example, a grammar
can have several "top level" kinds of forms, and the rules for each one can be
indented. It doesn't affect the meaning, but it makes the file more readable.

A line indented to the position of the `=` after the rule name (or further) is
considered part of the rule, while a line indented less is a new rule.

The recommended indentation level width is 2 spaces.

For example, this is a single rule:

[[!format sgn """
literal = number | string | boolean |
          character | chunk | pattern
"""]]

The last `|` in the first line could instead be placed in the second line,
right below the `=`.

But these are 2 rules, the second being indented:

[[!format sgn """
literal = number | string | boolean
  number  = [0-9]+
"""]]