- This is an EBNF notation, used for defining grammars in the **Rel4tion**
- project. It is based on several sources, especially the EBNF article in
- Wikipedia and the notation used by W3C for XML and other languages.
- In case you need to refer to it, you can call it *SGN*. It means "Some Grammar
- Notation". Indeed calling a specific notation "some" is a paradox. That's why
- the name was chosen :-)
- # Highlighting
- I'm working on a [[Vim syntax file|sgn.vim]]. It's not complete yet, but it's
- already useful.
- The wiki itself doesn't yet highlight SGN, but I'll see if it's close enough to
- e.g. EBNF. That would just be a workaround until I write a *highlight* file of
- course.
- # Using
- When writing a full grammar definition for some language, create a file in the
- wiki with `.sgn` extension. You can treat it as plain text, but SGN comments
- may contain ikiwiki links, directives, etc. That page can then, if needed, be
- inlined into other pages, or just linked.
- If writing just a small piece which doesn't need its own page, use the
- [[/ikiwiki/directive/format]] directive. There's no "sgn" language right now,
- and I haven't tested what happens if it is specified. The safe default for now is
- either using a code block (i.e. indenting lines with a tab or 4 spaces) or a
- `txt` snippet (the wiki can render pages from plain-text `.txt` files):
- \[[!format txt """
- nesting = nestopen | nestclose
- nestopen = "["
- nestclose = "]"
- """]]
- # Rules
- The grammar is a list of rules of the form
- symbol = expression
- The rule list may contain indentation. The indentation is there just for
- readability and doesn't add any meaning: the grammar is a flat list of rules.
- Both the alphabet of the grammar and the alphabet of the language it defines
- are Unicode.
- It is possible to specify symbol contexts, and context changes. These are used
- by the parser (syntactic analyzer). A symbol's context is specified like this:
- context:symbol = expression
- For the default context, just the `symbol` part is enough.
- A context change can be specified regardless of whether a rule symbol has a
- specified context or not. It has the following form:
- context1:symbol = expression => context2
- -- or
- symbol = expression => context
- The default context can be specified as `:`. For example:
- exp:closingparen = ")" => :
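- To make the context mechanism concrete, here is a minimal Python sketch of a
- lexer that tracks the current context. It isn't part of SGN or of any Rel4tion
- tool: the rule table, the symbol names and the `tokenize` function are made up
- for illustration, and Python regular expressions stand in for SGN expressions.
- It only covers the plain `=>` form, with the empty string playing the role of
- the default context `:`.

```python
import re

# Hypothetical rule table: (context, symbol, pattern, next_context).
# "" is the default context; next_context None means "no context change".
RULES = [
    ("", "word", r"[a-z]+", None),     # word      = [a-z]+
    ("", "open", r"\(", "exp"),        # open      = "(" => exp
    ("exp", "number", r"[0-9]+", None),  # exp:number = [0-9]+
    ("exp", "close", r"\)", ""),       # exp:close = ")" => :
]


def tokenize(text):
    """Split text into (context, symbol, lexeme) triples."""
    context = ""  # start in the default context
    pos = 0
    tokens = []
    while pos < len(text):
        for rule_context, symbol, pattern, next_context in RULES:
            if rule_context != context:
                continue  # the rule belongs to another context
            match = re.compile(pattern).match(text, pos)
            if match and match.end() > pos:
                tokens.append((context, symbol, match.group()))
                pos = match.end()
                if next_context is not None:
                    context = next_context  # apply the "=> context" part
                break
        else:
            raise ValueError(
                f"no rule matches at position {pos} in context {context!r}"
            )
    return tokens


for token in tokenize("ab(12)cd"):
    print(token)
```

- Note that `[a-z]+` is only a rule of the default context in this sketch, so
- `tokenize("(ab)")` fails: after the opening parenthesis the context is `exp`,
- where no rule matches letters.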
- Sometimes the context change depends on more than just the rule. Maybe the
- parser holds some information and decides based on it. You can specify
- context changes either in the lexical structure or in the syntax definition. In the
- lexical structure case, computed context changes can be denoted like this:
- symbol = expression ?=> context
- Or a list of possible contexts can be given:
- symbol = expression => context1, context2, context3
- Then you can use a comment to explain how the choice is made.
- The expression on the right side of the rule may be built using the following
- forms:
- / some text here /
- > A free-form explanation of the match.
- "some text here" or 'some text here'
- > Exactly matches the content of the string literal.
- \xN
- > Matches the Unicode character whose number in hexadecimal is `N`.
- [0-9], [a-zA-Z], [\xM-\xN]
- > Matches any character in the specified range(s), inclusive.
- [xyz], [\xM\xN\xP]
- > Matches any character in the list.
- [AB]
- > Matches A or B, where each is a range, a character list or a mix.
- [^A]
- > Matches any character which the range/list/mix A doesn't match.
- X | Y
- > Matches X or Y (alternation).
- X - Y
- > Matches any string that matches X but not Y.
- X Y
- > Matches X followed by Y (concatenation).
- X*
- > Matches zero or more consecutive repetitions of X.
- X+
- > Matches one or more consecutive repetitions of X. In other words it's the
- > same as `X X*`.
- X?
- > Matches X or the empty string, i.e. 0 or 1 occurrences of X.
- X #N
- > Matches exactly N repetitions of X.
- X #M-N
- > Matches between M and N repetitions of X, inclusive.
- !X
- > Matches a string if it doesn't match X.
- ( X )
- > Matches X. Can be used for grouping, to override the precedence rules.
- -- some text here
- > A comment. It isn't a meaningful part of the rule.
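- Several of these forms have close counterparts in Python's `re` module, which
- can help when checking what an expression is meant to match. The sketch below
- is only an illustration under that assumption: the helper names (`lit`, `char`,
- `repeat`, `matches`, `difference`, `negate`) are made up here, and "matches"
- is taken to mean that the whole string matches.

```python
import re


def lit(text):
    """SGN "text": match the literal text exactly."""
    return re.escape(text)


def char(hex_digits):
    r"""SGN \xN: match the character whose code point is hexadecimal N."""
    return re.escape(chr(int(hex_digits, 16)))


def repeat(x, low, high=None):
    """SGN X #N (high omitted) or X #M-N: bounded repetition, inclusive."""
    high = low if high is None else high
    return f"(?:{x}){{{low},{high}}}"


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


def difference(x, y, text):
    """SGN X - Y: text matches X but not Y."""
    return matches(x, text) and not matches(y, text)


def negate(x, text):
    """SGN !X: text does not match X."""
    return not matches(x, text)


# [0-9a-fA-F] #2-4
hex_run = repeat("[0-9a-fA-F]", 2, 4)
print(matches(hex_run, "ff"), matches(hex_run, "fffff"))  # True False

# [a-z]+ - "if"
print(difference("[a-z]+", lit("if"), "iffy"))  # True
print(difference("[a-z]+", lit("if"), "if"))    # False
```

- `X - Y` and `!X` are written as functions rather than patterns because plain
- regular expression syntax has no direct operator for them.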
- Order of precedence, highest to lowest:
- 1. `X*`, `X+`, `X?`, `!X`
- 2. `X Y`
- 3. `X | Y`, `X - Y`
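- Python regular expressions happen to rank repetition, concatenation and
- alternation in the same relative order, so they can be used to see the effect
- of the list above. This is just an illustration; the patterns and the `matches`
- helper are made up, and `X - Y` has no regex operator to compare with.

```python
import re

# SGN: "a" "b" | "c" "d"*   is read as   ("a" "b") | ("c" ("d"*))
ungrouped = r"ab|cd*"
# SGN: "a" ("b" | "c") "d"*   parentheses override the default grouping
grouped = r"a(?:b|c)d*"


def matches(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(ungrouped, "cddd"), matches(ungrouped, "abd"))  # True False
print(matches(grouped, "cddd"), matches(grouped, "abd"))      # False True
```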
- It's possible and sometimes very useful to indent rules. For example, a grammar
- can have several "top level" kinds of forms, and the rules for each one can be
- indented. It doesn't affect the meaning, but it makes the file more readable.
- A line indented to the position of the `=` after the rule name (or further) is
- considered part of the rule, while a line indented less is a new rule.
- The recommended indentation level width is 2 spaces.
- For example, this is a single rule:
- [[!format sgn """
- literal = number | string | boolean |
-           character | chunk | pattern
- """]]
- The last `|` in the first line could instead be placed in the second line,
- right below the `=`.
- But these are 2 rules, the second being indented:
- [[!format sgn """
- literal = number | string | boolean
-   number = [0-9]+
- """]]
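- The continuation rule can be sketched in a few lines of Python. This is not an
- existing SGN tool, just one possible reading of the rule: the `split_rules`
- name is made up, indentation is assumed to use spaces only, blank lines are
- skipped, and comments and context changes get no special treatment.

```python
def split_rules(text):
    """Group the lines of a grammar into rules using the indentation rule."""
    rules = []            # each rule is a list of its stripped lines
    equals_column = None  # column of "=" in the current rule's first line
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        indent = len(line) - len(line.lstrip(" "))
        if equals_column is not None and indent >= equals_column:
            # Indented to the "=" or further: still the same rule.
            rules[-1].append(line.strip())
        else:
            # Indented less: a new rule starts here.
            rules.append([line.strip()])
            column = line.find("=")
            equals_column = column if column != -1 else None
    return [" ".join(lines) for lines in rules]


single = (
    "literal = number | string | boolean |\n"
    "          character | chunk | pattern"
)
print(split_rules(single))

two = (
    "literal = number | string | boolean\n"
    "  number = [0-9]+"
)
print(split_rules(two))
```

- The first call yields one rule and the second yields two, because the second
- line of `two` is indented by only 2 spaces, which is less than the column of
- the `=` in the line above it.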