  1. This is an EBNF notation, used for defining grammars in the **Rel4tion**
  2. project. It is based on several sources, especially the EBNF article in
  3. Wikipedia and the notation used by W3C for XML and other languages.
  4. In case you need to refer to it, you can call it *SGN*. It means "Some Grammar
  5. Notation". Indeed, calling a specific notation "some" is a paradox. That's why
  6. the name was chosen :-)
  7. # Highlighting
  8. I'm working on a [[Vim syntax file|sgn.vim]]. It's not complete yet, but it's
  9. already useful.
  10. The wiki itself doesn't yet highlight SGN, but I'll see if it's close enough to
  11. e.g. EBNF. That would just be a workaround until I write a *highlight* file of
  12. course.
  13. # Using
  14. When writing a full grammar definition for some language, create a file in the
  15. wiki with `.sgn` extension. You can treat it as plain text, but SGN comments
  16. may contain ikiwiki links, directives, etc. That page can then, if needed, be
  17. inlined into other pages. Or just linked.
  18. If writing just a small piece which doesn't need its own page, use the
  19. [[/ikiwiki/directive/format]] directive. There's no "sgn" language right now,
  20. and I haven't tested what happens if it is specified. The safe default for now is
  21. either using a code block (i.e. indenting lines with a tab or 4 spaces) or a
  22. `txt` snippet (the wiki can render pages from plain-text `.txt` files):
  23. \[[!format txt """
  24. nesting = nestopen | nestclose
  25. nestopen = "["
  26. nestclose = "]"
  27. """]]
  28. # Rules
  29. The grammar is a list of rules of the form
  30. symbol = expression
  31. The list of rules may contain indentation. The indentation is there just
  32. for readability, and doesn't add any meaning. It is a flat list of rules.
  33. Both the alphabet of the grammar and the alphabet of the language it defines
  34. are Unicode.
  35. It is possible to specify symbol contexts, and context changes. These are used
  36. by the parser (syntactic analyzer). A symbol's context is specified like this:
  37. context:symbol = expression
  38. For the default context, just the `symbol` part is enough.
  39. A context change can be specified regardless of whether a rule symbol has a
  40. specified context or not. It has the following form:
  41. context1:symbol = expression => context2
  42. -- or
  43. symbol = expression => context
  44. The default context can be specified as `:`. For example:
  45. exp:closinparen = ")" => :
  46. Sometimes the context change depends on more than just the rule. Maybe the
  47. parser holds some information and decides based on it. You can either specify
  48. context changes in the lexical structure or in the syntax definition. In the
  49. lexical structure case, computed context changes can be denoted like this:
  50. symbol = expression ?=> context
  51. Or a list of possible contexts can be given:
  52. symbol = expression => context1, context2, context3
  53. Then you can use a comment to explain how the choice is made.
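To make the rule forms above concrete, here is a rough Python sketch that splits a single rule line into its parts. This is not part of any SGN tool: the names `RULE` and `parse_rule` are made up for illustration, and the sketch assumes that the first `=` ends the symbol and that the first `=>` or `?=>` after the expression starts the context change (so an expression whose string literal contains `=>` would be split in the wrong place).

```python
import re

# Hypothetical pattern for one SGN rule line:
#   [context:]symbol = expression [=> ctx1, ctx2 | ?=> ctx]
RULE = re.compile(
    r"""^\s*
        (?:(?P<context>[^:=\s]*):)?     # optional context prefix
        (?P<symbol>[^:=\s]+)\s*=\s*     # rule symbol, then the equals sign
        (?P<expression>.*?)             # expression, as short as possible
        (?:\s*(?P<arrow>\?=>|=>)\s*(?P<targets>.+?))?   # optional context change
        \s*$""",
    re.VERBOSE,
)

def parse_rule(line):
    """Split one rule line into context, symbol, expression and context change."""
    m = RULE.match(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    targets = m.group("targets")
    return {
        "context": m.group("context") or "",        # "" stands for the default context
        "symbol": m.group("symbol"),
        "expression": m.group("expression"),
        "computed": m.group("arrow") == "?=>",      # True for the ?=> form
        "targets": [t.strip() for t in targets.split(",")] if targets else [],
    }

print(parse_rule('exp:closinparen = ")" => :'))
print(parse_rule("symbol = expression => context1, context2, context3"))
```

The target `:` is kept as-is here; a real parser would probably map it to whatever it uses internally for the default context.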
  54. The expression on the right side of the rule may be built using the following
  55. forms:
  56. / some text here /
  57. > A free-form explanation of the match.
  58. "some text here" or 'some text here'
  59. > Exactly matches the content of the string literal.
  60. \xN
  61. > Matches the Unicode character whose number in hexadecimal is `N`.
  62. [0-9], [a-zA-Z], [\xM-\xN]
  63. > Matches any character in the specified range(s), inclusive.
  64. [xyz], [\xM\xN\xP]
  65. > Matches any character in the list.
  66. [AB]
  67. > Matches A or B, where each is a range, a character list or a mix.
  68. [^A]
  69. > Matches any character which the range/list/mix A doesn't match.
  70. X | Y
  71. > Matches X or Y (alternation).
  72. X - Y
  73. > Matches any string that matches X but not Y.
  74. X Y
  75. > Matches X followed by Y (concatenation).
  76. X*
  77. > Matches zero or more consecutive repetitions of X.
  78. X+
  79. > Matches one or more consecutive repetitions of X. In other words it's the
  80. > same as `X X*`.
  81. X?
  82. > Matches X or the empty string, i.e. 0 or 1 occurrences of X.
  83. X #N
  84. > Matches exactly N repetitions of X.
  85. X #M-N
  86. > Matches between M and N repetitions of X, inclusive.
  87. !X
  88. > Matches a string if it doesn't match X.
  89. ( X )
  90. > Matches X. Can be used for grouping to override precedence rules.
  91. -- some text here
  92. > A comment. It isn't a meaningful part of the rule.
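As a rough illustration of two of the forms above, the following Python sketch shows what `\xN` denotes and how the counted repetitions `X #N` and `X #M-N` could be mapped onto Python regular expressions. The helper names `hex_char` and `repeat` are made up, and treating `X` as a regex fragment is an assumption that only works for simple character-level expressions.

```python
import re

def hex_char(n):
    # SGN \xN: the Unicode character whose number in hexadecimal is N.
    return chr(int(n, 16))

def repeat(x, m, n=None):
    # SGN "X #M" (n omitted) or "X #M-N", written as a regex fragment.
    # x is assumed to already be a valid Python regex for X.
    bounds = f"{{{m}}}" if n is None else f"{{{m},{n}}}"
    return f"(?:{x}){bounds}"

print(hex_char("41"))                                      # the character U+0041
print(bool(re.fullmatch(repeat("[0-9]", 2, 4), "123")))    # [0-9] #2-4 against "123"
print(bool(re.fullmatch(repeat("[0-9]", 2, 4), "12345")))  # five digits are too many
```

Forms such as `X - Y` and `!X` are deliberately left out, because they have no direct regex counterpart.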
  93. Order of precedence, highest to lowest:
  94. 1. `X*`, `X+`, `X?`, `!X`
  95. 2. `X Y`
  96. 3. `X | Y`, `X - Y`
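To see what these precedence levels mean in practice, here is a small Python sketch that adds the implied parentheses to an expression. It only understands plain symbol names, `( )`, `*`, `+`, `?`, `!`, `|` and `-`. Two things in it are my assumptions and not stated above: operators on the same level group left to right, and `!` applies to its operand together with any postfix operator (so `!X*` is read as `!(X*)`). The names `tokenize` and `group` are made up.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[()|*+?!-])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def group(text):
    """Return the expression with its implied grouping made explicit."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def unary():       # level 1: X*, X+, X?, !X
        if peek() == "!":
            take()
            return f"(!{unary()})"
        tok = take()
        if tok == "(":
            node = choice()
            take()     # consume the closing ")" (not validated in this sketch)
        else:
            node = tok
        while peek() in ("*", "+", "?"):
            node = f"({node}{take()})"
        return node

    def sequence():    # level 2: X Y
        node = unary()
        while peek() not in (None, "|", "-", ")"):
            node = f"({node} {unary()})"
        return node

    def choice():      # level 3: X | Y, X - Y
        node = sequence()
        while peek() in ("|", "-"):
            op = take()
            node = f"({node} {op} {sequence()})"
        return node

    return choice()

print(group("a b | c"))     # ((a b) | c)
print(group("a | b c+"))    # (a | (b (c+)))
```

So concatenation binds tighter than `|`, and the postfix operators bind tighter than concatenation.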
  97. It's possible and sometimes very useful to indent rules. For example, a grammar
  98. can have several "top level" kinds of forms, and the rules for each one can be
  99. indented. It doesn't affect the meaning, but it makes the file more readable.
  100. A line indented to the position of the `=` after the rule name (or further) is
  101. considered part of the rule, while a line indented less is a new rule.
  102. The recommended indentation level width is 2 spaces.
  103. For example, this is a single rule:
  104. [[!format sgn """
  105. literal = number | string | boolean |
  106.           character | chunk | pattern
  107. """]]
  108. The last `|` in the first line could instead be placed in the second line,
  109. right below the `=`.
  110. But these are 2 rules, the second being indented:
  111. [[!format sgn """
  112. literal = number | string | boolean
  113.   number = [0-9]+
  114. """]]
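The indentation rule can be sketched in a few lines of Python: a line indented at least as far as the `=` of the current rule continues that rule, and anything indented less starts a new one. The function name `split_rules` is made up, and the sketch assumes that indentation uses spaces only and that the input contains nothing but rules (no comment-only lines).

```python
def split_rules(text):
    """Join continuation lines and return one string per rule."""
    rules = []
    eq_column = None    # column of "=" in the rule currently being read
    for line in text.splitlines():
        if not line.strip():
            continue    # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if eq_column is not None and indent >= eq_column:
            rules[-1] += " " + line.strip()     # continuation of the current rule
        else:
            rules.append(line.strip())          # a new rule starts here
            eq_column = line.index("=")
    return rules

one_rule = (
    "literal = number | string | boolean |\n"
    "          character | chunk | pattern\n"
)
two_rules = (
    "literal = number | string | boolean\n"
    "  number = [0-9]+\n"
)
print(split_rules(one_rule))
print(split_rules(two_rules))
```

The first input comes back as a single rule, the second as two, because 2 spaces of indentation is less than the column of the `=` in the `literal` line.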