node.txt 13 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341
  1. In the compiler `continuation' means a continuation that is a lambda node.
  2. Non-lambda continuation arguments, such as the argument to a RETURN, are
  3. not referred to as continuations (the argument isn't a continuation, it
  4. is a variable that is bound to a continuation).
  5. Every node has the following fields:
  6. variant ; one of LITERAL, REFERENCE, LAMBDA, or CALL
  7. parent ; parent node
  8. index ; index of this node in parent, if parent is a call node
  9. simplified? ; true if it has already been simplified; if this is #F
  10. ; then all of this node's ancestors must also be unsimplified
  11. flag ; useful flag, all users must leave this is #F
  12. Literal nodes:
  13. value ; the value
  14. type ; the type of the value (important for statically typed languages,
  15. ; not so useful for Scheme)
  16. Reference nodes:
  17. variable ; the referenced variable; the binder of the variable must be
  18. ; an ancestor of the reference node
  19. Call nodes:
  20. primop ; the primitive being called
  21. args ; vector of argument nodes
  22. exits ; the number of arguments that are continuations; the continuation
  23. ; arguments come before the non-continuation ones
  24. source ; source info; used for error messages
  25. Primops are either trivial or nontrivial. Trivial primops only return a value
  26. and have no side effects. Calls to trivial primops never have continuation
  27. arguments and are always arguments to other calls. Calls to nontrivial primops
  28. may or may not have continuations and are always the body of a lambda node.
  29. Lambda nodes:
  30. type ; one of PROC, CONT, or JUMP (and maybe THROW at some point)
  31. name ; symbol (for debugging)
  32. id ; unique integer (for debugging)
  33. body ; the call-node that is the body of the lambda
  34. variables ; a list of variable records, with #Fs for ignored positions
  35. source ; source info; used for error messages
  36. protocol ; calling protocol from the source language
  37. block ; for use during code generation
  38. env ; for use when adding explicit environments
  39. PROC's are general procedures. The first variable of a PROC will be bound
  40. to the PROC's continuation.
  41. CONT's are continuation arguments to calls.
  42. JUMP's are continuations bound by LET or LETREC, whose calling points are
  43. known, and which are created and called within a single PROC.
  44. Variables:
  45. name ; source code name for variable (used for debugging only)
  46. id ; unique numeric identifier (used for debugging only)
  47. type ; type of variable's value
  48. binder ; LAMBDA node which binds this variable (or #F if none)
  49. refs ; list of reference nodes n for which (REFERENCE-VARIABLE n)
  50. ; = this variable
  51. flag ; useful slot, used by shapes, COPY-NODE, NODE->VECTOR, etc.
  52. ; all users must leave this is #F
  53. flags ; list of various annotations, e.g. IGNORABLE
  54. generate ; for whatever code generation wants
  55. ----------------------------------------------------------------
  56. The node tree has a very regular lexical structure:
  57. The body of every lambda node is a non-trivial call.
  58. The parent of every non-trivial call is a lambda node.
  59. Every CONT lambda is a continuation of a non-trivial call.
  60. Every JUMP lambda is an argument to either the LET or the LETREC
  61. primops (described below).
  62. The lambda node that binds a variable is an ancestor of every reference
  63. to that variable.
  64. If you start from any leaf node and follow the parent pointers up through the
  65. node tree, you first go through some number, possible zero, of trivial calls
  66. until a non-trivial call is reached. From that point on non-trivial calls
  67. alternate with CONT nodes until a PROC or JUMP lambda is reached. Going up
  68. from a PROC lambda is the same as going up from a leaf, while JUMP lambdas
  69. are always arguments to LET or LETREC, both of which are non-trivial.
  70. A basic block appears as a sequence of non-trivial calls with a single
  71. continuation apiece. The block begins with a PROC or JUMP lambda, or
  72. with a CONT lambda that is an argument to a call with two or more
  73. continuations, and ends with a call that has either no continuations,
  74. or two or more.
  75. Basic blocks are grouped into trees. The root of every tree is either
  76. a PROC or JUMP lambda, the branch points are calls with two or more
  77. continuations, and the leaves are jumps or returns. Within a tree
  78. the control flow follows the lexical structure of the program from
  79. parent to child (if we ignore calls to other PROCs).
  80. Every JUMP lambda is called from within only one PROC lambda, so a PROC
  81. can be considered to consist of a set of trees, the leaves of which either
  82. return from that PROC or jump to the top of another tree in the set.
  83. ----------------------------------------------------------------
  84. Primops:
  85. id ; unique symbol identifying this primop
  86. trivial? ; #t if this primop has does not accept a continuation
  87. side-effects ; one of #F, READ, WRITE, ALLOCATE, or IO
  88. simplify-call-proc ; simplify method
  89. primop-cost-proc ; cost of executing this operation
  90. ; (in some undisclosed metric)
  91. return-type-proc ; the type of the value returned (for trivial primops only)
  92. proc-data ; more data for the procedure primops
  93. cond-data ; more data for conditional primops
  94. code-data ; code generation data
  95. `procedure' primops are those that call one of their values.
  96. `conditional' primops are those that have more than one continuation.
  97. Below is a list of the standard primops. All but the last two are non-trivial.
  98. For the following the five primops the lambda node being called, jumped to,
  99. or whatever has been identified by the compiler, and the number of variables
  100. that the lambda node has matches the number of arguments.
  101. (CALL <cont> <proc> . <args>)
  102. (TAIL-CALL <cont-var> <proc> . <args>)
  103. (RETURN <cont-var> . <args>)
  104. (JUMP <jump-var> . <args>)
  105. ; (THROW <throw-var> . <args>) not yet implemented
  106. These are the same as the above except that the procedure has not been
  107. identified by the compiler. There is no UNKNOWN-JUMP because all calls
  108. to JUMP lambdas must be known.
  109. (UNKNOWN-CALL <cont> <proc> . <args>)
  110. (UNKNOWN-TAIL-CALL <cont> <proc> . <args>)
  111. (UNKNOWN-RETURN <cont-var> . <args>)
  112. PROC lambdas are called with either CALL or TAIL-CALL if all of their call
  113. sites have been identified, or with UNKNOWN-CALL or UNKNOWN-TAIL-CALL if not.
  114. JUMP lambdas are called using JUMP.
  115. LET binds random values, such as lambda nodes or the results of trivial
  116. calls, to variables. This primop only exists because of the requirement
  117. that every call have a primop; all it does is apply <cont> to <args>
  118. (it is called LET instead of APPLY because LET forms in the source code
  119. become calls to this primop).
  120. (LET <cont> . <args>)
  121. Recursive binding:
  122. (LETREC1 <cont>)
  123. (LETREC2 <cont> <id-var> <lambda1> <lambda2> ...)
  124. These are always used together, with the body of the continuation to LETREC1
  125. being a call to LETREC2. The two calls together look like:
  126. (LETREC1 (lambda (<id-var> <var1> ... <varN>)
  127. (LETREC2 <cont> <id-var> <lambda1> ... <lambdaN>)))
  128. which the CPS pretty-printer prints as:
  129. (let* (...
  130. ((id-var var1 ... varN) (letrec1))
  131. (() (letrec2 id-var lambda1 ... lambdaN))
  132. ...)
  133. ...)
  134. The end result is to bind <varI> to <lambdaI>. The point to the excercise
  135. is that lambdas occur within the scope of the variables.
  136. Undefined effect. This takes a continuation variable as an argument only
  137. so that the continuation variable is always reached.
  138. (UNDEFINED-EFFECT <cont-var> ...)
  139. Accessing and mutating the store.
  140. Cells are used to implement SET! on lexically bound variables. GLOBAL-SET!
  141. and GLOBAL-REF are used for module variables that may be set.
  142. (CELL-SET! <cont> <cell> <value>)
  143. (GLOBAL-SET! <cont> <global-var> <value>)
  144. (CELL-REF <cell>) ; trivial
  145. (GLOBAL-REF <global-var>) ; trivial
  146. ----------------------------------------------------------------
  147. Printing out the node tree.
  148. The following procedure:
  149. (define (fact n)
  150. (let loop ((n n) (r 1))
  151. (if (< n 2)
  152. r
  153. (loop (- n 1) (* n r)))))
  154. when converted into nodes is:
  155. (LAMBDAp (c_6 n_1)
  156. (letrec1 (LAMBDAc (x_13 loop_2)
  157. (letrec2 (LAMBDAc ()
  158. (unknown-tail-call c_6 loop_2 n_1 '1))
  159. x_13
  160. (LAMBDAp (c_8 n_3 r_4)
  161. (test
  162. (LAMBDAc ()
  163. (unknown-return c_8 r_4))
  164. (LAMBDAc ()
  165. (unknown-tail-call c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
  166. (< n_3 '2)))))))
  167. where LAMBDAp is a PROC lambda and LAMBDAc is a CONT lambda. Lexically bound
  168. variables are printed as <name>_<id> and constants as '<value>. This is not
  169. very readable, and larger procedures are much worse. The first step in making
  170. it more comprehensible is to print each lambda node separately with a marker
  171. to indicate where it appears in the tree.
  172. (LAMBDAp fact_7 (c_6 n_1)
  173. (letrec1 1 ^c_14))
  174. (LAMBDAc c_14 (x_13 loop_2)
  175. (letrec2 1 ^c_12 x_13 ^loop_9))
  176. (LAMBDAc c_12 ()
  177. (unknown-tail-call 0 c_6 loop_2 n_1 '1))
  178. (LAMBDAp loop9 (c_8 n_3 r_4)
  179. (test 2 ^g_10 ^g_11 (< n_3 '2)))
  180. (LAMBDAc g_10 ()
  181. (unknown-return 0 c_8 r_4))
  182. (LAMBDAc g_11 ()
  183. (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
  184. The labels used are the names and id's of the lambda nodes, with a ^ in front
  185. to distinguish them from variables. The code for each lambda is indented
  186. slightly more than the lambda in which it actually occurs. To make the
  187. distinction between continuation and non-continuation lambdas clearer the
  188. number of continuation arguments to a call is printed just after the primop
  189. (for example the first two arguments to TEST are continuations).
  190. The first three calls form a basic block because the first two calls have
  191. exactly one continuation apiece. To make this more easily seen these
  192. calls can be printed using a more condensed notation:
  193. (LAMBDAp fact_7 (c_6 n_1)
  194. (LET* (((x_13 loop_2) (letrec1))
  195. (() (letrec2 x_13 ^loop_9)))
  196. (unknown-tail-call 0 c_6 loop_2 n_1 '1)))
  197. The continuations are not printed as arguments but instead their variables
  198. are printed to the left of the call in a parody of Scheme's LET*. The results
  199. of the LETREC1 are bound to the variables X_13 and LOOP_2 as would happen with
  200. the real LET* (if it allowed calls to return multiple values).
  201. Finally, here is the way the code for FACT is actually printed:
  202. 7 (P fact_7 (c_6 n_1)
  203. 14 (LET* (((x_13 loop_2)
  204. (letrec1))
  205. 12 (() (letrec2 x_13 ^loop_9)))
  206. (unknown-tail-call 0 c_6 loop_2 n_1 '1)))
  207. 9 (P loop_9 (c_8 n_3 r_4)
  208. (test 2 ^g_10 ^g_11 (< n_3 '2)))
  209. 10 (C g_10 ()
  210. (unknown-return 0 c_8 r_4))
  211. 11 (C g_11 ()
  212. (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
  213. The ID number of every lambda node is printed out at the beginning of the
  214. line on which the code for the lambda appears. This is redundant for the
  215. lambdas that are not printed as part of a LET*. The word `LAMBDA' is not
  216. printed. The (letrec1) call appears on a new line because the printer
  217. indents the calls in LET* a fixed amount.
  218. The reason for printing the ID numbers is so that the actual nodes can be
  219. obtained. Once a lambda has been printed (either by the pretty printer or
  220. by the regular printer), (NODE-UNHASH <id>) will return it:
  221. scheme-compiler> (node-unhash 9)
  222. '#{Node lambda loop 9}
  223. scheme-compiler> ,inspect ##
  224. '#{Node lambda loop 9}
  225. [0: variant] 'lambda
  226. [1: parent] '#{Node call letrec2}
  227. [2: index] 2
  228. [3: simplified?] #t
  229. [4: flag] #f
  230. [5: stuff-0] '#{Node call test}
  231. [6: stuff-1] '(#{Variable n 3} #{Variable r 4})
  232. [7: stuff-2] '(#{Name #} (n r) (if # r #))
  233. [8: stuff-3] '#{Lambda-data}
  234. ----------------------------------------------------------------
  235. Simplification.
  236. The factorial procedure above is how it looks when originally translated
  237. into a node tree. The next step in compilation is to simplify the tree,
  238. doing constant folding, identifying call points, and so on. The simplified
  239. version of FACT is:
  240. 7 (P fact_7 (c_6 n_1)
  241. 14 (LET* (((x_13 loop_2)
  242. (letrec1))
  243. 12 (() (letrec2 x_13 ^loop_9)))
  244. (jump 0 loop_2 n_1 '1)))
  245. 9 (J loop_9 (n_3 r_4)
  246. (test 2 ^g_10 ^g_11 (< n_3 '2)))
  247. 10 (C g_10 ()
  248. (unknown-return 0 c_6 r_4))
  249. 11 (C g_11 ()
  250. (jump 0 loop_2 (+ '-1 n_3) (* n_3 r_4)))
  251. The only change is that the loop has been turned into a JUMP lambda.
  252. ----------------------------------------------------------------
  253. Still to describe:
  254. protocol determination
  255. simplifier moving stuff down, duplicating, later passes move values back up