@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2008, 2009, 2010, 2011
@c Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein's monster. However, this magic is perceived
by many to be impenetrable.

This section aims to pay attention to the small man behind the
curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your @code{.scm} file.

@menu
* Compiler Tower::
* The Scheme Compiler::
* Tree-IL::
* GLIL::
* Assembly::
* Bytecode and Objcode::
* Writing New High-Level Languages::
* Extending the Compiler::
@end menu

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title reader printer @
[parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
[joiner=#f] [make-default-environment=make-fresh-user-module]
Define a language.

This syntax defines a @code{#<language>} object, bound to @var{name}
in the current environment. In addition, the language will be added to
the global language set. For example, this is the language definition
for Scheme:

@example
(define-language scheme
  #:title       "Scheme"
  #:reader      (lambda (port env) ...)
  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write
  #:make-default-environment (lambda () ...))
@end example
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> ,L scheme
Happy hacking with Scheme! To switch back, type `,L tree-il'.
scheme@@(guile-user)>
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

Returns the language object, or @code{#f} if no language with that
name exists.
@end deffn
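
As a rough illustration, the accessors exported alongside
@code{lookup-language} can be used to inspect the language object it
returns. This sketch assumes the @code{language-name} and
@code{language-title} accessors of @code{(system base language)}; the
exact set of accessors may vary between Guile versions.

@example
(use-modules (system base language))

;; Look up the Scheme language object (autoloaded if necessary).
(define scheme-lang (lookup-language 'scheme))

(language-name scheme-lang)   ; => scheme
(language-title scheme-lang)  ; => "Scheme", per its define-language form
@end example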

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and returns the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
@end deffn
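
A small sketch of using the result. It assumes that each element of
the returned path is a pair of a source language object and the
compiler procedure that translates out of that language. Since the
Scheme language definition shown above lists only a Tree-IL compiler,
the path from Scheme to Tree-IL should consist of a single step.

@example
(use-modules (system base language))

(define order (lookup-compilation-order 'scheme 'tree-il))

;; Print the name of the language each step compiles from.
(for-each (lambda (step)
            (display (language-name (car step)))
            (newline))
          order)
@end example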

There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The run-time compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme
@item Tree Intermediate Language (Tree-IL)
@item Guile Lowlevel Intermediate Language (GLIL)
@item Assembly
@item Bytecode
@item Objcode
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at run
time, you want a Scheme value: for example, a compiled procedure. For
this reason, so as not to break the abstraction, Guile defines a fake
language at the bottom of the tower:

@itemize
@item Value
@end itemize

Compiling to @code{value} loads the object code into a procedure, and
wakes the sleeping giant.

Perhaps this strangeness can be explained by example:
@code{compile-file} defaults to compiling to object code, because it
produces object code that has to live in the barren world outside the
Guile runtime; but @code{compile} defaults to compiling to
@code{value}, as its product re-enters the Guile world.

Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
@end example

@node The Scheme Compiler
@subsection The Scheme Compiler

The job of the Scheme compiler is to expand all macros and all of Scheme
to its most primitive expressions. The definition of ``primitive'' is
given by the inventory of constructs provided by Tree-IL, the target
language of the Scheme compiler: procedure calls, conditionals, lexical
references, etc. This is described more fully in the next section.

The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
that it is completely implemented by the macro expander. Since the
macro expander has to run over all of the source code already in order
to expand macros, it might as well do the analysis at the same time,
producing Tree-IL expressions directly.

Because this compiler is actually the macro expander, it is
extensible. Any macro which the user writes becomes part of the
compiler.

The Scheme-to-Tree-IL expander may be invoked using the generic
@code{compile} procedure:

@lisp
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
@result{}
#<<call> src: #f
         proc: #<<toplevel-ref> src: #f name: +>
         args: (#<<const> src: #f exp: 1>
                #<<const> src: #f exp: 2>)>
@end lisp

Or, since Tree-IL is so close to Scheme, it is often useful to expand
Scheme to Tree-IL, then translate back to Scheme. For that reason the
expander provides two interfaces. The @code{compile} call above is
equivalent to calling @code{(macroexpand '(+ 1 2) 'c)}, where the
@code{'c} is for ``compile''. With @code{'e} (the default), the result
is translated back to Scheme:

@lisp
(macroexpand '(+ 1 2))
@result{} (+ 1 2)
(macroexpand '(let ((x 10)) (* x x)))
@result{} (let ((x84 10)) (* x84 x84))
@end lisp

The second example shows that as part of its job, the macro expander
renames lexically-bound variables. The original names are preserved
when compiling to Tree-IL, but can't be represented in Scheme: a
lexical binding only has one name. It is for this reason that the
@emph{native} output of the expander is @emph{not} Scheme. There's too
much information we would lose if we translated to Scheme directly:
lexical variable names, source locations, and module hygiene.

Note however that @code{macroexpand} does not have the same signature
as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
around @code{macroexpand}, to make it conform to the general form of
compiler procedures in Guile's language tower.

Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options. They return three values:
the compiled expression, the corresponding environment for the target
language, and a ``continuation environment''. The compiled expression
and environment will serve as input to the next language's compiler.
The ``continuation environment'' can be used to compile another
expression from the same source language within the same module.

For example, you might compile the expression @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But
if you compiled a second expression, you would want to take into
account the compile-time effect of compiling the previous expression,
which puts the user in the @code{(foo)} module. That is the purpose of
the ``continuation environment''; you would pass it as the environment
when compiling the subsequent expression.
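
To make the calling convention concrete, here is a deliberately
trivial, hypothetical compiler procedure; it is not part of Guile. It
takes the three arguments described above and returns the three
values, passing its input environment through unchanged as both the
target environment and the continuation environment.

@example
;; Hypothetical compiler procedure: wraps any datum in the external
;; representation of a Tree-IL constant.  OPTS is accepted but unused.
(define (compile-to-const exp env opts)
  (values `(const ,exp)   ; compiled expression
          env             ; environment for the target language
          env))           ; continuation environment

;; Collect the three values into a list.
(call-with-values
    (lambda () (compile-to-const 42 #f '()))
  list)
;; => ((const 42) #f #f)
@end example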

For Scheme, an environment is a module. By default, the @code{compile}
and @code{compile-file} procedures compile in a fresh module, such
that bindings and macros introduced by the expression being compiled
are isolated:

@example
(eq? (current-module) (compile '(current-module)))
@result{} #f

(compile '(define hello 'world))
(defined? 'hello)
@result{} #f

(define / *)
(eq? (compile '/) /)
@result{} #f
@end example

Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
@code{current-reader}}) are isolated:

@example
(compile '(fluid-set! current-reader (lambda args 'fail)))
(fluid-ref current-reader)
@result{} #f
@end example

Nevertheless, having the compiler and @dfn{compilee} share the same name
space can be achieved by explicitly passing @code{(current-module)} as
the compilation environment:

@example
(define hello 'world)
(compile 'hello #:env (current-module))
@result{} world
@end example
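
Any other module may serve as the environment, too. The following
sketch compiles a definition into a separate module created with
@code{make-fresh-user-module} and then reads the binding back with
@code{module-ref}; the names @code{m} and @code{answer} are arbitrary.

@example
(use-modules (system base compile))

(define m (make-fresh-user-module))
(compile '(define answer 42) #:env m)

(module-ref m 'answer)   ; => 42
(defined? 'answer m)     ; => #t
@end example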

@node Tree-IL
@subsection Tree-IL

Tree Intermediate Language (Tree-IL) is a structured intermediate
language that is close in expressive power to Scheme. It is an
expanded, pre-analyzed Scheme.

Tree-IL is ``structured'' in the sense that its representation is
based on records, not S-expressions. This gives a rigidity to the
language that ensures that compiling to a lower-level language only
requires a limited set of transformations. For example, the Tree-IL
type @code{<const>} is a record type with two fields, @code{src} and
@code{exp}. Instances of this type are created via @code{make-const}.
Fields of this type are accessed via the @code{const-src} and
@code{const-exp} procedures. There is also a predicate, @code{const?}.
@xref{Records}, for more information on records.
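
For instance, the record interface just described can be exercised
directly. This sketch also uses @code{unparse-tree-il} from
@code{(language tree-il)} to obtain the S-expression external
representation of the record.

@example
(use-modules (language tree-il))

;; SRC comes first; #f means "no source information".
(define c (make-const #f 3))

(const? c)            ; => #t
(const-exp c)         ; => 3
(const-src c)         ; => #f
(unparse-tree-il c)   ; => (const 3)
@end example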

@c alpha renaming

All Tree-IL types have a @code{src} slot, which holds source location
information for the expression. This information, if present, will be
residualized into the compiled object code, allowing backtraces to
show source information. The format of @code{src} is the same as that
returned by Guile's @code{source-properties} function. @xref{Source
Properties}, for more information.

Although Tree-IL objects are represented internally using records,
there is also an equivalent S-expression external representation for
each kind of Tree-IL. For example, the S-expression representation
of the @code{#<const src: #f exp: 3>} expression would be:

@example
(const 3)
@end example

Users may program with this format directly at the REPL:

@example
scheme@@(guile-user)> ,language tree-il
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
@result{} 42
@end example

The @code{src} fields are left out of the external representation.

One may create Tree-IL objects from their external representations by
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.
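
A minimal round trip, using only the @code{const} and @code{if} forms
of the external representation: parse an S-expression, inspect the
resulting record, print it back, and hand it to @code{compile} with
Tree-IL as the source language.

@example
(use-modules (system base compile) (language tree-il))

(define t (parse-tree-il '(if (const #t) (const 1) (const 2))))

(conditional? t)              ; => #t
(unparse-tree-il t)           ; => (if (const #t) (const 1) (const 2))
(compile t #:from 'tree-il)   ; => 1
@end example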

@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f
#f)}.
@end deftp

@deftp {Scheme Variable} <const> src exp
@deftpx {External Representation} (const @var{exp})
A constant.
@end deftp

@deftp {Scheme Variable} <primitive-ref> src name
@deftpx {External Representation} (primitive @var{name})
A reference to a ``primitive''. A primitive is a procedure that, when
compiled, may be open-coded. For example, @code{cons} is usually
recognized as a primitive, so that it compiles down to a single
instruction.

Compilation of Tree-IL usually begins with a pass that resolves some
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
@code{<primitive-ref>} expressions. The actual compilation pass has
special cases for calls to certain primitives, like @code{apply} or
@code{cons}.
@end deftp

@deftp {Scheme Variable} <lexical-ref> src name gensym
@deftpx {External Representation} (lexical @var{name} @var{gensym})
A reference to a lexically-bound variable. The @var{name} is the
original name of the variable in the source program. @var{gensym} is a
unique identifier for this variable.
@end deftp

@deftp {Scheme Variable} <lexical-set> src name gensym exp
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
Sets a lexically-bound variable.
@end deftp

@deftp {Scheme Variable} <module-ref> src mod name public?
@deftpx {External Representation} (@@ @var{mod} @var{name})
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
A reference to a variable in a specific module. @var{mod} should be
the name of the module, e.g.@: @code{(guile-user)}.

If @var{public?} is true, the variable named @var{name} will be looked
up in @var{mod}'s public interface, and serialized with @code{@@};
otherwise it will be looked up among the module's private bindings,
and is serialized with @code{@@@@}.
@end deftp

@deftp {Scheme Variable} <module-set> src mod name public? exp
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
Sets a variable in a specific module.
@end deftp

@deftp {Scheme Variable} <toplevel-ref> src name
@deftpx {External Representation} (toplevel @var{name})
References a variable from the current procedure's module.
@end deftp

@deftp {Scheme Variable} <toplevel-set> src name exp
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
Sets a variable in the current procedure's module.
@end deftp

@deftp {Scheme Variable} <toplevel-define> src name exp
@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
Defines a new top-level variable in the current procedure's module.
@end deftp

@deftp {Scheme Variable} <conditional> src test then else
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
A conditional. Note that @var{else} is not optional.
@end deftp

@deftp {Scheme Variable} <call> src proc args
@deftpx {External Representation} (call @var{proc} . @var{args})
A procedure call.
@end deftp

@deftp {Scheme Variable} <primcall> src name args
@deftpx {External Representation} (primcall @var{name} . @var{args})
A call to a primitive. Equivalent to @code{(call (primitive @var{name})
. @var{args})}. This construct is often more convenient to generate and
analyze than @code{<call>}.

As part of the compilation process, instances of @code{(call (primitive
@var{name}) . @var{args})} are transformed into primcalls.
@end deftp

@deftp {Scheme Variable} <sequence> src exps
@deftpx {External Representation} (begin . @var{exps})
Like Scheme's @code{begin}.
@end deftp

@deftp {Scheme Variable} <lambda> src meta body
@deftpx {External Representation} (lambda @var{meta} @var{body})
A closure. @var{meta} is an association list of properties for the
procedure. @var{body} is a single Tree-IL expression of type
@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
an alternate clause, this makes Tree-IL's @code{<lambda>} have the
expressiveness of Scheme's @code{case-lambda}.
@end deftp

@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
@deftpx {External Representation} @
(lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
@var{body})@
[@var{alternate}])
One clause of a @code{case-lambda}. A @code{lambda} expression in
Scheme is treated as a @code{case-lambda} with one clause.

@var{req} is a list of the procedure's required arguments, as symbols.
@var{opt} is a list of the optional arguments, or @code{#f} if there
are no optional arguments. @var{rest} is the name of the rest
argument, or @code{#f}.

@var{kw} is a list of the form @code{(@var{allow-other-keys?}
(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
keyword corresponding to the argument named @var{name}, and whose
corresponding gensym is @var{var}. @var{inits} are Tree-IL expressions
corresponding to all of the optional and keyword arguments, evaluated
to bind variables whose value is not supplied by the procedure caller.
Each @var{init} expression is evaluated in the lexical context of
previously bound variables, from left to right.

@var{gensyms} is a list of gensyms corresponding to all arguments:
first all of the required arguments, then the optional arguments if
any, then the rest argument if any, then all of the keyword arguments.

@var{body} is the body of the clause. If the procedure is called with
an appropriate number of arguments, @var{body} is evaluated in tail
position. Otherwise, if there is an @var{alternate}, it should be a
@code{<lambda-case>} expression, representing the next clause to try.
If there is no @var{alternate}, a wrong-number-of-arguments error is
signaled.
@end deftp
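
One way to see the chaining of clauses through @var{alternate} is to
expand a Scheme @code{case-lambda} to Tree-IL and walk the resulting
records. This sketch assumes the accessor names that @code{(language
tree-il)} derives from the field names above (@code{lambda-body},
@code{lambda-case-req}, @code{lambda-case-alternate}).

@example
(use-modules (system base compile) (language tree-il))

(define t
  (compile '(case-lambda ((x) x) ((x y) (+ x y))) #:to 'tree-il))

(define first-clause (lambda-body t))                 ; a <lambda-case>
(define second-clause (lambda-case-alternate first-clause))

(lambda-case-req first-clause)         ; => (x)
(lambda-case-req second-clause)        ; => (x y)
(lambda-case-alternate second-clause)  ; => #f, no further clauses
@end example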

@deftp {Scheme Variable} <let> src names gensyms vals exp
@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
Lexical binding, like Scheme's @code{let}. @var{names} are the
original binding names, @var{gensyms} are gensyms corresponding to the
@var{names}, and @var{vals} are Tree-IL expressions for the values.
@var{exp} is a single Tree-IL expression.
@end deftp
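
The @var{names} and @var{gensyms} fields can be observed on the output
of the expander. This sketch assumes the @code{let?}, @code{let-names}
and @code{let-gensyms} accessors of @code{(language tree-il)}; the
exact gensym produced differs from run to run.

@example
(use-modules (system base compile) (language tree-il))

(define t (compile '(let ((x 10)) (* x x)) #:to 'tree-il))

(let? t)          ; => #t
(let-names t)     ; => (x), the original source name
(let-gensyms t)   ; a list holding one fresh symbol, not x itself
@end example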

@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
@end deftp

@deftp {Scheme Variable} <dynlet> fluids vals body
@deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
@var{fluids} should be a list of Tree-IL expressions that will
evaluate to fluids, and @var{vals} a corresponding list of expressions
to bind to the fluids during the dynamic extent of the evaluation of
@var{body}.
@end deftp

@deftp {Scheme Variable} <dynref> fluid
@deftpx {External Representation} (dynref @var{fluid})
A dynamic variable reference. @var{fluid} should be a Tree-IL
expression evaluating to a fluid.
@end deftp

@deftp {Scheme Variable} <dynset> fluid exp
@deftpx {External Representation} (dynset @var{fluid} @var{exp})
A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
to a fluid, will be set to the result of evaluating @var{exp}.
@end deftp

@deftp {Scheme Variable} <dynwind> winder pre body post unwinder
@deftpx {External Representation} (dynwind @var{winder} @var{pre} @var{body} @var{post} @var{unwinder})
A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
evaluate to thunks. Ensures that the winder and the unwinder are called
before entering and after leaving @var{body}. Note that @var{body} is
an expression, without a thunk wrapper. Guile actually inlines the
bodies of @var{winder} and @var{unwinder} for the case of normal control
flow, compiling the expressions in @var{pre} and @var{post},
respectively.
@end deftp

@deftp {Scheme Variable} <prompt> tag body handler
@deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
A dynamic prompt. Instates a prompt named @var{tag}, an expression,
during the dynamic extent of the execution of @var{body}, also an
expression. If an abort occurs to this prompt, control will be passed
to @var{handler}, a @code{<lambda-case>} expression with no optional
or keyword arguments, and no alternate. The first argument to the
@code{<lambda-case>} will be the captured continuation, and then all
of the values passed to the abort. @xref{Prompts}, for more
information.
@end deftp

@deftp {Scheme Variable} <abort> tag args tail
@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
An abort to the nearest prompt with the name @var{tag}, an expression.
@var{args} should be a list of expressions to pass to the prompt's
handler, and @var{tail} should be an expression that will evaluate to
a list of additional arguments. An abort will save the partial
continuation, which may later be reinstated, resulting in the
@code{<abort>} expression evaluating to some number of values.
@end deftp
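
At the Scheme level, prompts and aborts are usually reached through
@code{call-with-prompt} and @code{abort-to-prompt} (@pxref{Prompts}).
The sketch below shows the handler receiving the captured continuation
first, followed by the value passed to the abort; resuming that
continuation makes the @code{abort-to-prompt} call return.

@example
(define tag (make-prompt-tag))

(call-with-prompt tag
  (lambda ()
    ;; Abort to TAG with one value; the body is suspended here.
    (+ 1 (abort-to-prompt tag 41)))
  (lambda (k v)
    ;; K is the partial continuation, V is 41.  Resuming K makes
    ;; abort-to-prompt return V, so the body computes (+ 1 41).
    (k v)))
;; => 42
@end example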

There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does. Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will
generate them as necessary.

@deftp {Scheme Variable} <let-values> src names gensyms exp body
@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @var{exp} to the @code{lambda}-like bindings described by
@var{gensyms}. That is to say, @var{gensyms} may be an improper list.

@code{<let-values>} is an optimization of a @code{<call>} to the
primitive @code{call-with-values}.
@end deftp
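
The improper-list case mirrors Scheme code such as the following call
to @code{call-with-values}, where the last formal collects the
remaining values, just as a rest argument would in a @code{lambda}:

@example
(call-with-values
    (lambda () (values 1 2 3))
  (lambda (a . rest)
    ;; A is bound to 1, REST to the list of remaining values.
    (list a rest)))
;; => (1 (2 3))
@end example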

@deftp {Scheme Variable} <fix> src names gensyms vals body
@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.

@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp

Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list.
The compiler also keeps some state as to whether the current
expression is in tail context, and whether its value will be used in
future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g.@: docstrings), and
to perform tail calls when in tail position.
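
The following toy code generator is a hypothetical sketch of that idea,
not the code in @code{(language tree-il compile-glil)}. It handles a
small subset of the external representation (@code{const},
@code{begin}, and a simplified @code{call} whose operator is just a
name) and threads a @var{context} argument that is one of
@code{tail}, @code{push} or @code{drop}. The instruction names it
emits are made up for illustration.

@example
(use-modules (ice-9 match) (srfi srfi-1))

(define (comp x context)
  (match x
    (('const v)
     (case context
       ((drop) '())                     ; value unused: emit nothing
       ((push) `((const ,v)))
       ((tail) `((const ,v) (return)))))
    (('begin e)
     (comp e context))
    (('begin e rest ...)
     ;; Every expression but the last is evaluated for effect only.
     (append (comp e 'drop) (comp (cons 'begin rest) context)))
    (('call proc args ...)
     (append (append-map (lambda (a) (comp a 'push)) args)
             (case context
               ((tail) `((tail-call ,proc ,(length args))))
               ((push) `((call ,proc ,(length args))))
               ((drop) `((call ,proc ,(length args)) (drop))))))))

;; The docstring-like constant is dropped, and the call in tail
;; position becomes a tail call.
(comp '(begin (const "doc") (call f (const 1))) 'tail)
;; => ((const 1) (tail-call f 1))
@end example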

Most optimization, such as it currently is, is performed on Tree-IL
expressions as source-to-source transformations. There will be more
optimizations added in the future.

Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.

@node GLIL
@subsection GLIL

Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM
instruction set. Its expression types are defined in @code{(language
glil)}.

@deftp {Scheme Variable} <glil-program> meta . body
A unit of code that at run-time will correspond to a compiled
procedure. @var{meta} should be an alist of properties, as in
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
expressions.
@end deftp

@deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
A prologue for a function with no optional, keyword, or rest
arguments. @var{nreq} is the number of required arguments. @var{nlocs}
is the total number of local variables, including the arguments. If
the procedure was not given exactly @var{nreq} arguments, control will
jump to @var{else-label}, if given, or otherwise signal an error.
@end deftp

@deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
A prologue for a function with optional or rest arguments. Like
@code{<glil-std-prelude>}, with the addition that @var{nopt} is the
number of optional arguments (possibly zero) and @var{rest} is an
index of a local variable at which to bind a rest argument, or
@code{#f} if there is no rest argument.
@end deftp

@deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
A prologue for a function with keyword arguments. Like
@code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
of keyword arguments, and @var{allow-other-keys?} is a flag indicating
whether to allow unknown keys. @xref{Function Prologue Instructions,
@code{bind-kwargs}}, for details on the format of @var{kw}.
@end deftp

@deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of
variables. @var{vars} is a list of @code{(@var{name} @var{type}
@var{index})}, where @var{type} should be one of @code{argument},
@code{local}, or @code{external}.

@code{<glil-bind>} expressions end up being serialized as part of a
program's metadata and do not form part of a program's code path.
@end deftp

@deftp {Scheme Variable} <glil-mv-bind> vars rest
A multiple-value binding of the values on the stack to @var{vars}. Iff
@var{rest} is true, the last element of @var{vars} will be treated as
a rest argument.

In addition to pushing a binding annotation on the stack, like
@code{<glil-bind>}, an expression is emitted at compilation time to
make sure that there are enough values available to bind. See the
notes on @code{truncate-values} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp

@deftp {Scheme Variable} <glil-unbind>
Closes the liveness extent of the most recently encountered
@code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
expressions are compiled, a parallel stack of live bindings is
maintained; this expression pops off the top element from that stack.

Bindings are written into the program's metadata so that debuggers and
other tools can determine the set of live local variables at a given
offset within a VM program.
@end deftp
@deftp {Scheme Variable} <glil-source> loc
Records source information for the preceding expression. @var{loc}
should be an association list containing @code{line}, @code{column},
and @code{filename} keys, e.g.@: as returned by
@code{source-properties}.
@end deftp

@deftp {Scheme Variable} <glil-void>
Pushes ``the unspecified value'' onto the stack.
@end deftp

@deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, uniform array, the empty
list, or a pair or vector of constants.
@end deftp
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable. If @var{local?} is false, the
variable is free. All variables may have @code{ref}, @code{set}, and
@code{bound?} as their @var{op}. Boxed variables may also have the
@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
correspond in semantics to the VM instructions @code{box},
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
more information.
@end deftp

@deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
or @code{define}.
@end deftp

@deftp {Scheme Variable} <glil-module> op mod name public?
Accesses a variable within a specific module. See Tree-IL's
@code{<module-ref>} for more information.
@end deftp
@deftp {Scheme Variable} <glil-label> label
Creates a new label. @var{label} can be any Scheme value, and should
be unique.
@end deftp

@deftp {Scheme Variable} <glil-branch> inst label
Branches to a label. @var{label} should be the label of a
@code{<glil-label>} expression. @var{inst} is a branching instruction:
@code{br-if}, @code{br}, etc.
@end deftp

@deftp {Scheme Variable} <glil-call> inst nargs
This expression is probably misnamed, as it does not correspond to
function calls. @code{<glil-call>} invokes the VM instruction named
@var{inst}, noting that it is called with @var{nargs} stack arguments.
The arguments should already have been pushed on the stack. What
happens to the stack afterwards depends on the instruction.
@end deftp

@deftp {Scheme Variable} <glil-mv-call> nargs ra
Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
corresponding to the multiple-value return address for the call. See
the notes on @code{mv-call} in @ref{Procedure Call and Return
Instructions}, for more information.
@end deftp

@deftp {Scheme Variable} <glil-prompt> label escape-only?
Pushes a dynamic prompt onto the stack, with a handler at @var{label}.
@var{escape-only?} is a flag that is propagated to the prompt,
allowing an abort to avoid capturing a continuation in some cases.
@xref{Prompts}, for more information.
@end deftp
Users may enter GLIL at the REPL as well, though there is a bit more
bookkeeping to do:

@example
scheme@@(guile-user)> ,language glil
Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
To switch back, type `,L scheme'.
glil@@(guile-user)> (program () (std-prelude 0 0 #f)
                       (const 3) (call return 1))
@result{} 3
@end example

Just as in all of Guile's compilers, an environment is passed to the
GLIL-to-object-code compiler, and one is returned as well, along with
the object code.

@node Assembly
@subsection Assembly

Assembly is an S-expression-based, human-readable representation of
the actual bytecodes that will be emitted for the VM. As such, it is a
useful intermediate language both for compilation and for
decompilation.

Besides the fact that it is not a record-based language, assembly
differs from GLIL in four main ways:

@itemize
@item Labels have been resolved to byte offsets in the program.
@item Constants inside procedures have either been expressed as inline
instructions or cached in object arrays.
@item Procedures with metadata (source location information, liveness
extents, procedure names, generic properties, etc.) have had their
metadata serialized out to thunks.
@item All expressions correspond directly to VM instructions; for
example, there is no @code{<glil-lexical>} that can be either a ref or
a set.
@end itemize

Assembly is isomorphic to the bytecode that it compiles to. You can
compile to bytecode, then decompile back to assembly, and you will get
the same assembly code back.

The general form of assembly instructions is the following:

@lisp
(@var{inst} @var{arg} ...)
@end lisp

The @var{inst} names a VM instruction, and its @var{arg}s will be
embedded in the instruction stream. The easiest way to see assembly is
to play around with it at the REPL, as can be seen in this annotated
example:
@example
scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
(load-program
 ((:LCASE16 . 2))  ; Labels, unused in this case.
 8                 ; Length of the thunk that was compiled.
 (load-program     ; Metadata thunk.
  ()
  17
  #f               ; No metadata thunk for the metadata thunk.
  (make-eol)
  (make-eol)
  (make-int8 2)    ; Liveness extents, source info, and arities,
  (make-int8 8)    ; in a format that Guile knows how to parse.
  (make-int8:0)
  (list 0 3)
  (list 0 1)
  (list 0 3)
  (return))
 (assert-nargs-ee/locals 0)  ; Prologue.
 (make-int8 32)    ; Actual code starts here.
 (make-int8 10)
 (add)
 (return))
@end example

Of course, you can also switch the REPL to assembly and enter assembly
S-expressions directly, as with other languages, though this is more
difficult because the length fields have to be correct.

@node Bytecode and Objcode
@subsection Bytecode and Objcode

Finally, the raw bytes. There are actually two different ``languages''
here, corresponding to two different ways to represent the bytes.

``Bytecode'' represents code as uniform byte vectors, useful for
structuring and destructuring code on the Scheme level. Bytecode is
the next step down from assembly:

@example
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
@result{} #vu8(8 0 0 0 25 0 0 0        ; Header.
               95 0                    ; Prologue.
               10 32 10 10 148 66      ; Actual code.
               17 0 0 0 0 0 0 0        ; Metadata thunk header.
               9 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66) ; Metadata thunk code.
@end example
``Objcode'' is bytecode, but mapped directly to a C structure,
@code{struct scm_objcode}:

@example
struct scm_objcode @{
  scm_t_uint32 len;
  scm_t_uint32 metalen;
  scm_t_uint8 base[0];
@};
@end example

As one might imagine, objcode imposes a minimum length on the
bytecode: it must at least hold the @code{len} and @code{metalen}
fields. Also, these fields are in native endianness, which makes
objcode (and bytecode) system-dependent.

Objcode also has a couple of important efficiency hacks. First,
objcode may be mapped directly from disk, allowing compiled code to be
loaded quickly, often from the system's disk cache, and shared among
multiple processes. Second, objcode may be embedded in other objcode,
allowing procedures to have the text of other procedures inlined into
their bodies, without the need for separate allocation of the code. Of
course, the objcode object itself does need to be allocated.

Procedures related to objcode are defined in the @code{(system vm
objcode)} module.

@deffn {Scheme Procedure} objcode? obj
@deffnx {C Function} scm_objcode_p (obj)
Returns @code{#t} if @var{obj} is object code, @code{#f} otherwise.
@end deffn

@deffn {Scheme Procedure} bytecode->objcode bytecode
@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
Makes an objcode object from @var{bytecode}, which should be a
bytevector. @xref{Bytevectors}.
@end deffn

@deffn {Scheme Procedure} load-objcode file
@deffnx {C Function} scm_load_objcode (file)
Loads object code from a file named @var{file}. The file will be
mapped into memory via @code{mmap}, so this is a very fast operation.

On disk, object code has a sixteen-byte cookie prepended to it, to
prevent accidental loading of arbitrary garbage.
@end deffn

@deffn {Scheme Procedure} write-objcode objcode file
@deffnx {C Function} scm_write_objcode (objcode, file)
Writes object code out to a file, prepending the sixteen-byte cookie.
@end deffn

@deffn {Scheme Procedure} objcode->bytecode objcode
@deffnx {C Function} scm_objcode_to_bytecode (objcode)
Copies object code out to a bytevector for analysis by Scheme.
@end deffn

The following procedure is actually in @code{(system vm program)}, but
we'll mention it here:

@deffn {Scheme Procedure} make-program objcode objtable [free-vars=#f]
@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
Loads object code into a Scheme program. The resulting program will
have @var{objtable} as its object table, which should be a vector or
@code{#f}, and will capture the free variables from @var{free-vars}.
@end deffn

Object code from a file may be disassembled at the REPL via the
meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
Programs may be disassembled via @code{,disassemble}, abbreviated as
@code{,x}.

Compiling object code to the fake language, @code{value}, is performed
by loading objcode into a program, then executing that program, a
thunk, with respect to the compilation environment. Normally the
environment propagates through the compiler transparently, but users
may specify the compilation environment manually as well, as a module.

@node Writing New High-Level Languages
@subsection Writing New High-Level Languages

In order to integrate a new language @var{lang} into Guile's compiler
system, one has to create the module @code{(language @var{lang} spec)}
containing the language definition and referencing the parser,
compiler and other routines processing it. The module hierarchy in
@code{(language brainfuck)} defines a very basic Brainfuck
implementation meant to serve as an easy-to-understand example of how
to do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
for more information about the Brainfuck language itself.

@node Extending the Compiler
@subsection Extending the Compiler

At this point we take a detour from the impersonal tone of the rest of
the manual. Admit it: if you've read this far into the compiler
internals manual, you are a junkie. Perhaps a course at your university
left you unsated, or perhaps you've always harbored a desire to hack the
holy of computer science holies: a compiler. Well, you're in good
company, and in a good position. Guile's compiler needs your help.

There are many possible avenues for improving Guile's compiler.
Probably the most important improvement, speed-wise, will be some form
of native compilation, both just-in-time and ahead-of-time. This could
be done in many ways. Probably the easiest strategy would be to extend
the compiled procedure structure to include a pointer to a native code
vector, and compile from bytecode to native code at run-time after a
procedure is called a certain number of times.

The name of the game is a profiling-based harvest of the low-hanging
fruit: running programs of interest under a system-level profiler and
determining which improvements would give the most bang for the buck.
It's really getting to the point, though, where native compilation is
the next step.

The compiler also needs help at the top end, enhancing the Scheme that
it knows to also understand R6RS, and adding new high-level compilers.
We have JavaScript and Emacs Lisp mostly complete, but they could use
some love; Lua would be nice as well, but whatever language it is
that strikes your fancy would be welcome too.

Compilers are for hacking, not for admiring or for complaining about.
Get to it!