123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775776777778779780781782783784785786787788789790791792793794795796797798799800801802803804805806807808809810811812813814815816817818819820821822823824825826827828829830831832833834835836837838839840841842843844845846847848849850851852853854855856857858859860861862863864865866867868869870871872873874875876877878879880881882883884885886887888889890891892893894895 |
- @c -*-texinfo-*-
- @c This is part of the GNU Guile Reference Manual.
- @c Copyright (C) 2008, 2009, 2010, 2011
- @c Free Software Foundation, Inc.
- @c See the file guile.texi for copying conditions.
- @node Compiling to the Virtual Machine
- @section Compiling to the Virtual Machine
- Compilers have a mystique about them that is attractive and
- off-putting at the same time. They are attractive because they are
- magical -- they transform inert text into live results, like throwing
- the switch on Frankenstein's monster. However, this magic is perceived
- by many to be impenetrable.
- This section aims to pay attention to the small man behind the
- curtain.
- @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
- know how to compile your @code{.scm} file.
- @menu
- * Compiler Tower::
- * The Scheme Compiler::
- * Tree-IL::
- * GLIL::
- * Assembly::
- * Bytecode and Objcode::
- * Writing New High-Level Languages::
- * Extending the Compiler::
- @end menu
- @node Compiler Tower
- @subsection Compiler Tower
- Guile's compiler is quite simple, actually -- its @emph{compilers}, to
- put it more accurately. Guile defines a tower of languages, starting
- at Scheme and progressively simplifying down to languages that
- resemble the VM instruction set (@pxref{Instruction Set}).
- Each language knows how to compile to the next, so each step is simple
- and understandable. Furthermore, this set of languages is not
- hardcoded into Guile, so it is possible for the user to add new
- high-level languages, new passes, or even different compilation
- targets.
- Languages are registered in the module, @code{(system base language)}:
- @example
- (use-modules (system base language))
- @end example
- They are registered with the @code{define-language} form.
- @deffn {Scheme Syntax} define-language @
- name title reader printer @
- [parser=#f] [compilers='()] [decompilers='()] [evaluator=#f] @
- [joiner=#f] [make-default-environment=make-fresh-user-module]
- Define a language.
- This syntax defines a @code{#<language>} object, bound to @var{name}
- in the current environment. In addition, the language will be added to
- the global language set. For example, this is the language definition
- for Scheme:
- @example
- (define-language scheme
- #:title "Scheme"
- #:reader (lambda (port env) ...)
- #:compilers `((tree-il . ,compile-tree-il))
- #:decompilers `((tree-il . ,decompile-tree-il))
- #:evaluator (lambda (x module) (primitive-eval x))
- #:printer write
- #:make-default-environment (lambda () ...))
- @end example
- @end deffn
- The interesting thing about having languages defined this way is that
- they present a uniform interface to the read-eval-print loop. This
- allows the user to change the current language of the REPL:
- @example
- scheme@@(guile-user)> ,language tree-il
- Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
- tree-il@@(guile-user)> ,L scheme
- Happy hacking with Scheme! To switch back, type `,L tree-il'.
- scheme@@(guile-user)>
- @end example
- Languages can be looked up by name, as they were above.
- @deffn {Scheme Procedure} lookup-language name
- Looks up a language named @var{name}, autoloading it if necessary.
- Languages are autoloaded by looking for a variable named @var{name} in
- a module named @code{(language @var{name} spec)}.
- The language object will be returned, or @code{#f} if there does not
- exist a language with that name.
- @end deffn
- Defining languages this way allows us to programmatically determine
- the necessary steps for compiling code from one language to another.
- @deffn {Scheme Procedure} lookup-compilation-order from to
- Recursively traverses the set of languages to which @var{from} can
- compile, depth-first, and return the first path that can transform
- @var{from} to @var{to}. Returns @code{#f} if no path is found.
- This function memoizes its results in a cache that is invalidated by
- subsequent calls to @code{define-language}, so it should be quite
- fast.
- @end deffn
- There is a notion of a ``current language'', which is maintained in
- the @code{*current-language*} fluid. This language is normally Scheme,
- and may be rebound by the user. The run-time compilation interfaces
- (@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
- and target languages.
- The normal tower of languages when compiling Scheme goes like this:
- @itemize
- @item Scheme
- @item Tree Intermediate Language (Tree-IL)
- @item Guile Lowlevel Intermediate Language (GLIL)
- @item Assembly
- @item Bytecode
- @item Objcode
- @end itemize
- Object code may be serialized to disk directly, though it has a cookie
- and version prepended to the front. But when compiling Scheme at run
- time, you want a Scheme value: for example, a compiled procedure. For
- this reason, so as not to break the abstraction, Guile defines a fake
- language at the bottom of the tower:
- @itemize
- @item Value
- @end itemize
- Compiling to @code{value} loads the object code into a procedure, and
- wakes the sleeping giant.
- Perhaps this strangeness can be explained by example:
- @code{compile-file} defaults to compiling to object code, because it
- produces object code that has to live in the barren world outside the
- Guile runtime; but @code{compile} defaults to compiling to
- @code{value}, as its product re-enters the Guile world.
- Indeed, the process of compilation can circulate through these
- different worlds indefinitely, as shown by the following quine:
- @example
- ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
- @end example
- @node The Scheme Compiler
- @subsection The Scheme Compiler
- The job of the Scheme compiler is to expand all macros and all of Scheme
- to its most primitive expressions. The definition of ``primitive'' is
- given by the inventory of constructs provided by Tree-IL, the target
- language of the Scheme compiler: procedure calls, conditionals, lexical
- references, etc. This is described more fully in the next section.
- The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
- that it is completely implemented by the macro expander. Since the
- macro expander has to run over all of the source code already in order
- to expand macros, it might as well do the analysis at the same time,
- producing Tree-IL expressions directly.
- Because this compiler is actually the macro expander, it is
- extensible. Any macro which the user writes becomes part of the
- compiler.
- The Scheme-to-Tree-IL expander may be invoked using the generic
- @code{compile} procedure:
- @lisp
- (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
- @result{}
- #<<call> src: #f
- proc: #<<toplevel-ref> src: #f name: +>
- args: (#<<const> src: #f exp: 1>
- #<<const> src: #f exp: 2>)>
- @end lisp
- Or, since Tree-IL is so close to Scheme, it is often useful to expand
- Scheme to Tree-IL, then translate back to Scheme. For that reason the
- expander provides two interfaces. The former is equivalent to calling
- @code{(macroexpand '(+ 1 2) 'c)}, where the @code{'c} is for
- ``compile''. With @code{'e} (the default), the result is translated
- back to Scheme:
- @lisp
- (macroexpand '(+ 1 2))
- @result{} (+ 1 2)
- (macroexpand '(let ((x 10)) (* x x)))
- @result{} (let ((x84 10)) (* x84 x84))
- @end lisp
- The second example shows that as part of its job, the macro expander
- renames lexically-bound variables. The original names are preserved
- when compiling to Tree-IL, but can't be represented in Scheme: a
- lexical binding only has one name. It is for this reason that the
- @emph{native} output of the expander is @emph{not} Scheme. There's too
- much information we would lose if we translated to Scheme directly:
- lexical variable names, source locations, and module hygiene.
- Note however that @code{macroexpand} does not have the same signature
- as @code{compile-tree-il}. @code{compile-tree-il} is a small wrapper
- around @code{macroexpand}, to make it conform to the general form of
- compiler procedures in Guile's language tower.
- Compiler procedures take three arguments: an expression, an
- environment, and a keyword list of options. They return three values:
- the compiled expression, the corresponding environment for the target
- language, and a ``continuation environment''. The compiled expression
- and environment will serve as input to the next language's compiler.
- The ``continuation environment'' can be used to compile another
- expression from the same source language within the same module.
- For example, you might compile the expression, @code{(define-module
- (foo))}. This will result in a Tree-IL expression and environment. But
- if you compiled a second expression, you would want to take into
- account the compile-time effect of compiling the previous expression,
- which puts the user in the @code{(foo)} module. That is purpose of the
- ``continuation environment''; you would pass it as the environment
- when compiling the subsequent expression.
- For Scheme, an environment is a module. By default, the @code{compile}
- and @code{compile-file} procedures compile in a fresh module, such
- that bindings and macros introduced by the expression being compiled
- are isolated:
- @example
- (eq? (current-module) (compile '(current-module)))
- @result{} #f
- (compile '(define hello 'world))
- (defined? 'hello)
- @result{} #f
- (define / *)
- (eq? (compile '/) /)
- @result{} #f
- @end example
- Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
- @code{current-reader}}) are isolated:
- @example
- (compile '(fluid-set! current-reader (lambda args 'fail)))
- (fluid-ref current-reader)
- @result{} #f
- @end example
- Nevertheless, having the compiler and @dfn{compilee} share the same name
- space can be achieved by explicitly passing @code{(current-module)} as
- the compilation environment:
- @example
- (define hello 'world)
- (compile 'hello #:env (current-module))
- @result{} world
- @end example
- @node Tree-IL
- @subsection Tree-IL
- Tree Intermediate Language (Tree-IL) is a structured intermediate
- language that is close in expressive power to Scheme. It is an
- expanded, pre-analyzed Scheme.
- Tree-IL is ``structured'' in the sense that its representation is
- based on records, not S-expressions. This gives a rigidity to the
- language that ensures that compiling to a lower-level language only
- requires a limited set of transformations. For example, the Tree-IL
- type @code{<const>} is a record type with two fields, @code{src} and
- @code{exp}. Instances of this type are created via @code{make-const}.
- Fields of this type are accessed via the @code{const-src} and
- @code{const-exp} procedures. There is also a predicate, @code{const?}.
- @xref{Records}, for more information on records.
- @c alpha renaming
- All Tree-IL types have a @code{src} slot, which holds source location
- information for the expression. This information, if present, will be
- residualized into the compiled object code, allowing backtraces to
- show source information. The format of @code{src} is the same as that
- returned by Guile's @code{source-properties} function. @xref{Source
- Properties}, for more information.
- Although Tree-IL objects are represented internally using records,
- there is also an equivalent S-expression external representation for
- each kind of Tree-IL. For example, the S-expression representation
- of @code{#<const src: #f exp: 3>} expression would be:
- @example
- (const 3)
- @end example
- Users may program with this format directly at the REPL:
- @example
- scheme@@(guile-user)> ,language tree-il
- Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
- tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
- @result{} 42
- @end example
- The @code{src} fields are left out of the external representation.
- One may create Tree-IL objects from their external representations via
- calling @code{parse-tree-il}, the reader for Tree-IL. If any source
- information is attached to the input S-expression, it will be
- propagated to the resulting Tree-IL expressions. This is probably the
- easiest way to compile to Tree-IL: just make the appropriate external
- representations in S-expression format, and let @code{parse-tree-il}
- take care of the rest.
- @deftp {Scheme Variable} <void> src
- @deftpx {External Representation} (void)
- An empty expression. In practice, equivalent to Scheme's @code{(if #f
- #f)}.
- @end deftp
- @deftp {Scheme Variable} <const> src exp
- @deftpx {External Representation} (const @var{exp})
- A constant.
- @end deftp
- @deftp {Scheme Variable} <primitive-ref> src name
- @deftpx {External Representation} (primitive @var{name})
- A reference to a ``primitive''. A primitive is a procedure that, when
- compiled, may be open-coded. For example, @code{cons} is usually
- recognized as a primitive, so that it compiles down to a single
- instruction.
- Compilation of Tree-IL usually begins with a pass that resolves some
- @code{<module-ref>} and @code{<toplevel-ref>} expressions to
- @code{<primitive-ref>} expressions. The actual compilation pass has
- special cases for calls to certain primitives, like @code{apply} or
- @code{cons}.
- @end deftp
- @deftp {Scheme Variable} <lexical-ref> src name gensym
- @deftpx {External Representation} (lexical @var{name} @var{gensym})
- A reference to a lexically-bound variable. The @var{name} is the
- original name of the variable in the source program. @var{gensym} is a
- unique identifier for this variable.
- @end deftp
- @deftp {Scheme Variable} <lexical-set> src name gensym exp
- @deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
- Sets a lexically-bound variable.
- @end deftp
- @deftp {Scheme Variable} <module-ref> src mod name public?
- @deftpx {External Representation} (@@ @var{mod} @var{name})
- @deftpx {External Representation} (@@@@ @var{mod} @var{name})
- A reference to a variable in a specific module. @var{mod} should be
- the name of the module, e.g.@: @code{(guile-user)}.
- If @var{public?} is true, the variable named @var{name} will be looked
- up in @var{mod}'s public interface, and serialized with @code{@@};
- otherwise it will be looked up among the module's private bindings,
- and is serialized with @code{@@@@}.
- @end deftp
- @deftp {Scheme Variable} <module-set> src mod name public? exp
- @deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
- @deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
- Sets a variable in a specific module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-ref> src name
- @deftpx {External Representation} (toplevel @var{name})
- References a variable from the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-set> src name exp
- @deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
- Sets a variable in the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <toplevel-define> src name exp
- @deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
- Defines a new top-level variable in the current procedure's module.
- @end deftp
- @deftp {Scheme Variable} <conditional> src test then else
- @deftpx {External Representation} (if @var{test} @var{then} @var{else})
- A conditional. Note that @var{else} is not optional.
- @end deftp
- @deftp {Scheme Variable} <call> src proc args
- @deftpx {External Representation} (call @var{proc} . @var{args})
- A procedure call.
- @end deftp
- @deftp {Scheme Variable} <primcall> src name args
- @deftpx {External Representation} (primcall @var{name} . @var{args})
- A call to a primitive. Equivalent to @code{(call (primitive @var{name})
- . @var{args})}. This construct is often more convenient to generate and
- analyze than @code{<call>}.
- As part of the compilation process, instances of @code{(call (primitive
- @var{name}) . @var{args})} are transformed into primcalls.
- @end deftp
- @deftp {Scheme Variable} <sequence> src exps
- @deftpx {External Representation} (begin . @var{exps})
- Like Scheme's @code{begin}.
- @end deftp
- @deftp {Scheme Variable} <lambda> src meta body
- @deftpx {External Representation} (lambda @var{meta} @var{body})
- A closure. @var{meta} is an association list of properties for the
- procedure. @var{body} is a single Tree-IL expression of type
- @code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
- an alternate clause, this makes Tree-IL's @code{<lambda>} have the
- expressiveness of Scheme's @code{case-lambda}.
- @end deftp
- @deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
- @deftpx {External Representation} @
- (lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
- @var{body})@
- [@var{alternate}])
- One clause of a @code{case-lambda}. A @code{lambda} expression in
- Scheme is treated as a @code{case-lambda} with one clause.
- @var{req} is a list of the procedure's required arguments, as symbols.
- @var{opt} is a list of the optional arguments, or @code{#f} if there
- are no optional arguments. @var{rest} is the name of the rest
- argument, or @code{#f}.
- @var{kw} is a list of the form, @code{(@var{allow-other-keys?}
- (@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
- keyword corresponding to the argument named @var{name}, and whose
- corresponding gensym is @var{var}. @var{inits} are tree-il expressions
- corresponding to all of the optional and keyword arguments, evaluated
- to bind variables whose value is not supplied by the procedure caller.
- Each @var{init} expression is evaluated in the lexical context of
- previously bound variables, from left to right.
- @var{gensyms} is a list of gensyms corresponding to all arguments:
- first all of the required arguments, then the optional arguments if
- any, then the rest argument if any, then all of the keyword arguments.
- @var{body} is the body of the clause. If the procedure is called with
- an appropriate number of arguments, @var{body} is evaluated in tail
- position. Otherwise, if there is a @var{consequent}, it should be a
- @code{<lambda-case>} expression, representing the next clause to try.
- If there is no @var{consequent}, a wrong-number-of-arguments error is
- signaled.
- @end deftp
- @deftp {Scheme Variable} <let> src names gensyms vals exp
- @deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
- Lexical binding, like Scheme's @code{let}. @var{names} are the
- original binding names, @var{gensyms} are gensyms corresponding to the
- @var{names}, and @var{vals} are Tree-IL expressions for the values.
- @var{exp} is a single Tree-IL expression.
- @end deftp
- @deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
- @deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
- @deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
- A version of @code{<let>} that creates recursive bindings, like
- Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
- @end deftp
- @deftp {Scheme Variable} <dynlet> fluids vals body
- @deftpx {External Representation} (dynlet @var{fluids} @var{vals} @var{body})
- Dynamic binding; the equivalent of Scheme's @code{with-fluids}.
- @var{fluids} should be a list of Tree-IL expressions that will
- evaluate to fluids, and @var{vals} a corresponding list of expressions
- to bind to the fluids during the dynamic extent of the evaluation of
- @var{body}.
- @end deftp
- @deftp {Scheme Variable} <dynref> fluid
- @deftpx {External Representation} (dynref @var{fluid})
- A dynamic variable reference. @var{fluid} should be a Tree-IL
- expression evaluating to a fluid.
- @end deftp
- @deftp {Scheme Variable} <dynset> fluid exp
- @deftpx {External Representation} (dynset @var{fluid} @var{exp})
- A dynamic variable set. @var{fluid}, a Tree-IL expression evaluating
- to a fluid, will be set to the result of evaluating @var{exp}.
- @end deftp
- @deftp {Scheme Variable} <dynwind> winder pre body post unwinder
- @deftpx {External Representation} (dynwind @var{winder} @var{pre} @var{body} @var{post} @var{unwinder})
- A @code{dynamic-wind}. @var{winder} and @var{unwinder} should both
- evaluate to thunks. Ensure that the winder and the unwinder are called
- before entering and after leaving @var{body}. Note that @var{body} is
- an expression, without a thunk wrapper. Guile actually inlines the
- bodies of @var{winder} and @var{unwinder} for the case of normal control
- flow, compiling the expressions in @var{pre} and @var{post},
- respectively.
- @end deftp
- @deftp {Scheme Variable} <prompt> tag body handler
- @deftpx {External Representation} (prompt @var{tag} @var{body} @var{handler})
- A dynamic prompt. Instates a prompt named @var{tag}, an expression,
- during the dynamic extent of the execution of @var{body}, also an
- expression. If an abort occurs to this prompt, control will be passed
- to @var{handler}, a @code{<lambda-case>} expression with no optional
- or keyword arguments, and no alternate. The first argument to the
- @code{<lambda-case>} will be the captured continuation, and then all
- of the values passed to the abort. @xref{Prompts}, for more
- information.
- @end deftp
- @deftp {Scheme Variable} <abort> tag args tail
- @deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
- An abort to the nearest prompt with the name @var{tag}, an expression.
- @var{args} should be a list of expressions to pass to the prompt's
- handler, and @var{tail} should be an expression that will evaluate to
- a list of additional arguments. An abort will save the partial
- continuation, which may later be reinstated, resulting in the
- @code{<abort>} expression evaluating to some number of values.
- @end deftp
- There are two Tree-IL constructs that are not normally produced by
- higher-level compilers, but instead are generated during the
- source-to-source optimization and analysis passes that the Tree-IL
- compiler does. Users should not generate these expressions directly,
- unless they feel very clever, as the default analysis pass will
- generate them as necessary.
- @deftp {Scheme Variable} <let-values> src names gensyms exp body
- @deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
- Like Scheme's @code{receive} -- binds the values returned by
- evaluating @code{exp} to the @code{lambda}-like bindings described by
- @var{gensyms}. That is to say, @var{gensyms} may be an improper list.
- @code{<let-values>} is an optimization of a @code{<call>} to the
- primitive, @code{call-with-values}.
- @end deftp
- @deftp {Scheme Variable} <fix> src names gensyms vals body
- @deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
- Like @code{<letrec>}, but only for @var{vals} that are unset
- @code{lambda} expressions.
- @code{fix} is an optimization of @code{letrec} (and @code{let}).
- @end deftp
- Tree-IL implements a compiler to GLIL that recursively traverses
- Tree-IL expressions, writing out GLIL expressions into a linear list.
- The compiler also keeps some state as to whether the current
- expression is in tail context, and whether its value will be used in
- future computations. This state allows the compiler not to emit code
- for constant expressions that will not be used (e.g.@: docstrings), and
- to perform tail calls when in tail position.
- Most optimization, such as it currently is, is performed on Tree-IL
- expressions as source-to-source transformations. There will be more
- optimizations added in the future.
- Interested readers are encouraged to read the implementation in
- @code{(language tree-il compile-glil)} for more details.
- @node GLIL
- @subsection GLIL
- Guile Lowlevel Intermediate Language (GLIL) is a structured intermediate
- language whose expressions more closely approximate Guile's VM
- instruction set. Its expression types are defined in @code{(language
- glil)}.
- @deftp {Scheme Variable} <glil-program> meta . body
- A unit of code that at run-time will correspond to a compiled
- procedure. @var{meta} should be an alist of properties, as in
- Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
- expressions.
- @end deftp
- @deftp {Scheme Variable} <glil-std-prelude> nreq nlocs else-label
- A prologue for a function with no optional, keyword, or rest
- arguments. @var{nreq} is the number of required arguments. @var{nlocs}
- the total number of local variables, including the arguments. If the
- procedure was not given exactly @var{nreq} arguments, control will
- jump to @var{else-label}, if given, or otherwise signal an error.
- @end deftp
- @deftp {Scheme Variable} <glil-opt-prelude> nreq nopt rest nlocs else-label
- A prologue for a function with optional or rest arguments. Like
- @code{<glil-std-prelude>}, with the addition that @var{nopt} is the
- number of optional arguments (possibly zero) and @var{rest} is an
- index of a local variable at which to bind a rest argument, or
- @code{#f} if there is no rest argument.
- @end deftp
- @deftp {Scheme Variable} <glil-kw-prelude> nreq nopt rest kw allow-other-keys? nlocs else-label
- A prologue for a function with keyword arguments. Like
- @code{<glil-opt-prelude>}, with the addition that @var{kw} is a list
- of keyword arguments, and @var{allow-other-keys?} is a flag indicating
- whether to allow unknown keys. @xref{Function Prologue Instructions,
- @code{bind-kwargs}}, for details on the format of @var{kw}.
- @end deftp
- @deftp {Scheme Variable} <glil-bind> . vars
- An advisory expression that notes a liveness extent for a set of
- variables. @var{vars} is a list of @code{(@var{name} @var{type}
- @var{index})}, where @var{type} should be either @code{argument},
- @code{local}, or @code{external}.
- @code{<glil-bind>} expressions end up being serialized as part of a
- program's metadata and do not form part of a program's code path.
- @end deftp
- @deftp {Scheme Variable} <glil-mv-bind> vars rest
- A multiple-value binding of the values on the stack to @var{vars}. Iff
- @var{rest} is true, the last element of @var{vars} will be treated as
- a rest argument.
- In addition to pushing a binding annotation on the stack, like
- @code{<glil-bind>}, an expression is emitted at compilation time to
- make sure that there are enough values available to bind. See the
- notes on @code{truncate-values} in @ref{Procedure Call and Return
- Instructions}, for more information.
- @end deftp
- @deftp {Scheme Variable} <glil-unbind>
- Closes the liveness extent of the most recently encountered
- @code{<glil-bind>} or @code{<glil-mv-bind>} expression. As GLIL
- expressions are compiled, a parallel stack of live bindings is
- maintained; this expression pops off the top element from that stack.
- Bindings are written into the program's metadata so that debuggers and
- other tools can determine the set of live local variables at a given
- offset within a VM program.
- @end deftp
- @deftp {Scheme Variable} <glil-source> loc
- Records source information for the preceding expression. @var{loc}
- should be an association list of containing @code{line} @code{column},
- and @code{filename} keys, e.g.@: as returned by
- @code{source-properties}.
- @end deftp
- @deftp {Scheme Variable} <glil-void>
- Pushes ``the unspecified value'' on the stack.
- @end deftp
- @deftp {Scheme Variable} <glil-const> obj
- Pushes a constant value onto the stack. @var{obj} must be a number,
- string, symbol, keyword, boolean, character, uniform array, the empty
- list, or a pair or vector of constants.
- @end deftp
- @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
- Accesses a lexically bound variable. If the variable is not
- @var{local?} it is free. All variables may have @code{ref},
- @code{set}, and @code{bound?} as their @var{op}. Boxed variables may
- also have the @var{op}s @code{box}, @code{empty-box}, and @code{fix},
- which correspond in semantics to the VM instructions @code{box},
- @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
- more information.
- @end deftp
- @deftp {Scheme Variable} <glil-toplevel> op name
- Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
- or @code{define}.
- @end deftp
- @deftp {Scheme Variable} <glil-module> op mod name public?
- Accesses a variable within a specific module. See Tree-IL's
- @code{<module-ref>}, for more information.
- @end deftp
- @deftp {Scheme Variable} <glil-label> label
- Creates a new label. @var{label} can be any Scheme value, and should
- be unique.
- @end deftp
- @deftp {Scheme Variable} <glil-branch> inst label
- Branch to a label. @var{label} should be a @code{<ghil-label>}.
- @code{inst} is a branching instruction: @code{br-if}, @code{br}, etc.
- @end deftp
- @deftp {Scheme Variable} <glil-call> inst nargs
- This expression is probably misnamed, as it does not correspond to
- function calls. @code{<glil-call>} invokes the VM instruction named
- @var{inst}, noting that it is called with @var{nargs} stack arguments.
- The arguments should be pushed on the stack already. What happens to
- the stack afterwards depends on the instruction.
- @end deftp
- @deftp {Scheme Variable} <glil-mv-call> nargs ra
- Performs a multiple-value call. @var{ra} is a @code{<glil-label>}
- corresponding to the multiple-value return address for the call. See
- the notes on @code{mv-call} in @ref{Procedure Call and Return
- Instructions}, for more information.
- @end deftp
- @deftp {Scheme Variable} <glil-prompt> label escape-only?
- Push a dynamic prompt into the stack, with a handler at @var{label}.
- @var{escape-only?} is a flag that is propagated to the prompt,
- allowing an abort to avoid capturing a continuation in some cases.
- @xref{Prompts}, for more information.
- @end deftp
- Users may enter in GLIL at the REPL as well, though there is a bit
- more bookkeeping to do:
- @example
- scheme@@(guile-user)> ,language glil
- Happy hacking with Guile Lowlevel Intermediate Language (GLIL)!
- To switch back, type `,L scheme'.
- glil@@(guile-user)> (program () (std-prelude 0 0 #f)
- (const 3) (call return 1))
- @result{} 3
- @end example
- Just as in all of Guile's compilers, an environment is passed to the
- GLIL-to-object code compiler, and one is returned as well, along with
- the object code.
- @node Assembly
- @subsection Assembly
- Assembly is an S-expression-based, human-readable representation of
- the actual bytecodes that will be emitted for the VM. As such, it is a
- useful intermediate language both for compilation and for
- decompilation.
- Besides the fact that it is not a record-based language, assembly
- differs from GLIL in four main ways:
- @itemize
- @item Labels have been resolved to byte offsets in the program.
- @item Constants inside procedures have either been expressed as inline
- instructions or cached in object arrays.
- @item Procedures with metadata (source location information, liveness
- extents, procedure names, generic properties, etc) have had their
- metadata serialized out to thunks.
- @item All expressions correspond directly to VM instructions -- i.e.,
- there is no @code{<glil-lexical>} which can be a ref or a set.
- @end itemize
- Assembly is isomorphic to the bytecode that it compiles to. You can
- compile to bytecode, then decompile back to assembly, and you have the
- same assembly code.
- The general form of assembly instructions is the following:
- @lisp
- (@var{inst} @var{arg} ...)
- @end lisp
- The @var{inst} names a VM instruction, and its @var{arg}s will be
- embedded in the instruction stream. The easiest way to see assembly is
- to play around with it at the REPL, as can be seen in this annotated
- example:
- @example
- scheme@@(guile-user)> ,pp (compile '(+ 32 10) #:to 'assembly)
- (load-program
- ((:LCASE16 . 2)) ; Labels, unused in this case.
- 8 ; Length of the thunk that was compiled.
- (load-program ; Metadata thunk.
- ()
- 17
- #f ; No metadata thunk for the metadata thunk.
- (make-eol)
- (make-eol)
- (make-int8 2) ; Liveness extents, source info, and arities,
- (make-int8 8) ; in a format that Guile knows how to parse.
- (make-int8:0)
- (list 0 3)
- (list 0 1)
- (list 0 3)
- (return))
- (assert-nargs-ee/locals 0) ; Prologue.
- (make-int8 32) ; Actual code starts here.
- (make-int8 10)
- (add)
- (return))
- @end example
- Of course you can switch the REPL to assembly and enter in assembly
- S-expressions directly, like with other languages, though it is more
- difficult, given that the length fields have to be correct.
- @node Bytecode and Objcode
- @subsection Bytecode and Objcode
- Finally, the raw bytes. There are actually two different ``languages''
- here, corresponding to two different ways to represent the bytes.
- ``Bytecode'' represents code as uniform byte vectors, useful for
- structuring and destructuring code on the Scheme level. Bytecode is
- the next step down from assembly:
- @example
- scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
- @result{} #vu8(8 0 0 0 25 0 0 0 ; Header.
- 95 0 ; Prologue.
- 10 32 10 10 148 66 17 ; Actual code.
- 0 0 0 0 0 0 0 9 ; Metadata thunk.
- 9 10 2 10 8 11 18 0 3 18 0 1 18 0 3 66)
- @end example
- ``Objcode'' is bytecode, but mapped directly to a C structure,
- @code{struct scm_objcode}:
- @example
- struct scm_objcode @{
- scm_t_uint32 len;
- scm_t_uint32 metalen;
- scm_t_uint8 base[0];
- @};
- @end example
- As one might imagine, objcode imposes a minimum length on the
- bytecode. Also, the @code{len} and @code{metalen} fields are in native
- endianness, which makes objcode (and bytecode) system-dependent.
- Objcode also has a couple of important efficiency hacks. First,
- objcode may be mapped directly from disk, allowing compiled code to be
- loaded quickly, often from the system's disk cache, and shared among
- multiple processes. Secondly, objcode may be embedded in other
- objcode, allowing procedures to have the text of other procedures
- inlined into their bodies, without the need for separate allocation of
- the code. Of course, the objcode object itself does need to be
- allocated.
- Procedures related to objcode are defined in the @code{(system vm
- objcode)} module.
- @deffn {Scheme Procedure} objcode? obj
- @deffnx {C Function} scm_objcode_p (obj)
- Returns @code{#f} iff @var{obj} is object code, @code{#f} otherwise.
- @end deffn
- @deffn {Scheme Procedure} bytecode->objcode bytecode
- @deffnx {C Function} scm_bytecode_to_objcode (bytecode)
- Makes a bytecode object from @var{bytecode}, which should be a
- bytevector. @xref{Bytevectors}.
- @end deffn
- @deffn {Scheme Variable} load-objcode file
- @deffnx {C Function} scm_load_objcode (file)
- Load object code from a file named @var{file}. The file will be mapped
- into memory via @code{mmap}, so this is a very fast operation.
- On disk, object code has an sixteen-byte cookie prepended to it, to
- prevent accidental loading of arbitrary garbage.
- @end deffn
- @deffn {Scheme Variable} write-objcode objcode file
- @deffnx {C Function} scm_write_objcode (objcode)
- Write object code out to a file, prepending the sixteen-byte cookie.
- @end deffn
- @deffn {Scheme Variable} objcode->bytecode objcode
- @deffnx {C Function} scm_objcode_to_bytecode (objcode)
- Copy object code out to a bytevector for analysis by Scheme.
- @end deffn
- The following procedure is actually in @code{(system vm program)}, but
- we'll mention it here:
- @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
- @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
- Load up object code into a Scheme program. The resulting program will
- have @var{objtable} as its object table, which should be a vector or
- @code{#f}, and will capture the free variables from @var{free-vars}.
- @end deffn
- Object code from a file may be disassembled at the REPL via the
- meta-command @code{,disassemble-file}, abbreviated as @code{,xx}.
- Programs may be disassembled via @code{,disassemble}, abbreviated as
- @code{,x}.
- Compiling object code to the fake language, @code{value}, is performed
- via loading objcode into a program, then executing that thunk with
- respect to the compilation environment. Normally the environment
- propagates through the compiler transparently, but users may specify
- the compilation environment manually as well, as a module.
- @node Writing New High-Level Languages
- @subsection Writing New High-Level Languages
- In order to integrate a new language @var{lang} into Guile's compiler
- system, one has to create the module @code{(language @var{lang} spec)}
- containing the language definition and referencing the parser,
- compiler and other routines processing it. The module hierarchy in
- @code{(language brainfuck)} defines a very basic Brainfuck
- implementation meant to serve as easy-to-understand example on how to
- do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
- for more information about the Brainfuck language itself.
- @node Extending the Compiler
- @subsection Extending the Compiler
- At this point we take a detour from the impersonal tone of the rest of
- the manual. Admit it: if you've read this far into the compiler
- internals manual, you are a junkie. Perhaps a course at your university
- left you unsated, or perhaps you've always harbored a desire to hack the
- holy of computer science holies: a compiler. Well you're in good
- company, and in a good position. Guile's compiler needs your help.
- There are many possible avenues for improving Guile's compiler.
- Probably the most important improvement, speed-wise, will be some form
- of native compilation, both just-in-time and ahead-of-time. This could
- be done in many ways. Probably the easiest strategy would be to extend
- the compiled procedure structure to include a pointer to a native code
- vector, and compile from bytecode to native code at run-time after a
- procedure is called a certain number of times.
- The name of the game is a profiling-based harvest of the low-hanging
- fruit, running programs of interest under a system-level profiler and
- determining which improvements would give the most bang for the buck.
- It's really getting to the point though that native compilation is the
- next step.
- The compiler also needs help at the top end, enhancing the Scheme that
- it knows to also understand R6RS, and adding new high-level compilers.
- We have JavaScript and Emacs Lisp mostly complete, but they could use
- some love; Lua would be nice as well, but whatever language it is
- that strikes your fancy would be welcome too.
- Compilers are for hacking, not for admiring or for complaining about.
- Get to it!
|