talk.tex 15 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471
  1. % Copyright (C) 2017 Koz Ross <koz.ross@retro-freedom.nz>
  2. % This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
  3. % International License. To view a copy of this license, visit
  4. % http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative
  5. % Commons, PO Box 1866, Mountain View, CA 94042, USA.
  6. \documentclass{beamer}
  7. \usepackage[utf8]{inputenc}
  8. \usecolortheme{seahorse}
  9. \usefonttheme{professionalfonts}
  10. \usepackage{fontspec}
  11. \setmainfont{DejaVu Sans}
  12. \setbeamertemplate{navigation symbols}{}
  13. \usepackage{caption}
  14. \usepackage[font=itshape, begintext=``, endtext='']{quoting}
  15. \usepackage{fancyvrb}
  16. \usepackage{color}
  17. \usepackage{booktabs}
  18. \usepackage[noend]{algpseudocode}
  19. \title{CONS cells}
  20. \subtitle{Or: start simple!}
  21. \titlegraphic{\includegraphics[scale=0.8]{img/logo.pdf}}
  22. \author{Glen Osborne}
  23. \date{3rd August, 2017}
  24. \begin{document}
  25. \begin{frame}[c]
  26. \titlepage{}
  27. \end{frame}
  28. \begin{frame}[c]
  29. \frametitle{Outline}
  30. \tableofcontents
  31. \end{frame}
  32. \section{Introduction}
  33. \begin{frame}[c]
  34. \frametitle{All those data structures\ldots}
  35. We have an {\em embarassment\/} of different data structures available to
  36. us:\pause{}
  37. \begin{itemize}
  38. \item Lists\pause{}
  39. \item Dictionaries\pause{}
  40. \item Matrices (with arbitrary dimensions)\pause{}
  41. \item Trees\pause{}
  42. \item Graphs\pause{}
  43. \item And so many more!\pause{}
  44. \end{itemize}
  45. These can be {\em very\/} complicated to understand and implement!\pause{}
  46. They also often `lock you in' to a fixed set of operations --- and whether
  47. these are the ones you need can be hard to determine when initially solving a
  48. problem.
  49. \end{frame}
  50. \begin{frame}[c, fragile]
  51. \frametitle{A `novel' proposition}
  52. \begin{center}
  53. \includegraphics[scale=0.5]{img/morpheus.pdf}
  54. \end{center}\pause{}
  55. Yes, really!\pause{} Not a new idea by any means --- first put forward in the
  56. 19{\em 50\/}s.\pause{} Let's see how this is possible\ldots
  57. \end{frame}
  58. \section{The CONS cell}
  59. \begin{frame}[c]
  60. \frametitle{Some definitions}
  61. \pause{}
  62. \begin{definition}
  63. A {\em machine word\/} is a fixed-size chunk of memory.\pause{}
  64. \end{definition}
  65. \medskip
  66. What this `fixed size' is doesn't matter --- most modern computers have a 32 or
  67. 64-bit machine word.\pause{}
  68. \medskip
  69. \begin{definition}
  70. A piece of data ({\em datum\/}) is {\em primitive\/} if it fits into one machine
  71. word. Otherwise, it is {\em complex}.\pause{}
  72. \end{definition}
  73. \medskip
  74. \begin{definition}
  75. A {\em reference\/} is a primitive datum, consisting of a memory address.
  76. \end{definition}\pause{}
  77. \medskip
  78. We can imagine a reference as `pointing' to another piece of data in
  79. memory.\pause{} Besides references, we also have integers and floats as
  80. primitive data (and possibly others as well).
  81. \end{frame}
  82. \begin{frame}[c, fragile]
  83. \frametitle{Gettin' visual with it}
  84. \begin{overprint}
  85. \onslide<1>\centerline{\includegraphics[scale=0.3]{img/visual-basic.pdf}}%
  86. \onslide<2>\centerline{\includegraphics[scale=0.3]{img/visual-box.pdf}}%
  87. \onslide<3>\centerline{\includegraphics[scale=0.3]{img/visual-data.pdf}}%
  88. \onslide<4>\centerline{\includegraphics[scale=0.3]{img/visual-reference.pdf}}%
  89. \end{overprint}
  90. \vspace{1cm}
  91. \begin{overprint}
  92. \onslide<1>We will use drawings like this one to represent data in memory.
  93. \onslide<2>We represent machine words using boxes. Adjacent boxes are
  94. adjacent machine words in memory (these two are not).
  95. \onslide<3>Non-reference primitive data will be drawn inside the box representing its
  96. machine word.
  97. \onslide<4>References will be drawn as arrows to the data they `point' to
  98. (this one isn't pointing anywhere useful).
  99. \end{overprint}
  100. \end{frame}
  101. \begin{frame}[c, fragile]
  102. \frametitle{The CONS cell}
  103. \centerline{\includegraphics[scale=0.3]{img/cons-cell.pdf}}\pause{}
  104. \medskip
  105. \begin{itemize}
  106. \item Consists of two adjacent machine words\pause{}
  107. \item A CONS cell's first word is called its {\em CAR\/} (pronounced
  108. like `motor vehicle')\pause{}
  109. \item A CONS cell's second word is called its {\em CDR\/} (pronounced
  110. like `COULD-ruh')\pause{}
  111. \item Either the CAR or the CDR can store reference, or primitive
  112. non-reference, data as needed\pause{}
  113. \item Can implement {\em every single one\/} of the data structures
  114. mentioned at the start of this talk!
  115. \end{itemize}
  116. \end{frame}
  117. \begin{frame}[c, fragile]
  118. \frametitle{What you're probably thinking right now}
  119. \only<1>{\centerline{\includegraphics[scale=0.15]{img/doge.pdf}}}
  120. \only<2>{\centerline{\includegraphics[scale=0.14]{img/ducreux.pdf}}}
  121. \end{frame}
  122. \section{Building up other structures}
  123. \begin{frame}[c]
  124. \frametitle{Lists}
  125. \begin{definition}
  126. {\em NIL\/} refers to a unique machine word representing `no useful data'.
  127. \end{definition}\pause{}
  128. \medskip
  129. Initially, we'll assume that lists only store primitive data. We'll later show
  130. how to overcome this.\pause{}
  131. \medskip
  132. The intuition here is based on the fact that a CONS cell looks a lot like a
  133. singly-linked list node.\pause{} More precisely, we can store each node's {\em
  134. data\/} in the CAR, and {\em `next' reference\/} in the CDR.\pause{} For
  135. the last node, we can have its `next' reference `point to' NIL to mark the
  136. end of the list.\pause{}
  137. \medskip
  138. Using this approach, we can define all the usual list operations in a
  139. straightforward way.
  140. \end{frame}
  141. \begin{frame}[c, fragile]
  142. \frametitle{Visualizing CONS cell lists}
  143. \begin{overprint}
  144. \onslide<1>\centerline{\includegraphics[scale=0.2]{img/list-basic.pdf}}%
  145. \onslide<2>\centerline{\includegraphics[scale=0.2]{img/list-conses.pdf}}%
  146. \onslide<3>\centerline{\includegraphics[scale=0.2]{img/list-cars.pdf}}%
  147. \onslide<4>\centerline{\includegraphics[scale=0.2]{img/list-cdrs.pdf}}%
  148. \end{overprint}
  149. \vspace{2cm}
  150. \onslide<1->This is the in-memory representation of the list {\tt [10, 20,
  151. 30]} using CONS cells. \onslide<2->Each CONS cell is one list node.
  152. \onslide<3->CARs contain the data. \onslide<4->CDRs contain references to
  153. the next list node, or NIL for the end.
  154. \end{frame}
  155. \begin{frame}[c]
  156. \frametitle{Lists with complex data}
  157. \begin{itemize}
  158. \item Most interesting data won't fit into a single machine word.\pause{}
  159. \item To have lists storing complex data, we have the CARS of our list node
  160. CONS cells be {\em references\/} to the data elsewhere in memory.\pause{}
  161. \item We can even mix-and-match the two!
  162. \end{itemize}
  163. \end{frame}
  164. \begin{frame}[c, fragile]
  165. \frametitle{Visualizing lists with complex data}
  166. \begin{overprint}
  167. \onslide<1>\centerline{\includegraphics[scale=0.2]{img/list-complex.pdf}}%
  168. \onslide<2>\centerline{\includegraphics[scale=0.2]{img/list-complex-prim.pdf}}%
  169. \onslide<3>\centerline{\includegraphics[scale=0.2]{img/list-complex-comp.pdf}}%
  170. \end{overprint}
  171. \vspace{1cm}
  172. \onslide<1->This is the in-memory representation of the list {\tt ["My", 10,
  173. "sons"]} using CONS cells. \onslide<2->Primitive data is stored in the CAR of
  174. the relevant cell as before. \onslide<3->Complex data is stored elsewhere, and
  175. the CDRs of the relevant cells store references to that data instead.
  176. \end{frame}
  177. \begin{frame}[c]
  178. \frametitle{Dictionaries}\pause{}
  179. \begin{definition}
  180. A {\em dictionary\/} stores a set of key-value pairs, such that keys in the
  181. dictionary are unique.
  182. \end{definition}\pause{}
  183. \medskip
  184. We can represent a key-value pair by using a CONS cell where both the CAR and
  185. CDR store data (or a reference to data).\pause{} We can then create a list of
  186. these as complex data.\pause{} To ensure that we don't end up with key
  187. duplication, we do two things:\pause{}
  188. \medskip
  189. \begin{itemize}
  190. \item Ensure that queries always start from the first CONS cell in the
  191. list\pause{}
  192. \item Put inserts at the front of the list (so they'll be found before any
  193. previous entries with the same key)
  194. \end{itemize}
  195. \end{frame}
  196. \begin{frame}[c, fragile]
  197. \frametitle{Example of CONS cell-based dictionary}
  198. \begin{overprint}
  199. \onslide<1>\centerline{\includegraphics[scale=0.2]{img/dictionary-basic.pdf}}%
  200. \onslide<2>\centerline{\includegraphics[scale=0.2]{img/dictionary-spine.pdf}}%
  201. \end{overprint}
  202. \vspace{0.5cm}
  203. \onslide<1->This is the in-memory representation of the dictionary $\{(10,
  204. \text{"foo"}), (\text{"bar"}, 20), (\text{"baz"},\text{"quux"})\}$.
  205. \onslide<2->The entries form a list of complex data (the {\em spine\/} of
  206. the dictionary).
  207. \end{frame}
  208. \begin{frame}[c]
  209. \frametitle{A slight intermission}
  210. Given that we've now described how to define dictionaries using CONS cells, we
  211. have the ability to define any other data structure (as explained in the
  212. previous talk).\pause{} However, we can implement some structures a bit better
  213. than this using CONS cells, including:\pause{}
  214. \medskip
  215. \begin{itemize}
  216. \item Matrices\pause{}
  217. \item Trees\pause{}
  218. \item Graphs\pause{}
  219. \end{itemize}
  220. \medskip
  221. To save time, we will only draw diagrams where
  222. useful.\pause{} If we refer to lists or dictionaries anywhere, assume we mean
  223. ones based on CONS cells as described previously.
  224. \end{frame}
  225. \begin{frame}[c]
  226. \frametitle{Matrices}
  227. \begin{definition}
  228. The {\em rank\/} of a matrix is the number of dimensions it has.
  229. \end{definition}\pause{}
  230. \medskip
  231. A rank $1$ `matrix' is just a list.\pause{} A rank
  232. $2$ matrix is a list of lists, a rank $3$ matrix is a list of rank $2$
  233. matrices, and so on, and so forth.\pause{}
  234. \medskip
  235. We can define a rank $n$ matrix (for $n > 1$) as a spine of rank $n -
  236. 1$ matrices.
  237. \end{frame}
  238. \begin{frame}[c]
  239. \frametitle{Matrix example}
  240. Consider the following rank $2$ matrix:\pause{}
  241. \[
  242. \begin{bmatrix}
  243. 1 & 2 & 3 & 4\\
  244. 5 & 6 & 7 & 8\\
  245. 9 & 10 & 11 & 12\\
  246. \end{bmatrix}
  247. \]
  248. \medskip
  249. We can represent it in two ways.\pause{} We can have the spine go along the columns:
  250. \medskip
  251. {\tt [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]}\pause{}
  252. \medskip
  253. \noindent{}or along the rows:
  254. \medskip
  255. {\tt [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]}
  256. \end{frame}
  257. \begin{frame}[c]
  258. \frametitle{Trees}
  259. We will begin with a very simple kind of tree: a {\em binary leaf
  260. tree}.\pause{} More precisely:
  261. \medskip
  262. \begin{definition}
  263. A {\em binary leaf tree\/} has data only in its leaf nodes, and its internal
  264. nodes have a maximum of two children.\pause{}
  265. \end{definition}
  266. \medskip
  267. We can represent a binary leaf tree using CONS cells as follows:\pause{}
  268. \medskip
  269. \begin{itemize}
  270. \item An internal node is represented by a CONS cell whose CAR stores a
  271. reference to its left child (or NIL if none) and whose CDR stores a
  272. reference to its right child (or NIL if none)\pause{}
  273. \item A leaf node is represented by a CONS cell storing the same data (or
  274. reference) in both its CAR and CDR
  275. \end{itemize}
  276. \end{frame}
  277. \begin{frame}[c, fragile]
  278. \frametitle{Example binary leaf tree and its CONS cell representation}
  279. \begin{columns}
  280. \column{.5\textwidth}
  281. \centerline{\includegraphics[scale=0.6]{img/leaf-tree.pdf}}%
  282. \column{.5\textwidth}
  283. \centerline{\includegraphics[scale=0.15]{img/leaf-tree-cons.pdf}}%
  284. \end{columns}
  285. \end{frame}
  286. \begin{frame}[c]
  287. \frametitle{Rose trees}
  288. If we need trees of arbitrary branching factor (number of children), or where
  289. data has to be stored in the leaves as well, we can instead use a {\em rose
  290. tree\/} representation:\pause{}
  291. \medskip
  292. \begin{itemize}
  293. \item A leaf node is represented the same way as for binary leaf
  294. trees\pause{}
  295. \item An internal node is represented by a CONS cell whose CAR stores the
  296. data for that node (or a reference to the data), and whose CDR stores a
  297. list of references to its children in left-to-right order\pause{}
  298. \end{itemize}
  299. \medskip
  300. We can also include a parent reference in the list of children (typically as
  301. the first element) if we want.
  302. \end{frame}
  303. \begin{frame}[c]
  304. \frametitle{Graphs}
  305. Typically, graphs are stored as an {\em adjacency matrix\/} or an {\em
  306. adjacency list}.\pause{} The adjacency matrix is easy to represent using a
  307. rank $2$ matrix as we defined before.\pause{} The adjacency list is a bit
  308. trickier:\pause{}
  309. \medskip
  310. \begin{itemize}
  311. \item An adjacency list is represented by an {\em adjacency spine\/}\pause{}
  312. \item An adjacency spine is made up of {\em vertex cells\/}\pause{}
  313. \item A vertex cell is a CONS cell whose CAR stores a reference to the next
  314. vertex cell in the adjacency spine, and whose CDR stores a {\em vertex
  315. list}\pause{}
  316. \item A vertex list is a CONS cell whose CAR stores the vertex's label (or a
  317. reference to it), and whose CDR stores a reference to a {\em child
  318. list\/}\pause{}
  319. \item A child list's data are references to elements of the adjacency spine,
  320. corresponding to the neighbours of that vertex
  321. \end{itemize}
  322. \end{frame}
  323. \section{Limitations and uses}
  324. \begin{frame}[c]
  325. \frametitle{The bad news}
  326. All of these are really neat, but unfortunately, these implementations can be
  327. slow, especially when we have a lot of data, because:\pause{}
  328. \medskip
  329. \begin{itemize}
  330. \item Many operations involve scanning `chains' of CONS cells, which give us
  331. linear (or worse!) time complexity\pause{}
  332. \item CONS cells are not cache-friendly; typically, we'll only cache the
  333. cell itself, and {\em not\/} any data its CAR or CDR might hold references
  334. to\pause{}
  335. \item This means that we could potentially be getting cache misses {\em
  336. every time\/} we follow a stored reference!\pause{}
  337. \item There are ways to make CONS cell-based structures more cache-friendly,
  338. but they're usually more trouble than they're worth
  339. \end{itemize}
  340. \end{frame}
  341. \begin{frame}[c]
  342. \frametitle{So why use them?}
  343. \begin{itemize}
  344. \item When we initially set out to solve a problem, we don't often know what
  345. operations we need, how much data we'll have, or which operations need to
  346. be fast\pause{}
  347. \item CONS cell-based structures can be built and {\em extended\/}
  348. quickly\pause{}
  349. \item This makes them {\em ideal\/} for prototyping\pause{}
  350. \item There is no {\em fast\/} --- only fast {\em enough\/}; you might find CONS cells are
  351. good enough for your purposes!\pause{}
  352. \item Once you know exactly what you need, you can always replace some (or
  353. all) of the structure with something more efficient\pause{}
  354. \end{itemize}
  355. \medskip
  356. In short, CONS cell-based structures can help you {\em learn your tradeoffs\/}
  357. quickly and easily.
  358. \end{frame}
  359. \section{Questions}
  360. \begin{frame}[fragile, c]
  361. \frametitle{Questions?}
  362. \begin{center}
  363. \includegraphics[scale=0.27]{img/questions.pdf}
  364. \end{center}
  365. \end{frame}
  366. \end{document}