inbox.mdwn 20 KB

  1. - Bring my old entries from the wordpress export, and use the meta
  2. plugin to set creation and modification dates manually to make the
  3. blog and RSS treat them correctly as written in the correct real
  4. times (i.e. 2013)
  5. - go over, has some interesting material: <https://antportal.com/wiki/>
  6. - Type into the wiki (or at least take) the "skapa model" page from algorithms
  7. course notebook
  8. - Idea: Instead of letting each app decide which Saugus backend to use (file,
  9. desktop database, etc.) I can let apps read this setting from a
  10. desktop-central place and make a "data sources" app which centrally configures
  11. sources per-app. It can also work using dconf and/or xfce conf etc. It gives
  12. the control to the *user* instead of setting this in source code. It's both
  13. good for the development and testing etc., and allows e.g. privacy apps,
  14. Tor/I2P browser etc. to use a separate encrypted file/DB and the user can
  15. fully control it. Even have several separate sources per app!
  16. - Tosaf: Each hook object should have a folder where it searches for plugins.
  17. libdir can maybe be set globally for all hooks with a single command. But
  18. Plugin must be generic and get its info from its hook. so/dll suffix can
  19. be left for Plugin/Library to handle
  20. - Plan the framework user point-of-view: One of the problems with existing
  21. semantic desktop projects is that the entry level is very high. Another
  22. problem is that the technology is very far away from everything people
  23. actually use: GNU/Linux desktop technologies and commonly used tools and
  24. languages and conventions. Also, it's been all developed in a closed model by
  25. academy people, so the community has no knowledge. Or by companies. There
  26. isn't involvement of non-experts.
  27. Proposal: Activity Interface
  28. ============================
  29. In order to help people find information faster, the website of *Partager* can
  30. arrange its content, or at least link to it from the main page, by *task*. In
  31. other words, it allows you to choose what you want to do and directs you from
  32. there. Possible actions:
  33. * Study an ontology
  34. * Extend an ontology
  35. * Learn about the various ontologies
  36. * Use an ontology in a user application
  37. * Create a new ontology
  38. Research Discussion
  39. ===================
  40. ******** Namespace
  41. Namespaces are used as groups of names, in order to give the names additional information describing where the names come from or which body defines and maintains them. This way name ambiguities are resolved when a name is used for different things.
  42. Therefore namespaces don't have any meaning semantically. They just group names defined by some body or as a unified group, in order to keep the names from colliding with names from other groups. A name being in a given namespace doesn't provide any semantic information about that name.
  43. ******** Ontology
  44. An ontology describes concepts and relations within a given domain, using a common vocabulary. For example, an RDFS ontology may describe elements in the field of computer programming, and it then uses elements of RDFS to describe them. In other words the ontology is "written in RDFS".
  45. The NRL ontology, part of NEPOMUK, has definitions which complete things missing from RDF/S like nrl:Graph, nrl:Ontology and many other things. I can have similar definitions in Idan (possibly by using NRL, even without creating a special parallel for Idan).
  46. **** Equivalence and Versioning
  47. There should be a way in Idan to say that two resources refer to the same thing. And it should be possible to say that one ontology is a subontology of the other, and the same for graphs. Let's see if NRL provides definitions for that.
  48. NRL seems not to, but OWL does have a class equivalence property. However, equivalence can also be expressed by saying A and B are subclasses of each other.
  49. I don't see anything about subontologies, but maybe a deeper search would find something. Anyway I could easily define that by myself. It's good for when there's a new version or people want to share and merge their works.
  50. Research Tracking
  51. =================
  52. ******** Instructions
  53. Read about Gellish and see if there are good things I can use, same for RDF. For example giving uid to statements, allowing statements to be questions/proposals, not just facts. Use measurement units for values. But try to add them in an extensible way, i.e. try to add them outside the core language to make it small and simple. *****
  54. TODO HERE: Read about how Gellish, RDF, OWL and Tracker treat namespaces and ontologies: What they are, why they exist, how they work, how analogous ontologies are connected/merged. Then decide and explain here how namespaces and ontologies work. Focus on decentralization of ontologies and namespaces, and distribution of ontologies, namespaces and data. Reading in the W3C standards may be a good way to understand how and why things were planned. Also JSON-LD is interesting.
  55. ******** Topics
  56. **** Concepts
  57. Namespace
  58. http://en.wikipedia.org/wiki/Namespace
  59. http://en.wikipedia.org/wiki/Xml_namespace
  60. Ontology
  61. http://en.wikipedia.org/wiki/Ontology_(information_science)
  62. **** Languages
  63. RDF
  64. http://en.wikipedia.org/wiki/Resource_Description_Framework
  65. http://www.w3.org/TR/rdf-primer/
  66. http://www.w3schools.com/webservices/ws_rdf_intro.asp
  67. RDFS
  68. http://en.wikipedia.org/wiki/RDFS
  69. http://www.w3.org/TR/rdf-schema/
  70. http://www.w3schools.com/webservices/ws_rdf_schema.asp
  71. OWL
  72. http://en.wikipedia.org/wiki/Web_Ontology_Language
  73. http://www.w3.org/TR/owl2-overview/
  74. YAML
  75. JSON
  76. Gellish
  77. http://en.wikipedia.org/wiki/Gellish
  78. http://sourceforge.net/apps/trac/gellish/
  79. **** Notations
  80. Turtle
  81. JSON-LD
  82. http://json-ld.org/index.html#
  83. http://www.w3.org/TR/json-ld/
  84. **** Software
  85. Tracker
  86. Strigi
  87. Beagle
  88. Old TODO
  89. ===============
  90. [ ] = TODO
  91. [%] = WIP
  92. [X] = DONE
  93. [ ] 21nov2013 Start designing basics of Idan using YAML
  94. [ ] 21nov2013 Continue API development in C++
  95. [X] 21nov2013 Examine Gellish's extra fields for triples, e.g. fact/question/opinion, see how I can have them in Idan
  96. [ ] 21nov2013 Finalize the architecture basics, to make sure the API matches requirements
  97. [ ] 21nov2013 After architecture basics are final more or less, list components:
  98. [ ] 21nov2013 API to write graphs to file and read from file
  99. [ ] 21nov2013 API to efficiently update file with changes, e.g. see how Gedit and others update files: rewrite all or just changes
  100. [ ] 21nov2013 Decide where SPARQL / similar language can be used - on top of repos or just for database repo
  101. [ ] 21nov2013 CLI tools
  102. [ ] 21nov2013 GUI tools
  103. [%] 21nov2013 Go over CherryTree document, see what I can use, maybe make a file here to track migration of data from there to here
  104. [ ] 21nov2013 Create an automated task solution which doesn't require putting all tasks in one file
  105. [ ] 21nov2013 Create a script which extracts tasks from files in the progress folder
  106. [ ] 21nov2013 Learn bash scripting
  107. [ ] 21nov2013 Learn shell programming
  108. [ ] 21nov2013 Learn coreutils
  109. [ ] 21nov2013 Write makeclass in Bash for practice
  110. [ ] 21nov2013 Make the tool read patterns from a file, so that new ways to mark TODOs in files can be easily added
  111. [ ] 21nov2013 Split the research files to separate research for each component and add them to respective folders
  112. [ ] 21nov2013 Find/write scripts which generate table-of-contents from titles in my files, so it becomes possible for GUI to use anchors to run a GUI table-of-contents and help the user easily manage long files. If I make the TOC manually in plain text, it would become tedious and difficult to manage the TOCs over time, and they're not interactive anyway. So make scripts for that. The GUI would use them, but there's probably no justification to use C/C++ directly. A script in Bash/Perl/Python will probably do.
  113. [ ] 21nov2013 IDEA: A tool which lists the "chain of documents" I work with and the current one and the status and what to do and what to do after that, etc. to help me keep context and remember where I am, both in real time and when continuing where I stopped last time
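A rough sketch of the task-extraction script from the list above, written in Python rather than Bash just to pin the idea down. The regex, the `Task` tuple and the `extract_tasks` name are assumptions made for illustration; only the `[ ]`, `[%]`, `[X]` legend and the `21nov2013`-style date come from this file. The patterns are passed in as a parameter, so they could just as well be read from a pattern file later.

```python
import re
from typing import Iterable, List, NamedTuple

# Status markers from the legend above: [ ] = TODO, [%] = WIP, [X] = DONE.
STATUS = {" ": "TODO", "%": "WIP", "X": "DONE"}

# Assumed default pattern: marker, a date such as 21nov2013, then the task text.
DEFAULT_PATTERN = (
    r"\[(?P<mark>[ %X])\]\s+(?P<date>\d{1,2}[a-z]{3}\d{4})\s+(?P<text>.+)"
)


class Task(NamedTuple):
    status: str
    date: str
    text: str


def extract_tasks(
    lines: Iterable[str], patterns: Iterable[str] = (DEFAULT_PATTERN,)
) -> List[Task]:
    """Return every task found in the lines, using the first pattern that matches."""
    compiled = [re.compile(p) for p in patterns]
    tasks = []
    for line in lines:
        for rx in compiled:
            m = rx.search(line)
            if m:
                tasks.append(
                    Task(STATUS[m.group("mark")], m.group("date"), m.group("text").strip())
                )
                break
    return tasks
```

Running it over several files is then just a loop that opens each file in the progress folder and feeds its lines to `extract_tasks`.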
  114. Git to ChangeLog, and Other Things
  115. ==================================
  116. First Application
  117. -----------------
  118. My next tasks are writing parsers and serializers for *Idan* and for *Kort*, and formalizing the Kiwi ontologies. This leads me to the first Partager application. It must exist in order for problems and needs to show up, and to have a practical example of things in real use.
  119. The idea is to have one desktop database accessed via D-Bus (or k-dbus) which GUI applications can then use for anything they wish. It will be an experiment. The actual first GUI application I'd like to have is a file/info browser which operates over all the visible files in the home folder, or over a subset of them. Later we'll see exactly what this application will do and how. It's also possible to use a simple file for this, as long as it's the only application using this Repository.
  120. Another option is to have an app which simply uses a small file, e.g. make a movie collection manager like the one I tried, but with a full ontology and expansion of the ontology would be supported transparently thanks to Smaoin. It's a small simple app, easier than a whole desktop info manager.
  121. Idan Syntax
  122. ---------------
  123. In order to write a parser/serializer for Idan, I need to formalize the syntax. Here it comes, more or less. I'm assuming UTF-8 characters as atoms, not ASCII. Also, I'm starting by using just the Smaoin Idan files as reference, and later I'll go over all the plans and the i18n file in rdd-wiki and make sure I didn't forget anything.
  124. This brings a question: Does a comment have to be on a separate line, or can it be at the end of a line? I'll need to examine all the docs to answer this well. Let's start with just a simple subset and expand gradually.
  125. For now I'm omitting the whitespace between components, because it makes BNF ugly. But basically the idea is that all things that aren't required to be attached can have any whitespace between them: blank lines, space, tab and so on. For the full list see here: <https://en.wikipedia.org/wiki/Whitespace_character>.
  126. <document> => <header> <content>
  127. <header> => <header-lang>? <header-declaration>*
  128. <header-lang> => <header-lang-prefix> <header-lang-identifier>
  129. <header-lang-identifier> => <identifier>
  130. <header-declaration> => <header-declaration-prefix> <header-declaration-type> <header-declaration-value>+
  131. <header-declaration-type> => <identifier>
  132. <header-declaration-value> => <identifier>
  133. <content> => <block>*
  134. <block> => <subject> <description>
  135. <subject> => <resource> | <reference> | <subject-placeholder>
  136. <resource> => <resource-open> <uid> <resource-close>
  137. <reference> => TODOOOO
  138. <subject-placeholder> => <resource-open> <subject-placeholder-char> <resource-close>
  139. <description> => (<description-item> <description-separator>)* <description-item> <description-suffix>
  140. <description-item> => <predicate> <object-list>
  141. <predicate> => <resource> | <reference>
  142. <object-list> => <object> (<object-separator> <object>)*
  143. <object> => TODOOOO
  144. <identifier> => <identifier-char>+
  145. <uid> => ((<identifier-char> - <resource-close>) | (<escape-char> <resource-close>))+
  146. <header-lang-prefix> => '@@'
  147. <header-declaration-prefix> => '@'
  148. <resource-open> => '<'
  149. <resource-close> => '>'
  150. <subject-placeholder-char> => '$'
  151. <identifier-char> => any non-whitespace character
  152. <escape-char> => '\'
  153. <description-separator> => ';'
  154. <description-suffix> => '.'
  155. <object-separator> => ','
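To check that the block rules above hang together, here is a minimal recursive-descent sketch in Python. It only covers `<content>` (no header), and it makes loud assumptions: since `<reference>` and `<object>` are still TODO in the grammar, subjects, predicates and objects are all parsed as `<resource>`, and the `<$>` placeholder simply comes out as the uid `$`. The class and function names are made up for this sketch.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


class IdanSyntaxError(ValueError):
    pass


class Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        # Skip whitespace between components; str.isspace() also covers
        # non-ASCII whitespace characters.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise IdanSyntaxError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def resource(self) -> str:
        # <resource> => '<' <uid> '>' ; inside <uid>, '\>' is a literal '>'.
        self.expect("<")
        uid = []
        while True:
            if self.pos >= len(self.text):
                raise IdanSyntaxError("unterminated resource")
            ch = self.text[self.pos]
            if ch == "\\" and self.text[self.pos + 1 : self.pos + 2] == ">":
                uid.append(">")
                self.pos += 2
            elif ch == ">":
                self.pos += 1
                break
            elif ch.isspace():
                raise IdanSyntaxError(f"whitespace inside uid at offset {self.pos}")
            else:
                uid.append(ch)
                self.pos += 1
        if not uid:
            raise IdanSyntaxError("empty uid")
        return "".join(uid)

    def block(self) -> List[Triple]:
        # <block> => <subject> <description>
        subject = self.resource()
        triples = []
        while True:
            predicate = self.resource()
            while True:  # <object-list>, separated by ','
                triples.append((subject, predicate, self.resource()))
                if self.peek() != ",":
                    break
                self.expect(",")
            if self.peek() != ";":  # <description-separator>
                break
            self.expect(";")
        self.expect(".")  # <description-suffix>
        return triples

    def content(self) -> List[Triple]:
        triples = []
        while self.peek():
            triples.extend(self.block())
        return triples


def parse_content(text: str) -> List[Triple]:
    return Parser(text).content()
```

For example, `parse_content("<a> <p> <b>, <c>; <q> <d> .")` yields three triples, all with subject `a`.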
  156. Notes:
  157. ### Values
  158. In Kort, all values take the form "value"@@type. But here I want to have more flexibility and convenience, since it's going to be a high-level language. I want to have the following features:
  159. - General purpose syntax so that newly added types can always work
  160. - Short syntax for existing types
  161. - Strings have different syntax for with-escapes and without-escapes
  162. - Multiline strings work like in the Smaoin Idan file
  163. Here are suggestions:
  164. - Characters are delimited by single quotes and always allow escapes
  165. - Numbers have their rules of writing, and aren't delimited by anything
  166. - Booleans are simply true and false, not delimited by anything
  167. - Backticks and double quotes, together with the optional @@type part, are for general types
  168. - If no @@type is given, String type is assumed
  169. - Backticks take the content as is, while double-quotes allow escape sequences (regardless of type)
  170. - Multiline strings can be formed in several ways as described below
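A small Python sketch of the general-purpose value syntax suggested above: double quotes with escapes, backticks taken as is, an optional `@@type` part, and String assumed when no type is given. The concrete escape set and the rule that a type name runs until the next whitespace are assumptions of this sketch, not decisions; `parse_value` is a hypothetical name.

```python
from typing import Tuple

ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}  # assumed escape set
TYPE_TAG = "@@"
DEFAULT_TYPE = "String"


def parse_value(src: str, pos: int = 0) -> Tuple[str, str, int]:
    """Parse one quoted value at src[pos]; return (text, type name, new position)."""
    if pos >= len(src) or src[pos] not in ('"', "`"):
        raise ValueError(f"expected a quote or backtick at offset {pos}")
    quote = src[pos]
    pos += 1
    out = []
    while True:
        if pos >= len(src):
            raise ValueError("unterminated value")
        ch = src[pos]
        if ch == quote:
            pos += 1
            break
        if quote == '"' and ch == "\\":
            # Escapes are only interpreted inside double quotes.
            pos += 1
            if pos >= len(src) or src[pos] not in ESCAPES:
                raise ValueError(f"bad escape at offset {pos}")
            out.append(ESCAPES[src[pos]])
        else:
            # Backtick content (and ordinary characters) are copied as is.
            out.append(ch)
        pos += 1
    type_name = DEFAULT_TYPE
    if src.startswith(TYPE_TAG, pos):
        pos += len(TYPE_TAG)
        start = pos
        while pos < len(src) and not src[pos].isspace():
            pos += 1
        if pos == start:
            raise ValueError("empty type name")
        type_name = src[start:pos]
    return "".join(out), type_name, pos
```

Because the tag is a single constant, trying a different tag string is a one-line change.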
  171. ### Multiline Strings
  172. Multiline strings are strings which take more than a single line in the Idan file. Inserting newline characters in the string itself can be easily done using the '\n' escape sequence. Multiline strings are useful for two things:
  173. 1. You want a long string to appear as is in the file
  174. 2. You want a long string to occupy several short Idan lines rather than a single super-long line
  175. One way to create multiline strings is by concatenation. It works as follows: Instead of specifying a single value:
  176. "hello world"
  177. or
  178. "hello world"@@smaoin:String
  179. You specify several (__optionally__ whitespace-separated) consecutive values, in which only the last one (if any) has the type tag:
  180. "hello" " world"
  181. or
  182. "hello" " world"@@smaoin:String
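The concatenation rule can be sketched in Python as below. It assumes the consecutive values have already been lexed into (text, type tag or None) pairs; the `concatenate` name and the pair representation are made up for this sketch. Only the last piece may carry a type tag, and String is assumed when there is none.

```python
from typing import List, Optional, Tuple

Piece = Tuple[str, Optional[str]]  # (text, type tag or None)


def concatenate(pieces: List[Piece], default_type: str = "String") -> Tuple[str, str]:
    """Join consecutive values into one (text, type) value."""
    if not pieces:
        raise ValueError("at least one value is required")
    for _, tag in pieces[:-1]:
        if tag is not None:
            raise ValueError("only the last value may carry a type tag")
    text = "".join(piece_text for piece_text, _ in pieces)
    return text, pieces[-1][1] or default_type
```

So `concatenate([("hello", None), (" world", "smaoin:String")])` gives the same value as writing `"hello world"@@smaoin:String` directly.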
  183. The second way is to not close the quotes, in which case the string is copied into the parsed value as is. For example:
  184. [[!format n3 """
  185. <$> myns:foo "This is a single line. After it we want to go down to the next line.
  186. And here we are, one line below. Let's make some space here. Say, let's leave one blank line and then proceed.
  187. Good. Now some character fun.
  188. H
  189. e
  190. l
  191. l
  192. o
  193. !
  194. And this is the last line. Ciao!" .
  195. """]]
  196. This is useful for writing long strings embedded as-is in the file, preserving all the whitespace including newlines.
  197. ### Type Tag
  198. A value can have a type tag: value@@type. Looking at the values I used above, I don't see why this double-@ can't be shortened into a single @. I remember I documented my thoughts about it, but where was it?
  199. Found it. In the file lang_0 in rdd-wiki. It has many examples of potential characters, and then chooses @@.
  200. **DECISION**: Most of the time the type tag is not used anyway, but when it is, let's make it light and simple. I'm dropping __@@__ for a single __@__.
  201. ### Whitespace and Bison
  202. If Flex/Quex has support for skipping whitespace - awesome. Otherwise, the input will have to contain it. Anyway, it's a good idea to write it in the BNF.
  203. First, read about EBNF and write the rules for (Kort and) Idan in valid EBNF.
  204. I need to learn the tools. Go to the parser software wikipedia comparison page again, and see what I want to use. Choose a Flex-Bison pair and a PEG tool to try too later.
  205. I want to learn Flex-Bison in C first, by writing programs for some simple syntax. Use their Info manuals and maybe other tutorials. Then try Bison++ and BisonC++, and maybe write a libKort parser. Finally, go try Quex because I need UTF-8 support. Actually, none of the hard-coded symbols are non-ASCII so I may end up being fine with Flex (maybe a bit of ugly hacks to express the non-ASCII whitespace characters etc.). Check what Redland does.
  206. ### Motivation Talk
  207. All the thinking about making queries work makes me feel bad and lose motivation. I am still failing to develop a single more-or-less universal model for queries, especially due to the fact I could potentially have any function applied on the values and no clear limit when it's too much.
  208. For example, strings can be filtered using a regex. But I could also have a Python function to return a boolean - where does it stop? Can I just use *any* function in the query, and it just fits into the model? I want something general so I could have a C++ query builder general enough to support any reasonable future language. Imagine something like "Turing complete" but for queries.
  209. Maybe even worse: even if I have a plan and a model and a language - how do I make **software** which efficiently executes those queries? I can't even write a database. All the components seem to be so complicated, there's no way I can just do it alone. And since it's so different from SPARQL in syntax and there are statement identifiers to take into account - how do I exactly reuse any existing code?
  210. __I D E A__: Take some existing pure-RDF store, such as *4store*, and adapt it to Smaoin! Another idea, take Redland's query code, e.g. what it runs when it queries in-memory models or models from text files or models which use Berkeley DB and so on, and adapt it to Smaoin!
  211. See? That's motivation talk! You can do this!!!
  212. What I really wanted to talk about is actual useful artifacts to motivate me. I want to keep the "Applications" section full of ideas, things I actually need and could enjoy instantly, and those would make it clear where I'm going and why. So here it comes.
  213. First, a list of some applications I use.
  214. - Document viewer
  215. - Image viewer
  216. - Diagram editor
  217. - Text editor
  218. - Web browser
  219. - Mail reader
  220. - Calendar with events
  221. - Task manager
  222. - Music player
  223. - Media player
  224. - Subtitle editor
  225. - Translation editor
  226. - Spreadsheet editor
  227. - Diary app
  228. - Mindmap editor
  229. And now, the "imagine" part.
  230. Imagine you could just see all the documents you have, sorted by any field you want. No worrying about folder trees or where to put them in the home folder. You could organize by subject, year, author and even your personal level of interest. You could plan when you were going to read them and see your planned reading items in a list. You could link books to movies based on them and recommend books to friends.
  231. Imagine you could see all your images. Or just screenshots. Or just photographs. Or just photographs taken in a certain place. Or just photos not containing your face. Or just computer-drawn images. Or just images you intend to use as a desktop wallpaper. You could organize in albums and share with friends.
  232. Imagine all diagram editors had the same data format.
  233. Okay, okay. I get it. I can continue this "imagine" list later. I do have motivation now. You can stop for now.
  234. ### Query Model
  235. Idea: Property chains of exact length are simply syntactic sugar of SPARQL. But chains of unspecified length, such as the ones implied by transitive properties, have separate modeling.
  236. First an example. Assume we have a property P=is-parent-of and we want to check whether Alice is an ancestor of Bob. Then we want to ask: Do there exist X1...Xn such that Alice P X1 P X2 P ... P Xn P Bob?
  237. So I figured out - not looking at the SPARQL spec yet - that it's exactly the same as the transitive closure of a property, like we used =>* and =>+ in Automata course to denote one word gets parsed eventually into another.
  238. The __idea__: Use the same in the query model! For a property P, the transitive closure can be specified directly in the query by converting `a P b .` into `a P+ b` or `a P* b`. Of course another way is to simply have a transitive property (member of class `smaoin:TransitiveProperty`) and then the same computation would be done anyway, if inference is enabled of course.
  239. Note that for a transitive property P, the transitive closure P+ is identical to P. And if the property P is also reflexive, then the reflexive-transitive closure P* is identical to P. So basically, specifying those on such properties doesn't hurt, so you could for example specify them to support non-inferencing database engines while still having it work for the inferencing ones.
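To convince myself of the P+ and P* claims, here is a naive Python sketch which treats a property as a set of (subject, object) pairs and computes the closures by repeated joining until nothing new appears. The function names are made up, and a real engine would surely do something smarter than this fixed-point loop.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """P+: all (a, b) connected by a chain of one or more P steps."""
    closure = set(pairs)
    while True:
        new = {
            (a, d) for (a, b) in closure for (c, d) in closure if b == c
        } - closure
        if not new:
            return closure
        closure |= new


def reflexive_transitive_closure(pairs: Set[Pair], nodes: Iterable[str]) -> Set[Pair]:
    """P*: P+ plus (n, n) for every node, i.e. chains of zero or more steps."""
    return transitive_closure(pairs) | {(n, n) for n in nodes}
```

With `parent = {("Alice", "Carol"), ("Carol", "Bob")}` (Carol is a made-up middle generation), `("Alice", "Bob")` is in `transitive_closure(parent)`, and applying `transitive_closure` again to the result changes nothing, which is the "P+ is identical to P" observation for a transitive property.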
  240. The idea of transitive extension can also be used with arbitrary chain lengths. We could say the P+ and P* notations are simply shortcuts for length ranges P(1:inf) and P(0:inf) respectively. But we could also use any other range, for example P(3) or P(3:5) or P(3:inf).
  241. Let's see an example. Assume P means "is-parent-of". Then:
  242. - P(1) is P
  243. - P(2) means is-grand-parent-of
  244. - P(3) means is-grand-grand-parent-of
  245. - P(3:5) means the generation distance is between 3 (g-g-parent) and 5 (g-g-g-g-parent)
  246. - P(3:inf) means either g-g-parent or a larger generation distance
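The ranges listed above can be checked with a simple step-by-step search. This Python sketch is an illustration only: the `chain_exists` name and signature are assumptions, `hi=None` stands for `inf`, and an exact length like P(3) is written as `lo=3, hi=3`. It keeps the set of nodes reachable in exactly k steps and stops early when that set starts repeating, so cycles in the data don't make it loop forever.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, str]


def chain_exists(
    pairs: Set[Pair], start: str, goal: str, lo: int, hi: Optional[int] = None
) -> bool:
    """True if goal is reachable from start in k steps of P, lo <= k <= hi."""
    succ: Dict[str, Set[str]] = {}
    for a, b in pairs:
        succ.setdefault(a, set()).add(b)
    frontier = {start}  # nodes reachable in exactly k steps
    seen = set()        # frontiers already examined at k >= lo
    k = 0
    while frontier:
        if k >= lo:
            if goal in frontier:
                return True
            key = frozenset(frontier)
            if key in seen:
                # The frontier sequence is periodic from here on,
                # so no new nodes will ever show up.
                return False
            seen.add(key)
        if hi is not None and k >= hi:
            return False
        frontier = {b for a in frontier for b in succ.get(a, ())}
        k += 1
    return False
```

In these terms P+ is `chain_exists(pairs, a, b, 1)` and P* is `chain_exists(pairs, a, b, 0)`.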
  247. As to notation, in math you can do things like f^2 which means f applied twice, i.e. f(f(x)) and similarly any other "exponent". f^0 is the identity function. I could generalize this for any relation, and then e.g. the syntax can be things like P*, P+, P^3, P^3:5, P^3:inf. I'll think about it. Any other convenient syntax is valid - the P() syntax I used was just for the discussion, not a decision or something final.