fr33domlover
/
Rel4tion-Wiki
mirror of git://seek-together.space/wiki.git


			
- Bring my old entries from the WordPress export, and use the meta
  plugin to set creation and modification dates manually, so that the
  blog and RSS treat them as written at their real times (i.e. 2013)
- Go over <https://antportal.com/wiki/>, which has some interesting material
- Type into the wiki (or at least take) the "skapa model" page from algorithms
  course notebook
- Idea: Instead of letting each app decide which Saugus backend to use (file,
  desktop database, etc.) I can let apps read this setting from a
  desktop-central place and make a "data sources" app which centrally configures
  sources per-app. It can also work using dconf and/or xfce conf etc. It gives
  the control to the *user* instead of setting this in source code. It's both
  good for the development and testing etc., and allows e.g. privacy apps,
  Tor/I2P browser etc. to use a separate encrypted file/DB and the user can
  fully control it. Even have several separate sources per app!
- Tosaf: Each hook object should have a folder where it searches for plugins.
  libdir can maybe be set globally for all hooks with a single command. But
  Plugin must be generic and get its info from its hook. The so/dll suffix can
  be left for Plugin/Library to handle
- Plan the framework user point-of-view: One of the problems with existing
  semantic desktop projects is that the entry level is very high. Another
  problem is that the technology is very far away from everything people
  actually use: GNU/Linux desktop technologies and commonly used tools and
  languages and conventions. Also, it's all been developed in a closed model by
  academics or by companies, so the community has no knowledge of it.
  Non-experts aren't involved.
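
The "data sources" idea from the list above could look roughly like the sketch below. It assumes a hypothetical INI-style central file with one section per app. The file layout, the `sources` key, the example paths and the `sources_for` helper are all illustrative assumptions, not an existing Saugus interface.

```python
import configparser

# Hypothetical central configuration: one section per app, plus a
# [DEFAULT] section used by apps that have no entry of their own.
EXAMPLE_CONFIG = """
[DEFAULT]
sources = file:~/.local/share/data/default.db

[tor-browser]
sources = file:~/private/tor.db.enc

[movie-manager]
sources = file:~/movies.idan, dbus:desktop-database
"""


def sources_for(app_name, config_text=EXAMPLE_CONFIG):
    """Return the list of data sources the user configured for an app."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if parser.has_section(app_name):
        section = parser[app_name]
    else:
        section = parser["DEFAULT"]  # fall back to the desktop-wide default
    return [s.strip() for s in section["sources"].split(",") if s.strip()]
```

An app would call `sources_for("movie-manager")` at startup instead of hard-coding a backend, so the choice stays with the user, and one app can have several sources.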

Proposal: Activity Interface
============================

In order to help people find information faster, the website of *Partager* can
arrange its content, or at least link to it from the main page, by *task*. In
other words, it allows you to choose what you want to do and directs you from
there. Possible actions:

* Study an ontology
* Extend an ontology
* Learn about the various ontologies
* Use an ontology in a user application
* Create a new ontology

Research Discussion
===================

******** Namespace

Namespaces are used as groups of names, in order to give the names additional information describing where the names come from or which body defines and maintains them. This way name ambiguities are resolved when a name is used for different things.

Therefore namespaces don't have any meaning semantically. They just group names defined by some body or as a unified group, in order to keep the names from colliding with names from other groups. A name being in a given namespace doesn't provide any semantic information about that name.


******** Ontology

An ontology describes concepts and relations within a given domain, using a common vocabulary. For example, an RDFS ontology may describe elements in the field of computer programming, and it then uses elements of RDFS to describe them. In other words the ontology is "written in RDFS".

The NRL ontology, part of NEPOMUK, has definitions which complete things missing from RDF/S like nrl:Graph, nrl:Ontology and many other things. I can have similar definitions in Idan (possibly by using NRL, even without creating a special parallel for Idan).

**** Equivalence and Versioning

There should be a way in Idan to say that two resources refer to the same thing. And it should be possible to say that one ontology is a subontology of the other, and the same for graphs. Let's see if NRL provides definitions for that.

NRL seems not to, but OWL does have a class equivalence property. However, equivalence can also be expressed by saying A and B are subclasses of each other.

I don't see anything about subontologies, but maybe a deeper search would find something. Anyway I could easily define that by myself. It's good for when there's a new version or people want to share and merge their works.

Research Tracking
=================

******** Instructions

Read about Gellish and see if there are good things I can use, same for RDF. For example giving uid to statements, allowing statements to be questions/proposals, not just facts. Use measurement units for values. But try to add them in an extensible way, i.e. try to add them outside the core language to make it small and simple. *****

TODO HERE: Read about how Gellish, RDF, OWL and Tracker treat namespaces and ontologies: What they are, why they exist, how they work, how analogous ontologies are connected/merged. Then decide and explain here how namespaces and ontologies work. Focus on decentralization of ontologies and namespaces, and distribution of ontologies, namespaces and data. Reading in the W3C standards may be a good way to understand how and why things were planned. Also JSON-LD is interesting.


******** Topics

**** Concepts

Namespace
	http://en.wikipedia.org/wiki/Namespace
	http://en.wikipedia.org/wiki/Xml_namespace
Ontology
	http://en.wikipedia.org/wiki/Ontology_(information_science)

**** Languages

RDF
	http://en.wikipedia.org/wiki/Resource_Description_Framework
	http://www.w3.org/TR/rdf-primer/
	http://www.w3schools.com/webservices/ws_rdf_intro.asp
RDFS
	http://en.wikipedia.org/wiki/RDFS
	http://www.w3.org/TR/rdf-schema/
	http://www.w3schools.com/webservices/ws_rdf_schema.asp
OWL
	http://en.wikipedia.org/wiki/Web_Ontology_Language
	http://www.w3.org/TR/owl2-overview/
YAML
JSON
Gellish
	http://en.wikipedia.org/wiki/Gellish
	http://sourceforge.net/apps/trac/gellish/

**** Notations

Turtle
JSON-LD
	http://json-ld.org/index.html#
	http://www.w3.org/TR/json-ld/

**** Software

Tracker
Strigi
Beagle

Old TODO
===============

	[ ] = TODO
	[%] = WIP
	[X] = DONE

	[ ] 21nov2013 Start designing basics of Idan using YAML
	[ ] 21nov2013 Continue API development in C++
	[X] 21nov2013 Examine Gellish's extra fields for triples, e.g. fact/question/opinion, see how I can have them in Idan
	[ ] 21nov2013 Finalize the architecture basics, to make sure the API matches requirements
	[ ] 21nov2013 After architecture basics are final more or less, list components:
		[ ] 21nov2013 API to write graphs to file and read from file
		[ ] 21nov2013 API to efficiently update file with changes, e.g. see how Gedit and others update files: rewrite all or just changes
		[ ] 21nov2013 Decide where SPARQL / similar language can be used - on top of repos or just for database repo
		[ ] 21nov2013 CLI tools
		[ ] 21nov2013 GUI tools
	[%] 21nov2013 Go over CherryTree document, see what I can use, maybe make a file here to track migration of data from there to here
	[ ] 21nov2013 Create an automated task solution which doesn't require putting all tasks in one file
		[ ] 21nov2013 Create a script which extracts tasks from files in the progress folder
			[ ] 21nov2013 Learn bash scripting
			[ ] 21nov2013 Learn shell programming
			[ ] 21nov2013 Learn coreutils
			[ ] 21nov2013 Write makeclass in Bash for practice
		[ ] 21nov2013 Make the tool read patterns from a file, so that new ways to mark TODOs in files can be easily added
	[ ] 21nov2013 Split the research files to separate research for each component and add them to respective folders
	[ ] 21nov2013 Find/write scripts which generate table-of-contents from titles in my files, so it becomes possible for GUI to use anchors to run a GUI table-of-contents and help the user easily manage long files. If I make the TOC manually in plain text, it would become tedious and difficult to manage the TOCs over time, and they're not interactive anyway. So make scripts for that. The GUI would use them, but there's probably no justification to use C/C++ directly. A script in Bash/Perl/Python will probably do.
	[ ] 21nov2013 IDEA: A tool which lists the "chain of documents" I work with and the current one and the status and what to do and what to do after that, etc. to help me keep context and remember where I am, both in real time and when continuing where I stopped last time
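
The task-extraction script from the list above could start from something like this sketch. It assumes the `[ ]`/`[%]`/`[X]` markers and the date format used in this file. The second pattern and the `extract_tasks` name are illustrative assumptions. In a real tool the patterns would be read from a patterns file rather than inlined.

```python
import re

# Patterns would normally be loaded from a patterns file, one regex per
# line; here they are inlined. Each must define a "task" group.
DEFAULT_PATTERNS = [
    r"\[(?P<state>[ %X])\]\s+(?P<date>\d{1,2}[a-z]{3}\d{4})\s+(?P<task>.+)",
    r"TODO(?: HERE)?:\s*(?P<task>.+)",
]

STATES = {" ": "TODO", "%": "WIP", "X": "DONE"}


def extract_tasks(text, patterns=DEFAULT_PATTERNS):
    """Yield (line_number, state, task) for each line matching a pattern."""
    compiled = [re.compile(p) for p in patterns]
    for number, line in enumerate(text.splitlines(), start=1):
        for regex in compiled:
            match = regex.search(line)
            if match:
                groups = match.groupdict()
                # Patterns without a "state" group count as open TODOs.
                state = STATES.get(groups.get("state", " "), "TODO")
                yield number, state, groups["task"].strip()
                break
```

Running this over every file in the progress folder and concatenating the results gives one task list without keeping all tasks in one file.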


Git to ChangeLog, and Other Things
==================================


First Application
-----------------

My next tasks are writing parsers and serializers for *Idan* and for *Kort*, and formalizing the Kiwi ontologies. This leads me to the first Partager application. It must exist in order for problems and needs to show up, and to have a practical example of things in real use.

The idea is to have one desktop database accessed via D-Bus (or k-dbus) which GUI applications can then use for anything they wish. It will be an experiment. The actual first GUI application I'd like to have is a file/info browser which operates over all the visible files in the home folder, or over a subset of them. Later we'll see exactly what this application will do and how. It's also possible to use a simple file for this, as long as it's the only application using this Repository.

Another option is to have an app which simply uses a small file, e.g. a movie collection manager like the one I tried, but with a full ontology, where expansion of the ontology would be supported transparently thanks to Smaoin. It's a small simple app, easier than a whole desktop info manager.


Idan Syntax
---------------

In order to write a parser/serializer for Idan, I need to formalize the syntax. Here it comes, more or less. I'm assuming UTF-8 characters as atoms, not ASCII. Also, I'm starting by using just the Smaoin Idan files as reference, and later I'll go over all the plans and the i18n file in rdd-wiki and make sure I didn't forget anything.

This brings a question: Does a comment have to be on a separate line, or can it be at the end of a line? I'll need to examine all the docs to answer this well. Let's start with just a simple subset and expand gradually.

For now I'm omitting the whitespace between components, because it makes BNF ugly. But basically the idea is that all things that aren't required to be attached can have any whitespace between them: blank lines, space, tab and so on. For the full list see here: <https://en.wikipedia.org/wiki/Whitespace_character>.

<document>                  => <header> <content>

<header>                    => <header-lang>? <header-declaration>*
<header-lang>               => <header-lang-prefix> <header-lang-identifier>
<header-lang-identifier>    => <identifier>
<header-declaration>        => <header-declaration-prefix> <header-declaration-type> <header-declaration-value>+
<header-declaration-type>   => <identifier>
<header-declaration-value>  => <identifier>

<content>                   => <block>*
<block>                     => <subject> <description>
<subject>                   => <resource> | <reference> | <subject-placeholder>
<resource>                  => <resource-open> <uid> <resource-close>
<reference>                 => TODOOOO
<subject-placeholder>       => <resource-open> <subject-placeholder-char> <resource-close>
<description>               => (<description-item> <description-separator>)* <description-item> <description-suffix>
<description-item>          => <predicate> <object-list>
<predicate>                 => <resource> | <reference>
<object-list>               => <object> (<object-separator> <object>)*
<object>                    => TODOOOO

<identifier>                => <identifier-char>+
<uid>                       => ((<identifier-char> - <resource-close>) | (<escape-char> <resource-close>))+

<header-lang-prefix>        => '@@'
<header-declaration-prefix> => '@'
<resource-open>             => '<'
<resource-close>            => '>'
<subject-placeholder-char>  => '$'
<identifier-char>           => any non-whitespace character
<escape-char>               => '\'
<description-separator>     => ';'
<description-suffix>        => '.'
<object-separator>          => ','
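
As a first step towards a parser, here is a minimal sketch of reading a single `<resource>` according to the `<uid>` rule above, including the escaped close character. The function name and the error handling are my own assumptions. Note that a `<subject-placeholder>` such as `<$>` also passes through this function and simply yields the uid `$`, so telling the two apart is left to the caller.

```python
RESOURCE_OPEN = "<"
RESOURCE_CLOSE = ">"
ESCAPE_CHAR = "\\"


def read_resource(text, pos=0):
    """Read '<uid>' starting at text[pos]; return (uid, next_pos).

    Inside the uid, an escape char followed by '>' stands for a literal '>'.
    """
    if not text.startswith(RESOURCE_OPEN, pos):
        raise ValueError(f"expected {RESOURCE_OPEN!r} at position {pos}")
    i = pos + 1
    uid = []
    while i < len(text):
        ch = text[i]
        if ch == ESCAPE_CHAR and text.startswith(RESOURCE_CLOSE, i + 1):
            uid.append(RESOURCE_CLOSE)  # escaped '>' becomes part of the uid
            i += 2
        elif ch == RESOURCE_CLOSE:
            if not uid:
                raise ValueError("empty uid")  # the rule requires 1+ chars
            return "".join(uid), i + 1
        elif ch.isspace():
            raise ValueError(f"whitespace inside uid at position {i}")
        else:
            uid.append(ch)
            i += 1
    raise ValueError("unterminated resource")
```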

Notes:

### Values

In Kort, all values take the form "value"@@type. But here I want to have more flexibility and convenience, since it's going to be a high-level language. I want to have the following features:

- General purpose syntax so that newly added types can always work
- Short syntax for existing types
- Strings have different syntax for with-escapes and without-escapes
- Multiline strings work like in the Smaoin Idan file

Here are suggestions:

- Characters are delimited by single quotes and always allow escapes
- Numbers have their rules of writing, and aren't delimited by anything
- Booleans are simply true and false, not delimited by anything
- Backticks and double quotes, together with the optional @@type part, are for general types
- If no @@type is given, String type is assumed
- Backticks take the content as is, while double-quotes allow escape sequences (regardless of type)
- Multiline strings can be formed in several ways as described below

### Multiline Strings

Multiline strings are strings which take more than a single line in the Idan file. Inserting newline characters in the string itself can be easily done using the '\n' escape sequence. Multiline strings are useful for two things:

1. You want a long string to appear as is in the file
2. You want a long string to occupy several short Idan lines rather than a single super-long line

One way to create multiline strings is by concatenation. It works as follows: Instead of specifying a single value:

	"hello world"

or

	"hello world"@@smaoin:String

You specify several (__optionally__ whitespace-separated) consecutive values, in which only the last one (if any) has the type tag:

	"hello" " world"

or

	"hello" " world"@@smaoin:String

The second way is to not close the quotes, in which case the string is copied into the parsed value as is. For example:

[[!format n3 """
<$> myns:foo "This is a single line. After it we want to go down to the next line.
And here we are, one line below. Let's make some space here. Say, let's leave one blank line and then proceed.

Good. Now some character fun.

H
 e
  l
   l
    o
     !

And this is the last line. Ciao!" .
"""]]

This is useful for writing long strings embedded as-is in the file, preserving all the whitespace including newlines.
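
The concatenation form described above can be sketched as a small post-processing step over already-parsed parts. This assumes each part arrives as a `(text, type_tag)` pair with `None` for "no tag". That representation and the `concatenate` name are illustrative assumptions. The default type follows the "String type is assumed" suggestion.

```python
DEFAULT_TYPE = "smaoin:String"


def concatenate(parts):
    """Merge consecutive (text, type_tag) parts into one (text, type) value.

    Only the last part may carry a type tag; None means "no tag given".
    """
    if not parts:
        raise ValueError("at least one part is required")
    for _, tag in parts[:-1]:
        if tag is not None:
            raise ValueError("only the last part may have a type tag")
    text = "".join(part_text for part_text, _ in parts)
    tag = parts[-1][1] or DEFAULT_TYPE
    return text, tag
```

With this, `"hello" " world"` and `"hello world"` parse to the same value.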


### Type Tag

A value can have a type tag: value@@type. Looking at the values I used above, I don't see why this double-@ can't be shortened into a single @. I remember I documented my thoughts about it, but where was it?

Found it. In the file lang_0 in rdd-wiki. It has many examples of potential characters, and then chooses @@.

**DECISION**: Most of the time the type tag is not used anyway, but when it is, let's make it light and simple. I'm dropping __@@__ for a single __@__.


### Whitespace and Bison

If Flex/Quex has support for skipping whitespace - awesome. Otherwise, the input will have to contain it. Anyway, it's a good idea to write it in the BNF.

First, read about EBNF and write the rules for (Kort and) Idan in valid EBNF.

I need to learn the tools. Go to the parser software wikipedia comparison page again, and see what I want to use. Choose a Flex-Bison pair and a PEG tool to try too later.

I want to learn Flex-Bison in C first, by writing programs for some simple syntax. Use their Info manuals and maybe other tutorials. Then try Bison++ and BisonC++, and maybe write a libKort parser. Finally, go try Quex because I need UTF-8 support. Actually, none of the hard-coded symbols are non-ASCII so I may end up being fine with Flex (maybe with a bit of ugly hacks to express the non-ASCII whitespace characters etc.). Check what Redland does.


### Motivation Talk

All the thinking about making queries work makes me feel bad and lose motivation. I am still failing to develop a single more-or-less universal model for queries, especially due to the fact I could potentially have any function applied on the values and no clear limit when it's too much.

For example, strings can be filtered using a regex. But I could also have a Python function to return a boolean - where does it stop? Can I just use *any* function in the query, and it just fits into the model? I want something general so I could have a C++ query builder general enough to support any reasonable future language. Imagine something like "Turing complete" but for queries.

Maybe even worse: even if I have a plan and a model and a language - how do I make **software** which efficiently executes those queries? I can't even write a database. All the components seem to be so complicated, there's no way I can just do it alone. And since it's so different from SPARQL in syntax and there are statement identifiers to take into account - how exactly do I reuse any existing code?

__I D E A__: Take some existing pure-RDF store, such as *4store*, and adapt it to Smaoin! Another idea, take Redland's query code, e.g. what it runs when it queries in-memory models or models from text files or models which use Berkeley DB and so on, and adapt it to Smaoin!

See? That's motivation talk! You can do this!!!

What I really wanted to talk about is actual useful artifacts to motivate me. I want to keep the "Applications" section full of ideas, things I actually need and could enjoy instantly, and those would make it clear where I'm going and why. So here it comes.

First, a list of some applications I use.

- Document viewer
- Image viewer
- Diagram editor
- Text editor
- Web browser
- Mail reader
- Calendar with events
- Task manager
- Music player
- Media player
- Subtitle editor
- Translation editor
- Spreadsheet editor
- Diary app
- Mindmap editor

And now, the "imagine" part.

Imagine you could just see all the documents you have, sorted by any field you want. No worrying about folder trees or where to put them in the home folder. You could organize by subject, year, author and even your personal level of interest. You could plan when you were going to read them and see your planned reading items in a list. You could link books to movies based on them and recommend books to friends.

Imagine you could see all your images. Or just screenshots. Or just photographs. Or just photographs taken in a certain place. Or just photos not containing your face. Or just computer-drawn images. Or just images you intend to use as a desktop wallpaper. You could organize in albums and share with friends.

Imagine all diagram editors had the same data format.

Okay, okay. I get it. I can continue this "imagine" list later. I do have motivation now. You can stop for now.


### Query Model

Idea: Property chains of exact length are simply syntactic sugar of SPARQL. But chains of unspecified length, such as the ones implied by transitive properties, have separate modeling.

First an example. Assume we have a property P=is-parent-of and we want to check whether Alice is an ancestor of Bob. Then we want to ask: Do there exist X1...Xn such that Alice P X1 P X2 P ... P Xn P Bob?

So I figured out - not looking at the SPARQL spec yet - that it's exactly the same as the transitive closure of a property, like we used =>* and =>+ in the Automata course to denote that one word eventually derives another.

The __idea__: Use the same in the query model! For a property P, the transitive closure can be specified directly in the query by converting `a P b .` into `a P+ b` or `a P* b`. Of course another way is to simply have a transitive property (member of class `smaoin:TransitiveProperty`) and then the same computation would be done anyway, if inference is enabled of course.

Note that for a transitive property P, the transitive closure P+ is identical to P. And if the property P is also reflexive, then the reflexive-transitive closure P* is identical to P. So basically, specifying those on such properties doesn't hurt, so you could for example specify them to support non-inferencing database engines while still having it work for the inferencing ones.
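
To make the closure idea concrete, here is a minimal sketch that treats a property as a plain set of (subject, object) pairs and computes P+ and P* by fixpoint iteration. The function names and the example people are assumptions for illustration, and a real engine would work on a store rather than in-memory sets.

```python
def transitive_closure(pairs):
    """Return P+ for a relation given as a set of (a, b) pairs."""
    closure = set(pairs)
    while True:
        new_pairs = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new_pairs <= closure:
            return closure  # nothing new: fixpoint reached
        closure |= new_pairs


def reflexive_transitive_closure(pairs):
    """Return P* restricted to the nodes that appear in the relation."""
    nodes = {node for pair in pairs for node in pair}
    return transitive_closure(pairs) | {(n, n) for n in nodes}


# Hypothetical data: P = is-parent-of
parent_of = {("Alice", "Carol"), ("Carol", "Dave"), ("Dave", "Bob")}
ancestor_of = transitive_closure(parent_of)
```

Here `("Alice", "Bob") in ancestor_of` answers the ancestor question. Applying `transitive_closure` to `ancestor_of` returns the same set, which matches the note that P+ is identical to P when P is already transitive.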

The idea of transitive extension can also be used with arbitrary chain lengths. We could say the P+ and P* notations are simply shortcuts for length ranges P(1:inf) and P(0:inf) respectively. But we could also use any other range, for example P(3) or P(3:5) or P(3:inf).

Let's see an example. Assume P means "is-parent-of". Then:

- P(1) is P
- P(2) means is-grand-parent-of
- P(3) means is-great-grand-parent-of
- P(3:5) means the generation distance is between 3 (g-g-parent) and 5 (g-g-g-g-parent)
- P(3:inf) means either g-g-parent or a larger generation distance
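
The length ranges in the list above can be sketched by composing the relation with itself. This is only an illustration over in-memory sets of pairs. The `chain` and `compose` names are assumptions, and `math.inf` stands in for "inf".

```python
import math


def compose(r, s):
    """Relational composition: a (r;s) c iff a r b and b s c for some b."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def chain(pairs, low, high=None):
    """Return P(low:high): pairs linked by low..high applications of P.

    high=None means exactly `low`; math.inf means no upper bound.
    P(0) is the identity on the nodes appearing in the relation.
    """
    if high is None:
        high = low
    nodes = {node for pair in pairs for node in pair}
    power = {(n, n) for n in nodes}  # P(0)
    for _ in range(low):
        power = compose(power, pairs)  # after the loop: P(low)
    result = set(power)
    seen = {frozenset(power)}
    steps = low
    while steps < high:
        power = compose(power, pairs)
        steps += 1
        key = frozenset(power)
        # Powers of a finite relation eventually die out or repeat;
        # either way nothing new can appear, so stop early.
        if not power or key in seen:
            break
        seen.add(key)
        result |= power
    return result
```

For a parent chain Alice, Carol, Dave, Bob, `chain(parent_of, 2)` gives the grandparent pairs, and `chain(parent_of, 1, math.inf)` gives the same set as P+.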

As to notation, in math you can do things like f^2 which means f applied twice, i.e. f(f(x)) and similarly any other "exponent". f^0 is the identity function. I could generalize this for any relation, and then e.g. the syntax can be things like P*, P+, P^3, P^3:5, P^3:inf. I'll think about it. Any other convenient syntax is valid - the P() syntax I used was just for the discussion, not a decision or something final.