
**Afifon** is planned to be a system for sharing software project information
between development platform instances. Today, distributed search of projects
is completely missing! The best way I know is to run a distributed web search
using Yacy. But Yacy is heavy and does much more than needed, and I haven't
examined how it works with semantic tagging. It's another thing to examine.

Working on short-term goals will hopefully help solve the problem of project
hosting centralization and proprietary tools (especially g1thu8, which sadly
many people use, even software freedom advocates), and in the process give me
the knowledge, understanding and experience needed for the long-term goal of
bringing the same tools to all kinds of software.

"Afifon" ([[!hewiktionary "עפיפון"]]) means *kite* in Hebrew (the flying toy on
a string).

**Table of Contents**

[[!toc levels=3]]
# Data Model

Before the federation part, there must be a regular data model for repo
hosting. I'd like to start minimal, because having the full package with wikis
and issue tracking and CI etc. makes the federation core work unnecessarily
harder.

Therefore, the working assumption is that the system just stores repositories.
There are no groups and no organizations and no projects - just a flat
namespace for repos, and a flat namespace for users.

Permissions, for now, will be simple too. Given a user and a repo, the
permission system can determine whether the user can push changes to the repo,
and whether the user can manage the permissions of the repo. That's all, just
these two booleans. The permissions for a repo are therefore a pair of lists:
one list contains users who may make changes to the repo, and the other list
contains users who may add and remove users from these lists for this repo.
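The pair-of-lists model could be sketched like this (a minimal illustration,
not a decided API; all names are hypothetical):

```python
# Sketch of the pair-of-lists permission model: two booleans per
# (user, repo) pair, derived from two lists. Names are hypothetical.

class RepoPermissions:
    def __init__(self):
        self.pushers = set()   # users who may push changes to the repo
        self.managers = set()  # users who may edit these two lists

    def can_push(self, user):
        return user in self.pushers

    def can_manage(self, user):
        return user in self.managers

perms = RepoPermissions()
perms.managers.add("alice")
perms.pushers.add("bob")
assert perms.can_push("bob") and not perms.can_push("carol")
```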
From the permissions of the instance's repos, for each user it's possible to
compute and maintain a list of repos she can push to. However, since merge
requests are also possible, people can participate in projects beyond that
list.

A merge request is an ordered pair of branches from different repos. It means
"please merge the first branch into the second branch". For each repo, a list
of these should be maintained.
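The ordered pair could be represented as simply as this (hypothetical names,
just illustrating the shape of the data):

```python
from collections import namedtuple

# "Please merge `source` into `target`" - an ordered pair of branches,
# each identified by (repo, branch name). Hypothetical illustration.
Branch = namedtuple("Branch", ["repo", "name"])
MergeRequest = namedtuple("MergeRequest", ["source", "target"])

mr = MergeRequest(source=Branch("alice/widget", "feature-x"),
                  target=Branch("upstream/widget", "master"))
# Each repo maintains a list of such pairs targeting its branches.
```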
[[model.dia]]
# Federation

## Minimal

What is the minimal feature set required for removing all the inconvenience of
decentralized repo hosting?

- Shared user accounts
- Repos, projects, tickets, wikis - all support grouping and collaboration
  between users of different instances, transparently
- Forks and merge requests across instances
- Global search and access to all data: can reach any person, project, repo,
  group, wiki, etc. by starting to search or browse from *any* instance - no
  need to know the home base instance in advance. HTTP redirection between
  instances is OK, just make it happen automatically during the user's browsing
  workflow
- Maybe share the user, project, repo, etc. name spaces, so that they're all
  unique and can be moved between instances without name collisions. Another
  option is to assign unique IDs, e.g. using UUIDs, without requiring unique
  names
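The UUID option could look like this (a hedged sketch; the record fields are
my assumptions, not a defined schema):

```python
import uuid

# Sketch: give every object a globally unique ID that is independent of
# both its display name and its current home instance, so renames and
# moves between instances can't collide. Field names are hypothetical.

def new_repo_record(name, instance):
    return {
        "id": str(uuid.uuid4()),  # stable global identity
        "name": name,             # mutable, need not be globally unique
        "instance": instance,     # current home, may change on a move
    }

r = new_repo_record("widget", "s.example.org")
# Moving the repo changes "instance" but never "id", so remote
# references keyed by "id" survive the move.
```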
## User Operations

- VCS operations: as usual, from the command-line VCS program
- (TODO)
## Federation Features

- Search for users and repositories worldwide
- Give permissions to users from other instances
- Take merge requests from users from other instances
- Display user and repo links, icons, etc. for remote ones in the same way as
  for local ones, making the federation transparent. The UI doesn't
  differentiate between local and remote objects.
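Treating local and remote users uniformly suggests a single user-reference
form, e.g. `user@instance`. The format and the function below are assumptions
for illustration, not a decided design:

```python
# Sketch: parse a federated user reference of the (assumed) form
# "alice" (local) or "alice@t.example.org" (remote), so the rest of
# the code can treat both uniformly.

def parse_user_ref(ref, local_instance):
    name, sep, instance = ref.partition("@")
    return {"name": name,
            "instance": instance if sep else local_instance,
            "local": not sep or instance == local_instance}

print(parse_user_ref("alice", "s.example.org"))
# {'name': 'alice', 'instance': 's.example.org', 'local': True}
print(parse_user_ref("bob@t.example.org", "s.example.org"))
# {'name': 'bob', 'instance': 't.example.org', 'local': False}
```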
## Cases to Study

- Gitolite
- Darcsden
- Gogs
- Kallithea
## Implementation Plan

- Storage backend which stores all the meta info (users, permissions, etc.)
- Repo/file for configuration by the server admin
- Shell for SSH access, allowing commands according to permissions
- Library which abstracts VCS access - [[/projects/repository]]
- Library which implements all the data manipulations
- Web API which wraps it
- Some kind of UI - either web-based or desktop GUI, not critical right now
## User Stories

### Storage Sharing

Using:

- Alice has a repository R on instance S (denoted S.R)
- Alice makes a local commit on her computer, using her local copy
- Alice pushes the commit to the remote branch on instance S
- Repository R has two read-only backup mirrors, on instances T and U
- Either immediately after the push, or periodically, S pushes the changes in
  S.R to the backup repositories T.R and U.R, which aren't necessarily visible
  publicly (maybe through some admin UI or backup mirror UI, not the main UI
  anyway)

Choosing:

- Alice creates a new repository R on instance S
- S chooses, based on stats and the list of participating instances, two backup
  instances for R - instances T and U
- Periodically, and/or when T or U have uptime or responsiveness problems, or
  they announce expected downtime, S may choose new backup instances, so that
  there are always two of them. When one or both are down, this period of
  lacking a backup is minimized by detecting this and choosing new instances.
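The "choosing" story above could be sketched as a periodic maintenance step.
The health check and the candidate list here are hypothetical stand-ins for
whatever mechanism is actually used:

```python
import random

# Sketch of keeping exactly two healthy backup instances per repo.
# `is_healthy` and the candidate list are hypothetical stand-ins.

def maintain_backups(current, candidates, is_healthy, wanted=2):
    """Return an updated backup list: drop unhealthy instances and
    top up to `wanted` from healthy candidates not already used."""
    backups = [b for b in current if is_healthy(b)]
    pool = [c for c in candidates if is_healthy(c) and c not in backups]
    while len(backups) < wanted and pool:
        backups.append(pool.pop(random.randrange(len(pool))))
    return backups

# Example: instance T went down, so a replacement is chosen.
healthy = {"t.example.org": False, "u.example.org": True,
           "v.example.org": True, "w.example.org": True}
new = maintain_backups(["t.example.org", "u.example.org"],
                       list(healthy), healthy.get, wanted=2)
assert "u.example.org" in new and "t.example.org" not in new
```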
# Distributed Search

Today, the ways to find projects and be found are:

- Have your project hosted in a proprietary centralized system like g1thu8
- Have your project hosted in a free software system like Gitorious or Savannah
- Run your own hosting platform with Gitorious, Kallithea, Gogs, Gitolite, etc.
- Use a proprietary user-tracking centralized search engine, like g00gle
- Use a distributed free software search engine, like Yacy

Getting all the proprietary-ness, centralization and greed out of the picture,
possible ideas are:

- Make hosting platforms federate, i.e. each one has integrated project search
  and integration for merge requests, usernames, bugs, wiki pages etc. across
  instances over the network
- Make distributed web search support semantic search, and have instances
  communicate like social network nodes (e.g. Diaspora\* pods), which probably
  doesn't need a DHT
- Have a shared vocabulary and API for dev platforms to provide info, use info,
  declare features and subsystems, etc.
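To show the flavor of the shared-vocabulary idea, an instance might publish
project records in some agreed form. This shape is entirely made up for
illustration; no such schema is defined yet:

```python
# Entirely hypothetical example of a shared-vocabulary record that an
# instance could publish for cross-instance search and integration.
project_record = {
    "type": "project",
    "name": "widget",
    "instance": "s.example.org",
    "description": "A widget library",
    "topics": ["library", "widgets"],
    "vcs": {"kind": "git",
            "clone_url": "https://s.example.org/widget.git"},
}
```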
# Storage Distribution

Should there be a clear concept of "hosting provider" in the traditional sense?
In other words, should storage be completely distributed (i.e. a project
doesn't belong to any specific hosting server), or should each project have a
"home base"?

From the user's point of view, it doesn't matter much in the technical sense.
If a user can see project details using any instance, the actual storage
location doesn't matter. But, e.g. unlike with Tahoe-LAFS, the information is
public, so the LAFS concept and its overhead aren't needed here. It may not be
critical to have a single specific upstream instance visible to the user, but
in any case there should be several backup instances for each project.

What about regular version control system usage? Having no upstream instance
means that things like `git clone` will be slower. The server will have to
figure out the physical upstream, clone from it and stream the data to the
requesting user's client. Instead, if people *get the URL* of the physical
upstream, they can clone it like always.

Problem: What happens if the physical upstream is down? How can people still
work with the repo? Should they be able to push too, or just pull and clone?

If a local GUI app handles detecting a functioning mirror, then no visible
physical upstream is needed in the first place. Pulling only is trivial: just
detect the mirror and pull. Pushing means that once the upstream is down, one
of the backups detects this and becomes the new upstream (there's a shared
known protocol for choosing which backup it is). From that point on, users can
work with it transparently.
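That shared known protocol isn't defined yet. One trivially deterministic
option, shown only as an illustration, is for every node to pick the
lexicographically smallest healthy backup, assuming all nodes see the same
backup list and health information:

```python
# Illustration only: a deterministic rule that every node can apply
# independently and still agree on the same new upstream, given a
# shared view of the backup list and instance health.

def choose_new_upstream(backups, is_healthy):
    healthy = sorted(b for b in backups if is_healthy(b))
    return healthy[0] if healthy else None

backups = ["u.example.org", "t.example.org", "v.example.org"]
down = {"t.example.org"}
print(choose_new_upstream(backups, lambda b: b not in down))
# u.example.org
```

A real protocol would also need to handle disagreeing health views, but the
point is that the choice must be reproducible by every participant.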
Question: Should I work on all these ideas, or should I focus on a *minimal*
addition to the existing centralized model?

Answer: Start minimal. In particular, it means storage sharing isn't needed.
# Tasks

- Is Yacy good for this? Can it do semantic search? How good is it today at
  finding independently hosted software projects?
- Understand DHTs, examine existing ones
- Write some basic ontology for project info
- How do GNU Social, Pump.io and Diaspora\* instances federate?
- Given semantic search, is a DHT still a good option for collaboration
  features? If yes, it may be better to use it for search too than to rely on
  general-purpose search
- Is a general-purpose quadstore DHT possible? Does it make sense? Maybe it can
  be a meta-store which just says "who knows what" or "who's online", and
  querying can be done using direct connections to the found nodes?
- Maybe it's much easier and more reusable to use an existing DHT-based system!
  For example, GNUnet and I2P are general-purpose (e.g. see how file sharing
  and instant messaging work with them), and maybe Freenet is relevant too.
  Also cjdns. Make a list of candidates to examine.
# Features and Ideas

- Allow instances to communicate with other instances, for queries and commands
- DHT for distributed access where needed
- Fork projects from other instances
- Send merge requests to projects in other instances
- Report bugs to projects in other instances
- Use the GPG WoT for trust between users and/or between instances
- In the future, try to use distributed storage for repos, or at least for
  backup. For now, just let projects be hosted on specific hosts
- Global semantic project search
- Transparent federation: the UI doesn't make you handle the local/remote
  difference. It gets abstracted away, like Diaspora\* and GNU Social
  transparently connect you to people from other instances
- Global uniform username space. Maybe avoid using hosts in the name, so moving
  between hosts is easy.
- Easy 1-click move of projects between instances, without breaking anything
# Random

A development platform may consist of many components, each providing a
solution for a particular need. Afifon should not rely on any specific
combination, because the common choice of components is arbitrary. What Afifon
can and should rely on is the *theory* behind the combination of components.

Under application [[/projects]] and under the [[/projects/Kiwi]] ontologies, I
have been working on general-purpose models for wikis, issue tracking,
discussions and much more. There is still a lot of work to do on these models
and on the deployment aspects of solutions based on them. At least for now,
Afifon won't rely on these models, because it may take a lot of time for them
to be ready.

The initial Afifon will just use existing common practices. Examples of common
components are: version-controlled repository, wiki, issue tracker, mailing
list, forum, generated manual and API reference.

For the very beginning, only core features will be supported. No integrated
issue tracking, no wiki, no discussions. Just the version control and code
repository aspects.

[[This|http://tsyesika.co.uk/u/federated-code-issue-hosting.html]] is a plan by
Jessica Tallon, which at the time of writing is concerned with the federation
messages. Afifon should also support things like DHT, distributed DNS, storage
sharing, p2p, routing and more. GNUnet may be a good idea here.

Collaborating on that project may be an awesome idea, especially on the
information model. Also, the JSON snippets. While [[/projects/Idan]] may be
more readable and more writable for humans, it's much too complicated for the
basic simple needs of machine communication. Machines don't need convenience
features like references and generators. So I suppose that with some
inspiration from [[!wikipedia JSON-LD]], I could define a mapping between JSON
and [[/projects/Smaoin]]. Anyway, we'll see soon.