- **Afifon** is planned to be a system for sharing software project information
- between development platform instances. Today, distributed search of projects
- is completely missing! The best way I know is to run a distributed web search
- using Yacy. But Yacy is heavy and does much more than needed, and I haven't
- examined how it works with semantic tagging. It's another thing to examine.
- Working on short term goals will hopefully help solve the problem of project
- hosting centralization and proprietary tools (especially g1thu8 which sadly
- many people use, even software freedom advocates), and in the process give me
- the knowledge, understanding and experience needed for the long term goal of
- bringing the same tools to all kinds of software.
- "Afifon" ([[!hewiktionary "עפיפון"]]) means *kite* in Hebrew (the flying toy on
- a string).
- **Table of Contents**
- [[!toc levels=3]]
- # Data Model
- Before the federation part, there must be a regular data model for repo
- hosting. I'd like to start minimal, because having the full package with wikis
- and issue tracking and CI etc. makes the federation core work unnecessarily
- harder.
- Therefore, the working assumption is that the system just stores repositories.
- There are no groups and no organizations and no projects - just a flat
- namespace for repos, and a flat namespace for users.
- Permissions, for now, will be simple too. Given a user and a repo, the
- permission system can determine whether the user can push changes to the repo,
- and whether the user can manage the permissions of the repo. That's all, just
- these two booleans. The permissions for a repo are therefore a pair of lists.
- One list contains users who may make changes to the repo, and the other list
- contains users who may add and remove users from these lists for this repo.
- From the permissions of the instance's repos, for each user it's possible to
- compute and maintain a list of repos she can push to. However, since merge
- requests are possible too, people can also contribute to repos they can't
- push to.
- A merge request is an ordered pair of branches from different repos. It means
- "please merge the first branch into the second branch". For each repo, a list
- of these should be maintained.
- [[model.dia]]
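
The data model described above is small enough to sketch directly. The following is only a rough illustration, not a decided design: the names `Repo`, `Branch`, `MergeRequest`, `grant_push`, `pushable_repos` and `open_merge_request` are all made up here, and everything lives in memory instead of a real storage backend.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """A repo with the two permission lists (all names here are hypothetical)."""
    name: str
    pushers: set = field(default_factory=set)    # users who may push changes
    managers: set = field(default_factory=set)   # users who may edit both lists
    merge_requests: list = field(default_factory=list)


@dataclass(frozen=True)
class Branch:
    repo: str
    name: str


@dataclass(frozen=True)
class MergeRequest:
    """Ordered pair meaning 'please merge `source` into `target`'."""
    source: Branch
    target: Branch


def can_push(repo, user):
    return user in repo.pushers


def can_manage(repo, user):
    return user in repo.managers


def grant_push(repo, actor, user):
    """Add `user` to the push list; only managers of the repo may do this."""
    if not can_manage(repo, actor):
        raise PermissionError(f"{actor} may not manage {repo.name}")
    repo.pushers.add(user)


def pushable_repos(repos, user):
    """Derive the per-user list of repos from the per-repo permissions."""
    return sorted(name for name, repo in repos.items() if can_push(repo, user))


def open_merge_request(repos, source, target):
    """Record the request on the target repo's list."""
    if source.repo == target.repo:
        raise ValueError("a merge request spans two different repos")
    request = MergeRequest(source, target)
    repos[target.repo].merge_requests.append(request)
    return request
```

Here `repos` is a plain dict from repo name to `Repo`, standing in for the flat repo namespace. The list of repos a user can push to is recomputed on demand; a real instance would probably maintain it incrementally.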
- # Federation
- ## Minimal
- What is the minimal feature set required for removing all the inconvenience of
- decentralized repo hosting?
- - Shared user accounts
- - Repos, projects, tickets, wikis - all support grouping and collaboration
- between users of different instances, transparently
- - Forks and merge requests across instances
- - Global search and access to all data: Can reach any person, project, repo,
- group, wiki, etc. by starting to search or browse from *any* instance - no
- need to know the home base instance in advance. HTTP redirection between
- instances is OK, just make it happen automatically during the user browsing
- workflow
- - Maybe share user, project, repo, etc. name space, so that they're all unique
- and can be moved between instances without name collisions. Another option is
- to assign unique IDs, e.g. using UUIDs, without requiring unique names
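
To make the UUID option and the automatic redirection a bit more concrete, here is a minimal sketch. It assumes a shared `Directory` index that every instance can consult, which is a placeholder for whatever federation mechanism ends up being used. The class, its methods, the host names and the `/repos/<id>` URL layout are all hypothetical.

```python
import uuid


class Directory:
    """Hypothetical shared index: stable ID -> (display name, home instance)."""

    def __init__(self):
        self._entries = {}

    def register(self, name, home):
        """Assign a fresh random UUID; display names are allowed to collide."""
        repo_id = uuid.uuid4()
        self._entries[repo_id] = (name, home)
        return repo_id

    def move(self, repo_id, new_home):
        """Moving changes only the home instance, never the ID."""
        name, _ = self._entries[repo_id]
        self._entries[repo_id] = (name, new_home)

    def resolve(self, repo_id, asked_instance):
        """Answer a lookup made on any instance: serve locally or redirect."""
        _, home = self._entries[repo_id]
        if home == asked_instance:
            return ("local", None)
        # Assumed URL layout; the asking instance would answer with an
        # HTTP redirect to this address.
        return ("redirect", f"https://{home}/repos/{repo_id}")
```

Two repos named `kite` on different instances get different IDs, so moving one of them with `move` cannot collide with the other, and links that use the ID keep working after the move.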
- ## User Operations
- VCS operations: as usual, from the command-line VCS program
- - (TODO)
- ## Federation Features
- - Search for users and repositories worldwide
- - Give permissions to users from other instances
- - Take merge requests from users from other instances
- - Display user and repo links, icons, etc. for remote ones in the same way as
- for local ones, making the federation transparent. The UI doesn't
- differentiate between local and remote objects.
- ## Cases to Study
- - Gitolite
- - Darcsden
- - Gogs
- - Kallithea
- ## Implementation Plan
- - Storage backend which stores all the meta info (users, permissions, etc.)
- - Repo/file for configuration by server admin
- - Shell for SSH access, allowing commands according to permissions
- - Library which abstracts VCS access - [[/projects/repository]]
- - Library which implements all the data manipulations
- - Web API which wraps it
- - Some kind of UI - either web-based or desktop GUI, not critical right now
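
As an illustration of the SSH shell item, here is a rough sketch of the permission check such a shell could make. It assumes Git as the VCS: over SSH, the Git client asks the server to run `git-upload-pack` for clone/fetch and `git-receive-pack` for push. The `authorize` function and the `pushers` mapping are hypothetical stand-ins for the data manipulation library and the storage backend, and treating every known repo as publicly readable is an assumption too.

```python
import shlex

# Commands the Git client runs on the server side of an SSH connection.
READ_COMMANDS = {"git-upload-pack"}     # clone, fetch, pull
WRITE_COMMANDS = {"git-receive-pack"}   # push


def authorize(command_line, user, pushers):
    """Return the repo name if `user` may run `command_line`, else raise.

    `pushers` maps repo name -> set of users allowed to push (hypothetical).
    """
    try:
        parts = shlex.split(command_line)
    except ValueError:
        raise PermissionError("malformed command line")
    if len(parts) != 2:
        raise PermissionError("expected: <git command> <repo>")
    command, repo = parts
    if repo not in pushers:
        raise PermissionError(f"unknown repo: {repo}")
    if command in READ_COMMANDS:
        return repo  # assumption: every known repo is publicly readable
    if command in WRITE_COMMANDS and user in pushers[repo]:
        return repo
    raise PermissionError(f"{user} may not run {command} on {repo}")
```

With OpenSSH, a restricted shell set up as a forced command would typically read the requested command line from the `SSH_ORIGINAL_COMMAND` environment variable, call something like `authorize`, and only then execute the real Git command.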
- ## User Stories
- ### Storage Sharing
- Using:
- - Alice has a repository R on instance S (denoted S.R)
- - Alice makes a local commit on her computer, using her local copy
- - Alice pushes the commit to the remote branch on instance S
- - Repository R has two read-only backup mirrors, on instances T and U
- Either immediately after the push, or periodically, S pushes the changes in
- S.R to the backup repositories T.R and U.R, which aren't necessarily visible
- publicly (maybe through some admin UI or backup mirror UI, not the main UI
- anyway)
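
The mirroring step could be as simple as running `git push --mirror` from S.R to each backup. This is a sketch assuming Git and assuming S has push access to T.R and U.R; the function names and the mirror URLs in the usage note are made up.

```python
import subprocess


def mirror_push_commands(repo_path, mirror_urls):
    """Build one `git push --mirror` command line per backup mirror."""
    return [["git", "-C", repo_path, "push", "--mirror", url]
            for url in mirror_urls]


def push_to_mirrors(repo_path, mirror_urls):
    """Run the pushes; return the URLs that failed so they can be retried."""
    failed = []
    for command in mirror_push_commands(repo_path, mirror_urls):
        if subprocess.run(command).returncode != 0:
            failed.append(command[-1])
    return failed
```

For example, `push_to_mirrors("/srv/repos/R.git", ["ssh://t.example/R.git", "ssh://u.example/R.git"])` could be called from a post-push hook or from a periodic job, matching the two options above.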
- Choosing:
- - Alice creates a new repository R on instance S
- - S chooses, based on stats and the list of participating instances, two backup
- instances for R - instances T and U
- - Periodically, and/or when T or U have uptime or responsiveness problems, or
- they announce expected downtime, S may choose new backup instances, so that
- there are always two of them. When one or both are down, this period of lack
- of backup is minimized by detecting this and choosing new instances.
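
A possible shape for the selection logic is sketched below. The text above doesn't say which stats are used, so this assumes a single `uptime` figure per instance plus flags for "currently down" and "announced downtime". `InstanceStats` and `choose_backups` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class InstanceStats:
    """Hypothetical per-instance stats known to S."""
    name: str
    uptime: float                   # fraction of time reachable, 0.0 to 1.0
    up: bool = True                 # currently reachable and responsive
    planned_downtime: bool = False  # instance announced expected downtime


def usable(stats):
    return stats.up and not stats.planned_downtime


def choose_backups(home, current, instances, count=2):
    """Keep the healthy current backups, then top up to `count` by uptime."""
    by_name = {s.name: s for s in instances}
    kept = [n for n in current if n in by_name and usable(by_name[n])][:count]
    candidates = sorted(
        (s for s in instances
         if usable(s) and s.name != home and s.name not in kept),
        key=lambda s: (-s.uptime, s.name),  # best uptime first, name as tie-break
    )
    needed = max(0, count - len(kept))
    return kept + [s.name for s in candidates[:needed]]
```

Running `choose_backups` periodically, and again whenever a backup stops responding or announces downtime, keeps the number of backups at two as long as enough healthy instances participate. Keeping healthy backups in place avoids copying the whole repo to a new instance more often than necessary.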
- # Distributed Search
- Today, the ways to find projects and be found are:
- - Have your project hosted in a proprietary centralized system like g1thu8
- - Have your project hosted in a free software system like gitorious, savannah
- - Run your own hosting platform with gitorious, kallithea, gogs, gitolite, etc.
- - Use a proprietary user-tracking centralized search engine, like g00gle
- - Use a distributed free software search engine, like Yacy
- Getting all the proprietary-ness, centralization and greed out of the picture,
- possible ideas are:
- - Make hosting platforms federate, i.e. each one has integrated project search
- and integration for merge requests, usernames, bugs, wiki pages etc. across
- instances over the network
- - Make distributed web search support semantic search and have instances
- communicate like social network nodes (e.g. Diaspora\* pods), which probably
- doesn't need a DHT
- - Have a shared vocabulary and API for dev platforms to provide info, use info,
- declare features and subsystems, etc.
- # Storage Distribution
- Should there be a clear concept of "hosting provider" in the traditional sense?
- In other words, should storage be completely distributed (i.e. a project
- doesn't belong to any specific hosting server), or should each project have a
- "home base"?
- From the user's point of view, it doesn't matter much in the technical sense.
- If a user can see project details using any instance, the actual storage
- location doesn't matter. But e.g. unlike with Tahoe LAFS, the information is
- public, so the LAFS concept and its overhead aren't needed here. It may not be
- critical to have a single specific upstream instance visible to the user, but
- in any case there should be several backup instances for each project.
- What about the regular version control system usage? Having no upstream
- instance means that things like `git clone` will be slower. The server will
- have to figure out the physical upstream, clone from it and stream the data to
- the requesting user client. Instead, if people *get the URL* of the physical
- upstream, they can clone it like always.
- Problem: What happens if the physical upstream is down? How can people still
- work with the repo? Should they be able to push too, or just pull and clone?
- If a local GUI app handles detecting the functioning mirror, then no physical
- visible upstream is needed in the first place. Pulling only is trivial: Just
- detect the mirror and pull. Pushing means that once the upstream is down, one
- of the backups detects this and becomes the new upstream (there's a shared
- known protocol for choosing which backup it is). From that point on, users can
- work with it transparently.
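
The "shared known protocol" isn't defined yet. One simple candidate, shown here purely as an assumption, is a fixed order of backups that every participant knows: the first reachable instance in that order acts as upstream. `elect_upstream` and the `is_up` probe are hypothetical.

```python
def elect_upstream(upstream, backups, is_up):
    """Return the instance that should act as upstream right now.

    `backups` is the ordered list shared by all participants, and `is_up`
    is a hypothetical health probe taking an instance name.
    """
    if is_up(upstream):
        return upstream
    for name in backups:
        if is_up(name):
            return name
    raise RuntimeError("no reachable instance holds this repo")
```

A client that only pulls can use the same function to pick any reachable copy. For pushing, this only works if all participants agree on which instances are down; if they see different things, two backups could both decide to become upstream, so a real protocol would need something stronger.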
- Question: Should I work on all these ideas, or should I focus on a *minimal*
- addition to the existing centralized model?
- Answer: Start minimal. In particular, it means storage sharing isn't needed.
- # Tasks
- - Is Yacy good for this? Can it do semantic search? How good is it today at
- finding independently hosted software projects?
- - Understand DHTs, examine existing ones
- - Write some basic ontology for project info
- - How do GNU Social, Pump.io and Diaspora\* instances federate?
- - Given semantic search, is DHT still a good option for collaboration features?
- If yes, it may be better to use it for search too than rely on
- general-purpose search
- Is a general-purpose quadstore DHT possible? Does it make sense? Maybe it can
- be a meta-store which just says "who knows what" or "who's online", and
- querying can be done using direct connections to the found nodes?
- - Maybe it's much easier and reusable to use an existing DHT based system! For
- example, GNUnet and I2P are general-purpose (e.g. see how file sharing and
- instant messaging work with them) and maybe Freenet is relevant too. Also
- cjdns. Make a list of candidates to examine.
- # Features and Ideas
- Allow instances to communicate with other instances, for queries and commands
- - DHT for distributed access where needed
- - Fork projects from other instances
- - Send merge requests to projects in other instances
- - Report bugs to projects in other instances
- - Use GPG WoT for trust between users and/or between instances
- - In the future try to use distributed storage for repos, or at least for
- backup. For now, just let projects be hosted on specific hosts
- - Global semantic project search
- - Transparent federation: UI doesn't make you handle local/remote difference.
- It gets abstracted, like Diaspora\* and GNU Social transparently connect you to
- people from other instances
- - Global uniform username space. Maybe avoid using hosts in the name, so moving
- between hosts is easy.
- - Easy 1-click move of projects between instances, without breaking anything
- # Random
- A development platform may consist of many components, each providing a
- solution for a particular need. Afifon shouldn't rely on any specific
- combination, because the common choice of components is arbitrary. What Afifon
- can and should rely on is the *theory* behind the combination of components.
- Under application [[/projects]] and under the [[/projects/Kiwi]] ontologies, I
- have been working on general-purpose models for wikis, issue tracking,
- discussions and much more. There is still a lot of work to do on these models
- and on the deployment aspects of solutions based on them. At least for now,
- Afifon won't rely on these models, because it may take a lot of time for them
- to be ready.
- The initial Afifon will just use existing common practices. Examples of common
- components are: version controlled repository, wiki, issue tracker, mailing
- list, forum, generated manual and API reference.
- For the very beginning, only core features will be supported. No integrated
- issue tracking, no wiki, no discussions. Just the version control and code
- repository aspects.
- [[This|http://tsyesika.co.uk/u/federated-code-issue-hosting.html]] is a plan by
- Jessica Tallon, which at the time of writing is concerned with the federation
- messages. Afifon should also support things like DHT, distributed DNS, storage
- sharing, p2p, routing and more. GNUnet may be a good idea here.
- Collaborating on that project may be an awesome idea, especially the
- information model. Also, the JSON snippets. While [[/projects/Idan]] may be
- more readable and more writable for humans, it's much too complicated for the
- basic simple needs of machine communication. Machines don't need convenience
- features like references and generators. So I suppose that with some
- inspiration from [[!wikipedia JSON-LD]], I could define a mapping between JSON
- and [[/projects/Smaoin]]. Anyway, we'll see soon.