
**Afifon** is planned to be a system for sharing software project information
between development platform instances. Today, distributed search of projects
is completely missing! The best way I know is to run a distributed web search
using Yacy. But Yacy is heavy and does much more than needed, and I haven't
examined how it works with semantic tagging. It's another thing to examine.

Working on short-term goals will hopefully help solve the problem of project
hosting centralization and proprietary tools (especially g1thu8, which sadly
many people use, even software freedom advocates), and in the process give me
the knowledge, understanding and experience needed for the long-term goal of
bringing the same tools to all kinds of software.

"Afifon" ([[!hewiktionary "עפיפון"]]) means *kite* in Hebrew (the flying toy on
a string).

**Table of Contents**

[[!toc levels=3]]
# Data Model

Before the federation part, there must be a regular data model for repo
hosting. I'd like to start minimal, because having the full package with wikis
and issue tracking and CI etc. makes the federation core work unnecessarily
harder.

Therefore, the working assumption is that the system just stores repositories.
There are no groups and no organizations and no projects - just a flat
namespace for repos, and a flat namespace for users.

Permissions, for now, will be simple too. Given a user and a repo, the
permission system can determine whether the user can push changes to the repo,
and whether the user can manage the permissions of the repo. That's all, just
these two booleans. The permissions for a repo are therefore a pair of lists:
one list contains users who may make changes to the repo, and the other list
contains users who may add and remove users from these lists for this repo.
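The pair-of-lists model could be sketched like this (a minimal illustration,
not a decided API; all names are hypothetical):

```python
# Sketch of the pair-of-lists permission model: two booleans per
# (user, repo) pair, derived from two lists. Names are hypothetical.

class RepoPermissions:
    def __init__(self):
        self.pushers = set()   # users who may push changes to the repo
        self.managers = set()  # users who may edit these two lists

    def can_push(self, user):
        return user in self.pushers

    def can_manage(self, user):
        return user in self.managers

perms = RepoPermissions()
perms.managers.add("alice")
perms.pushers.add("bob")
assert perms.can_push("bob") and not perms.can_push("carol")
```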
From the permissions of the instance's repos, for each user it's possible to
compute and maintain a list of repos she can push to. However, since merge
requests are also possible, people can participate in projects beyond that
list.

A merge request is an ordered pair of branches from different repos. It means
"please merge the first branch into the second branch". For each repo, a list
of these should be maintained.
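The ordered pair could be represented as simply as this (hypothetical names,
just illustrating the shape of the data):

```python
from collections import namedtuple

# "Please merge `source` into `target`" - an ordered pair of branches,
# each identified by (repo, branch name). Hypothetical illustration.
Branch = namedtuple("Branch", ["repo", "name"])
MergeRequest = namedtuple("MergeRequest", ["source", "target"])

mr = MergeRequest(source=Branch("alice/widget", "feature-x"),
                  target=Branch("upstream/widget", "master"))
# Each repo maintains a list of such pairs targeting its branches.
```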
[[model.dia]]
# Federation

## Minimal

What is the minimal feature set required for removing all the inconvenience of
decentralized repo hosting?

- Shared user accounts
- Repos, projects, tickets, wikis - all support grouping and collaboration
  between users of different instances, transparently
- Forks and merge requests across instances
- Global search and access to all data: can reach any person, project, repo,
  group, wiki, etc. by starting to search or browse from *any* instance - no
  need to know the home base instance in advance. HTTP redirection between
  instances is OK, just make it happen automatically during the user's browsing
  workflow
- Maybe share the user, project, repo, etc. name spaces, so that they're all
  unique and can be moved between instances without name collisions. Another
  option is to assign unique IDs, e.g. using UUIDs, without requiring unique
  names
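The UUID option could look like this (a hedged sketch; the record fields are
my assumptions, not a defined schema):

```python
import uuid

# Sketch: give every object a globally unique ID that is independent of
# both its display name and its current home instance, so renames and
# moves between instances can't collide. Field names are hypothetical.

def new_repo_record(name, instance):
    return {
        "id": str(uuid.uuid4()),  # stable global identity
        "name": name,             # mutable, need not be globally unique
        "instance": instance,     # current home, may change on a move
    }

r = new_repo_record("widget", "s.example.org")
# Moving the repo changes "instance" but never "id", so remote
# references keyed by "id" survive the move.
```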
## User Operations

- VCS operations: as usual, from the command-line VCS program
- (TODO)
## Federation Features

- Search for users and repositories worldwide
- Give permissions to users from other instances
- Take merge requests from users from other instances
- Display user and repo links, icons, etc. for remote ones in the same way as
  for local ones, making the federation transparent. The UI doesn't
  differentiate between local and remote objects.
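Treating local and remote users uniformly suggests a single user-reference
form, e.g. `user@instance`. The format and the function below are assumptions
for illustration, not a decided design:

```python
# Sketch: parse a federated user reference of the (assumed) form
# "alice" (local) or "alice@t.example.org" (remote), so the rest of
# the code can treat both uniformly.

def parse_user_ref(ref, local_instance):
    name, sep, instance = ref.partition("@")
    return {"name": name,
            "instance": instance if sep else local_instance,
            "local": not sep or instance == local_instance}

print(parse_user_ref("alice", "s.example.org"))
# {'name': 'alice', 'instance': 's.example.org', 'local': True}
print(parse_user_ref("bob@t.example.org", "s.example.org"))
# {'name': 'bob', 'instance': 't.example.org', 'local': False}
```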
## Cases to Study

- Gitolite
- Darcsden
- Gogs
- Kallithea
## Implementation Plan

- Storage backend which stores all the meta info (users, permissions, etc.)
- Repo/file for configuration by the server admin
- Shell for SSH access, allowing commands according to permissions
- Library which abstracts VCS access - [[/projects/repository]]
- Library which implements all the data manipulations
- Web API which wraps it
- Some kind of UI - either web-based or desktop GUI, not critical right now
## User Stories

### Storage Sharing

Using:

- Alice has a repository R on instance S (denoted S.R)
- Alice makes a local commit on her computer, using her local copy
- Alice pushes the commit to the remote branch on instance S
- Repository R has two read-only backup mirrors, on instances T and U
- Either immediately after the push, or periodically, S pushes the changes in
  S.R to the backup repositories T.R and U.R, which aren't necessarily visible
  publicly (maybe through some admin UI or backup mirror UI, not the main UI
  anyway)

Choosing:

- Alice creates a new repository R on instance S
- S chooses, based on stats and the list of participating instances, two backup
  instances for R - instances T and U
- Periodically, and/or when T or U have uptime or responsiveness problems, or
  they announce expected downtime, S may choose new backup instances, so that
  there are always two of them. When one or both are down, this period of
  lacking a backup is minimized by detecting this and choosing new instances.
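The "choosing" story above could be sketched as a periodic maintenance step.
The health check and the candidate list here are hypothetical stand-ins for
whatever mechanism is actually used:

```python
import random

# Sketch of keeping exactly two healthy backup instances per repo.
# `is_healthy` and the candidate list are hypothetical stand-ins.

def maintain_backups(current, candidates, is_healthy, wanted=2):
    """Return an updated backup list: drop unhealthy instances and
    top up to `wanted` from healthy candidates not already used."""
    backups = [b for b in current if is_healthy(b)]
    pool = [c for c in candidates if is_healthy(c) and c not in backups]
    while len(backups) < wanted and pool:
        backups.append(pool.pop(random.randrange(len(pool))))
    return backups

# Example: instance T went down, so a replacement is chosen.
healthy = {"t.example.org": False, "u.example.org": True,
           "v.example.org": True, "w.example.org": True}
new = maintain_backups(["t.example.org", "u.example.org"],
                       list(healthy), healthy.get, wanted=2)
assert "u.example.org" in new and "t.example.org" not in new
```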
# Distributed Search

Today, the ways to find projects and be found are:

- Have your project hosted in a proprietary centralized system like g1thu8
- Have your project hosted in a free software system like Gitorious or Savannah
- Run your own hosting platform with Gitorious, Kallithea, Gogs, Gitolite, etc.
- Use a proprietary user-tracking centralized search engine, like g00gle
- Use a distributed free software search engine, like Yacy

Getting all the proprietary-ness, centralization and greed out of the picture,
possible ideas are:

- Make hosting platforms federate, i.e. each one has integrated project search
  and integration for merge requests, usernames, bugs, wiki pages etc. across
  instances over the network
- Make distributed web search support semantic search, and have instances
  communicate like social network nodes (e.g. Diaspora\* pods), which probably
  doesn't need a DHT
- Have a shared vocabulary and API for dev platforms to provide info, use info,
  declare features and subsystems, etc.
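To show the flavor of the shared-vocabulary idea, an instance might publish
project records in some agreed form. This shape is entirely made up for
illustration; no such schema is defined yet:

```python
# Entirely hypothetical example of a shared-vocabulary record that an
# instance could publish for cross-instance search and integration.
project_record = {
    "type": "project",
    "name": "widget",
    "instance": "s.example.org",
    "description": "A widget library",
    "topics": ["library", "widgets"],
    "vcs": {"kind": "git",
            "clone_url": "https://s.example.org/widget.git"},
}
```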
# Storage Distribution

Should there be a clear concept of "hosting provider" in the traditional sense?
In other words, should storage be completely distributed (i.e. a project
doesn't belong to any specific hosting server), or should each project have a
"home base"?

From the user's point of view, it doesn't matter much in the technical sense.
If a user can see project details using any instance, the actual storage
location doesn't matter. But, e.g. unlike with Tahoe-LAFS, the information is
public, so the LAFS concept and its overhead aren't needed here. It may not be
critical to have a single specific upstream instance visible to the user, but
in any case there should be several backup instances for each project.

What about regular version control system usage? Having no upstream instance
means that things like `git clone` will be slower. The server will have to
figure out the physical upstream, clone from it and stream the data to the
requesting user's client. Instead, if people *get the URL* of the physical
upstream, they can clone it like always.

Problem: What happens if the physical upstream is down? How can people still
work with the repo? Should they be able to push too, or just pull and clone?

If a local GUI app handles detecting a functioning mirror, then no visible
physical upstream is needed in the first place. Pulling only is trivial: just
detect the mirror and pull. Pushing means that once the upstream is down, one
of the backups detects this and becomes the new upstream (there's a shared
known protocol for choosing which backup it is). From that point on, users can
work with it transparently.
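That shared known protocol isn't defined yet. One trivially deterministic
option, shown only as an illustration, is for every node to pick the
lexicographically smallest healthy backup, assuming all nodes see the same
backup list and health information:

```python
# Illustration only: a deterministic rule that every node can apply
# independently and still agree on the same new upstream, given a
# shared view of the backup list and instance health.

def choose_new_upstream(backups, is_healthy):
    healthy = sorted(b for b in backups if is_healthy(b))
    return healthy[0] if healthy else None

backups = ["u.example.org", "t.example.org", "v.example.org"]
down = {"t.example.org"}
print(choose_new_upstream(backups, lambda b: b not in down))
# u.example.org
```

A real protocol would also need to handle disagreeing health views, but the
point is that the choice must be reproducible by every participant.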
Question: Should I work on all these ideas, or should I focus on a *minimal*
addition to the existing centralized model?

Answer: Start minimal. In particular, it means storage sharing isn't needed.
# Tasks

- Is Yacy good for this? Can it do semantic search? How good is it today at
  finding independently hosted software projects?
- Understand DHTs, examine existing ones
- Write some basic ontology for project info
- How do GNU Social, Pump.io and Diaspora\* instances federate?
- Given semantic search, is a DHT still a good option for collaboration
  features? If yes, it may be better to use it for search too than to rely on
  general-purpose search
- Is a general-purpose quadstore DHT possible? Does it make sense? Maybe it can
  be a meta-store which just says "who knows what" or "who's online", and
  querying can be done using direct connections to the found nodes?
- Maybe it's much easier and more reusable to use an existing DHT-based system!
  For example, GNUnet and I2P are general-purpose (e.g. see how file sharing
  and instant messaging work with them), and maybe Freenet is relevant too.
  Also cjdns. Make a list of candidates to examine.
# Features and Ideas

- Allow instances to communicate with other instances, for queries and commands
- DHT for distributed access where needed
- Fork projects from other instances
- Send merge requests to projects in other instances
- Report bugs to projects in other instances
- Use the GPG WoT for trust between users and/or between instances
- In the future, try to use distributed storage for repos, or at least for
  backup. For now, just let projects be hosted on specific hosts
- Global semantic project search
- Transparent federation: the UI doesn't make you handle the local/remote
  difference. It gets abstracted away, like Diaspora\* and GNU Social
  transparently connect you to people from other instances
- Global uniform username space. Maybe avoid using hosts in the name, so moving
  between hosts is easy.
- Easy 1-click move of projects between instances, without breaking anything
# Random

A development platform may consist of many components, each providing a
solution for a particular need. Afifon should not rely on any specific
combination, because the common choice of components is arbitrary. What Afifon
can and should rely on is the *theory* behind the combination of components.

Under application [[/projects]] and under the [[/projects/Kiwi]] ontologies, I
have been working on general-purpose models for wikis, issue tracking,
discussions and much more. There is still a lot of work to do on these models
and on the deployment aspects of solutions based on them. At least for now,
Afifon won't rely on these models, because it may take a lot of time for them
to be ready.

The initial Afifon will just use existing common practices. Examples of common
components are: version-controlled repository, wiki, issue tracker, mailing
list, forum, generated manual and API reference.

For the very beginning, only core features will be supported. No integrated
issue tracking, no wiki, no discussions. Just the version control and code
repository aspects.

[[This|http://tsyesika.co.uk/u/federated-code-issue-hosting.html]] is a plan by
Jessica Tallon, which at the time of writing is concerned with the federation
messages. Afifon should also support things like DHT, distributed DNS, storage
sharing, p2p, routing and more. GNUnet may be a good idea here.

Collaborating on that project may be an awesome idea, especially on the
information model. Also, the JSON snippets. While [[/projects/Idan]] may be
more readable and more writable for humans, it's much too complicated for the
basic simple needs of machine communication. Machines don't need convenience
features like references and generators. So I suppose that with some
inspiration from [[!wikipedia JSON-LD]], I could define a mapping between JSON
and [[/projects/Smaoin]]. Anyway, we'll see soon.