This is eventfd-based synchronization, or 'esync' for short. Turn it on with
WINEESYNC=1; debug it with +esync.

== BUGS AND LIMITATIONS ==

Please let me know if you find any bugs. If you can, also attach a log with
+seh,+pid,+esync,+server,+timestamp.

If you get something like "eventfd: Too many open files" and then things start
crashing, you've probably run out of file descriptors. esync creates one
eventfd descriptor for each synchronization object, and some games may use a
large number of these. Linux by default limits a process to 4096 file
descriptors, which probably was reasonable back in the nineties but isn't
really anymore. (Fortunately Debian and derivatives [Ubuntu, Mint] already
have a reasonable limit.) To raise the limit you'll want to edit
/etc/security/limits.conf and add a line like

* hard nofile 1048576

then restart your session.

On distributions using systemd, the settings in `/etc/security/limits.conf`
will be overridden by systemd's own settings. If you run `ulimit -Hn` and it
returns a lower number than the one you've previously set, then you can set

DefaultLimitNOFILE=1048576

in both `/etc/systemd/system.conf` and `/etc/systemd/user.conf`. You can then
execute `sudo systemctl daemon-reexec` and restart your session. Check again
with `ulimit -Hn` that the limit is correct.

Also note that if the wineserver has esync active, all clients also must, and
vice versa. Otherwise things will probably crash quite badly.

== EXPLANATION ==

The aim is to execute all synchronization operations in "user-space", that is,
without going through wineserver. We do this using Linux's eventfd
facility. The main impetus to using eventfd is so that we can poll multiple
objects at once; in particular we can't do this with futexes, or pthread
semaphores, or the like. The only way I know of to wait on any of multiple
objects is to use select/poll/epoll to wait on multiple fds, and eventfd gives
us those fds in a quite usable way.
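
To make this concrete, here is a minimal, self-contained sketch (illustrative
only, not code from the patchset) of waiting on whichever of two eventfd
objects becomes signaled first:

    /* Illustrative only: wait on either of two eventfd objects with poll(). */
    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int fds[2] = { eventfd(0, 0), eventfd(0, 0) };
        uint64_t value = 1;

        write(fds[1], &value, sizeof(value));   /* "signal" the second object */

        struct pollfd pollfds[2] = {
            { .fd = fds[0], .events = POLLIN },
            { .fd = fds[1], .events = POLLIN },
        };
        poll(pollfds, 2, -1);                   /* wakes as soon as any fd is readable */

        for (int i = 0; i < 2; i++)
            if (pollfds[i].revents & POLLIN)
                printf("object %d is signaled\n", i);
        return 0;
    }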

Whenever a semaphore, event, or mutex is created, we have the server create an
'esync' primitive instead of a traditional server-side event/semaphore/mutex.
These live in esync.c and are very slim objects; in fact, they don't even know
what type of primitive they are. The server is involved at all only because we
still need a way of creating named objects, passing handles to another
process, etc.

The server creates an eventfd file descriptor with the requested parameters
and passes it back to ntdll. ntdll creates an object of the appropriate type,
then caches it in a table. This table is copied almost wholesale from the fd
cache code in server.c.
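
The cached client-side object is conceptually something like the following
hypothetical sketch (the names and the actual declarations in esync.c differ):

    /* Hypothetical sketch of a cached client-side entry; the real structures,
     * and the cache itself (copied from the server.c fd cache), differ. */
    enum esync_type
    {
        ESYNC_SEMAPHORE,
        ESYNC_AUTO_EVENT,
        ESYNC_MANUAL_EVENT,
        ESYNC_MUTEX,
    };

    struct esync_entry
    {
        enum esync_type type;   /* which kind of primitive this handle refers to */
        int             fd;     /* eventfd descriptor received from the server */
    };

    /* filled in lazily, indexed by handle, so repeat waits skip the server */
    static struct esync_entry cache[65536];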

Specific operations follow quite straightforwardly from eventfd (a rough
sketch follows this list):

* To release an object, or set an event, we simply write() to it.

* An object is signalled if read() succeeds on it. Notably, we create all
  eventfd descriptors with O_NONBLOCK, so that we can atomically check if an
  object is signalled and grab it if it is. This also lets us reset events.

* For objects whose state should not be reset upon waiting--e.g. manual-reset
  events--we simply check for the POLLIN flag instead of reading.

* Semaphores are handled by the EFD_SEMAPHORE flag. This matches up quite well
  (although with some difficulties; see below).

* Mutexes store their owner thread locally. This isn't reliable information if
  a different process's thread owns the mutex, but this doesn't matter--a
  thread should only care whether it owns the mutex, so it knows whether to
  try waiting on it or simply to increase the recursion count.
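
As a rough sketch of the operations above (illustrative only; the real
implementations in esync.c deal with shared state, more object types, and
error handling):

    #include <sys/eventfd.h>
    #include <poll.h>
    #include <unistd.h>
    #include <stdint.h>

    static void do_set_event( int fd )      /* release an object / set an event */
    {
        uint64_t value = 1;
        write( fd, &value, sizeof(value) );
    }

    static int do_try_grab( int fd )        /* auto-reset: check and consume atomically */
    {
        uint64_t value;
        /* the fd was created nonblocking (EFD_NONBLOCK), so this either
         * consumes the signal or fails immediately with EAGAIN */
        return read( fd, &value, sizeof(value) ) == sizeof(value);
    }

    static int do_check_signaled( int fd )  /* manual-reset: look, but don't consume */
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        return poll( &pfd, 1, 0 ) > 0 && (pfd.revents & POLLIN);
    }

    static int do_create_semaphore( unsigned int initial )
    {
        /* EFD_SEMAPHORE makes each read() decrement the count by exactly one */
        return eventfd( initial, EFD_SEMAPHORE | EFD_NONBLOCK );
    }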

The interesting part about esync is that (almost) all waits happen in ntdll,
including those on server-bound objects. The idea here is that on the server
side, for any waitable object, we create an eventfd file descriptor (not an
esync primitive), and then pass it to ntdll if the program tries to wait on
it. These are cached too, so only the first wait will require a round trip to
the server. Then the server signals the file descriptor as appropriate, and
thereby wakes up the client. So far this is implemented for processes,
threads, message queues (difficult; see below), and device managers (necessary
for drivers to work). All of these are necessarily server-bound, so we
wouldn't really gain anything by signalling on the client side instead. Of
course, except possibly for message queues, it's not likely that any program
(cutting-edge D3D game or not) is going to be causing a great wineserver load
by waiting on any of these objects; the motivation was rather to provide a way
to wait on ntdll-bound and server-bound objects at the same time.
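
The caching of those server-created descriptors can be pictured like this
(hypothetical names and a stand-in for the server request; the real ntdll
code shares infrastructure with the regular fd cache):

    #include <sys/eventfd.h>

    /* stand-in for the real server request; assumed for this sketch */
    static int server_get_esync_fd( int handle )
    {
        (void)handle;
        return eventfd( 0, EFD_NONBLOCK );
    }

    /* only the first wait on a server-bound handle pays for a round trip;
     * later waits reuse the cached descriptor */
    static int waitable_fd_cache[65536];        /* indexed by handle, 0 = empty */

    static int get_waitable_fd( int handle )
    {
        int fd = waitable_fd_cache[handle];
        if (!fd)
        {
            fd = server_get_esync_fd( handle );
            waitable_fd_cache[handle] = fd;
        }
        return fd;
    }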

Some cases are still passed to the server, and there's probably no reason not
to keep them that way. Those that I noticed while testing include: async
objects, which are internal to the file APIs and never exposed to userspace,
startup_info objects, which are internal to the loader and signalled when a
process starts, and keyed events, which are exposed through an ntdll API
(although not through kernel32) but can't be mixed with other objects (you
have to use NtWaitForKeyedEvent()). Other cases include: named pipes, debug
events, sockets, and timers. It's unlikely we'll want to optimize debug events
or sockets (or any of the other, rather rare, objects), but it is possible
we'll want to optimize named pipes or timers.

There were two sorts of complications when working out the above. The first
one was events. The trouble is that (1) the server actually creates some
events by itself and (2) the server sometimes manipulates events passed by the
client. Resolving the first case was easy enough, and merely entailed creating
eventfd descriptors for the events the same way as for processes and threads
(note that we don't really lose anything this way; the events include
"LowMemoryCondition" and the event that signals system processes to shut
down). For the second case I basically had to hook the server-side event
functions to redirect to esync versions if the event was actually an esync
primitive.

The second complication was message queues. The difficulty here is that X11
signals events by writing into a pipe (at least I think it's a pipe?), and so
as a result wineserver has to poll on that descriptor. In theory we could just
let wineserver do so and then signal us as appropriate, except that wineserver
only polls on the pipe when the thread is waiting for events (otherwise we'd
get e.g. keyboard input while the thread is doing something else, and spin
forever trying to wake up a thread that doesn't care). The obvious solution is
just to poll on that fd ourselves, and that's what I did--it's just that
getting the fd from wineserver was kind of ugly, and the code for waiting was
also kind of ugly, basically because we have to wait on both X11's fd and the
"normal" process/thread-style wineserver fd that we use to signal sent
messages. The upshot of the whole thing is that races are basically
impossible, since a thread can only wait on its own queue.

System APCs already work, since the server will forcibly suspend a thread if
it's not already waiting, and so we just need to check for EINTR from
poll(). User APCs and alertable waits are implemented in a similar style to
message queues (well, sort of): whenever someone executes an alertable wait,
we add an additional eventfd to the list, which the server signals when an APC
arrives. If that eventfd gets signaled, we hand it off to the server to take
care of, and return STATUS_USER_APC.
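
Schematically, an alertable wait just grows the poll set by one entry (a
hypothetical sketch, not the actual wait loop; names and status values are
spelled out to keep it self-contained):

    #include <poll.h>
    #include <errno.h>

    #define STATUS_SUCCESS  0x00000000
    #define STATUS_USER_APC 0x000000C0

    /* assumes count < 64; the per-thread apc_fd is signaled by the server
     * whenever a user APC is queued */
    static unsigned int wait_objects( const int *fds, int count, int alertable, int apc_fd )
    {
        struct pollfd pollfds[64];
        int total = count;

        for (int i = 0; i < count; i++)
            pollfds[i] = (struct pollfd){ .fd = fds[i], .events = POLLIN };

        if (alertable)
            pollfds[total++] = (struct pollfd){ .fd = apc_fd, .events = POLLIN };

        while (poll( pollfds, total, -1 ) < 0)
        {
            /* EINTR means the server suspended us to run a system APC */
            if (errno != EINTR) break;
        }

        if (alertable && (pollfds[count].revents & POLLIN))
            return STATUS_USER_APC;     /* hand the APC back to the server */

        /* ...otherwise work out which object woke us up and try to grab it... */
        return STATUS_SUCCESS;
    }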

Originally I kept the volatile state of semaphores and mutexes inside a
variable local to the handle, with the knowledge that this would break if
someone tried to open the handle elsewhere or duplicate it. It did, and so now
this state is stored inside shared memory. This is of the POSIX variety; it is
allocated by the server (but never mapped there) and lives under the path
"/wine-esync".

There are a couple things that this infrastructure can't handle, although
surprisingly there aren't that many. In particular:

* Implementing wait-all, i.e. WaitForMultipleObjects(..., TRUE, ...), is not
  exactly possible the way we'd like it to be possible. In theory that
  function should wait until it knows all objects are available, then grab
  them all at once atomically. The server (like the kernel) can do this
  because the server is single-threaded and can't race with itself. We can't
  do this in ntdll, though. The approach I've taken I've laid out in great
  detail in the relevant patch, but for a quick summary we poll on each object
  until it's signaled (but don't grab it), check them all again, and if
  they're all signaled we try to grab them all at once in a tight loop, and if
  we fail on any of them we reset the count on whatever we shouldn't have
  consumed. Such a blip would necessarily be very quick. (A rough sketch of
  this step follows the list.)

* The whole patchset only works on Linux, where eventfd is available. However,
  it should be possible to make it work on a Mac, since eventfd is just a
  quicker, easier way to use pipes (i.e. instead of writing 1 to the fd you'd
  write 1 byte; instead of reading a 64-bit value from the fd you'd read as
  many bytes as you can carry, which is admittedly less than 2**64 but can
  probably be something reasonable.) It's also possible, although I haven't
  yet looked, to use some different kind of synchronization primitives, but
  pipes would be easiest to tack onto this framework.

* PulseEvent() can't work the way it's supposed to work. Fortunately it's rare
  and deprecated. It's also explicitly mentioned on MSDN that a thread can
  miss the notification if it happens to be executing a kernel APC, so in a
  sense we're not necessarily doing anything wrong.
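
A much-simplified sketch of the wait-all "grab them all at once" step
described in the first item above (illustrative only; the real logic lives in
the wait-all patch and also handles mutexes, semaphore counts, and timeouts):

    #include <sys/eventfd.h>
    #include <unistd.h>
    #include <stdint.h>

    /* The caller first polls until every fd reports POLLIN, then calls this
     * in a tight loop.  All fds are nonblocking.  Assumes count < 64. */
    static int try_grab_all( const int *fds, int count )
    {
        uint64_t consumed[64];
        int grabbed;

        for (grabbed = 0; grabbed < count; grabbed++)
        {
            /* either consume the signal or fail immediately */
            if (read( fds[grabbed], &consumed[grabbed], sizeof(uint64_t) ) != sizeof(uint64_t))
                break;
        }
        if (grabbed == count) return 1;         /* we now own all the objects */

        /* undo: put back whatever we shouldn't have consumed */
        while (grabbed--)
            write( fds[grabbed], &consumed[grabbed], sizeof(uint64_t) );
        return 0;
    }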

There are some things that are perfectly implementable but that I just haven't
done yet:

* Other synchronizable server primitives. It's unlikely we'll need any of
  these, except perhaps named pipes (which would honestly be rather difficult)
  and (maybe) timers.

* Access masks. We'd need to store these inside ntdll, and validate them when
  someone tries to execute esync operations.

This patchset was inspired by Daniel Santos' "hybrid synchronization"
patchset. My idea was to create a framework whereby even contended waits could
be executed in userspace, eliminating a lot of the complexity that his
synchronization primitives used. I do however owe some significant gratitude
toward him for setting me on the right path.

I've tried to maximize code separation, both to make any potential rebases
easier and to ensure that esync is only active when configured. All code in
existing source files is guarded with "if (do_esync())", and generally that
condition is followed by "return esync_version_of_this_method(...);", where
the latter lives in esync.c and is declared in esync.h. I've also tried to
make the patchset very clear and readable--to write it as if I were going to
submit it upstream. (Some intermediate patches do break things, which Wine is
generally against, but I think it's for the better in this case.) I have cut
some corners, though; there is some error checking missing, or implicit
assumptions that the program is behaving correctly.

I've tried to be careful about races. There are a lot of comments whose
purpose is basically to assure me that races are impossible. In most cases we
don't have to worry about races since all of the low-level synchronization is
done by the kernel.

Anyway, yeah, this is esync. Use it if you like.

--Zebediah Figura