  1. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  2. <html>
  3. <head>
  4. <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-15"/>
  5. <title>Ogg Documentation</title>
  6. <style type="text/css">
  7. body {
  8. margin: 0 18px 0 18px;
  9. padding-bottom: 30px;
  10. font-family: Verdana, Arial, Helvetica, sans-serif;
  11. color: #333333;
  12. font-size: .8em;
  13. }
  14. a {
  15. color: #3366cc;
  16. }
  17. img {
  18. border: 0;
  19. }
  20. #xiphlogo {
  21. margin: 30px 0 16px 0;
  22. }
  23. #content p {
  24. line-height: 1.4;
  25. }
  26. h1, h1 a, h2, h2 a, h3, h3 a {
  27. font-weight: bold;
  28. color: #ff9900;
  29. margin: 1.3em 0 8px 0;
  30. }
  31. h1 {
  32. font-size: 1.3em;
  33. }
  34. h2 {
  35. font-size: 1.2em;
  36. }
  37. h3 {
  38. font-size: 1.1em;
  39. }
  40. li {
  41. line-height: 1.4;
  42. }
  43. #copyright {
  44. margin-top: 30px;
  45. line-height: 1.5em;
  46. text-align: center;
  47. font-size: .8em;
  48. color: #888888;
  49. clear: both;
  50. }
  51. </style>
  52. </head>
  53. <body>
  54. <div id="xiphlogo">
  55. <a href="http://www.xiph.org/"><img src="fish_xiph_org.png" alt="Fish Logo and Xiph.org"/></a>
  56. </div>
  57. <h1>Ogg bitstream overview</h1>
  58. This document serves as a starting point for understanding the design
  59. and implementation of the Ogg container format. If you're new to Ogg
  60. or merely want a high-level technical overview, start reading here.
  61. Other documents linked from the <a href="index.html">index page</a>
  62. give distilled technical descriptions and references of the container
  63. mechanisms. This document is intended to aid understanding.
  64. <h2>Container format design points</h2>
  65. <p>Ogg is intended to be a simplest-possible container, concerned only
  66. with framing, ordering, and interleave. It can be used as a stream delivery
  67. mechanism, for media file storage, or as a building block toward
  68. implementing a more complex, non-linear container (for example, see
  69. the <a href="skeleton.html">Skeleton</a> or <a
  70. href="http://en.wikipedia.org/wiki/Annodex">Annodex/CMML</a>).
  71. <p>The Ogg container is not intended to be a monolithic
  72. 'kitchen-sink'. It exists only to frame and deliver in-order stream
  73. data and as such is vastly simpler than most other containers.
  74. Elementary and multiplexed streams are both constructed entirely from a
  75. single building block (an Ogg page) comprised of eight fields
  76. totalling twenty-seven bytes (the page header), a list of packet lengths
  77. (up to 255 bytes), and payload data (up to 65025 bytes). The structure
  78. of every page is the same. There are no optional fields or alternate
  79. encodings.
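<p>As a rough illustration of how small the page structure is, the sketch below unpacks the fixed header fields with Python's <tt>struct</tt> module. The field order and widths follow the framing specification; the names <tt>PAGE_HEADER</tt>, <tt>MAX_PAGE_SIZE</tt> and <tt>parse_page_header</tt> are invented for this example and are not part of any Ogg library.</p>

```python
import struct

# Fixed part of the page header: eight little-endian fields, 27 bytes total.
# capture pattern, version, header type flags, granule position,
# stream serial number, page sequence number, checksum, length-table size
PAGE_HEADER = struct.Struct("<4sBBqIIIB")

# Largest possible page: fixed header + 255 length bytes + 255*255 payload bytes.
MAX_PAGE_SIZE = PAGE_HEADER.size + 255 + 255 * 255


def parse_page_header(data):
    """Decode the header of the page that starts at data[0] (hypothetical helper)."""
    (capture, version, flags, granulepos,
     serial, sequence, checksum, nsegs) = PAGE_HEADER.unpack_from(data)
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    lengths = data[PAGE_HEADER.size:PAGE_HEADER.size + nsegs]
    if len(lengths) < nsegs:
        raise ValueError("truncated page header")
    payload_size = sum(lengths)
    return {
        "flags": flags,
        "granulepos": granulepos,
        "serial": serial,
        "sequence": sequence,
        "checksum": checksum,
        "payload_size": payload_size,
        "page_size": PAGE_HEADER.size + nsegs + payload_size,
    }
```

<p>Because every page has this same layout, a parser of this shape needs no branches for optional fields.</p>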
  80. <p>Stream and media metadata is contained in Ogg and not built into
  81. the Ogg container itself. Metadata is thus compartmentalized and
  82. layered rather than part of a monolithic design, an especially good
  83. idea as no two groups seem able to agree on what a complete or
  84. complete-enough metadata set should be. In this way, the container and
  85. container implementation are isolated from unnecessary design flux.
  86. <h3>Streaming</h3>
  87. <p>The Ogg container is primarily a streaming format,
  88. encapsulating chronological, time-linear mixed media into a single
  89. delivery stream or file. The design is such that an application can
  90. always encode and/or decode all features of a bitstream in one pass
  91. with no seeking and minimal buffering. Seeking to provide optimized
  92. encoding (such as two-pass encoding) or interactive decoding (such as
  93. scrubbing or instant replay) is not disallowed or discouraged; however,
  94. no container feature requires nonlinear access of the bitstream.
  95. <h3>Variable Bit Rate, Variable Payload Size</h3>
  96. <p>Ogg is designed to contain any size data payload with bounded,
  97. predictable efficiency. Ogg packets have no maximum size and a
  98. zero-byte minimum size. There is no restriction on size changes from
  99. packet to packet. Variable size packets do not require the use of any
  100. optional or additional container features. There is no optimal
  101. suggested packet size, though special consideration was paid to make
  102. sure 50-200 byte packets were no less efficient than larger packet
  103. sizes. The original design criterion was a 2% overhead at 50 byte
  104. packets, dropping to a maximum working overhead of 1% with larger
  105. packets, and a typical working overhead of .5-.7% for most practical
  106. uses.
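<p>The overhead figures can be approximated with a little arithmetic. The sketch below assumes the header and length-table layout from the framing specification (27 fixed header bytes per page, one length-table byte per 255 bytes of packet data, at most 255 table entries per page) and further assumes that every page is packed completely full, which real muxers often do not do. The function names are hypothetical.</p>

```python
def segments_for(packet_len):
    """Length-table entries needed for one packet (one entry per 255 bytes)."""
    return packet_len // 255 + 1


def overhead(packet_len, header_bytes=27, entries_per_page=255):
    """Approximate framing overhead as a fraction of payload size.

    Assumes a long stream of equal-size packets and pages that always
    use all 255 length-table entries (an idealization).
    """
    if packet_len <= 0:
        raise ValueError("packet_len must be positive")
    entries = segments_for(packet_len)
    # Each page header is shared by the 255 table entries on that page.
    header_share = header_bytes * entries / entries_per_page
    return (entries + header_share) / packet_len


if __name__ == "__main__":
    for size in (50, 200, 1000, 4096):
        print(f"{size:5d}-byte packets: {overhead(size):.2%} overhead")
```

<p>Under these assumptions the estimate lands slightly above 2% for 50 byte packets and well under 1% from 200 bytes upward; pages flushed early would raise the real figure.</p>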
  107. <h3>Simple pagination</h3>
  108. <p>Ogg is a byte-aligned container with no context-dependent, optional
  109. or variable-length fields. Ogg requires no repacking of codec data.
  110. The page structure is written out in-line as packet data is submitted
  111. to the streaming abstraction. In addition, it is possible to
  112. implement both Ogg mux and demux as MT-hot zero-copy abstractions (as
  113. is done in the Tremor sourcebase).
  114. <h3>Capture</h3>
  115. <p>Ogg is designed for efficient and immediate stream capture with
  116. high confidence. Although packets have no size limit in Ogg, pages
  117. are a maximum of just under 64kB, meaning that any Ogg stream can be
  118. captured with confidence after seeing 128kB of data or less [worst
  119. case; typical figure is 6kB] from any random starting point in the
  120. stream.
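<p>A minimal sketch of capture, assuming the page layout and checksum described in the framing specification (a 32 bit CRC with generator polynomial 0x04c11db7, zero initial value and no final XOR, computed over the page with the checksum field zeroed). It scans for the capture pattern and accepts a page only when the checksum verifies. The helper names are hypothetical, and a real decoder would read incrementally rather than from one in-memory buffer.</p>

```python
import struct


def _make_crc_table():
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


CRC_TABLE = _make_crc_table()


def ogg_crc(data):
    """Unreflected CRC-32 with zero initial value and no final XOR."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ CRC_TABLE[(crc >> 24) ^ byte]
    return crc


def _verified_page_len(buf, pos):
    """Length of a checksum-verified page at pos, or None."""
    if len(buf) < pos + 27:
        return None
    nsegs = buf[pos + 26]
    table_end = pos + 27 + nsegs
    if len(buf) < table_end:
        return None
    end = table_end + sum(buf[pos + 27:table_end])
    if len(buf) < end:
        return None
    page = bytearray(buf[pos:end])
    stored = struct.unpack_from("<I", page, 22)[0]
    page[22:26] = b"\x00\x00\x00\x00"  # checksum is computed with this field zeroed
    return len(page) if ogg_crc(page) == stored else None


def capture(buf, start=0):
    """Return (offset, length) of the first verified page at or after start."""
    pos = buf.find(b"OggS", start)
    while pos != -1:
        length = _verified_page_len(buf, pos)
        if length is not None:
            return pos, length
        pos = buf.find(b"OggS", pos + 1)  # false alarm, keep scanning
    return None
```

<p>The checksum is what turns a four-byte pattern match into a high-confidence capture: payload bytes that merely happen to spell the capture pattern are rejected.</p>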
  121. <h3>Seeking</h3>
  122. <p>Ogg implements simple coarse- and fine-grained seeking by design.
  123. <p>Coarse seeking may be performed by simply 'moving the tone arm' to a
  124. new position and 'dropping the needle'. Rapid capture with
  125. accompanying timecode from any location in an Ogg file is guaranteed
  126. by the stream design. From the acquisition of the first timecode,
  127. all data needed to play back from that time code forward is ahead of
  128. the stream cursor.
  129. <p>Ogg implements full sample-granularity seeking using an
  130. interpolated bisection search built on the capture and timecode
  131. mechanisms used by coarse seeking. As above, once a search finds
  132. the desired timecode, all data needed to play back from that time code
  133. forward is ahead of the stream cursor.
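<p>The search can be sketched as follows. This is a simplified model, not the reference implementation: pages are represented as <tt>(offset, length, granulepos)</tt> tuples held in memory, <tt>make_probe</tt> stands in for the capture mechanism, granule positions are assumed to increase from page to page, and the fallback to plain bisection after an empty probe is a choice made for this example. All names are hypothetical.</p>

```python
import bisect


def make_probe(pages):
    """Stand-in for capture: the first page starting at or after an offset."""
    offsets = [offset for offset, _, _ in pages]

    def probe(byte_offset):
        i = bisect.bisect_left(offsets, byte_offset)
        return pages[i] if i < len(pages) else None

    return probe


def seek(probe, stream_len, end_granule, target):
    """Find the first page whose granule position is >= target.

    Returns (page, number_of_probes); page is None if target is past the end.
    """
    lo, hi = 0, stream_len          # byte range that may still hold the answer
    lo_g, hi_g = 0, end_granule     # granule positions known at those bounds
    best, probes, use_midpoint = None, 0, False
    while lo < hi:
        if use_midpoint or hi_g <= lo_g:
            guess = (lo + hi) // 2
        else:
            # Interpolate: assume granule position grows linearly with offset.
            frac = min(max((target - lo_g) / (hi_g - lo_g), 0.0), 1.0)
            guess = min(lo + int((hi - lo) * frac), hi - 1)
        page = probe(guess)
        probes += 1
        use_midpoint = False
        if page is None or page[0] >= hi:
            hi = guess              # no page starts in [guess, hi)
            use_midpoint = True
        elif page[2] >= target:
            best, hi, hi_g = page, guess, page[2]
        else:
            lo, lo_g = page[0] + page[1], page[2]
    return best, probes
```

<p>Each probe is one capture from an arbitrary byte offset, which is why this style of seeking needs no index.</p>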
  134. <p>Both coarse and fine seeking use the page structure and sequencing
  135. inherent to the Ogg format. All Ogg streams are fully seekable from
  136. creation; seekability is unaffected by truncation or missing data, and
  137. is tolerant of gross corruption. Seek operations are neither 'fuzzy' nor
  138. heuristic.
  139. <p>Seeking without use of an index is a major point of the Ogg
  140. design. There are several reasons why Ogg forgoes an index:
  141. <ul>
  142. <li>It must be possible to create an Ogg stream in a single pass, and
  143. an index requires either two passes to create, or the index must be
  144. tacked onto the end of a live stream after the stream is finished.
  145. Both methods run afoul of other design constraints.
  146. <li>An index is only marginally useful in Ogg for the complexity
  147. added; it adds no new functionality and seldom improves performance
  148. noticeably. Empirical testing shows that indexless interpolation
  149. search does not require many more seeks in practice than using an
  150. index would.
  151. <li>'Optional' indexes encourage lazy implementations that can seek
  152. only when indexes are present, or that implement indexless seeking
  153. only by building an internal index after reading the entire file
  154. beginning to end. This has been the fate of other containers that
  155. specify optional indexing.
  156. </ul>
  157. <h3>Simple multiplexing</h3>
  158. <p>Ogg multiplexes streams by interleaving pages from multiple elementary streams into a
  159. multiplexed stream in time order. The multiplexed pages are not
  160. altered. Muxing an Ogg AV stream out of separate audio,
  161. video and data streams is akin to shuffling several decks of cards
  162. together into a single deck; the cards themselves remain unchanged.
  163. Demultiplexing is similarly simple (as the cards are marked).
  164. <p>The goal of this design is to make the mux/demux operation as
  165. trivial as possible to allow live streaming systems to build and
  166. rebuild streams on the fly with minimal CPU usage and no additional
  167. storage or latency requirements.
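<p>The card-shuffling analogy maps directly onto a sorted merge. The sketch below assumes each page already carries a timestamp that the codec derived from its granule position; the <tt>Page</tt> tuple and <tt>mux</tt> function are hypothetical, and header-page ordering rules are ignored here.</p>

```python
import heapq
from typing import NamedTuple


class Page(NamedTuple):
    time: float    # timestamp the codec derived from the granule position
    serial: int    # stream serial number
    data: bytes    # the framed page bytes, never modified by the mux


def mux(*streams):
    """Interleave already-ordered elementary streams into one stream by time."""
    return list(heapq.merge(*streams, key=lambda page: page.time))
```

<p>The pages pass through untouched, and filtering the merged list by serial number yields each input stream again in its original order.</p>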
  168. <h3>Continuous and Discontinuous Media</h3>
  169. <p>Ogg streams belong to one of two categories, "Continuous" streams and
  170. "Discontinuous" streams.
  171. <p>A stream that provides a gapless, time-continuous media type with a
  172. fine-grained timebase is considered to be 'Continuous'. A continuous
  173. stream should never be starved of data. Examples of continuous data
  174. types include broadcast audio and video.
  175. <p>A stream that delivers data in a potentially irregular pattern or
  176. with widely spaced timing gaps is considered to be 'Discontinuous'. A
  177. discontinuous stream may be best thought of as data representing
  178. scattered events; although they happen in order, they are typically
  179. unconnected data often located far apart. One example of a
  180. discontinuous stream type would be captioning such as <a
  181. href="http://wiki.xiph.org/OggKate">Ogg Kate</a>. Although it's
  182. possible to design captions as a continuous stream type, it's most
  183. natural to think of captions as widely spaced pieces of text with
  184. little happening between.
  185. <p>The fundamental reason for distinction between continuous and
  186. discontinuous streams concerns buffering.
  187. <h3>Buffering</h3>
  188. <p>A continuous stream is, by definition, gapless. Ogg buffering is based
  189. on the simple premise of never allowing an active continuous stream
  190. to starve for data during decode; buffering works ahead until all
  191. continuous streams in a physical stream have data ready and no further.
  192. <p>Discontinuous stream data is not assumed to be predictable. The
  193. buffering design takes discontinuous data 'as it comes' rather than
  194. working ahead to look for future discontinuous data for a potentially
  195. unbounded period. Thus, the buffering process makes no attempt to fill
  196. discontinuous stream buffers; their pages simply 'fall out' of the
  197. stream when continuous streams are handled properly.
  198. <p>Buffering requirements in this design need not be explicitly
  199. declared or managed in the encoded stream. The decoder simply reads as
  200. much data as is necessary to keep all continuous stream types gapless
  201. and no more, with discontinuous data processed as it arrives in the
  202. continuous data. Buffering is implicitly optimal for the given
  203. stream. Because all pages of all data types are stamped with absolute
  204. timing information within the stream, inter-stream synchronization
  205. timing is always maintained without the need for explicitly declared
  206. buffer-ahead hinting.
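<p>A toy model of this buffering rule, assuming pages arrive as <tt>(serial, time)</tt> pairs in mux order and that the set of continuous stream serial numbers is already known. The function name is hypothetical, and a real decoder would repeat this step each time a continuous stream runs dry.</p>

```python
def buffer_ahead(pages, continuous):
    """Read pages until every continuous stream has data, and no further.

    pages: iterable of (serial, time) in physical stream order.
    continuous: set of serial numbers of the continuous streams.
    Returns the pages that had to be read.
    """
    waiting = set(continuous)
    read = []
    for page in pages:
        read.append(page)
        waiting.discard(page[0])
        if not waiting:
            break
    return read
```

<p>Discontinuous pages are never waited for; any that lie between the continuous pages are picked up along the way.</p>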
  207. <h3>Codec metadata</h3>
  208. <p>Ogg does not replicate codec-specific metadata into the mux layer
  209. in an attempt to make the mux and codec layer implementations 'fully
  210. separable'. Things like specific timebase, keyframing strategy, frame
  211. duration, etc, do not appear in the Ogg container. The mux layer is,
  212. instead, expected to query a codec through a standardized interface,
  213. left to the implementation, for this data when it is needed.
  214. <p>Though modern design wisdom usually prefers to predict all possible
  215. needs of current and future codecs then embed these dependencies and
  216. the required metadata into the container itself, this strategy
  217. increases container specification complexity, fragility, and rigidity.
  218. The mux and codec implementations become more independent, but the
  219. specifications become less independent. A codec can't do what a
  220. container hasn't already provided for. New codecs are harder to
  221. support, and you can do fewer useful things with the ones you've
  222. already got (eg, try to make a good splitter without using any codecs.
  223. You're stuck splitting at keyframes only, or building yet another new
  224. mechanism into the container layer to mark what frames to skip
  225. displaying).
  226. <p>Ogg's design goes the opposite direction, where the specification
  227. is to be as simple, easy to understand, and 'proofed' against novel
  228. codecs as possible. When an Ogg mux layer requires codec-specific
  229. information, it queries the codec (or a codec stub). This trades a
  230. more complex implementation for a simpler, more flexible
  231. specification.
  232. <h3>Stream structure metadata</h3>
  233. <p>The Ogg container itself does not define a metadata system for
  234. declaring the structure and interrelations between multiple media
  235. types in a muxed stream. That is, the Ogg container itself does not
  236. specify data like 'which stream is the subtitle stream?' or 'which
  237. video stream is the primary angle?'. This metadata still exists, but
  238. is stored in the Ogg container rather than being built into the Ogg
  239. container. Xiph specifies the 'Skeleton' metadata format for Ogg
  240. streams, but this decoupling of container and stream structure
  241. metadata means it is possible to use Ogg with any metadata
  242. specification without altering the container itself, or without stream
  243. structure metadata at all.
  244. <h3>Frame accurate absolute position</h3>
  245. <p>Every Ogg page is stamped with a 64 bit 'granule position' that
  246. serves as an absolute timestamp for mux and seeking. A few nifty
  247. little tricks are usually also embedded in the granpos state, but
  248. we'll leave those aside for the moment (strictly speaking, they're
  249. part of each codec's mapping, not Ogg).
  250. <p>As mentioned above, granule positions are mapped into
  251. absolute timestamps by the codec, rather than being a hard timestamp.
  252. This allows maximally efficient use of the available 64 bits to
  253. address every sample/frame position without approximation while
  254. supporting new and previously unknown timebase encodings without
  255. needing to extend or update the mux layer. When a codec needs a novel
  256. timebase, it simply brings the code for that mapping along with it.
  257. This is not a theoretical curiosity; new, wholly novel timebases were
  258. deployed with the adoption of both Theora and Dirac. "Rolling INTRA"
  259. (keyframeless video) also benefits from novel use of the granule
  260. position.
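<p>To make the idea of a codec-supplied mapping concrete, here are two illustrative mappings. They are loosely modeled on how audio and keyframed video mappings are commonly described, but the exact rules (including any off-by-one conventions) belong to each codec's own specification; the function names, the <tt>shift</tt> parameter and the frame rate argument are assumptions made for this sketch.</p>

```python
from fractions import Fraction


def pcm_time(granulepos, sample_rate):
    """Audio-style mapping: the granule position counts PCM samples."""
    return Fraction(granulepos, sample_rate)


def split_granule_time(granulepos, shift, frames_per_second):
    """Video-style mapping with the 64 bits split into two counters.

    The upper bits hold the frame index of the last keyframe and the low
    `shift` bits hold the number of frames since that keyframe.
    Returns (keyframe_index, time_in_seconds).
    """
    keyframe = granulepos >> shift
    since_keyframe = granulepos & ((1 << shift) - 1)
    return keyframe, Fraction(keyframe + since_keyframe, frames_per_second)
```

<p>The mux layer only ever compares the resulting times; it never needs to know which of these encodings a stream uses.</p>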
  261. <h2>Ogg stream arrangement</h2>
  262. <h3>Packets, pages, and bitstreams</h3>
  263. <p>Ogg codecs use <em>packets</em>. Packets are octet payloads of
  264. raw, compressed data, containing the data needed for a single
  265. decompressed unit, eg, one video frame. Packets have no maximum size
  266. and may be zero length. They do not have any high-level structure or
  267. boundary information; strung together, the unframed packets form a
  268. <em>logical bitstream</em> of apparently random bytes with no internal
  269. landmarks.
  270. <p>Logical bitstream packets are grouped and framed into Ogg pages
  271. along with a unique stream <em>serial number</em> to produce a
  272. <em>physical bitstream</em>. An <em>elementary stream</em> is a
  273. physical bitstream containing only the pages framing a single logical
  274. bitstream. Each page is a self contained entity, although a packet may
  275. be split and encoded across one or more pages. The page decode
  276. mechanism is designed to recognize, verify and handle single pages at
  277. a time from the overall bitstream.
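<p>How a packet of arbitrary size ends up in a page's list of packet lengths can be sketched in a few lines. This follows the length encoding in the framing specification, where a packet is recorded as a run of 255-valued bytes ended by one byte below 255; the function names are hypothetical.</p>

```python
def lacing_values(packet_len):
    """Length-table bytes for one packet: 255s followed by a value < 255."""
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def packet_sizes(lacing):
    """Recover packet sizes from one page's length table.

    Returns (sizes, pending), where pending is the byte count of a trailing
    packet that is not finished on this page (0 if the page ends cleanly).
    """
    sizes, current = [], 0
    for value in lacing:
        current += value
        if value < 255:      # a value below 255 terminates the packet
            sizes.append(current)
            current = 0
    return sizes, current
```

<p>A zero-length packet costs a single zero byte, and a packet whose size is an exact multiple of 255 ends with an explicit zero so that its boundary stays unambiguous.</p>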
  278. <p><a href="framing.html">Ogg Bitstream Framing</a> specifies
  279. the page format of an Ogg bitstream, the packet coding process
  280. and elementary bitstreams in detail.
  281. <h3>Multiplexed bitstreams</h3>
  282. <p>Multiple logical/elementary bitstreams can be combined into a single
  283. <em>multiplexed bitstream</em> by interleaving whole pages from each
  284. contributing elementary stream in time order. The result is a single
  285. physical stream that multiplexes and frames multiple logical streams.
  286. Each logical stream is identified by the unique stream serial number
  287. stamped in its pages. A physical stream may include a 'meta-header'
  288. (such as the <a href="skeleton.html">Ogg Skeleton</a>) comprising its
  289. own Ogg page at the beginning of the physical stream. A decoder
  290. recovers the original logical/elementary bitstreams out of the
  291. physical bitstream by taking the pages in order from the physical
  292. bitstream and redirecting them into the appropriate logical decoding
  293. entity.
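<p>Demultiplexing by serial number might look like the sketch below. It assumes a well-formed buffer that begins on a page boundary and skips checksum verification and resynchronization, both of which a real decoder needs; the function names are hypothetical.</p>

```python
import struct
from collections import defaultdict


def iter_pages(buf):
    """Yield (serial, page_bytes) for each page in a well-formed buffer."""
    pos = 0
    while pos < len(buf):
        if buf[pos:pos + 4] != b"OggS" or len(buf) < pos + 27:
            raise ValueError(f"no page at offset {pos}")
        serial = struct.unpack_from("<I", buf, pos + 14)[0]
        nsegs = buf[pos + 26]
        table_end = pos + 27 + nsegs
        end = table_end + sum(buf[pos + 27:table_end])
        if end > len(buf):
            raise ValueError("truncated page")
        yield serial, buf[pos:end]
        pos = end


def demux(buf):
    """Route whole pages to per-stream lists keyed by serial number."""
    streams = defaultdict(list)
    for serial, page in iter_pages(buf):
        streams[serial].append(page)
    return dict(streams)
```

<p>Concatenating the pages collected for one serial number reproduces that elementary stream byte for byte.</p>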
  294. <p><a href="ogg-multiplex.html">Ogg Bitstream Multiplexing</a> specifies
  295. proper multiplexing of an Ogg bitstream in detail.
  296. <h3>Chaining</h3>
  297. <p>Multiple Ogg physical bitstreams may be concatenated into a single new
  298. stream; this is <em>chaining</em>. The bitstreams do not overlap; the
  299. final page of a given logical bitstream is immediately followed by the
  300. initial page of the next.</p>
  301. <p>Each logical bitstream in a chain must have a unique serial number
  302. within the scope of the full physical bitstream, not only within a
  303. particular <em>link</em> or <em>segment</em> of the chain.</p>
  304. <h3>Continuous and discontinuous streams</h3>
  305. <p>Within Ogg, each stream must be declared (by the codec) to be
  306. continuous- or discontinuous-time. Most codecs treat all streams they
  307. use as either inherently continuous- or discontinuous-time, although
  308. this is not a requirement. A codec may, as part of its mapping, choose
  309. according to data in the initial header.
  310. <p>Continuous-time pages are stamped by end-time, discontinuous pages
  311. are stamped by begin-time. Pages in a multiplexed stream are
  312. interleaved in order of the time stamp regardless of stream type.
  313. Both continuous and discontinuous logical streams are used to seek
  314. within a physical stream; however, only continuous streams are used to
  315. determine buffering depth; because discontinuous streams are stamped
  316. by start time, they will always 'fall out' in time when buffering
  317. tracks only the continuous streams. See 'Examples' for an
  318. illustration of the buffering mechanism.
  319. <h2>Mapping Requirements</h2>
  320. <p>Each codec is allowed some freedom in deciding how its logical
  321. bitstream is encapsulated into an Ogg bitstream (even if it is a
  322. trivial mapping, eg, 'plop the packets in and go'). This is the
  323. codec's <em>mapping</em>. Ogg imposes a few mapping requirements
  324. on any codec.
  325. <p>The <a href="framing.html">framing specification</a> defines
  326. 'beginning of stream' and 'end of stream' page markers via a header
  327. flag (it is possible for a stream to consist of a single page). A
  328. correct stream always consists of an integer number of pages, an easy
  329. requirement given the variable size nature of pages.</p>
  330. <p>The first page of an elementary Ogg bitstream consists of a single,
  331. small 'initial header' packet that must include sufficient information
  332. to identify the exact CODEC type. From this initial header, the codec
  333. must also be able to determine its timebase and whether or not it is a
  334. continuous- or discontinuous-time stream. The initial header must fit
  335. on a single page. If a codec makes use of auxiliary headers (for
  336. example, Vorbis uses two auxiliary headers), these headers must follow
  337. the initial header immediately. The last header finishes its page;
  338. data begins on a fresh page.
  339. <p>As an example, Ogg Vorbis places the name and revision of the
  340. Vorbis CODEC, the audio rate and the audio quality into this initial
  341. header. Comments and detailed codec setup appear in the larger
  342. auxiliary headers.</p>
  343. <h2>Multiplexing Requirements</h2>
  344. <p>Multiplexing requirements within Ogg are straightforward. When
  345. constructing a single-link (unchained) physical bitstream consisting
  346. of multiple elementary streams:
  347. <ol>
  348. <li> The initial header for each stream appears in sequence, each
  349. header on a single page. All initial headers must appear with no
  350. intervening data (no auxiliary header pages or packets, no data pages
  351. or packets). Order of the initial headers is unspecified. The
  352. 'beginning of stream' flag is set on each initial header.
  353. <li> All auxiliary headers for all streams must follow. Order
  354. is unspecified. The final auxiliary header of each stream must flush
  355. its page.
  356. <li>Data pages for each stream follow, interleaved in time order.
  357. <li>The final page of each stream sets the 'end of stream' flag.
  358. Unlike initial pages, terminal pages for the logical bitstreams need
  359. not occur contiguously; indeed it may not be possible for them to do so.
  360. </ol>
  361. <p>Each grouped bitstream must have a unique serial number within the
  362. scope of the physical bitstream.</p>
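<p>Some of these rules can be checked from page headers alone. The sketch below uses the 'beginning of stream' (0x02) and 'end of stream' (0x04) header flag values from the framing specification and checks only the rules that need no codec knowledge: initial headers grouped at the front, unique serial numbers, and one terminal page per stream. Auxiliary header placement and time ordering are not checked. The function name and message strings are hypothetical.</p>

```python
BOS = 0x02  # 'beginning of stream' header flag
EOS = 0x04  # 'end of stream' header flag


def check_link(pages):
    """Check one unchained link given (serial, header_flags) per page.

    Returns a list of problem descriptions; an empty list means the
    checked rules hold.
    """
    problems = []
    started, ended = set(), set()
    headers_done = False
    for index, (serial, flags) in enumerate(pages):
        if flags & BOS:
            if headers_done:
                problems.append(f"page {index}: initial header after other pages")
            if serial in started:
                problems.append(f"page {index}: serial {serial} is not unique")
            started.add(serial)
        else:
            headers_done = True
            if serial not in started:
                problems.append(f"page {index}: serial {serial} has no initial header")
            if serial in ended:
                problems.append(f"page {index}: serial {serial} continues after end")
        if flags & EOS:
            ended.add(serial)
    for serial in sorted(started - ended):
        problems.append(f"serial {serial}: no 'end of stream' page")
    return problems
```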
  363. <h3>Chaining and multiplexing</h3>
  364. <p>Multiplexed and/or unmultiplexed bitstreams may be chained
  365. consecutively. Such a physical bitstream obeys all the rules of both
  366. chained and multiplexed streams. Each link, when unchained, must
  367. stand on its own as a valid physical bitstream. Chained streams do
  368. not mix; a new segment may not begin until all streams in the
  369. preceding segment have terminated. </p>
  370. <h2>Examples</h2>
  371. <em>[More to come shortly; this section is currently being revised and expanded]</em>
  372. <p>Below, we present an example of a multiplexed and chained bitstream:</p>
  373. <p><img src="stream.png" alt="stream"/></p>
  374. <p>In this example, we see pages from five total logical bitstreams
  375. multiplexed into a physical bitstream. Note the following
  376. characteristics:</p>
  377. <ol>
  378. <li>Multiplexed bitstreams in a given link begin together; all of the
  379. initial pages must appear before any data pages. When concurrently
  380. multiplexed groups are chained, the new group does not begin until all
  381. the bitstreams in the previous group have terminated.</li>
  382. <li>The ordering of pages of concurrently multiplexed bitstreams is
  383. governed by timestamp (not shown here); there is no regular
  384. interleaving order. Pages within a logical bitstream appear in
  385. sequence order.</li>
  386. </ol>
  387. <div id="copyright">
  388. The Xiph Fish Logo is a
  389. trademark (&trade;) of Xiph.Org.<br/>
  390. These pages &copy; 1994 - 2010 Xiph.Org. All rights reserved.
  391. </div>
  392. </body>
  393. </html>