librsync  2.3.2
# Buffer internals {#buffer_internals}

## Input scoop

A module called the *scoop* is used for buffering data going into
librsync. It accumulates data when the application does not supply it
in large enough chunks for librsync to make use of it.

The scoop object is a set of fields in the rs_job_t object:

    char *scoop_buf;    /* the allocation pointer */
    size_t scoop_alloc; /* the allocation size */
    size_t scoop_avail; /* the data size */

Data from the read callback always goes into the scoop buffer.

The state functions call rs__scoop_read() when they need some input
data. If the read callback blocks, it might take multiple attempts
before the scoop can be filled. Each time, the state function will
also need to block, and then be reawakened by the library.

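The fill-and-block behaviour described above might be sketched as
follows. This is a minimal illustration, not librsync's actual code:
scoop_t, scoop_result_t and scoop_fill() are hypothetical names, and
only the three scoop fields are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scoop fields of rs_job_t. */
typedef struct {
    char *scoop_buf;    /* the allocation pointer */
    size_t scoop_alloc; /* the allocation size */
    size_t scoop_avail; /* the data size */
} scoop_t;

/* Hypothetical result codes: DONE means the request is satisfied. */
typedef enum { SCOOP_BLOCKED, SCOOP_DONE } scoop_result_t;

/* Try to bring the scoop up to `want` bytes using the caller's input.
 * *in and *in_len are advanced past the bytes that were taken. */
scoop_result_t scoop_fill(scoop_t *s, const char **in, size_t *in_len,
                          size_t want)
{
    size_t need, take;

    if (s->scoop_alloc < want) {
        /* Grow the allocation so one whole unit of work fits. */
        char *p = realloc(s->scoop_buf, want);
        if (p == NULL)
            abort();
        s->scoop_buf = p;
        s->scoop_alloc = want;
    }
    /* Valid data always starts at the beginning of the buffer. */
    assert(s->scoop_avail <= want);
    need = want - s->scoop_avail;
    take = *in_len < need ? *in_len : need;
    if (take > 0) {
        memcpy(s->scoop_buf + s->scoop_avail, *in, take);
        s->scoop_avail += take;
        *in += take;
        *in_len -= take;
    }
    /* BLOCKED tells the caller to come back with more input. */
    return s->scoop_avail == want ? SCOOP_DONE : SCOOP_BLOCKED;
}
```

A caller that wants 5 bytes but is handed only 3 gets SCOOP_BLOCKED
and calls again later with more input; the second call tops the scoop
up and returns SCOOP_DONE.
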
Once the scoop has been sufficiently filled, it must be completely
consumed by the state function. This is easy if the state function
always requests one unit of work at a time: a block, a file header
element, etc.

All this means that the valid data is always located at the start of
the scoop, continuing for scoop_avail bytes. The library is never
allowed to consume only part of the data.

Once the state function has consumed the data, it should call
rs__scoop_reset(), which resets scoop_avail to 0.


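As an illustration of the consume-then-reset rule, the sketch below
shows a state function that takes one unit of work (here a 4-byte
big-endian word) from the start of the scoop and then resets it. The
names scoop_t, scoop_reset() and state_read_word(), and the choice of
a 4-byte unit, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the scoop fields of rs_job_t. */
typedef struct {
    char *scoop_buf;
    size_t scoop_alloc;
    size_t scoop_avail;
} scoop_t;

#define WORD_LEN 4 /* size of the unit of work in this example */

/* Mirrors what the text says rs__scoop_reset() does. */
void scoop_reset(scoop_t *s)
{
    s->scoop_avail = 0;
}

/* Hypothetical state function. Returns 1 and stores the word once a
 * whole unit is available, or 0 if the caller still has to block. */
int state_read_word(scoop_t *s, uint32_t *word)
{
    const unsigned char *p = (const unsigned char *)s->scoop_buf;

    assert(s->scoop_avail <= s->scoop_alloc);
    if (s->scoop_avail < WORD_LEN)
        return 0; /* not filled yet, leave the scoop untouched */

    /* The unit always starts at offset 0 of the scoop. */
    *word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
            ((uint32_t)p[2] << 8) | (uint32_t)p[3];

    /* Everything requested has been used, so discard it all. */
    scoop_reset(s);
    return 1;
}
```
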
## Output queue

The library can set up data to be written out by putting a
pointer/length for it in the output queue:

    char *outq_ptr;
    size_t outq_bytes;

The job infrastructure will make sure this is written out before the
next call into the state machine.

There is only one outq_ptr, so any given state function can only
produce one contiguous block of output.


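One way the job infrastructure could drain the output queue is
sketched below. The outq_t type and the functions outq_drain() and
outq_is_empty() are hypothetical; only the two fields come from the
text. The application's output buffer may be smaller than the queued
data, so draining can take several calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output queue fields of rs_job_t. */
typedef struct {
    char *outq_ptr;
    size_t outq_bytes;
} outq_t;

/* Copy as much queued output as fits into out[0..out_len).
 * Returns the number of bytes copied. */
size_t outq_drain(outq_t *q, char *out, size_t out_len)
{
    size_t n = q->outq_bytes < out_len ? q->outq_bytes : out_len;

    assert(q->outq_bytes == 0 || q->outq_ptr != NULL);
    if (n > 0) {
        memcpy(out, q->outq_ptr, n);
        q->outq_ptr += n;   /* remaining data stays contiguous */
        q->outq_bytes -= n;
    }
    return n;
}

/* The state machine is only called again once this is nonzero. */
int outq_is_empty(const outq_t *q)
{
    return q->outq_bytes == 0;
}
```
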
## Buffer sharing

The scoop buffer may be used by the output queue. This means that
data can traverse the library with no extra copies: one copy into the
scoop buffer, and one copy out. In this case outq_ptr points into
scoop_buf, and outq_bytes tells how much data needs to be written.

The state function calls rs__scoop_reset() before returning when it
is finished with the data in the scoop. However, the outq may still
point into the scoop buffer if that data has not yet been copied
out. This means that there is data in the scoop beyond scoop_avail
that must still be retained.

This is safe because neither the scoop nor the state function will
get to run before the output queue has completely drained.


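The sharing arrangement can be illustrated with a small sketch. The
job_t type, state_pass_through(), job_output_drained() and
job_drain() are hypothetical names; the five fields are the ones
described above. After the state function returns, scoop_avail is 0
but the bytes are still in scoop_buf and still owned by the output
queue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of rs_job_t holding both sets of fields. */
typedef struct {
    char *scoop_buf;
    size_t scoop_alloc;
    size_t scoop_avail;
    char *outq_ptr;
    size_t outq_bytes;
} job_t;

/* Hypothetical state function: emit the scooped data unchanged. */
void state_pass_through(job_t *job)
{
    job->outq_ptr = job->scoop_buf;     /* share the buffer, no copy */
    job->outq_bytes = job->scoop_avail;
    job->scoop_avail = 0;               /* the scoop reset step */
}

/* The scoop may be refilled, and the state function run again,
 * only when this returns nonzero. */
int job_output_drained(const job_t *job)
{
    return job->outq_bytes == 0;
}

/* Copy queued bytes to the application's output buffer. */
size_t job_drain(job_t *job, char *out, size_t out_len)
{
    size_t n = job->outq_bytes < out_len ? job->outq_bytes : out_len;

    assert(job->outq_bytes <= job->scoop_alloc);
    if (n > 0) {
        memcpy(out, job->outq_ptr, n);
        job->outq_ptr += n;
        job->outq_bytes -= n;
    }
    return n;
}
```
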
## Readahead

How much readahead is required?

At the moment (??) our rollsum and MD4 routines require a full
contiguous block to calculate a checksum. This could be relaxed, at a
possible loss of efficiency.

So calculating block checksums requires one full block to be in
memory.

When applying a patch, we only need enough readahead to unpack the
command header.

When calculating a delta, we need a full block to calculate its
checksum, plus space for the missed data. We can accumulate any
amount of missed data before emitting it as a literal; the more we can
accumulate the more compact the encoding will be.
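
The three cases above could be summarised in a helper like the one
below. It is only a sketch: job_kind_t and min_readahead() are
hypothetical, and the block length, command header length and
missed-data space are supplied by the caller rather than taken from
any librsync constant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the three cases discussed above. */
typedef enum { JOB_SIGNATURE, JOB_PATCH, JOB_DELTA } job_kind_t;

/* Minimum number of input bytes that must be buffered at once. */
size_t min_readahead(job_kind_t kind, size_t block_len,
                     size_t cmd_header_len, size_t miss_space)
{
    switch (kind) {
    case JOB_SIGNATURE:
        return block_len;              /* one full block to checksum */
    case JOB_PATCH:
        return cmd_header_len;         /* enough to unpack a command */
    case JOB_DELTA:
        return block_len + miss_space; /* block plus missed data */
    }
    assert(!"unknown job kind");
    return 0;
}
```

A larger miss_space lets more missed data accumulate before it is
emitted as a literal, at the cost of a larger buffer.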