librsync  2.3.2
streaming.md
# Streaming API {#api_streaming}

A key design requirement for librsync is that it should handle data as
and when the hosting application requires it. librsync can be used
inside applications that do non-blocking IO or filtering of network
streams, because it never does IO directly and never needs to block
waiting for data.

Arbitrary-length input and output buffers are passed to the
library by the application, through an instance of ::rs_buffers_t. The
library proceeds as far as it can, and returns an ::rs_result value
indicating whether it needs more data or space.

All the state needed by the library to resume processing when more
data is available is kept in a small opaque ::rs_job_t structure.
After creation of a job, repeated calls to rs_job_iter() in between
filling and emptying the buffers keep data flowing through the
stream. The ::rs_result values returned may indicate:

- ::RS_DONE: processing is complete
- ::RS_BLOCKED: processing has blocked pending more input data or output space
- one of various possible errors in processing (see ::rs_result)

These can be converted to a human-readable string by rs_strerror().

\note Smaller buffers have high relative handling costs. Application
performance will be improved by using buffers of at least 32 KiB or so
on each call.

\sa \ref api_whole - Simpler but more limited interface than the streaming
interface.

\sa \ref api_pull - Intermediate-complexity callback interface.

\sa \ref api_callbacks - for reading from the basis file
when doing a "patch" operation.


## Creating Jobs

All streaming librsync jobs are initiated using a `_begin`
function to create a ::rs_job_t object, passing in any necessary
initialization parameters. The various jobs available are:

- rs_sig_begin(): Calculate the signature of a file.
- rs_loadsig_begin(): Load a signature into memory.
- rs_delta_begin(): Calculate the delta between a signature and a new
  file.
- rs_patch_begin(): Apply a delta to a basis to recreate the new
  file.

Additionally, the following helper function can be used to get the
recommended signature arguments from the input file's size:

- rs_sig_args(): Get the recommended signature arguments from the file size.

After a signature has been loaded, before it can be used to calculate a delta,
the hashtable needs to be initialized by calling

- rs_build_hash_table(): Initialize the signature hashtable.

The patch job accepts the patch as input, and uses a callback to look up
blocks within the basis file.

You must configure read, write and basis callbacks after creating the
job but before it is run.


## Running Jobs

The work of the operation is done when the application calls
rs_job_iter(). This includes reading from input files via the callback,
running the rsync algorithms, and writing output.

The IO callbacks are only called from inside rs_job_iter(). If any of
them return an error, rs_job_iter() will generally return the same error.

When librsync needs to do input or output, it calls one of the callback
functions. rs_job_iter() returns when the operation has completed or
failed, or when one of the IO callbacks has blocked.

rs_job_iter() will usually be called in a loop, perhaps alternating
librsync processing with other application functions.


## Deleting Jobs

A job is deleted and its memory freed up using rs_job_free().

This is typically called when the job has completed or failed. It can be
called earlier if the application decides it wants to cancel
processing.

rs_job_free() does not delete the output of the job, such as the sumset
loaded into memory. It does delete the job's statistics.


## State Machine Internals

Internally, the operations are implemented as state machines that move
through various states as input and output buffers become available.

All computers and programs are state machines. So why is the
representation as a state machine a little more explicit (and perhaps
verbose) in librsync than other places? Because we need to be able to
let the real computer go off and do something else like waiting for
network traffic, while still remembering where it was in the librsync
state machine.

librsync will never block waiting for IO, unless the callbacks do
that.

The current state is represented by the private field
::rs_job_t::statefn, which points to a function with a name like
`rs_OPERATION_s_STATE`. Every time librsync tries to make progress,
it will call this function.

The state function returns one of the ::rs_result values. The
most important values are

 * ::RS_DONE: Completed successfully.

 * ::RS_BLOCKED: Cannot make further progress at this point.

 * ::RS_RUNNING: The state function has neither completed nor blocked but
 wants to be called again. **XXX**: Perhaps this should be removed?

States need to correspond to suspension points. The only place the
job can resume after blocking is at the entry to a state function.

Therefore states must be "all or nothing" in that they can either
complete, or restart without losing information.

Basically every state needs to work from one input buffer to one
output buffer.

States should generally never return ::RS_DONE directly. Instead, they
should call rs__job_done(), which sets the state function to
rs__s_done(). This makes sure that any pending output is flushed out
before ::RS_DONE is returned to the application.
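The pattern can be illustrated with a small standalone toy that does not use librsync at all. Every name here (`toy_job_t`, `toy_s_header`, `toy_s_body`, `toy_s_done`, `toy_job_iter`) is hypothetical, and the real state functions are considerably more involved. The sketch only shows a state-function pointer, an "all or nothing" state that consumes nothing when it has to block, and a driver that keeps calling the current state while it reports that it is still running.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOY_DONE, TOY_BLOCKED, TOY_RUNNING } toy_result;

typedef struct toy_job toy_job_t;
struct toy_job {
    toy_result (*statefn)(toy_job_t *job);  /* current state */
    const char *next_in;                    /* caller-supplied input */
    size_t avail_in;
    int eof_in;
    char *next_out;                         /* caller-supplied output */
    size_t avail_out;
};

/* Terminal state: the only place that reports completion. */
static toy_result toy_s_done(toy_job_t *job)
{
    (void)job;
    return TOY_DONE;
}

/* Copy as much input to output as currently fits. */
static toy_result toy_s_body(toy_job_t *job)
{
    size_t n = job->avail_in < job->avail_out ? job->avail_in
                                              : job->avail_out;

    if (n > 0) {
        memcpy(job->next_out, job->next_in, n);
        job->next_in += n;
        job->avail_in -= n;
        job->next_out += n;
        job->avail_out -= n;
    }
    if (job->avail_in == 0 && job->eof_in) {
        job->statefn = toy_s_done;          /* hand over to the done state */
        return TOY_RUNNING;
    }
    return TOY_BLOCKED;                     /* need more input or space */
}

/* Skip a 4-byte header. All or nothing: if fewer than 4 bytes are
 * available, nothing is consumed and the state restarts on the next call.
 * A fuller version would report an error when input ends too early. */
static toy_result toy_s_header(toy_job_t *job)
{
    if (job->avail_in < 4)
        return TOY_BLOCKED;
    job->next_in += 4;
    job->avail_in -= 4;
    job->statefn = toy_s_body;
    return TOY_RUNNING;
}

/* Driver: call the current state until it blocks or completes. */
toy_result toy_job_iter(toy_job_t *job)
{
    toy_result r;

    do {
        r = job->statefn(job);
    } while (r == TOY_RUNNING);
    return r;
}
```

Because the header state consumes nothing when it blocks, the caller has to present the same unconsumed bytes again, together with any new data, on the next call.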