tkrzw module
Python Binding of Tkrzw
Introduction
DBM (Database Manager) is a concept for storing an associative array on permanent storage. In other words, DBM allows an application program to store key-value pairs in a file and reuse them later. Each key and each value is a string or a sequence of bytes. A key must be unique within the database and a value is associated with it. You can retrieve a stored record by its key very quickly. Thanks to the simple structure of DBM, its performance can be extremely high.
Tkrzw is a library implementing DBM with various algorithms. It features high degrees of performance, concurrency, scalability and durability. The following data structures are provided.
- HashDBM : File database manager implementation based on hash table.
- TreeDBM : File database manager implementation based on B+ tree.
- SkipDBM : File database manager implementation based on skip list.
- TinyDBM : On-memory database manager implementation based on hash table.
- BabyDBM : On-memory database manager implementation based on B+ tree.
- CacheDBM : On-memory database manager implementation with LRU deletion.
- StdHashDBM : On-memory DBM implementation using std::unordered_map.
- StdTreeDBM : On-memory DBM implementation using std::map.
Whereas Tkrzw is a C++ library, this package provides its Python interface. All of the above data structures are available via one adapter class “DBM”. Read the homepage for details.
DBM stores key-value pairs of strings. Each string is represented as bytes in Python. You can specify any type of object as a key or a value if it can be converted into a string, which is then “encoded” into bytes. When you retrieve the value of a record, the type is determined by the method: Get for bytes, GetStr for string, or [] for the same type as the key.
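The conversion rule described above can be sketched in plain Python. This is only an illustration of the documented behavior, not the binding's actual code: the helper name `to_bytes` is hypothetical, and the use of `str()` followed by UTF-8 encoding for non-bytes objects is an assumption.

```python
def to_bytes(obj):
    """Illustrative conversion of a key or value into bytes (assumed rule)."""
    if isinstance(obj, (bytes, bytearray)):
        return bytes(obj)  # bytes-like objects are kept as they are
    # Assumption: other objects are turned into their string form, then UTF-8.
    return str(obj).encode("utf-8")


print(to_bytes("hop"))    # b'hop'
print(to_bytes(1))        # b'1'
print(to_bytes(b"\x00"))  # b'\x00'
```

Under this rule the integer key `1` and the string key `"1"` would address the same record, which is worth keeping in mind when mixing key types.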
Symbols of the module “tkrzw” should be imported in each source file of application programs:
import tkrzw
An instance of the class “DBM” is used to handle a database. You can store, delete, and retrieve records with the instance. The result status of each operation is represented by an object of the class “Status”. An iterator to access each record is implemented by the class “Iterator”.
Installation
Install the latest version of Tkrzw beforehand and get the package of the Python binding of Tkrzw. Python 3.6 or later is required to use this package.
Enter the directory of the extracted package, then perform the installation. If your system uses a command other than “python3” to run Python, edit the Makefile beforehand:
make
make check
sudo make install
Example
The following code is a typical example of using a database. A DBM object can be used like a dictionary object. As DBM implements the generic iterator protocol, you can access each record with a “for” loop:
import tkrzw
# Prepares the database.
dbm = tkrzw.DBM()
dbm.Open("casket.tkh", True, truncate=True, num_buckets=100)
# Sets records.
# If the operation fails, a runtime exception is raised.
# Keys and values are implicitly converted into bytes.
dbm["first"] = "hop"
dbm["second"] = "step"
dbm["third"] = "jump"
# Retrieves record values.
# If the operation fails, a runtime exception is raised.
# Retrieved values are strings if keys are strings.
print(dbm["first"])
print(dbm["second"])
print(dbm["third"])
try:
    print(dbm["fourth"])
except tkrzw.StatusException as e:
    print(repr(e))
# Traverses records.
# Retrieved keys and values are always bytes so we decode them.
for key, value in dbm:
    print(key.decode(), value.decode())
# Closes the database.
dbm.Close()
The following code is a more complex example. Resources of DBM and Iterator are bound to their objects, so when the reference count becomes zero, the resources are released. Even if the database is not closed, the destructor closes it implicitly. The method “OrDie” raises an exception on failure, so it is useful for checking errors:
import tkrzw
# Prepares the database.
# Options are given by dictionary expansion.
# All methods except for [] and []= don't raise exceptions.
dbm = tkrzw.DBM()
open_params = {
    "max_page_size": 4080,
    "max_branches": 256,
    "key_comparator": "decimal",
    "concurrent": True,
    "truncate": True,
}
status = dbm.Open("casket.tkt", True, **open_params)
if not status.IsOK():
    raise tkrzw.StatusException(status)
# Sets records.
# The method OrDie raises a runtime error on failure.
dbm.Set(1, "hop").OrDie()
dbm.Set(2, "step").OrDie()
dbm.Set(3, "jump").OrDie()
# Retrieves records without checking errors.
# On failure, the return value is None.
print(dbm.GetStr(1))
print(dbm.GetStr(2))
print(dbm.GetStr(3))
print(dbm.GetStr(4))
# To know the status of retrieval, give a status object to Get.
# You can compare a status object and a status code directly.
status = tkrzw.Status()
value = dbm.GetStr(1, status)
print("status: " + str(status))
if status == tkrzw.Status.SUCCESS:
    print("value: " + value)
# Rebuilds the database.
# Almost the same options as the Open method can be given.
dbm.Rebuild(align_pow=0, max_page_size=1024).OrDie()
# Traverses records with an iterator.
it = dbm.MakeIterator()
it.First()
while True:
    status = tkrzw.Status()
    record = it.GetStr(status)
    if not status.IsOK():
        break
    print(record[0], record[1])
    it.Next()
# Closes the database.
dbm.Close()
-
class
tkrzw.
DBM
[source]¶ Bases:
object
Polymorphic database manager.
All operations except for Open and Close are thread-safe; multiple threads can access the same database concurrently. You can specify a data structure when you call the Open method. Every opened database must be closed explicitly by the Close method to avoid data corruption.
-
Append
(key, value, delim='')[source]¶ Appends data at the end of a record of a key.
Parameters: - key – The key of the record.
- value – The value to append.
- delim – The delimiter to put after the existing record.
Returns: The result status.
If there’s no existing record, the value is set without the delimiter.
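The Append semantics above can be modeled on a plain dictionary. This is a sketch of the documented behavior only, not the library's implementation; the function name `append_model` and the use of `str` values are assumptions made for illustration.

```python
def append_model(store, key, value, delim=""):
    """Model of Append on a plain dict of str -> str."""
    if key in store:
        # Existing record: the delimiter goes between old and new data.
        store[key] = store[key] + delim + value
    else:
        # No existing record: the value is set without the delimiter.
        store[key] = value


store = {}
append_model(store, "tags", "red", ",")
append_model(store, "tags", "blue", ",")
print(store["tags"])  # red,blue
```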
-
CompareExchange
(key, expected, desired)[source]¶ Compares the value of a record and exchanges it if the condition is met.
Parameters: - key – The key of the record.
- expected – The expected value.
- desired – The desired value. If it is None, the record is to be removed.
Returns: The result status.
If the record doesn’t exist, NOT_FOUND_ERROR is returned. If the existing value is different from the expected value, DUPLICATION_ERROR is returned. Otherwise, the desired value is set.
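As a reading aid, the rules above can be expressed as a small in-memory model. It is a sketch over a plain dictionary, not the library's code: the function name `compare_exchange_model` is hypothetical, and the integer constants simply mirror the values listed for the Status class on this page.

```python
# Status codes as listed for tkrzw.Status on this page.
SUCCESS = 0
NOT_FOUND_ERROR = 7
DUPLICATION_ERROR = 10


def compare_exchange_model(store, key, expected, desired):
    """Model of the documented CompareExchange rules on a plain dict."""
    if key not in store:
        return NOT_FOUND_ERROR
    if store[key] != expected:
        return DUPLICATION_ERROR
    if desired is None:
        del store[key]  # a desired value of None removes the record
    else:
        store[key] = desired
    return SUCCESS


store = {"lock": "free"}
print(compare_exchange_model(store, "lock", "free", "held"))  # 0
print(compare_exchange_model(store, "lock", "free", "held"))  # 10
print(compare_exchange_model(store, "other", "free", "held"))  # 7
```

The second call fails because the value is already "held", which is the property that makes this operation usable as a simple lock or optimistic update.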
-
CopyFile
(dest_path)[source]¶ Copies the content of the database file to another file.
Parameters: dest_path – A path to the destination file. Returns: The result status.
-
Count
()[source]¶ Gets the number of records.
Returns: The number of records on success, or None on failure.
-
Export
(dest_dbm)[source]¶ Exports all records to another database.
Parameters: dest_dbm – The destination database. Returns: The result status.
-
ExportKeysAsLines
(dest_path)[source]¶ Exports the keys of all records as lines to a text file.
Parameters: dest_path – A path of the output text file. Returns: The result status.
-
Get
(key, status=None)[source]¶ Gets the value of a record of a key.
Parameters: - key – The key of the record.
- status – A status object to which the result status is assigned. It can be omitted.
Returns: The bytes value of the matching record or None on failure.
-
GetFilePath
()[source]¶ Gets the path of the database file.
Returns: The file path of the database, or None on failure.
-
GetFileSize
()[source]¶ Gets the current file size of the database.
Returns: The current file size of the database, or None on failure.
-
GetMulti
(*keys)[source]¶ Gets the values of multiple records of keys.
Parameters: keys – The keys of records to retrieve. Returns: A map of retrieved records. Keys which don’t match existing records are ignored.
-
GetMultiStr
(*keys)[source]¶ Gets the values of multiple records of keys, as strings.
Parameters: keys – The keys of records to retrieve. Returns: A map of retrieved records. Keys which don’t match existing records are ignored.
-
GetStr
(key, status=None)[source]¶ Gets the value of a record of a key, as a string.
Parameters: - key – The key of the record.
- status – A status object to which the result status is assigned. It can be omitted.
Returns: The string value of the matching record or None on failure.
-
Increment
(key, inc=1, init=0, status=None)[source]¶ Increments the numeric value of a record.
Parameters: - key – The key of the record.
- inc – The incremental value. If it is Utility.INT64MIN, the current value is not changed and a new record is not created.
- init – The initial value.
- status – A status object to which the result status is assigned. It can be omitted.
Returns: The current value, or None on failure.
The record value is stored as an 8-byte big-endian integer. Negative values are also supported.
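The storage format mentioned above can be reproduced with the standard `struct` module, which helps when a counter value is read back with Get instead of Increment. This is a sketch: the helper names are hypothetical, and treating the integer as signed two's complement (format `">q"`) is an assumption based on the note that negative values are supported.

```python
import struct


def encode_counter(n):
    """Pack an integer as 8 bytes, big-endian, signed (assumed layout)."""
    return struct.pack(">q", n)


def decode_counter(data):
    """Unpack an 8-byte big-endian signed integer."""
    return struct.unpack(">q", data)[0]


print(encode_counter(1))                   # b'\x00\x00\x00\x00\x00\x00\x00\x01'
print(decode_counter(encode_counter(-5)))  # -5
```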
-
IsHealthy
()[source]¶ Checks whether the database condition is healthy.
Returns: True if the database condition is healthy, or False if not.
-
IsOpen
()[source]¶ Checks whether the database is open.
Returns: True if the database is open, or False if not.
-
IsOrdered
()[source]¶ Checks whether ordered operations are supported.
Returns: True if ordered operations are supported, or False if not.
-
Open
(path, writable, **params)[source]¶ Opens a database file.
Parameters: - path – A path of the file.
- writable – If true, the file is writable. If false, it is read-only.
- params – Optional parameters.
Returns: The result status.
- The extension of the path indicates the type of the database.
- .tkh : File hash database (HashDBM)
- .tkt : File tree database (TreeDBM)
- .tks : File skip database (SkipDBM)
- .tkmt : On-memory hash database (TinyDBM)
- .tkmb : On-memory tree database (BabyDBM)
- .tkmc : On-memory cache database (CacheDBM)
- .tksh : On-memory STL hash database (StdHashDBM)
- .tkst : On-memory STL tree database (StdTreeDBM)
The optional parameters can include an option for concurrency tuning. By default, database operations are done under the GIL (Global Interpreter Lock), which means that database operations are not done concurrently even if you use multiple threads. If the “concurrent” parameter is true, database operations are done outside the GIL, which means that database operations can be done concurrently if you use multiple threads. However, the downside is that swapping thread data is costly, so the actual throughput is often worse in the concurrent mode than in the normal mode. Therefore, the concurrent mode should be used only if the database is huge and operations on it can block other threads in multi-threaded usage.
- The optional parameters can include options for the file opening operation.
- truncate (bool): True to truncate the file.
- no_create (bool): True to omit file creation.
- no_wait (bool): True to fail if the file is locked by another process.
- no_lock (bool): True to omit file locking.
The optional parameter “dbm” supersedes the decision of the database type by the extension. The value is the type name: “HashDBM”, “TreeDBM”, “SkipDBM”, “TinyDBM”, “BabyDBM”, “CacheDBM”, “StdHashDBM”, “StdTreeDBM”.
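The type selection described above (extension first, overridden by the “dbm” parameter) can be sketched as a lookup table. This is an illustration only: the names `EXTENSION_TO_DBM` and `resolve_dbm_type` are hypothetical, and returning None for an unlisted extension is a choice of this sketch, since the text does not describe what the library does in that case.

```python
import os

# Mapping taken from the extension list in this section.
EXTENSION_TO_DBM = {
    ".tkh": "HashDBM",
    ".tkt": "TreeDBM",
    ".tks": "SkipDBM",
    ".tkmt": "TinyDBM",
    ".tkmb": "BabyDBM",
    ".tkmc": "CacheDBM",
    ".tksh": "StdHashDBM",
    ".tkst": "StdTreeDBM",
}


def resolve_dbm_type(path, **params):
    """Return the database type name implied by the path and parameters."""
    if "dbm" in params:
        return params["dbm"]  # the "dbm" parameter wins over the extension
    extension = os.path.splitext(path)[1]
    return EXTENSION_TO_DBM.get(extension)  # None when not listed (sketch only)


print(resolve_dbm_type("casket.tkt"))                 # TreeDBM
print(resolve_dbm_type("casket.tkt", dbm="HashDBM"))  # HashDBM
```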
- For HashDBM, these optional parameters are supported.
- update_mode (string): How to update the database file: “UPDATE_IN_PLACE” for the in-place mode and “UPDATE_APPENDING” for the appending mode.
- offset_width (int): The width to represent the offset of records.
- align_pow (int): The power to align records.
- num_buckets (int): The number of buckets for hashing.
- fbp_capacity (int): The capacity of the free block pool.
- lock_mem_buckets (bool): True to lock the memory for the hash buckets.
- For TreeDBM, all optional parameters for HashDBM are available. In addition, these optional parameters are supported.
- max_page_size (int): The maximum size of a page.
- max_branches (int): The maximum number of branches each inner node can have.
- max_cached_pages (int): The maximum number of cached pages.
- key_comparator (string): The comparator of record keys: “LexicalKeyComparator” for the lexical order, “LexicalCaseKeyComparator” for the lexical order ignoring case, “DecimalKeyComparator” for the order of the decimal integer numeric expressions, “HexadecimalKeyComparator” for the order of the hexadecimal integer numeric expressions, “RealNumberKeyComparator” for the order of the decimal real number expressions.
- For SkipDBM, these optional parameters are supported.
- offset_width (int): The width to represent the offset of records.
- step_unit (int): The step unit of the skip list.
- max_level (int): The maximum level of the skip list.
- sort_mem_size (int): The memory size used for sorting to build the database in the at-random mode.
- insert_in_order (bool): If true, records are assumed to be inserted in ascending order of the key.
- max_cached_records (int): The maximum number of cached records.
- For TinyDBM, these optional parameters are supported.
- num_buckets (int): The number of buckets for hashing.
- For BabyDBM, these optional parameters are supported.
- key_comparator (string): The comparator of record keys. The same ones as TreeDBM.
- For CacheDBM, these optional parameters are supported.
- cap_rec_num (int): The maximum number of records.
- cap_mem_size (int): The total memory size to use.
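To see why the choice of key_comparator matters for TreeDBM and BabyDBM, the orderings can be approximated with Python sort keys. This is only an approximation for well-formed keys: the use of `int`, `float`, and `str.lower` here is an assumption of the sketch, and the library's handling of malformed or mixed keys is not modeled.

```python
keys = ["10", "9", "1", "100"]

# Lexical order compares keys character by character.
print(sorted(keys))           # ['1', '10', '100', '9']

# Decimal order compares keys by their integer value.
print(sorted(keys, key=int))  # ['1', '9', '10', '100']

# Hexadecimal order compares keys by their base-16 value.
hex_keys = ["ff", "a", "10"]
print(sorted(hex_keys, key=lambda k: int(k, 16)))  # ['a', '10', 'ff']

# Real number order compares keys by their floating-point value.
real_keys = ["2.5", "10", "-1.5"]
print(sorted(real_keys, key=float))  # ['-1.5', '2.5', '10']

# Case-insensitive lexical order.
words = ["b", "A", "c"]
print(sorted(words, key=str.lower))  # ['A', 'b', 'c']
```

With numeric keys, iteration order and range jumps only behave numerically when a numeric comparator is chosen at creation time.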
If the optional parameter “num_shards” is set, the database is sharded into multiple shard files. Each file has a suffix like “-00003-of-00015”. If the value is 0, the number of shards is set by patterns of the existing files, or 1 if they don’t exist.
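The shard file naming can be sketched as follows. Only the suffix pattern “-00003-of-00015” comes from the text above; that the suffix is appended to the full path, that the index is zero-based, and the helper name `shard_paths` are assumptions of this sketch.

```python
def shard_paths(path, num_shards):
    """List shard file names using the documented suffix pattern.

    Assumptions: zero-based index, suffix appended to the full path.
    """
    return [
        "{}-{:05d}-of-{:05d}".format(path, index, num_shards)
        for index in range(num_shards)
    ]


paths = shard_paths("casket.tkh", 15)
print(paths[3])    # casket.tkh-00003-of-00015
print(len(paths))  # 15
```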
-
Rebuild
(**params)[source]¶ Rebuilds the entire database.
Parameters: params – Optional parameters. Returns: The result status. The optional parameters are the same as the Open method. Omitted tuning parameters are kept the same or implicitly optimized.
-
Remove
(key)[source]¶ Removes a record of a key.
Parameters: key – The key of the record. Returns: The result status.
-
RemoveAndGet
(key)[source]¶ Removes a record and gets the value.
Parameters: key – The key of the record. Returns: A pair of the result status and the record value. If the record does not exist, None is assigned as the value. If not None, the type of the returned value is the same as the parameter key.
-
Search
(mode, pattern, capacity=0, utf=False)[source]¶ Searches the database and gets keys which match a pattern.
Parameters: - mode – The search mode. “contain” extracts keys containing the pattern. “begin” extracts keys beginning with the pattern. “end” extracts keys ending with the pattern. “regex” extracts keys that partially match the pattern of a regular expression. “edit” extracts keys whose edit distance to the pattern is the least.
- pattern – The pattern for matching.
- capacity – The maximum records to obtain. 0 means unlimited.
- utf – If true, text is treated as UTF-8, which affects “regex” and “edit”.
Returns: A list of keys matching the condition.
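The simpler search modes can be modeled over a list of string keys to make their meaning concrete. This is a sketch, not the library's implementation: `search_model` is a hypothetical name, the “edit” mode and the utf flag are left out, and Python's `re` module stands in for whatever regular expression engine the library uses, so pattern syntax may differ.

```python
import re


def search_model(keys, mode, pattern, capacity=0):
    """Model of the contain/begin/end/regex search modes on a list of keys."""
    if mode == "contain":
        matched = [key for key in keys if pattern in key]
    elif mode == "begin":
        matched = [key for key in keys if key.startswith(pattern)]
    elif mode == "end":
        matched = [key for key in keys if key.endswith(pattern)]
    elif mode == "regex":
        regex = re.compile(pattern)
        # "Partially matches": the pattern may match anywhere in the key.
        matched = [key for key in keys if regex.search(key)]
    else:
        raise ValueError("mode not covered by this sketch: " + mode)
    # A capacity of 0 means unlimited.
    return matched[:capacity] if capacity > 0 else matched


keys = ["apple", "pineapple", "grape", "apricot"]
print(search_model(keys, "contain", "ap"))      # ['apple', 'pineapple', 'grape', 'apricot']
print(search_model(keys, "begin", "ap"))        # ['apple', 'apricot']
print(search_model(keys, "end", "pple"))        # ['apple', 'pineapple']
print(search_model(keys, "regex", "^a.*t$"))    # ['apricot']
print(search_model(keys, "contain", "ap", 2))   # ['apple', 'pineapple']
```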
-
Set
(key, value, overwrite=True)[source]¶ Sets a record of a key and a value.
Parameters: - key – The key of the record.
- value – The value of the record.
- overwrite – Whether to overwrite the existing value. It can be omitted, in which case True is used, as shown in the signature.
Returns: The result status.
-
SetAndGet
(key, value, overwrite=True)[source]¶ Sets a record and gets the old value.
Parameters: - key – The key of the record.
- value – The value of the record.
- overwrite – Whether to overwrite the existing value if there’s a record with the same key. If true, the existing value is overwritten by the new value. If false, the operation is given up and an error status is returned.
Returns: A pair of the result status and the old value. If no record existed when the new record was inserted, None is assigned as the value. If not None, the type of the returned old value is the same as the parameter value.
-
SetMulti
(**records)[source]¶ Sets multiple records of the keyword arguments.
Parameters: records – Records to store. Existing records with the same keys are overwritten. Returns: The result status.
-
ShouldBeRebuilt
()[source]¶ Checks whether the database should be rebuilt.
Returns: True if the database should be rebuilt for optimization, or False if there is no need.
-
Synchronize
(hard, **params)[source]¶ Synchronizes the content of the database to the file system.
Parameters: - hard – True to do physical synchronization with the hardware or false to do only logical synchronization with the file system.
- params – Optional parameters.
Only SkipDBM uses the optional parameters. The “merge” parameter specifies paths of databases to merge, separated by colons. The “reducer” parameter specifies the reducer to apply to records of the same key. “ReduceToFirst”, “ReduceToSecond”, “ReduceToLast”, etc. are supported.
-
-
class
tkrzw.
Iterator
(dbm)[source]¶ Bases:
object
Iterator for each record.
-
First
()[source]¶ Initializes the iterator to indicate the first record.
Returns: The result status. Even if there’s no record, the operation doesn’t fail.
-
Get
(status=None)[source]¶ Gets the key and the value of the current record of the iterator.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: A tuple of the bytes key and the bytes value of the current record. On failure, None is returned.
-
GetKey
(status=None)[source]¶ Gets the key of the current record.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: The bytes key of the current record or None on failure.
-
GetKeyStr
(status=None)[source]¶ Gets the key of the current record, as a string.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: The string key of the current record or None on failure.
-
GetStr
(status=None)[source]¶ Gets the key and the value of the current record of the iterator, as strings.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: A tuple of the string key and the string value of the current record. On failure, None is returned.
-
GetValue
(status=None)[source]¶ Gets the value of the current record.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: The bytes value of the current record or None on failure.
-
GetValueStr
(status=None)[source]¶ Gets the value of the current record, as a string.
Parameters: status – A status object to which the result status is assigned. It can be omitted. Returns: The string value of the current record or None on failure.
-
Jump
(key)[source]¶ Initializes the iterator to indicate a specific record.
Parameters: key – The key of the record to look for. Returns: The result status. Ordered databases can support “lower bound” jump: if there’s no record with the same key, the iterator refers to the first record whose key is greater than the given key. The operation fails with unordered databases if there’s no record with the same key.
-
JumpLower
(key, inclusive=False)[source]¶ Initializes the iterator to indicate the last record whose key is lower than a given key.
Parameters: - key – The key to compare with.
- inclusive – If true, the condition is inclusive: equal to or lower than the key.
Returns: The result status.
Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.
-
JumpUpper
(key, inclusive=False)[source]¶ Initializes the iterator to indicate the first record whose key is greater than a given key.
Parameters: - key – The key to compare with.
- inclusive – If true, the condition is inclusive: equal to or greater than the key.
Returns: The result status.
Even if there’s no matching record, the operation doesn’t fail. This method is supported only by ordered databases.
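For ordered databases, the positions chosen by Jump, JumpLower, and JumpUpper can be modeled with binary search over a sorted list of keys. This is a sketch of the documented positioning rules using the standard `bisect` module, not the library's code; the function names are hypothetical, keys are assumed to be sorted in the database's comparator order, and None stands for an iterator that points at no record.

```python
import bisect


def jump(keys, key):
    """Index of the first key equal to or greater than key, or None."""
    index = bisect.bisect_left(keys, key)
    return index if index < len(keys) else None


def jump_lower(keys, key, inclusive=False):
    """Index of the last key lower than key (or equal, if inclusive)."""
    if inclusive:
        index = bisect.bisect_right(keys, key)
    else:
        index = bisect.bisect_left(keys, key)
    return index - 1 if index > 0 else None


def jump_upper(keys, key, inclusive=False):
    """Index of the first key greater than key (or equal, if inclusive)."""
    if inclusive:
        index = bisect.bisect_left(keys, key)
    else:
        index = bisect.bisect_right(keys, key)
    return index if index < len(keys) else None


keys = ["a", "c", "e"]
print(jump(keys, "b"))              # 1
print(jump_lower(keys, "c"))        # 0
print(jump_lower(keys, "c", True))  # 1
print(jump_upper(keys, "c"))        # 2
print(jump_upper(keys, "e"))        # None
```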
-
Last
()[source]¶ Initializes the iterator to indicate the last record.
Returns: The result status. Even if there’s no record, the operation doesn’t fail. This method is supported only by ordered databases.
-
Next
()[source]¶ Moves the iterator to the next record.
Returns: The result status. If the current record is missing, the operation fails. Even if there’s no next record, the operation doesn’t fail.
-
-
class
tkrzw.
Status
(code=0, message='')[source]¶ Bases:
object
Status of operations.
-
APPLICATION_ERROR
= 12¶ Generic error caused by the application logic.
-
BROKEN_DATA_ERROR
= 11¶ Error that internal data are broken.
-
CANCELED_ERROR
= 6¶ Error that the operation is canceled.
-
DUPLICATION_ERROR
= 10¶ Error that a specific resource is duplicated.
-
INFEASIBLE_ERROR
= 9¶ Error that the operation is infeasible.
-
INVALID_ARGUMENT_ERROR
= 5¶ Error that a given argument is invalid.
-
IsOK
()[source]¶ Returns true if the status is success.
Returns: True if the status is success, or False on failure.
-
NOT_FOUND_ERROR
= 7¶ Error that a specific resource is not found.
-
NOT_IMPLEMENTED_ERROR
= 3¶ Error that the feature is not implemented.
-
OrDie
()[source]¶ Raises an exception if the status is not success.
Raises: StatusException – An exception containing the status object.
-
PERMISSION_ERROR
= 8¶ Error that the operation is not permitted.
-
PRECONDITION_ERROR
= 4¶ Error that a precondition is not met.
-
SUCCESS
= 0¶ Success.
-
SYSTEM_ERROR
= 2¶ Generic error from underlying systems.
-
Set
(code=0, message='')[source]¶ Sets the code and the message.
Parameters: - code – The status code. This can be omitted and then SUCCESS is set.
- message – An arbitrary status message. This can be omitted and then an empty string is set.
-
UNKNOWN_ERROR
= 1¶ Generic error whose cause is unknown.
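When logging a numeric status code, it can help to map it back to its name. The enum below mirrors the constants listed for the Status class on this page; it is a standalone convenience sketch with a hypothetical name (`StatusCode`), not something the package is stated to provide.

```python
from enum import IntEnum


class StatusCode(IntEnum):
    """Mirror of the status code constants documented for tkrzw.Status."""
    SUCCESS = 0
    UNKNOWN_ERROR = 1
    SYSTEM_ERROR = 2
    NOT_IMPLEMENTED_ERROR = 3
    PRECONDITION_ERROR = 4
    INVALID_ARGUMENT_ERROR = 5
    CANCELED_ERROR = 6
    NOT_FOUND_ERROR = 7
    PERMISSION_ERROR = 8
    INFEASIBLE_ERROR = 9
    DUPLICATION_ERROR = 10
    BROKEN_DATA_ERROR = 11
    APPLICATION_ERROR = 12


print(StatusCode(7).name)            # NOT_FOUND_ERROR
print(int(StatusCode.SYSTEM_ERROR))  # 2
```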
-
-
exception
tkrzw.
StatusException
(status)[source]¶ Bases:
RuntimeError
Exception to convey the status of operations.
-
class
tkrzw.
TextFile
[source]¶ Bases:
object
Text file of line data.
DBM.ExportKeysAsLines outputs keys of the database into a text file. Scanning the text file is more efficient than scanning the whole database.
-
Open
(path)[source]¶ Opens a text file.
Parameters: path – A path of the file. Returns: The result status.
-
Search
(mode, pattern, capacity=0, utf=False)[source]¶ Searches the text file and gets lines which match a pattern.
Parameters: - mode – The search mode. “contain” extracts lines containing the pattern. “begin” extracts lines beginning with the pattern. “end” extracts lines ending with the pattern. “regex” extracts lines that partially match the pattern of a regular expression. “edit” extracts lines whose edit distance to the pattern is the least.
- pattern – The pattern for matching.
- capacity – The maximum records to obtain. 0 means unlimited.
- utf – If true, text is treated as UTF-8, which affects “regex” and “edit”.
Returns: A list of lines matching the condition.
-
-
class
tkrzw.
Utility
[source]¶ Bases:
object
Library utilities.
-
classmethod
EditDistanceLev
(a, b)[source]¶ Gets the Levenshtein edit distance of two Unicode strings.
Parameters: - a – A Unicode string.
- b – The other Unicode string.
Returns: The Levenshtein edit distance of the two strings.
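For reference, the Levenshtein distance itself can be computed with the classic dynamic programming recurrence. The pure-Python function below is a textbook sketch to show what the returned number means; it is not the library's implementation, and the name `edit_distance_lev` is only chosen to echo the method above.

```python
def edit_distance_lev(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


print(edit_distance_lev("kitten", "sitting"))  # 3
```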
-
classmethod
GetMemoryUsage
()[source]¶ Gets the current memory usage of the process.
Returns: The current memory usage of the process.
-
INT32MAX
= 2147483647¶ The maximum value of int32.
-
INT32MIN
= -2147483648¶ The minimum value of int32.
-
INT64MAX
= 9223372036854775807¶ The maximum value of int64.
-
INT64MIN
= -9223372036854775808¶ The minimum value of int64.
-
classmethod
PrimaryHash
(data, num_buckets=None)[source]¶ Primary hash function for the hash database.
Parameters: - data – The data to calculate the hash value for.
- num_buckets – The number of buckets of the hash table. If it is omitted, 1<<64 is set.
Returns: The hash value.
-
classmethod
SecondaryHash
(data, num_shards=None)[source]¶ Secondary hash function for sharding.
Parameters: - data – The data to calculate the hash value for.
- num_shards – The number of shards. If it is omitted, 1<<64 is set.
Returns: The hash value.
-
UINT32MAX
= 4294967295¶ The maximum value of uint32.
-
UINT64MAX
= 18446744073709551615¶ The maximum value of uint64.
-
VERSION
= '0.0.0'¶ The package version numbers.
-
classmethod