BelNytheraSeiche.TrieDictionary
A collection of high-performance, memory-efficient dictionary and data storage classes for .NET, specializing in advanced search capabilities.
Features
- Multiple High-Performance Dictionary Implementations:
- Mutable: Double-Array Trie (
DoubleArrayDictionary
) - Read-Only & Memory-Efficient: LOUDS Trie (
LoudsDictionary
)
- Mutable: Double-Array Trie (
- Advanced Search Capabilities:
- Exact match, prefix search, common prefix search, longest prefix search, and wildcard search.
- Supports both Left-to-Right (LTR) and Right-to-Left (RTL) search directions.
- Flexible Record Storage:
- Each key can map to a list of records.
- Supports both persistent (serializable) and transient (in-memory) record lists for each key.
- Low-Level Storage Primitives:
- Includes various record store backends like
PrimitiveRecordStore
and several balanced binary trees (AVLTree
,AATree
, etc.).
- Includes various record store backends like
- Serialization Support:
- Robust serialization and deserialization for all dictionary and store types.
- Utility Classes:
- Includes high-performance utilities like
ValueBuffer<T>
, rank/select bit-set and fast random number generators.
- Includes high-performance utilities like
Installation
This library is available on NuGet. You can install it using the .NET CLI:
dotnet add package BelNytheraSeiche.TrieDictionary
Or via the NuGet Package Manager Console:
Install-Package BelNytheraSeiche.TrieDictionary
Quick Start
Here's a quick example of how to use the DoubleArrayDictionary
, which is a mutable (add/remove capable) dictionary.
using System;
using System.Text;
using BelNytheraSeiche.TrieDictionary;
// 1. Create a new dictionary.
var dict = KeyRecordDictionary.Create<DoubleArrayDictionary>();
// Utility: Create a wrapper, it converts string to byte-array automatically.
var ssdict = dict.AsStringSpecialized(Encoding.UTF8);
// 2. Add keys.
Console.WriteLine("Adding keys...");
ssdict.Add("apple"); // or dict.Add(Encoding.UTF8.GetBytes("apple"));
ssdict.Add("apricot"); // or dict.Add(Encoding.UTF8.GetBytes("apricot"));
ssdict.Add("banana"); // or dict.Add(Encoding.UTF8.GetBytes("banana"));
// 3. Perform a search.
var identifier1 = ssdict.SearchExactly("apple"); // or dict.SearchExactly(Encoding.UTF8.GetBytes("apple"));
if (identifier1 > 0)
{
Console.WriteLine($"Found 'apple' with id: {identifier1}");
// 4. Add records to the key.
var records = ssdict.GetRecordAccess(identifier1);
// add a record and append another record after it.
records.Add(Encoding.UTF8.GetBytes("This is a record for apple.")).InsertAfter(Encoding.UTF8.GetBytes("Hello World."));
}
// 5. Perform a prefix search.
Console.WriteLine("\nKeys starting with 'ap':");
foreach (var (identifier2, key) in ssdict.SearchByPrefix("ap")) // or dict.SearchByPrefix(Encoding.UTF8.GetBytes("ap")
{
Console.WriteLine($"- Found: '{key}' (id: {identifier2})");
foreach (var rec in ssdict.GetRecordAccess(identifier2))
Console.WriteLine($" Record: {Encoding.UTF8.GetString(rec.Content)}");
}
For more detailed examples covering all major classes in the library, please see the full sample file:
➡️ View Program.cs
for More Examples
Performance
The following benchmarks were run on [Snapdragon X Plus - X1P42100] with .NET 9.0.
The test data used was the words_alpha.txt
file from the dwyl/english-words
repository, containing 370,105 English words.
Time & Memory Usage
This benchmark measures the time taken to perform an exact match search for a single, existing key.
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
DoubleArrayDictionary | 40.99 ns | 0.083 ns | 0.069 ns | - |
BitwiseVectorDictionary | 56.89 ns | 0.153 ns | 0.143 ns | - |
LoudsDictionary | 1,477.55 ns | 1.041 ns | 0.974 ns | - |
DirectedAcyclicGraphDictionary | 357.80 ns | 2.610 ns | 2.441 ns | - |
This benchmark measures the time taken to perform an exact match search for all, existing keys.
Method | Mean | Error | StdDev | Allocated |
---|---|---|---|---|
DoubleArrayDictionary | 15.17 ms | 0.026 ms | 0.024 ms | - |
BitwiseVectorDictionary | 21.91 ms | 0.436 ms | 0.535 ms | - |
LoudsDictionary | 584.83 ms | 4.851 ms | 4.051 ms | - |
DirectedAcyclicGraphDictionary | 144.80 ms | 2.440 ms | 2.163 ms | - |
This benchmark measures the time taken to enumerate all keys in the dictionary using EnumerateAll()
.
Method | Mean | Error | StdDev | Gen0 | Allocated |
---|---|---|---|---|---|
DoubleArrayDictionary | 937.92 ms | 1.204 ms | 1.126 ms | 20000.0000 | 81.96 MB |
BitwiseVectorDictionary | 76.33 ms | 0.513 ms | 0.455 ms | 13857.1429 | 55.35 MB |
LoudsDictionary | 893.20 ms | 0.971 ms | 0.861 ms | 13000.0000 | 55.35 MB |
DirectedAcyclicGraphDictionary | 338.85 ms | 5.918 ms | 5.535 ms | 44000.0000 | 178.18 MB |
This benchmark measures the total time taken to SearchByPrefix("a")
through SearchByPrefix("z")
.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
---|---|---|---|---|---|---|
DoubleArrayDictionary | 464.38 ms | 0.991 ms | 0.927 ms | 11000.0000 | - | 44.50 MB |
BitwiseVectorDictionary | 46.45 ms | 0.104 ms | 0.097 ms | 13909.0909 | 818.1818 | 55.76 MB |
LoudsDictionary | 252.94 ms | 0.533 ms | 0.473 ms | 13500.0000 | 500.0000 | 55.76 MB |
DirectedAcyclicGraphDictionary | 100.25 ms | 0.202 ms | 0.179 ms | 21200.0000 | 600.0000 | 84.76 MB |
This benchmark measures the final file size after serializing a dictionary built from the source key set.
Method | Serialized |
---|---|
DoubleArrayDictionary | 5.56 MB |
BitwiseVectorDictionary | 4.39 MB |
LoudsDictionary | 0.82 MB |
DirectedAcyclicGraphDictionary | 13.61 MB |
This benchmark measures the time and memory required to build the dictionary from the source key set.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
DoubleArrayDictionary | 454.6 ms | 3.53 ms | 3.30 ms | 80000.0000 | 1000.0000 | 1000.0000 | 361.07 MB |
BitwiseVectorDictionary | 281.9 ms | 3.39 ms | 3.17 ms | 6000.0000 | 3000.0000 | 1000.0000 | 85.41 MB |
LoudsDictionary | 407.9 ms | 3.30 ms | 2.58 ms | 7000.0000 | 5000.0000 | 2000.0000 | 141.62 MB |
DirectedAcyclicGraphDictionary | 967.5 ms | 19.02 ms | 18.68 ms | 53000.0000 | 16000.0000 | 3000.0000 | 501.56 MB |
This benchmark measures the time and memory required to serialize the built dictionary with the default options.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
DoubleArrayDictionary | 82.647 ms | 0.9356 ms | 0.8294 ms | 571.4286 | 571.4286 | 571.4286 | 46.45 MB |
BitwiseVectorDictionary | 55.513 ms | 1.0995 ms | 1.1764 ms | 444.4444 | 444.4444 | 444.4444 | 34.38 MB |
LoudsDictionary | 8.208 ms | 0.0719 ms | 0.0601 ms | 125.0000 | 78.1250 | 78.1250 | 6.10 MB |
DirectedAcyclicGraphDictionary | 653.322 ms | 1.1429 ms | 1.0691 ms | - | - | - | 109.78 MB |
This benchmark measures the time and memory required to deserialize from the serialized dictionary.
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
DoubleArrayDictionary | 69.21 ms | 1.266 ms | 1.185 ms | 250.0000 | 250.0000 | 250.0000 | 18.75 MB |
BitwiseVectorDictionary | 49.49 ms | 0.090 ms | 0.084 ms | 500.0000 | 500.0000 | 500.0000 | 22.05 MB |
LoudsDictionary | 16.94 ms | 0.239 ms | 0.223 ms | 500.0000 | 468.7500 | 468.7500 | 19.21 MB |
DirectedAcyclicGraphDictionary | 243.02 ms | 2.419 ms | 2.263 ms | 12000.0000 | 6666.6667 | 1666.6667 | 94.74 MB |
API Overview
This library is composed of several key components:
KeyRecordDictionary
: The abstract base class for all high-level dictionaries. It defines the core architecture, including the dual record storage system (persistent/transient) and serialization patterns.Dictionary Implementations:
DoubleArrayDictionary
: A mutable trie. Ideal for scenarios where keys need to be added or removed dynamically.BitwiseVectorDictionary
: A read-only, high-performance dictionary. An alternative to LOUDS, also great for static key sets.LoudsDictionary
: A read-only, memory-efficient trie. An alternative to DAWG, also great for static key sets.DirectedAcyclicGraphDictionary
(DAWG): A read-only dictionary that compresses a trie by merging common nodes.
- Record Stores:
PrimitiveRecordStore
: A low-level, append-only store for raw byte records. Used as the default persistent storage.BasicRecordStore
Implementations (HashMapRecordStore
,AVLTreeRecordStore
, etc.): In-memory stores used for transient data or other specialized use cases.
For a complete reference of all public classes and methods, please see our full API Reference.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Project Website
Main Page - https://github.com/belnytheraseiche/TrieDictionary/
Reference - https://belnytheraseiche.github.io/TrieDictionary/