In this article, we will see how to build a basic yet efficient Least Recently Used (LRU) cache system in Rust, complete with an eviction policy and customizable eviction callbacks.
Starting with the foundational concepts of caching and the significance of the LRU strategy, we progressively enhance a basic cache implementation.
We introduce advanced features like efficient tracking of item usage, concurrency support, and the integration of eviction callbacks to handle resource cleanup upon item removal.
Let’s dive in! 🦀
The Basics
A basic cache in Rust is a struct wrapping a HashMap of key-value pairs, optionally paired with some mechanism that tracks the age or usage of items for eviction purposes (as in a Least Recently Used (LRU) cache).
For simplicity, I’ll show you how to implement a basic cache system without any eviction policy. This cache will store key-value pairs and won’t automatically remove old or less frequently used items.
```rust
use std::collections::HashMap;
use std::hash::Hash;

pub struct Cache<K, V>
where
    K: Eq + Hash,
{
    storage: HashMap<K, V>,
}

impl<K, V> Cache<K, V>
where
    K: Eq + Hash,
{
    pub fn new() -> Self {
        Cache {
            storage: HashMap::new(),
        }
    }

    pub fn set(&mut self, key: K, value: V) {
        self.storage.insert(key, value);
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.storage.get(key)
    }
}

fn main() {
    let mut cache = Cache::new();
    cache.set("key1", "value1");
    cache.set("key2", "value2");

    if let Some(value) = cache.get(&"key1") {
        println!("Found value: {}", value);
    } else {
        println!("Value not found");
    }

    match cache.get(&"key3") {
        Some(value) => println!("Found value: {}", value),
        None => println!("Value not found"),
    }
}
```
This code defines a Cache struct that uses a HashMap for storage. The Cache struct has methods for creating a new cache (new), inserting a key-value pair (set), and retrieving a value by key (get). The main function demonstrates how to use the cache to store and retrieve values.
Implementing an eviction policy
To implement a cache system with an eviction policy in Rust, we’ll use the Least Recently Used (LRU) policy as an example. An LRU cache evicts the least recently accessed item when the cache reaches its capacity to make room for new items.
We’ll enhance the previous example by giving the cache a capacity and tracking item usage using a combination of a HashMap and a VecDeque (a double-ended queue). The VecDeque will track the order of item usage, with the most recently used items at the front and the least recently used items at the back.
```rust
use std::collections::{HashMap, VecDeque};
use std::hash::Hash;

pub struct LRUCache<K, V>
where
    K: Eq + Hash + Clone,
{
    capacity: usize,
    storage: HashMap<K, V>,
    usage_order: VecDeque<K>,
}

impl<K, V> LRUCache<K, V>
where
    K: Eq + Hash + Clone,
{
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "Cache capacity must be greater than 0");
        LRUCache {
            capacity,
            storage: HashMap::new(),
            usage_order: VecDeque::new(),
        }
    }

    pub fn set(&mut self, key: K, value: V) {
        self.storage.insert(key.clone(), value);
        self.update_usage(&key);
        if self.storage.len() > self.capacity {
            if let Some(least_recently_used) = self.usage_order.pop_back() {
                self.storage.remove(&least_recently_used);
            }
        }
    }

    pub fn get(&mut self, key: &K) -> Option<&V> {
        if self.storage.contains_key(key) {
            self.update_usage(key);
            self.storage.get(key)
        } else {
            None
        }
    }

    fn update_usage(&mut self, key: &K) {
        // Remove the key if it already exists in the usage_order.
        self.usage_order.retain(|existing_key| existing_key != key);
        // Insert the key at the front to mark it as recently used.
        self.usage_order.push_front(key.clone());
    }
}

fn main() {
    // Example usage of the LRU cache.
    let mut cache = LRUCache::new(2);
    cache.set("key1", "value1");
    cache.set("key2", "value2");

    println!("Retrieved: {:?}", cache.get(&"key1")); // Should update key1's position

    cache.set("key3", "value3"); // This should evict key2

    // Trying to retrieve key2 should yield None since it was evicted.
    match cache.get(&"key2") {
        Some(value) => println!("Retrieved: {:?}", value),
        None => println!("Key2 was evicted"),
    }
}
```
This implementation defines an `LRUCache` struct that includes the cache's capacity, a `HashMap` for storage, and a `VecDeque` to track the usage order of the keys. The `set` method inserts or updates a key-value pair and ensures the cache does not exceed its specified capacity by evicting the least recently used item if necessary. The `get` method retrieves a value by its key and updates the usage order to reflect the recent access. The `update_usage` method manages the position of keys in the `VecDeque` to maintain the correct usage order.
This is a basic implementation and might need optimizations or modifications based on specific requirements, such as thread safety in concurrent environments.
Here’s a more advanced example implementing some of these concepts, focusing on the doubly linked list for efficient tracking:
```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct Node<K, V> {
    key: K,
    value: V,
    prev: Option<Weak<RefCell<Node<K, V>>>>,
    next: Option<Rc<RefCell<Node<K, V>>>>,
}

impl<K, V> Node<K, V> {
    fn new(key: K, value: V) -> Rc<RefCell<Self>> {
        Rc::new(RefCell::new(Node {
            key,
            value,
            prev: None,
            next: None,
        }))
    }
}

pub struct LRUCache<K, V>
where
    K: Eq + Hash + Clone,
{
    capacity: usize,
    map: HashMap<K, Rc<RefCell<Node<K, V>>>>,
    head: Option<Rc<RefCell<Node<K, V>>>>,
    tail: Option<Rc<RefCell<Node<K, V>>>>,
}

impl<K, V> LRUCache<K, V>
where
    K: Eq + Hash + Clone,
    V: Clone,
{
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "Cache capacity must be greater than 0");
        LRUCache {
            capacity,
            map: HashMap::new(),
            head: None,
            tail: None,
        }
    }

    pub fn set(&mut self, key: K, value: V) {
        if let Some(node) = self.map.get(&key).cloned() {
            node.borrow_mut().value = value;
            // Move the existing node to the front: unlink it first.
            self.detach(&node);
            self.attach(node);
        } else {
            let new_node = Node::new(key.clone(), value);
            self.map.insert(key, new_node.clone());
            self.attach(new_node);
            if self.map.len() > self.capacity {
                self.evict_tail();
            }
        }
    }

    pub fn get(&mut self, key: &K) -> Option<V> {
        let node = self.map.get(key).cloned()?;
        let value = node.borrow().value.clone();
        // Mark the node as most recently used.
        self.detach(&node);
        self.attach(node);
        Some(value)
    }

    /// Unlinks `node` from the list, repairing its neighbours' pointers.
    fn detach(&mut self, node: &Rc<RefCell<Node<K, V>>>) {
        let prev = node.borrow_mut().prev.take().and_then(|w| w.upgrade());
        let next = node.borrow_mut().next.take();
        match (&prev, &next) {
            (Some(p), Some(n)) => {
                p.borrow_mut().next = Some(n.clone());
                n.borrow_mut().prev = Some(Rc::downgrade(p));
            }
            (Some(p), None) => {
                // `node` was the tail.
                p.borrow_mut().next = None;
                self.tail = Some(p.clone());
            }
            (None, Some(n)) => {
                // `node` was the head.
                n.borrow_mut().prev = None;
                self.head = Some(n.clone());
            }
            (None, None) => {
                // `node` was the only element (or not linked yet).
                if self.head.as_ref().map_or(false, |h| Rc::ptr_eq(h, node)) {
                    self.head = None;
                    self.tail = None;
                }
            }
        }
    }

    /// Pushes a detached node to the front of the list.
    fn attach(&mut self, node: Rc<RefCell<Node<K, V>>>) {
        if let Some(ref head) = self.head {
            head.borrow_mut().prev = Some(Rc::downgrade(&node));
        }
        node.borrow_mut().next = self.head.take();
        self.head = Some(node.clone());
        if self.tail.is_none() {
            self.tail = Some(node);
        }
    }

    /// Removes the least recently used node (the tail) from list and map.
    fn evict_tail(&mut self) {
        if let Some(tail) = self.tail.clone() {
            let key = tail.borrow().key.clone();
            self.detach(&tail);
            self.map.remove(&key);
        }
    }
}

fn main() {
    let mut cache = LRUCache::new(2);
    cache.set("key1", "value1");
    cache.set("key2", "value2");

    println!("Retrieved: {:?}", cache.get(&"key1")); // Access key1, making it most recently used

    cache.set("key3", "value3"); // Evicts key2 as it is the least recently used
    match cache.get(&"key2") {
        Some(value) => println!("Retrieved: {:?}", value),
        None => println!("Key2 was evicted"), // Expected outcome
    }
}
```
This advanced example introduces a custom doubly linked list for tracking the usage order, replacing the O(n) reordering of the VecDeque approach with O(1) pointer updates. Each node in the list contains a key-value pair, and the list maintains pointers to the most and least recently used items. When an item is accessed or added, it is detached from its current position and moved to the front of the list. If the cache exceeds its capacity, the item at the end of the list (the least recently used item) is removed.
Incorporating eviction callbacks
To incorporate eviction callbacks into the LRU cache implementation, you can modify the Node structure to include an optional callback function that gets called when a node is evicted. In Rust, you can use Box<dyn Fn()> to store a closure that takes no parameters and returns nothing, which is suitable for our eviction callback.
Here’s an updated version of the LRU cache that includes eviction callbacks:
```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::rc::{Rc, Weak};
use std::cell::RefCell;

struct Node<K, V> {
    key: K,
    value: V,
    prev: Option<Weak<RefCell<Node<K, V>>>>,
    next: Option<Rc<RefCell<Node<K, V>>>>,
    on_evict: Option<Box<dyn Fn()>>,
}

impl<K, V> Node<K, V> {
    fn new(key: K, value: V, on_evict: Option<Box<dyn Fn()>>) -> Rc<RefCell<Self>> {
        Rc::new(RefCell::new(Node {
            key,
            value,
            prev: None,
            next: None,
            on_evict,
        }))
    }
}

pub struct LRUCache<K, V>
where
    K: Eq + Hash + Clone,
{
    capacity: usize,
    map: HashMap<K, Rc<RefCell<Node<K, V>>>>,
    head: Option<Rc<RefCell<Node<K, V>>>>,
    tail: Option<Rc<RefCell<Node<K, V>>>>,
}

impl<K, V> LRUCache<K, V>
where
    K: Eq + Hash + Clone,
    V: Clone,
{
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "Cache capacity must be greater than 0");
        LRUCache {
            capacity,
            map: HashMap::new(),
            head: None,
            tail: None,
        }
    }

    pub fn set(&mut self, key: K, value: V, on_evict: Option<Box<dyn Fn()>>) {
        if let Some(node) = self.map.get(&key).cloned() {
            {
                let mut node_ref = node.borrow_mut();
                node_ref.value = value;
                node_ref.on_evict = on_evict; // Replace the callback as well.
            }
            self.detach(&node);
            self.attach(node);
        } else {
            let new_node = Node::new(key.clone(), value, on_evict);
            self.map.insert(key, new_node.clone());
            self.attach(new_node);
            if self.map.len() > self.capacity {
                self.evict_tail();
            }
        }
    }

    pub fn get(&mut self, key: &K) -> Option<V> {
        let node = self.map.get(key).cloned()?;
        let value = node.borrow().value.clone();
        self.detach(&node);
        self.attach(node);
        Some(value)
    }

    /// Unlinks `node` from the list, repairing its neighbours' pointers.
    fn detach(&mut self, node: &Rc<RefCell<Node<K, V>>>) {
        let prev = node.borrow_mut().prev.take().and_then(|w| w.upgrade());
        let next = node.borrow_mut().next.take();
        match (&prev, &next) {
            (Some(p), Some(n)) => {
                p.borrow_mut().next = Some(n.clone());
                n.borrow_mut().prev = Some(Rc::downgrade(p));
            }
            (Some(p), None) => {
                // `node` was the tail.
                p.borrow_mut().next = None;
                self.tail = Some(p.clone());
            }
            (None, Some(n)) => {
                // `node` was the head.
                n.borrow_mut().prev = None;
                self.head = Some(n.clone());
            }
            (None, None) => {
                // `node` was the only element (or not linked yet).
                if self.head.as_ref().map_or(false, |h| Rc::ptr_eq(h, node)) {
                    self.head = None;
                    self.tail = None;
                }
            }
        }
    }

    /// Pushes a detached node to the front of the list.
    fn attach(&mut self, node: Rc<RefCell<Node<K, V>>>) {
        if let Some(ref head) = self.head {
            head.borrow_mut().prev = Some(Rc::downgrade(&node));
        }
        node.borrow_mut().next = self.head.take();
        self.head = Some(node.clone());
        if self.tail.is_none() {
            self.tail = Some(node);
        }
    }

    /// Removes the least recently used node, firing its eviction callback first.
    fn evict_tail(&mut self) {
        if let Some(tail) = self.tail.clone() {
            let key = tail.borrow().key.clone();
            if let Some(callback) = &tail.borrow().on_evict {
                callback();
            }
            self.detach(&tail);
            self.map.remove(&key);
        }
    }
}

fn main() {
    let mut cache = LRUCache::new(2);
    cache.set("key1", "value1", Some(Box::new(|| println!("Evicting key1"))));
    cache.set("key2", "value2", Some(Box::new(|| println!("Evicting key2"))));

    println!("Retrieved: {:?}", cache.get(&"key1"));

    cache.set("key3", "value3", Some(Box::new(|| println!("Evicting key3")))); // Evicts key2
    match cache.get(&"key2") {
        Some(value) => println!("Retrieved: {:?}", value),
        None => println!("Key2 was evicted"),
    }
}
```
In this version, the Node struct includes an on_evict field that holds an optional eviction callback. The set method accepts an additional parameter, on_evict, which is a closure that gets executed when the node is evicted. When the cache exceeds its capacity and needs to evict the least recently used item, it checks if there's an eviction callback for the node being removed and executes it if present.