
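LRU Caching in Python: Applications and Examples

2025-10-09 08:43:37 Author: AI手记叨叨礼拜天

LRU (Least Recently Used) is an efficient cache eviction algorithm. Maintaining access order with an OrderedDict gives get/put operations O(1) time complexity. It suits web applications and configuration management, but not scenarios that require strong consistency or very large datasets.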

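I. What Is LRU

LRU (Least Recently Used) is a widely used cache eviction algorithm that decides which data to remove when the cache runs out of space.

Core Idea

If a piece of data was accessed recently, it is more likely to be accessed again in the near future. So when the cache is full, the data that has gone unaccessed the longest should be evicted first.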

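How It Works

On data access

If the data is in the cache (a cache hit), it is marked as "most recently used" and moved to the front of the cache (or the back, depending on the implementation).
If the data is not in the cache (a cache miss), it is loaded from the original data source.

When the cache is full

When new data must be inserted, the entry that has gone unaccessed the longest (the LRU entry) is removed, and the new data is then inserted at the most recent position.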
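
The hit, miss, and eviction steps above can be sketched as a small read-through helper. This is only an illustration: `load_from_source`, `read_through`, and the `CAPACITY` value are hypothetical names standing in for a real data source and size limit.

```python
from collections import OrderedDict

CAPACITY = 2              # assumed limit, for illustration only
cache = OrderedDict()     # oldest entry first, newest entry last
loads = []                # records every trip to the "data source"


def load_from_source(key):
    """Hypothetical slow lookup (database, API, file, ...)."""
    loads.append(key)
    return key.upper()


def read_through(key):
    if key in cache:                  # cache hit
        cache.move_to_end(key)        # mark as most recently used
        return cache[key]
    value = load_from_source(key)     # cache miss: go to the source
    cache[key] = value                # insert at the newest position
    if len(cache) > CAPACITY:         # over capacity
        cache.popitem(last=False)     # evict the least recently used entry
    return value


read_through("a")   # miss, loaded from the source
read_through("b")   # miss, loaded from the source
read_through("a")   # hit, "a" becomes the newest entry
read_through("c")   # miss, evicts "b" (the least recently used)
print(list(cache))  # ['a', 'c']
print(loads)        # ['a', 'b', 'c']
```

Note that the second request for "a" never reaches the data source, which is the whole point of the cache.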

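Key Properties

Fixed capacity: the cache size is bounded, so memory cannot grow without limit.
Automatic eviction: when the cache is full, the least recently used entry is removed.
Fast access: get() and put() both run in O(1) time on average.
Access order is maintained: every access or update moves the entry to the most recent position.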
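
To see why both operations can be O(1), here is a minimal hand-written sketch that pairs a dict (for lookup) with a doubly linked list (for ordering), the classic way to build an LRU cache without OrderedDict. The names `_Node` and `LinkedLRU` are made up for this illustration and are not part of the article's implementation.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None


class LinkedLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}              # key -> node, O(1) average lookup
        self.head = _Node()          # sentinel on the oldest side
        self.tail = _Node()          # sentinel on the newest side
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        # Detach a node from the list in O(1).
        node.prev.next, node.next.prev = node.next, node.prev

    def _append(self, node):
        # Attach a node just before the tail sentinel (newest position).
        last = self.tail.prev
        last.next, node.prev = node, last
        node.next, self.tail.prev = self.tail, node

    def get(self, key):
        node = self.nodes.get(key)
        if node is None:
            return None
        self._unlink(node)
        self._append(node)           # now the most recently used
        return node.value

    def put(self, key, value):
        node = self.nodes.get(key)
        if node is not None:         # update an existing entry
            node.value = value
            self._unlink(node)
            self._append(node)
            return
        node = _Node(key, value)
        self.nodes[key] = node
        self._append(node)
        if len(self.nodes) > self.capacity:
            oldest = self.head.next  # least recently used entry
            self._unlink(oldest)
            del self.nodes[oldest.key]


lru = LinkedLRU(2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")          # "a" is now the newest entry
lru.put("c", 3)       # evicts "b"
print(lru.get("b"))   # None
print(lru.get("a"))   # 1
```

Every step is a dict operation or a fixed number of pointer updates, so nothing depends on the number of cached entries.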

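II. Core Implementation

1. Data Structure

An OrderedDict stores the key-value pairs and maintains the access order:

The most recently accessed entry sits at the end of the dictionary.
The least recently accessed entry sits at the beginning of the dictionary.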
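
The two OrderedDict methods that make this ordering work are move_to_end() and popitem(last=False). A short demonstration, using throwaway keys chosen only for illustration:

```python
from collections import OrderedDict

d = OrderedDict()
d["a"] = 1
d["b"] = 2
d["c"] = 3
print(list(d))                   # ['a', 'b', 'c']

d.move_to_end("a")               # "a" was just accessed
print(list(d))                   # ['b', 'c', 'a']

d["b"] = 20                      # assigning to an existing key keeps its position
print(list(d))                   # ['b', 'c', 'a']

oldest = d.popitem(last=False)   # remove from the front (the oldest entry)
print(oldest)                    # ('b', 20)
print(list(d))                   # ['c', 'a']
```

Because plain assignment does not reorder an existing key, an LRU put() has to call move_to_end() explicitly when it updates an entry.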

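2. Key Methods

__init__(self, capacity)

Initializes the cache and sets its maximum capacity.

Parameter: capacity (int): the maximum number of entries in the cache.
Example:

cache = LatestCache(1000)  # holds at most 1000 entries

get(self, key)

Returns the cached value, or None if the key is absent.

Parameter: key: the key to look up.
Return value: the corresponding value if the key exists; otherwise None (so a stored None cannot be told apart from a miss).
Example:

value = cache.get("some_key")

put(self, key, value)

Adds or updates a key-value pair in the cache.

Parameters: key: the key to store; value: the value to store.
Behavior: if key already exists, its value is updated and it is moved to the most recent position; if the insertion pushes the cache over capacity, the least recently used entry is removed.
Example:

cache.put("some_key", "some_value")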

III. Usage Examples

1. Basic Usage

from collections import OrderedDict

class LatestCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        self.cache.move_to_end(key)  # move to the most recent position
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)  # an update also counts as a use
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # remove the least recently used entry


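# Initialize the cache
cache = LatestCache(3)

# Add data
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)

# Look up data
print(cache.get("a"))  # prints: 1

# Automatic eviction when the cache is full
cache.put("d", 4)      # evicts "b", the least recently used key
print(cache.get("b"))  # prints: None (already evicted)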

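2. When to Use It

Frequent reads, infrequent writes: configuration caches, static data caches.
Avoiding repeated computation: function result caching.
Optimizing database/API queries: caching hot data to cut I/O overhead.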
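
For the function-result case in particular, Python's standard library already provides an LRU cache as a decorator, functools.lru_cache, so a hand-written class is not always needed. A minimal sketch, where `slow_square` is a hypothetical stand-in for an expensive computation and the `calls` counter exists only to show when the function body actually runs:

```python
from functools import lru_cache

calls = 0


@lru_cache(maxsize=128)       # keep at most 128 distinct results
def slow_square(n):
    global calls
    calls += 1                # counts real executions, not cached returns
    return n * n


slow_square(4)                # computed
slow_square(4)                # served from the cache
slow_square(5)                # computed

info = slow_square.cache_info()
print(calls)                     # 2
print(info.hits, info.misses)    # 1 2
```

lru_cache requires the function's arguments to be hashable, and it caches per argument combination, so it fits pure functions better than functions with side effects.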

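IV. Suggested Improvements

1. Thread Safety

The implementation above is not thread-safe; concurrent access from multiple threads can cause data races. One fix is to guard each operation with a threading.RLock:

from collections import OrderedDict
from threading import RLock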

class LatestCache:
    def __init__(self, capacity):
        self._lock = RLock()
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key not in self.cache:
                return None
            self.cache.move_to_end(key)
            return self.cache[key]

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)
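
A quick way to exercise the locked version is to hammer it from several threads and check that the capacity limit still holds afterwards. The sketch below repeats a condensed copy of the class so that it runs on its own; the `worker` function, the thread count, and the key range are arbitrary choices for illustration.

```python
from collections import OrderedDict
from threading import RLock, Thread


class LatestCache:
    def __init__(self, capacity):
        self._lock = RLock()
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key not in self.cache:
                return None
            self.cache.move_to_end(key)
            return self.cache[key]

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)


cache = LatestCache(50)


def worker(offset):
    # Each thread writes and reads 1000 times over 200 possible keys.
    for i in range(1000):
        key = (i + offset) % 200
        cache.put(key, i)
        cache.get(key)


threads = [Thread(target=worker, args=(n * 10,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(cache.cache))   # 50
```

A clean run like this does not prove thread safety; it only shows that the invariant survives one stress run. The guarantee comes from every read-modify-write sequence happening while the lock is held.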

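2. Cache Hit-Rate Statistics

Adding hits and misses counters makes it possible to evaluate how effective the cache is:

hits: the number of lookups served from the cache
misses: the number of lookups the cache could not serve
cache: the cache storage, an OrderedDict whose key order tracks recency of use
capacity: the maximum capacity of the cache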

from threading import RLock
from collections import OrderedDict


class LatestCache:
    def __init__(self, capacity):
        self._lock = RLock()
        self.hits = 0
        self.misses = 0
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        with self._lock:
            if key in self.cache:
                self.hits += 1
                self.cache.move_to_end(key)
                return self.cache[key]
            self.misses += 1
            return None

    def put(self, key, value):
        with self._lock:
            if key in self.cache:
                self.cache.move_to_end(key)
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)

    def hit_rate(self):
        with self._lock:
            total = self.hits + self.misses
            return (self.hits / total) if total > 0 else 0.0


# Initialize the cache
cache = LatestCache(3)

# Add data
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)

# Look up data
print(cache.get("a"))  # hit, prints: 1
print(cache.get("b"))  # hit, prints: 2
print(cache.get("a"))  # hit, prints: 1
print(cache.get("x"))  # miss, prints: None

# Automatic eviction when the cache is full
cache.put("d", 4)  # evicts "c", the least recently used key
print(cache.get("c"))  # miss (already evicted), prints: None

# Hit-rate statistics
print(f"Hits: {cache.hits}")  # 3 hits (a, b, a)
print(f"Misses: {cache.misses}")  # 2 misses (x, c)
print(f"Hit rate: {cache.hit_rate():.2%}")  # 60.00% (3 hits / (3 hits + 2 misses))

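3. TTL Support

TTL (Time To Live) is how long an entry stays valid in the cache. In the implementation below, an expired entry is removed lazily, the next time it is looked up.

from collections import OrderedDict
import time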


class LatestCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None
        value, expire_time = self.cache[key]
        if expire_time is not None and time.time() > expire_time:
            del self.cache[key]  # lazily remove the expired entry
            return None
        self.cache.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value, ttl=None):
        # ttl is in seconds; None means the entry never expires.
        # Comparing against None (not truthiness) keeps ttl=0 from meaning "forever".
        expire_time = time.time() + ttl if ttl is not None else None
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = (value, expire_time)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # remove the least recently used entry


# Initialize the cache (capacity 3)
cache = LatestCache(3)

# Add data (a mix of entries with and without a TTL)
cache.put("a", 1, ttl=2)  # expires after 2 seconds
cache.put("b", 2)  # never expires
cache.put("c", 3, ttl=4)  # expires after 4 seconds

# Query immediately (all hits)
print(f"Initial query: a={cache.get('a')}, b={cache.get('b')}, c={cache.get('c')}")
# Output: Initial query: a=1, b=2, c=3

# Wait 2 seconds, after which 'a' has expired
print("Waiting 2 seconds...")
time.sleep(2)

print(f"Query: a={cache.get('a')}, b={cache.get('b')}, c={cache.get('c')}")
# Output: Query: a=None, b=2, c=3
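
Waiting on time.sleep makes expiry slow to test, and lazy removal means an expired entry can occupy a slot while a live one gets evicted. One possible variant, sketched here under assumptions, lets the cache take a clock function and purges expired entries before falling back to LRU eviction. The `TTLCache` name, the `clock` parameter, the `purge_expired` method, and the `FakeClock` helper are all additions for illustration, not part of the class above.

```python
import time
from collections import OrderedDict


class TTLCache:
    def __init__(self, capacity, clock=time.monotonic):
        self.cache = OrderedDict()
        self.capacity = capacity
        self.clock = clock            # injectable so tests can control time

    def get(self, key):
        if key not in self.cache:
            return None
        value, expire_time = self.cache[key]
        if expire_time is not None and self.clock() >= expire_time:
            del self.cache[key]       # lazily drop the expired entry
            return None
        self.cache.move_to_end(key)
        return value

    def put(self, key, value, ttl=None):
        expire_time = self.clock() + ttl if ttl is not None else None
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = (value, expire_time)
        if len(self.cache) > self.capacity:
            self.purge_expired()      # prefer dropping dead entries
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def purge_expired(self):
        now = self.clock()
        expired = [k for k, (_, exp) in self.cache.items()
                   if exp is not None and now >= exp]
        for k in expired:
            del self.cache[k]
        return len(expired)


class FakeClock:
    """Hypothetical test clock whose time is advanced by hand."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
cache = TTLCache(2, clock=clock)
cache.put("a", 1, ttl=2)
cache.put("b", 2)
cache.get("a")            # "a" is now more recently used than "b"
clock.now = 3.0           # jump past the TTL of "a" without sleeping
cache.put("c", 3)         # over capacity: expired "a" is purged, "b" survives
print(cache.get("a"))     # None
print(cache.get("b"))     # 2
print(cache.get("c"))     # 3
```

The trade-off is that purge_expired scans every entry, so a put() that overflows the cache is O(n) in this sketch rather than O(1).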

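V. Summary

1. Advantages

Simple and efficient: built on OrderedDict, with get() and put() both O(1).
Automatic eviction: the LRU policy keeps memory from growing without bound.
Easy to extend: TTL, thread safety, hit statistics, and similar features can be added.

2. Suitable Scenarios

Web applications: caching API responses and database query results.
Compute-intensive tasks: caching intermediate results to avoid recomputation.
Configuration management: caching frequently read configuration data.

3. Unsuitable Scenarios

Strong consistency requirements: a cache can leave data briefly inconsistent, for example through delayed cache updates, the choice of invalidation policy, or synchronization across a distributed environment.
Very large datasets: a single machine's memory is limited; a distributed cache such as Redis may be a better fit.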

