LMDB Source Code: The Official Introduction
2022-04-15 14:44:50

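Earlier I compared the efficiency of LMDB and RocksDB and found that LMDB is indeed faster than RocksDB for queries. So I plan to analyze LMDB systematically, from theory down to the source code.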

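After getting the code, I found it has only four key files. This article therefore translates the official English documentation, to see as a whole how the authors themselves introduce LMDB.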


The file we analyze this time is lmdb.h:

/** @file lmdb.h
* @brief Lightning memory-mapped database library
*
* @mainpage Lightning Memory-Mapped Database Manager (LMDB)
*
* @section intro_sec Introduction
* LMDB is a Btree-based database management library modeled loosely on the
* BerkeleyDB API, but much simplified. The entire database is exposed
* in a memory map, and all data fetches return data directly
* from the mapped memory, so no malloc's or memcpy's occur during
* data fetches. As such, the library is extremely simple because it
* requires no page caching layer of its own, and it is extremely high
* performance and memory-efficient. It is also fully transactional with
* full ACID semantics, and when the memory map is read-only, the
* database integrity cannot be corrupted by stray pointer writes from
* application code.
*
* The library is fully thread-aware and supports concurrent read/write
* access from multiple processes and threads. Data pages use a copy-on-
* write strategy so no active data pages are ever overwritten, which
* also provides resistance to corruption and eliminates the need of any
* special recovery procedures after a system crash. Writes are fully
* serialized; only one write transaction may be active at a time, which
* guarantees that writers can never deadlock. The database structure is
* multi-versioned so readers run with no locks; writers cannot block
* readers, and readers don't block writers.
*
* Unlike other well-known database mechanisms which use either write-ahead
* transaction logs or append-only data writes, LMDB requires no maintenance
* during operation. Both write-ahead loggers and append-only databases
* require periodic checkpointing and/or compaction of their log or database
* files otherwise they grow without bound. LMDB tracks free pages within
* the database and re-uses them for new write operations, so the database
* size does not grow without bound in normal use.
*
* The memory map can be used as a read-only or read-write map. It is
* read-only by default as this provides total immunity to corruption.
* Using read-write mode offers much higher write performance, but adds
* the possibility for stray application writes thru pointers to silently
* corrupt the database. Of course if your application code is known to
* be bug-free (...) then this is not an issue.
*
* If this is your first time using a transactional embedded key/value
* store, you may find the \ref starting page to be helpful.
*
* @section caveats_sec Caveats
* Troubleshooting the lock file, plus semaphores on BSD systems:
*
* - A broken lockfile can cause sync issues.
* Stale reader transactions left behind by an aborted program
* cause further writes to grow the database quickly, and
* stale locks can block further operation.
*
* Fix: Check for stale readers periodically, using the
* #mdb_reader_check function or the \ref mdb_stat_1 "mdb_stat" tool.
* Stale writers will be cleared automatically on some systems:
* - Windows - automatic
* - Linux, systems using POSIX mutexes with Robust option - automatic
* - not on BSD, systems using POSIX semaphores.
* Otherwise just make all programs using the database close it;
* the lockfile is always reset on first open of the environment.
*
* - On BSD systems or others configured with MDB_USE_POSIX_SEM,
* startup can fail due to semaphores owned by another userid.
*
* Fix: Open and close the database as the user which owns the
* semaphores (likely last user) or as root, while no other
* process is using the database.
*
* Restrictions/caveats (in addition to those listed for some functions):
*
* - Only the database owner should normally use the database on
* BSD systems or when otherwise configured with MDB_USE_POSIX_SEM.
* Multiple users can cause startup to fail later, as noted above.
*
* - There is normally no pure read-only mode, since readers need write
* access to locks and lock file. Exceptions: On read-only filesystems
* or with the #MDB_NOLOCK flag described under #mdb_env_open().
*
* - An LMDB configuration will often reserve considerable \b unused
* memory address space and maybe file size for future growth.
* This does not use actual memory or disk space, but users may need
* to understand the difference so they won't be scared off.
*
* - By default, in versions before 0.9.10, unused portions of the data
* file might receive garbage data from memory freed by other code.
* (This does not happen when using the #MDB_WRITEMAP flag.) As of
* 0.9.10 the default behavior is to initialize such memory before
* writing to the data file. Since there may be a slight performance
* cost due to this initialization, applications may disable it using
* the #MDB_NOMEMINIT flag. Applications handling sensitive data
* which must not be written should not use this flag. This flag is
* irrelevant when using #MDB_WRITEMAP.
*
* - A thread can only use one transaction at a time, plus any child
* transactions. Each transaction belongs to one thread. See below.
* The #MDB_NOTLS flag changes this for read-only transactions.
*
* - Use an MDB_env* in the process which opened it, not after fork().
*
* - Do not have open an LMDB database twice in the same process at
* the same time. Not even from a plain open() call - close()ing it
* breaks fcntl() advisory locking. (It is OK to reopen it after
* fork() - exec*(), since the lockfile has FD_CLOEXEC set.)
*
* - Avoid long-lived transactions. Read transactions prevent
* reuse of pages freed by newer write transactions, thus the
* database can grow quickly. Write transactions prevent
* other write transactions, since writes are serialized.
*
* - Avoid suspending a process with active transactions. These
* would then be "long-lived" as above. Also read transactions
* suspended when writers commit could sometimes see wrong data.
*
* ...when several processes can use a database concurrently:
*
* - Avoid aborting a process with an active transaction.
* The transaction becomes "long-lived" as above until a check
* for stale readers is performed or the lockfile is reset,
* since the process may not remove it from the lockfile.
*
* This does not apply to write transactions if the system clears
* stale writers, see above.
*
* - If you do that anyway, do a periodic check for stale readers. Or
* close the environment once in a while, so the lockfile can get reset.
*
* - Do not use LMDB databases on remote filesystems, even between
* processes on the same host. This breaks flock() on some OSes,
* possibly memory map sync, and certainly sync between programs
* on different hosts.
*
* - Opening a database can fail if another process is opening or
* closing it at exactly the same time.
*
* @author Howard Chu, Symas Corporation.
*
* @copyright Copyright 2011-2018 Howard Chu, Symas Corp. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted only as authorized by the OpenLDAP
* Public License.
*
* A copy of this license is available in the file LICENSE in the
* top-level directory of the distribution or, alternatively, at
* <http://www.OpenLDAP.org/license.html>.
*
* @par Derived From:
* This code is derived from btree.c written by Martin Hedenfalk.
*
* Copyright (c) 2009, 2010 Martin Hedenfalk <martin@bzero.se>
*
* Permission to use, copy, modify, and distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
* ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
* ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/

Translation and explanation:

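LMDB is a B-tree-based database management library, loosely modeled on the BerkeleyDB API but much simplified. The entire database is exposed through a memory map (mmap), and every data fetch returns data directly from the mapped memory, so no malloc or memcpy happens during reads. Because it needs no page-caching layer of its own, the library is very simple, very fast, and memory-efficient. It is also fully transactional with full ACID semantics, and when the memory map is read-only, stray pointer writes in application code cannot corrupt the database.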

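The library is fully thread-aware and supports concurrent reads and writes from multiple processes and threads. Data pages use a copy-on-write strategy, so no active data page is ever overwritten; this also makes the database resistant to corruption and removes the need for any special recovery procedure after a system crash. Writes are fully serialized: only one write transaction can be active at a time, which guarantees that writers can never deadlock. The database structure is multi-versioned, so readers run without locks: writers do not block readers, and readers do not block writers.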

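Unlike other well-known databases that use write-ahead transaction logs or append-only writes, LMDB needs no maintenance during operation. Both of those approaches require periodic checkpointing and/or compaction of their log or data files, otherwise those files grow without bound. LMDB instead tracks the free pages inside the database and reuses them for new writes, so in normal use the database size does not grow without bound.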

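The memory map can be used as a read-only or a read-write map. It is read-only by default, because that gives complete immunity to corruption. Read-write mode (the MDB_WRITEMAP flag) offers much higher write performance, but it also allows stray writes through pointers in a buggy application to silently corrupt the database.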

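If this is your first time using a transactional embedded key/value store, the Getting Started page of the official documentation may help. The caveats listed in the header are as follows.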

Troubleshooting the lock file, plus semaphores on BSD systems:

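  • A broken lock file can cause synchronization problems.

Stale reader transactions left behind by an aborted program make subsequent writes grow the database quickly, and stale locks can block further operation. Fix: check for stale readers periodically with the mdb_reader_check() function or the mdb_stat tool. Stale writers are cleared automatically on Windows and on Linux or other systems using POSIX mutexes with the robust option, but not on BSD or other systems using POSIX semaphores. Otherwise, make all programs using the database close it; the lock file is always reset on the first open of the environment.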

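A thread can use only one transaction at a time, plus any child transactions, and each transaction belongs to one thread. The MDB_NOTLS flag relaxes this for read-only transactions.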

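Do not open the same LMDB database twice in the same process at the same time, not even with a plain open() call: close()ing that second descriptor breaks fcntl() advisory locking.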

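Avoid long-lived transactions. Read transactions prevent reuse of the pages freed by newer write transactions, so the database can grow quickly. Write transactions block other write transactions, because writes are serialized.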

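Avoid suspending a process that has active transactions: they would then become "long-lived" as described above. Also, read transactions that are suspended while writers commit could sometimes see wrong data.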

When several processes can use a database concurrently:

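  • Avoid aborting a process that has an active transaction. The process may not get to remove the transaction from the lock file, so it stays "long-lived" until a stale-reader check runs or the lock file is reset.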

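If you do so anyway, check for stale readers periodically, or close the environment once in a while so that the lock file can be reset.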

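Do not use LMDB databases on remote filesystems, even between processes on the same host. This breaks flock() on some operating systems, possibly breaks memory-map synchronization, and certainly breaks synchronization between programs on different hosts.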

Opening a database can fail if another process is opening or closing it at exactly the same time.

Author: CPinging
Link: https://www.jianshu.com/p/ed07857570fd
