MongoDB in Action notes
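MongoDB is a NoSQL document database that is comparatively close to a traditional relational database, and it has been developing quickly. On one hand, it is used more and more widely: search for "mongodb ppt" and you will find piles of slide decks. On the other hand, a steady stream of new features and community requests keeps making MongoDB more complete, and also larger.
What follows are some reading notes on MongoDB in Action. They are all isolated "points," mostly about MongoDB schema design and maintenance, including: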
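- Embed versus reference: e.g., how to model posts and comments in a blog
- One-to-many: embedding or referencing
- Many-to-many: representing the many-to-many relation between products and categories (a product belongs to many categories, and a category can contain many products)
- Tree: e.g., forum comment threads
- Dynamic attributes: whether to use embedded documents
- Limits of the namespaces file
- mmap: how MongoDB uses memory
- How memory affects performance; the recommendation is to keep all indexes cached in memory
- Limits on document size
- Custom types, e.g., storing times with time zones
- How a single global reader-writer lock affects concurrency
- How the single global RW lock affects creating/rebuilding indexes
- WHEN TO SHARD
- indexes in shard
- Sharding an existing collection
- MANUAL PARTITIONING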
Embed versus reference: e.g., how to model posts and comments in a blog
Suppose you’re building a simple application in MongoDB that stores blog posts and comments. How do you represent this data? Do you embed the comments in their respective blog post documents? Or is it better to create two collections, one for posts and the other for comments, and then relate the comments to the posts with an object id reference?
This is the problem of embedding versus referencing, and it’s a common source of confusion for new users of MongoDB. Fortunately, there’s a simple rule of thumb that works for most schema design scenarios: Embed when the child objects always appear in the context of their parent. Otherwise, store the child objects in a separate collection.
What does this mean for blog posts and comments? It depends on the application. If the comments always appear within a blog post, and if they don't need to be ordered in arbitrary ways (by post date, comment rank, and so on), then embedding is fine. But if, say, you want to be able to display the most recent comments, regardless of which post they appear on, then you'll want to reference. Embedding may provide a slight performance advantage, but referencing is far more flexible. Note, however, that MongoDB limits the size of a single document, and the limit differs between versions. If you store a post and all of its comments in one document, keep an eye on its size.
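Because a post with embedded comments is a single document, the per-document size limit caps how many comments it can hold. The arithmetic below is a rough sketch: it assumes the 16 MB limit of MongoDB v2.0, ignores BSON encoding overhead, and the 2,000-byte post and 500-byte average comment are made-up sizes, not measurements. The function name is hypothetical.

```javascript
// Rough capacity estimate for embedding comments in their post document.
// Assumes the v2.0 BSON limit of 16 MB (16 * 1024 * 1024 bytes); BSON overhead is ignored.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// How many comments of `avgCommentBytes` fit next to a post body of `postBytes`.
function maxEmbeddedComments(postBytes, avgCommentBytes, limitBytes = MAX_BSON_BYTES) {
  if (postBytes >= limitBytes) return 0;
  return Math.floor((limitBytes - postBytes) / avgCommentBytes);
}

// Hypothetical sizes: a 2,000-byte post with 500-byte comments.
console.log(maxEmbeddedComments(2000, 500)); // 33550
// The same post under an older 4 MB limit.
console.log(maxEmbeddedComments(2000, 500, 4 * 1024 * 1024)); // 8384
```

Under these assumptions the limit only bites for very long threads; for ordinary posts the choice usually comes down to the query-flexibility argument instead.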
One-to-many: embedding or referencing
As stated in the previous section, you can represent a one-to-many relationship by either embedding or referencing. You should embed when the many object intrinsically belongs with its parent and rarely changes.
Embedding example: steps and sub-steps
{ title: "How to soft-boil an egg",
  steps: [
    { desc: "Bring a pot of water to boil.",
      materials: ["water", "eggs"] },
    { desc: "Gently add the eggs and cook for four minutes.",
      materials: ["egg timer"] },
    { desc: "Cool the eggs under running water." }
  ]
}
Referencing example: post and comments
A sample post document:
{ _id: ObjectId("4d650d4cf32639266022018d"),
  title: "Cultivating herbs",
  text: "Herbs require occasional watering…"
}
A sample comment document:
{ _id: ObjectId("4d650d4cf32639266022ac01"),
  post_id: ObjectId("4d650d4cf32639266022018d"),
  username: "zjones",
  text: "Indeed, basil is a hearty herb!"
}
We need an index on post_id in the comments collection:
db.comments.ensureIndex({post_id: 1})
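With referencing, displaying a post with its comments takes two lookups: the post by _id, then its comments by post_id, which is the lookup the {post_id: 1} index serves. The sketch below imitates that in memory, with a Map standing in for the index. The function names and the short string ids are hypothetical, and no MongoDB driver is involved.

```javascript
// In-memory sketch of the referencing pattern (not a MongoDB driver).
// The Map plays the role of the {post_id: 1} index on the comments collection.
function buildPostIdIndex(comments) {
  const index = new Map();
  for (const c of comments) {
    if (!index.has(c.post_id)) index.set(c.post_id, []);
    index.get(c.post_id).push(c);
  }
  return index;
}

// Two lookups, analogous to db.posts.findOne({_id: id}) followed by
// db.comments.find({post_id: id}).
function postWithComments(posts, postIdIndex, id) {
  const post = posts.find((p) => p._id === id);
  return post ? { ...post, comments: postIdIndex.get(id) ?? [] } : null;
}

const posts = [{ _id: "post1", title: "Cultivating herbs" }];
const comments = [
  { _id: "c1", post_id: "post1", username: "zjones", text: "Indeed, basil is a hearty herb!" },
  { _id: "c2", post_id: "post2", username: "someone-else", text: "hypothetical comment" },
];
const postIndex = buildPostIdIndex(comments);
console.log(postWithComments(posts, postIndex, "post1").comments.map((c) => c.username)); // [ 'zjones' ]
```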
Many-to-many: representing the many-to-many relation between products and categories (a product belongs to many categories, and a category can contain many products)
Here's a sample product from a gardening store:
{ _id: new ObjectId("4c4b1476238d3b4dd5003981"),
  slug: "wheel-barrow-9092",
  sku: "9092",
  name: "Extra Large Wheel Barrow",
  description: "Heavy duty wheel barrow…",
  details: {
    weight: 47,
    weight_units: "lbs",
    model_num: 4039283402,
    manufacturer: "Acme",
    color: "Green"
  },
  total_reviews: 4,
  average_review: 4.5,
  pricing: {
    retail: 589700,
    sale: 489700
  },
  price_history: [
    {retail: 529700,
     sale: 429700,
     start: new Date(2010, 4, 1),
     end: new Date(2010, 4, 8)
    },
    {retail: 529700,
     sale: 529700,
     start: new Date(2010, 4, 9),
     end: new Date(2010, 4, 16)
    }
  ],
  category_ids: [new ObjectId("6a5b1476238d3b4dd5000048"),
                 new ObjectId("6a5b1476238d3b4dd5000049")],
  main_cat_id: new ObjectId("6a5b1476238d3b4dd5000048"),
  tags: ["tools", "gardening", "soil"]
}
A sample category document (Listing 4.2: A category document):
{ _id: new ObjectId("6a5b1476238d3b4dd5000048"),
  slug: "gardening-tools",
  ancestors: [{ name: "Home",
                _id: new ObjectId("8b87fb1476238d3b4dd500003"),
                slug: "home"
              },
              { name: "Outdoors",
                _id: new ObjectId("9a9fb1476238d3b4dd5000001"),
                slug: "outdoors"
              }
  ],
  parent_id: new ObjectId("9a9fb1476238d3b4dd5000001"),
  name: "Gardening Tools",
  description: "Gardening gadgets galore!"
}
If a category's name rarely changes, but you frequently query for the products in a category, you can also add the category name to the product document.
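The category_ids array makes the relation queryable from both sides: products in a category (db.products.find({category_ids: id}), since a query on an array field matches any element) and categories of a product (db.categories.find({_id: {$in: product.category_ids}})). The ancestors array also yields a breadcrumb without walking up the tree. Below is an in-memory sketch of those three operations; the helper names, the short string ids, the "Soil" category and the "compost" product are hypothetical sample data.

```javascript
// In-memory sketch of queries over the product/category documents above.
// Short string ids stand in for ObjectIds; no MongoDB driver is used.

// Like db.products.find({category_ids: categoryId}): matches if ANY array element equals the id.
function productsInCategory(products, categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Like db.categories.find({_id: {$in: product.category_ids}}).
function categoriesOfProduct(categories, product) {
  return categories.filter((c) => product.category_ids.includes(c._id));
}

// Breadcrumb from the denormalized ancestors array, root first.
function breadcrumb(category) {
  return [...category.ancestors.map((a) => a.name), category.name].join(" > ");
}

const categories = [
  { _id: "cat48", name: "Gardening Tools", ancestors: [{ name: "Home" }, { name: "Outdoors" }] },
  { _id: "cat49", name: "Soil", ancestors: [{ name: "Home" }, { name: "Outdoors" }] },
];
const products = [
  { _id: "wheel-barrow", category_ids: ["cat48", "cat49"] },
  { _id: "compost", category_ids: ["cat49"] },
];

console.log(productsInCategory(products, "cat49").map((p) => p._id)); // [ 'wheel-barrow', 'compost' ]
console.log(categoriesOfProduct(categories, products[0]).length); // 2
console.log(breadcrumb(categories[0])); // Home > Outdoors > Gardening Tools
```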
Tree: e.g., forum comment threads like the ones below, where each record needs to record its parent
Each node in the tree contains a path field. This field stores the concatenation of the IDs of each of the node's ancestors, and root-level nodes have a null path because they have no ancestors.
{ _id: ObjectId("4d692b5d59e212384d95001"),
  depth: 0,
  path: null,
  created: ISODate("2011-02-26T17:18:01.251Z"),
  username: "plotinus",
  body: "Who was Alexander the Great's teacher?",
  thread_id: ObjectId("4d692b5d59e212384d95223a")
}
Now examine the first reply, and note that its path contains the _id of its immediate parent:
{ _id: ObjectId("4d692b5d59e212384d951002"),
  depth: 1,
  path: "4d692b5d59e212384d95001",
  created: ISODate("2011-02-26T17:21:01.251Z"),
  username: "asophist",
  body: "It was definitely Socrates.",
  thread_id: ObjectId("4d692b5d59e212384d95223a")
}
The next deeper comment's path contains both the IDs of the original and immediate parents, in that order and separated by a colon:
{ _id: ObjectId("4d692b5d59e212384d95003"),
  depth: 2,
  path: "4d692b5d59e212384d95001:4d692b5d59e212384d951002",
  created: ISODate("2011-02-26T17:21:01.251Z"),
  username: "daletheia",
  body: "Oh you sophist…It was actually Aristotle!",
  thread_id: ObjectId("4d692b5d59e212384d95223a")
}
The following indexes are needed:
db.comments.ensureIndex({thread_id: 1})
db.comments.ensureIndex({path: 1})
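The path field supports three operations: computing a new reply's path from its parent, finding a whole subtree by path prefix (the kind of left-anchored lookup the {path: 1} index can help with), and ordering a thread depth-first. The in-memory sketch below shows all three; the function names and the two-character ids are hypothetical, and it assumes every id has the same length, as ObjectIds do.

```javascript
// In-memory sketch of the materialized-path pattern (not a MongoDB driver).
// `path` holds the ancestors' ids joined by ":"; it is null for a root comment.

// Path to give a new reply to `parent`.
function childPath(parent) {
  return parent.path === null ? parent._id : `${parent.path}:${parent._id}`;
}

// All replies below `node`, at any depth: their path equals or extends node's child path.
// With fixed-length ids this selects the same comments as a left-anchored regex
// query on path, e.g. {path: /^<prefix>/}.
function descendants(comments, node) {
  const prefix = childPath(node);
  return comments.filter(
    (c) => c.path !== null && (c.path === prefix || c.path.startsWith(prefix + ":"))
  );
}

// Depth-first display order: sort by the full path including each comment's own id.
// Correct only when all ids have the same length.
function threadOrder(comments) {
  return [...comments].sort((a, b) => {
    const ka = childPath(a);
    const kb = childPath(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

const root = { _id: "r1", path: null };
const reply = { _id: "a1", path: childPath(root) };   // "r1"
const nested = { _id: "b1", path: childPath(reply) }; // "r1:a1"
const sibling = { _id: "c1", path: childPath(root) }; // "r1"
const thread = [sibling, nested, root, reply];

console.log(descendants(thread, reply).map((c) => c._id)); // [ 'b1' ]
console.log(threadOrder(thread).map((c) => c._id)); // [ 'r1', 'a1', 'b1', 'c1' ]
```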
Dynamic attributes: whether to use embedded documents
In a single products collection, you can then store disparate product types. You might store a set of headphones:
{ _id: ObjectId("4d669c225d3a52568ce07646"),
  sku: "ebd-123",
  name: "Hi-Fi Earbuds",
  type: "Headphone",
  attrs: { color: "silver",
           freq_low: 20,
           freq_hi: 22000,
           weight: 0.5
         }
}
and an SSD drive:
{ _id: ObjectId("4d669c225d3a52568ce07646"),
  sku: "ssd-456",
  name: "Mini SSD Drive",
  type: "Hard Drive",
  attrs: { interface: "SATA",
           capacity: 1.2 * 1024 * 1024 * 1024,
           rotation: 7200,
           form_factor: 2.5
         }
}
If you need to frequently query on these attributes, you can create sparse indexes for them. For example, you can optimize for range queries in headphone frequency response:
db.products.ensureIndex({"attrs.freq_low": 1, "attrs.freq_hi": 1}, {sparse: true})
You can also efficiently search hard disks by rotation speed with the following index:
db.products.ensureIndex({"attrs.rotation": 1}, {sparse: true})
If your attributes are completely unpredictable, then you can't build a separate index for each one. You have to use a different strategy in this case, as illustrated by the following sample document:
{ _id: ObjectId("4d669c225d3a52568ce07646"),
  sku: "ebd-123",
  name: "Hi-Fi Earbuds",
  type: "Headphone",
  attrs: [ {n: "color", v: "silver"},
           {n: "freq_low", v: 20},
           {n: "freq_hi", v: 22000},
           {n: "weight", v: 0.5}
         ]
}
Here attrs points to an array of sub-documents. Each of these documents has two values, n and v, corresponding to each dynamic attribute's name and value. This normalized representation allows you to index these attributes using a single compound index:
db.products.ensureIndex({"attrs.n": 1, "attrs.v": 1})
You can then query using these attributes, but to do that, you must use the $elemMatch query operator:
db.products.find({attrs: {$elemMatch: {n: "color", v: "silver"}}})
This design is rather awkward.
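The reason $elemMatch is required: a plain query such as {"attrs.n": "color", "attrs.v": 20} lets each condition be satisfied by a different array element, so it can return false positives. The in-memory sketch below converts an attrs object into the {n, v} form and contrasts the two matching rules; the helper names are hypothetical and no MongoDB driver is involved.

```javascript
// Convert a free-form attrs object into the normalized [{n, v}, ...] form.
function toNameValue(attrs) {
  return Object.entries(attrs).map(([n, v]) => ({ n, v }));
}

// Like {attrs: {$elemMatch: criteria}}: ONE array element must satisfy every field.
function elemMatch(arr, criteria) {
  return arr.some((el) => Object.entries(criteria).every(([k, val]) => el[k] === val));
}

// Like {"attrs.n": ..., "attrs.v": ...} without $elemMatch: each field may be
// satisfied by a DIFFERENT array element.
function dottedMatch(arr, criteria) {
  return Object.entries(criteria).every(([k, val]) => arr.some((el) => el[k] === val));
}

const earbudAttrs = toNameValue({ color: "silver", freq_low: 20, freq_hi: 22000, weight: 0.5 });
console.log(elemMatch(earbudAttrs, { n: "color", v: "silver" })); // true
console.log(elemMatch(earbudAttrs, { n: "color", v: 20 })); // false
console.log(dottedMatch(earbudAttrs, { n: "color", v: 20 })); // true: a false positive
```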
Limits of the namespaces file
The database files themselves are all named after the database they belong to. garden.ns is the first file to be generated. The file's extension, ns, stands for namespaces. Every collection and index in a database gets its own namespace, and the metadata for each namespace is stored in this file. By default, the .ns file is fixed to 16 MB, which lets it store approximately 24,000 namespaces. This means that the sum of the number of indexes and collections in your database can't exceed 24,000. You're not likely to need anywhere close to this number of collections or indexes. But on the off chance that you need even more, you can make the file larger by using the --nssize server option.
Limits on document size
BSON documents in MongoDB v2.0 are limited to 16 MB in size. The number has varied by server version and is continually increasing. To see the limit for your server version, run db.isMaster() from the shell and examine the maxBsonObjectSize field. If you can't find this field, then the limit is 4 MB (and you're using a very old version of MongoDB).
If you're simply storing large binary objects, like images or videos, that's a slightly different case: use GridFS.
mmap: how MongoDB uses memory
MongoDB tells the operating system to map all data files to memory using the mmap() system call. From this point on, the data files, which include all documents, collections, and their indexes, are swapped in and out of RAM by the operating system in 4 KB chunks called pages.
Whenever data from a given page is requested, the operating system must ensure that the page is available in RAM. If it's not, then a kind of exception known as a page fault is raised, and this tells the memory manager to load the page from disk into RAM. With sufficient RAM, all of the data files in use will eventually be loaded into memory.
Whenever that memory is altered, as in the case of a write, those changes will be flushed to disk asynchronously by the OS, but the write will be fast, occurring directly in RAM. When data fits into RAM, you have the ideal situation because the number of disk accesses is reduced to a minimum. But if the working data set can't fit into RAM, then page faults will start to creep up. This means that the operating system will be going to disk frequently, greatly slowing read and write operations. In the worst case, as data size becomes much larger than available RAM, a situation can occur where, for any read or write, data must be paged to and from disk. This is known as thrashing, and it causes performance to take a severe dive.
In MongoDB, reads block writes, so long disk accesses reduce concurrency and throughput.
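The jump from "working set fits" to thrashing can be seen in a toy page-cache simulation. This is not MongoDB or kernel code: it assumes a pure LRU replacement policy (real OS policies differ) and a workload that repeatedly scans the same pages, and the function names are hypothetical. Each page stands for one of the 4 KB pages described above.

```javascript
// Toy model: an LRU cache holding `ramPages` pages, counting page faults
// over a sequence of page accesses.
function countPageFaults(accesses, ramPages) {
  const resident = new Map(); // iteration order = least to most recently used
  let faults = 0;
  for (const page of accesses) {
    if (resident.has(page)) {
      resident.delete(page); // hit: re-inserted below as most recently used
    } else {
      faults++; // miss: the page must be read from disk
      if (resident.size >= ramPages) {
        resident.delete(resident.keys().next().value); // evict least recently used
      }
    }
    resident.set(page, true);
  }
  return faults;
}

// Touch pages 0..workingSet-1 in order, `rounds` times (repeated scans).
function cyclicAccesses(workingSet, rounds) {
  const accesses = [];
  for (let r = 0; r < rounds; r++) {
    for (let p = 0; p < workingSet; p++) accesses.push(p);
  }
  return accesses;
}

const fits = countPageFaults(cyclicAccesses(100, 10), 128); // working set fits in RAM
const tooBig = countPageFaults(cyclicAccesses(100, 10), 64); // working set exceeds RAM
console.log(fits, tooBig); // 100 1000
```

With room for 128 pages, the 100-page working set faults only on first touch. With room for 64, every one of the 1,000 accesses faults, because each page is evicted just before it is needed again: the thrashing pattern described above.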
How memory affects performance: the recommendation is to keep all indexes cached in memory
It's important to keep an eye on total index size, as database performance will be best when all utilized indexes can fit in RAM. The indexSize field in the output of db.stats() reports the total:
> db.stats()
{
  "collections" : 3,
  "objects" : 10004,
  "avgObjSize" : 36.005,
  "dataSize" : 360192,
  "storageSize" : 791296,
  "numExtents" : 7,
  "indexes" : 1,
  "indexSize" : 425984,
  "fileSize" : 201326592,
  "ok" : 1
}
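A quick way to act on that output is to compare indexSize (and dataSize plus indexSize) with the RAM available to mongod. The helper below is a hypothetical sketch: it assumes the sizes are in bytes (the default scale of db.stats()), uses a made-up 512 KB of RAM to match the tiny example database, and ignores that only the working set, not all data, really has to stay resident.

```javascript
// Hypothetical helper: compare db.stats() sizes (bytes) with available RAM.
function ramCheck(stats, ramBytes) {
  return {
    indexesFit: stats.indexSize <= ramBytes,
    dataAndIndexesFit: stats.dataSize + stats.indexSize <= ramBytes,
    indexToRamRatio: stats.indexSize / ramBytes,
  };
}

// Sizes from the db.stats() output above, against a hypothetical 512 KB of RAM.
const check = ramCheck({ dataSize: 360192, indexSize: 425984 }, 512 * 1024);
console.log(check.indexesFit, check.dataAndIndexesFit, check.indexToRamRatio); // true false 0.8125
```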
Custom types, e.g., storing times with their time zones
But what if you must store your times with their time zones? Sometimes the basic BSON types don't suffice. Though there's no way to create a custom BSON type, you can compose the various primitive BSON values to create your own virtual type. For instance, if you wanted to store times with zone, you might use a document structure like this, in Ruby:
{:time_with_zone =>
  {:time => Time.now.utc,
   :zone => "EST"
  }
}
In effect, this is a data type built by taking full advantage of the document structure.
How a single global reader-writer lock affects concurrency
It’s important to understand how concurrency works in MongoDB.
As of MongoDB v2.0, the locking strategy is rather coarse; a single global reader-writer lock reigns over the entire mongod instance. What this means is that at any moment in time, the database permits either one writer or multiple readers (but not both). This sounds a lot worse than it is in practice because there exist quite a few concurrency optimizations around this lock.
One is that the database keeps an internal map of which documents are in RAM. For requests to read or write documents not residing in RAM, the database yields to other operations until the document can be paged into memory.
A second optimization is the yielding of write locks. The issue is that if any one write takes a long time to complete, all other read and write operations will be blocked for the duration of the original write. All inserts, updates, and removes take a write lock. Inserts rarely take a long time to complete. But updates that affect, say, an entire collection, as well as deletes that affect a lot of documents, can run long. The current solution to this is to allow these long-running ops to yield periodically for other readers and writers. When an operation yields, it pauses itself, releases its lock, and resumes later. But when updating and removing documents, this yielding behavior can be a mixed blessing. It's easy to imagine situations where you'd want all documents updated or removed before any other operation takes place.
For these cases, you can use a special option called $atomic to keep the operation from yielding. You simply add the $atomic operator to the query selector like so:
db.reviews.remove({user_id: ObjectId('4c4b1476238d3b4dd5000001'), $atomic: true})
The same can be applied to any multi-update. This forces the entire multi-update to complete in isolation:
db.reviews.update({$atomic: true}, {$set: {rating: 0}}, false, true)
This update sets each review's rating to 0. Because the operation happens in isolation, the operation will never yield, ensuring a consistent view of the system at all times.
How the single global RW lock affects creating/rebuilding indexes
You can check the index build progress by running the shell's currentOp() method:
> db.currentOp()
{"inprog" : [{"opid" : 58,
  "active" : true,
  "lockType" : "write",
  "waitingForLock" : false,
  "secs_running" : 55,
  "op" : "insert",
  "ns" : "stocks.system.indexes",
  "query" : {},
  "client" : "127.0.0.1:53421",
  "desc" : "conn",
  "msg" : "index: (1/3) external sort 3999999/4308303 92%"}]}
The last field, msg, describes the build's progress.
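If you want to watch a long build from a script rather than by eye, the msg string can be parsed. The helper below is a hypothetical sketch that assumes the msg format shown in the sample output above ("index: (phase/phases) ... done/total pct%"); that format may well differ in other MongoDB versions, so treat the regular expression as an assumption.

```javascript
// Hypothetical helper: extract index-build progress from db.currentOp() output.
// Assumes msg looks like "index: (1/3) external sort 3999999/4308303 92%".
function indexBuildProgress(currentOp) {
  const re = /^index: \((\d+)\/(\d+)\).*?(\d+)\/(\d+)\s+(\d+)%/;
  return currentOp.inprog
    .filter((op) => typeof op.msg === "string" && re.test(op.msg))
    .map((op) => {
      const [, phase, phases, done, total, pct] = op.msg.match(re);
      return {
        opid: op.opid,
        ns: op.ns,
        phase: Number(phase),
        phases: Number(phases),
        done: Number(done),
        total: Number(total),
        percent: Number(pct),
      };
    });
}

const sampleOps = {
  inprog: [
    { opid: 58, op: "insert", ns: "stocks.system.indexes",
      msg: "index: (1/3) external sort 3999999/4308303 92%" },
    { opid: 59, op: "query", ns: "stocks.values" }, // hypothetical op without msg
  ],
};
const [build] = indexBuildProgress(sampleOps);
console.log(build.phase, build.phases, build.percent); // 1 3 92
```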
Note also the lockType, which indicates that the index build takes a write lock. This means that no other client can read or write from the database at this time. If you’re running in production, this is obviously a bad thing, and it’s the reason why long index builds can be so vexing. We’re going to look right now at two possible solutions to this problem.
Background indexing
If you're running in production and can't afford to halt access to the database, you can specify that an index be built in the background. Although the index build will still take a write lock, the job will yield to allow other readers and writers to access the database. If your application typically exerts a heavy load on MongoDB, then a background index build will degrade performance, but this may be acceptable under certain circumstances. For example, if you know that the index can be built within a time window where application traffic is at a minimum, then background indexing in this case might be a good choice.
To build an index in the background, specify {background: true} when you declare the index. The previous index can be built in the background like so:
db.values.ensureIndex({open: 1, close: 1}, {background: true})
Offline indexing
If your production data set is too large to be indexed within a few hours, then you’ll need to make alternate plans. This will usually involve taking a replica node offline, building the index on that node by itself, and then allowing the node to catch up with the master replica. Once it’s caught up, you can promote the node to primary and then take another secondary offline and build its version of the index. This tactic presumes that your replication oplog is large enough to prevent the offline node from becoming stale during the index build.
You can do this by dropping and recreating individual indexes or by running the reIndex command, which will rebuild all indexes for a given collection:
db.values.reIndex();
Be careful about reindexing: the command will take out a write lock for the duration of the rebuild, temporarily rendering your MongoDB instance unusable. Reindexing is best done offline, as described earlier for building indexes on a secondary. Note that the compact command will also rebuild indexes for the collection on which it's run.
WHEN TO SHARD
The question of when to shard is more straightforward than you might expect. We’ve talked about the importance of keeping indexes and the working data set in RAM, and this is the primary reason to shard.
If an application’s data set continues to grow unbounded, then there will come a moment when that data no longer fits in RAM.
To be sure, there are some fudge factors here. For instance, if you have your own hardware and can store all your data on solid state drives (an increasingly affordable prospect), then you’ll likely be able to push the data-to-RAM ratio without negatively affecting performance.
Whatever the case, the decision to shard an existing system will always be based on regular analyses of disk activity, system load, and the ever-important ratio of working set size to available RAM.
indexes in shard
A few more points about indexes should be kept in mind when running a sharded cluster.
1 Each shard maintains its own indexes. This should be obvious, but to be clear, know that when you declare an index on a sharded collection, each shard builds a separate index for its portion of the collection. For example, when you issue a db.spreadsheets.ensureIndex() command via mongos, each individual shard processes the index creation command individually.
2 It follows that the sharded collections on each shard should have the same indexes. If this ever isn’t the case, you’ll see inconsistent query performance.
3 Sharded collections permit unique indexes on the _id field and on the shard key only. Unique indexes are prohibited elsewhere because enforcing them would require intershard communication, which is complicated and still deemed too slow to be worth implementing.
Sharding an existing collection
You can shard existing collections, but don’t be surprised if it takes some time to distribute the data across shards. Only one balancing round can happen at a time, and the migrations will move only around 100-200 MB of data per minute. Thus, sharding a 50 GB collection will take around eight hours, and this will likely involve some moderate disk activity.
In addition, when you initially shard a large collection like this, you may have to split manually to expedite the sharding process, since splitting is triggered by inserts.
It's best to design for sharding from the very beginning.
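The "around eight hours" figure is simple rate arithmetic. The sketch below assumes 1 GB = 1024 MB and a steady migration rate at either end of the quoted 100 to 200 MB per minute; the function name is hypothetical.

```javascript
// Back-of-the-envelope balancing time for sharding an existing collection,
// assuming a steady migration rate in MB per minute.
function migrationHours(collectionGB, mbPerMinute) {
  const totalMB = collectionGB * 1024;
  return totalMB / mbPerMinute / 60;
}

console.log(migrationHours(50, 100).toFixed(1)); // 8.5
console.log(migrationHours(50, 200).toFixed(1)); // 4.3
```

So "around eight hours" for 50 GB corresponds to the slow end of the range; at the fast end it would be closer to four hours.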
MANUAL PARTITIONING
There are a couple of cases where you may want to manually split and migrate chunks on a live shard cluster. For example, as of MongoDB v2.0, the balancer doesn't directly take into account the load on any one shard. Obviously, the more a shard is written to, the larger its chunks become, and the more likely they are to eventually migrate. Nevertheless, it's not hard to imagine situations where you'd be able to alleviate load on a shard by migrating chunks. This is another situation where the moveChunk command can be helpful.
> sh.splitAt("cloud-docs.spreadsheets",
  { "username" : "Chen", "_id" : ObjectId("4d6d59db1d41c8536f001453") })
> sh.moveChunk("cloud-docs.spreadsheets", {username: "Chen"}, "shardB")
For example, with a hot chunk, manual splitting and migration let you spread the hot chunk's data across multiple shards.
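To make the split-then-move sequence concrete, here is an in-memory model of a chunk table. It is a hypothetical sketch, not the real config.chunks schema or the sh.* implementation: chunks cover half-open [min, max) ranges of a single string shard key (the example above uses the compound key {username, _id}, simplified here to username), and null stands for MinKey/MaxKey.

```javascript
// Hypothetical in-memory chunk table: each chunk covers [min, max) on a string key.
// null stands for an unbounded end (MinKey / MaxKey).
function inChunk(chunk, key) {
  return (chunk.min === null || chunk.min <= key) && (chunk.max === null || key < chunk.max);
}

// Modeled on sh.splitAt(): split the chunk containing `key` so that `key` starts a new chunk.
function splitAt(chunks, key) {
  const i = chunks.findIndex((c) => inChunk(c, key));
  if (i === -1) throw new Error(`no chunk covers ${key}`);
  const c = chunks[i];
  if (c.min === key) return chunks; // already a chunk boundary
  return [
    ...chunks.slice(0, i),
    { min: c.min, max: key, shard: c.shard },
    { min: key, max: c.max, shard: c.shard },
    ...chunks.slice(i + 1),
  ];
}

// Modeled on sh.moveChunk(): reassign the chunk containing `key` to `shard`.
function moveChunk(chunks, key, shard) {
  return chunks.map((c) => (inChunk(c, key) ? { ...c, shard } : c));
}

let chunks = [{ min: null, max: null, shard: "shardA" }];
chunks = splitAt(chunks, "Chen"); // [MinKey, "Chen") and ["Chen", MaxKey)
chunks = moveChunk(chunks, "Chen", "shardB");
console.log(chunks.map((c) => c.shard)); // [ 'shardA', 'shardB' ]
```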

Bookmarked; I must find time to read this carefully.
mongod
The mongod process uses a modified reader/writer lock with dynamic yielding on page faults and long operations. Any number of concurrent read operations are allowed, but a write operation can block all other operations.
mongod threads yield their lock (read or write) in two classes of situations:
yield-on-page-fault – v2.0 implements a yield-on-page-fault feature which results in much more concurrency than one would achieve with a pure reader/writer lock. For common operational cases, file system page faults are detected in advance and handled outside of any lock, then the lock is resumed. Not all fault situations yield, but many do. This results in v2.0 having much better concurrency in practice than v1.8.
yield-on-long-operation – mongod also yields periodically on common operations that are extremely long running. The goal here is to allow interleaving so that other operations which are quick-running can execute soon. Operations which yield include the following:
queries
multi document updates
multi document removes/deletes
bulk inserts
Write lock acquisition is greedy: a pending write lock acquisition will prevent further read lock acquisitions until fulfilled. Thus yielding by reads can be important.
Collection level locking is under development. SERVER-1240.
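The "greedy" write acquisition rule is easiest to see in a tiny state model. The class below is a hypothetical, single-threaded sketch of the rule as described above, not mongod's implementation: once a writer is waiting, new readers are refused, so one long-running read that never yields stalls both the writer and every reader queued behind it.

```javascript
// Toy model of a reader/writer lock with greedy write acquisition (not mongod code).
class GreedyRWLock {
  constructor() {
    this.readers = 0; // readers currently holding the lock
    this.writer = false; // whether a writer currently holds the lock
    this.waitingWriters = 0; // writers that have asked for the lock
  }
  // Readers may share the lock, but only if no writer holds it or is waiting for it.
  tryAcquireRead() {
    if (this.writer || this.waitingWriters > 0) return false;
    this.readers++;
    return true;
  }
  releaseRead() {
    this.readers--;
  }
  // A writer announces itself first; from then on new readers are turned away.
  requestWrite() {
    this.waitingWriters++;
  }
  // A waiting writer gets the lock once current readers (and any writer) are gone.
  tryAcquireWrite() {
    if (this.writer || this.readers > 0 || this.waitingWriters === 0) return false;
    this.waitingWriters--;
    this.writer = true;
    return true;
  }
  releaseWrite() {
    this.writer = false;
  }
}

const lock = new GreedyRWLock();
console.log(lock.tryAcquireRead()); // true: the first reader gets the lock
lock.requestWrite(); // a writer starts waiting
console.log(lock.tryAcquireRead()); // false: new readers queue behind the writer
console.log(lock.tryAcquireWrite()); // false: the first reader still holds the lock
lock.releaseRead(); // the reader finishes or yields
console.log(lock.tryAcquireWrite()); // true: the writer proceeds
```

This is why yielding by reads matters: the sooner a long read releases the lock, the sooner the pending writer and all the readers blocked behind it can run.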