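Installing MongoDB locally
1. Installing on Ubuntu
sudo apt-get install mongodb (this pulls in quite a few dependency packages)
The service starts automatically once installation finishes.
Start: sudo service mongodb start
Stop: sudo service mongodb stop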
xiaofei@xiaofei-desktop:~$ ps aux | grep mongodb
mongodb 12470 0.0 0.1 70348 3624 ? Ssl 16:20 0:00 /usr/lib/mongodb/mongod --config /etc/mongodb.conf
xiaofei 12496 0.0 0.0 3544 832 pts/3 S+ 16:20 0:00 grep --color=auto mongodb
2. A look at /etc/mongodb.conf
# This is an config file for MongoDB master daemon mongod
# it is passed to mongod as --config parameter
logpath = /var/log/mongodb/mongod.log
dbpath = /var/lib/mongodb/
# use 'true' for options that don't take an argument
logappend = true
bind_ip = 127.0.0.1
#noauth = true
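The file uses a simple key = value format with # comments. Below is a minimal sketch of reading it in Python. It assumes one setting per line, as in the file above, and parse_mongodb_conf is a hypothetical helper that is not part of MongoDB or PyMongo. Values stay strings, so logappend comes back as "true", and commented-out options such as #noauth are skipped.

```python
def parse_mongodb_conf(text):
    """Parse a legacy 'key = value' mongodb.conf into a dict of strings.

    Blank lines and lines starting with '#' are skipped, so commented-out
    options such as '#noauth = true' do not appear in the result.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        options[key.strip()] = value.strip()
    return options


SAMPLE = """\
# This is an config file for MongoDB master daemon mongod
logpath = /var/log/mongodb/mongod.log
dbpath = /var/lib/mongodb/
logappend = true
bind_ip = 127.0.0.1
#noauth = true
"""

conf = parse_mongodb_conf(SAMPLE)
print(conf["dbpath"])  # /var/lib/mongodb/
```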
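3. Open http://127.0.0.1:28017/ in a browser to see some basic system information about the database. This HTTP status page listens on the mongod port plus 1000, and it was removed in later MongoDB releases.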
Connecting to MongoDB and creating a new database
1. Install pymongo
easy_install pymongo (with current Python tooling: pip install pymongo)
Connect to MongoDB (this session uses the PyMongo 2.x API of the time):
>>> from pymongo import Connection
>>> conn = Connection("127.0.0.1",27017)
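Get the icbase database; if it does not exist, it is created (MongoDB actually creates it lazily, on the first write):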
>>> db = conn.icbase
>>> db
Database(Connection('127.0.0.1', 27017), u'icbase')
Insert a document into a collection (attrs) of this database:
>>> attrs = db.attrs
>>> import datetime
>>> attr = {'author':'Mike','text':'My first blog post!','tags':["mongodb", "python", "pymongo"],'date':datetime.datetime.utcnow()}
>>> attrs.insert(attr)
ObjectId('500e65103ec1ee314c000001')
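Query (equality matching is case-sensitive, so the lookup for 'mike' matches nothing and find_one returns None, which the shell does not print):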
>>> attrs.find_one()
{u'date': datetime.datetime(2012, 7, 24, 9, 3, 55, 440000), u'text': u'My first blog post!', u'_id': ObjectId('500e65103ec1ee314c000001'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
>>> attrs.find_one({'author':'mike'})
>>> attrs.find_one({'author':'Mike'})
{u'date': datetime.datetime(2012, 7, 24, 9, 3, 55, 440000), u'text': u'My first blog post!', u'_id': ObjectId('500e65103ec1ee314c000001'), u'author': u'Mike', u'tags': [u'mongodb', u'python', u'pymongo']}
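To find 'mike' regardless of case, one option is a $regex condition with the i option. Below is a minimal sketch that builds such a query document. The helper name author_query_ci is hypothetical. re.escape makes characters such as "." match literally, and the ^...$ anchors require the whole field to match. A case-insensitive regex cannot use an ordinary index efficiently, so frequent lookups like this are usually better served by storing a normalized (for example lowercased) copy of the field.

```python
import re


def author_query_ci(name):
    """Build a query document matching author == name, ignoring case.

    re.escape() turns regex metacharacters in the name into literals;
    the ^...$ anchors require the entire field value to match.
    """
    return {"author": {"$regex": "^" + re.escape(name) + "$", "$options": "i"}}


query = author_query_ci("mike")
print(query)  # {'author': {'$regex': '^mike$', '$options': 'i'}}
# With a running server: attrs.find_one(query)
```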
Delete:
>>> attrs.remove({'author':'Mike'})
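The session above will not run on current PyMongo. Connection was removed in PyMongo 3.0 in favor of MongoClient. insert() and remove() were removed in PyMongo 4.0 in favor of insert_one()/insert_many() and delete_one()/delete_many(). Below is a sketch of the same steps against the current API. It assumes a local mongod on the default port, and make_post and run_demo are hypothetical names. Only make_post can be used without a server.

```python
import datetime


def make_post(author, text, tags):
    """Build the same blog-post document used in the session above."""
    return {
        "author": author,
        "text": text,
        "tags": list(tags),
        # timezone-aware UTC timestamp; BSON stores datetimes in UTC
        "date": datetime.datetime.now(datetime.timezone.utc),
    }


def run_demo(uri="mongodb://127.0.0.1:27017"):
    """Insert, query and delete one document (requires a running mongod)."""
    # Imported here so make_post() works even without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        attrs = client.icbase.attrs  # database/collection are created on first write
        result = attrs.insert_one(
            make_post("Mike", "My first blog post!", ["mongodb", "python", "pymongo"])
        )
        print(result.inserted_id)                  # the generated ObjectId
        print(attrs.find_one({"author": "Mike"}))  # the stored document
        print(attrs.find_one({"author": "mike"}))  # None: matching is case-sensitive
        deleted = attrs.delete_many({"author": "Mike"})
        print(deleted.deleted_count)
    finally:
        client.close()
```

With mongod running, call run_demo() to walk through insert, query and delete.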
Writing a script to insert the icbase data into MongoDB
Each IC becomes one document of this form:
{'ic_id':xxx,'ic_partno':xxx,'ic_mfr':mfr_id,'attrname1_id':attrvalue1,'attrname2_id':attrvalue2,......}
*mfr_id: the id from icbase_mfr
*attrname1_id: the id from icbase_attrname
*attrvalue1: the value from icbase_icattrvalue
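Below is a minimal sketch of building one such flattened document. It assumes the IC row, its manufacturer id and its (attribute-name id, value) pairs have already been read from the icbase tables. build_attr_doc and its arguments are hypothetical. BSON field names must be strings, so the numeric attribute-name ids become keys such as "600", as in the sample document below.

```python
def build_attr_doc(ic_id, ic_partno, mfr_id, attr_values):
    """Flatten one IC and its attribute values into a single document.

    attr_values: iterable of (attrname_id, value) pairs, e.g. rows joined
    from icbase_attrname / icbase_icattrvalue (hypothetical input shape).
    """
    doc = {"ic_id": ic_id, "ic_partno": ic_partno, "ic_mfr": mfr_id}
    for attrname_id, value in attr_values:
        # BSON keys must be strings: 600 -> "600"
        doc[str(attrname_id)] = value
    return doc


doc = build_attr_doc(2545, "PRG18BB101MB1RB", 119,
                     [(600, "100 Ohms"), (669, "PTC"), (321, "Murata")])
print(doc["600"], doc["ic_mfr"])  # 100 Ohms 119
```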
{ "_id" : ObjectId("500f6a7b3ec1ee0edd0009f1"), "600" : "100 Ohms", "669" : "PTC", "3862" : "1", "706" : "20 %", "577" : "Thermistors - PTC", "576" : "PTC", "136" : "300 mAmps", "ic_partno" : "PRG18BB101MB1RB", "ic_id" : 2545, "695" : "SMD/SMT", "3576" : "http://www.murata.com/products/catalog/pdf/r90e.pdf", "540" : "Reel", "3856" : "http://cn.mouser.com/Catalog/Simplified_Chinese/638/384.pdf", "3854" : "Thermistors - PTC 100 OHM 24V", "3855" : "81-PRG18BB101MB1RB", "3853" : "PRG18BB101MB1RB", "739" : "24 V", "629" : "PRG", "501" : "- 10 C to + 60 C", "612" : "是", "321" : "Murata", "161" : "0.8 mm W x 1.6 mm L x 0.8 mm H", "ic_mfr" : 119, "538" : "0603" }
Importing all the data was expected to take 30-40 minutes.
About 2.73 million documents in total; total time taken: 0:50:09.411991
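For scale, 2,731,252 documents (the count that stats() reports further down) in 0:50:09, about 3,009 seconds, works out to roughly 900 documents per second. A common way to speed up such an import is to send documents in batches with insert_many() instead of one insert per document. The original script is not shown, so this is not how it worked. Below is a sketch in which import_in_batches, batches and the batch size of 1000 are illustrative choices, and collection is assumed to be a PyMongo 3+ Collection.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_in_batches(collection, docs, size=1000):
    """Insert docs with one insert_many() call per batch; return the count.

    `collection` is expected to be a pymongo Collection, or any object
    with an insert_many(list_of_documents) method.
    """
    total = 0
    for chunk in batches(docs, size):
        collection.insert_many(chunk)
        total += len(chunk)
    return total
```

Because batches() pulls from a generator lazily, the documents can be produced row by row from the source tables without holding all 2.7 million in memory.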
Trying to use this data in icgoo:
1. when reading the detailed parameters of each part number;
2. when filtering by detailed parameters.
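Both uses need only the attribute-name ids, which are already the document keys, so no AttrName lookup is required. Below is a pure-Python sketch of the two operations on documents shaped like the samples in this post. It assumes every key other than _id, ic_id, ic_partno and ic_mfr is an attribute-name id. The helper names are hypothetical. In practice the filter would be sent to MongoDB as a query document such as {"577": "Thermistors - PTC"}.

```python
RESERVED = {"_id", "ic_id", "ic_partno", "ic_mfr"}


def detail_params(doc):
    """Return {attrname_id: value} for one part, keyed by int id."""
    return {int(k): v for k, v in doc.items() if k not in RESERVED}


def param_query(criteria):
    """Turn {attrname_id: value} into a MongoDB query document."""
    return {str(attr_id): value for attr_id, value in criteria.items()}


def matches(doc, criteria):
    """Local equivalent of that query: every attribute must match exactly."""
    return all(doc.get(str(k)) == v for k, v in criteria.items())


part = {"_id": 18, "ic_id": 18, "ic_partno": "C1812C102K5GACTU", "ic_mfr": 94,
        "695": "SMD/SMT", "540": "Reel"}
print(detail_params(part))                            # {695: 'SMD/SMT', 540: 'Reel'}
print(param_query({540: "Reel"}))                     # {'540': 'Reel'}
print(matches(part, {695: "SMD/SMT", 540: "Reel"}))   # True
```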
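Before making the change, I went over the places that load a product's detailed parameters once more and removed every redundant database read:
only attr_name_id is used, and the AttrName object is no longer fetched by that id inside the loop.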
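1. product.models
The Product class has an icbase method:
class Product:
    def icbase(self):
        #ic = IC.objects.get(pk=self.ic_id)
        #if ic:
        return self.ic.attrs
self.ic is already the related object of the foreign key and can be used directly, so the first two lines are commented out. Accessing the foreign key runs the same kind of query as those two lines, though (Django fetches self.ic on first access and then caches it on the instance), so this change probably makes no performance difference. Note also that IC.objects.get() raises IC.DoesNotExist instead of returning None, so the if ic: check never guarded anything.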
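2. In parameter filtering, inside the loop over part numbers:
    #try:
    #    obj_key = AttrName.objects.get(id=key)
    #except:
    #    continue
Previously each key was used to fetch an object from AttrName; now no object is fetched, and only the id key is kept.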
Checking the state of my MongoDB:
> db.attrs.stats()
{
"ns" : "icbase.attrs",
"count" : 2731252,
"size" : 1415603180,
"storageSize" : 1558881712,
"nindexes" : 1,
"ok" : 1
}
> db.attrs.totalIndexSize()
208570256
> db.attrs.getIndexes()
[
{
"name" : "_id_",
"ns" : "icbase.attrs",
"key" : {
"_id" : ObjectId("000000000000000000000000")
}
}
]
Tried to create an index on ic_id, but it failed:
> db.attrs.ensureIndex({ 'ic_id' : 1 })
Thu Jul 26 14:07:41 MessagingPort recv() error "Connection reset by peer" (104) 127.0.0.1:27017
Thu Jul 26 14:07:41 JS Error: Error: error doing query: failed (anon):100
Thu Jul 26 14:07:41 trying reconnect to 127.0.0.1
Thu Jul 26 14:07:41 reconnect 127.0.0.1 ok
Thu Jul 26 14:07:41 JS Error: Error: error doing query: failed (anon):100
The log shows:
Thu Jul 26 14:49:03 building new index on { ic_id: 1.0 } for icbase.attrs...
Thu Jul 26 14:49:03 Buildindex icbase.attrs idxNo:1 { ns: "icbase.attrs", key: { ic_id: 1.0 }, name: "ic_id_1" }
1166400/2731251 42%
Thu Jul 26 14:49:16 shutdown: going to flush oplog...
Thu Jul 26 14:49:16 shutdown: going to close sockets...
Thu Jul 26 14:49:16 shutdown: waiting for fs...
Thu Jul 26 14:49:16 shutdown: closing all files...
Thu Jul 26 14:49:16 closeAllFiles() finished
Thu Jul 26 14:49:16 connection accepted from 127.0.0.1:33860 #2
Thu Jul 26 14:49:16 shutdown: removing fs lock...
Thu Jul 26 14:49:16 Listener on port 27017 aborted
Thu Jul 26 14:49:16 dbexit: really exiting now
ERROR: Client::shutdown not called!
Thu Jul 26 14:49:48 Mongo DB : starting : pid = 7291 port = 27017 dbpath = /var/lib/mongodb/ master = 0 slave = 0 32-bit
** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
** see http://blog.mongodb.org/post/137788967/32-bit-limitations for more
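Most likely this is the 32-bit limitation: a 32-bit mongod can only handle about 2 GB of data, because its data files are memory-mapped into a 32-bit address space (hence the startup warning above). attrs alone already took about 1.5 GB, so the index build presumably crossed that limit when it reached 42%, and mongod shut down in the middle of the build, as the log shows.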
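Below is a back-of-the-envelope check with the numbers from stats() and totalIndexSize() above. It assumes the limit is 2 GiB of mapped data and counts only this collection and its one index. The real headroom was likely smaller, since all mapped data files count against the limit, including space MongoDB had preallocated.

```python
LIMIT = 2 * 1024 ** 3            # ~2 GiB of mapped data on 32-bit MongoDB (assumed)
storage_size = 1_558_881_712     # db.attrs.stats() "storageSize"
id_index_size = 208_570_256      # db.attrs.totalIndexSize()
count = 2_731_252                # db.attrs.stats() "count"

used = storage_size + id_index_size          # collection storage + _id index
headroom = LIMIT - used                      # what was left, at best
bytes_per_id_entry = id_index_size / count   # average _id index cost per document

print(used)                              # 1767451968
print(headroom, headroom // 1024 ** 2)   # 380031680 362
print(round(bytes_per_id_entry, 1))      # 76.4
```

So the collection and its _id index alone already used about 1.65 GiB, leaving at most roughly 362 MiB for everything else.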
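By default every collection has a unique index on _id. If no _id is given at insert time, an ObjectId is generated automatically (PyMongo does this on the client side). To make use of this index, which exists anyway, and to save space, it is best to set _id yourself to a unique key. The object's own ID is usually a good choice, for example a product ID.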
Space used by the _id index:
> db.attrs.totalIndexSize()
208570256
> db.attrs.dropIndex('_id_')
{ "nIndexesWas" : 1, "errmsg" : "may not delete _id index", "ok" : 0 }
It cannot be dropped.
*****
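Alternatively, the data can be re-inserted with '_id' set to ic_id at insert time, so that the '_id_' index effectively becomes the index on ic_id.
So I created a new collection in icbase, attrs2, with '_id' set to ic_id: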
{ "_id" : 18, "740" : "50 Volts", "612" : "是", "ic_id" : 18, "695" : "SMD/SMT", "55" : "1000 pF", "706" : "10 %", "577" : "多层陶瓷电容 (MLCC) - SMD/SMT", "576" : "General Type MLCCs", "ic_mfr" : 94, "501" : "- 55 C to + 125 C", "540" : "Reel", "321" : "Kemet", "168" : "0.1", "688" : "C0G (NP0)", "ic_partno" : "C1812C102K5GACTU", "161" : "3.2 mm W x 4.5 mm L", "538" : "1812 (4532 metric)" }
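Below is a sketch of deriving an attrs2 document from an attrs document. It assumes ic_id is unique per part, because a second document with the same _id is rejected with a duplicate key error. to_attrs2 is a hypothetical helper. Lookups by part can then use find_one({"_id": ic_id}), which is served by the mandatory _id_ index.

```python
def to_attrs2(doc):
    """Return a copy of an attrs document whose _id is its ic_id.

    The generated ObjectId is dropped; ic_id is still kept as an ordinary
    field, as in the attrs2 sample document above. _id is placed first.
    """
    rest = {key: value for key, value in doc.items() if key != "_id"}
    return {"_id": doc["ic_id"], **rest}


# "_id" here is a plain-string stand-in for the original ObjectId
old = {"_id": "500f6a7b3ec1ee0edd0009f1", "ic_id": 2545, "540": "Reel"}
print(to_attrs2(old))  # {'_id': 2545, 'ic_id': 2545, '540': 'Reel'}

# With PyMongo 3+ and a running server, the copy could be done as:
#     db.attrs2.insert_many(to_attrs2(d) for d in db.attrs.find())
# A duplicate ic_id would make that call raise a BulkWriteError (duplicate key).
```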
> db.attrs2.stats()
{
"ns" : "icbase.attrs2",
"count" : 1468431,
"size" : 703207344,
"storageSize" : 726707424,
"nindexes" : 1,
"ok" : 1
}
Internal structure of MongoDB data files:
http://blog.nosqlfan.com/html/3515.html