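Test machine: SSD and an i7 CPU. No computation is performed, only data reading; no data augmentation, only a resize.
These results took about two days of testing, so a like is appreciated if you read this far.
Contents:
1, Baseline version
2, Replace per-call key generation with an in-memory lookup
3, Speed up with lmdb
4, Other read tests 1
1, Baseline version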
DataLoader(train_dataset, batch_size=16, shuffle=True,
num_workers=0, pin_memory=True, drop_last=True, )
2020-12-23 11:20:06
0
2020-12-23 11:21:53
10000
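Reading 10,000 images took 107 seconds, using the most direct PyTorch DataLoader structure.
The full set is about 1.3 million images, so reading all of it is estimated at 130 × 107 ≈ 13,900 seconds, which divided by 3600 is about 3.9 hours.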
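The timestamp and count pairs in these logs can be produced by a loop of roughly this shape. This is a minimal sketch, not the author's script: `time_reads` and `batch_size_of` are hypothetical names, and the loader is assumed to be any iterable of batches (a DataLoader would work the same way).

```python
import time
from datetime import datetime


def time_reads(loader, report_every=10_000, max_samples=None, batch_size_of=len):
    """Iterate over `loader`, printing a timestamp and the running sample count.

    `batch_size_of` maps a batch to its number of samples. For batches shaped
    like (images, labels), pass `lambda b: len(b[0])` instead of `len`.
    Returns (samples_seen, elapsed_seconds).
    """
    def report(count):
        print(datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        print(count)

    seen = 0
    next_report = report_every
    start = time.perf_counter()
    report(seen)  # the "0" line in the logs
    for batch in loader:
        seen += batch_size_of(batch)
        if seen >= next_report:
            report(seen)
            next_report += report_every
        if max_samples is not None and seen >= max_samples:
            break
    return seen, time.perf_counter() - start


# Stand-in for a DataLoader: 50 batches of 256 dummy samples each.
batches = [[0] * 256 for _ in range(50)]
seen, elapsed = time_reads(batches, max_samples=10_000)
```

With a batch size of 256 the first report lands at 10240 rather than 10000, because the count only advances in whole batches.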
Does changing the batch size from 16 to 256 make any difference?
DataLoader(train_dataset, batch_size=256, shuffle=True,
num_workers=0, pin_memory=True, drop_last=True, )
2020-12-23 11:26:53
0
2020-12-23 11:28:40
10240
Also 107 seconds, so the batch size makes no difference.
Set the number of workers to 4:
2020-12-23 11:30:30
0
2020-12-23 11:31:14
10240
That took 44 seconds; reading everything would take 1,300,000 / 10,000 × 44 / 3600 = 1.59 hours.
More than twice as fast.
With 8 workers:
2020-12-23 11:36:04
0
2020-12-23 11:36:45
10240
41 seconds, barely any faster: 1.48 hours.
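The extrapolation used for these estimates is a simple scale-up of the timed sample. A small sketch, with `estimate_hours` as a hypothetical helper name and the dataset size taken as the approximate 1.3 million quoted above:

```python
def estimate_hours(total_images, sample_images, sample_seconds):
    """Scale a timed sample up to the full dataset and convert seconds to hours."""
    return total_images / sample_images * sample_seconds / 3600


print(round(estimate_hours(1_300_000, 10_000, 44), 2))  # 4 workers -> 1.59
print(round(estimate_hours(1_300_000, 10_000, 41), 2))  # 8 workers -> 1.48
```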
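2, Replace per-call key generation with an in-memory lookup
Building a key list on every call seemed likely to be slow, so try keeping it in memory: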
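        self.keys = list(self.truth.keys())  # build the key list once (e.g. in __init__)

    def __len__(self):
        return len(self.keys)

    # The returned box is absolute xmin, ymin, xmax, ymax coordinates; the image is an unwhitened float32 image
    def __getitem__(self, index):
        img_path = self.keys[index]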
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True,
num_workers=4, pin_memory=True, drop_last=True, )
2020-12-23 11:45:51
0
2020-12-23 11:46:11
10240
That took 20 seconds, so reading all the images would take about 43 minutes (130 × 20 / 3600 ≈ 0.72 hours).
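The effect of caching the key list can be seen without any images. The sketch below assumes the slower version rebuilt the list inside `__getitem__`, which the post implies but does not show; both class names and the stand-in `truth` dict are hypothetical.

```python
import timeit


class RebuildKeysDataset:
    """Assumed 'before' version: rebuilds the key list on every access."""

    def __init__(self, truth):
        self.truth = truth

    def __len__(self):
        return len(self.truth)

    def __getitem__(self, index):
        # list(...) copies every key, so each lookup costs O(len(truth)).
        return list(self.truth.keys())[index]


class CachedKeysDataset:
    """'After' version: builds the key list once and indexes into it."""

    def __init__(self, truth):
        self.truth = truth
        self.keys = list(self.truth.keys())

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        return self.keys[index]  # O(1) list indexing


# Stand-in annotation dict: image path -> list of boxes (left empty here).
truth = {f"img_{i:07d}.jpg": [] for i in range(100_000)}
slow = RebuildKeysDataset(truth)
fast = CachedKeysDataset(truth)

slow_seconds = timeit.timeit(lambda: slow[50_000], number=200)
fast_seconds = timeit.timeit(lambda: fast[50_000], number=200)
print(f"rebuild each call: {slow_seconds:.4f} s, cached: {fast_seconds:.6f} s")
```

The gap grows with the size of the dict, which would be consistent with the halved read time on a dataset of about 1.3 million entries.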
3, Speed up with lmdb
train_loader = DataLoader(train_dataset, batch_size=256, shuffle=True,
num_workers=0, pin_memory=True, drop_last=True, )
2020-12-23 14:46:37
0
2020-12-23 14:47:19
10240
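That took about 40 seconds (42 by the timestamps), so roughly 1.5 hours to read everything.
Setting workers to 4 raises an error. Even if it worked, it would be a bit more than twice as fast, i.e. just under 20 seconds, which does not look like much of an improvement.
Improvement
According to the tutorial at https://www.cnblogs.com/jiangkejie/p/13192518.html
the code seems to need a modification. After the modification: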
2020-12-23 14:58:15
0
2020-12-23 14:58:59
10240
No use at all: 44 seconds, which is even slower. And 4 workers still raises an error.
Further improvement
2020-12-23 15:04:22
env
env
env
env
0
2020-12-23 15:04:37
10240
Moving env out of the class and making it a global variable allows 4 workers to run. It takes 15 seconds, which is a little faster.
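The post does not show the error message. One plausible reading, if the workers are started with the spawn method (the default on Windows), is that the dataset object is pickled for each worker and an open database handle stored on it cannot be pickled. A global variable sidesteps this because each worker re-imports the module and opens its own handle, which would match the four `env` lines in the log. An alternative sketch of the same idea is to open the handle lazily on first access. Everything named here is hypothetical: `FakeEnv` stands in for a real environment so that the example runs without lmdb.

```python
import threading


class FakeEnv:
    """Stand-in for a database environment holding an unpicklable resource."""

    def __init__(self):
        self.lock = threading.Lock()  # locks cannot be pickled
        self.data = {b"a": b"1", b"b": b"2"}

    def get(self, key):
        return self.data[key]


def open_fake_env():
    print("env")  # printed once per process that actually reads
    return FakeEnv()


class LazyHandleDataset:
    """Dataset-like object that opens its storage handle on first use."""

    def __init__(self, open_handle, keys):
        self.open_handle = open_handle  # module-level function, so picklable
        self.keys = list(keys)
        self._handle = None             # nothing is opened in the parent process

    def __getstate__(self):
        # Drop the handle when the object is pickled for a worker process.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def _get_handle(self):
        if self._handle is None:
            self._handle = self.open_handle()
        return self._handle

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, index):
        return self._get_handle().get(self.keys[index])


dataset = LazyHandleDataset(open_fake_env, [b"a", b"b"])
print(dataset[0])  # prints "env" once, then b'1'
```

With lmdb itself, `open_handle` could be a function that calls `lmdb.open(path, readonly=True, lock=False)`, and the lookup would go through a read transaction (`env.begin()` followed by `txn.get(key)`) rather than a `get` on the environment.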
2020-12-23 15:11:24
10240
2020-12-23 15:11:35
20480
2020-12-23 15:11:50
30720
129024
2020-12-23 15:13:49
15 × 130 / 3600 ≈ 0.54, so this time all the data can be read in about half an hour. That will do.
2020-12-23 15:21:04
env
env
env
env
0
2020-12-23 15:21:20
10240
2020-12-23 15:21:32
20480
2020-12-23 15:21:59
43008
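The residual network was trained for 118 epochs, so we would need at least 118 × 0.5 hours of training, which is 59 hours.
Giving up!
4, Other read tests 1
Original speed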
5 [0.1] 0.0 2021-03-20 13:33:20
train_iters::300
train_acc::0.3857
test_acc::0.3256
validation_acc::0.34
9 [0.1] 0.0 2021-03-20 13:38:16
train_iters::600
train_acc::0.2859
test_acc::0.2804
validation_acc::0.313
13 [0.1] 0.0 2021-03-20 13:43:07
train_iters::900
train_acc::0.3087
test_acc::0.2986
validation_acc::0.3354
Changing workers from 0 to 1
5 [0.1] 0.0 2021-03-20 13:54:18
train_iters::300
train_acc::0.3816
test_acc::0.3041
validation_acc::0.3371
9 [0.1] 0.0 2021-03-20 14:00:42
train_iters::600
train_acc::0.1968
test_acc::0.2511
validation_acc::0.2732
13 [0.1] 0.0 2021-03-20 14:07:07
train_iters::900
train_acc::0.3428
test_acc::0.2583
validation_acc::0.3065
Changing workers from 0 to 2
worker_num=2
coco_g = coco_data_generater(r'D:\zy\data\coco2017\train2017')
train_dataset = size_test_dataset(class_name_list, data_index_start_end=[0, train_data_num], coco_g=coco_g)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True,
num_workers=worker_num, pin_memory=True, drop_last=True, collate_fn=collate)
test_dataset = size_test_dataset(class_name_list, data_index_start_end=[-1000, None], coco_g=coco_g)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False,
num_workers=worker_num, pin_memory=True, drop_last=True, collate_fn=collate)
validation_dataset = size_test_dataset(class_name_list, data_index_start_end=[-2000, -1000], coco_g=coco_g)
validation_loader = DataLoader(validation_dataset, batch_size=batch_size, shuffle=False,
num_workers=worker_num, pin_memory=True, drop_last=True, collate_fn=collate)
5 [0.1] 0.0 2021-03-20 14:14:25
train_iters::300
train_acc::0.3842
test_acc::0.2853
validation_acc::0.3175
9 [0.1] 0.0 2021-03-20 14:19:41
train_iters::600
train_acc::0.2561
test_acc::0.272
validation_acc::0.3096
13 [0.1] 0.0 2021-03-20 14:24:56
train_iters::900
train_acc::0.1925
test_acc::0.2467
validation_acc::0.2609
Changing workers from 0 to 4
5 [0.1] 0.0 2021-03-20 14:33:14
train_iters::300
train_acc::0.3858
test_acc::0.3451
validation_acc::0.3766
9 [0.1] 0.0 2021-03-20 14:39:50
train_iters::600
train_acc::0.2448
test_acc::0.265
validation_acc::0.2888
13 [0.1] 0.0 2021-03-20 14:46:28
train_iters::900
train_acc::0.304
test_acc::0.1576
validation_acc::0.1634
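I don't know why, but it did not get any faster. By the timestamps, 300 iterations took roughly 4.9 minutes with 0 workers, 6.4 with 1, 5.3 with 2, and 6.6 with 4.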
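The time per 300 iterations can be read off the logs by differencing the timestamps. A small sketch, with `seconds_between_reports` as a hypothetical helper and the log lines copied from the 0-worker and 4-worker runs above:

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def seconds_between_reports(log_text):
    """Return the gaps, in seconds, between consecutive timestamps in a log."""
    stamps = [
        datetime.strptime(match, "%Y-%m-%d %H:%M:%S")
        for match in TIMESTAMP.findall(log_text)
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


workers_0 = """\
5 [0.1] 0.0 2021-03-20 13:33:20
train_iters::300
9 [0.1] 0.0 2021-03-20 13:38:16
train_iters::600
13 [0.1] 0.0 2021-03-20 13:43:07
train_iters::900
"""

workers_4 = """\
5 [0.1] 0.0 2021-03-20 14:33:14
train_iters::300
9 [0.1] 0.0 2021-03-20 14:39:50
train_iters::600
13 [0.1] 0.0 2021-03-20 14:46:28
train_iters::900
"""

print(seconds_between_reports(workers_0))  # [296.0, 291.0]
print(seconds_between_reports(workers_4))  # [396.0, 398.0]
```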