Python Study Notes 08: Processes and Threads

1.Multiprocessing

1.fork

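Unix/Linux provides a fork() system call that is unusual. An ordinary function is called once and returns once, but fork() is called once and returns twice: the operating system copies the current process (the parent) to create a new one (the child), and the call then returns in both the parent and the child.
In the child, fork() always returns 0; in the parent, it returns the child's process ID. The reason is that one parent can fork many children and therefore has to record each child's ID, whereas a child only needs to call getppid() to obtain its parent's ID.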

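#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Import the os and time modules
import os, time
print 'Process (%s) start...' % os.getpid()
# Duplicate the current process.
# os.fork() returns twice: once in the child and once in the parent.
pid = os.fork()
# pid == 0 means we are in the child process
if pid == 0:
    # getppid() returns the parent's pid. Note: if the parent has already exited,
    # the child is re-parented and getppid() typically returns 1 (init).
    print 'I am child process (%s) and my parent is %s.' % (os.getpid(), os.getppid())
# A non-zero pid means we are in the parent process
else:
    # Sleep 0.1 s in the parent so that the child gets a chance to finish first
    time.sleep(0.1)
    print 'I (%s) just created a child process (%s).' % (os.getpid(), pid)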
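
The sleep in the parent above only makes it likely that the child finishes first. Below is a minimal sketch of waiting explicitly instead, assuming a Unix system and Python 3 syntax (the listings in these notes are Python 2). The function name spawn_children is made up for illustration. The parent forks several children, records each pid that fork() returns, and collects the children with os.waitpid().

```python
import os


def spawn_children(count):
    """Fork `count` children and return {child_pid: exit_code} once all have exited."""
    pids = []
    for index in range(count):
        pid = os.fork()
        if pid == 0:
            # Child branch: fork() returned 0. Exit at once with a distinct code.
            os._exit(index)
        # Parent branch: fork() returned the child's pid, so remember it.
        pids.append(pid)

    results = {}
    for pid in pids:
        # Block until this particular child has exited.
        _, status = os.waitpid(pid, 0)
        results[pid] = os.WEXITSTATUS(status)
    return results


if __name__ == '__main__':
    print(spawn_children(3))
```

Each child leaves through os._exit() so that it never continues into the parent's code after the loop.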

2.multiprocessing——Process

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from multiprocessing import Process
import os
# Code executed by the child process
def run_proc(name):
    print 'Run child process %s (%s)...' % (name, os.getpid())
# Only run when this file is executed directly
if __name__ == '__main__':
    print 'Parent process %s.' % os.getpid()
    # Create a Process object, passing in the task and its arguments
    p = Process(target=run_proc, args=('test',))
    print 'Process will start.'
    # Start the new process
    p.start()
    # Calling p.join() in the main process blocks it here:
    # the main process waits for the child (p) to finish before continuing
    p.join()
    print 'Process end.'
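
After join() returns, the Process object still carries information about the finished child. This is a small sketch, assuming Python 3 and a Unix system; the names child_task and run_children are hypothetical. It starts several processes and reads each one's exitcode attribute. The fork start method is requested explicitly so that the behaviour matches the os.fork() discussion above regardless of the platform default.

```python
import multiprocessing
import sys


def child_task(code):
    # Runs in the child: exit with the requested status code.
    sys.exit(code)


def run_children(codes):
    """Start one process per exit code, wait for all, and return their exit codes."""
    ctx = multiprocessing.get_context('fork')  # assumption: Unix-only start method
    procs = [ctx.Process(target=child_task, args=(code,)) for code in codes]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return [proc.exitcode for proc in procs]


if __name__ == '__main__':
    print(run_children([0, 3]))
```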

3.multiprocessing——Pool

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Import the Pool class
from multiprocessing import Pool
# Import the os, time and random modules
import os, time, random
# The task each child process will run
def long_time_task(name):
    print 'Run task %s (%s)...' % (name, os.getpid())
    # time.time() returns the current time in seconds
    start = time.time()
    # random.random() returns a float in [0, 1), so sleep for up to 3 seconds
    time.sleep(random.random() * 3)
    # Record the end time
    end = time.time()
    print 'Task %s runs %0.2f seconds.' % (name, (end - start))
if __name__ == '__main__':
    print 'Parent process %s.' % os.getpid()
    # Create a process pool.
    # The default pool size is the number of CPU cores; a size can also be given, e.g. Pool(5)
    pl = Pool()
    for i in range(5):
        pl.apply_async(long_time_task, args=(i,))
    print 'Waiting for all subprocesses done...'
    # After close() is called, no more tasks can be submitted to the pool
    pl.close()
    # join() may only be called after close() (or terminate());
    # it makes the main process wait until all tasks in the pool have finished
    pl.join()
    print 'All subprocesses done.'
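
The listing above discards what apply_async() returns. That return value is an AsyncResult whose get() method yields the task's return value. The sketch below shows how results could be collected, assuming Python 3 and the Unix-only fork start method; square and squares_in_pool are illustrative names, not part of the original notes.

```python
import multiprocessing


def square(n):
    # Runs in a worker process; the return value is sent back to the parent.
    return n * n


def squares_in_pool(numbers, workers=2):
    """Submit one task per number and return the results in submission order."""
    ctx = multiprocessing.get_context('fork')  # assumption: Unix-only start method
    pool = ctx.Pool(processes=workers)
    try:
        pending = [pool.apply_async(square, args=(n,)) for n in numbers]
        # get() blocks until the corresponding task has finished.
        return [result.get(timeout=10) for result in pending]
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(squares_in_pool([1, 2, 3]))
```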

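4.Inter-process communication

Python's multiprocessing module wraps the underlying mechanisms and provides several ways to exchange data between processes, such as Queue and Pipe.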

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from multiprocessing import Process, Queue
import os, time, random
# Code executed by the writer process
def write(q):
    for value in ['A', 'B', 'C']:
        print 'Put %s to queue...' % value
        q.put(value)
        # Let the writer pause for a moment
        time.sleep(random.random())
# Code executed by the reader process
def read(q):
    while True:
        # Block until an item is available
        value = q.get(True)
        print 'Get %s from queue.' % value
if __name__ == '__main__':
    # The parent creates the Queue and passes it to each child process
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start the writer process pw
    pw.start()
    # Start the reader process pr
    pr.start()
    # Wait for pw to finish
    pw.join()
    # pr runs an infinite loop, so it cannot be joined; terminate it forcibly
    pr.terminate()
    print '---END---'
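
The text mentions Pipe but only shows Queue. Here is a minimal Pipe sketch, assuming Python 3 and the Unix-only fork start method; producer and collect are hypothetical names. Rather than terminating the reader, the writer sends None as an end-of-data marker. That convention is chosen for this example and is not something Pipe requires.

```python
import multiprocessing


def producer(conn, items):
    # Runs in the child: send every item, then a None marker to signal the end.
    for item in items:
        conn.send(item)
    conn.send(None)
    conn.close()


def collect(items):
    """Send `items` through a pipe from a child process and return what arrives."""
    ctx = multiprocessing.get_context('fork')  # assumption: Unix-only start method
    parent_end, child_end = ctx.Pipe()
    worker = ctx.Process(target=producer, args=(child_end, items))
    worker.start()
    # The parent does not use the child's end of the pipe.
    child_end.close()

    received = []
    while True:
        item = parent_end.recv()
        if item is None:
            break
        received.append(item)
    worker.join()
    return received


if __name__ == '__main__':
    print(collect(['A', 'B', 'C']))
```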

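2.Multithreading

Python threads are real POSIX threads, not simulated ones.
The Python standard library provides two modules: thread and threading. thread is the low-level module (renamed _thread in Python 3), and threading is a higher-level module that wraps it. In almost all cases we only need threading.
Every process starts one thread by default, called the main thread, and the main thread can start further threads. The threading module's current_thread() function always returns the instance of the current thread. The main thread's instance is named MainThread; a child thread's name is given when it is created, and here we name the child thread LoopThread. The name is only used for display when printing and has no other meaning.
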
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time, threading
# Code executed by the new thread
def loop():
    # Print the child thread's name
    print 'thread %s is running...' % threading.current_thread().name
    n = 0
    while n < 5:
        n = n + 1
        print 'thread %s >>> %s' % (threading.current_thread().name, n)
        # Pause the child thread for 0.5 seconds
        time.sleep(0.5)
    print 'thread %s ended.' % threading.current_thread().name
# Print the main thread's name
print 'thread %s is running...' % threading.current_thread().name
# Create a new child thread named LoopThread
t = threading.Thread(target=loop, name='LoopThread')
# Start the child thread LoopThread
t.start()
# The main thread calls join() to wait for the child thread before continuing
t.join()
# Print the message that the main thread has ended
print 'thread %s ended.' % threading.current_thread().name
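
To illustrate the naming rule, the sketch below records the name each thread sees for itself. It assumes Python 3, and record_name and collect_names are made-up helper names. When no name argument is given, threading picks a default name that starts with "Thread-".

```python
import threading


def record_name(names):
    # Append the name of whichever thread is executing this function.
    names.append(threading.current_thread().name)


def collect_names():
    """Return [caller's thread name, explicitly named thread, default-named thread]."""
    names = [threading.current_thread().name]

    named = threading.Thread(target=record_name, args=(names,), name='LoopThread')
    named.start()
    named.join()

    # No name given: threading assigns a default one.
    unnamed = threading.Thread(target=record_name, args=(names,))
    unnamed.start()
    unnamed.join()
    return names


if __name__ == '__main__':
    print(collect_names())
```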

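3.Locks: Lock

Although Python threads are real threads, the interpreter has a GIL (Global Interpreter Lock). A Python thread must acquire the GIL before it can run; in Python 2 the interpreter then releases the GIL after every 100 bytecode instructions by default (and around blocking I/O) so that other threads get a chance to run. The GIL effectively puts a lock around the code executed by every thread, so threads in Python only take turns: even 100 threads on a 100-core CPU use only one core to execute Python bytecode.
The GIL is a historical design legacy of the interpreter. The interpreter we normally use is the official implementation, CPython, and to truly use multiple cores with threads one would have to rewrite the interpreter without a GIL.
Although Python cannot use multiple cores through threads, it can through multiple processes: each Python process has its own independent GIL, and they do not affect one another.
Because the interpreter was designed with the GIL, multithreaded code cannot take advantage of multiple cores; multi-core parallelism with threads in Python remains a beautiful dream.
The GIL does not make a compound statement such as balance = balance + n atomic, so threads that modify shared data still need an explicit lock, as in the following listing.
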
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time, threading
# Pretend this is your bank balance
balance = 0
def change_it(n):
    # Deposit first, then withdraw; the result should be 0
    global balance
    balance = balance + n
    balance = balance - n
# Create a lock
lock = threading.Lock()
def run_thread(n):
    for i in range(100000):
        # The lock must be acquired before continuing
        lock.acquire()
        try:
            change_it(n)
        finally:
            # Release the lock in the finally block
            lock.release()
t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print balance
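
The acquire/try/finally/release pattern can also be written with a with statement, because a Lock works as a context manager. The sketch below assumes Python 3. The Account class and run_demo function are invented for this example, and the state is kept in an object instead of a global variable.

```python
import threading


class Account(object):
    """A balance guarded by a lock (illustrative class, not from the notes above)."""

    def __init__(self):
        self.balance = 0
        self._lock = threading.Lock()

    def deposit_then_withdraw(self, amount):
        # `with` acquires the lock and releases it even if an exception is raised.
        with self._lock:
            self.balance = self.balance + amount
            self.balance = self.balance - amount


def run_demo(rounds=10000):
    account = Account()

    def worker(amount):
        for _ in range(rounds):
            account.deposit_then_withdraw(amount)

    threads = [threading.Thread(target=worker, args=(amount,)) for amount in (5, 8)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance


if __name__ == '__main__':
    print(run_demo())
```

Since every update happens while the lock is held, the final balance is 0.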

4.ThreadLocal

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import threading, time
# Create the thread-local object
local_thread_arr = threading.local()
def process_thread(name):
    # Bind student on the thread-local object; each thread sees only its own value
    local_thread_arr.student = name
    time.sleep(0.5)
    print 'Hello,%s (in %s)' % (local_thread_arr.student, threading.current_thread().name)
t1 = threading.Thread(target=process_thread, args=('Alice',), name='Thread-A')
t2 = threading.Thread(target=process_thread, args=('Bob',), name='Thread-B')
t1.start()
t2.start()
t1.join()
t2.join()
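
As a small check of what "thread-local" means, the sketch below lets two threads write the same attribute and then looks at it from the calling thread. It assumes Python 3; worker and run_demo are hypothetical names. Each thread reads back only its own value, and the calling thread, which never assigned the attribute, does not see it at all.

```python
import threading

local_data = threading.local()


def worker(name, seen):
    # Each thread stores its own value under the same attribute name.
    local_data.student = name
    # Reading it back in the same thread returns that thread's own value.
    seen[name] = local_data.student


def run_demo():
    seen = {}
    threads = [threading.Thread(target=worker, args=(name, seen))
               for name in ('Alice', 'Bob')]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # The calling thread never set `student`, so the attribute does not exist here.
    caller_has_student = hasattr(local_data, 'student')
    return seen, caller_has_student


if __name__ == '__main__':
    print(run_demo())
```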

5.Distributed processes

1.taskmanager.py

The server process: it creates the Queues, registers them on the network, and then puts tasks into the task Queue.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random, time, Queue
from multiprocessing.managers import BaseManager
# Queue for sending tasks
task_queue = Queue.Queue()
# Queue for receiving results
result_queue = Queue.Queue()
# QueueManager inherits from BaseManager
class QueueManager(BaseManager):
    pass
# Register both Queues on the network; the callable argument ties each name to a Queue object
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Bind to port 5000 and set the authentication key 'abc'
manager = QueueManager(address=('', 5000), authkey='abc')
# Start the manager's server process
manager.start()
# Obtain the Queue objects that are accessed through the network
task = manager.get_task_queue()
result = manager.get_result_queue()
# Put a few tasks in
for i in range(5):
    n = random.randint(0, 1000)
    print('Put task %d...' % n)
    task.put(n)
# Read the results from the result queue
print('Try get results...')
for i in range(5):
    r = result.get(timeout=10)
    print('Result: %s ' % r)
print 'over'
#time.sleep(10)
# Shut down the manager
manager.shutdown()

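Note that when we write a multi-process program on a single machine, the Queue we create can be used directly. In a distributed multi-process environment, however, tasks must not be added by operating on the original task_queue directly, because that bypasses the QueueManager wrapper; they must be added through the Queue interface returned by manager.get_task_queue().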

2.taskworker.py

Start the worker process on another machine (running it on the same machine also works).

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time, sys, Queue, random
from multiprocessing.managers import BaseManager
# Create a similar QueueManager
class QueueManager(BaseManager):
    pass
# This QueueManager only fetches the Queues over the network, so register the names only
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')
# Connect to the server, i.e. the machine running taskmanager.py
server_addr = '192.168.56.111'
print('Connect to server %s...' % server_addr)
# The port and authentication key must match the settings in taskmanager.py
m = QueueManager(address=(server_addr, 5000), authkey='abc')
# Connect over the network
m.connect()
# Obtain the Queue objects
task = m.get_task_queue()
result = m.get_result_queue()
# Take tasks from the task queue and write the results to the result queue
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d ..' % (n, n))
        r = '%d * %d = %d' % (n, n, n * n)
        time.sleep(random.random() * 2)
        result.put(r)
    except Queue.Empty:
        print('task queue is empty.')
    except EOFError:
        print('the server(%s) is down.' % server_addr)
# Processing finished
print('worker exit.')
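
To try the manager/worker flow without a second machine, both roles can live in one script. This is only a sketch under several assumptions: Python 3 (so the module is queue and the authkey is bytes), the Unix-only fork start method, and port 0 so that the operating system picks a free port. ServerManager, ClientManager and run_demo are hypothetical names. The client side reaches the queues only through the manager, as the note above requires.

```python
import multiprocessing
import queue
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()
result_queue = queue.Queue()


def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


class ServerManager(BaseManager):
    pass


class ClientManager(BaseManager):
    pass


# The server side binds each name to a callable; the client side registers names only.
ServerManager.register('get_task_queue', callable=get_task_queue)
ServerManager.register('get_result_queue', callable=get_result_queue)
ClientManager.register('get_task_queue')
ClientManager.register('get_result_queue')


def run_demo(numbers):
    """Put numbers on the task queue, square them via a client, return the results."""
    ctx = multiprocessing.get_context('fork')  # assumption: Unix-only start method
    server = ServerManager(address=('127.0.0.1', 0), authkey=b'abc', ctx=ctx)
    server.start()
    try:
        tasks = server.get_task_queue()
        for n in numbers:
            tasks.put(n)

        # "Worker" role: connect to the address the server actually bound to.
        client = ClientManager(address=server.address, authkey=b'abc')
        client.connect()
        client_tasks = client.get_task_queue()
        client_results = client.get_result_queue()
        for _ in numbers:
            n = client_tasks.get(timeout=5)
            client_results.put(n * n)

        results = server.get_result_queue()
        return [results.get(timeout=5) for _ in numbers]
    finally:
        server.shutdown()


if __name__ == '__main__':
    print(run_demo([2, 3, 4]))
```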