How Python's Built-in HTTP Server Is Implemented, and How It Works
Author: 王稀饭
Example
    from http.server import HTTPServer, BaseHTTPRequestHandler

    IP = '127.0.0.1'
    PORT = 8000

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.end_headers()
            message = "Hello, World!"
            self.wfile.write(bytes(message, "utf8"))

    with HTTPServer((IP, PORT), Handler) as httpd:
        print("serving at port", PORT)
        httpd.serve_forever()
The code above is about the simplest HTTP server you can build with the built-in http.server module. It handles HTTP GET requests.
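To exercise the example without a browser, one option is to run the server in a background thread and send it a single GET with http.client. This is only a sketch: binding to port 0 (so the OS picks a free port), the silenced log_message, and the fetch_once helper are assumptions of this illustration and are not in the original example.

```python
import http.client
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(bytes("Hello, World!", "utf8"))

    def log_message(self, format, *args):
        # Silence the default per-request log line on stderr.
        pass


def fetch_once():
    """Start the server, send one GET, stop the server, return (status, body)."""
    with HTTPServer(("127.0.0.1", 0), Handler) as httpd:
        port = httpd.server_address[1]  # the port the OS actually assigned
        thread = threading.Thread(target=httpd.serve_forever, daemon=True)
        thread.start()
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        try:
            conn.request("GET", "/")
            response = conn.getresponse()
            return response.status, response.read()
        finally:
            conn.close()
            httpd.shutdown()  # makes serve_forever() return
            thread.join()
```

Calling fetch_once() returns the status code and the response body as bytes.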
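Python's built-in HTTP server lives mainly in two source files, socketserver.py and http/server.py. socketserver.py wraps socket communication in Server classes and leaves hooks open for user-defined request handling. http/server.py builds further wrappers on top of it, the most commonly used being the HTTP one.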
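The split between "server" and "user-defined request handling" is visible even without HTTP. A minimal sketch using only socketserver: the EchoHandler class and the echo_roundtrip helper are hypothetical names made up for this illustration.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For a TCPServer, self.request is the connected client socket.
        data = self.request.recv(1024)
        self.request.sendall(data)


def echo_roundtrip(payload: bytes) -> bytes:
    """Send payload to a throwaway echo server and return what comes back."""
    with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                client.sendall(payload)
                return client.recv(1024)
        finally:
            server.shutdown()
            thread.join()
```

The server class never changes here; only the handler passed to it decides what a request means.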
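Reading the source (Python 3.10.1) from the opening example outward gives roughly the structure below. The diagrams are informal and follow no particular notation; they are only there to make the explanation more concrete.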
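Question 1: What does it take to implement an HTTP server?
Start with Figure 1. The left column, beginning at BaseServer, lists classes from parent to child. The right column, beginning at serve_forever(), lists methods: a call chain that goes deeper from top to bottom.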
     Parent class to subclass                       Main flow

        +----------------+                +---------------------------+
        |   BaseServer   +--------------->|      serve_forever()      |
        +--------+-------+                +-------------+-------------+
                 |                                      |
                 V                                      V
        +----------------+                +---------------------------+
        |   TCPServer    |                | _handle_request_noblock() |
        +--------+-------+                +-------------+-------------+
                 |                                      |
        +--------+--------+                             |
        |                 |                             |
        V                 V                             V
+--------------+  +--------------+        +---------------------------+
|  HTTPServer  |  |  UDPServer   |        |     process_request()     |
+--------------+  +--------------+        +-------------+-------------+
                                                        |
                                                        V
                                          +---------------------------+
                                          |     finish_request()      |
                                          +---------------------------+
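Figure 1
The example uses the HTTPServer class which, as the name says, is an HTTP server. Following the inheritance chain, HTTPServer is a subclass of TCPServer, which matches the fact that HTTP messages travel over TCP. HTTPServer itself contains very little. Here is the code: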
    class HTTPServer(socketserver.TCPServer):

        allow_reuse_address = 1    # Seems to make sense in testing environment

        def server_bind(self):
            """Override server_bind to store the server name."""
            socketserver.TCPServer.server_bind(self)
            host, port = self.server_address[:2]
            self.server_name = socket.getfqdn(host)
            self.server_port = port
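Both the inheritance chain and the effect of server_bind() can be checked from the interpreter. A small sketch, where describe_server is a hypothetical helper and port 0 is an assumption made so the OS chooses a free port:

```python
import socketserver
from http.server import HTTPServer, BaseHTTPRequestHandler

# The inheritance chain described in the text.
assert issubclass(HTTPServer, socketserver.TCPServer)
assert issubclass(socketserver.UDPServer, socketserver.TCPServer)
assert issubclass(socketserver.TCPServer, socketserver.BaseServer)


def describe_server():
    """Return (server_name, server_port, bound_port) for a throwaway HTTPServer."""
    with HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler) as httpd:
        # server_bind() has already run inside the constructor, so the
        # attributes it stores are available here.
        return httpd.server_name, httpd.server_port, httpd.server_address[1]
```

Because server_bind() reads the port back from server_address after binding, server_port holds the real port even when 0 was requested.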
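Thanks to the hooks its parent class leaves open, TCPServer only has to take a TCP socket through the bind, listen, accept, close sequence (its subclass UDPServer follows the same pattern, overriding the same hooks for datagram sockets).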
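For reference, the same bind, listen, accept, close sequence written directly against the socket module looks roughly like this. It is a sketch of the socket lifecycle only, not TCPServer's actual code; serve_one_connection and demo are hypothetical helpers, and upper-casing the payload is just a visible stand-in for request handling.

```python
import socket
import threading


def serve_one_connection(server_sock: socket.socket) -> None:
    conn, _addr = server_sock.accept()      # accept: wait for one client
    with conn:
        conn.sendall(conn.recv(1024).upper())


def demo() -> bytes:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server_sock.bind(("127.0.0.1", 0))  # bind: port 0 lets the OS choose
        server_sock.listen(5)               # listen: start queueing connections
        worker = threading.Thread(target=serve_one_connection, args=(server_sock,))
        worker.start()
        with socket.create_connection(server_sock.getsockname(), timeout=5) as client:
            client.sendall(b"hello")
            reply = client.recv(1024)
        worker.join()
        return reply
    finally:
        server_sock.close()                 # close: release the listening socket
```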
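The class to focus on is BaseServer, which implements the core request-handling flow. serve_forever(), the entry method called in the opening example, is implemented here. I have added a few short comments to the source: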
    def serve_forever(self, poll_interval=0.5):
        """Handle one request at a time until shutdown.

        Polls for shutdown every poll_interval seconds. Ignores
        self.timeout. If you need to do periodic tasks, do them in
        another thread.
        """
        self.__is_shut_down.clear()
        try:
            # XXX: Consider using another file descriptor or connecting to the
            # socket to wake this up instead of polling. Polling reduces our
            # responsiveness to a shutdown request and wastes cpu at all other
            # times.
            with _ServerSelector() as selector:
                # Register the server's descriptor and watch for read events.
                selector.register(self, selectors.EVENT_READ)

                while not self.__shutdown_request:
                    # The poll_interval timeout keeps select() from blocking
                    # for long; inside the while loop this amounts to polling.
                    ready = selector.select(poll_interval)
                    # bpo-35017: shutdown() called during select(), exit immediately.
                    if self.__shutdown_request:
                        break
                    if ready:
                        # A request has arrived and the read event is ready:
                        # start handling it.
                        self._handle_request_noblock()

                    self.service_actions()
        finally:
            self.__shutdown_request = False
            self.__is_shut_down.set()
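The polling pattern can be reproduced in isolation with the selectors module. The sketch below is an illustration of the idea, not the stdlib code: it uses selectors.DefaultSelector instead of the private _ServerSelector, a threading.Event in place of the private shutdown flag, and hypothetical names poll_until_shutdown and demo.

```python
import selectors
import socket
import threading


def poll_until_shutdown(sock, shutdown_requested, on_ready, poll_interval=0.05):
    """Call on_ready() whenever sock is readable, until the event is set."""
    with selectors.DefaultSelector() as selector:
        selector.register(sock, selectors.EVENT_READ)
        while not shutdown_requested.is_set():
            # Wake up at least every poll_interval seconds to re-check the flag.
            if selector.select(poll_interval):
                on_ready()


def demo() -> int:
    accepted = []
    got_one = threading.Event()
    shutdown_requested = threading.Event()
    with socket.create_server(("127.0.0.1", 0)) as listener:

        def on_ready():
            conn, addr = listener.accept()
            conn.close()
            accepted.append(addr)
            got_one.set()

        loop = threading.Thread(
            target=poll_until_shutdown,
            args=(listener, shutdown_requested, on_ready),
        )
        loop.start()
        socket.create_connection(listener.getsockname(), timeout=5).close()
        got_one.wait(timeout=5)
        shutdown_requested.set()  # plays the role of shutdown()
        loop.join()
    return len(accepted)
```

The loop notices the shutdown request within one poll_interval, which is the trade-off the XXX comment in the source describes.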
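As _handle_request_noblock() shows, handling one network request comes down to verify_request(), process_request() and shutdown_request(), plus a little exception handling, so the flow is fairly compact. finish_request() creates an instance of RequestHandlerClass, which is the user-defined request handler class (stored on the server in BaseServer's __init__()). The source follows and is easy to read: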
    def _handle_request_noblock(self):
        """Handle one request, without blocking.

        I assume that selector.select() has returned that the socket is
        readable before this function was called, so there should be no risk of
        blocking in get_request().
        """
        try:
            request, client_address = self.get_request()
        except OSError:
            return
        # The request-handling flow proper starts here.
        if self.verify_request(request, client_address):
            try:
                self.process_request(request, client_address)
            except Exception:
                self.handle_error(request, client_address)
                self.shutdown_request(request)
            except:
                self.shutdown_request(request)
                raise
        else:
            self.shutdown_request(request)

    def process_request(self, request, client_address):
        """Call finish_request.

        Overridden by ForkingMixIn and ThreadingMixIn.
        """
        self.finish_request(request, client_address)
        self.shutdown_request(request)

    def finish_request(self, request, client_address):
        """Finish one request by instantiating RequestHandlerClass."""
        self.RequestHandlerClass(request, client_address, self)

    def shutdown_request(self, request):
        """Called to shutdown and close an individual request."""
        self.close_request(request)
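Since verify_request() is an ordinary overridable method, a subclass can use it to turn clients away before any handler is created. A hedged sketch: AllowListServer, its allowed_hosts attribute, GreetingHandler and connect_and_read are all hypothetical names invented for this illustration.

```python
import socket
import socketserver
import threading


class GreetingHandler(socketserver.BaseRequestHandler):
    def handle(self):
        self.request.sendall(b"welcome")


class AllowListServer(socketserver.TCPServer):
    allowed_hosts = {"127.0.0.1"}  # hypothetical attribute, not a stdlib setting

    def verify_request(self, request, client_address):
        # Returning False makes _handle_request_noblock() skip
        # process_request() and go straight to shutdown_request().
        return client_address[0] in self.allowed_hosts


def connect_and_read(allowed_hosts) -> bytes:
    """Connect once and return whatever the server sends before closing."""
    with AllowListServer(("127.0.0.1", 0), GreetingHandler) as server:
        server.allowed_hosts = set(allowed_hosts)
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                return client.recv(1024)
        finally:
            server.shutdown()
            thread.join()
```

With the client's address in the set the handler runs; with an empty set the connection is closed and the client reads zero bytes.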
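To sum up: an HTTP server needs a TCP socket implementation, and the request flow can be abstracted roughly as verify_request(), process_request() and shutdown_request(). To support user-defined request handling, it also has to leave hooks open for extension. And to speak HTTP it must of course be able to parse HTTP messages, which the next section looks at.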
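Question 2: How is Python's built-in HTTP server implemented?
The previous section walked through how the built-in server handles one network request, which is equivalent to how the HTTP server runs. That partly answers this question, but one detail is missing: where the HTTP message gets parsed. The built-in HTTP server decouples HTTP parsing from the server and puts it in a separate class, BaseHTTPRequestHandler. The same separation lets users implement parsing for any application-layer protocol themselves. See Figure 2: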
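Because parsing lives in the handler, the same TCPServer can carry a completely different protocol. Below is a sketch of a made-up line protocol ("ADD 2 3" answered with "5") built on StreamRequestHandler; the protocol, AddHandler and send_line are assumptions of this example and have nothing to do with HTTP.

```python
import socket
import socketserver
import threading


class AddHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # rfile and wfile are prepared by StreamRequestHandler.setup().
        parts = self.rfile.readline().decode("ascii").split()
        try:
            if not parts or parts[0] != "ADD":
                raise ValueError("unknown command")
            reply = str(sum(int(p) for p in parts[1:]))
        except ValueError as exc:
            reply = f"ERR {exc}"
        self.wfile.write(reply.encode("ascii") + b"\n")


def send_line(line: str) -> str:
    """Send one command line to a throwaway server and return its reply."""
    with socketserver.TCPServer(("127.0.0.1", 0), AddHandler) as server:
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            with socket.create_connection(server.server_address, timeout=5) as client:
                client.sendall(line.encode("ascii") + b"\n")
                with client.makefile("rb") as reader:
                    return reader.readline().decode("ascii").strip()
        finally:
            server.shutdown()
            thread.join()
```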
                                +------------------------+        +-------------------+
                                |   BaseRequestHandler   +------->|    __init__()     |
                                +------------+-----------+        +-------------------+
                                             |
                                             V                    +-------------------+
                                +------------------------+   +--->|      setup()      |
                                |  StreamRequestHandler  +---+    +-------------------+
                                +------------+-----------+   |    +-------------------+
                                             |               +--->|     finish()      |
                                             V                    +-------------------+
+--------------------------+    +------------------------+
| SimpleHTTPRequestHandler |<---+ BaseHTTPRequestHandler |
+--------------------------+    +------------+-----------+
                                             |
                                             V
                                +------------------------+
                                |        handle()        |
                                +------------+-----------+
                                             |
                                             V                    +-------------------+
                                +------------------------+   +--->|  parse_request()  |
                                |  handle_one_request()  +---+    +-------------------+
                                +------------------------+   |    +-------------------+
                                                             +--->|     do_XXX()      |
                                                                  +-------------------+
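Figure 2
In Figure 2, every name with parentheses is a method and every name without is a class; top to bottom again runs from parent to child. To maximize code reuse, the parent class BaseRequestHandler fixes the workflow in its __init__(): setup(), handle() and finish() are called in that order. setup() and finish() are implemented in the subclass StreamRequestHandler. BaseHTTPRequestHandler then implements HTTP parsing and uses the HTTP method to decide which user-defined do_XXX() method to call, such as do_GET() or do_POST(). The code: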
    class BaseRequestHandler:
        """Base class for request handler classes.
        ......
        """

        def __init__(self, request, client_address, server):
            self.request = request
            self.client_address = client_address
            self.server = server
            self.setup()
            try:
                self.handle()
            finally:
                self.finish()

        def setup(self):
            pass

        def handle(self):
            pass

        def finish(self):
            pass


    class StreamRequestHandler(BaseRequestHandler):
        """Define self.rfile and self.wfile for stream sockets."""

        # (code omitted)

        def setup(self):
            # Set the connection timeout, Nagle's algorithm, and the
            # read/write buffers.
            self.connection = self.request
            if self.timeout is not None:
                self.connection.settimeout(self.timeout)
            if self.disable_nagle_algorithm:
                self.connection.setsockopt(socket.IPPROTO_TCP,
                                           socket.TCP_NODELAY, True)
            self.rfile = self.connection.makefile('rb', self.rbufsize)
            if self.wbufsize == 0:
                self.wfile = _SocketWriter(self.connection)
            else:
                self.wfile = self.connection.makefile('wb', self.wbufsize)

        def finish(self):
            if not self.wfile.closed:
                try:
                    self.wfile.flush()
                except socket.error:
                    # A final socket error may have occurred here, such as
                    # the local error ECONNABORTED.
                    pass
            self.wfile.close()
            self.rfile.close()
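The fixed setup(), handle(), finish() order, including the guarantee that finish() runs when handle() raises, can be observed without any socket by instantiating a handler directly. A small sketch: TracingHandler, trace, and the "boom" trigger value are hypothetical and exist only for this demonstration.

```python
import socketserver


class TracingHandler(socketserver.BaseRequestHandler):
    calls = []  # shared call log, reset by trace()

    def setup(self):
        self.calls.append("setup")

    def handle(self):
        self.calls.append("handle")
        if self.request == "boom":
            raise RuntimeError("handle failed")

    def finish(self):
        self.calls.append("finish")


def trace(request):
    """Instantiate the handler the way finish_request() does and log the calls."""
    TracingHandler.calls = []
    try:
        # BaseRequestHandler.__init__ drives the whole workflow, so creating
        # the object is all that is needed.
        TracingHandler(request, ("127.0.0.1", 0), None)
    except RuntimeError:
        TracingHandler.calls.append("error propagated")
    return TracingHandler.calls
```

This is also why finish_request() in the server only has to construct the handler: the constructor is the entry point.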
For the HTTP parsing itself, look at parse_request(). The method is long, so I will not paste it here. The approach is:
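- Parse the HTTP version number and check that it is supported (versions 2.0 and above are rejected)
- Get the HTTP method
- Parse the HTTP headers
Once the request has been parsed, the user-defined method matching the HTTP method is called, and that completes the request.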
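The three steps and the final dispatch can be sketched in a few lines. This is a simplified illustration rather than the stdlib's parse_request(): parse_request_head and TinyHandler are hypothetical, error handling is reduced to ValueError, and only the do_<METHOD> naming convention is taken from http.server.

```python
def parse_request_head(raw: bytes):
    """Return (method, path, version_number, headers) for a raw request head."""
    head = raw.decode("iso-8859-1")
    request_line, _, header_block = head.partition("\r\n")
    method, path, version = request_line.split()
    if not version.startswith("HTTP/"):
        raise ValueError(f"bad version {version!r}")
    major, minor = version[len("HTTP/"):].split(".")
    version_number = (int(major), int(minor))
    if version_number >= (2, 0):
        raise ValueError("HTTP version not supported")
    headers = {}
    for line in header_block.split("\r\n"):
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version_number, headers


class TinyHandler:
    """Hypothetical handler that dispatches on the HTTP method name."""

    def do_GET(self, path, headers):
        return f"GET {path} from {headers.get('host', '?')}"

    def dispatch(self, raw: bytes) -> str:
        method, path, _version, headers = parse_request_head(raw)
        handler = getattr(self, "do_" + method, None)
        if handler is None:
            return f"501 Unsupported method ({method!r})"
        return handler(path, headers)
```

Looking the method up with getattr is what lets a user add support for a new HTTP method simply by defining another do_XXX() method.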
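Summary
Python's built-in HTTP server is a compact implementation with a fairly limited feature set. If you want to write an HTTP server from scratch and model the design on Python's, it should have these elements:
- TCP socket communication
- HTTP message parsing
- Invocation of a user-defined RequestHandler (the design needs an extension point for it)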