
OkHttp requests not going through the proxy

Background

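One day I was sending crawler requests through OkHttp with a proxy configured, and found that some URLs did not go through the proxy at all: the client connected straight to the target URL. A simplified version of the code is below. The OkHttpClient was built with a ProxySelector, yet when these requests were sent through okhttpClient.newCall, ProxySelector.select was never called and its log line never appeared.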

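    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Proxy;
    import java.net.ProxySelector;
    import java.net.SocketAddress;
    import java.net.URI;
    import java.util.Collections;
    import java.util.List;
    import okhttp3.OkHttpClient;
    import okhttp3.Request;
    import okhttp3.Response;

    // `log` is assumed to be an existing logger field (e.g. SLF4J) of the enclosing class
    public void call(String url) throws IOException {
        ProxySelector proxySelector = new ProxySelector() {
            @Override
            public List<Proxy> select(URI uri) {
                // Expected for every request; never logged for the affected URLs
                log.info("run into proxy");
                Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", 80));
                return Collections.singletonList(proxy);
            }

            @Override
            public void connectFailed(URI uri, SocketAddress sa, IOException ioe) {
                // Nothing to do in this example
            }
        };
        OkHttpClient client = new OkHttpClient.Builder().proxySelector(proxySelector).build();
        Request request = new Request.Builder().url(url).build();

        // newCall() only creates the Call; execute() actually sends the request
        // (and is what triggers proxy selection)
        try (Response response = client.newCall(request).execute()) {
            // handle the response here
        }
    }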
OkHttp & proxies
Reference: Android | 彻底理解 OkHttp 代理与路由 (Android: a thorough look at OkHttp proxies and routing)
Why the proxy was skipped

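When selecting a proxy, OkHttp first converts the request's HttpUrl into a java.net.URI. Unless a fixed proxy was set on the client, a null host on that URI makes OkHttp give up on the proxy and connect to the URL directly, without ever calling the ProxySelector.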

okhttp3.internal.connection.RouteSelector
  private fun resetNextProxy(url: HttpUrl, proxy: Proxy?) {
    fun selectProxies(): List<Proxy> {
      // If the user specifies a proxy, try that and only that.
      if (proxy != null) return listOf(proxy)

      // If the URI lacks a host (as in "http://</"), don't call the ProxySelector.
      val uri = url.toUri()
      // If the host parsed out of the URI is null, the configured ProxySelector is skipped
      if (uri.host == null) return immutableListOf(Proxy.NO_PROXY)

      // Try each of the ProxySelector choices until one connection succeeds.
      val proxiesOrNull = address.proxySelector.select(uri)
      if (proxiesOrNull.isNullOrEmpty()) return immutableListOf(Proxy.NO_PROXY)

      return proxiesOrNull.toImmutableList()
    }

    eventListener.proxySelectStart(call, url)
    proxies = selectProxies()
    nextProxyIndex = 0
    eventListener.proxySelectEnd(call, url, proxies)
  }
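To make the decision order concrete, here is a minimal plain-Java sketch that restates the three branches of selectProxies() above without depending on OkHttp. The method selectProxies, the class CountingSelector and the host my_host.example.com are hypothetical, for illustration only. The underscore is chosen because java.net.URI does not accept it in a server-based host name, so getHost() returns null for that URI.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.SocketAddress;
import java.net.URI;
import java.util.Collections;
import java.util.List;

public class ProxyDecisionSketch {

    // Hypothetical Java restatement of RouteSelector.selectProxies(); NOT an OkHttp API.
    static List<Proxy> selectProxies(URI uri, Proxy explicitProxy, ProxySelector selector) {
        // An explicitly configured proxy is used as-is; the host check is never reached.
        if (explicitProxy != null) return Collections.singletonList(explicitProxy);

        // No host: connect directly and never consult the ProxySelector.
        if (uri.getHost() == null) return Collections.singletonList(Proxy.NO_PROXY);

        // Otherwise ask the selector; a null or empty answer also means a direct connection.
        List<Proxy> proxies = selector.select(uri);
        if (proxies == null || proxies.isEmpty()) return Collections.singletonList(Proxy.NO_PROXY);
        return proxies;
    }

    // Selector that counts select() calls, playing the role of the log line in the background example.
    static class CountingSelector extends ProxySelector {
        int calls = 0;
        private final Proxy proxy;

        CountingSelector(Proxy proxy) { this.proxy = proxy; }

        @Override
        public List<Proxy> select(URI uri) {
            calls++;
            return Collections.singletonList(proxy);
        }

        @Override
        public void connectFailed(URI uri, SocketAddress sa, IOException ioe) { }
    }

    public static void main(String[] args) {
        Proxy http = new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved("127.0.0.1", 80));
        CountingSelector selector = new CountingSelector(http);

        // '_' makes java.net.URI fall back to a registry-based authority, so getHost() is null.
        URI noHost = URI.create("http://my_host.example.com/");
        URI withHost = URI.create("http://example.com/");

        System.out.println(selectProxies(noHost, null, selector) + " calls=" + selector.calls);
        System.out.println(selectProxies(withHost, null, selector) + " calls=" + selector.calls);
        System.out.println(selectProxies(noHost, http, selector) + " calls=" + selector.calls);
    }
}
```

In this sketch the selector is not invoked for the underscore host, is invoked once for example.com, and is bypassed again when an explicit proxy is passed, which corresponds to the `if (proxy != null)` branch in the Kotlin excerpt.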

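Following `val uri = url.toUri()` all the way down, the code that actually extracts the host name is shown below. When java.net.URI parses the authority and the host is not a valid server-based host name, it falls back: host is set to null and the authority is kept as a registry-based one.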

java.net.URI (excerpt)
  private int parseAuthority(int start, int n)
            throws URISyntaxException
        {
            ...

            if (serverChars) {
                // Might be (probably is) a server-based authority, so attempt
                // to parse it as such.  If the attempt fails, try to treat it
                // as a registry-based authority.
                try {
                    // Parses the host name; an invalid one throws URISyntaxException
                    q = parseServer(p, n);
                    if (q < n)
                        failExpecting("end of authority", q);
                    authority = substring(p, n);
                } catch (URISyntaxException x) {
                    // Undo results of failed parse
                    userInfo = null;
                    // host is reset to null
                    host = null;
                    port = -1;
                    if (requireServerAuthority) {
                        // If we're insisting upon a server-based authority,
                        // then just re-throw the exception
                        throw x;
                    } else {
                        // Remember the exception and fall back to a registry-based authority
                        ex = x;
                        q = p;
                    }
                }
            }

            ...

            return n;
        }
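The fallback can be observed with the JDK alone. This small sketch (the class name UriHostDemo and the host my_host.example.com are hypothetical) parses a normal host and a host containing an underscore, which is one common way to make parseServer() fail. It also calls parseServerAuthority(), which re-parses with a server-based authority required and so takes the `requireServerAuthority` branch that re-throws.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriHostDemo {
    public static void main(String[] args) {
        URI normal = URI.create("http://example.com/path");
        // '_' is not a legal hostname character for java.net.URI, so the server-based
        // parse fails and the authority is kept as a registry-based one.
        URI underscore = URI.create("http://my_host.example.com/path");

        System.out.println(normal.getHost());          // example.com
        System.out.println(underscore.getHost());      // null
        System.out.println(underscore.getAuthority()); // my_host.example.com

        // Insisting on a server-based authority re-throws the original parse error.
        try {
            underscore.parseServerAuthority();
            System.out.println("parsed as server authority");
        } catch (URISyntaxException e) {
            System.out.println("parseServerAuthority failed: " + e.getMessage());
        }
    }
}
```

A null getHost() like this is the condition that makes RouteSelector return Proxy.NO_PROXY. Whether a given OkHttp version lets such a host through HttpUrl in the first place is worth checking against the version you use.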

Reference: JDK(java.net.URL) 中的 一个 "bug" (A "bug" in the JDK's java.net.URL) | 唐磊的个人博客 (Tang Lei's personal blog)
