
WebFlux Performance and Suitable Use Cases

WebFlux is arguably the headline feature of Spring 5,
but the crowd online claiming its performance utterly crushes Spring MVC seems to be overdoing it.

The short version: WebFlux uses a different threading model, so it suits different scenarios.
It is not a drop-in replacement for Spring MVC (in my opinion).

Usage

For getting started, I recommend this talk:
https://www.youtube.com/watch?v=zVNIZXf4BG8&t=2947s&frags=pl%2Cwn

It is clear and easy to follow.

The details

Web container

If you include both of these starters:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>

then the default web container is Tomcat.

If you want Netty as the container instead, exclude Tomcat from the web starter manually:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
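As a side note (my own suggestion, not something from the post above): if the application does not need Spring MVC at all, you can skip spring-boot-starter-web entirely. With only the WebFlux starter on the classpath, Spring Boot brings up Netty by default:

```xml
<!-- WebFlux alone defaults to the embedded Netty server -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
</dependency>
```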

Under the hood

Looking at the official diagram is puzzling at first: WebFlux is all about reactive programming, yet it can run on Tomcat. So I went through the documentation:

  • If the web container is Tomcat, WebFlux runs on the Servlet async API, bridged to Reactor.
  • If the web container is Netty, it runs on Netty directly, which is non-blocking by design.

So the official recommendation is still to run WebFlux on Netty.

Can you run Spring MVC-style blocking code on Netty? You can, but performance will not be great. The main reason is that Tomcat scales by aggressively creating threads, while Netty defaults to a small, fixed number of them.
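That contrast can be simulated with nothing but JDK executors (a toy model I wrote for illustration, not Spring code): the same batch of blocking 50 ms "requests" finishes far sooner on a Tomcat-style 200-thread pool than on a Netty-sized 8-thread pool.

```java
import java.util.concurrent.*;

public class PoolComparison {
    // Run `tasks` blocking jobs of `millis` ms each on a pool of `threads`
    // threads and return the elapsed wall-clock time in milliseconds.
    static long run(int threads, int tasks, long millis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try { Thread.sleep(millis); } catch (InterruptedException ignored) { }
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // 200 blocking "requests" of 50 ms each:
        long small = run(8, 200, 50);   // Netty-like: a handful of threads
        long large = run(200, 200, 50); // Tomcat-like: one thread per request
        System.out.println("8 threads: " + small + " ms, 200 threads: " + large + " ms");
    }
}
```

With my assumed numbers (200 tasks, 50 ms each), the small pool needs roughly 200 / 8 * 50 ≈ 1250 ms while the large pool needs little more than one 50 ms batch, which is exactly why blocking code on a few event-loop threads hurts.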

Reactive code

This is the second misconception: many people assume that merely returning a Mono or Flux from the controller will improve performance.
It will not.
If you use WebFlux, then, sorry, everything from the DAO up through the service layer has to be Mono and Flux.
The official reactive data-access support currently covers only a few stores such as Redis and MongoDB; there is no reactive JDBC.
So if your code sits on JDBC and you hope that migrating to WebFlux will boost performance, you are likely to be disappointed.

The good news is that reactive JDBC support is under development. It is not mature yet, but worth keeping an eye on.
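The in-development effort referred to here is presumably what became R2DBC (Reactive Relational Database Connectivity). The Maven coordinates below are illustrative only — the drivers were experimental at the time and the artifacts may have moved since:

```xml
<!-- Experimental reactive Postgres driver (illustrative coordinates) -->
<dependency>
    <groupId>io.r2dbc</groupId>
    <artifactId>r2dbc-postgresql</artifactId>
</dependency>
```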

Performance and migration

This is genuinely debatable, because performance depends on many factors:

  • The container's threading model: Tomcat and Netty differ here.
  • The container's thread count: Tomcat defaults to 200 NIO threads, while Netty may default to only cores * 2.
  • The business code: with lots of I/O, the Netty model tends to fit; with lots of blocking business logic, Tomcat's defaults may fit better, while Netty would need extra boilerplate and tuning, likely without much gain.

With this many variables in play, even the Spring team does not promise that WebFlux crushes Spring MVC.

On migration, the documentation says:

If you have a Spring MVC application that works fine, there is no need to change. Imperative programming is the easiest way to write, understand, and debug code. You have maximum choice of libraries, since, historically, most are blocking.

The takeaway, in plain terms:
if your code relies on blocking operations, think carefully before choosing WebFlux.

On performance:

Performance has many characteristics and meanings. Reactive and non-blocking generally do not make applications run faster. They can, in some cases, (for example, if using the WebClient to execute remote calls in parallel). On the whole, it requires more work to do things the non-blocking way and that can increase slightly the required processing time.

The key expected benefit of reactive and non-blocking is the ability to scale with a small, fixed number of threads and less memory. That makes applications more resilient under load, because they scale in a more predictable way. In order to observe those benefits, however, you need to have some latency (including a mix of slow and unpredictable network I/O). That is where the reactive stack begins to show its strengths, and the differences can be dramatic.

So WebFlux does not promise that your application will run faster; what it emphasizes is scalability and low memory consumption.
Its advantages only show up in particular scenarios, such as slow network I/O.
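The "small, fixed number of threads" point can be demonstrated with the JDK alone (again a toy model of mine, requiring Java 9+ for CompletableFuture.delayedExecutor): a single worker thread completes 1000 concurrent 100 ms "requests" in roughly the time of one, because the latency is scheduled as callbacks instead of parking a thread per request.

```java
import java.util.concurrent.*;
import java.util.stream.IntStream;

public class NonBlockingScale {
    // Complete `n` concurrent "requests", each with `latencyMs` of simulated
    // network delay, on a single worker thread; return elapsed wall time (ms).
    static long timeRequests(int n, long latencyMs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        // Runs each submitted task on `worker` only after the delay elapses,
        // without occupying any thread while waiting.
        Executor delayed = CompletableFuture.delayedExecutor(latencyMs, TimeUnit.MILLISECONDS, worker);
        long start = System.nanoTime();
        CompletableFuture<?>[] requests = IntStream.range(0, n)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> "world", delayed))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(requests).join();
        worker.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // 1000 requests x 100 ms of latency overlap almost completely,
        // instead of the ~100 s a fully serialized blocking version would need.
        System.out.println(timeRequests(1000, 100) + " ms");
    }
}
```

This is the scenario the documentation describes: given latency, a small thread count plus non-blocking scheduling scales predictably.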

A painful round of testing

This section is messy and by no means authoritative; feel free to skip ahead to the conclusions.

Round one

The reason I wrote this post at all is that I cheerfully set out to measure how much faster WebFlux is than Spring MVC,

and wrote the following code:

@GetMapping("/hello")
public Map<String, String> hello() {
    try {
        Thread.sleep(5);
    } catch (InterruptedException e) {
    }
    return Collections.singletonMap("Hello", "world");
}

@GetMapping("/reactor")
public Mono<Map<String, String>> reactor() {
    return Mono.create(sink -> {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
        }
        sink.success(Collections.singletonMap("Hello", "world"));
    });
}

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/hello --latency
Running 1m test @ http://127.0.0.1:8080/hello
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 89.76ms 56.33ms 401.67ms 61.64%
Req/Sec 628.19 383.30 3.88k 81.15%
Latency Distribution
50% 89.00ms
75% 132.28ms
90% 161.62ms
99% 229.86ms
1027462 requests in 1.00m, 153.02MB read
Socket errors: connect 0, read 3283, write 0, timeout 0
Requests/sec: 17095.73
Transfer/sec: 2.55MB
» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 100.82ms 65.39ms 451.83ms 65.85%
Req/Sec 438.16 321.89 2.96k 80.20%
Latency Distribution
50% 94.37ms
75% 146.32ms
90% 185.54ms
99% 281.39ms
739620 requests in 1.00m, 110.15MB read
Socket errors: connect 0, read 2524, write 0, timeout 0
Requests/sec: 12306.22
Transfer/sec: 1.83MB

WebFlux actually performed worse than Spring MVC. One detail to note: the logic inside Mono.create still runs on a Tomcat thread.
On reflection, sleeping there is not entirely fair, since sleep counts as a blocking operation.

So I manually published onto the parallel scheduler, although I have never seen this style in the official examples:

@GetMapping("/reactor")
public Mono<Map<String, String>> reactor() {
    return Mono.<Map<String, String>>create(sink -> {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
        }
        sink.success(Collections.singletonMap("Hello", "world"));
    }).publishOn(Schedulers.parallel());
}

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 122.59ms 81.11ms 634.86ms 66.16%
Req/Sec 378.94 246.55 2.56k 79.17%
Latency Distribution
50% 110.07ms
75% 175.18ms
90% 236.01ms
99% 344.06ms
662844 requests in 1.00m, 98.71MB read
Socket errors: connect 0, read 3813, write 0, timeout 0
Requests/sec: 11029.13
Transfer/sec: 1.64MB

Performance actually got worse. Since Schedulers.parallel() defaults to one worker per CPU core (8 on my machine), I manually bumped it to 30:

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 132.54ms 87.81ms 691.34ms 69.49%
Req/Sec 390.05 258.97 2.97k 74.94%
Latency Distribution
50% 116.71ms
75% 183.79ms
90% 251.33ms
99% 401.91ms
678598 requests in 1.00m, 101.06MB read
Socket errors: connect 0, read 4059, write 0, timeout 0
Requests/sec: 11291.03
Transfer/sec: 1.68MB

The difference is negligible, so the bottleneck must be elsewhere, or WebFlux is doing something extra inside publishOn.

Round two

For the second round I changed the test case to reads from Redis:

private ReactiveRedisTemplate<String, String> reactiveRedisTemplate;

private StringRedisTemplate stringRedisTemplate;

public HelloController(ReactiveRedisTemplate<String, String> reactiveRedisTemplate, StringRedisTemplate stringRedisTemplate) {
    this.reactiveRedisTemplate = reactiveRedisTemplate;
    this.stringRedisTemplate = stringRedisTemplate;
}

private List<String> keys = Arrays.asList("name1", "name2", "name3");

@GetMapping("/hello")
public List<String> hello() {
    return stringRedisTemplate.opsForValue().multiGet(keys);
}

@GetMapping("/reactor")
public Mono<List<String>> reactor() {
    return reactiveRedisTemplate.opsForValue().multiGet(keys);
}

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/hello --latency
Running 1m test @ http://127.0.0.1:8080/hello
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 110.62ms 57.26ms 279.77ms 65.42%
Req/Sec 467.92 327.19 4.65k 85.21%
Latency Distribution
50% 109.25ms
75% 155.67ms
90% 182.36ms
99% 234.64ms
808566 requests in 1.00m, 128.90MB read
Socket errors: connect 0, read 4070, write 0, timeout 0
Requests/sec: 13453.13
Transfer/sec: 2.14MB
» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 236.17ms 113.54ms 650.53ms 68.95%
Req/Sec 293.21 161.31 5.09k 81.31%
Latency Distribution
50% 251.06ms
75% 313.69ms
90% 352.13ms
99% 521.69ms
505806 requests in 1.00m, 80.62MB read
Socket errors: connect 0, read 3535, write 0, timeout 0
Requests/sec: 8416.85
Transfer/sec: 1.34MB

WebFlux still performs worse here, by an even wider margin.

So once again I switched to publishing manually:

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 228.61ms 101.83ms 666.54ms 70.65%
Req/Sec 293.82 148.63 1.98k 75.36%
Latency Distribution
50% 237.21ms
75% 287.44ms
90% 335.04ms
99% 526.61ms
499892 requests in 1.00m, 79.68MB read
Socket errors: connect 0, read 3198, write 0, timeout 0
Requests/sec: 8317.62
Transfer/sec: 1.33MB

Again, hardly any difference.

What about raising the pool to 30?

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 212.81ms 94.51ms 709.88ms 69.39%
Req/Sec 290.22 151.03 2.57k 77.73%
Latency Distribution
50% 214.93ms
75% 262.16ms
90% 317.48ms
99% 479.01ms
508591 requests in 1.00m, 81.07MB read
Socket errors: connect 0, read 3190, write 0, timeout 0
Requests/sec: 8462.78
Transfer/sec: 1.35MB

Still barely any difference, so the key is clearly not here.

Round three

For the third round I switched the container to Netty.
One thing to note after the switch: the container no longer spawns threads aggressively; you generally get about as many event-loop threads as CPU cores, or cores * 2.
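For reference, you can estimate the count on your own machine. The max(4, cores) formula below is my reading of reactor-netty's default I/O worker sizing and may vary by version, so treat it as an assumption:

```java
public class EventLoopEstimate {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // reactor-netty sizes its I/O worker pool roughly as max(4, cores);
        // either way, the count stays small and fixed, unlike Tomcat's pool.
        int workers = Math.max(4, cores);
        System.out.println(cores + " cores -> about " + workers + " event-loop threads");
    }
}
```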

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/hello --latency
Running 1m test @ http://127.0.0.1:8080/hello
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 63.26ms 38.60ms 1.99s 83.37%
Req/Sec 467.59 453.97 6.27k 85.41%
Latency Distribution
50% 65.65ms
75% 83.72ms
90% 93.78ms
99% 129.69ms
747480 requests in 1.00m, 80.55MB read
Socket errors: connect 0, read 4343, write 1, timeout 874
Requests/sec: 12437.61
Transfer/sec: 1.34MB
» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 156.88ms 74.43ms 527.17ms 66.33%
Req/Sec 431.27 207.17 5.08k 82.32%
Latency Distribution
50% 176.96ms
75% 211.25ms
90% 236.23ms
99% 312.83ms
762879 requests in 1.00m, 82.21MB read
Socket errors: connect 0, read 3419, write 0, timeout 0
Requests/sec: 12694.40
Transfer/sec: 1.37MB

Compared with the Tomcat runs above, Spring MVC fares worse here, which is predictable: the blocking Redis calls tie up the event loop.
WebFlux now outperforms Spring MVC. Also note that WebFlux shows zero timeouts, while Spring MVC racks up a large number of them.

Comparing across rounds, Spring MVC's numbers dropped,
while WebFlux improved considerably.

What about changing the default event-loop pool size?

System.setProperty("reactor.schedulers.defaultPoolSize", "30");

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/hello --latency
Running 1m test @ http://127.0.0.1:8080/hello
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 61.56ms 35.75ms 1.98s 84.77%
Req/Sec 484.48 503.66 8.98k 88.50%
Latency Distribution
50% 65.20ms
75% 80.28ms
90% 90.15ms
99% 113.19ms
764109 requests in 1.00m, 82.34MB read
Socket errors: connect 0, read 4749, write 2, timeout 916
Requests/sec: 12713.41
Transfer/sec: 1.37MB
» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/reactor --latency
Running 1m test @ http://127.0.0.1:8080/reactor
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 117.43ms 58.73ms 310.79ms 65.99%
Req/Sec 464.75 350.25 8.26k 84.65%
Latency Distribution
50% 116.12ms
75% 160.88ms
90% 192.55ms
99% 257.21ms
803283 requests in 1.00m, 86.57MB read
Socket errors: connect 0, read 3011, write 0, timeout 0
Requests/sec: 13366.05
Transfer/sec: 1.44MB

Spring MVC is essentially unchanged and still times out heavily,
while WebFlux improves further and still shows zero timeouts.

That said, no matter what I tuned, I could not beat the Tomcat numbers from round two.

Round four

This round compares Tomcat's blocking model with Netty's non-blocking model at roughly equal thread counts,
by adding this setting:

server.tomcat.max-threads=20

» wrk -t30 -c3000 -d60s http://127.0.0.1:8080/hello --latency
Running 1m test @ http://127.0.0.1:8080/hello
30 threads and 3000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 119.35ms 73.88ms 323.07ms 61.31%
Req/Sec 429.58 428.04 8.30k 89.22%
Latency Distribution
50% 124.08ms
75% 179.28ms
90% 215.99ms
99% 268.68ms
713803 requests in 1.00m, 113.80MB read
Socket errors: connect 0, read 3452, write 0, timeout 0
Requests/sec: 11876.60
Transfer/sec: 1.89MB

With thread counts roughly matched, WebFlux performs much better.

Conclusions

Compared with Spring MVC, WebFlux's advantages are:

  • Lower resource consumption for the same performance profile
  • Better suited to horizontal scaling

So in which scenarios can WebFlux replace Spring MVC?

  • When you want a small memory footprint and few threads
  • When the network is slow or I/O frequently misbehaves

But WebFlux requires:

  • Non-blocking business code; any blocking calls must be offloaded to a thread pool of your own
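That last point, sketched with the JDK only: confine blocking calls to a dedicated bounded pool so they never occupy event-loop threads. (In Reactor itself the usual equivalent is Mono.fromCallable(...).subscribeOn(...) with an elastic scheduler; the code below is a plain-Java stand-in, and the pool size and helper names are my own choices.)

```java
import java.util.concurrent.*;

public class BlockingOffload {
    // Dedicated, bounded pool: blocking work runs here, never on event-loop threads.
    private static final ExecutorService BLOCKING_POOL =
            Executors.newFixedThreadPool(16, r -> {
                Thread t = new Thread(r, "blocking-pool");
                t.setDaemon(true);
                return t;
            });

    // Wrap a blocking lookup (a stand-in for JDBC or any legacy client)
    // as an async result that reactive code can compose with.
    static CompletableFuture<String> fetchBlocking(String key) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(5); // the blocking part
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-for-" + key;
        }, BLOCKING_POOL);
    }

    public static void main(String[] args) {
        System.out.println(fetchBlocking("name1").join()); // prints "value-for-name1"
    }
}
```

The design point is the bound: an unbounded pool would just reintroduce Tomcat-style thread growth through the back door.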

Mixing the two

Personally, I don't think mixing Spring MVC and WebFlux in one application works well.
Spring MVC usually pairs with blocking-heavy business code: on Netty that risks blocking the event loop and raises the programming burden, while on Tomcat you can simply throw threads at the problem.
WebFlux fits Netty and the Reactor model best: business code must not block the I/O threads, and if it contains many blocking operations you have to run them on thread pools of your own, which complicates the code, so the surrounding frameworks also need to support non-blocking execution.
Can WebFlux run on Tomcat? It can, but it feels awkward.
WebClient is the exception: it is essentially an async HTTP client and has nothing to do with Tomcat.

Outlook

WebFlux mostly positions itself against Vert.x, though it is less complete than Vert.x.
On the plus side, it comes with Spring's own IoC and AOP.
Once async JDBC libraries mature, it should be a good fit for writing fully asynchronous, lightweight applications.