Benchmark
Input Sequence Size = \(16 \times 1024\)
Single GPU

| Model | Device | Runtime (ms), measured | Runtime (ms), estimated | Runtime error | Memory (GB), measured | Memory (GB), estimated | Memory error | Energy (J/token), measured | Energy (J/token), estimated | Energy error | Throughput (token/s), measured | Throughput (token/s), estimated | Throughput error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| facebook/opt-1.3b | NVIDIA RTX A5000 | 693.77 | 740.79 | 6.35% | 5.53 | 5.52 | -0.17% | 0.010 | 0.010 | 6.35% | 42.34 | 45.21 | 6.35% |
| facebook/opt-6.7b | NVIDIA RTX A5000 | 3275.30 | 3300.85 | 0.77% | 15.48 | 15.47 | -0.06% | 0.046 | 0.046 | 0.77% | 199.91 | 201.47 | 0.77% |
| AlCrossSim/clm-60m | NVIDIA RTX A5000 | 121.46 | 112.05 | -8.40% | 6.16 | 6.15 | -0.16% | 0.002 | 0.002 | -8.40% | 7.41 | 6.84 | -8.40% |
| AlCrossSim/clm-200m | NVIDIA RTX A5000 | 241.97 | 226.40 | -6.87% | 6.50 | 6.49 | -0.30% | 0.003 | 0.003 | -6.87% | 14.77 | 13.82 | -6.87% |
| AlCrossSim/clm-400m | NVIDIA RTX A5000 | 380.49 | 355.55 | -7.01% | 6.87 | 6.85 | -0.30% | 0.005 | 0.005 | -7.01% | 23.22 | 21.70 | -7.01% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A5000 | 747.17 | 716.71 | -4.25% | 8.17 | 8.13 | -0.51% | 0.010 | 0.010 | -4.25% | 45.60 | 43.74 | -4.25% |
| facebook/opt-1.3b | NVIDIA RTX A6000 | 510.96 | 594.34 | 14.03% | 5.53 | 5.52 | -0.17% | 0.009 | 0.011 | 14.03% | 31.19 | 36.28 | 14.03% |
| facebook/opt-6.7b | NVIDIA RTX A6000 | 2150.14 | 2121.35 | -1.36% | 15.48 | 15.47 | -0.06% | 0.039 | 0.039 | -1.36% | 131.23 | 129.48 | -1.36% |
| AlCrossSim/clm-60m | NVIDIA RTX A6000 | 109.11 | 76.87 | -41.93% | 6.16 | 6.15 | -0.16% | 0.002 | 0.001 | -41.93% | 6.66 | 4.69 | -41.93% |
| AlCrossSim/clm-200m | NVIDIA RTX A6000 | 204.46 | 178.60 | -14.48% | 6.50 | 6.49 | -0.30% | 0.004 | 0.003 | -14.48% | 12.48 | 10.90 | -14.48% |
| AlCrossSim/clm-400m | NVIDIA RTX A6000 | 315.30 | 295.36 | -6.75% | 6.87 | 6.85 | -0.30% | 0.006 | 0.005 | -6.75% | 19.24 | 18.03 | -6.75% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A6000 | 600.65 | 613.92 | 2.16% | 8.17 | 8.13 | -0.51% | 0.011 | 0.011 | 2.16% | 36.66 | 37.47 | 2.16% |

Multi GPU

| Model | Device | Runtime (ms), measured | Runtime (ms), estimated | Runtime error | Memory (GB), measured | Memory (GB), estimated | Memory error | Energy (J/token), measured | Energy (J/token), estimated | Energy error | Throughput (token/s), measured | Throughput (token/s), estimated | Throughput error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| facebook/opt-1.3b | NVIDIA RTX A5000 x 8 | 1311.08 | 1415.36 | 7.37% | 4.99 | 4.98 | -0.16% | 0.018 | 0.020 | 7.37% | 80.02 | 86.39 | 7.37% |
| facebook/opt-6.7b | NVIDIA RTX A5000 x 8 | 4241.64 | 4287.28 | 1.06% | 16.04 | 16.03 | -0.05% | 0.060 | 0.060 | 1.06% | 258.89 | 261.67 | 1.06% |
| AlCrossSim/clm-60m | NVIDIA RTX A5000 x 8 | 124.78 | 131.60 | 5.18% | 7.62 | 7.68 | -0.12% | 0.002 | 0.002 | 5.18% | 7.62 | 8.03 | 5.18% |
| AlCrossSim/clm-200m | NVIDIA RTX A5000 x 8 | 245.71 | 239.86 | -2.44% | 7.87 | 7.85 | -0.26% | 0.004 | 0.004 | -2.44% | 15.00 | 14.64 | -2.44% |
| AlCrossSim/clm-400m | NVIDIA RTX A5000 x 8 | 356.11 | 360.63 | 1.25% | 8.10 | 8.08 | -0.25% | 0.007 | 0.007 | 1.25% | 21.74 | 22.01 | 1.25% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A5000 x 8 | 673.65 | 639.71 | -5.30% | 8.91 | 8.89 | -0.20% | 0.012 | 0.012 | -5.30% | 41.12 | 39.04 | -5.30% |
| facebook/opt-1.3b | NVIDIA RTX A6000 x 8 | 1199.32 | 1219.91 | 1.69% | 4.99 | 4.98 | -0.16% | 0.022 | 0.022 | 1.69% | 73.20 | 74.46 | 1.69% |
| facebook/opt-6.7b | NVIDIA RTX A6000 x 8 | 3755.24 | 3319.92 | -13.11% | 16.04 | 16.03 | -0.05% | 0.069 | 0.061 | -13.11% | 229.20 | 202.63 | -13.11% |
| AlCrossSim/clm-60m | NVIDIA RTX A6000 x 8 | 115.16 | 96.43 | -19.42% | 7.62 | 7.61 | -0.12% | 0.002 | 0.002 | -19.42% | 7.03 | 5.89 | -19.42% |
| AlCrossSim/clm-200m | NVIDIA RTX A6000 x 8 | 212.33 | 189.00 | -12.35% | 7.87 | 7.85 | -0.26% | 0.004 | 0.003 | -12.35% | 12.96 | 11.54 | -12.35% |
| AlCrossSim/clm-400m | NVIDIA RTX A6000 x 8 | 309.74 | 294.85 | -5.05% | 8.09 | 8.07 | -0.25% | 0.006 | 0.005 | -5.05% | 18.91 | 18.00 | -5.05% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A6000 x 8 | 568.35 | 569.74 | 0.24% | 8.91 | 8.89 | -0.20% | 0.010 | 0.010 | 0.24% | 34.69 | 34.77 | 0.24% |
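
The Error columns are not defined on this page. The runtime figures are numerically consistent with a signed error taken relative to the estimate, that is `(estimated - measured) / estimated`. The Python sketch below reproduces one row under that assumption. The function names are hypothetical and not part of any tool referenced here. It also assumes the input sequence size of 16 × 1024 counts tokens. Under that assumption, the values tabulated in the last column match runtime divided by token count, in microseconds per token, rather than tokens per second. Treat both observations as inferences from the numbers, not as the documented method.

```python
def relative_error(measured: float, estimated: float) -> float:
    """Signed error of the estimate, as a fraction of the estimated value.

    Assumption: this convention is inferred from the tabulated numbers.
    """
    return (estimated - measured) / estimated


def time_per_token_us(runtime_ms: float, num_tokens: int = 16 * 1024) -> float:
    """Runtime per token in microseconds.

    Assumption: the input sequence size (16 x 1024) is a token count.
    """
    return runtime_ms * 1000.0 / num_tokens


# Row: facebook/opt-1.3b on a single NVIDIA RTX A5000.
measured_ms, estimated_ms = 693.77, 740.79

print(f"runtime error: {relative_error(measured_ms, estimated_ms):.2%}")  # 6.35%
print(f"per-token time: {time_per_token_us(measured_ms):.2f} us")         # 42.34 us
```

Because energy per token and per-token time both scale linearly with runtime in this setup, their error columns repeat the runtime error, which matches what the tables show.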