Benchmark

Input Sequence Size = \(16 \times 1024\)

Single GPU

| Model | Device | Runtime measured (ms) | Runtime estimated (ms) | Runtime error | Memory measured (GB) | Memory estimated (GB) | Memory error | Energy measured (J/token) | Energy estimated (J/token) | Energy error | Throughput measured (token/s) | Throughput estimated (token/s) | Throughput error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| facebook/opt-1.3b | NVIDIA RTX A5000 | 693.77 | 740.79 | 6.35% | 5.53 | 5.52 | -0.17% | 0.010 | 0.010 | 6.35% | 42.34 | 45.21 | 6.35% |
| facebook/opt-6.7b | NVIDIA RTX A5000 | 3275.30 | 3300.85 | 0.77% | 15.48 | 15.47 | -0.06% | 0.046 | 0.046 | 0.77% | 199.91 | 201.47 | 0.77% |
| AlCrossSim/clm-60m | NVIDIA RTX A5000 | 121.46 | 112.05 | -8.40% | 6.16 | 6.15 | -0.16% | 0.002 | 0.002 | -8.40% | 7.41 | 6.84 | -8.40% |
| AlCrossSim/clm-200m | NVIDIA RTX A5000 | 241.97 | 226.40 | -6.87% | 6.50 | 6.49 | -0.30% | 0.003 | 0.003 | -6.87% | 14.77 | 13.82 | -6.87% |
| AlCrossSim/clm-400m | NVIDIA RTX A5000 | 380.49 | 355.55 | -7.01% | 6.87 | 6.85 | -0.30% | 0.005 | 0.005 | -7.01% | 23.22 | 21.70 | -7.01% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A5000 | 747.17 | 716.71 | -4.25% | 8.17 | 8.13 | -0.51% | 0.010 | 0.010 | -4.25% | 45.60 | 43.74 | -4.25% |
| facebook/opt-1.3b | NVIDIA RTX A6000 | 510.96 | 594.34 | 14.03% | 5.53 | 5.52 | -0.17% | 0.009 | 0.011 | 14.03% | 31.19 | 36.28 | 14.03% |
| facebook/opt-6.7b | NVIDIA RTX A6000 | 2150.14 | 2121.35 | -1.36% | 15.48 | 15.47 | -0.06% | 0.039 | 0.039 | -1.36% | 131.23 | 129.48 | -1.36% |
| AlCrossSim/clm-60m | NVIDIA RTX A6000 | 109.11 | 76.87 | -41.93% | 6.16 | 6.15 | -0.16% | 0.002 | 0.001 | -41.93% | 6.66 | 4.69 | -41.93% |
| AlCrossSim/clm-200m | NVIDIA RTX A6000 | 204.46 | 178.60 | -14.48% | 6.50 | 6.49 | -0.30% | 0.004 | 0.003 | -14.48% | 12.48 | 10.90 | -14.48% |
| AlCrossSim/clm-400m | NVIDIA RTX A6000 | 315.30 | 295.36 | -6.75% | 6.87 | 6.85 | -0.30% | 0.006 | 0.005 | -6.75% | 19.24 | 18.03 | -6.75% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A6000 | 600.65 | 613.92 | 2.16% | 8.17 | 8.13 | -0.51% | 0.011 | 0.011 | 2.16% | 36.66 | 37.47 | 2.16% |

Multi GPU

| Model | Device | Runtime measured (ms) | Runtime estimated (ms) | Runtime error | Memory measured (GB) | Memory estimated (GB) | Memory error | Energy measured (J/token) | Energy estimated (J/token) | Energy error | Throughput measured (token/s) | Throughput estimated (token/s) | Throughput error |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| facebook/opt-1.3b | NVIDIA RTX A5000 x 8 | 1311.08 | 1415.36 | 7.37% | 4.99 | 4.98 | -0.16% | 0.018 | 0.020 | 7.37% | 80.02 | 86.39 | 7.37% |
| facebook/opt-6.7b | NVIDIA RTX A5000 x 8 | 4241.64 | 4287.28 | 1.06% | 16.04 | 16.03 | -0.05% | 0.060 | 0.060 | 1.06% | 258.89 | 261.67 | 1.06% |
| AlCrossSim/clm-60m | NVIDIA RTX A5000 x 8 | 124.78 | 131.60 | 5.18% | 7.62 | 7.68 | -0.12% | 0.002 | 0.002 | 5.18% | 7.62 | 8.03 | 5.18% |
| AlCrossSim/clm-200m | NVIDIA RTX A5000 x 8 | 245.71 | 239.86 | -2.44% | 7.87 | 7.85 | -0.26% | 0.004 | 0.004 | -2.44% | 15.00 | 14.64 | -2.44% |
| AlCrossSim/clm-400m | NVIDIA RTX A5000 x 8 | 356.11 | 360.63 | 1.25% | 8.10 | 8.08 | -0.25% | 0.007 | 0.007 | 1.25% | 21.74 | 22.01 | 1.25% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A5000 x 8 | 673.65 | 639.71 | -5.30% | 8.91 | 8.89 | -0.20% | 0.012 | 0.012 | -5.30% | 41.12 | 39.04 | -5.30% |
| facebook/opt-1.3b | NVIDIA RTX A6000 x 8 | 1199.32 | 1219.91 | 1.69% | 4.99 | 4.98 | -0.16% | 0.022 | 0.022 | 1.69% | 73.20 | 74.46 | 1.69% |
| facebook/opt-6.7b | NVIDIA RTX A6000 x 8 | 3755.24 | 3319.92 | -13.11% | 16.04 | 16.03 | -0.05% | 0.069 | 0.061 | -13.11% | 229.20 | 202.63 | -13.11% |
| AlCrossSim/clm-60m | NVIDIA RTX A6000 x 8 | 115.16 | 96.43 | -19.42% | 7.62 | 7.61 | -0.12% | 0.002 | 0.002 | -19.42% | 7.03 | 5.89 | -19.42% |
| AlCrossSim/clm-200m | NVIDIA RTX A6000 x 8 | 212.33 | 189.00 | -12.35% | 7.87 | 7.85 | -0.26% | 0.004 | 0.003 | -12.35% | 12.96 | 11.54 | -12.35% |
| AlCrossSim/clm-400m | NVIDIA RTX A6000 x 8 | 309.74 | 294.85 | -5.05% | 8.09 | 8.07 | -0.25% | 0.006 | 0.005 | -5.05% | 18.91 | 18.00 | -5.05% |
| AlCrossSim/clm-1.1b | NVIDIA RTX A6000 x 8 | 568.35 | 569.74 | 0.24% | 8.91 | 8.89 | -0.20% | 0.010 | 0.010 | 0.24% | 34.69 | 34.77 | 0.24% |
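
The source does not say how the Error values are computed. The reported numbers appear consistent with a signed relative difference normalised by the estimate, (estimated - measured) / estimated, so a positive error means the estimate is above the measurement. The sketch below assumes that convention; the helper name `relative_error` is hypothetical and not part of any benchmark tooling. It recomputes three runtime errors from the rows above.

```python
def relative_error(measured: float, estimated: float) -> float:
    """Signed error in percent, normalised by the estimate (assumed convention)."""
    return (estimated - measured) / estimated * 100.0


# (measured ms, estimated ms) runtime pairs copied from the tables above.
runtime_rows = {
    "facebook/opt-1.3b, A5000": (693.77, 740.79),
    "facebook/opt-6.7b, A5000": (3275.30, 3300.85),
    "facebook/opt-1.3b, A5000 x 8": (1311.08, 1415.36),
}

for name, (measured, estimated) in runtime_rows.items():
    print(f"{name}: {relative_error(measured, estimated):.2f}%")
```

Because the tabulated values are already rounded to two decimals, recomputing an error this way can differ from the reported figure in the last digit for some rows.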