Cohere Embed Multilingual Light V3

Review performance benchmarks for the cohere.embed-multilingual-light-v3.0 (Cohere Embed Multilingual Light V3) model hosted on one Embed Cohere unit of a dedicated AI cluster in OCI Generative AI.

Embeddings

This scenario applies only to embedding models. It mimics embedding generation as part of the data ingestion pipeline of a vector database. All requests are the same size: 96 documents, each with 512 tokens. An example is a collection of large PDF files, each with 30,000+ words, that a user wants to ingest into a vector database.

| Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
|---|---|---|
| 1 | 1.69 | 42 |
| 8 | 3.80 | 118 |
| 32 | 14.26 | 126 |
| 128 | 37.17 | 138 |
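As a rough sketch of how a request of this shape could be assembled, the hypothetical Python snippet below splits a large document into 512-token chunks and groups them into batches of 96, matching the benchmark's request size. Token counts are approximated here by whitespace-separated words; a real ingestion pipeline would use the model's tokenizer, and the function name `chunk_and_batch` is illustrative, not part of any SDK.

```python
def chunk_and_batch(words, tokens_per_doc=512, docs_per_request=96):
    """Split a word list into fixed-size chunks, then group the chunks
    into batches matching the benchmark's request shape (96 x 512)."""
    chunks = [words[i:i + tokens_per_doc]
              for i in range(0, len(words), tokens_per_doc)]
    batches = [chunks[i:i + docs_per_request]
               for i in range(0, len(chunks), docs_per_request)]
    return batches

# A 30,000-word file yields ceil(30000 / 512) = 59 chunks,
# which fit within a single 96-document request.
words = [f"w{i}" for i in range(30_000)]
batches = chunk_and_batch(words)
print(len(batches), len(batches[0]))  # 1 59
```

This illustrates why the 30,000+ word PDF example fits the scenario: even a file of that size produces fewer chunks than one full request holds.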

Lighter Embeddings

This scenario applies only to embedding models. It's similar to the embeddings scenario, except that each request is reduced to 16 documents, each with 512 tokens. This scenario suits smaller files with fewer words.

| Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
|---|---|---|
| 1 | 1.03 | 54 |
| 8 | 1.35 | 300 |
| 32 | 3.11 | 570 |
| 128 | 11.50 | 888 |
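Because the two scenarios use different request sizes (96 versus 16 documents), comparing raw RPM can be misleading. A quick sketch of the arithmetic, using the concurrency-128 figures reported above, converts each scenario's RPM into effective documents embedded per minute:

```python
def docs_per_minute(rpm, docs_per_request):
    """Effective document throughput: requests/minute times documents/request."""
    return rpm * docs_per_request

# Concurrency-128 figures from the tables above.
heavy = docs_per_minute(138, 96)   # embeddings scenario
light = docs_per_minute(888, 16)   # lighter embeddings scenario
print(heavy, light)  # 13248 14208
```

At high concurrency the two request shapes deliver broadly similar document throughput; the lighter requests mainly reduce per-request latency.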