Inference provides a selection of AI models for common tasks, such as visual reasoning, question answering, and embedding data across modalities like text and images. All available models are accessible via simple API calls over HTTPS or gRPC.

Using a model through the Inference APIs is as simple as 1, 2, 3!

Model Selection

Inference supports models for the following tasks:

  1. Caption
  2. Encode
  3. Rank
  4. VQA
  5. Image Upscaling

The first step is to select the machine learning model you would like to use in your application. Jina AI Cloud offers several machine learning models - we curate the best public models for the most common machine learning tasks.

All available models are listed under the Inference tab on Jina AI Cloud as cards, each showing the model name and a short description.

Start by clicking on the card of your model of choice.

In deep learning, one model may support multiple tasks, such as input classification, embedding generation, question answering, or image captioning. When you click on a card, the tasks supported by that model appear as tabs under Demo.


For every model and supported task, users of Jina AI Cloud can try out examples before creating an API endpoint and incurring costs on their account.

Note: Regardless of the task chosen for the demo, once the API has been created the user will be able to access _all tasks supported by the selected model._

Under each task tab in the demo, we first explain the task in simple terms. Users can then try out model inference by selecting an input in the relevant modality, such as text or image, or by using the provided demo input values.

Upon clicking the "Run" button, users can see the output for their demo inputs.

Limitations of Inference Demo

To prevent misuse of the demo functionality, demo runs are limited to 100 per hour. There is no such limit on Inference APIs created by users.
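Although created APIs are not subject to the demo limit, client applications should still handle transient rate-limit responses gracefully. Below is a minimal sketch of an HTTPS call with exponential backoff on HTTP 429; the endpoint URL, request-body shape, and `token` authorization scheme are assumptions for illustration - copy the real values from your API's details page.

```python
import json
import time
import urllib.error
import urllib.request


def backoff_delays(max_retries):
    """Exponential backoff schedule in seconds: 1, 2, 4, ..."""
    return [2 ** i for i in range(max_retries)]


def post_with_backoff(url, payload, token, max_retries=5):
    """POST JSON to an inference endpoint, retrying on HTTP 429.

    The URL, payload shape, and auth header format are placeholders,
    not the exact values generated by Jina AI Cloud.
    """
    data = json.dumps(payload).encode()
    for delay in backoff_delays(max_retries):
        req = urllib.request.Request(
            url,
            data=data,
            headers={
                "Authorization": f"token {token}",
                "Content-Type": "application/json",
            },
        )
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # not a rate-limit error; surface it
            time.sleep(delay)  # back off, then retry
    raise RuntimeError("still rate limited after retries")
```

Any other backoff strategy works equally well; the point is simply not to retry in a tight loop when the service signals rate limiting.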

Model Details

Users can click the "Details" button below the Demo area to see more information about the model, including its architecture, the original research publication, descriptions of its variants, and similar models.

API Creation

Once users are satisfied with the demo performance and output, they can click the "Create" button to create an API endpoint for the selected model.

On the page titled "Create Inference API", users need to enter a name for the API endpoint, which they will later use to identify it.

Next, the user needs to select a model for API creation. While the pre-selected model on this page is always the one chosen on the demo and model details page, the user still needs to select a variant of this model.

Variant Selection

A model class, such as CLIP, is usually a collection of model variants: instances of the same model class with a similar architecture but differences in training data source, training data volume, or the number of trainable parameters.

Users of Inference on Jina AI Cloud can use the tooltips next to each variant name on the creation page to choose the variant most suitable for their downstream task. Some variants offer higher throughput, in terms of documents processed per second, while others offer higher precision or accuracy. We have curated the best models across these two criteria - speed and accuracy - but the final decision is left to the user.

API Management

A list of all Inference APIs created by a user can be seen on the first page of Inference on Jina AI Cloud - under Inference API List.

The list shows the status of each API, which can be either Serving or Stopped, along with the actions available for each API, such as deleting it.

For integrating Inference APIs into their applications, users can copy the gRPC or HTTPS endpoint links created for each API.

Integration with Client Application

To integrate an API with an application, users can click on an API listed on the first page.

The details page for a user's API contains information on how to integrate the API using the following options:

  1. cURL command
  2. Python client
  3. JavaScript client

For any of the above options, users can copy the code sample into their own application and replace <your access token> with their personal Jina AI Cloud access token.
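As one way to adapt the copied sample, the token can be read from an environment variable rather than hard-coded. The endpoint URL, request-body shape, and `token` authorization scheme below are placeholders for illustration, not the exact values generated by Jina AI Cloud - use the copied code sample as the source of truth.

```python
import json
import os
import urllib.request

# Placeholders -- replace with the endpoint link copied from your
# API's details page, and export your PAT as JINA_AUTH_TOKEN.
ENDPOINT = "https://<your-api-endpoint>/post"
TOKEN = os.environ.get("JINA_AUTH_TOKEN", "<your access token>")


def build_request(endpoint, token, payload):
    """Assemble an authenticated HTTPS request for an Inference API.

    The 'token' auth scheme and payload shape are assumptions; check
    the code sample copied from Jina AI Cloud for the real format.
    """
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


# Build (but do not yet send) a request with a sample text input.
req = build_request(ENDPOINT, TOKEN, {"data": [{"text": "hello"}]})
```

Keeping the token out of source code makes it safe to commit the integration code and rotate the PAT independently.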

Token Creation

The personal access tokens (PATs) used to integrate an Inference API with a client application can be generated from the Token Generation page. A token can be used across different Jina AI Cloud products, and users can generate new PATs at any time.