Encoding in machine learning transforms raw data into a format that a model can process efficiently.
The resulting representations, called embeddings, allow machine learning models to learn from input data and make accurate predictions or decisions.

Sending Requests via Python Client

The encode method of the Model object takes raw data as input and returns embeddings as output.
It accepts a wide range of input types, including plain text, image URIs, image bytes, arrays that represent images, and DocumentArrays.

🚧

Ensuring Valid Input for the encode Method

To ensure a successful operation and prevent potential data loss, it is crucial to provide valid input when using the encode method. Please take note of the following guidelines:

  1. Avoid Broken or Missing Images: Verify that all images supplied as input are intact and accessible. Broken images or those resulting in a 404 error will cause the request to fail, potentially leading to the loss of any completed results.
  2. Implement Custom Callback Functions, Especially for Large Input: When handling a substantial volume of input data, implement custom callback functions so you can process and manage the results as they arrive, minimizing the risk of failures or data loss. For example, a callback can store results directly in a database. Learn the best practices for handling large requests in the section below.

By adhering to these precautions, you can ensure a smoother and more reliable experience while utilizing the encode method.

Plain Input

Plain Text

To encode plain text data, you can assign the text to the text parameter of the encode method:

from inference_client import Client

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

embedding = model.encode(text='Hello, world!')
[0.123, 0.456, 0.789, ...]

Plain Image

You can also encode plain image data by assigning the image path, bytes, or array to the image parameter of the encode method:

from inference_client import Client
from PIL import Image
import numpy as np

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

embedding = model.encode(image='path/to/image.jpg')
embedding = model.encode(image=open('path/to/image.jpg', 'rb'))
embedding = model.encode(image=np.array(Image.open('path/to/image.jpg')))
[0.123, 0.456, 0.789, ...]

A List of Plain Inputs

In addition to a single input, you can also encode a list of inputs by assigning the list to the text or image parameter of the encode method:

from inference_client import Client

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

embeddings = model.encode(text=['Hello, world!', 'Hello, Jina!'])
embeddings = model.encode(image=['path/to/image1.jpg', 'path/to/image2.jpg'])
[[0.123, 0.456, 0.789, ...]
 [0.987, 0.654, 0.321, ...]]

The result will be a high-dimensional array of embeddings, where each row represents the embedding of the corresponding input.
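For instance, here is a minimal sketch of checking the shape of the result, assuming it can be viewed as a NumPy array (the embedding dimension depends on the model you selected):

import numpy as np

from inference_client import Client

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

embeddings = model.encode(text=['Hello, world!', 'Hello, Jina!'])

# One row per input; the number of columns is the model's embedding dimension
print(np.asarray(embeddings).shape)  # e.g. (2, 768), depending on the model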

DocumentArray Input

The encode method also supports DocumentArray inputs.
DocArray is a library for representing, sending and storing multimodal data, which makes it a great fit for machine learning applications.
You can pass a DocumentArray object or a list of Document objects to the encode method to encode the data:

from inference_client import Client
from jina import DocumentArray, Document

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

# A list of three Documents
docs = [
    Document(text='Hello, world!'),
    Document(text='Hello, Jina!'),
    Document(text='Hello, Goodbye!'),
]

# A DocumentArray containing three text Documents
docs = DocumentArray(
    [
        Document(text='Hello, world!'),
        Document(text='Hello, Jina!'),
        Document(text='Hello, Goodbye!'),
    ]
)

# A DocumentArray containing three image Documents
docs = DocumentArray(
    [
        Document(uri='path/to/image1.jpg'),
        Document(uri='path/to/image2.jpg').load_uri_to_blob(),
        Document(uri='path/to/image3.jpg').load_uri_to_image_tensor(),
    ]
)

result = model.encode(docs=docs)
print(result.embeddings)
[[0.123, 0.456, 0.789, ...]
 [0.987, 0.654, 0.321, ...]
 [0.111, 0.222, 0.333, ...]]

The result will be a DocumentArray object with the embedding of each input stored in the embedding attribute of each Document object.
You can refer to the DocArray documentation to learn more about how to construct a text Document or an image Document.
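As a quick sketch of working with the result (assuming each embedding comes back as a NumPy array), you can read the embedding of every Document directly:

result = model.encode(docs=docs)

# Each Document in the returned DocumentArray carries its own embedding
for doc in result:
    print(doc.id, doc.embedding.shape)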

Sending Requests via JavaScript Client

The JavaScript client supports encoding single data inputs. You can encode text by specifying the text option, or encode images by providing the image path, bytes, or array using the image option.

import Client from 'inference-client';

const client = new Client('<your auth token>');
const model = await client.getModel('<model of your selection>');

const embedding1 = await model.encode({ text: 'hello world' });
const embedding2 = await model.encode({ image: 'https://picsum.photos/200' });
const embedding3 = await model.encode({ image: 'path/to/local/image' });

The returned result will be an array containing the embedding of the provided input.

Plain HTTP Requests via cURL

In addition to using the Inference Client package for encoding tasks, you can also send requests directly using the cURL command-line tool. This approach is useful if you prefer working with command-line tools or need to integrate the encoding process into scripts or automation workflows.

First, copy the HTTP address of the Inference API you created on Jina AI Cloud. You can find the endpoint on the detail page of the Inference API.

To encode, send a POST request to the /post endpoint of that address, passing your access token in the Authorization header and setting "execEndpoint" to "/encode" in the request body.

curl \
  -X POST https://<your-inference-address>-http.wolf.jina.ai/post \
  -H 'Content-Type: application/json' \
  -H 'Authorization: <your access token>' \
  -d '{"data":[{"text": "First do it"},
      {"text": "then do it right"},
      {"uri": "https://picsum.photos/200"}],
      "execEndpoint":"/encode"}'

The response will be a JSON object containing the embeddings of the input data, similar to the following. Some fields are omitted for brevity.

{
  "header": {
    "requestId": "836e8cbd082f48dcb67afea47563c399",
    "execEndpoint": "/encode"
  },
  "parameters": {},
  "routes": [],
  "data": [
    {
      "id": "ff49143a1b441385d30a0127705b3953",
      "mime_type": "text/plain",
      "text": "First do it",
      "embedding": [
        0.01368093490600586,
        -0.373085618019104,
        ...
      ]
    },
    {
      "id": "d1afbcefa52d48df8efa167780ef1ac3",
      "mime_type": "text/plain",
      "text": "then do it right",
      "embedding": [
        -0.1117687076330185,
        0.0024558454751968384,
        ...
      ]
    },
    {
      "id": "0ac991c856211141a374d58abfd49fee",
      "uri": "https://picsum.photos/200",
      "embedding": [
        0.5620652437210083,
        -0.015938282012939453,
        ...
      ]
    }
  ]
}

Best Practices for Sending Large Requests with Python Client

Control Batch Size

You can specify model.encode(..., batch_size=8) to control the number of documents processed in each request. Adjusting this number allows you to find an optimal balance between network transmission and resource usage.

Setting a larger batch_size, such as 1024, may seem like it would maximize resource utilization on each request. However, it also means each request takes longer to complete, and because the Inference Client streams requests and responses, a large batch size forfeits the overlap between sending requests and receiving responses.
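For example, here is a sketch of encoding a longer list of texts in smaller batches (texts stands in for your own input list):

from inference_client import Client

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

# Replace with your own, potentially much longer, list of inputs
texts = ['first sentence', 'second sentence', 'third sentence']

# Send the inputs in batches of 16 so requests and responses can overlap while streaming
embeddings = model.encode(text=texts, batch_size=16)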

Control Prefetch Size

To control the number of concurrent batches, use the model.encode(..., prefetch=100) option. When you send a large request, the outgoing request stream typically completes before the incoming response stream because of the client's asynchronous design: handling each request takes significant time, which delays the server's responses and can cause the connection to be closed if the server considers the incoming channel idle. By default, the client is configured with a prefetch value of 100. It is advisable to use a lower value for resource-intensive operations and a higher value when you need faster response times.

For further details on client prefetching, please consult the Jina documentation's section on Rate Limit.
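As an illustrative sketch (the numbers are placeholders, not tuned recommendations), you can lower prefetch for a resource-intensive model:

from inference_client import Client

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')

texts = ['first sentence', 'second sentence', 'third sentence']

# Keep at most 10 batches in flight at once to reduce the load on the server
embeddings = model.encode(text=texts, batch_size=16, prefetch=10)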

Show Progress Bar

You can use model.encode(..., show_progress=True) to turn on the progress bar.

Custom Callback

The Inference Client by default collects all the results and returns them to users. However, if you want to process the results on-the-fly, you can also pass a callback function when sending the request. For example, you can use the callback to save the results to a database, or render the results to a webpage. Specifically, you can specify any of the three callback functions: on_done, on_error, and on_always.

  • on_done is executed while streaming, after successful completion of each request
  • on_error is executed while streaming, whenever an error occurs in each request
  • on_always is always performed while streaming, no matter the success or failure of each request

Note that these callbacks only work for requests (and failures) inside the stream. For on_error, if the failure is due to an error happening outside of streaming, then it will not be triggered. For example, a SIGKILL from the client OS during the handling of the request, or a networking issue, will not trigger the callback. Learn more about handling exceptions in on_error.

Callback functions take a Response of the type DataRequest, which contains resulting Documents, parameters, and other information. Learn more about handling DataRequest in callbacks.

In the following example, we use on_done to save the results to a database, with a simple dict simulating the database. Errors are saved to a log file using on_error, and on_always prints the number of documents processed in each request.

from inference_client import Client

db = {}

def my_on_done(resp):
    for doc in resp.docs:
        db[doc.id] = doc

def my_on_error(resp):
    # Append the failed request to a log file
    with open('error.log', 'a') as f:
        f.write(str(resp))

def my_on_always(resp):
    print(f'{len(resp.docs)} docs processed')

client = Client(token='<your access token>')
model = client.get_model('<model of your selection>')
model.encode(
    text=['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)

📘

If either on_done or on_always is specified, the default behavior of returning the results is disabled. You need to handle the results yourself.
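For example, continuing the callback example above (a sketch that reuses the db dict and my_on_done defined there):

# With on_done registered, encode() no longer returns the embeddings
model.encode(text=['hello', 'world'], on_done=my_on_done)

# The results are whatever the callback stored: here, Documents keyed by id in `db`
print(len(db))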