
Machine Learning as a Service

We run several public workers that provide machine learning inference and training as a service. Under the hood, each worker may run a different backend to support different types of machine learning models. We currently run the following backends:

  • Triteia (for large generative transformer models).
  • DeepSpeed-MII (for text-to-image and some other models).
  • Inferencia (for other HuggingFace models not supported by the backends above).

The endpoint for our public workers starts with https://api.research.computer/. For example, the endpoint for Triteia is https://api.research.computer/triteia/.

Triteia is an inference engine that supports OpenAI-compatible APIs for large generative transformer models. Simply replace the endpoint with https://api.research.computer/triteia/ to use Triteia.
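
For example, a minimal sketch with the OpenAI Python client, assuming the OpenAI-compatible routes live under /v1 of the Triteia endpoint and that the worker serves a model named gpt2 (both assumptions; substitute a model the worker actually registers):

from openai import OpenAI

# Point the client at the public Triteia worker.
# The /v1 suffix and the gpt2 model name are assumptions for illustration.
client = OpenAI(
    base_url="https://api.research.computer/triteia/v1",
    api_key="any-value",
)

resp = client.chat.completions.create(
    model="gpt2",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)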

Using the global dispatcher (OpenAI-compatible)


If you are running your own global dispatcher, you can route OpenAI-compatible requests to any registered LLM worker via the llm service:

curl -sS -X POST \
  http://<dispatcher-host>:8092/v1/service/llm/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt2",
    "messages": [
      {"role": "user", "content": "Say hello"}
    ]
  }'

Or with the OpenAI Python client:

from openai import OpenAI

client = OpenAI(
    base_url="http://<dispatcher-host>:8092/v1/service/llm/v1",
    api_key="any-value",
)
resp = client.chat.completions.create(
    model="gpt2",  # selects a provider that registered this model
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp)

Notes:

  • The dispatcher selects a provider that has registered the requested model (identity group match).
  • Long-running AI requests are supported with extended timeouts.
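
On the client side you may want to raise the request timeout to match. A minimal sketch with the OpenAI Python client; the 600-second value is illustrative, not a documented limit:

from openai import OpenAI

# Allow long-running requests; 600 s is an assumed value for illustration.
client = OpenAI(
    base_url="http://<dispatcher-host>:8092/v1/service/llm/v1",
    api_key="any-value",
    timeout=600,
)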
Inferencia accepts plain HTTP requests for the remaining HuggingFace models. For example, to query microsoft/deberta-large-mnli with Python's requests library:

import requests

response = requests.post(
    url="https://api.research.computer/inferencia/v1/predict",
    json={
        "model_name": "microsoft/deberta-large-mnli",
        "data": [{
            "text": ["You look amazing today,"],
            "top_k": 3,
        }]
    },
)
print(response.json())

The expected output is:

{
  'model_name': 'microsoft:deberta-large-mnli',
  'model_version': 'default',
  'data': [
    [
      [
        {'label': 'NEUTRAL', 'score': 0.9754309058189392},
        {'label': 'CONTRADICTION', 'score': 0.016230667009949684},
        {'label': 'ENTAILMENT', 'score': 0.00833841785788536}
      ]
    ]
  ]
}
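
Since the predictions for each input are nested two levels deep under data, a small sketch based on the response shape above pulls out the highest-scoring label:

result = response.json()

# The first input's predictions sit at data[0][0] in the response shown above.
predictions = result["data"][0][0]
top = max(predictions, key=lambda p: p["score"])
print(top["label"], top["score"])  # e.g. NEUTRAL 0.9754...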