Machine Learning as a Service
We run several public workers that provide machine learning inference and training as a service. Under the hood, each worker may run a different backend so that we can support different types of machine learning models. We currently run the following backends:
- Triteia (for large generative transformer models).
- DeepSpeed-MII (for text-to-image and some other models).
- Inferencia (for other Hugging Face models not supported by the backends above).
The endpoints for our public workers all start with https://api.research.computer/. For example, the endpoint for Triteia is https://api.research.computer/triteia/.
Triteia
Triteia is an inference engine that provides OpenAI-compatible APIs for large generative transformer models. To use Triteia, point your OpenAI-compatible client at https://api.research.computer/triteia/ instead of the OpenAI endpoint.
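As a dependency-free illustration, the sketch below builds an OpenAI-style chat completions request against the Triteia base URL using only the Python standard library. It assumes that Triteia follows the usual OpenAI path layout (`<base>/v1/chat/completions`); the `/v1` segment is not confirmed on this page. The helper names `build_chat_request` and `send` are hypothetical, and the model name you pass must be one the worker actually serves.

```python
import json
import urllib.request

# Base URL from this page. Assumption: chat completions live under
# "<base>/v1/chat/completions", as with other OpenAI-compatible servers.
TRITEIA_BASE = "https://api.research.computer/triteia/"


def build_chat_request(base_url, model, prompt):
    """Build (but do not send) an OpenAI-style chat completions POST request."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON body (generous timeout for LLMs)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `send(build_chat_request(TRITEIA_BASE, "my-model", "Say hello"))`, where `my-model` is a placeholder. Keeping request construction separate from sending makes it easy to inspect the exact URL and payload before any network traffic happens.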
Using the global dispatcher (OpenAI-compatible)
If you are running your own global dispatcher, you can route OpenAI-compatible requests to any registered LLM worker through the llm service:
curl -sS -X POST \
  http://<dispatcher-host>:8092/v1/service/llm/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt2",
    "messages": [
      {"role": "user", "content": "Say hello"}
    ]
  }'
Or with the OpenAI Python client:
from openai import OpenAI

client = OpenAI(
    base_url="http://<dispatcher-host>:8092/v1/service/llm/v1",
    api_key="any-value",
)

resp = client.chat.completions.create(
    model="gpt2",  # selects a provider that registered this model
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp)
Notes:
- The dispatcher selects a provider that has registered the requested model (identity group match).
- Long-running AI requests are supported with extended timeouts.
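The first note says the dispatcher picks a provider that has registered the requested model (an identity group match). The toy sketch below illustrates only that matching rule; it is not the dispatcher's implementation. The `Provider` record and `pick_provider` function are hypothetical, and choosing uniformly at random among matching providers is an assumption, since this page does not describe how ties are broken.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Provider:
    # Hypothetical record: a worker ID and the model names it registered.
    provider_id: str
    models: set = field(default_factory=set)


def pick_provider(providers, model, rng=random):
    """Return a provider that registered `model`, or None if none did.

    Illustrative only: picks uniformly at random among matching providers.
    """
    candidates = [p for p in providers if model in p.models]
    if not candidates:
        return None
    return rng.choice(candidates)
```

For instance, with `Provider("worker-a", {"gpt2"})` and `Provider("worker-b", {"other-model"})`, a request for `gpt2` can only be routed to `worker-a`, and a request for an unregistered model finds no provider.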
Inferencia
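Inferencia serves requests at https://api.research.computer/inferencia/v1/predict. For example, the following request runs microsoft/deberta-large-mnli on a single input and asks for the top 3 labels:

import requests

response = requests.post(
    url="https://api.research.computer/inferencia/v1/predict",
    json={
        "model_name": "microsoft/deberta-large-mnli",
        "data": [{
            "text": ["You look amazing today,"],
            "top_k": 3,
        }],
    },
)
print(response.json())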
The expected output is:

{
    'model_name': 'microsoft:deberta-large-mnli',
    'model_version': 'default',
    'data': [
        [
            [
                {'label': 'NEUTRAL', 'score': 0.9754309058189392},
                {'label': 'CONTRADICTION', 'score': 0.016230667009949684},
                {'label': 'ENTAILMENT', 'score': 0.00833841785788536}
            ]
        ]
    ]
}
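The predictions in the response are nested three levels deep. The sketch below pulls out the highest-scoring label for each input text. It assumes that `data` holds one list per request item and that each of those holds one list of `{'label', 'score'}` dicts per text, which matches the expected output for this example but is not formally documented. `top_labels` is a hypothetical helper name.

```python
def top_labels(response):
    """Return (label, score) of the best prediction for each input text.

    Assumes response["data"] is a list (one entry per request item) of
    lists (one entry per text) of {"label": ..., "score": ...} dicts.
    """
    best = []
    for item in response["data"]:
        for candidates in item:
            top = max(candidates, key=lambda c: c["score"])
            best.append((top["label"], top["score"]))
    return best


# Sample response copied from the expected output for this example.
sample = {
    "model_name": "microsoft:deberta-large-mnli",
    "model_version": "default",
    "data": [[[
        {"label": "NEUTRAL", "score": 0.9754309058189392},
        {"label": "CONTRADICTION", "score": 0.016230667009949684},
        {"label": "ENTAILMENT", "score": 0.00833841785788536},
    ]]],
}
print(top_labels(sample))  # [('NEUTRAL', 0.9754309058189392)]
```

In live code you would pass `response.json()` from the request above instead of `sample`.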