Skip to content


The inference endpoint is a POST endpoint that receives a JSON object on the body.

Input of the endpoint is a JSON object in the Body that contains the input to the model and the parameters to control the different modes for text generation.

    "input": "This is an input text",
    "max_length": 50,
    "mode": "sample-top-p",
    "top_p": 0.92,
    "top_k": 50,
    "temperature": 0.7,
    "num_beams": 5,
    "no_repeat_ngram_size": 2,
    "num_return_sequences": 3,
    "seed": -1

The response of the model is a list of generated sequences.

  "This is an input text file which is created from a file.\n\nThis is the input text file which will be converted to a PDF and used as the source text.\n\nThis is the text file that is used to convert the PDF to",
  "This is an input text field.\n\nYou can customize this field through the following parameters:",
  "This is an input text box which has a lot of options. You will be able to change the size of the text and change the font and line height.\n\nYou can also add or remove the \"back\" button. If you select a"

Querying this InvokeEndpoint requires to handle the AWS Signature Version 4. We provide some examples on how to query this endpoint:

Using the boto3 python library:

content_type = "application/json"

payload = json.dumps({"input": "This is the input of the algorithm"})

response = client.invoke_endpoint(
    EndpointName=endpoint_name, ContentType=content_type, Body=payload

    'This is the input of the algorithm, and you can see that it has a few iterations (3, 4, 5, 6) and ends with a "cursor" (and you can see that there is an arrow pointing right).\n\n',
    'This is the input of the algorithm, which is a string (a list of letters) separated by a period. The only thing that is really guaranteed here is that the input has a certain length, and no more than 255 characters. This is a',
    'This is the input of the algorithm. The values you enter should be numbers in the range of 0–999. You can use a number of different input types:\n\n1–1,000: One of these random numbers (0–999'

Using Python requests and the Python aws-requests-auth library to handle the AWS Signature v4 workflow.

import requests
from aws_requests_auth.boto_utils import BotoAWSRequestsAuth

auth = BotoAWSRequestsAuth(

data = {"input": "Now we are using Python requests"}

response =

    'Now we are using Python requests module. You need to install it as follows.\n\nFor Python 2.7 :\n\nsudo apt install python-requests\n\nFor Python 3.x :\n\nsudo pip install requests\n\nStep',
    "Now we are using Python requests library to make a request to We will make a GET request to the URL and get a list of all the site's pages.\n\n#include <Python.h",
    "Now we are using Python requests with Flask, with a few tweaks. I'll explain how to set up a simple API client with Python 3 and Flask 3 (which was just added as a module, just a week ago!).\n\nHere is the"

Using the AWS CLI.

aws sagemaker-runtime invoke-endpoint \
    --endpoint-name gpt-2 \
    --content-type application/json \
    --body '{"input": "Now we use the AWS CLI"} \
$ cat output.json
    "Now we use the AWS CLI to run the task to set up the environment:\n\n# set the AMI as the instance\n\naws ec2 describe-instance-role \"arn:aws:iam::123456789012:role",
    "Now we use the AWS CLI to create a new cloud resource. AWS CLI will generate a public key using the s3_key that we generated earlier. We'll use this to encrypt the data within this archive in order to protect the data from being",
    "Now we use the AWS CLI to get all of the public keys.\n\n$ aws ec2 describe-public-keys [{\"AccessKeyId\": \"your-access-key-id\", \"SecretAccessKey\": \"your-secret"

Using cURL it will be like doing doing:

curl -X POST "" \
  -H  "Content-Type: application/json" \
  --data '{"input": "Querying the model using curl."}'

Note that this command requires modification to authenticate the request with AWS Signature Version 4.


Param Deescription Default
input Text to be completed required
max_length Number of works to be generated 100
num_return_sequences Number of sequences to return 3
mode Options: sample-top-p, sample-top-k, sample, beam, greedy sample-top-p
top_p Top probability 0.92
top_k Top K 50
temperature Temperature 0.7
num_beams Number of beams 5
no_repeat_ngram_size No repeat n-gram size 2
seed Set random seed before making a prediction None

The input, max_length and num_return_sequences parameters can be used in all modes.

Mode: sample-top-p

This is the default mode to generate text. It works as a nice default for a variety of inputs.

These parameters can be used on this mode:

Param Default value
top_p 0.92
top_p 50

Mode: sample-top-k

These parameters can be used on this mode:

Param Default value
top_k 50

Mode: sample

This mode works as a general version of sample-top-p and sample-top-k.

These parameters can be used on this mode:

Param Default value
top_p 0.92
top_k 50
temperature 0.7

Mode: beam

These parameters can be used on this mode:

Param Default value
num_beams 5
no_repeat_ngram_size 2

Mode: greedy

There are no arguments on this mode.