Image Inference API
Introduction
Image inference is a powerful feature that allows you to generate images from text prompts or transform existing images according to your needs. This process is essential for creating high-quality visuals, whether you're looking to bring creative ideas to life or enhance existing images with new styles or subjects.
There are several types of image inference requests you can make using our API:
- Text-to-Image: Generate images from descriptive text prompts. This process translates your text into high-quality visuals, allowing you to create detailed and vivid images based on your ideas.
- Image-to-Image: Perform transformations on existing images, whether they are previously generated images or uploaded images. This process enables you to enhance, modify, or stylize images to create new and visually appealing content. With a single parameter you can control the strength of the transformation.
- Inpainting: Replace parts of an image with new content, allowing you to remove unwanted elements or improve the overall composition of an image. It's like Image-to-Image but with a mask that defines the area to be transformed.
- Outpainting: Extend the boundaries of an image by generating new content outside the original frame that seamlessly blends with the existing image. Like Inpainting, it uses a mask to define the new area to be generated.
Our API also supports advanced features that allow developers to fine-tune the image generation process with precision:
- ControlNet: A feature that enables precise control over image generation by using additional input conditions, such as edge maps, poses, or segmentation masks. This allows for more accurate alignment with specific user requirements or styles.
- LoRA: A technique that helps in adapting models to specific styles or tasks by focusing on particular aspects of the data, enhancing the quality and relevance of the generated images.
Additionally, you can tweak numerous parameters to customize the output, such as adjusting the image dimension, steps, scheduler to use, and other generation settings, providing a high level of flexibility to suit your application's needs.
Requests are served by our Sonic Inference Engine, which combines unique optimizations, custom-designed hardware, and many other elements to keep inference fast.
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to image inference tasks.
The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.
You can mix multiple ControlNet and LoRA objects in the same request to achieve more complex control over the generation process.
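As a rough illustration of the array-of-tasks structure described above, the following Python sketch builds a request body with a single image inference task. The helper name build_image_inference_task and the placeholder model identifier are hypothetical, and the choice of which optional parameters to include is an assumption; only the parameter names come from this page.

```python
import json
import uuid


def build_image_inference_task(positive_prompt, model, width=512, height=512, **extra):
    """Return one task object for the request array (illustrative helper)."""
    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # random UUID v4, unique per task
        "positivePrompt": positive_prompt,
        "model": model,
        "width": width,
        "height": height,
    }
    task.update(extra)  # optional parameters such as steps, seed or outputType
    return task


# The API accepts an array of task objects, so tasks are wrapped in a list.
payload = [
    build_image_inference_task(
        "dragon drinking coffee",
        "<model AIR identifier>",  # placeholder, replace with a real AIR identifier
        outputType="URL",
    ),
]
body = json.dumps(payload)
```

Each call generates a fresh taskUUID, so several tasks can be placed in the same array and their responses told apart later.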
taskType
The type of task to be performed. For this task, the value should be imageInference.
taskUUID
When a task is sent to the API you must include a random UUID v4 string using the taskUUID parameter. This string is used to match the async responses to their corresponding tasks.
If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks.
The taskUUID must be unique for each task you send to the API.
outputType
Specifies the output type in which the image is returned. Supported values are: dataURI, URL, and base64Data.
- base64Data: The image is returned as a base64-encoded string using the imageBase64Data parameter in the response object.
- dataURI: The image is returned as a data URI string using the imageDataURI parameter in the response object.
- URL: The image is returned as a URL string using the imageURL parameter in the response object.
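The three output types can be handled on the client roughly as follows. This is a minimal sketch, assuming a response object is available as a Python dict; the function name image_bytes_from_response is hypothetical, and the URL case is left to the caller because it requires a download.

```python
import base64


def image_bytes_from_response(item):
    """Return raw image bytes for base64Data or dataURI responses."""
    if "imageBase64Data" in item:
        return base64.b64decode(item["imageBase64Data"])
    if "imageDataURI" in item:
        # A data URI has the form data:<mediaType>;base64,<payload>
        header, _, encoded = item["imageDataURI"].partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64-encoded data URI")
        return base64.b64decode(encoded)
    # With outputType URL there is no inline data to decode.
    raise ValueError("no inline image data; download imageURL instead")
```

Checking for the response field rather than remembering the requested outputType keeps the function usable when tasks with different output types are mixed.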
outputFormat
Specifies the format of the output image. Supported formats are: PNG, JPG and WEBP.
uploadEndpoint
This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.
When the image is ready, it will be uploaded to the specified URL.
checkNSFW
This parameter is used to enable or disable the NSFW check. When enabled, the API will check if the image contains NSFW (not safe for work) content. This check is done using a pre-trained model that detects adult content in images.
When the check is enabled, the API will return NSFWContent: true in the response object if the image is flagged as potentially sensitive content. If the image is not flagged, the API will return NSFWContent: false.
If this parameter is not used, the parameter NSFWContent will not be included in the response object.
Adds 0.1 seconds to image inference time and incurs additional costs.
The NSFW filter occasionally returns false positives and very rarely false negatives.
includeCost
If set to true, the cost to perform the task will be included in the response object.
positivePrompt
A positive prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides positive guidance for the task. This parameter is essential to shape the desired results.
For example, if the positive prompt is "dragon drinking coffee", the model will generate an image of a dragon drinking coffee. The more detailed the prompt, the more accurate the results.
The length of the prompt must be between 2 and 2000 characters.
negativePrompt
A negative prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides negative guidance for the task. This parameter helps to avoid certain undesired results.
For example, if the negative prompt is "red dragon, cup", the model will follow the positive prompt but will avoid generating an image of a red dragon or including a cup. The more detailed the prompt, the more accurate the results.
The length of the prompt must be between 2 and 2000 characters.
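The length limit that applies to both prompts can be checked before a request is sent. The sketch below is one way to do it; validate_prompt is a hypothetical helper, and it assumes the limit is counted in the same way as Python's len(), which may differ from how the API counts characters.

```python
MIN_PROMPT_LENGTH = 2
MAX_PROMPT_LENGTH = 2000


def validate_prompt(prompt):
    """Return the prompt unchanged, or raise ValueError if its length is out of range."""
    length = len(prompt)
    if not MIN_PROMPT_LENGTH <= length <= MAX_PROMPT_LENGTH:
        raise ValueError(
            f"prompt length {length} is outside {MIN_PROMPT_LENGTH}-{MAX_PROMPT_LENGTH}"
        )
    return prompt
```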
seedImage
When doing Image-to-Image, Inpainting or Outpainting, this parameter is required.
Specifies the seed image to be used for the diffusion process. The image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo...
- A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo...
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
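For the data URI option, image bytes read from a local file can be converted with the standard library. This is a small sketch; to_data_uri is a hypothetical helper and the default media type is an assumption that you should match to the actual image format.

```python
import base64


def to_data_uri(image_bytes, media_type="image/png"):
    """Encode raw image bytes as data:<mediaType>;base64,<payload>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# The 8-byte PNG signature encodes to the familiar iVBORw0KGgo prefix.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature))
```

Dropping everything up to and including the comma gives the plain base64 form, which is also accepted.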
maskImage
When doing Inpainting or Outpainting, this parameter is required.
Specifies the mask image to be used for the inpainting process. The image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo...
- A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo...
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
maskMargin
Adds extra context pixels around the masked region during inpainting. When this parameter is present, the model will zoom into the masked area, considering these additional pixels to create more coherent and well-integrated details.
This parameter is particularly effective when used with masks generated by the Image Masking API, enabling enhanced detail generation while maintaining natural integration with the surrounding image.
strength
When doing Image-to-Image, Inpainting or Outpainting, this parameter is used to determine the influence of the seedImage image in the generated output. A lower value results in more influence from the original image, while a higher value allows more creative deviation.
height
Used to define the height dimension of the generated image. Certain models perform better with specific dimensions.
The value must be divisible by 64, e.g. 512, 576, 640 ... 2048.
width
Used to define the width dimension of the generated image. Certain models perform better with specific dimensions.
The value must be divisible by 64, e.g. 512, 576, 640 ... 2048.
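An arbitrary target size can be snapped to a valid width or height before the request is built. The sketch below is illustrative only: snap_dimension is a hypothetical helper, and the 512 to 2048 bounds are an assumption taken from the example values above rather than a documented limit.

```python
def snap_dimension(value, multiple=64, minimum=512, maximum=2048):
    """Round an integer to the nearest multiple of 64 and clamp it to the assumed range."""
    snapped = (value + multiple // 2) // multiple * multiple  # halves round up
    return max(minimum, min(maximum, snapped))


width = snap_dimension(1000)   # 1024
height = snap_dimension(576)   # already valid, stays 576
```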
model
We make use of the AIR (Artificial Intelligence Resource) system to identify models. This identifier is a unique string that represents a specific model.
You can find the AIR identifier of the model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.
steps
The number of steps is the number of iterations the model will perform to generate the image. The higher the number of steps, the more detailed the image will be. However, increasing the number of steps will also increase the time it takes to generate the image and may not always result in a better image (some schedulers work differently).
When using your own models you can specify a new default value for the number of steps.
scheduler
A scheduler is a component that manages the inference process. Different schedulers can be used to achieve different results like more detailed images, faster inference, or more accurate results.
The default scheduler is the one that the model was trained with, but you can choose a different one to get different results.
Schedulers are explained in more detail on the Schedulers page.
seed
A seed is a value used to randomize the image generation. If you want to make images reproducible (generate the same image multiple times), you can use the same seed value.
When requesting multiple images with the same seed, the seed will be incremented by 1 (+1) for each image generated.
CFGScale
Guidance scale represents how closely the images will resemble the prompt or how much freedom the AI model has. Higher values are closer to the prompt. Low values may reduce the quality of the results.
clipSkip
CLIP Skip is a feature that enables skipping layers of the CLIP embedding process, leading to quicker and more varied image generation.
usePromptWeighting
Allows setting different weights for individual words or expressions in prompts.
Adds 0.2 seconds to image inference time and incurs additional costs.
When weighting is enabled, you can use the following syntax in prompts:
Weighting
Syntax: +, -, (word)0.9
Increase or decrease the attention given to specific words or phrases.
Examples:
- Single words: small+ dog, pixar style
- Multiple words: small dog, (pixar style)-
- Multiple symbols for more effect: small+++ dog, pixar style
- Nested weighting: (small+ dog)++, pixar style
- Explicit weight percentage: small dog, (pixar)1.2 style
Blend
Syntax: .blend()
Merge multiple conditioning prompts.
Example: ("small dog", "robot").blend(1, 0.8)
Conjunction
Syntax: .and()
Break a prompt into multiple clauses and pass them separately.
Example: ("small dog", "pixar style").and()
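When prompts are assembled in code, the weighting and blend syntax can be produced by small string helpers. The following is a sketch under the assumption that the syntax works exactly as in the examples above; weight_term and blend are hypothetical names, not part of the API.

```python
def weight_term(term, level=0, explicit=None):
    """Wrap a word or phrase in the prompt weighting syntax.

    level > 0 appends that many '+', level < 0 appends that many '-'.
    explicit, when given, appends a numeric weight such as 1.2 instead.
    """
    needs_parens = " " in term or explicit is not None
    text = f"({term})" if needs_parens else term
    if explicit is not None:
        return f"{text}{explicit}"
    sign = "+" if level > 0 else "-"
    return text + sign * abs(level)


def blend(prompts, weights):
    """Build a .blend() expression from prompts and their weights."""
    quoted = ", ".join(f'"{p}"' for p in prompts)
    args = ", ".join(str(w) for w in weights)
    return f"({quoted}).blend({args})"


prompt = f"{weight_term('small', 1)} dog, {weight_term('pixar style', -1)}"
print(prompt)
print(blend(["small dog", "robot"], [1, 0.8]))
```

Remember that these expressions only take effect when usePromptWeighting is enabled for the task.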
numberResults
The number of images to generate from the specified prompt.
If seed is set, it will be incremented by 1 (+1) for each image generated.
refiner
Refiner models help create higher quality image outputs by incorporating specialized models designed to enhance image details and overall coherence. This can be particularly useful when you need results with superior quality, photorealism, or specific aesthetic refinements. Note that refiner models are only SDXL based.
The refiner parameter is an object that contains properties defining how the refinement process should be configured. You can find the properties of the refiner object below.
parameters
model
We make use of the AIR system to identify refinement models. This identifier is a unique string that represents a specific model. Note that refiner models are only SDXL based.
You can find the AIR identifier of the refinement model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.
More information about the AIR system can be found in the Models page.
startStep
Represents the step number at which the refinement process begins. The initial model will generate the image up to this step, after which the refiner model takes over to enhance the result.
It can take values from 1 to the number of steps specified.
Alternative parameters: refiner.startStepPercentage
startStepPercentage
Represents the percentage of total steps at which the refinement process begins. The initial model will generate the image up to this percentage of steps before the refiner takes over.
It can take values from 1 to 99.
Alternative parameters: refiner.startStep
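The two alternative start parameters and their ranges can be enforced when the refiner object is built. This is a minimal sketch; build_refiner is a hypothetical helper, and treating the two parameters as mutually exclusive is an assumption based on them being listed as alternatives.

```python
def build_refiner(model, steps, start_step=None, start_step_percentage=None):
    """Return a refiner object that uses at most one of the two start parameters."""
    if start_step is not None and start_step_percentage is not None:
        raise ValueError("startStep and startStepPercentage are alternatives; set only one")
    refiner = {"model": model}
    if start_step is not None:
        if not 1 <= start_step <= steps:
            raise ValueError(f"startStep must be between 1 and {steps}")
        refiner["startStep"] = start_step
    if start_step_percentage is not None:
        if not 1 <= start_step_percentage <= 99:
            raise ValueError("startStepPercentage must be between 1 and 99")
        refiner["startStepPercentage"] = start_step_percentage
    return refiner


# Placeholder identifier; use the AIR of an SDXL-based refiner model.
refiner = build_refiner("<refiner AIR identifier>", steps=40, start_step=30)
```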
Refiner parameter types:
- model: string, required
- startStep: integer, Min: 1, Max: {steps}
- startStepPercentage: integer, Min: 1, Max: 99
controlNet
With ControlNet, you can provide a guide image to help the model generate images that align with the desired structure. This guide image can be generated with our ControlNet preprocessing tool, extracting guidance information from an input image. The guide image can be in the form of an edge map, a pose, a depth estimation or any other type of control image that guides the generation process via the ControlNet model.
Multiple ControlNet models can be used at the same time to provide different types of guidance information to the model.
The controlNet parameter is an array of objects. Each object contains properties that define the configuration for a specific ControlNet model. You can find the properties of the ControlNet object below.
parameters
model
For basic/common ControlNet models, you can check the list of available models here.
For custom or specific ControlNet models, we make use of the AIR system to identify ControlNet models. This identifier is a unique string that represents a specific model.
You can find the AIR identifier of the ControlNet model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.
More information about the AIR system can be found in the Models page.
guideImage
Specifies the preprocessed image to be used as guide to control the image generation process. The image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo...
- A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo...
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
weight
Represents the weight (strength) of the ControlNet model in the image.
startStep
Represents the step number at which the ControlNet model starts to control the inference process.
It can take values from 0 (first step) to the number of steps specified.
Alternative parameters: controlNet.startStepPercentage
startStepPercentage
Represents the percentage of steps at which the ControlNet model starts to control the inference process.
It can take values from 0 to 99.
Alternative parameters: controlNet.startStep
endStep
Represents the step number at which the ControlNet model stops controlling the inference process.
It can take values higher than startStep and less than or equal to the number of steps specified.
Alternative parameters: controlNet.endStepPercentage
endStepPercentage
Represents the percentage of steps at which the ControlNet model ends to control the inference process.
It can take values higher than startStepPercentage and lower than or equal to 100.
Alternative parameters: controlNet.endStep
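The step window rules above (startStep from 0, endStep above startStep and no greater than the number of steps) can be checked when a ControlNet entry is assembled. The sketch below is illustrative; build_controlnet is a hypothetical helper and the identifiers passed to it are placeholders.

```python
def build_controlnet(model, guide_image, steps, start_step, end_step, weight=1.0):
    """Return one controlNet entry after validating the step window."""
    if not 0 <= start_step <= steps:
        raise ValueError(f"startStep must be between 0 and {steps}")
    if not start_step < end_step <= steps:
        raise ValueError(f"endStep must be above startStep and at most {steps}")
    return {
        "model": model,
        "guideImage": guide_image,
        "weight": weight,
        "startStep": start_step,
        "endStep": end_step,
    }


# controlNet is an array, so several entries can guide the same generation.
control_net = [
    build_controlnet("<ControlNet AIR identifier>", "<guide image UUID>", steps=30,
                     start_step=0, end_step=20),
]
```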
controlMode
This parameter has 3 options: prompt, controlnet and balanced.
- prompt: Prompt is more important in guiding image generation.
- controlnet: ControlNet is more important in guiding image generation.
- balanced: Balanced operation of prompt and ControlNet.
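ControlNet parameter types:
- model: string, required
- guideImage: string, required
- weight: float, Min: 0, Max: 1, Default: 1
- startStep: integer, Min: 0, Max: {steps}
- startStepPercentage: integer, Min: 0, Max: 99
- endStep: integer, Min: {startStep + 1}, Max: {steps}
- endStepPercentage: integer, Min: {startStepPercentage + 1}, Max: 100
- controlMode: string
lora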
With LoRA (Low-Rank Adaptation), you can adapt a model to specific styles or features by emphasizing particular aspects of the data. This technique enhances the quality and relevance of the generated images and can be especially useful in scenarios where the generated images need to adhere to a specific artistic style or follow particular guidelines.
Multiple LoRA models can be used at the same time to achieve different adaptation goals.
The lora parameter is an array of objects. Each object contains properties that define the configuration for a specific LoRA model. You can find the properties of the LoRA object below.
parameters
model
We make use of the AIR system to identify LoRA models. This identifier is a unique string that represents a specific model.
You can find the AIR identifier of the LoRA model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.
More information about the AIR system can be found in the Models page.
Example: civitai:132942@146296.
weight
Defines the strength or influence of the LoRA model in the generation process. The value can range from -4 (negative influence) to +4 (maximum influence).
It is possible to use multiple LoRAs at the same time.
LoRA parameter types:
- model: string, required
- weight: float, Min: -4, Max: 4, Default: 1
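Combining several LoRA models in one task could look like the sketch below. build_lora is a hypothetical helper that checks the documented weight range, and the second identifier is a placeholder rather than a real model.

```python
def build_lora(model, weight=1.0):
    """Return one lora entry, checking the documented weight range of -4 to 4."""
    if not -4 <= weight <= 4:
        raise ValueError("LoRA weight must be between -4 and 4")
    return {"model": model, "weight": weight}


# lora is an array, so multiple adaptations can be applied at the same time.
lora = [
    build_lora("civitai:132942@146296", 0.8),
    build_lora("<second LoRA AIR identifier>", 0.5),  # placeholder identifier
]
```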
Response
Results will be delivered in the format below. It's possible to receive one or multiple images per message, because images are generated in parallel and generation time varies across nodes or the network.
taskUUID
The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.
imageUUID
The unique identifier of the image.
imageURL
If outputType is set to URL, this parameter contains the URL of the image to be downloaded.
imageBase64Data
If outputType is set to base64Data, this parameter contains the base64-encoded image data.
imageDataURI
If outputType is set to dataURI, this parameter contains the data URI of the image.
NSFWContent
If the checkNSFW parameter is used, NSFWContent is included, indicating whether the image has been flagged as potentially sensitive content.
- true indicates the image has been flagged (is a sensitive image).
- false indicates the image has not been flagged.
The filter occasionally returns false positives and very rarely false negatives.
cost
If includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.
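Because images for one task may arrive spread over several messages, a client typically collects them by taskUUID. The sketch below assumes each message has already been parsed into a list of response objects (dicts) with the fields described above; the exact message envelope is not shown on this page, so that shape and the function name are assumptions.

```python
from collections import defaultdict


def group_images_by_task(messages):
    """Collect response objects from all messages, keyed by taskUUID."""
    grouped = defaultdict(list)
    for message in messages:
        for item in message:
            grouped[item["taskUUID"]].append(item)
    return dict(grouped)


# Two messages carrying results for two tasks, in arrival order.
messages = [
    [{"taskUUID": "task-a", "imageUUID": "img-1"}],
    [{"taskUUID": "task-b", "imageUUID": "img-2"},
     {"taskUUID": "task-a", "imageUUID": "img-3"}],
]
by_task = group_images_by_task(messages)
```

A task can be considered complete once the number of collected images equals the numberResults value sent with it.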