Image Inference API

Generate images from text prompts or transform existing ones using Runware's API. Learn how to do image inference for creative and high-quality results.

Introduction

Image inference is a powerful feature that allows you to generate images from text prompts or transform existing images according to your needs. This process is essential for creating high-quality visuals, whether you're looking to bring creative ideas to life or enhance existing images with new styles or subjects.

There are several types of image inference requests you can make using our API:

Text-to-Image: Generate images from descriptive text prompts. This process translates your text into high-quality visuals, allowing you to create detailed and vivid images based on your ideas.
Image-to-Image: Perform transformations on existing images, whether they are previously generated images or uploaded images. This process enables you to enhance, modify, or stylize images to create new and visually appealing content. With a single parameter you can control the strength of the transformation.
Inpainting: Replace parts of an image with new content, allowing you to remove unwanted elements or improve the overall composition of an image. It's like Image-to-Image but with a mask that defines the area to be transformed.
Outpainting: Extend the boundaries of an image by generating new content outside the original frame that seamlessly blends with the existing image. Like Inpainting, it uses a mask to define the new area to be generated.

Our API also supports advanced features that allow developers to fine-tune the image generation process with precision:

ControlNet: A feature that enables precise control over image generation by using additional input conditions, such as edge maps, poses, or segmentation masks. This allows for more accurate alignment with specific user requirements or styles.
LoRA: A technique that helps in adapting models to specific styles or tasks by focusing on particular aspects of the data, enhancing the quality and relevance of the generated images.

Additionally, you can tweak numerous parameters to customize the output, such as the image dimensions, the number of steps, the scheduler, and other generation settings, providing a high level of flexibility to suit your application's needs.

Our API owes its speed to our Sonic Inference Engine, which combines unique optimizations, custom-designed hardware, and many other elements.

Request

Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to image inference tasks.

The following JSON snippets show the basic structure of a request object. All properties are explained in detail in the next section.

Text to Image

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "outputType": "string",
    "outputFormat": "string",
    "positivePrompt": "string",
    "negativePrompt": "string",
    "height": int,
    "width": int,
    "model": "string",
    "steps": int,
    "CFGScale": float,
    "numberResults": int
  }
]
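As a minimal sketch of how the Text to Image structure above might be filled in from Python: the helper name `build_text_to_image_task`, the default values for dimensions, steps and CFGScale, and the `<AIR identifier>` placeholder are illustrative assumptions, not API defaults. Replace the placeholder with a real model identifier before sending.

```python
import json
import uuid


def build_text_to_image_task(positive_prompt, model, width=512, height=512,
                             steps=20, cfg_scale=7.0, number_results=1):
    """Return one imageInference task as a dict (hypothetical helper)."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),  # random UUID v4, unique per task
        "outputType": "URL",
        "outputFormat": "PNG",
        "positivePrompt": positive_prompt,
        "height": height,
        "width": width,
        "model": model,
        "steps": steps,            # assumed value, not a documented default
        "CFGScale": cfg_scale,     # assumed value, not a documented default
        "numberResults": number_results,
    }


# The API accepts an array of task objects, so the task is wrapped in a list.
task = build_text_to_image_task("dragon drinking coffee", model="<AIR identifier>")
payload = json.dumps([task])
```

The resulting `payload` string is the JSON array to send. How it is transported depends on the connection method you use.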

Image to Image

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "seedImage": "string",
    "model": "string",
    "height": int,
    "width": int,
    "strength": float,
    "numberResults": int
  }
]

In/Outpainting

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "seedImage": "string",
    "maskImage": "string",
    "model": "string",
    "height": int,
    "width": int,
    "strength": float,
    "numberResults": int
  }
]

Refiner

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "refiner": { 
      "model": "string",
      "startStep": int 
    },
    "height": int,
    "width": int,
    "numberResults": int
  }
]

ControlNet

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "controlNet": [ 
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      },
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      } 
    ] 
  }
]

LoRA

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "lora": [ 
      { 
        "model": "string",
        "weight": float 
      },
      { 
        "model": "string",
        "weight": float 
      } 
    ] 
  }
]

You can mix multiple ControlNet and LoRA objects in the same request to achieve more complex control over the generation process.

taskType: The type of task to be performed. For this task, the value should be imageInference.
taskUUID: When a task is sent to the API you must include a random UUID v4 string using the taskUUID parameter. This string is used to match the async responses to their corresponding tasks.

If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks.

The taskUUID must be unique for each task you send to the API.
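One possible way to match asynchronous responses to their tasks is to group response items by taskUUID. The sketch below assumes each message has the `data` array shape shown in the Response section. The function name and the shortened identifiers (`task-a`, `task-b`) are stand-ins for illustration, where real values would be UUID v4 strings.

```python
from collections import defaultdict


def group_results_by_task(messages):
    """Collect response items under the taskUUID they belong to."""
    results = defaultdict(list)
    for message in messages:
        for item in message.get("data", []):
            results[item["taskUUID"]].append(item)
    return dict(results)


# Two messages for two hypothetical tasks, arriving interleaved.
messages = [
    {"data": [{"taskType": "imageInference", "taskUUID": "task-b", "imageUUID": "img-1"}]},
    {"data": [{"taskType": "imageInference", "taskUUID": "task-a", "imageUUID": "img-2"},
              {"taskType": "imageInference", "taskUUID": "task-b", "imageUUID": "img-3"}]},
]
grouped = group_results_by_task(messages)
```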
outputType: Specifies the output type in which the image is returned. Supported values are: dataURI, URL, and base64Data.

base64Data: The image is returned as a base64-encoded string using the imageBase64Data parameter in the response object.

dataURI: The image is returned as a data URI string using the imageDataURI parameter in the response object.

URL: The image is returned as a URL string using the imageURL parameter in the response object.
outputFormat: Specifies the format of the output image. Supported formats are: PNG, JPG and WEBP.
uploadEndpoint: This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.

When the image is ready, it will be uploaded to the specified URL.
checkNSFW: This parameter is used to enable or disable the NSFW check. When enabled, the API will check if the image contains NSFW (not safe for work) content. This check is done using a pre-trained model that detects adult content in images.

When the check is enabled, the API will return NSFWContent: true in the response object if the image is flagged as potentially sensitive content. If the image is not flagged, the API will return NSFWContent: false.

If this parameter is not used, the parameter NSFWContent will not be included in the response object.

Adds 0.1 seconds to image inference time and incurs additional costs.

The NSFW filter occasionally returns false positives and very rarely false negatives.
includeCost: If set to true, the cost to perform the task will be included in the response object.
positivePrompt: A positive prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides positive guidance for the task. This parameter is essential to shape the desired results.

For example, if the positive prompt is "dragon drinking coffee", the model will generate an image of a dragon drinking coffee. The more detailed the prompt, the more accurate the results.

The length of the prompt must be between 2 and 2000 characters.
negativePrompt: A negative prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides negative guidance for the task. This parameter helps to avoid certain undesired results.

For example, if the negative prompt is "red dragon, cup", the model will follow the positive prompt but will avoid generating an image of a red dragon or including a cup. The more detailed the prompt, the more accurate the results.

The length of the prompt must be between 2 and 2000 characters.
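The length limit for both prompts can be checked on the client before a request is sent. This is a sketch only: `validate_prompt` is a hypothetical helper, and it assumes that Python's `len` counts characters the same way the API does.

```python
MIN_PROMPT_LENGTH = 2
MAX_PROMPT_LENGTH = 2000


def validate_prompt(prompt, name="positivePrompt"):
    """Raise ValueError if the prompt is outside the 2 to 2000 character range."""
    length = len(prompt)
    if length < MIN_PROMPT_LENGTH or length > MAX_PROMPT_LENGTH:
        raise ValueError(
            f"{name} must be between {MIN_PROMPT_LENGTH} and "
            f"{MAX_PROMPT_LENGTH} characters, got {length}"
        )
    return prompt


positive = validate_prompt("dragon drinking coffee")
negative = validate_prompt("red dragon, cup", name="negativePrompt")
```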
seedImage: When doing Image-to-Image, Inpainting or Outpainting, this parameter is required.

Specifies the seed image to be used for the diffusion process. The image can be specified in one of the following formats:

A UUID v4 string of a previously uploaded image or a generated image.

A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo....

A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo....

A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.
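To illustrate the data URI and plain base64 formats listed above, the following sketch encodes raw image bytes with Python's standard `base64` module. The helper names are hypothetical. The eight-byte PNG signature is used only as sample input, and in practice you would pass the full contents of an image file.

```python
import base64


def to_base64(image_bytes):
    """Base64-encode image bytes, without the data URI prefix."""
    return base64.b64encode(image_bytes).decode("ascii")


def to_data_uri(image_bytes, media_type="image/png"):
    """Build a data URI in the form data:<mediaType>;base64,<data>."""
    return f"data:{media_type};base64,{to_base64(image_bytes)}"


# The PNG file signature, which explains why PNG data starts with "iVBORw0KGgo".
png_signature = b"\x89PNG\r\n\x1a\n"
seed_image = to_data_uri(png_signature)
```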
maskImage: When doing Inpainting or Outpainting, this parameter is required.

Specifies the mask image to be used for the inpainting process. The image can be specified in one of the following formats:

A UUID v4 string of a previously uploaded image or a generated image.

A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo....

A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo....

A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.
maskMargin: Adds extra context pixels around the masked region during inpainting. When this parameter is present, the model will zoom into the masked area, considering these additional pixels to create more coherent and well-integrated details.

This parameter is particularly effective when used with masks generated by the Image Masking API, enabling enhanced detail generation while maintaining natural integration with the surrounding image.
strength: When doing Image-to-Image, Inpainting or Outpainting, this parameter determines how much the seedImage influences the generated output. A lower value results in more influence from the original image, while a higher value allows more creative deviation.
height: Used to define the height dimension of the generated image. Certain models perform better with specific dimensions.

The value must be divisible by 64, e.g., 512, 576, 640, ..., 2048.
width: Used to define the width dimension of the generated image. Certain models perform better with specific dimensions.

The value must be divisible by 64, e.g., 512, 576, 640, ..., 2048.
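Because height and width must be divisible by 64, arbitrary sizes need to be adjusted before a request is built. The sketch below rounds to the nearest multiple of 64. The function names are hypothetical, and rounding ties upward and using 64 as the floor are choices made for this example, not documented API behavior.

```python
def is_valid_dimension(value):
    """True if the value is a positive multiple of 64."""
    return value > 0 and value % 64 == 0


def snap_to_multiple_of_64(value):
    """Round to the nearest multiple of 64 (ties round up), never below 64."""
    snapped = ((value + 32) // 64) * 64
    return max(64, snapped)


width = snap_to_multiple_of_64(600)   # 576
height = snap_to_multiple_of_64(1000)  # 1024
```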
model: We make use of the AIR (Artificial Intelligence Resource) system to identify models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.
steps: The number of steps is the number of iterations the model will perform to generate the image. The higher the number of steps, the more detailed the image will be. However, increasing the number of steps will also increase the time it takes to generate the image and may not always result in a better image (some schedulers work differently).

When using your own models you can specify a new default value for the number of steps.
scheduler: A scheduler is a component that manages the inference process. Different schedulers can be used to achieve different results like more detailed images, faster inference, or more accurate results.

The default scheduler is the one that the model was trained with, but you can choose a different one to get different results.

Schedulers are explained in more detail on the Schedulers page.
seed: A seed is a value used to randomize the image generation. If you want to make images reproducible (generate the same image multiple times), you can use the same seed value.

When requesting multiple images with the same seed, the seed will be incremented by 1 (+1) for each image generated.
CFGScale: Guidance scale represents how closely the images will resemble the prompt or how much freedom the AI model has. Higher values are closer to the prompt. Low values may reduce the quality of the results.
clipSkip: CLIP Skip is a feature that enables skipping layers of the CLIP embedding process, leading to quicker and more varied image generation.
usePromptWeighting: Allows setting different weights for individual words or expressions in prompts.

Adds 0.2 seconds to image inference time and incurs additional costs.

When weighting is enabled, you can use the following syntax in prompts:

Weighting

Syntax: + - (word)0.9

Increase or decrease the attention given to specific words or phrases.

Examples:

Single words: small+ dog, pixar style

Multiple words: small dog, (pixar style)-

Multiple symbols for more effect: small+++ dog, pixar style

Nested weighting: (small+ dog)++, pixar style

Explicit weight percentage: small dog, (pixar)1.2 style

Blend

Syntax: .blend()

Merge multiple conditioning prompts.

Example: ("small dog", "robot").blend(1, 0.8)

Conjunction

Syntax: .and()

Break a prompt into multiple clauses and pass them separately.

Example: ("small dog", "pixar style").and()
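The weighting syntax above is plain text inside the prompt, so it can be assembled with ordinary string formatting. The helpers below are hypothetical conveniences that reproduce the documented examples. They do not validate the syntax, and setting `usePromptWeighting` to a boolean `True` is an assumption about the parameter's type.

```python
def weighted(text, weight):
    """Explicit weight syntax, e.g. (pixar)1.2"""
    return f"({text}){weight}"


def _quoted(prompts):
    return ", ".join(f'"{prompt}"' for prompt in prompts)


def blend(prompts, weights):
    """Blend syntax, e.g. ("small dog", "robot").blend(1, 0.8)"""
    factors = ", ".join(str(weight) for weight in weights)
    return f"({_quoted(prompts)}).blend({factors})"


def conjunction(prompts):
    """Conjunction syntax, e.g. ("small dog", "pixar style").and()"""
    return f"({_quoted(prompts)}).and()"


task_fields = {
    "usePromptWeighting": True,  # assumed to be a boolean flag
    "positivePrompt": f"small dog, {weighted('pixar', 1.2)} style",
}
```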
numberResults: The number of images to generate from the specified prompt.

If seed is set, it will be incremented by 1 (+1) for each image generated.
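The seed increment can be illustrated with a short calculation. The sketch assumes that the first image uses the seed exactly as given and each following image adds 1. The function name is hypothetical.

```python
def seeds_for_results(seed, number_results):
    """Seeds expected for each generated image when a seed is set."""
    return [seed + index for index in range(number_results)]


seeds = seeds_for_results(seed=42, number_results=3)
```

Under that assumption, a single image from this batch could be reproduced by sending its seed with numberResults set to 1.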
refiner: Refiner models help create higher quality image outputs by incorporating specialized models designed to enhance image details and overall coherence. This can be particularly useful when you need results with superior quality, photorealism, or specific aesthetic refinements. Note that refiner models are only SDXL based.

The refiner parameter is an object that contains properties defining how the refinement process should be configured. You can find the properties of the refiner object below.

Example of Refiner object

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "refiner": {
      "model": "string",
      "startStep": int
    }
  }
]

Parameters

model
string, required
We make use of the AIR system to identify refinement models. This identifier is a unique string that represents a specific model. Note that refiner models are only SDXL based.

You can find the AIR identifier of the refinement model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found on the Models page.
startStep
integer, Min: 1, Max: {steps}
Represents the step number at which the refinement process begins. The initial model will generate the image up to this step, after which the refiner model takes over to enhance the result.

It can take values from 1 to the number of steps specified.
Alternative parameters: refiner.startStepPercentage
startStepPercentage
integer, Min: 1, Max: 99
Represents the percentage of total steps at which the refinement process begins. The initial model will generate the image up to this percentage of steps before the refiner takes over.

It can take values from 1 to 99.
Alternative parameters: refiner.startStep
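The documented ranges for the refiner object can be checked before a request is sent. In this sketch, `build_refiner` is a hypothetical helper. Treating startStep and startStepPercentage as mutually exclusive, and requiring exactly one of them, is an assumption based on their being listed as alternatives.

```python
def build_refiner(model, steps, start_step=None, start_step_percentage=None):
    """Return a refiner object after checking the documented ranges."""
    if (start_step is None) == (start_step_percentage is None):
        raise ValueError("provide exactly one of start_step or start_step_percentage")

    refiner = {"model": model}
    if start_step is not None:
        if not 1 <= start_step <= steps:
            raise ValueError(f"startStep must be between 1 and {steps}")
        refiner["startStep"] = start_step
    else:
        if not 1 <= start_step_percentage <= 99:
            raise ValueError("startStepPercentage must be between 1 and 99")
        refiner["startStepPercentage"] = start_step_percentage
    return refiner


# "<refiner AIR>" is a placeholder for a real SDXL-based refiner identifier.
refiner = build_refiner("<refiner AIR>", steps=30, start_step=20)
```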
controlNet: With ControlNet, you can provide a guide image to help the model generate images that align with the desired structure. This guide image can be generated with our ControlNet preprocessing tool, extracting guidance information from an input image. The guide image can be in the form of an edge map, a pose, a depth estimation or any other type of control image that guides the generation process via the ControlNet model.

Multiple ControlNet models can be used at the same time to provide different types of guidance information to the model.

The controlNet parameter is an array of objects. Each object contains properties that define the configuration for a specific ControlNet model. You can find the properties of the ControlNet object below.

Example of ControlNet object

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "controlNet": [
      {
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      },
      {
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      }
    ]
  }
]

Parameters

model
string, required
For basic/common ControlNet models, you can check the list of available models here.

For custom or specific ControlNet models, we make use of the AIR system to identify ControlNet models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the ControlNet model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found on the Models page.
guideImage
string, required
Specifies the preprocessed image to be used as guide to control the image generation process. The image can be specified in one of the following formats:

A UUID v4 string of a previously uploaded image or a generated image.

A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo....

A base64-encoded image without the data URI prefix. For example: iVBORw0KGgo....

A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.
weight
float, Min: 0, Max: 1, Default: 1
Represents the weight (strength) of the ControlNet model in the image.
startStep
integer, Min: 0, Max: {steps}
Represents the step number at which the ControlNet model starts to control the inference process.

It can take values from 0 (first step) to the number of steps specified.
Alternative parameters: controlNet.startStepPercentage
startStepPercentage
integer, Min: 0, Max: 99
Represents the percentage of steps at which the ControlNet model starts to control the inference process.

It can take values from 0 to 99.
Alternative parameters: controlNet.startStep
endStep
integer, Min: {startStep + 1}, Max: {steps}
Represents the step number at which the ControlNet model stops controlling the inference process.

It can take values higher than startStep and less than or equal to the number of steps specified.
Alternative parameters: controlNet.endStepPercentage
endStepPercentage
integer, Min: {startStepPercentage + 1}, Max: 100
Represents the percentage of steps at which the ControlNet model stops controlling the inference process.

It can take values higher than startStepPercentage and lower than or equal to 100.
Alternative parameters: controlNet.endStep
controlMode
string
This parameter has 3 options: prompt, controlnet and balanced.

prompt: Prompt is more important in guiding image generation.

controlnet: ControlNet is more important in guiding image generation.

balanced: Balanced operation of prompt and ControlNet.
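The constraints on a ControlNet object (weight range, step ordering, and the three control modes) can be gathered into a single check. The sketch below uses the step-number variant of the parameters. `build_controlnet` is a hypothetical helper, and defaulting controlMode to balanced is an assumption, since no default is stated above.

```python
CONTROL_MODES = {"prompt", "controlnet", "balanced"}


def build_controlnet(model, guide_image, steps, start_step, end_step,
                     weight=1.0, control_mode="balanced"):
    """Return one controlNet entry after checking the documented ranges."""
    if not 0 <= weight <= 1:
        raise ValueError("weight must be between 0 and 1")
    if not 0 <= start_step <= steps:
        raise ValueError(f"startStep must be between 0 and {steps}")
    if not start_step < end_step <= steps:
        raise ValueError(f"endStep must be above startStep and at most {steps}")
    if control_mode not in CONTROL_MODES:
        raise ValueError(f"controlMode must be one of {sorted(CONTROL_MODES)}")
    return {
        "model": model,
        "guideImage": guide_image,
        "weight": weight,
        "startStep": start_step,
        "endStep": end_step,
        "controlMode": control_mode,
    }


# Placeholder model and guide image values; substitute real identifiers.
control_net = [
    build_controlnet("<ControlNet AIR>", "<guide image UUID>", steps=20,
                     start_step=0, end_step=10, weight=0.8),
]
```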
lora: With LoRA (Low-Rank Adaptation), you can adapt a model to specific styles or features by emphasizing particular aspects of the data. This technique enhances the quality and relevance of the generated images and can be especially useful in scenarios where the generated images need to adhere to a specific artistic style or follow particular guidelines.

Multiple LoRA models can be used at the same time to achieve different adaptation goals.

The lora parameter is an array of objects. Each object contains properties that define the configuration for a specific LoRA model. You can find the properties of the LoRA object below.

Example of LoRA object

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "lora": [
      {
        "model": "string",
        "weight": float
      },
      {
        "model": "string",
        "weight": float
      }
    ]
  }
]

Parameters

model
string, required
We make use of the AIR system to identify LoRA models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the LoRA model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found on the Models page.

Example: civitai:132942@146296.
weight
float, Min: -4, Max: 4, Default: 1
Defines the strength or influence of the LoRA model in the generation process. The value can range from -4 (negative influence) to +4 (maximum influence).

It is possible to use multiple LoRAs at the same time.

Example:

"lora": [ { "model": "runware:13090@1", "weight": 1.5 }, { "model": "runware:6638@1", "weight": 0.8 } ]
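The same lora array can be assembled in Python with a range check on each weight. `build_lora` is a hypothetical helper, and the model identifiers are the ones from the example above.

```python
def build_lora(model, weight=1.0):
    """Return one lora entry, rejecting weights outside -4 to 4."""
    if not -4 <= weight <= 4:
        raise ValueError("LoRA weight must be between -4 and 4")
    return {"model": model, "weight": weight}


lora = [
    build_lora("runware:13090@1", 1.5),
    build_lora("runware:6638@1", 0.8),
]
```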

Response

Results are delivered in the format below. A single message may contain one or multiple images, because images are generated in parallel and generation time varies across nodes or the network.

{
  "data": [
    {
      "taskType": "imageInference",
      "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
      "imageUUID": "77da2d99-a6d3-44d9-b8c0-ae9fb06b6200",
      "imageURL": "https://im.runware.ai/image/ws/0.5/ii/a770f077-f413-47de-9dac-be0b26a35da6.jpg",
      "cost": 0.0013
    }
  ]
}

taskUUID: The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.
imageUUID: The unique identifier of the image.
imageURL: If outputType is set to URL, this parameter contains the URL of the image to be downloaded.
imageBase64Data: If outputType is set to base64Data, this parameter contains the base64-encoded image data.
imageDataURI: If outputType is set to dataURI, this parameter contains the data URI of the image.
NSFWContent: If the checkNSFW parameter is used, NSFWContent is included to indicate whether the image has been flagged as potentially sensitive content.

true indicates the image has been flagged (is a sensitive image).

false indicates the image has not been flagged.

The filter occasionally returns false positives and very rarely false negatives.
cost: If includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.
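A response item carries the image in one of three fields depending on the outputType requested. The sketch below reads whichever field is present and sums the cost fields. The function names are hypothetical, and error handling is kept to a minimum.

```python
import base64


def extract_image(item):
    """Return ("url", str) or ("bytes", bytes) for one response item."""
    if "imageURL" in item:
        return "url", item["imageURL"]
    if "imageBase64Data" in item:
        return "bytes", base64.b64decode(item["imageBase64Data"])
    if "imageDataURI" in item:
        # Everything after ";base64," is the encoded image.
        _, _, encoded = item["imageDataURI"].partition(";base64,")
        return "bytes", base64.b64decode(encoded)
    raise KeyError("response item contains no image field")


def total_cost(response):
    """Sum the cost fields, which are present only when includeCost is true."""
    return sum(item.get("cost", 0) for item in response["data"])


response = {
    "data": [
        {
            "taskType": "imageInference",
            "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
            "imageUUID": "77da2d99-a6d3-44d9-b8c0-ae9fb06b6200",
            "imageURL": "https://im.runware.ai/image/ws/0.5/ii/a770f077-f413-47de-9dac-be0b26a35da6.jpg",
            "cost": 0.0013,
        }
    ]
}
kind, value = extract_image(response["data"][0])
```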