Image Masking

Generate precise masks automatically for faces, hands, and people using AI detection. Enhance your inpainting workflow with smart, automated masking features.

Introduction

The Image Masking API provides intelligent detection and mask generation for specific elements in images, particularly optimized for faces, hands, and people. Built on advanced detection models, this feature enhances the inpainting workflow by automatically creating precise masks around detected elements, enabling targeted enhancement and detailing.

How it works

The masks generated by this API can be utilized in two powerful inpainting workflows, offering both convenience and advanced detail enhancement capabilities.

In the standard inpainting workflow, these automatically generated masks eliminate the need for manual mask creation. Once the mask is generated by selecting the appropriate detection model (face, hand, or person), it can be directly used in an inpainting request. This automation is particularly valuable for batch processing or when consistent mask creation is needed across multiple images.

However, the most powerful application comes when using these masks for detail enhancement through the inpainting process's zooming capability. In this workflow, after the mask is automatically generated around detected elements (like faces or hands), the inpainting model will zoom into the masked area when the maskMargin parameter is present. This parameter is crucial because it adds extra context pixels around the masked region. For example, if you're enhancing a face, maskMargin ensures the model can see enough of the surrounding area to create coherent and well-integrated details.

The full process typically follows these steps:

  1. Submit the original image to the image masking API, specifying the desired detection model.
  2. Receive a precise mask identifying the target area (face, hand, or person).
  3. Use both the original image and the generated mask in an inpainting request, including the maskMargin parameter to enable zooming.
  4. The inpainting model will zoom into every masked region, considering the extra context area specified by maskMargin. This allows the model to add finer details to the masked areas and blend them smoothly into your original image for a natural, refined look.

This combination of automatic masking and zoom-enabled inpainting is particularly effective for enhancing specific features while maintaining natural integration with the surrounding image context.
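The steps above can be sketched as two request payloads built in Python. The endpoint URL and the inpainting task's parameter names other than maskMargin are assumptions for illustration; check your API reference for the exact names:

```python
import uuid

# Assumed endpoint; adapt to your actual API base URL.
API_URL = "https://api.runware.ai/v1"

def build_masking_task(input_image: str, model: str = "face_yolov8n") -> dict:
    """Step 1: request a mask for every detected face."""
    return {
        "taskType": "imageMasking",
        "taskUUID": str(uuid.uuid4()),
        "inputImage": input_image,
        "model": model,
        "maskPadding": 4,   # grow the mask slightly for full coverage
        "maskBlur": 4,      # feather the edge for smooth blending
    }

def build_inpainting_task(image: str, mask: str, prompt: str) -> dict:
    """Step 3: inpaint with maskMargin so the model zooms into each region."""
    return {
        "taskType": "imageInference",  # assumed task type for inpainting
        "taskUUID": str(uuid.uuid4()),
        "seedImage": image,            # assumed parameter name
        "maskImage": mask,             # assumed parameter name
        "positivePrompt": prompt,
        "maskMargin": 64,              # extra context pixels around the mask
    }

# The payloads would then be POSTed as a JSON array, e.g. with
# requests.post(API_URL, json=[task], headers={"Authorization": f"Bearer {key}"}).
```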

Request

Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to the image masking task.

The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.

[
  {
    "taskType": "imageMasking",
    "taskUUID": "string",
    "inputImage": "string",
    "model": "string",
    "confidence": float,
    "maxDetections": int,
    "maskPadding": int,
    "maskBlur": int,
    "outputFormat": "string",
    "outputType": "string"
  }
]

taskType

string · required

The type of task to be performed. For this task, the value should be imageMasking.

taskUUID

string · required · UUID v4

When a task is sent to the API you must include a random UUID v4 string using the taskUUID parameter. This string is used to match the async responses to their corresponding tasks.

If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks.

The taskUUID must be unique for each task you send to the API.
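A valid taskUUID can be generated with any standard UUID v4 implementation, for example with Python's standard library:

```python
import uuid

# Each task needs its own fresh, random UUID v4
task_uuid = str(uuid.uuid4())
```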

outputType

"base64Data" | "dataURI" | "URL"Default: URL

Specifies the output type in which the image is returned. Supported values are: dataURI, URL, and base64Data.

  • base64Data: The image is returned as a base64-encoded string using the maskImageBase64Data parameter in the response object.
  • dataURI: The image is returned as a data URI string using the maskImageDataURI parameter in the response object.
  • URL: The image is returned as a URL string using the maskImageURL parameter in the response object.
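The mapping from outputType value to response field can be captured in a small lookup table, which is handy when reading results generically:

```python
# Response field that carries the mask for each outputType value
MASK_FIELD_BY_OUTPUT_TYPE = {
    "base64Data": "maskImageBase64Data",
    "dataURI": "maskImageDataURI",
    "URL": "maskImageURL",
}

def mask_from_result(result: dict, output_type: str = "URL") -> str:
    """Extract the mask payload from a response object for the given outputType."""
    return result[MASK_FIELD_BY_OUTPUT_TYPE[output_type]]
```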

outputFormat

"JPG" | "PNG" | "WEBP"Default: JPG

Specifies the format of the output image. Supported formats are: PNG, JPG and WEBP.

uploadEndpoint

string

This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.

When the image is ready, it will be uploaded to the specified URL.

includeCost

boolean · Default: false

If set to true, the cost to perform the task will be included in the response object.

inputImage

string · required

Specifies the input image to be processed for mask generation. The generated mask will identify specific elements in the image (faces, hands, or people) based on the selected detection model. The input image can be specified in one of the following formats:

  • A UUID v4 string of a previously uploaded image or a generated image.
  • A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: data:image/png;base64,iVBORw0KGgo....
  • A base64 encoded image without the data URI prefix. For example: iVBORw0KGgo....
  • A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.
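For local files, the data URI form can be produced with the standard library; a minimal sketch:

```python
import base64

def to_data_uri(image_bytes: bytes, media_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URI accepted by inputImage."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

# Usage with a file on disk:
# with open("photo.png", "rb") as f:
#     input_image = to_data_uri(f.read())
```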

model

string · required

Specifies the specialized detection model to use for mask generation. Currently supported models:

YOLOv8 Models:

  • face_yolov8n: Lightweight model for 2D/realistic face detection.
  • face_yolov8s: Enhanced face detection with improved accuracy.
  • hand_yolov8n: Specialized for 2D/realistic hand detection.
  • person_yolov8n-seg: Person detection and segmentation.
  • person_yolov8s-seg: Advanced person detection with higher precision.

MediaPipe Models:

  • mediapipe_face_full: Specialized for realistic face detection.
  • mediapipe_face_short: Optimized face detection with reduced complexity.
  • mediapipe_face_mesh: Advanced face detection with mesh mapping capabilities.

Each model is optimized for specific use cases and offers different trade-offs between speed and accuracy.

confidence

float · Min: 0 · Max: 1 · Default: 0.25

Confidence threshold for detections. Only detections with confidence scores above this threshold will be included in the mask.

Lower confidence values will detect more objects but may introduce false positives.

maxDetections

integer · Min: 1 · Max: 20 · Default: 6

Limits the maximum number of elements (faces, hands, or people) that will be detected and masked in the image. If there are more elements than this value, only the ones with the highest confidence scores will be included.
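Taken together, confidence and maxDetections behave roughly like the following client-side sketch. This is illustrative only; the score field is internal to the detector and is not returned in the response:

```python
def select_detections(detections, confidence=0.25, max_detections=6):
    """Keep detections scoring above the threshold, best-first, capped at max_detections."""
    kept = [d for d in detections if d["score"] >= confidence]
    kept.sort(key=lambda d: d["score"], reverse=True)
    return kept[:max_detections]
```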

maskPadding

integer · Default: 0

Extends or reduces the detected mask area by the specified number of pixels. Positive values create a larger masked region (useful when you want to ensure complete coverage of the element), while negative values shrink the mask (useful for tighter, more focused areas).

maskBlur

integer · Min: 0 · Default: 0

Extends the mask by the specified number of pixels with a gradual fade-out effect, creating smooth transitions between masked and unmasked regions in the final result.

Note: The blur is always applied to the outer edge of the mask, regardless of whether maskPadding is used.
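On a single bounding box, maskPadding acts like the following sketch. This is illustrative only; the API applies the padding server-side to the detected mask, clamped to the image bounds:

```python
def pad_box(box, padding, width, height):
    """Grow (positive padding) or shrink (negative padding) a detection box,
    clamping the result to the image dimensions."""
    return {
        "x_min": max(box["x_min"] - padding, 0),
        "y_min": max(box["y_min"] - padding, 0),
        "x_max": min(box["x_max"] + padding, width),
        "y_max": min(box["y_max"] + padding, height),
    }
```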

Response

Results will be delivered in the format below. A single message may contain one or more images, because images are generated in parallel and generation time varies across nodes and the network.

{
  "data": [
    {
      "taskType": "imageMasking",
      "taskUUID": "d06e972d-dbfe-47d5-955f-c26e00ce4960",
      "imageUUID": "90422a52-f186-4bf4-a73b-0a46016a8330",
      "detections": [
        {
          "x_min": 505,
          "y_min": 237,
          "x_max": 588,
          "y_max": 337
        }
      ],
      "maskImageURL": "https://im.runware.ai/image/ws/0.5/ii/a770f077-f413-47de-9dac-be0b26a35da6.jpg",
      "cost": 0.0013
    }
  ]
}
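When several tasks are in flight, responses can be matched back to requests by indexing on taskUUID; a minimal sketch:

```python
def index_by_task_uuid(response: dict) -> dict:
    """Map each result object in response["data"] by its taskUUID."""
    return {item["taskUUID"]: item for item in response["data"]}
```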

taskType

string

The API will return the taskType you sent in the request. In this case, it will be imageMasking. This helps match the responses to the correct task type.

taskUUID

string · UUID v4

The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.

inputImageUUID

string · UUID v4

The unique identifier of the original image used as input for the masking task.

detections

object[]

An array of objects containing the coordinates of each detected element in the image. Each object provides the bounding box coordinates of a detected face, hand, or person (depending on the model used).

Each detection object includes:

  • x_min: Leftmost coordinate of the detected area
  • y_min: Topmost coordinate of the detected area
  • x_max: Rightmost coordinate of the detected area
  • y_max: Bottommost coordinate of the detected area

These coordinates can be useful for further processing or for understanding the exact location of detected elements in the image.
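For example, a detection's bounding box can be turned into a center point and size for cropping or alignment:

```python
def box_center_and_size(d):
    """Center (cx, cy) and size (w, h) of a detection bounding box."""
    w = d["x_max"] - d["x_min"]
    h = d["y_max"] - d["y_min"]
    return (d["x_min"] + w / 2, d["y_min"] + h / 2), (w, h)

# With the sample detection {"x_min": 505, "y_min": 237, "x_max": 588, "y_max": 337},
# the center is (546.5, 287.0) and the size is (83, 100).
```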

maskImageUUID

string · UUID v4

The unique identifier of the mask image.

maskImageURL

string

If outputType is set to URL, this parameter contains the URL of the mask image to be downloaded.

maskImageDataURI

string

If outputType is set to dataURI, this parameter contains the data URI of the mask image.

maskImageBase64Data

string

If outputType is set to base64Data, this parameter contains the base64-encoded mask image.

cost

float

If includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.