Image Masking
Introduction
The Image Masking API provides intelligent detection and mask generation for specific elements in images, particularly optimized for faces, hands, and people. Built on advanced detection models, this feature enhances the inpainting workflow by automatically creating precise masks around detected elements, enabling targeted enhancement and detailing.
How it works
The masks generated by this API can be utilized in two powerful inpainting workflows, offering both convenience and advanced detail enhancement capabilities.
In the standard inpainting workflow, these automatically generated masks eliminate the need for manual mask creation. Once the mask is generated by selecting the appropriate detection model (face, hand, or person), it can be directly used in an inpainting request. This automation is particularly valuable for batch processing or when consistent mask creation is needed across multiple images.
However, the most powerful application is using these masks for detail enhancement through the zooming capability of the inpainting process. In this workflow, after the mask is automatically generated around detected elements (like faces or hands), the inpainting model zooms into the masked area when the `maskMargin` parameter is present. This parameter is crucial because it adds extra context pixels around the masked region. For example, if you're enhancing a face, `maskMargin` ensures the model can see enough of the surrounding area to create coherent and well-integrated details.
The full process typically follows these steps:
- Submit the original image to the image masking API, specifying the desired detection model.
- Receive a precise mask identifying the target area (face, hand, or person).
- Use both the original image and the generated mask in an inpainting request, including the `maskMargin` parameter to enable zooming.
- The inpainting model will zoom into every masked region, considering the extra context area specified by `maskMargin`. This allows the model to add finer details to the masked areas and blend them smoothly into your original image for a natural, refined look.
This combination of automatic masking and zoom-enabled inpainting is particularly effective for enhancing specific features while maintaining natural integration with the surrounding image context.
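The steps above can be sketched as two task objects, where the second reuses the mask produced by the first. This is a minimal sketch under stated assumptions: only `maskMargin` is named in this section, so the inpainting field names (`imageInference`, `seedImage`, `maskImage`, `positivePrompt`) and the helper names are hypothetical and should be checked against the inpainting reference.

```python
import uuid


def build_masking_task(input_image: str, model: str = "face_yolov8n") -> dict:
    """Step 1: describe an imageMasking task for the given image."""
    return {
        "taskType": "imageMasking",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "inputImage": input_image,
        "model": model,
    }


def build_inpainting_task(seed_image: str, masking_result: dict,
                          prompt: str, mask_margin: int = 64) -> dict:
    """Step 3: reuse the generated mask in an inpainting task.

    Assumption: "imageInference", "seedImage", "maskImage" and
    "positivePrompt" are placeholder field names for the inpainting
    request. Only maskMargin is named in this section.
    """
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "seedImage": seed_image,
        "maskImage": masking_result["maskImageUUID"],
        "positivePrompt": prompt,
        "maskMargin": mask_margin,  # extra context pixels around the mask
    }
```

Keeping the two builders separate mirrors the workflow: the masking response has to arrive before the inpainting task can reference its `maskImageUUID`.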
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to the image masking task.
The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.
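As a rough illustration, a request array containing a single image masking task could be assembled and serialized in Python as follows. The image URL is a placeholder, and the numeric values are illustrative choices rather than documented defaults.

```python
import json
import uuid


def build_request(input_image: str) -> str:
    """Serialize a one-task request array for the image masking task."""
    task = {
        "taskType": "imageMasking",
        "taskUUID": str(uuid.uuid4()),  # random UUID v4, unique per task
        "inputImage": input_image,
        "model": "face_yolov8n",
        "outputType": "URL",
        "outputFormat": "PNG",
        "includeCost": True,
        # Illustrative values, not documented defaults:
        "confidence": 0.25,
        "maxDetections": 6,
        "maskPadding": 4,
        "maskBlur": 4,
    }
    # The API accepts an array of task objects, even for a single task.
    return json.dumps([task], indent=2)


if __name__ == "__main__":
    print(build_request("https://example.com/portrait.png"))  # hypothetical URL
```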
taskType
The type of task to be performed. For this task, the value should be `imageMasking`.
taskUUID
When a task is sent to the API, you must include a random UUID v4 string using the `taskUUID` parameter. This string is used to match the async responses to their corresponding tasks. If you send multiple tasks at the same time, the `taskUUID` will help you match the responses to the correct tasks. The `taskUUID` must be unique for each task you send to the API.
outputType
Specifies the output type in which the image is returned. Supported values are: `dataURI`, `URL`, and `base64Data`.
- `base64Data`: The image is returned as a base64-encoded string using the `maskImageBase64Data` parameter in the response object.
- `dataURI`: The image is returned as a data URI string using the `maskImageDataURI` parameter in the response object.
- `URL`: The image is returned as a URL string using the `maskImageURL` parameter in the response object.
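Because the response field depends on the chosen output type, client code usually needs a small dispatch step. The sketch below assumes the result has already been parsed into a Python dict. The helper name `mask_bytes_from_result` is hypothetical, and downloading from `maskImageURL` is left to the caller.

```python
import base64
from typing import Optional


def mask_bytes_from_result(result: dict) -> Optional[bytes]:
    """Return decoded mask bytes, or None when only a URL is present."""
    if "maskImageBase64Data" in result:
        return base64.b64decode(result["maskImageBase64Data"])
    if "maskImageDataURI" in result:
        # A data URI looks like "data:<mediaType>;base64,<payload>".
        header, _, encoded = result["maskImageDataURI"].partition(",")
        if not header.endswith(";base64"):
            raise ValueError("expected a base64-encoded data URI")
        return base64.b64decode(encoded)
    # outputType URL: fetch result["maskImageURL"] separately.
    return None
```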
outputFormat
Specifies the format of the output image. Supported formats are: `PNG`, `JPG` and `WEBP`.
uploadEndpoint
This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.
When the image is ready, it will be uploaded to the specified URL.
includeCost
If set to `true`, the cost to perform the task will be included in the response object.
inputImage
Specifies the input image to be processed for mask generation. The generated mask will identify specific elements in the image (faces, hands, or people) based on the selected detection model. The input image can be specified in one of the following formats:
- A UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format `data:<mediaType>;base64,` followed by the base64-encoded image. For example: ...
- A base64-encoded image without the data URI prefix. For example: `iVBORw0KGgo...`.
- A URL pointing to the image. The image must be publicly accessible.
Supported formats are: PNG, JPG and WEBP.
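For local files, the data URI and plain base64 forms can be produced with the standard library. This is a small sketch, and the helper names and the extension-to-media-type table are assumptions made for illustration.

```python
import base64
import os

# Media types for the supported input formats (PNG, JPG, WEBP).
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def to_base64(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes, without any data URI prefix."""
    return base64.b64encode(image_bytes).decode("ascii")


def to_data_uri(image_bytes: bytes, filename: str) -> str:
    """Build a data URI, picking the media type from the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {filename!r}")
    return f"data:{MEDIA_TYPES[extension]};base64,{to_base64(image_bytes)}"
```

Either return value can then be passed as the `inputImage` of a task.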
model
Specifies the specialized detection model to use for mask generation. Currently supported models:
YOLOv8 Models:
- `face_yolov8n`: Lightweight model for 2D/realistic face detection.
- `face_yolov8s`: Enhanced face detection with improved accuracy.
- `hand_yolov8n`: Specialized for 2D/realistic hand detection.
- `person_yolov8n-seg`: Person detection and segmentation.
- `person_yolov8s-seg`: Advanced person detection with higher precision.
MediaPipe Models:
- `mediapipe_face_full`: Specialized for realistic face detection.
- `mediapipe_face_short`: Optimized face detection with reduced complexity.
- `mediapipe_face_mesh`: Advanced face detection with mesh mapping capabilities.
Each model is optimized for specific use cases and offers different trade-offs between speed and accuracy.
confidence
Confidence threshold for detections. Only detections with confidence scores above this threshold will be included in the mask.
Lower confidence values will detect more objects but may introduce false positives.
maxDetections
Limits the maximum number of elements (faces, hands, or people) that will be detected and masked in the image. If there are more elements than this value, only the ones with highest confidence scores will be included.
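The combined effect of `confidence` and `maxDetections` can be pictured as a filter followed by a top-N cut. The sketch below only illustrates the semantics described above and is not the API's implementation. The `score` field and the function name are hypothetical.

```python
def select_detections(candidates: list, confidence: float, max_detections: int) -> list:
    """Keep candidates above the threshold, then the highest-scoring N."""
    # Only detections with a score above the threshold are kept.
    kept = [c for c in candidates if c["score"] > confidence]
    # If more remain than allowed, prefer the highest confidence scores.
    kept.sort(key=lambda c: c["score"], reverse=True)
    return kept[:max_detections]
```

For example, with scores 0.9, 0.2, 0.6 and 0.4, a threshold of 0.3 and a limit of 2, the detections scoring 0.9 and 0.6 would remain.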
maskPadding
Extends or reduces the detected mask area by the specified number of pixels. Positive values create a larger masked region (useful when you want to ensure complete coverage of the element), while negative values shrink the mask (useful for tighter, more focused areas).
maskBlur
Extends the mask by the specified number of pixels with a gradual fade-out effect, creating smooth transitions between masked and unmasked regions in the final result.
Note: The blur is always applied to the outer edge of the mask, regardless of whether `maskPadding` is used.
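To build intuition for `maskPadding`, the sketch below applies positive and negative padding to a rectangular region. This is a simplification: it assumes a box-shaped mask, while real masks (especially from segmentation models) may follow the element's outline. The function name and the clamping to image bounds are assumptions.

```python
def pad_box(box: dict, padding: int, image_width: int, image_height: int) -> dict:
    """Grow (positive padding) or shrink (negative padding) a box in pixels."""
    x_min = max(0, box["x_min"] - padding)
    y_min = max(0, box["y_min"] - padding)
    x_max = min(image_width, box["x_max"] + padding)
    y_max = min(image_height, box["y_max"] + padding)
    if x_min > x_max or y_min > y_max:
        raise ValueError("negative padding removes the whole region")
    return {"x_min": x_min, "y_min": y_min, "x_max": x_max, "y_max": y_max}
```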
Response
Results are delivered in the format below. A message may contain one or multiple images, because images are generated in parallel and generation time varies across nodes or the network.
taskType
The API will return the `taskType` you sent in the request. In this case, it will be `imageMasking`. This helps match the responses to the correct task type.
taskUUID
The API will return the `taskUUID` you sent in the request. This way you can match the responses to the correct request tasks.
inputImageUUID
The unique identifier of the original image used as input for the masking task.
detections
An array of objects containing the coordinates of each detected element in the image. Each object provides the bounding box coordinates of a detected face, hand, or person (depending on the model used).
Each detection object includes:
- `x_min`: Leftmost coordinate of the detected area.
- `y_min`: Topmost coordinate of the detected area.
- `x_max`: Rightmost coordinate of the detected area.
- `y_max`: Bottommost coordinate of the detected area.
These coordinates can be useful for further processing or for understanding the exact location of detected elements in the image.
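As one example of such further processing, the sketch below derives the size of each bounding box and picks the largest detection, which can be handy when only the most prominent face or person matters. The helper names are hypothetical, and the coordinates are assumed to be pixel values.

```python
from typing import Optional, Tuple


def box_size(detection: dict) -> Tuple[int, int]:
    """Return (width, height) of a detection's bounding box."""
    width = detection["x_max"] - detection["x_min"]
    height = detection["y_max"] - detection["y_min"]
    return width, height


def largest_detection(detections: list) -> Optional[dict]:
    """Return the detection with the largest box area, or None if empty."""
    def area(detection: dict) -> int:
        width, height = box_size(detection)
        return width * height

    return max(detections, key=area, default=None)
```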
maskImageUUID
The unique identifier of the mask image.
maskImageURL
If `outputType` is set to `URL`, this parameter contains the URL of the mask image to be downloaded.
maskImageDataURI
If `outputType` is set to `dataURI`, this parameter contains the data URI of the mask image.
cost
If `includeCost` is set to `true`, the response will include a `cost` field for each task object. This field indicates the cost of the request in USD.
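Since results for several tasks can arrive across multiple messages, a client typically groups them by `taskUUID`. The sketch below assumes the result objects have already been extracted from the messages into a list of dicts. The message envelope is not described in this section, and the helper names are hypothetical.

```python
def group_by_task(results: list) -> dict:
    """Group result objects by the taskUUID they echo back."""
    grouped = {}
    for result in results:
        grouped.setdefault(result["taskUUID"], []).append(result)
    return grouped


def total_cost(results: list) -> float:
    """Sum the cost field (USD), which is present only if includeCost was true."""
    return sum(result.get("cost", 0.0) for result in results)
```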