Image tagging

The main purpose of this API is to process the raw predictions returned by the served models. TF Serving models return only a list of scores, without any other hint such as class names or the most likely tags. This API processes those scores and returns more human-readable responses in the form of dictionaries.

The process_house_images endpoint goes further by combining the predictions for all images of a house and, for example, returning the Exterior Style score averaged over all facade images:

{
  "core_listing_id": 12454121,
  "status": true,
  "house_info": {
    "exterior_styles": {
      "a-frame": 0.0031661360922315,
      "american_craftsman": 0.0115414682475,
      "american_foursquare": 0.0029405293115,
      "brownstone": 6.2809734263e-06,
      "cape_cod": 0.043831701179999995,
      "chicago_school": 2.62403617567e-05,
      "condo": 0.000197127476505,
      "farmhouse": 0.078591498385,
      "french_provincial": 0.01015316158,
      "georgian": 0.48202647574499996,
      "international": 2.2621380555e-06,
      "log": 0.006821043926,
      "mediterranean": 0.0013629792955,
      "modern": 0.0003508611555,
      "queen_anne": 0.0021497522855,
      "raised_ranch": 0.00812103576,
      "ranch": 0.33468119285000003,
      "split_level": 0.00587461039,
      "townhouse": 0.0006853706710000001,
      "tudor_revival": 0.0062720832475,
      "not_facade": 0.0011981980139999999
    },
    "scene_ranking": {
      "facade": [
        2,
        1
      ],
      "attached_garage": [
        1
      ],
      "facade_bricks": [
        1
      ],
      "exterior": [
        2,
        1
      ],
      "backyard": [
        1,
        2
      ]
    }
  },
  "results_per_image": [
    {
      "id": 1,
      "scores": {},
      "message": "2 root error(s) found.\n  (0) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_635/map/while/PartitionedCall/resize/ResizeBilinear/_683]]\n  (1) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n0 successful operations.\n0 derived errors ignored.",
      "status": false
    },
    {
      "id": 1,
      "scores": {
        "exterior": 0.99818182,
        "facade": 0.627584457,
        "attached_garage": 0.831839,
        "facade_bricks": 0.913291872,
        "backyard": 0.88838768
      },
      "status": true
    },
    {
      "id": 2,
      "scores": {
        "exterior": 0.999164939,
        "facade": 0.885006249,
        "backyard": 0.772896
      },
      "status": true
    }
  ]
}

All predict endpoints share the same request schema:

Request fields

images: List of dicts, each with a URL-safe base64-encoded image and an id,
  Images to predict on
core_listing_id: Optional, int,
  Identifier of the house the images belong to
  Used only in process_house_images, ignored by the other endpoints
thresholds: Optional, float, list of floats, or dict of {class_name: float} pairs,
  Probability threshold to use when extracting tags with a multilabel network.
  If not given, the defaults from the config file are used

Notes

images is a required field.
core_listing_id is used only in process_house_images.
thresholds is used only by multilabel networks (only Tagger for now).

Example:

{
  "core_listing_id": 123,
  "images": [
    {
      "image": "base_64_string",
      "id": 0
    },
    {
      "image": "base_64_string",
      "id": 1
    }
  ],
  "thresholds": {
    "facade": 0.6,
    "interior": 0.55
  }
}
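
A request body like the one above could be assembled as follows. This is only a minimal sketch: the helper name build_request and the fake image bytes are illustrative assumptions, not part of the service. It shows the URL-safe base64 encoding that the images field expects.

```python
import base64
import json


def build_request(images, core_listing_id=None, thresholds=None):
    """Build a request body; `images` is a list of (image_id, raw_bytes) pairs.

    Hypothetical helper, not part of the API itself.
    """
    payload = {
        "images": [
            {
                # URL-safe base64 uses "-" and "_" instead of "+" and "/".
                "image": base64.urlsafe_b64encode(raw).decode("ascii"),
                "id": image_id,
            }
            for image_id, raw in images
        ]
    }
    # Optional fields are only added when the caller provides them.
    if core_listing_id is not None:
        payload["core_listing_id"] = core_listing_id
    if thresholds is not None:
        payload["thresholds"] = thresholds
    return payload


# Placeholder bytes stand in for real JPEG file contents.
body = build_request(
    [(0, b"\xff\xd8\xff\xe0 fake jpeg bytes"), (1, b"\xff\xd8\xff\xe1 more bytes")],
    core_listing_id=123,
    thresholds={"facade": 0.6, "interior": 0.55},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request to any of the predict endpoints.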

/predict/exterior_styles

This endpoint wraps the ExteriorStyles network, which takes an image of a house and predicts its exterior style. There are 21 styles in total for now, including one class for non-facade images (for example, if you send an image of an interior).

This endpoint returns EXACTLY one class per image: the one with the highest probability of all 21.

Response schema

id: int, "Identifier of the image"
results: dictionary, contains
  a list of present labels and their probabilities,
  status of the image
  message of the error if the status is false

Example:

{
  "predictions": [
    {
      "id": 0,
      "results": {
        "exterior_styles": {
          "scores": [
            {
              "name": "georgian",
              "probability": 0.956090391
            }
          ],
          "status": true
        }
      }
    },
    {
      "id": 1,
      "results": {
        "exterior_styles": {
          "scores": [
            {
              "name": "ranch",
              "probability": 0.368982
            }
          ],
          "status": true
        }
      }
    }
  ]
}

Notes

If the image is successfully processed, this endpoint should return one and only one predicted class.
This endpoint ignores the core_listing_id and thresholds parameters.
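
The "one class per image" rule amounts to taking the argmax over the per-class probabilities. A minimal sketch, assuming the raw scores have already been paired with class names; the function name top_class and the input values other than the georgian score are made up for illustration:

```python
def top_class(probabilities):
    """Return the single most likely class as a {"name", "probability"} dict."""
    # max() over the keys, compared by their probability, is the argmax.
    name = max(probabilities, key=probabilities.get)
    return {"name": name, "probability": probabilities[name]}


# Illustrative subset of class probabilities for one image.
raw = {"georgian": 0.956090391, "ranch": 0.0312, "not_facade": 0.0011}
scores = [top_class(raw)]
print(scores)
```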

/predict/tagger

This endpoint wraps the Tagger network, which takes an image as input and returns the names of the classes present in it together with their probabilities (floats between 0 and 1). By default, an image is considered to contain class "X" if its probability is higher than 0.5.

Tagger is a multilabel network, which means you can get MORE than one class per image or get none.

Response schema

id: int, "Identifier of the image"
results: dictionary, contains
  a list of present labels and their probabilities,
  status of the image
  message of the error if the status is false

Example:

{
  "predictions": [
    {
      "id": 0,
      "results": {
        "tagger": {
          "scores": [],
          "message": "2 root error(s) found.\n  (0) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_635/map/while/PartitionedCall/resize/ResizeBilinear/_683]]\n  (1) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n0 successful operations.\n0 derived errors ignored.",
          "status": false
        }
      }
    },
    {
      "id": 1,
      "results": {
        "tagger": {
          "scores": [
            {
              "name": "exterior",
              "probability": 0.998181343
            },
            {
              "name": "facade",
              "probability": 0.627602637
            },
            {
              "name": "attached_garage",
              "probability": 0.831817389
            },
            {
              "name": "facade_bricks",
              "probability": 0.913271
            },
            {
              "name": "backyard",
              "probability": 0.888361812
            }
          ],
          "status": true
        }
      }
    }
  ]
}

Notes

Since Tagger is a multilabel network, it can use the thresholds parameter. The default value is 0.5 for all classes.
Tagger CAN return 0 classes for an image if all probabilities are below the thresholds. It can also return many classes per image; there is no upper limit.
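
The threshold logic can be sketched as below. This is an illustration under assumptions, not the service's code: the names resolve_thresholds and select_tags are hypothetical, and the fallback to the default for classes missing from a thresholds dict is assumed rather than documented.

```python
DEFAULT_THRESHOLD = 0.5


def resolve_thresholds(class_names, thresholds=None, default=DEFAULT_THRESHOLD):
    """Expand a float, list, or dict into one threshold per class name."""
    if thresholds is None:
        return {name: default for name in class_names}
    if isinstance(thresholds, (int, float)):
        # A single number applies to every class.
        return {name: float(thresholds) for name in class_names}
    if isinstance(thresholds, dict):
        # Assumption: classes missing from the dict fall back to the default.
        return {name: thresholds.get(name, default) for name in class_names}
    if len(thresholds) != len(class_names):
        raise ValueError("expected one threshold per class")
    return dict(zip(class_names, thresholds))


def select_tags(probabilities, thresholds=None):
    """Keep only the classes whose probability is higher than their threshold."""
    limits = resolve_thresholds(list(probabilities), thresholds)
    return [
        {"name": name, "probability": p}
        for name, p in probabilities.items()
        if p > limits[name]
    ]


probabilities = {"exterior": 0.998181343, "facade": 0.627602637, "interior": 0.02}
print(select_tags(probabilities))                   # default 0.5 for all classes
print(select_tags(probabilities, {"facade": 0.7}))  # stricter for "facade" only
print(select_tags(probabilities, 0.999))            # may return no tags at all
```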

/predict/encoder

This endpoint wraps the Encoder network, which transforms an image into a 1280-dimensional vector. It is not testable for now.

/predict/process_house_images

Main endpoint, which uses both tagger and exterior styles under the hood. How it works:

Predict all images with the Tagger network using the provided thresholds.
Select the images tagged with the "facade" class and pass them to the Exterior Styles network. Add these predictions to the image prediction dictionary too.
Combine the predictions for all images into the house_info field of the response:
- Average all exterior style scores over all images -> field exterior_styles
- Rank the images of each class by score -> field prob_ranking
Optionally compute image embeddings and send them to RabbitMQ.

For example, the prob_ranking value

[
  {
    "name": "facade",
    "ordering": [
      2,
      1
    ]
  },
  {
    "name": "backyard",
    "ordering": [
      3
    ]
  }
]

means that "facade" is present in the images with id 1 and 2, and its score in the image with id=2 is higher than in the image with id=1, while "backyard" is present only in the image with id=3 and not in the rest of the images.
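
The two aggregation steps can be sketched as follows. The function names and the style probabilities are illustrative assumptions; the tag probabilities are taken from the response example in this document. The real service may differ in details such as tie-breaking.

```python
from collections import defaultdict


def rank_images_by_class(tags_per_image):
    """Map each tag to the image ids that contain it, best score first."""
    by_class = defaultdict(list)
    for image_id, tags in tags_per_image.items():
        for name, probability in tags.items():
            by_class[name].append((probability, image_id))
    return [
        {
            "name": name,
            # Sort by probability in descending order, then keep only the ids.
            "ordering": [image_id for _, image_id in sorted(pairs, reverse=True)],
        }
        for name, pairs in by_class.items()
    ]


def average_styles(styles_per_image):
    """Average each style's probability over all images with style scores."""
    if not styles_per_image:
        return {}
    totals = defaultdict(float)
    for styles in styles_per_image:
        for name, probability in styles.items():
            totals[name] += probability
    return {name: total / len(styles_per_image) for name, total in totals.items()}


tags = {
    1: {"facade": 0.627602637, "backyard": 0.888361812},
    2: {"facade": 0.884995699, "backyard": 0.772882104},
}
ranking = rank_images_by_class(tags)
print(ranking)

# Made-up style probabilities for two facade images.
styles = average_styles([
    {"georgian": 0.75, "ranch": 0.25, "not_facade": 0.0},
    {"georgian": 0.25, "ranch": 0.5, "not_facade": 0.25},
])
print(styles)
```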

Response schema

results_per_image: for each image, its id and
  a list of present labels and their probabilities for each network it passed through, i.e. tagger and
  exterior styles;
  status of the image;
  message of the error if the status is false
house_info: dictionary with prob_ranking and exterior_styles as described above
core_listing_id: id of the house
status: status of the house. True if at least one image is successfully processed

Example:

{
  "core_listing_id": 123,
  "status": true,
  "house_info": {
    "exterior_styles": [
      {
        "name": "a-frame",
        "probability": 0.003163475386826
      },
      {
        "name": "american_craftsman",
        "probability": 0.011543401577500001
      },
      {
        "name": "american_foursquare",
        "probability": 0.002938385385
      },
      {
        "name": "brownstone",
        "probability": 6.28945050305e-06
      },
      {
        "name": "cape_cod",
        "probability": 0.043830761205
      },
      {
        "name": "chicago_school",
        "probability": 2.6277779165400002e-05
      },
      {
        "name": "condo",
        "probability": 0.00019724981487000002
      },
      {
        "name": "farmhouse",
        "probability": 0.078569122575
      },
      {
        "name": "french_provincial",
        "probability": 0.010154600239999999
      },
      {
        "name": "georgian",
        "probability": 0.48202906658
      },
      {
        "name": "international",
        "probability": 2.263999825e-06
      },
      {
        "name": "log",
        "probability": 0.0068244273335
      },
      {
        "name": "mediterranean",
        "probability": 0.001364317026
      },
      {
        "name": "modern",
        "probability": 0.00035098307099999996
      },
      {
        "name": "queen_anne",
        "probability": 0.0021490082739999998
      },
      {
        "name": "raised_ranch",
        "probability": 0.008122037165
      },
      {
        "name": "ranch",
        "probability": 0.33469547385049997
      },
      {
        "name": "split_level",
        "probability": 0.005873178365
      },
      {
        "name": "townhouse",
        "probability": 0.0006862457815000001
      },
      {
        "name": "tudor_revival",
        "probability": 0.006275861867
      },
      {
        "name": "not_facade",
        "probability": 0.0011976235205
      }
    ],
    "prob_ranking": [
      {
        "name": "facade_bricks",
        "ordering": [
          1
        ]
      },
      {
        "name": "facade",
        "ordering": [
          2,
          1
        ]
      },
      {
        "name": "attached_garage",
        "ordering": [
          1
        ]
      },
      {
        "name": "exterior",
        "ordering": [
          2,
          1
        ]
      },
      {
        "name": "backyard",
        "ordering": [
          1,
          2
        ]
      }
    ]
  },
  "results_per_image": [
    {
      "id": 1,
      "results": {
        "tagger": {
          "scores": [
            {
              "name": "exterior",
              "probability": 0.998181343
            },
            {
              "name": "facade",
              "probability": 0.627602637
            },
            {
              "name": "attached_garage",
              "probability": 0.831817389
            },
            {
              "name": "facade_bricks",
              "probability": 0.913271
            },
            {
              "name": "backyard",
              "probability": 0.888361812
            }
          ],
          "status": true
        },
        "exterior_styles": {
          "scores": [
            {
              "name": "ranch",
              "probability": 0.668982
            }
          ],
          "status": true
        }
      }
    },
    {
      "id": 2,
      "results": {
        "tagger": {
          "scores": [
            {
              "name": "exterior",
              "probability": 0.99916482
            },
            {
              "name": "facade",
              "probability": 0.884995699
            },
            {
              "name": "backyard",
              "probability": 0.772882104
            }
          ],
          "status": true
        },
        "exterior_styles": {
          "scores": [
            {
              "name": "georgian",
              "probability": 0.956090391
            }
          ],
          "status": true
        }
      }
    }
  ]
}

Sending embeddings to Queue

The endpoint can also calculate image embeddings and send them to RabbitMQ. To enable this, set the publish_embeddings flag to true. Messages are saved using the ListingEmbeddings schema, which contains the embeddings for each image, the image ids, and the listing id. You can also apply GZIP compression to the embeddings by setting embeddings_compression="gzip", which reduces the payload size by a factor of 2–3.
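
To illustrate what the compression option does to a payload, here is a minimal sketch using only the standard library. The message layout (listing_id, image_ids, embeddings), the JSON serialization, and the function names are assumptions made for this example; the actual ListingEmbeddings wire format is not described here, and publishing to RabbitMQ is left out.

```python
import gzip
import json


def pack_embeddings(listing_id, embeddings, compression=None):
    """Serialize {image_id: vector} to bytes, optionally gzip-compressed."""
    message = {
        "listing_id": listing_id,          # hypothetical field name
        "image_ids": list(embeddings),     # hypothetical field name
        "embeddings": [embeddings[i] for i in embeddings],
    }
    raw = json.dumps(message).encode("utf-8")
    return gzip.compress(raw) if compression == "gzip" else raw


def unpack_embeddings(body, compression=None):
    """Inverse of pack_embeddings, for the consumer side."""
    raw = gzip.decompress(body) if compression == "gzip" else body
    return json.loads(raw)


# Two fake 1280-dimensional vectors keyed by image id.
vectors = {
    1: [i / 1280 for i in range(1280)],
    2: [(1280 - i) / 1280 for i in range(1280)],
}
plain = pack_embeddings(123, vectors)
packed = pack_embeddings(123, vectors, compression="gzip")
print(len(plain), len(packed))
```

The consumer has to know which compression was used, so in practice that information would travel with the message, for example in a header.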

Models

The API connects to the TF models using the configuration file stored in src/model_config.yaml.

Tagger

Multilabel network that tags images with one or more of 101 possible classes, including:

Room types, like living room or sun room
Floor, wall, ceiling types and texture
Interior features like fireplace and kitchen island
Outdoor features like backyard or tennis court
General scenes, like view from balcony, drone view or facade

A report with more detailed info can be found here

NOTE: We decided not to show some of our worst-performing classes by tweaking their thresholds, so only 93 of 101 classes are shown by default. To receive the scores for those classes, tune the thresholds manually.

Exterior Styles Classifier

Multiclass (single-label) classifier that outputs the architectural style of the house based on its image. Total 23 possible classes:

Condo
Townhouse
20 architectural styles for SFR and apartment buildings
not_facade for non-fitting images

A report with more detailed info can be found here

Encoder

Receives an image and outputs a vector representation of it. Currently, we use the backbone of the Tagger with an L2 normalization layer on top of it.
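
For reference, L2 normalization rescales a vector to unit length, so the dot product of two normalized embeddings equals their cosine similarity. A minimal sketch in plain Python; the function name and the eps guard are illustrative choices, not taken from the model code:

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector so that its Euclidean (L2) norm is 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    # eps guards against division by zero for an all-zero vector.
    return [x / max(norm, eps) for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit, math.sqrt(sum(x * x for x in unit)))
```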

Postprocessing

model_config.yaml also contains the names of the classes for the classifier networks (Tagger, Exterior Styles), optional thresholds for filtering the multilabel network output (default is 0.5), and an optional postprocessing configuration for the network outputs.

Postprocessing includes:

Deleting specified classes if a given class is present. For example, delete wooden_walls if the sauna_room tag is present
Deleting all classes except for specified ones if a given class is present. For example, all classes except for exterior are deleted when the "drone" tag is present
Keeping a class only if another specified class is present, otherwise deleting it. For example, keep bathtubs only in a bathroom
Renaming a class if another class is present. For example, rename "pool" to "indoor_pool" if "interior" is present

Currently, we have postprocessing only for the Tagger network; it applies in both the /api/predict/tagger and /api/predict/process_house_images endpoints.

Example of a postprocessing config:

postprocessing_config:
  remove_all_keys_except:
    - condition: drone
      exceptions:
        - exterior
    - condition: view
      exceptions:
        - water_view
        - exterior
    - condition: other
  remove_keys_by_condition:
    - condition: sauna_room
      to_remove:
        - wood_texture_floors
        - wooden_ceilings
        - wooden_walls
  rename_key_by_condition:
    - to_rename: outdoor_kitchen
      new_name: summer_kitchen
      condition: interior
  keep_only_by_conditions:
    - conditions:
        - kitchen
        - bathroom
      to_filter:
        - tile_countertops
        - granite_countertops
        - marble_countertops
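
The four rule types could be applied to a dictionary of tags roughly as sketched below. This is not the service's implementation: the function name postprocess, the order in which the rules run, and the decision to keep the condition tag itself in remove_all_keys_except are all assumptions. The config is written as a Python dict that mirrors a subset of the YAML keys.

```python
def postprocess(tags, config):
    """Apply the four rule types to a {tag: probability} dict; returns a new dict."""
    tags = dict(tags)

    for rule in config.get("remove_all_keys_except", []):
        if rule["condition"] in tags:
            # Assumption: the condition tag itself survives along with the exceptions.
            keep = {rule["condition"], *(rule.get("exceptions") or [])}
            tags = {k: v for k, v in tags.items() if k in keep}

    for rule in config.get("remove_keys_by_condition", []):
        if rule["condition"] in tags:
            for name in rule["to_remove"]:
                tags.pop(name, None)

    for rule in config.get("keep_only_by_conditions", []):
        # Filtered tags are dropped unless at least one condition tag is present.
        if not any(condition in tags for condition in rule["conditions"]):
            for name in rule["to_filter"]:
                tags.pop(name, None)

    for rule in config.get("rename_key_by_condition", []):
        if rule["condition"] in tags and rule["to_rename"] in tags:
            tags[rule["new_name"]] = tags.pop(rule["to_rename"])

    return tags


config = {
    "remove_all_keys_except": [{"condition": "drone", "exceptions": ["exterior"]}],
    "remove_keys_by_condition": [
        {"condition": "sauna_room", "to_remove": ["wooden_walls"]}
    ],
    "rename_key_by_condition": [
        {"to_rename": "outdoor_kitchen", "new_name": "summer_kitchen", "condition": "interior"}
    ],
    "keep_only_by_conditions": [
        {"conditions": ["kitchen", "bathroom"], "to_filter": ["granite_countertops"]}
    ],
}

print(postprocess({"drone": 0.9, "exterior": 0.8, "backyard": 0.7}, config))
print(postprocess({"interior": 0.9, "outdoor_kitchen": 0.6}, config))
```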

After postprocessing, the output may contain classes that are not in the core network output list. The info endpoint exposes network.raw_output_names, which lists the original names of the network outputs in their original order, and outputs.all_classes, which includes every possible output the service can return (101 classes + "summer_kitchen" for Tagger for now).