Image tagging
The main purpose of this API is to post-process the raw predictions returned by the served models. TF Serving models return only a list of scores, without any other hints such as class names or the most likely tags. This API processes those scores and returns more human-readable responses in the form of dictionaries.
The process_house_images endpoint goes further: it combines the predictions for all images of a house and, for example, returns the Exterior Style score averaged over all facade images:
{
"core_listing_id": 12454121,
"status": true,
"house_info": {
"exterior_styles": {
"a-frame": 0.0031661360922315,
"american_craftsman": 0.0115414682475,
"american_foursquare": 0.0029405293115,
"brownstone": 6.2809734263e-06,
"cape_cod": 0.043831701179999995,
"chicago_school": 2.62403617567e-05,
"condo": 0.000197127476505,
"farmhouse": 0.078591498385,
"french_provincial": 0.01015316158,
"georgian": 0.48202647574499996,
"international": 2.2621380555e-06,
"log": 0.006821043926,
"mediterranean": 0.0013629792955,
"modern": 0.0003508611555,
"queen_anne": 0.0021497522855,
"raised_ranch": 0.00812103576,
"ranch": 0.33468119285000003,
"split_level": 0.00587461039,
"townhouse": 0.0006853706710000001,
"tudor_revival": 0.0062720832475,
"not_facade": 0.0011981980139999999
},
"scene_ranking": {
"facade": [
2,
1
],
"attached_garage": [
1
],
"facade_bricks": [
1
],
"exterior": [
2,
1
],
"backyard": [
1,
2
]
}
},
"results_per_image": [
{
"id": 1,
"scores": {},
"message": "2 root error(s) found.\n (0) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_635/map/while/PartitionedCall/resize/ResizeBilinear/_683]]\n (1) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n0 successful operations.\n0 derived errors ignored.",
"status": false
},
{
"id": 1,
"scores": {
"exterior": 0.99818182,
"facade": 0.627584457,
"attached_garage": 0.831839,
"facade_bricks": 0.913291872,
"backyard": 0.88838768
},
"status": true
},
{
"id": 2,
"scores": {
"exterior": 0.999164939,
"facade": 0.885006249,
"backyard": 0.772896
},
"status": true
}
]
}
All predict endpoints share the same request schema:
Request fields
images: required, list of dicts, each with a URL-safe base64-encoded image and its id,
Images to predict on
core_listing_id: optional, int,
Identifier of the house the images belong to
Used only in process_house_images; ignored by the other endpoints
thresholds: optional, float, list of float, or dict of {class_name: float} pairs,
Probability threshold to use when extracting tags with a multilabel network.
If not given, the defaults from the config file are used
Notes
- images is a required field
- core_listing_id is used only in process_house_images
- thresholds is used only by multilabel networks (only the Tagger for now)
Example:
{
"core_listing_id": 123,
"images": [
{
"image": "base_64_string",
"id": 0
},
{
"image": "base_64_string",
"id": 1
}
],
"thresholds": {
"facade": 0.6,
"interior": 0.55
}
}
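A request body like the one above could be assembled as in the following minimal sketch. The helper name build_request and the sample bytes are illustrative assumptions, not part of the service; only the field names and the URL-safe base64 encoding come from the schema described here.

```python
import base64
import json


def build_request(images, core_listing_id=None, thresholds=None):
    """Build a request body from (image_id, raw_bytes) pairs (hypothetical helper)."""
    payload = {
        "images": [
            {
                # URL-safe base64, as the "images" field description asks for.
                "image": base64.urlsafe_b64encode(raw).decode("ascii"),
                "id": image_id,
            }
            for image_id, raw in images
        ]
    }
    # Optional fields are only included when the caller provides them.
    if core_listing_id is not None:
        payload["core_listing_id"] = core_listing_id
    if thresholds is not None:
        payload["thresholds"] = thresholds
    return payload


# Placeholder bytes stand in for real JPEG file contents.
body = build_request(
    [(0, b"\xff\xd8fake-jpeg-bytes"), (1, b"\xff\xd8more-bytes")],
    core_listing_id=123,
    thresholds={"facade": 0.6, "interior": 0.55},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST request to any of the predict endpoints.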
/predict/exterior_styles
This endpoint wraps the ExteriorStyles network, which takes an image of a house and predicts its exterior style. There are 21 styles in total for now, including one class for non-facade images (for example, if you send an image of an interior).
This endpoint returns EXACTLY one class per image: the one with the highest probability of all 21.
Response schema
id: int, "Identifier of the image"
results: dictionary, contains
a list of present labels and their probabilities,
status of the image
message of the error if the status is false
example:
{
"predictions": [
{
"id": 0,
"results": {
"exterior_styles": {
"scores": [
{
"name": "georgian",
"probability": 0.956090391
}
],
"status": true
}
}
},
{
"id": 1,
"results": {
"exterior_styles": {
"scores": [
{
"name": "ranch",
"probability": 0.368982
}
],
"status": true
}
}
}
]
}
Notes
- If the image is successfully processed, this endpoint should return 1 and only 1 predicted class.
- This endpoint ignores core_listing_id & thresholds parameters
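The "one class per image" rule amounts to an argmax over the class probabilities. Here is a minimal sketch, assuming the raw model output has already been mapped to a {class_name: probability} dictionary; the function name top_class and the sample probabilities are illustrative, not taken from the service code.

```python
def top_class(probabilities):
    """Return the highest-probability class in the response's "scores" format."""
    name = max(probabilities, key=probabilities.get)
    return [{"name": name, "probability": probabilities[name]}]


# Illustrative probabilities for three of the classes.
example = {"georgian": 0.956090391, "ranch": 0.0312, "not_facade": 0.0011}
scores = top_class(example)
print(scores)
```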
/predict/tagger
This endpoint wraps the Tagger network, which takes an image as input and returns the names of the classes present in it with their probabilities (floats between 0 and 1). By default, we say that an image contains class "X" if its probability is higher than 0.5.
Tagger is a multilabel network, which means you can get MORE than one class per image, or none at all.
Response schema
id: int, "Identifier of the image"
results: dictionary, contains
a list of present labels and their probabilities,
status of the image
message of the error if the status is false
example:
{
"predictions": [
{
"id": 0,
"results": {
"tagger": {
"scores": [],
"message": "2 root error(s) found.\n (0) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall/map/while/body/_635/map/while/PartitionedCall/resize/ResizeBilinear/_683]]\n (1) INVALID_ARGUMENT: jpeg::Uncompress failed. Invalid JPEG data or crop window.\n\t [[{{function_node __inference_base64_to_img_33065}}{{node DecodeJpeg}}]]\n0 successful operations.\n0 derived errors ignored.",
"status": false
}
}
},
{
"id": 1,
"results": {
"tagger": {
"scores": [
{
"name": "exterior",
"probability": 0.998181343
},
{
"name": "facade",
"probability": 0.627602637
},
{
"name": "attached_garage",
"probability": 0.831817389
},
{
"name": "facade_bricks",
"probability": 0.913271
},
{
"name": "backyard",
"probability": 0.888361812
}
],
"status": true
}
}
}
]
}
Notes
- Since Tagger is a multilabel network, it can use the thresholds parameter. The default value is 0.5 for all classes
- Tagger CAN return 0 classes for an image if all probabilities are below their thresholds. It can also return many classes per image, with no restrictions
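The threshold handling described above can be sketched as follows. This is an illustration under stated assumptions rather than the service's actual code: the function names are hypothetical, a list of thresholds is assumed to follow the order of the class names, and a dict is assumed to override only the classes it mentions.

```python
DEFAULT_THRESHOLD = 0.5


def resolve_thresholds(class_names, thresholds=None):
    """Turn a float, list, dict or None into a {class_name: threshold} dict."""
    resolved = {name: DEFAULT_THRESHOLD for name in class_names}
    if thresholds is None:
        return resolved
    if isinstance(thresholds, (int, float)):
        # A single number applies to every class.
        return {name: float(thresholds) for name in class_names}
    if isinstance(thresholds, dict):
        # Assumption: unspecified classes keep the default.
        resolved.update(thresholds)
        return resolved
    if len(thresholds) != len(class_names):
        raise ValueError("expected one threshold per class")
    # Assumption: list entries are positional, in class order.
    return dict(zip(class_names, map(float, thresholds)))


def extract_tags(probabilities, thresholds):
    """Keep only the classes whose probability is higher than their threshold."""
    return [
        {"name": name, "probability": prob}
        for name, prob in probabilities.items()
        if prob > thresholds[name]
    ]


names = ["exterior", "facade", "backyard"]
probs = {"exterior": 0.998181343, "facade": 0.627602637, "backyard": 0.3}
print(extract_tags(probs, resolve_thresholds(names)))
```

With the default thresholds this keeps "exterior" and "facade"; raising the "facade" threshold to 0.7 would leave only "exterior".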
/predict/encoder
This endpoint wraps the Encoder network, which transforms an image into a 1280-dimensional vector. It is not testable for now.
/predict/process_house_images
Main endpoint, which uses both the Tagger and Exterior Styles networks under the hood. How it works:
- Run the Tagger network on all images using the provided thresholds
- Select the images tagged with the "facade" class and pass them to the Exterior Styles network. These predictions are added to the per-image prediction dictionary too
- Combine the predictions for all images into the house_info field of the response:
  - Average all exterior style probabilities over those images -> field exterior_styles
  - Rank the images for each class by score in the "prob_ranking" field. For example, an ordering of [2, 1] for "facade" and [3] for "backyard" means that "facade" is present in the images with id 1 and 2, with a higher score in image 2 than in image 1, while "backyard" is present only in the image with id 3 and not in the rest of the images
- Optionally compute image embeddings and send them to RabbitMQ
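The aggregation into house_info can be sketched roughly as below. This is a minimal illustration, not the service's implementation: the function names average_styles and rank_images are hypothetical, the inputs are assumed to be plain {class_name: probability} dictionaries, and ties in the ranking are broken arbitrarily by image id.

```python
def average_styles(style_distributions):
    """Average per-image style probabilities class by class."""
    if not style_distributions:
        return {}
    count = len(style_distributions)
    names = style_distributions[0].keys()
    return {
        name: sum(dist[name] for dist in style_distributions) / count
        for name in names
    }


def rank_images(tags_per_image):
    """Map each tag to the ids of the images containing it, best score first."""
    by_tag = {}
    for image_id, tags in tags_per_image.items():
        for name, prob in tags.items():
            by_tag.setdefault(name, []).append((prob, image_id))
    return {
        name: [image_id for _, image_id in sorted(pairs, reverse=True)]
        for name, pairs in by_tag.items()
    }


# Tagger scores taken from the example response for images 1 and 2.
tags = {
    1: {"exterior": 0.998181343, "facade": 0.627602637, "backyard": 0.888361812},
    2: {"exterior": 0.99916482, "facade": 0.884995699, "backyard": 0.772882104},
}
ranking = rank_images(tags)

# Illustrative two-class style distributions for two facade images.
averaged = average_styles(
    [{"georgian": 0.75, "ranch": 0.25}, {"georgian": 0.25, "ranch": 0.5}]
)
print(ranking, averaged)
```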
Response schema
results_per_image: id of image and
a list of present labels and their probabilities for each network it passed, i.e. tagger and
exterior styles;
status of the image;
message of the error if the status is false
house_info: dictionary with prob_ranking and exterior_styles as described above
core_listing_id: id of a house
status: status of the house. True if at least one image is successfully processed
example:
{
"core_listing_id": 123,
"status": true,
"house_info": {
"exterior_styles": [
{
"name": "a-frame",
"probability": 0.003163475386826
},
{
"name": "american_craftsman",
"probability": 0.011543401577500001
},
{
"name": "american_foursquare",
"probability": 0.002938385385
},
{
"name": "brownstone",
"probability": 6.28945050305e-06
},
{
"name": "cape_cod",
"probability": 0.043830761205
},
{
"name": "chicago_school",
"probability": 2.6277779165400002e-05
},
{
"name": "condo",
"probability": 0.00019724981487000002
},
{
"name": "farmhouse",
"probability": 0.078569122575
},
{
"name": "french_provincial",
"probability": 0.010154600239999999
},
{
"name": "georgian",
"probability": 0.48202906658
},
{
"name": "international",
"probability": 2.263999825e-06
},
{
"name": "log",
"probability": 0.0068244273335
},
{
"name": "mediterranean",
"probability": 0.001364317026
},
{
"name": "modern",
"probability": 0.00035098307099999996
},
{
"name": "queen_anne",
"probability": 0.0021490082739999998
},
{
"name": "raised_ranch",
"probability": 0.008122037165
},
{
"name": "ranch",
"probability": 0.33469547385049997
},
{
"name": "split_level",
"probability": 0.005873178365
},
{
"name": "townhouse",
"probability": 0.0006862457815000001
},
{
"name": "tudor_revival",
"probability": 0.006275861867
},
{
"name": "not_facade",
"probability": 0.0011976235205
}
],
"prob_ranking": [
{
"name": "facade_bricks",
"ordering": [
1
]
},
{
"name": "facade",
"ordering": [
2,
1
]
},
{
"name": "attached_garage",
"ordering": [
1
]
},
{
"name": "exterior",
"ordering": [
2,
1
]
},
{
"name": "backyard",
"ordering": [
1,
2
]
}
]
},
"results_per_image": [
{
"id": 1,
"results": {
"tagger": {
"scores": [
{
"name": "exterior",
"probability": 0.998181343
},
{
"name": "facade",
"probability": 0.627602637
},
{
"name": "attached_garage",
"probability": 0.831817389
},
{
"name": "facade_bricks",
"probability": 0.913271
},
{
"name": "backyard",
"probability": 0.888361812
}
],
"status": true
},
"exterior_styles": {
"scores": [
{
"name": "ranch",
"probability": 0.668982
}
],
"status": true
}
}
},
{
"id": 2,
"results": {
"tagger": {
"scores": [
{
"name": "exterior",
"probability": 0.99916482
},
{
"name": "facade",
"probability": 0.884995699
},
{
"name": "backyard",
"probability": 0.772882104
}
],
"status": true
},
"exterior_styles": {
"scores": [
{
"name": "georgian",
"probability": 0.956090391
}
],
"status": true
}
}
}
]
}
Sending embeddings to Queue
The endpoint can also calculate image embeddings and send them to RabbitMQ. To enable this, set the
publish_embeddings flag to true.
Messages will be saved using the ListingEmbeddings schema, which contains embeddings for each
image, image ids and listing id.
You can also apply GZIP compression to the embeddings by setting embeddings_compression="gzip", which reduces the
payload size by 2–3 times.
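The effect of the compression option can be illustrated with the standard gzip module. The wire format of ListingEmbeddings is not described here, so this sketch uses JSON as a stand-in serialization; the function names and the message field names are assumptions made for the example only.

```python
import gzip
import json


def pack_embeddings(listing_id, embeddings, compression=None):
    """Serialize {image_id: vector} into bytes, optionally gzip-compressed."""
    message = {
        "listing_id": listing_id,
        "image_ids": list(embeddings),
        "embeddings": list(embeddings.values()),
    }
    raw = json.dumps(message).encode("utf-8")
    return gzip.compress(raw) if compression == "gzip" else raw


def unpack_embeddings(payload, compression=None):
    """Reverse pack_embeddings on the consumer side."""
    raw = gzip.decompress(payload) if compression == "gzip" else payload
    return json.loads(raw)


# Toy 1280-dimensional vectors; real embeddings compress far less well.
vectors = {1: [0.0] * 1280, 2: [0.5] * 1280}
plain = pack_embeddings(123, vectors)
packed = pack_embeddings(123, vectors, compression="gzip")
restored = unpack_embeddings(packed, compression="gzip")
print(len(plain), len(packed))
```

A consumer reading from the queue has to know which compression was used in order to decode the payload.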
Models
The API connects to the TF models using the configuration file stored in src/model_config.yaml
Tagger
Multilabel network that tags images with one or more of 101 possible classes, including:
- Room types, like living room or sun room
- Floor, wall, ceiling types and texture
- Interior features like fireplace and kitchen island
- Outdoor features like backyard or tennis court
- General scenes, like view from balcony, drone view or facade
A report with more detailed info can be found here
NOTE: We decided not to show some of our worst-performing classes by tweaking their thresholds, so only 93 of the 101 classes are shown by default. To receive the scores for the hidden classes, tune the thresholds manually.
Exterior Styles Classifier
Multiclass (single-label) classifier that outputs the architectural style of the house based on its image. Total 23 possible classes:
- Condo
- Townhouse
- 20 architectural styles for SFR and apartment buildings
- not_facade for non-fitting images
A report with more detailed info can be found here
Encoder
Receives an image and outputs a vector representation of it. Currently, we use the backbone of the Tagger with an L2 normalization layer on top of it.
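L2 normalization rescales a vector to unit length, so the dot product of two normalized embeddings equals their cosine similarity. Here is a minimal pure-Python sketch of the operation; the function name and the eps guard against zero vectors are assumptions for illustration, and the actual layer lives inside the served model.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```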
Postprocessing
model_config.yaml also contains the names of the classes for the classifier networks (Tagger, Exterior Styles),
optional thresholds for filtering the multilabel network's output
(default is 0.5), and an optional postprocessing configuration for the networks' outputs.
Postprocessing includes:
- Deleting the specified classes if a given class is present. For example, delete wooden_walls if the sauna_room tag is present
- Deleting all classes except the specified ones if a given class is present. For example, all classes except exterior are deleted when the "drone" tag is present
- Keeping a class only if another specified class is present, and deleting it otherwise. For example, keep bathtubs only in a bathroom
- Renaming a class if another class is present. For example, rename "pool" to "indoor_pool" if "interior" is present
Currently, we have postprocessing only for the Tagger network, and it applies in both the /api/predict/tagger and /api/predict/process_house_images endpoints.
Example of postprocessing config:
postprocessing_config:
  remove_all_keys_except:
    - condition: drone
      exceptions:
        - exterior
    - condition: view
      exceptions:
        - water_view
        - exterior
    - condition: other
  remove_keys_by_condition:
    - condition: sauna_room
      to_remove:
        - wood_texture_floors
        - wooden_ceilings
        - wooden_walls
  rename_key_by_condition:
    - to_rename: outdoor_kitchen
      new_name: summer_kitchen
      condition: interior
  keep_only_by_conditions:
    - conditions:
        - kitchen
        - bathroom
      to_filter:
        - tile_countertops
        - granite_countertops
        - marble_countertops
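The four rule types in the config above could be applied along the following lines. This is only a sketch under several assumptions that the description does not settle: the function name postprocess is hypothetical, the rules run in the order shown in the code, the condition tag itself survives a remove_all_keys_except rule, and keep_only_by_conditions is satisfied when any one of its conditions is present.

```python
def postprocess(tags, config):
    """Apply the postprocessing rules to a {class_name: probability} dict."""
    tags = dict(tags)  # work on a copy

    for rule in config.get("remove_all_keys_except", []):
        if rule["condition"] in tags:
            # Assumption: the condition tag is kept along with the exceptions.
            keep = set(rule.get("exceptions", [])) | {rule["condition"]}
            tags = {name: p for name, p in tags.items() if name in keep}

    for rule in config.get("remove_keys_by_condition", []):
        if rule["condition"] in tags:
            for name in rule["to_remove"]:
                tags.pop(name, None)

    for rule in config.get("keep_only_by_conditions", []):
        # Assumption: any one of the listed conditions is enough to keep the class.
        if not any(cond in tags for cond in rule["conditions"]):
            for name in rule["to_filter"]:
                tags.pop(name, None)

    for rule in config.get("rename_key_by_condition", []):
        if rule["condition"] in tags and rule["to_rename"] in tags:
            tags[rule["new_name"]] = tags.pop(rule["to_rename"])

    return tags


# A subset of the example config, written as the dict a YAML loader would return.
config = {
    "remove_all_keys_except": [{"condition": "drone", "exceptions": ["exterior"]}],
    "remove_keys_by_condition": [
        {"condition": "sauna_room", "to_remove": ["wooden_walls"]}
    ],
    "rename_key_by_condition": [
        {"to_rename": "outdoor_kitchen", "new_name": "summer_kitchen", "condition": "interior"}
    ],
    "keep_only_by_conditions": [
        {"conditions": ["kitchen", "bathroom"], "to_filter": ["granite_countertops"]}
    ],
}
raw = {"sauna_room": 0.9, "wooden_walls": 0.8, "granite_countertops": 0.7,
       "interior": 0.95, "outdoor_kitchen": 0.6}
cleaned = postprocess(raw, config)
print(cleaned)
```

In this example wooden_walls is dropped because sauna_room is present, granite_countertops is dropped because neither kitchen nor bathroom is present, and outdoor_kitchen is renamed to summer_kitchen because interior is present.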
After postprocessing, the response may contain classes that are not in the network's raw output list. The info endpoint returns network.raw_output_names, which lists the original names of the network outputs in their original order, and outputs.all_classes, which includes every possible output the service can give (101 classes + "summer_kitchen" for the Tagger for now).