Pinterest relies heavily on AI to achieve its core business goal of inspiring people to create a life they love. Anyone who has used Pinterest’s Lens feature, which lets you point a mobile phone camera at an object and search for it, will know that it is a standout feature and one well suited to commerce.
Pinterest’s AI is highly differentiated. It tackles two difficult AI challenges: delivering a visual experience, and providing context and personalization.
A novel way of creating context
People often can’t describe what they want, but they know it when they see it. Pinterest has invested heavily in its own AI to optimize visual search. Using computer vision, the AI “sees” the objects in the images people pin rather than relying on text descriptions.
But the AI does not work on its own. Curation on Pinterest is a collaboration between human and machine. Most pins are handpicked and organized by people who save images to their own boards, and because the same image can be saved to many different boards, it becomes linked to many other ideas and concepts. Pinterest calls this body of knowledge the “taste graph.”
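Searching by what an image shows rather than by its text is commonly done by embedding each image as a vector with a neural network and returning the catalog items whose vectors lie closest to the query’s. The sketch below assumes the embeddings have already been computed; the pin ids and three-dimensional vectors are made up for illustration, and the brute-force cosine-similarity scan is a generic technique, not Pinterest’s implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_search(query_embedding, catalog, k=3):
    """Return the ids of the k catalog images most similar to the query.

    `catalog` maps a hypothetical pin id to a precomputed image embedding.
    This scans every item; large systems typically use approximate
    nearest-neighbor indexes instead.
    """
    scored = [(cosine_similarity(query_embedding, emb), pin_id)
              for pin_id, emb in catalog.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [pin_id for _, pin_id in scored[:k]]


# Toy catalog: hypothetical pins with made-up embeddings.
catalog = {
    "red_sneaker": [0.9, 0.1, 0.0],
    "blue_sneaker": [0.8, 0.3, 0.1],
    "oak_table": [0.0, 0.2, 0.9],
}
print(visual_search([1.0, 0.2, 0.0], catalog, k=2))
```

A photo of a sneaker, embedded near the sneaker pins, retrieves them without any text query being typed.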
What matters about this process is that the AI receives far more information about how humans contextualize what they see. Take an image of Machu Picchu: one person may pin it to “bucket list,” while others file it under concepts as varied as “mystical,” “stone steps,” or “llamas.” We have not seen context setting this sophisticated elsewhere in our research, and it approaches the level of commonsense reasoning that is something of a “holy grail” in AI.
The taste graph also represents the future: people’s intent to work on something. As AI researchers, we think this is particularly important because intent is a prediction, and predictions are valuable to advertisers. The obvious limitation is that the approach works best for visual content. Pinterest AI researchers note in a research paper that engagement on text-based content, such as quotes or fitness planning, is only around 25% of the engagement on highly visual content such as tattoos or art. Because Pinterest is looking to expand into other verticals such as technology and financial services, we will be watching to see whether it can achieve equally impressive AI development in these more text-heavy fields.
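A toy version of the taste graph can be written as a mapping from each pin to the boards people saved it to, where board names act as human-supplied context labels. The sketch assumes each save is a `(pin_id, board_name)` pair; the function names and data are hypothetical, and Pinterest’s real graph is far richer than board-name counts.

```python
from collections import Counter, defaultdict


def build_taste_graph(saves):
    """Map each pin to a Counter of the board names people saved it to.

    `saves` is an iterable of (pin_id, board_name) pairs, one per save.
    Board names are lowercased so "Bucket List" and "bucket list" merge.
    """
    graph = defaultdict(Counter)
    for pin_id, board_name in saves:
        graph[pin_id][board_name.lower()] += 1
    return graph


def pin_context(graph, pin_id, top_n=3):
    """Return the top_n board names most often used for this pin."""
    return [name for name, _ in graph.get(pin_id, Counter()).most_common(top_n)]


# Hypothetical saves of one Machu Picchu image by different people.
saves = [
    ("machu_picchu", "Bucket list"),
    ("machu_picchu", "bucket list"),
    ("machu_picchu", "Mystical"),
    ("machu_picchu", "Llamas"),
    ("machu_picchu", "Bucket List"),
    ("machu_picchu", "Stone steps"),
    ("machu_picchu", "mystical"),
]
graph = build_taste_graph(saves)
print(pin_context(graph, "machu_picchu", top_n=2))
```

Each save adds a human judgment about what the image means, which is the extra context the paragraph above describes.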
Dealing with subjectivity in a clever way
Another interesting application of AI at Pinterest is the company’s “complete the look” recommendation system. Modelling fashion compatibility is challenging because it is complex and subjective. The system uses attention mechanisms, a relatively new AI component built into its convolutional neural networks, which learn scene-product compatibility by focusing on one part of an image or scene rather than the whole. The company’s AI researchers show visualizations of this in their paper, and it is fascinating to see what the AI considers important for “complete the look.” The attention tends to ignore faces, which suggests the AI has discovered that what a person wears on one specific part of the body is more relevant than who they are or the outfit as a whole. The visualizations are interesting in themselves: our human intuitions about what might “complete a look” seem far from what the attention model tells us.
Figure: Visualization of attention maps (A) versus saliency maps (S). Saliency is a direct measure of relevance, while attention detects compatibility between the scene and the product.
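To illustrate how attention lets a model focus on part of a scene, the sketch below uses plain dot-product attention: each scene region is scored against a product embedding, the scores are softmax-normalized into weights, and the weighted regions form the scene representation used for a compatibility score. This is a simplified stand-in, not the model in Pinterest’s paper; the region names and vectors are hand-picked assumptions, whereas a real system learns region features and attention from data with a CNN. In this toy, the face receiving little weight is built into the chosen vectors, not discovered.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax: turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, product_embedding):
    """Dot-product attention of a product over the regions of a scene.

    Returns (attention weights, attention-weighted average of the regions).
    """
    weights = softmax([dot(r, product_embedding) for r in region_features])
    dim = len(product_embedding)
    pooled = [sum(w * r[i] for w, r in zip(weights, region_features))
              for i in range(dim)]
    return weights, pooled


def compatibility(region_features, product_embedding):
    """Score scene-product compatibility from the attended scene vector."""
    weights, pooled = attend(region_features, product_embedding)
    return dot(pooled, product_embedding), weights


# Hypothetical region features for one scene, and a hypothetical product.
regions = {"face": [0.0, 0.0, 1.0], "torso": [1.0, 0.0, 0.0], "feet": [0.0, 1.0, 0.0]}
product = [3.0, 2.0, 0.0]
score, weights = compatibility(list(regions.values()), product)
print(dict(zip(regions, (round(w, 3) for w in weights))))
```

Because the weights come from a softmax, regions unrelated to the product are down-weighted rather than removed, which is what an attention map visualizes.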
Impressive scale that demonstrates AI’s superpowers
Pinterest is essentially a visual search and recommendation engine, so one of its major challenges is performance at scale. This is another area where Pinterest is quietly impressive. In a paper released in 2018, researchers describe a new system designed to provide instantaneous visual personalization at scale, operating over a graph of 3 billion nodes and 17 billion edges. The system increased engagement by 50% and now powers 80% of user engagement.
In AI, context and intent are very difficult for machines to learn. Pinterest’s achievement is impressive because it is uniquely able to serve a personalized, contextually relevant visual experience at scale and at speed.
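To give a feel for recommendation over a graph of this kind, the sketch below runs a random walk with restarts over a bipartite graph of pins and boards and ranks pins by how often the walk visits them. We assume, for illustration, that the 2018 system works in this spirit (our reading is that it is Pinterest’s Pixie, which walks a pin-board graph); the board names, pins, and parameters here are hypothetical, and a production system adds many optimizations this sketch omits.

```python
import random
from collections import Counter


def build_bipartite(boards):
    """Turn {board: [pins]} into pin->boards and board->pins adjacency lists."""
    board_to_pins = {board: list(pins) for board, pins in boards.items()}
    pin_to_boards = {}
    for board, pins in boards.items():
        for pin in pins:
            pin_to_boards.setdefault(pin, []).append(board)
    return pin_to_boards, board_to_pins


def random_walk_recommend(pin_to_boards, board_to_pins, query_pin,
                          num_steps=10_000, restart_prob=0.5, top_k=3, seed=0):
    """Rank pins by how often a random walk with restarts visits them."""
    rng = random.Random(seed)  # seeded so results are reproducible
    visits = Counter()
    current = query_pin
    for _ in range(num_steps):
        boards = pin_to_boards.get(current)
        if not boards:  # dead end: jump back to the query pin
            current = query_pin
            continue
        # One step: pin -> random board it is saved to -> random pin on that board.
        board = rng.choice(boards)
        current = rng.choice(board_to_pins[board])
        if current != query_pin:
            visits[current] += 1
        if rng.random() < restart_prob:  # restarts keep the walk near the query
            current = query_pin
    return [pin for pin, _ in visits.most_common(top_k)]


# Hypothetical boards and pins.
boards = {
    "hiking": ["machu_picchu", "boots", "tent"],
    "peru": ["machu_picchu", "llama", "ceviche"],
    "camping": ["tent", "stove"],
}
pin_to_boards, board_to_pins = build_bipartite(boards)
print(random_walk_recommend(pin_to_boards, board_to_pins, "machu_picchu"))
```

Pins that share boards with the query are visited most often, while pins two hops away (here, "stove") are visited rarely; the restart probability controls how far from the query the recommendations can drift.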