eBay groups its products into catalogue trees for the various languages it operates in. These are taxonomies of product categories in which more general categories sit near the root node and more specific ones near the leaves; the products themselves are encoded in the leaves. The challenge was to align and unify multiple descriptions in two languages for items and products from the e-commerce domain, both from a document perspective (by locating comparable descriptions within a specific product category) and from a sentence/token perspective (by determining the alignments and gaps of comparable documents).
ADAPT developed a system that trains a model to encode products (represented by product title and associated aspect values) as vectors. This model generalises the relations between the products so that products that are similar have a smaller (vectorial) distance while dissimilar products have a larger distance. Next, this model is used to test the similarity of a new product with the ones used during training and identify the most similar product. The system exploits machine translation methods during training and testing in order to match a product in one language (L1) to a product in another language (L2).
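The idea of encoding products as vectors and picking the closest one can be illustrated with a minimal sketch. It is not the ADAPT system: the trained model is replaced here by a simple bag-of-words encoder with cosine distance, and machine translation is replaced by a tiny hypothetical glossary (`TOY_GLOSSARY`). All product data, identifiers and function names are assumptions made for illustration.

```python
import math
from collections import Counter

def encode(title, aspects):
    """Toy encoder (assumption): bag-of-words counts over title and aspect values."""
    tokens = title.lower().split()
    for value in aspects.values():
        tokens.extend(str(value).lower().split())
    return Counter(tokens)

def cosine_distance(u, v):
    """Return 1 - cosine similarity; smaller means more similar."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical word-for-word glossary standing in for a real MT system.
TOY_GLOSSARY = {"digitale": "digital", "kamera": "camera", "schwarz": "black"}

def translate(text):
    """Toy German-to-English 'translation' (assumption): glossary lookup per token."""
    return " ".join(TOY_GLOSSARY.get(tok, tok) for tok in text.lower().split())

def most_similar(query_vec, catalogue):
    """Return the id of the catalogue product with the smallest distance to the query."""
    return min(catalogue, key=lambda pid: cosine_distance(query_vec, catalogue[pid]))

# Made-up English catalogue (L2 side of the match).
english_catalogue = {
    "en-1": encode("Digital camera 20MP", {"colour": "black"}),
    "en-2": encode("Garden hose 15m", {"colour": "green"}),
}

# Made-up German product (L1 side), translated before encoding.
german_title = "Digitale Kamera 20MP"
german_aspects = {"farbe": "schwarz"}
query = encode(
    translate(german_title),
    {key: translate(value) for key, value in german_aspects.items()},
)
best = most_similar(query, english_catalogue)
```

In this toy setting the German camera lands closest to the English camera entry. A learned encoder would replace `encode`, and a real translation system would replace `translate`, but the nearest-neighbour lookup would keep the same shape.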
The collected data consisted of items in English with a UPC (Universal Product Code) number for which an item in German with the same UPC number exists, and vice versa. While the collected data covered a large number of items and various categories, we focused on three meta categories (home and gardening, toys, and cameras and photo) and on unique items in English, though the solution could easily be applied to other categories.
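As a rough illustration of how such cross-lingual pairs could be assembled, the sketch below joins English and German items on a shared UPC. The record layout (`upc`, `title`), the function name `pair_by_upc` and the sample values are all hypothetical; the actual collection pipeline is not described in this text.

```python
def pair_by_upc(english_items, german_items):
    """Pair each English item with the German item that carries the same UPC.

    Assumption: each item is a dict with 'upc' and 'title' keys. If several
    German items share a UPC, only the last one is kept in this sketch.
    """
    german_by_upc = {item["upc"]: item for item in german_items}
    pairs = []
    for en_item in english_items:
        de_item = german_by_upc.get(en_item["upc"])
        if de_item is not None:
            pairs.append((en_item, de_item))
    return pairs

# Made-up sample records; the UPC values are placeholders, not real codes.
english_items = [
    {"upc": "000000000001", "title": "Digital camera 20MP"},
    {"upc": "000000000002", "title": "Garden hose 15m"},
]
german_items = [
    {"upc": "000000000001", "title": "Digitale Kamera 20MP"},
    {"upc": "000000000003", "title": "Holzeisenbahn Set"},
]

pairs = pair_by_upc(english_items, german_items)
```

Only items whose UPC appears on both sides survive the join, which matches the description of the data as items that have a counterpart in the other language.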
The outcome was a software application prototype and code that match products linked to one language-specific catalogue (e.g. language A) to products linked to a different catalogue (e.g. language B).