=======================================
A Collection of Fashion Outfit Datasets
=======================================

Introduction
============

Welcome to our guide on utilizing the fashion outfit datasets for recommendations. We differentiate between non-personalized and personalized datasets, with the latter linking each outfit to a specific user for tailored recommendations. Essentially, non-personalized recommendations can be viewed as a special case of personalized recommendations with a single universal user. Imagine an outfit as a collection of :math:`n` items from diverse categories, represented as :math:`\mathcal{O}=\{x_1, \ldots, x_n\}`, where each outfit is associated with a user index :math:`u` from the set :math:`\mathcal{U}=\{1, \ldots, m\}`. This document outlines the dataset's structure and demonstrates how to efficiently load and use it.

Data Encoding
=============

Fashion items
-------------
Fashion items are categorized into different types, such as tops, bottoms, shoes, and accessories. We use ``item_list`` to store a map for items across categories, where ``item_list[c]`` encompasses items within the :math:`c`-th category. Each item's key within ``item_list[c]`` is essential for data loading.

.. code-block:: python

   item_list = [
      [item_key_1_1, item_key_1_2, ...], # category 1
      [item_key_2_1, item_key_2_2, ...], # category 2
      # ...
      [item_key_n_1, item_key_n_2, ...]  # category n
   ]

With ``item_list``, we can encode each item as a tuple: :math:`(c, n)`, where :math:`c` is the category index and :math:`n` is the item index within the category. This abstraction allows to generate new outfits efficiently. ``item_list`` is usually used globally across different splits.

Fashion outfits
---------------

Outfits are encoded as follows:

.. code-block:: python

   data = [uid, size, *items, *types]

- ``uid``: Interger. User index.
- ``size``: Interger. Outfit length.
- ``items``: List of intergers. Indexes of items. 
- ``types``: List of intergers. Categories of items.


``items`` and ``types`` are set to a predefined ``max_size``. If an outfit's size is below ``max_size``, we append ``-1`` to indicate absent items. For example, ``[0, 3, 1, 4, 6, -1, 0, 1, 2, -1]`` represents an outfit with user index ``0``, size ``3``, and items :math:`(i_{01}, i_{14}, i_{26})` from categories 0, 1, 2, respectively. 

To find the corresponding unique key for each item, we can use the following code:

.. code-block:: python
   
   max_size = len(outfit) // 2 - 1
   uid, size, outfit = data[:2], data[2:]
   items, types = outfit[:max_size], outfit[max_size:]
   for i in range(size):
      c = types[i]
      n = items[i]
      print(f"Category: {c}, Item key: {item_list[c][n]}")

With such encoding, each split is simply an array:

.. code-block:: python

   train_data # shape (n_train, max_size*2+2)
   val_data   # shape (n_val,   max_size*2+2)
   test_data  # shape (n_test,  max_size*2+2)

And it is the basic format for outfit generation.

Outfit Generation
=================

.. currentmodule:: outfit_datasets.generator

Given the above tuple format, we can easily generate new tuples with different subclasses of :class:`Generator`:

- ``generator = Generator(init_data=None, **kwargs)``: the interface for all generators.
- ``generator(input_data)``: generate tuples with given input data.

Supported types of generators are:

- :class:`Fix` always returns ``init_data``, ingore the ``input_data`` during each call.
- :class:`Identity` always returns ``input_data``.
- :class:`RandomMix` returns randomly mixed tuples of ``input_data``.
- :class:`RandomReplace` randomly replace :math:`k` items in ``input_data``.
- :class:`FITB` randomly replace one item in ``input_data`` for FITB task.

Outfit Datadaset
----------------
.. currentmodule:: outfit_datasets


Basic Dataset
-------------


The :class:`BaseOutfitData` defines different outfit data.
For all datasets, we have the positive tuples and the negative tuples if existed.


- Datum: Given a key, return the data of the corresponding item.
- positive tuples: The :math:`n\times m` array for outfits. Required for all dataset.
- negative tuples: The :math:`n\times m` array for outfits. Optional.
- positive data mode: how to generate positive data, usually fixed.
- positive data param: configuration for specific mode.
- negative data mode: how to generate negative data, usually randomly mixed.
- negative data param: configuration for specific mode.


Outfit DataLoader
-----------------


A high-level configuration for outfit data. We introduce :class:`OutfitLoader` as a high-level implementation for outfit data.


.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: Introduction

   intro

.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: Outfit Datasets

   iqon_3000
   maryland_polyvore
   polyvore_outfits
   polyvore_u
   shift15m


.. toctree::
   :hidden:
   :maxdepth: 2
   :caption: Modules

   modules