[🐛BUG] Negative sampling does not use explicit-feedback samples #2066

HowardZJU · 2024-07-17T02:45:34Z

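Describe the bug
Take the ML-1M dataset as an example, where ratings range from 1 to 5.

The generated sparse interaction matrix stores only the user-item pairs whose rating is at or above the threshold. Pairs rated below the threshold are set to 0, together with the unobserved pairs.

This does not make effective use of explicit negative feedback: explicitly negative samples and unobserved samples are both treated as negatives.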
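
The effect described above can be illustrated with a minimal sketch in plain Python. It is not RecBole code: the toy ratings, the threshold of 3, and the helper `build_binary_matrix` are all assumptions made for illustration.

```python
# Toy ratings on a 1-5 scale: (user, item, rating). Values are illustrative only.
ratings = [
    ("u1", "i1", 5),
    ("u1", "i2", 1),  # explicit dislike
    ("u2", "i1", 4),
    ("u2", "i3", 2),  # explicit dislike
]
users = ["u1", "u2"]
items = ["i1", "i2", "i3"]
threshold = 3  # hypothetical threshold


def build_binary_matrix(ratings, users, items, threshold):
    """Build a dense 0/1 matrix that marks only ratings >= threshold as 1."""
    u_idx = {u: k for k, u in enumerate(users)}
    i_idx = {i: k for k, i in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item, rating in ratings:
        if rating >= threshold:
            matrix[u_idx[user]][i_idx[item]] = 1
    return matrix


matrix = build_binary_matrix(ratings, users, items, threshold)
# (u1, i2) was rated 1 and (u1, i3) was never rated, yet both are stored as 0.
print(matrix)  # [[1, 0, 0], [1, 0, 0]]
```

Once the matrix is built, nothing in it separates the explicit dislike at (u1, i2) from the missing rating at (u1, i3).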

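Questions and requests

Is it possible to obtain explicit negative samples, i.e. samples with rating < threshold, during training?
Is it possible to obtain both the explicit negative samples and the unobserved samples drawn by negative sampling during training, and to tell the two apart?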
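
As a rough sketch of what the second request could look like, the standalone Python below tags each training row with its origin. It does not use RecBole's dataloaders or samplers; the function names, the string tags, and the uniform sampling strategy are assumptions chosen for illustration.

```python
import random


def split_by_threshold(ratings, threshold):
    """Separate rated pairs into positives and explicit negatives."""
    positives, explicit_negatives = [], []
    for user, item, rating in ratings:
        target = positives if rating >= threshold else explicit_negatives
        target.append((user, item))
    return positives, explicit_negatives


def sample_unobserved(ratings, user, items, k, rng):
    """Uniformly sample up to k items the user has never rated."""
    rated = {item for u, item, _ in ratings if u == user}
    candidates = [item for item in items if item not in rated]
    return rng.sample(candidates, min(k, len(candidates)))


def build_training_rows(ratings, items, threshold, k, seed=0):
    """Return (user, item, source) rows so the three kinds stay distinguishable."""
    rng = random.Random(seed)
    positives, explicit_negatives = split_by_threshold(ratings, threshold)
    rows = [(u, i, "positive") for u, i in positives]
    rows += [(u, i, "explicit_negative") for u, i in explicit_negatives]
    for user, _ in positives:
        # One batch of sampled negatives per positive interaction.
        for item in sample_unobserved(ratings, user, items, k, rng):
            rows.append((user, item, "sampled_unobserved"))
    return rows


ratings = [("u1", "i1", 5), ("u1", "i2", 1), ("u2", "i1", 4)]
items = ["i1", "i2", "i3", "i4"]
rows = build_training_rows(ratings, items, threshold=3, k=1)
for row in rows:
    print(row)
```

Keeping the origin as a separate column, rather than folding it into the label, would let a loss function weight explicit negatives differently from sampled ones.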

How to reproduce
Steps to reproduce the bug:
In the quick start, set a breakpoint at the following line and inspect the resulting data.
train_data, valid_data, test_data = data_preparation(config, dataset)

Environment:

OS: Linux


HowardZJU · 2024-07-17T05:51:17Z

For example, to address the problems raised above, would it be feasible to change the _set_label_by_threshold(self) function so that negative labels are set to -1?

  def _set_label_by_threshold(self):
      """Generate 0/1 labels according to value of features.

      According to ``config['threshold']``, those rows with value lower than threshold will
      be given negative label, while the other will be given positive label.
      See :doc:`../user_guide/data/data_args` for detail arg setting.

      Note:
          Key of ``config['threshold']`` is a field name.
          This field will be dropped after label generation.
      """
      threshold = self.config["threshold"]
      if threshold is None:
          return

      self.logger.debug(f"Set label by {threshold}.")

      if len(threshold) != 1:
          raise ValueError("Threshold length should be 1.")

      self.set_field_property(
          self.label_field, FeatureType.FLOAT, FeatureSource.INTERACTION, 1
      )
      for field, value in threshold.items():
          if field in self.inter_feat:
              self.inter_feat[self.label_field] = (
                  self.inter_feat[field] >= value
              ).astype(int)
          else:
              raise ValueError(f"Field [{field}] not in inter_feat.")
          if field != self.label_field:
              self._del_col(self.inter_feat, field)
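
The proposed change could be sketched as follows. This is a standalone stand-in that works on a list of dicts instead of `self.inter_feat`, so the function signature, the `negative_label` parameter, and the sample rows are assumptions rather than RecBole's API.

```python
def set_label_by_threshold(rows, field, value, label_field="label", negative_label=-1):
    """Label rows 1 if row[field] >= value, else negative_label, then drop field."""
    labeled = []
    for row in rows:
        if field not in row:
            raise ValueError(f"Field [{field}] not in row.")
        # Copy every column except the thresholded one, mirroring the column drop.
        new_row = {k: v for k, v in row.items() if k != field}
        new_row[label_field] = 1 if row[field] >= value else negative_label
        labeled.append(new_row)
    return labeled


rows = [
    {"user_id": 1, "item_id": 10, "rating": 5},
    {"user_id": 1, "item_id": 11, "rating": 2},
]
labeled = set_label_by_threshold(rows, "rating", 3)
print(labeled)
# [{'user_id': 1, 'item_id': 10, 'label': 1}, {'user_id': 1, 'item_id': 11, 'label': -1}]
```

With -1 reserved for explicit negatives, 0 would remain free for sampled unobserved pairs. Any downstream code that assumes 0/1 labels, such as a binary cross-entropy loss, would likely need to remap the values first.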

HowardZJU added the bug Something isn't working label Jul 17, 2024

zhengbw0324 assigned BoXiaohe Aug 27, 2024
