How to load Steam dataset in python code like the ml-100k dataset? #792

alanjacob10 · 2021-04-05T00:35:56Z

alanjacob10
Apr 5, 2021

Hi,
I used the code:
from recbole.quick_start import run_recbole
run_recbole(dataset='ml-100k', model='BPR')

But was wondering if I could change the dataset name "ml-100k" to the Steam dataset you have on RecBole. What should I write instead of "ml-100k" to get the steam dataset? Does it have a specific name?

My goal is to test various algorithms on this dataset but I can't proceed because I can't load the Steam dataset.

ps. I'm relatively new to Python.

Sincerely,
Alan

Answered by 2017pxy


2017pxy · 2021-04-05T05:10:11Z

2017pxy
Apr 5, 2021
Maintainer

@alanjacob10 Hi, you need to download the steam dataset, and then change the config.

About the dataset, you can get it from our Google Drive. I recommend creating a folder called dataset and putting the steam dataset into it. The structure should look like this:

-dataset
    -steam
        -steam.inter
        -steam.item

About the config settings, we support three ways to set the config (command line, config files, and parameter dicts). I will take parameter dicts as an example:

from recbole.quick_start import run_recbole

parameter_dict = {
    'data_path': "The file path of the Dataset"
    'model': BPR
    'dataset': steam
    'load_col’:
        inter: [ ]
    ......
}
run_recbole(config_dict=parameter_dict)

For more information about our config settings, you can read our docs.

2 replies

alanjacob10 Apr 5, 2021
Author

Hi,
Thank you for the reply. I have downloaded the steam dataset from the drive but unfortunately I still have a problem with the code.

It seems that BPR won't load. I also tried
from recbole.model.general_recommender import BPR

but it didn't solve it.
I don't know if it will load the dataset. I used the "Steam_Merged" and set up the folders as you instructed.

ps. I'm using PyCharm and have installed recbole in the program.

Sincerely,
Alan

2017pxy Apr 5, 2021
Maintainer

@alanjacob10 Oh! I am sorry, I made a mistake. Actually, model and dataset cannot be set in parameter_dict, and the parameters must be separated by commas. The correct code should be:

from recbole.quick_start import run_recbole

parameter_dict = {
    'data_path': "The file path of the Dataset",
    'load_col': {'inter': []},  # this list should not be empty
    # ......
}
run_recbole(model='BPR', dataset='steam', config_dict=parameter_dict)

By the way, the load_col setting depends on the dataset and the model; it decides which columns are loaded. For example, in the ml-100k dataset, ml-100k.inter has 4 columns (user_id, item_id, rating, timestamp). If you only want to load user_id and item_id, the setting should be:

    'load_col': {'inter': ['user_id', 'item_id']}

So I suggest you check the header of steam.inter and set load_col accordingly.

Finally, sorry again for my carelessness. Please let me know if you have any questions.
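One way to check the header of steam.inter is to read its first line. The sketch below assumes the atomic file starts with a single tab-separated header row of `name:type` entries (for example `user_id:token`); the helper names `inter_columns` and `make_load_col` are hypothetical and not part of RecBole.

```python
from pathlib import Path


def inter_columns(inter_path):
    """Return the field names from the header row of an atomic file.

    Assumption: the header is one tab-separated line of `name:type` entries.
    """
    with Path(inter_path).open(encoding="utf-8") as handle:
        header = handle.readline().rstrip("\r\n")
    # Drop the ":type" suffix from each entry, keeping only the field name.
    return [entry.split(":", 1)[0] for entry in header.split("\t")]


def make_load_col(inter_path, wanted):
    """Build a load_col value, failing early if a wanted column is absent."""
    available = inter_columns(inter_path)
    missing = [name for name in wanted if name not in available]
    if missing:
        raise ValueError(
            f"columns not in {inter_path}: {missing}; found {available}"
        )
    return {"inter": list(wanted)}
```

Calling `make_load_col("dataset/steam/steam.inter", ["user_id", "product_id"])` would then either return a value that can be placed under 'load_col' in parameter_dict, or report which requested column names do not match the file.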

alanjacob10 · 2021-04-05T23:21:56Z

alanjacob10
Apr 5, 2021
Author

Hi again,
Appreciate your kind response, but unfortunately I'm still struggling with the code. I did as you instructed and also found the column names from the "steam_inter" file, where I input user_id, product_id.
And here was the result:

I don't know if I need to import something other than recbole in PyCharm, or what the problem is.

Sincerely,
Alan

10 replies

alanjacob10 Apr 6, 2021
Author

yes, it is as follows:

-dataset
    -steam
        -steam.inter
        -steam.item

Thanks,
Alan

tsotfsk Apr 6, 2021

It seems that the path */dataset/steam/steam in the error message is different from what you set. Did you set up any other config files?

alanjacob10 Apr 6, 2021
Author

Hi,
I don't know any config files. It should be the only one set up. Do you know this error? it mentions pandas...

Regards,
Alan

alanjacob10 Apr 7, 2021
Author

@2017pxy

tsotfsk Apr 7, 2021

Hi @alanjacob10, maybe you can set up a config file named steam.yaml that contains the following three lines:

load_col:  {inter: [user_id, product_id]}
USER_ID_FIELD: user_id
ITEM_ID_FIELD: product_id

Then try running your model with:

python run_recbole.py --config_files steam.yaml --dataset steam --model BPR
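For readers working from an IDE rather than a terminal, the same config file can be used from Python. This is a minimal sketch, assuming `run_recbole` accepts a `config_file_list` argument and that the dataset folder sits where RecBole's `data_path` setting points; the helper names `write_steam_config` and `run_bpr_on_steam` are hypothetical.

```python
from pathlib import Path

# The three settings from the reply above, as YAML text.
STEAM_YAML = (
    "load_col: {inter: [user_id, product_id]}\n"
    "USER_ID_FIELD: user_id\n"
    "ITEM_ID_FIELD: product_id\n"
)


def write_steam_config(path="steam.yaml"):
    """Write the config file and return its path."""
    config_path = Path(path)
    config_path.write_text(STEAM_YAML, encoding="utf-8")
    return config_path


def run_bpr_on_steam(config_path):
    """Train BPR on steam with the config file (needs RecBole and the data)."""
    # Imported here so that writing the file does not require RecBole.
    from recbole.quick_start import run_recbole

    return run_recbole(
        model="BPR",
        dataset="steam",
        config_file_list=[str(config_path)],
    )
```

A typical call would be `run_bpr_on_steam(write_steam_config())`, run from the directory that contains the dataset folder.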

alanjacob10 · 2021-04-07T13:17:27Z

alanjacob10
Apr 7, 2021
Author

?

7 replies

2017pxy Apr 11, 2021
Maintainer

@alanjacob10 Hi, if you want to load a column from your file, you only need to change your load_col. We don't have PLAY_HOURS_FIELD in our config settings. For example, if you want to load play_hours, your load_col should be:

'load_col': {'inter': ['user_id', 'product_id', 'play_hours']}

By the way, general recommendation models like BPR only use user_id and item_id in the training phase. That's why USER_ID_FIELD and ITEM_ID_FIELD should be set based on your dataset: you need to tell the model which columns hold the user IDs and item IDs.

Other information, like play_hours in the steam dataset, is treated as context information. If you want to use context information to train models in RecBole, you should use context-aware recommendation models like LR and FM. For more information about the models supported in RecBole, please check our model docs.

alanjacob10 Apr 17, 2021
Author

One follow-up question to this. According to your documentation, context-aware recommendation models require .inter, .user and .item files.
Do all three files need to be there to run the algorithm?

And how do I add these to load_col?

from recbole.quick_start import run_recbole

parameter_dict = {
    'data_path': "/Users/alanolewnik/Documents/PyCharm/Recommender_new/dataset",
    'load_col': {'inter': ['user_id', 'product_id', 'timestamp', 'play_hours']},
    'USER_ID_FIELD': 'user_id',
    'ITEM_ID_FIELD': 'product_id'
}

run_recbole(model='DeepFM', dataset='steam', config_dict=parameter_dict)

Sincerely,
Alan

hyp1231 Apr 19, 2021
Maintainer

Hi Alan, to run a context-aware recommendation model, .inter is required, while .user and .item are optional.
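Since only .inter is required, load_col can be assembled from whichever atomic files are actually present. The sketch below assumes the keys of load_col are the file suffixes ('inter', 'user', 'item'), following the pattern used earlier in this thread; the helper name `build_load_col` is hypothetical, and any .user or .item column names passed to it are placeholders to be replaced with the real headers.

```python
from pathlib import Path


def build_load_col(data_path, dataset, columns):
    """Keep only the load_col entries whose atomic file exists.

    `columns` maps a file suffix ("inter", "user", "item") to the columns
    wanted from that file. The .inter file is mandatory.
    """
    folder = Path(data_path) / dataset
    load_col = {}
    for suffix, names in columns.items():
        # Skip optional files (.user, .item) that are not on disk.
        if (folder / f"{dataset}.{suffix}").is_file():
            load_col[suffix] = list(names)
    if "inter" not in load_col:
        raise FileNotFoundError(f"{folder / (dataset + '.inter')} is required")
    return load_col
```

The result can be stored under 'load_col' in parameter_dict before calling run_recbole with a context-aware model.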

Could you please point us to the URLs of the misleading docs, so that we can fix the descriptions? Thanks!

alanjacob10 Apr 22, 2021
Author

Okay thanks. It says in this box.. (see attached):

https://recbole.io/docs/user_guide/data/atomic_files.html

Sincerely,
Alan

hyp1231 Apr 22, 2021
Maintainer

Got it, thanks so much. It is indeed a misleading description. We'll consider improving this part.

2017pxy · 2021-04-09T09:08:31Z

2017pxy
Apr 9, 2021
Maintainer

Let me summarize the answer to this question.
If you want to load a new dataset to run models, you can follow these steps:

Prepare your dataset files:
In RecBole, the default dataset is ml-100k. If you want to use another dataset, you need to prepare your data and convert the raw data into Atomic Files (see the docs about Atomic Files). We have also prepared some popular datasets, and you can download their atomic files from our Google Drive or Baidu Wangpan. Then create a folder called MyDataset and organize the files like this:

-MyDataset
    -DataA
        -DataA.inter
        -DataA.item
        - ......
    -DataB
        -DataB.inter
    ........
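A quick way to catch a misnamed file or folder before running a model is to list the atomic files that match the expected pattern. This is a minimal sketch under the assumption that files follow the `<data_path>/<dataset>/<dataset>.<suffix>` layout shown in the tree above; `list_atomic_files` is a hypothetical helper, not a RecBole function.

```python
from pathlib import Path


def list_atomic_files(data_path, dataset):
    """Return the suffixes of the atomic files found for `dataset`.

    Expects <data_path>/<dataset>/<dataset>.<suffix>, as in the tree above.
    Files whose name does not start with "<dataset>." are ignored.
    """
    folder = Path(data_path) / dataset
    if not folder.is_dir():
        raise FileNotFoundError(f"no dataset folder at {folder}")
    prefix = dataset + "."
    suffixes = sorted(
        p.name[len(prefix):]
        for p in folder.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
    if "inter" not in suffixes:
        raise FileNotFoundError(f"{folder / (prefix + 'inter')} is missing")
    return suffixes
```

If a file was saved under a slightly different name than the folder, its suffix will simply be absent from the returned list, which makes the mismatch easy to spot.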

Set config:
If you load a new dataset, the default config settings may need to be changed, so you need to set the config yourself. Before you do this, I strongly recommend reading our config setting docs and data args docs first, or you may run into lots of problems.
Here are some settings you may need to change:

data_path: The path of "MyDataset" (mentioned before)
load_col: Which files and columns you want to load
USER_ID_FIELD: Field name of user ID feature
ITEM_ID_FIELD: Field name of item ID feature
RATING_FIELD: Field name of rating feature
TIME_FIELD: Field name of timestamp feature
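The settings listed above can be collected into a parameter dict with a small helper that also checks each field name against the loaded columns. This is only a sketch: `make_config` is a hypothetical helper, and the column names in the example are the ones mentioned for the steam dataset earlier in this thread.

```python
def make_config(data_path, inter_cols, user_field, item_field,
                rating_field=None, time_field=None):
    """Assemble a parameter dict from the settings listed above."""
    named = {
        "USER_ID_FIELD": user_field,
        "ITEM_ID_FIELD": item_field,
        "RATING_FIELD": rating_field,
        "TIME_FIELD": time_field,
    }
    config = {"data_path": data_path, "load_col": {"inter": list(inter_cols)}}
    for key, field in named.items():
        if field is None:
            continue  # not given: this setting is left out of the dict
        if field not in inter_cols:
            raise ValueError(f"{key}={field!r} is not in load_col: {inter_cols}")
        config[key] = field
    return config


# Example with the steam column names used earlier in the thread.
steam_config = make_config(
    data_path="dataset/",
    inter_cols=["user_id", "product_id", "timestamp"],
    user_field="user_id",
    item_field="product_id",
    time_field="timestamp",
)
```

The resulting dict can be passed as `config_dict` to run_recbole, with model and dataset given as separate arguments.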

Load settings and run the model:
In RecBole, we support three ways to load settings: config files, parameter dicts, and the command line. You can read our config setting docs for details and examples of loading the settings. Then you can simply run the model with RecBole and finish your research.

0 replies

Faisalse · 2023-09-29T16:07:16Z

Faisalse
Sep 29, 2023

Hi Guys,
In the RecBole framework, can we use files that are already split into train and test sets to tune or train the models?

0 replies
