Datasets

A Customer Interaction Dataset of Clothing of Alibaba

This customer interaction dataset was collected from TaoBao.com, an Alibaba site that is one of the most popular e-commerce websites in China. The text in the dataset is in Simplified Chinese (简体中文). The dataset focuses on a single product category, “clothing”, and records 1,897,339 interactions between 16,091 customers and 923,237 products. The statistics of the dataset are summarized below.

Number of Interactions: 1,897,339
Number of Queries: 139,442
Number of Customers: 16,091
Number of Products: 923,237
Number of Attribute Categories: 99
Number of Attributes: 3,194


In this dataset, each interaction record contains a query, a customer ID, a product ID, a product title, and the attribute set of the product. The attribute sets contain both concrete and abstract product attributes, all annotated manually by fashion and clothing experts. You can download this dataset from GoogleDrive or BaiDuYun (access code: 9jqb). In the file, each line contains the fields “interaction_session_id, customer_id, query, product_id, product_title, add cart or not (Y/N), buy or not (Y/N), attribute_category/attribute”, which are separated by ‘\t’.
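The line format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the Python-side field names (add_to_cart, buy, attributes), the function name read_interactions, and the sample line are assumptions made for illustration, and the attribute field is kept as a raw string because its internal layout is not specified here.

```python
import csv
import io
from typing import Dict, Iterator, TextIO

# Field order follows the description above. The last three names are
# illustrative assumptions, not names defined by the dataset.
FIELDS = [
    "interaction_session_id",
    "customer_id",
    "query",
    "product_id",
    "product_title",
    "add_to_cart",   # "add cart or not (Y/N)"
    "buy",           # "buy or not (Y/N)"
    "attributes",    # "attribute_category/attribute", kept as raw text
]


def read_interactions(handle: TextIO) -> Iterator[Dict[str, object]]:
    """Yield one dict per well-formed tab-separated line."""
    # QUOTE_NONE keeps quote characters in titles and queries as plain text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip lines that do not have exactly eight fields
        record: Dict[str, object] = dict(zip(FIELDS, row))
        # Convert the two Y/N flags to booleans.
        record["add_to_cart"] = row[5] == "Y"
        record["buy"] = row[6] == "Y"
        yield record


if __name__ == "__main__":
    # A made-up line used only to exercise the parser; it is not real data.
    sample = "s1\tc1\t连衣裙\tp1\t夏季连衣裙\tY\tN\t风格/通勤\n"
    for rec in read_interactions(io.StringIO(sample)):
        print(rec["customer_id"], rec["product_id"], rec["add_to_cart"], rec["buy"])
```

To read the downloaded file, open it with encoding="utf-8" and newline="" and pass the handle to read_interactions; the actual file name depends on the download and is not assumed here.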

This dataset is released solely for research purposes. Please cite the following paper if you use this dataset in your research.

  • Xuejiao Zhao, Yong Liu, Yonghui Xu, Yonghua Yang, Xusheng Luo, Chunyan Miao, “Heterogeneous Star Graph Attention Network for Product Attributes Prediction”, Advanced Engineering Informatics, Volume 51, January 2022.

Gowalla Dataset

This dataset was collected from Gowalla, a popular location-based social network that had more than 600,000 users as of November 2010 and was acquired by Facebook in December 2011. We used the Gowalla APIs to collect user profiles, user friendships, location profiles, and users’ check-in histories recorded before June 1, 2011. In total, we obtained 36,001,959 check-ins made by 319,063 users over 2,844,076 locations. The locations in Gowalla are grouped into 7 main categories, i.e., Community, Entertainment, Food, Nightlife, Outdoors, Shopping, and Travel, and each main category consists of several subcategories.

You can download this dataset from here (about 350 MB). This dataset is released solely for research purposes. Please cite the following paper if you use this dataset in your research.

  • Yong Liu, Wei Wei, Aixin Sun, Chunyan Miao, “Exploiting Geographical Neighborhood Characteristics for Location Recommendation”, In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM’14), pp. 739-748. ACM, 2014.

Weeplaces Dataset

This dataset was collected from Weeplaces, a website that aims to visualize users’ check-in activities in location-based social networks (LBSNs). It is now integrated with the APIs of other location-based social networking services, e.g., Facebook Places, Foursquare, and Gowalla. Users can log in to Weeplaces using their LBSN accounts and connect with friends in the same LBSN who have also used this application. All the crawled data was originally generated on Foursquare. This dataset contains 7,658,368 check-ins generated by 15,799 users over 971,309 locations. During data collection, we could not obtain the original Foursquare IDs of the Weeplaces users. We could only obtain their check-in histories, their friends who also use Weeplaces, and additional information about the locations.

You can download this dataset from here (about 140 MB). This dataset is released solely for research purposes.