Graduate students place in top 10 at international data mining competition
July 11, 2019
Competing against 150 teams from 28 countries, two groups from Iowa State took first and eighth place in the Data Mining Cup (DMC).
DMC is an annual competition that encourages students with an interest in data mining to analyze a given problem and submit a solution. This year marked its 20th anniversary.
Data mining is the process of taking raw data and turning it into useful information for the situation at hand. The two teams from Iowa State analyzed a data set to determine which information was useful and which was not.
The Data Mining Cup established a scenario that describes the risk of fraud at self-checkout, such as customers not scanning all of the items in their cart. According to the scenario, about 5% of all self-scan transactions showed discrepancies.
In March, participants were tasked with creating a model that classifies scans as fraudulent or non-fraudulent, regardless of whether the fraud was committed intentionally.
The top ten teams were announced in April, and the top three were invited to Berlin for the awards ceremony at the intelligence summit.
Iowa State entered two groups, Team 1 and Team 2. Team 2 included Qihao Zhang, Qinglong Tian, Xingche Guo, Yifan Zhu, Haoyan Hu, Gang Han, Haihan Yu, Lijin Zhang, Yueying Wang and Wenting Zhao, who are all graduate students in statistics, as well as Shaodong Wang and Zerui Zhang, who are graduate students in industrial and manufacturing systems engineering (IMSE).
Team 2 placed first in the competition and took home roughly $2,200.
Team 1, taking home eighth place, included Oscar Aguilar, Kanak Choudhury and Souradeep Chattopadhyay, who are graduate students in statistics, as well as Hanisha Vemireddy, Samira Karimzadeh and Reyhaneh Bijari, graduate students in IMSE.
Qihao Zhang, a graduate student in statistics and a member of the first-place team, said he divided the problem into sections to simplify the process. Zhang said each member worked on their section of the problem alongside their coursework.
“Some of my team members sacrificed their spare time to work on this problem and even after the semester, so I really appreciate that,” Zhang said. “Most of them are really hard-working and fully drawn to this competition.”
The two teams took modern multivariate statistical learning with Steve Vardeman, a professor in IMSE and statistics. The students used the competition to fulfill the course requirement.
Vardeman said some students take the wrong approach to these types of problems and focus too heavily on the programming side.
“You have to be doing more than just mindlessly running existing programs,” Vardeman said. “You have to be able to apply sound theory to put methods together in new and appropriate ways. Lots of people that claim to be data miners, all they really know how to do is run some computer programs. You don’t always win–and when you win–win big if that’s what you’re doing.”
Zhang said that, based on previous experience, he was confident his team would reach the top ten, but he did not expect to place first.
“At that time I was really confident on our model just based on our simulation outcomes, I was almost 100% sure that we entered the top ten,” Zhang said. “But first place, everyone cannot be 100% sure, so this time we were lucky.”
Vardeman said the competition shows that the students understand both the programs and the theory behind them, which will help with future job opportunities.
“They can use this [win] as a credential that establishes — clearly — that they are more than just people who run programs,” Vardeman said. “But they’re people that can think carefully and sort out really hard problems based on some theory and the evidence is they beat the world. They beat the world, they beat the world clearly– it wasn’t even close.”