Among the engineers working in the DX Innovation Center, the core of Serendie, are three young engineers who won a gold medal on Kaggle, the world's largest AI (artificial intelligence) competition platform, in May 2024.
Kaggle hosts competitions in which companies and research institutes pose data analysis challenges, participants submit their solutions, and the quality of their analysis is evaluated. It is also a gateway to success for data scientists, and achieving a high rank in a competition is regarded as a badge of honor.
In this interview, we talked about how they won the award at Kaggle, their thoughts on Serendie, and the future of data science.
They honed their skills with Kaggle, challenged themselves, and reached the pinnacle of data science
You participated as a team of three instead of as individuals and won the gold medal in Kaggle. What made you decide to form the team?
Okumura: We had been working in the same workplace before we were assigned to the DX Innovation Center. I joined the new business creation department one year after Shintani and Fukuhara were assigned to it. At that time, each of us participated in Kaggle individually. We each have our own strengths and weaknesses, and we decided to team up because we thought that, by working on this challenge together, the three of us would have a chance of winning the gold medal.
What were the characteristics of your challenge, “Predict the probability of an applicant defaulting on a loan”?
Okumura: We thought we could aim for the gold medal, but not because we were good at handling financial data. It was because of the characteristics of the data itself.
Fukuhara: Data comes in various formats, such as tabular data with rows and columns, natural language used by people, and images.
Shintani: The competition we chose used tabular data. We judged that all of us could work on it, because data scientists at Mitsubishi Electric have many opportunities to work with machine logs and sensor data, and tabular data is often what they analyze in their work.
Please tell us about the difficulties you faced in tackling the challenge.
Okumura: The host provided tabular data of 10 different types with 400 columns in total, but since we were not experts in financial data, we could not interpret the meaning of some of the columns.
Shintani: The data provided also included tables that could not be used in deriving solutions, as well as a lot of meaningless data. Surveying that huge amount of tabular data and determining which data should be discarded was a challenge. Since there was too much data to cover at once, we divided up the tasks in order of priority and did our best within the limited competition period of approximately three to four months.
Fukuhara: Looking back, there were times when we were frustrated because we could not quite get to the root of the solution, however much we tried.
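To make the screening step described above concrete, here is a minimal sketch of one common first pass over wide tabular data: dropping columns that are almost entirely empty or that never vary. The interview does not say which criteria the team actually used, so the function name `screen_columns`, the 0.9 threshold, and the toy column names are all assumptions for illustration only.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def screen_columns(
    table: Dict[str, Column],
    max_missing_rate: float = 0.9,
) -> List[str]:
    """Return the names of columns that pass two simple screening checks."""
    kept = []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        missing_rate = 1 - len(present) / len(values) if values else 1.0
        # Drop columns that are almost entirely empty.
        if missing_rate > max_missing_rate:
            continue
        # Drop columns whose observed values never vary: they carry no signal.
        if len(set(present)) <= 1:
            continue
        kept.append(name)
    return kept


# Hypothetical toy table; the column names are invented for this example.
toy = {
    "income": [320.0, 410.0, None, 275.0],
    "branch_code": [7.0, 7.0, 7.0, 7.0],       # constant
    "legacy_flag": [None, None, None, None],   # always missing
    "num_late_payments": [0.0, 2.0, 1.0, None],
}
print(screen_columns(toy))  # ['income', 'num_late_payments']
```

A pass like this only removes obviously unusable columns. Deciding whether the remaining ones are meaningful still takes the kind of manual survey and prioritization the team describes.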
What did you gain from your experience with Kaggle?
Shintani: You can teach yourself analysis methods, but in the competition, you have to take in the ideas of various people through practice and turn them into actual solutions. Kaggle was the perfect place to gain experience and skills in creating that framework.
Fukuhara: I learned how to approach data analysis by watching the two of them work beside me. Mr. Okumura is good at the step-by-step approach. He tries one method, and if it does not work, he goes to the next. It might take time, but he moves forward steadily, one step at a time. And yet his passion did not diminish at all over the three months. Mr. Shintani, on the other hand, experiments with various methods simultaneously. I learned from him not only to try things out but also to build a solid framework as a foundation for improving performance.
Okumura: We certainly have different strengths. Perhaps we were able to win the gold medal because we were different and could complement each other.
Transcend vertical divisions and connect domains with data
As a data scientist, what do you think about the possibilities of Serendie?
Okumura: Until now, our business domains have been divided vertically. In the future, we can work hand-in-hand across business domains and combine data in a way that has never been possible before. Although there are only a few examples of this so far, I expect that various solutions will be created by combining data in the future.
Shintani: Kaggle is a fascinating competition that allows you to tackle global challenges related to business and society. However, it offers no opportunity to combine and analyze data across domains. That is why I believe we can create new value by combining domains through Serendie.
Fukuhara: As a prerequisite for handling data, the IT literacy of the data holders in each domain needs to be improved. Otherwise, it will be difficult to achieve something with data scientists. So, from now on, we need to move forward and learn data analysis together with people in each domain. Serendie businesses will be even more valuable if this initiative is integrated into activities that link different domains together.
Could you share a specific use case of Serendie?
Fukuhara: In our quality department, there is the routine work of preparing and submitting reports on a regular basis. Reports are prepared using Excel and other tools, but most of them are simply submitted, with no further processing. Even if a report contains useful information, it is rarely used effectively.
Okumura: So data scientists joined in April 2024 and replaced the Excel-based flow for preparing reports with a low-code one. We built a low-code application, added insights from the data accumulated in the reports, and analyzed that data from a perspective different from that of the quality department. The output was highly commended.
Shintani: Currently, the quality department performs data analysis on its own initiative.
Fukuhara: This is only carried out by the quality department at the moment. However, the goal is to analyze data in the marketing area as well. That way, we will be able to analyze data across all processes of our business and create a virtuous cycle.
Could you give us another use case?
Okumura: We deliver a large number of machines to factories, and we operate and maintain them. These machines produce logs, which include data on when alarms stopped their operation. However, some machines stopped by an alarm go back to normal after a reboot. So it was difficult to know which alarms required the maintenance team to come to the site for inspection.
Shintani: In order to reduce unnecessary visits, we used machine learning to analyze machine logs and predict whether or not the maintenance team needs to go to the site.
Fukuhara: We submitted a report on the results, which was highly evaluated by people in the factories. We have also agreed to provide further support and are in the process of doing so. We want to incorporate a model that analyzes alarms and reports its predictions into the maintenance process.
Okumura: Although this is not a project combining domains, it can be one of the Serendie projects as a circular project that collects and improves data.
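The alarm use case above can be illustrated with a deliberately simple baseline. The interview does not describe the team's model or features, so this sketch is a stand-in rather than their method: it estimates, for each alarm code, how often past stops needed a site visit, and recommends a visit when that rate reaches a threshold. The alarm codes, the function names `fit_visit_rates` and `needs_visit`, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fit_visit_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate, per alarm code, the fraction of past stops that needed a visit."""
    counts = defaultdict(lambda: [0, 0])  # code -> [visits_needed, total_stops]
    for code, visit_needed in history:
        counts[code][0] += int(visit_needed)
        counts[code][1] += 1
    return {code: needed / total for code, (needed, total) in counts.items()}


def needs_visit(code: str, rates: Dict[str, float], threshold: float = 0.5) -> bool:
    """Recommend a visit when the historical rate reaches the threshold.

    Alarm codes never seen before default to a visit, the cautious choice.
    """
    return rates.get(code, 1.0) >= threshold


# Hypothetical log history: (alarm code, whether a site visit was needed).
history = [
    ("E101", False), ("E101", False), ("E101", True),  # usually fixed by reboot
    ("E205", True), ("E205", True),                    # always needed a visit
]
rates = fit_visit_rates(history)
print(needs_visit("E101", rates))  # False
print(needs_visit("E205", rates))  # True
```

A real model would likely draw on richer log features than the alarm code alone, but even a baseline like this gives a reference point for judging whether a learned model actually reduces unnecessary visits.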
New value and new opportunities. Serendie expands future possibilities
What would you like to achieve through Serendie?
Shintani: “Serendie” comes from serendipity. It means the discovery of new value, for example through an accidental encounter, which is often the catalyst for scientific discovery. As a data scientist, I want to make amazing discoveries and deliver new results.
Fukuhara: In January 2025, we plan to open a co-creation space called Serendie Street YIMP (Yokohama i-Mark Place). I am not very good at communicating, but I would like to talk to various people through Serendie and overcome my own limitations.
In addition, I would like to communicate with various people both inside and outside the company who visit Serendie Street, including the Serendie Street YDB (Yokohama Dia Building), which opened in March 2024, and to develop ideas for combining data.
Okumura: I want to create a world where everyone can work in a data-driven way. People often talk about AI and the democratization of data, and one of the missions of Mitsubishi Electric is to make data easy for anyone to use. If this could be expanded at an accelerated rate, it would be possible to provide useful data for making management decisions.
Finally, what are your thoughts on the future of data science?
Shintani: It would be ideal to create a world where the real world, products, and data are connected, circulated, and improved. Serendie is one of our answers for making the world a better place. I would like to contribute by getting Serendie used throughout the company and increasing the number of people who can analyze data.
Fukuhara: Currently, my main activity is to use existing data to generate new insights. However, in the near future, I would like to be able to plan from the data acquisition stage and analyze data while being aware of what kind of data we need in order to create a virtuous cycle. This is one of the roles of data scientists in promoting Serendie projects.
Okumura: Even if mechanization and automation through AI progress rapidly, it is important, especially for manufacturers, to be able to explain the theoretical background behind them. At the same time, I would like to create a mechanism that goes beyond the current individual optimization and produces more generic results, which will be possible as long as we have the necessary data. I hope that we will then be able to devote more time to advanced tasks that only humans can perform.