Four major crises in machine learning research
Researchers from CMU and Stanford, Zachary C. Lipton and Jacob Steinhardt, published an article for Machine Learning: The Great Debate at ICML, and pointed out the four major crises in machine learning research: confusion and speculation, and uncertainty Reasons for better results, accumulation of mathematical formulas, and language misuse.
The academic community is generally very serious in everyone's impressions, but recently AI Summit ICML held a debate in Stockholm - Machine Learning: The Great Debates (ML-GD), which is dedicated to scholars and researchers to actively explore the field. The current state of technology, bottlenecks and the impact on society.
Zachary C. Lipton, who is known as AI internet public figure, and Stanford researcher Jacob Steinhardt published the paper "Troubling Trends in Machine Learning Scholarship", which was a tweet and sparked a heated discussion.
Perspective background
Machine Learning (ML) researchers are working on the creation and dissemination of knowledge about "data-driven algorithms." According to research, many researchers are eager to achieve the following goals:
Theoretical explanation of learnable content
Deep understanding of experimentally rigorous experiments
Build a working system with high prediction accuracy
While it is subjective to determine which knowledge is worth exploring, once the topic is determined, it is most valuable to the community when it serves the reader, creating the foundation and articulating it as clearly as possible.
What kind of paper is more suitable for readers? We can list the following characteristics: these papers should
(i) provide an intuitive feel to help the reader understand, but should be clearly differentiated from proven strong conclusions;
(ii) an empirical investigation that considers and excludes other assumptions;
(iii) clarify the relationship between theoretical analysis and intuition or experience;
(iv) Use language to help readers understand, choose terminology to avoid misunderstandings or unsubstantiated content, avoid conflicts with other definitions, or be confused with other related but different concepts.
Four major crises in machine learning research
Although machine learning has made some progress recently, these "ideal" states often deviate from reality. In this article, we focus on the following four modes, which seem to be the most popular in ML academic (schoolar-ship):
It is impossible to distinguish between objective explanations and speculations.
It is impossible to determine the reason for the better result. For example, when it is actually because of the fine tuning of the hyperparameters, it is emphasized that it is not necessary to modify the neural network structure.
Mathematical formula stacking: Use confusing mathematical terms without clarification, such as confusing technical and non-technical concepts.
Language misuse, for example, using artistic terms with spoken language, or excessive use of established technical terms.
While the reasons behind these models are uncertain, they can lead to rapid community expansion, insufficient number of reviews, and often unbalanced academic and short-term success metrics such as the number of documents, attention, and entrepreneurial opportunities. While each model provides appropriate remedies (but is not recommended), we will also discuss some speculative suggestions on how the community can respond to these trends.
Defective academic research may mislead the public and hinder academic future research. In fact, many of these problems are circulating in the history of artificial intelligence (and more broadly, in scientific research). In 1976, Drew Mc-Dermott [1] accused the artificial intelligence community of giving up self-discipline and prophesying that "if we can't criticize ourselves, others will help us solve the problem."
Similar discussions occurred repeatedly throughout the 1980s, 1990s, and 2008 [2, 3, 4]. In other areas such as psychology, poor experimental standards weaken people's trust in the authority of the discipline. The current strong trend of machine learning is due to the large number of rigorous research to date, including theoretical research [5, 6, 7] and empirical research [8, 9, 10]. By improving clear-cut scientific thinking and communication, we can maintain the trust and investment that the community currently enjoys.
to sum up
Some people may think that these problems can be improved through self-discipline and self-correction. Although this view is correct, the machine learning community needs to repeatedly discuss how to construct reasonable academic standards to achieve this self-correction.