
Mingyuan "William" Zhang
张明远

About Me

I am a Ph.D. student focusing on Machine Learning and Artificial Intelligence, and I have the honor of being advised by Prof. Shivani Agarwal at the University of Pennsylvania. Before joining Penn, I received my B.S. from the University of Michigan in 2018 with four majors: (Honors) Mathematics, (Honors) Statistics, Computer Science, and Data Science.

"Think big, start small, learn fast.
Seek progress, not perfection."

Résumé

Publications

Google Scholar

Semantic Scholar

GitHub

[email protected]


Education

University of Pennsylvania
Ph.D. in Computer and Information Science
2018 - Present
University of Michigan
B.S. in Honors Mathematics, Honors Statistics, Computer Science, and Data Science
2013 - 2018

Academic Service

Reviewer
• NeurIPS (2021, 2022, 2023)
• ICLR (2022, 2023)
• AISTATS (2024)
• Journal of Machine Learning Research
• IEEE Transactions on Pattern Analysis and Machine Intelligence

Teaching

University of Pennsylvania
• Head Teaching Assistant for CIS 520, a graduate-level machine learning course.
Spring 2020, Spring 2021, Spring 2022
University of Michigan
• Grader for various linear algebra and probability courses.
2015 - 2018
• Tutor for MATH 217, an introductory linear algebra course.
2015

Courses

Graduate level:
Real Analysis (A), Probability Theory (A), Discrete Stochastic Processes (A), Numerical Linear Algebra (A+), Combinatorial Theory (A+), Complex Variables (A), Applied Functional Analysis (A), Nonlinear Programming (A+), Statistical Inference (A), Linear Models (A), Analysis of Multivariate and Categorical Data (A), Statistical Learning (A), Time Series Analysis (A-), Machine Learning (A), Information Theory (A+), Statistical Signal Processing (A).
Undergraduate level:
Intermediate Microeconomics Theory (A+), Intermediate Macroeconomics Theory (A), Game Theory (A+), Theoretical Statistics (A+), Statistical Computing Methods (A+), Numerical Methods (A+), Programming and Data Structures (A+), Data Structures and Algorithms (A+), Algorithms (A), Randomized Algorithms (A+), Database Management Systems (A), Computer Vision (A), Information Retrieval (A).

Research

Multiclass Classification
Multiclass classification (which includes binary classification as a special case) is a classic supervised machine learning task. We study how to design good multiclass learning algorithms for general losses in various settings, including the standard setting, learning with a restricted function class, and learning from noisy labels.
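For illustration, the basic goal can be phrased as follows (the notation below is generic, not taken from any particular paper): given a loss \(\ell\) over \(n\) true classes and \(k\) possible predictions, one seeks a classifier whose \(\ell\)-risk approaches the optimal (Bayes) risk:

\[
\mathrm{er}_D[h] \;=\; \mathbb{E}_{(X,Y)\sim D}\big[\ell(Y, h(X))\big],
\qquad
\ell : \{1,\dots,n\} \times \{1,\dots,k\} \to \mathbb{R}_+ .
\]

Surrogate losses whose minimization drives \(\mathrm{er}_D[h]\) to \(\inf_{h'} \mathrm{er}_D[h']\) are called calibrated (consistent).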

[5] Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures.
Mingyuan Zhang, Shivani Agarwal.
In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
Paper Link

[3] Learning from Noisy Labels with No Change to the Training Process.
Mingyuan Zhang, Jane Lee, Shivani Agarwal.
In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021.
Paper Link

[2] Bayes Consistency vs. H-Consistency: The Interplay between Surrogate Loss Functions and the Scoring Function Class.
Mingyuan Zhang, Shivani Agarwal.
In Advances in Neural Information Processing Systems (NeurIPS), 2020.
Spotlight paper.
Paper Link

Multiclass and multi-label learning with general losses: What is the right output coding and decoding?
Harish G. Ramaswamy, Mingyuan Zhang, Balaji S. Babu, Shivani Agarwal, Ambuj Tewari, Robert C. Williamson.
In preparation.


Learning from Noisy Labels and Weakly Supervised Learning
Noisy labels can arise for various reasons, such as errors in data collection, human error in annotation, or mislabeling due to subjective or ambiguous label definitions. The main challenge in learning from noisy labels is to design algorithms that learn good classifiers despite being given noisy training data. We study how to design good algorithms for learning from noisy labels in multiclass and multi-label classification problems.
We are also interested in a more general learning paradigm beyond learning from noisy labels: weakly supervised learning, with a focus on learning from missing or partial labels and on transfer learning.
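As one concrete example, a standard class-conditional noise model (a common setting in this literature; the notation here is illustrative) assumes the observed label is corrupted by a fixed, instance-independent noise matrix:

\[
\Pr\big(\tilde{Y} = j \mid Y = i,\, X = x\big) \;=\; T_{ij},
\qquad
T \in [0,1]^{n \times n}, \quad \sum_{j=1}^{n} T_{ij} = 1 \ \text{for all } i,
\]

where \(Y\) is the clean label and \(\tilde{Y}\) the observed noisy label; the learner sees only \((X, \tilde{Y})\) pairs but is evaluated on the clean distribution.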

Multi-Label Learning from Noisy Labels.
Mingyuan Zhang, Shivani Agarwal.
Under review.

[5] Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures.
Mingyuan Zhang, Shivani Agarwal.
In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
Paper Link

[4] Foreseeing the Benefits of Incidental Supervision.
Hangfeng He, Mingyuan Zhang, Qiang Ning, Dan Roth.
In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Oral paper.
Paper Link

[3] Learning from Noisy Labels with No Change to the Training Process.
Mingyuan Zhang, Jane Lee, Shivani Agarwal.
In Proceedings of the 38th International Conference on Machine Learning (ICML), 2021.
Paper Link


Multi-Label Classification and Label Ranking
In multi-label classification, each instance can be associated with multiple labels (or tags) simultaneously. A good example is image tagging, where several tags can be active in the same image. We study how to design good multi-label learning algorithms for general multi-label losses (including Hamming Loss, Precision, Recall, and F-measure) in various settings, including the standard setting, learning from noisy labels, and learning from partial/missing labels.
Label ranking is a prediction task whose goal is to map instances to rankings over a finite set of predefined labels (or tags). We study the design of effective label ranking algorithms for a range of label ranking losses, including Pairwise Loss, Discounted Cumulative Gain, and Precision, in various settings such as the standard and online settings.
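For concreteness, with \(s\) tags and label vectors in \(\{0,1\}^s\), two of the measures above can be written as follows (illustrative notation):

\[
\ell_{\mathrm{Ham}}(\mathbf{y}, \hat{\mathbf{y}})
\;=\; \frac{1}{s} \sum_{t=1}^{s} \mathbf{1}\big(y_t \neq \hat{y}_t\big),
\qquad
F_1(\mathbf{y}, \hat{\mathbf{y}})
\;=\; \frac{2 \sum_{t=1}^{s} y_t \hat{y}_t}{\sum_{t=1}^{s} y_t + \sum_{t=1}^{s} \hat{y}_t}.
\]

The Hamming loss decomposes over individual tags, whereas the F-measure couples them, which is part of what makes designing calibrated surrogates for it nontrivial.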

Multi-Label Learning from Noisy Labels.
Mingyuan Zhang, Shivani Agarwal.
Under review.

On the Minimax Regret in Online Ranking with Top-k Feedback.
Mingyuan Zhang, Ambuj Tewari.
Preprint.
Paper Link

[1] Convex Calibrated Surrogates for the Multi-Label F-Measure.
Mingyuan Zhang, Harish G. Ramaswamy, Shivani Agarwal.
In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.
Paper Link

Multiclass and multi-label learning with general losses: What is the right output coding and decoding?
Harish G. Ramaswamy, Mingyuan Zhang, Balaji S. Babu, Shivani Agarwal, Ambuj Tewari, Robert C. Williamson.
In preparation.


Non-decomposable Performance Measures
Unlike the 0-1 loss (accuracy) or cost-sensitive losses, non-decomposable performance measures cannot be expressed as the expectation or sum of a loss on individual examples; instead, they are defined by general (usually nonlinear) functions of a classifier's confusion matrix. Examples include the Micro F1 score, Jaccard measure, H-mean, G-mean, Q-mean, AUC-ROC, and AUC-PR. We study how to design learning algorithms that optimize non-decomposable performance measures.
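For example, in the binary case, writing the confusion matrix of a classifier \(h\) as \(C_{ij}[h] = \Pr(Y = i,\, h(X) = j)\) for \(i, j \in \{0,1\}\) (notation illustrative), the F1 score is a nonlinear, ratio-of-linear function of \(C\):

\[
F_1[h] \;=\; \frac{2\, C_{11}[h]}{2\, C_{11}[h] + C_{01}[h] + C_{10}[h]},
\]

which cannot be written as \(\mathbb{E}[\ell(Y, h(X))]\) for any fixed per-example loss \(\ell\); this is the sense in which such measures are non-decomposable.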

[5] Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures.
Mingyuan Zhang, Shivani Agarwal.
In Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS), 2024.
Paper Link