Humanity's Last Exam
Phan, Long; Gatti, Alice; Han, Ziwen; Li, Nathaniel; Hu, Josephina; Zhang, Hugh; Zhang, Chen Bo Calvin; Shaaban, Mohamed; Ling, John; Shi, Sean; Choi, Michael; Agrawal, Anish; Chopra, Arnav; Khoja, Adam; Kim, Ryan; Ren, Richard; Hausenloy, Jason; Zhang, Oliver; Mazeika, Mantas; Dodonov, Dmitry; Nguyen, Tung; Lee, Jaeho; Anderson, Daron; Doroshenko, Mikhail; Stokes, Alun Cennyth; Mahmood, Mobeen; Pokutnyi, Oleksandr; Iskra, Oleg; Wang, Jessica P.; Levin, John-Clark; Kazakov, Mstyslav; Feng, Fiona; Feng, Steven Y.; Zhao, Haoran; Yu, Michael; Gangal, Varun; Zou, Chelsea; Wang, Zihan; Popov, Serguei; Gerbicz, Robert; Galgon, Geoff; Schmitt, Johannes; Yeadon, Will; Lee, Yongki; Sauers, Scott; Sanchez, Alvaro; Giska, Fabian; Roth, Marc; Riis, Søren; Utpala, Saiteja; Burns, Noah; Goshu, Gashaw M.; Naiya, Mohinder Maheshbhai; Agu, Chidozie; Giboney, Zachary; Cheatom, Antrell; Fournier-Facio, Francesco; Crowson, Sarah-Jane; Finke, Lennart; Cheng, Zerui; Zampese, Jennifer; Hoerr, Ryan G.; Nandor, Mark; Park, Hyunwoo; Gehrunger, Tim; Cai, Jiaqi; McCarty, Ben; Garretson, Alexis C; Taylor, Edwin; Sileo, Damien; Ren, Qiuyu; Qazi, Usman; Li, Lianghui; Nam, Jungbae; Wydallis, John B.; Arkhipov, Pavel; Shi, Jack Wei Lun; Bacho, Aras; Willcocks, Chris G.; Cao, Hangrui; Motwani, Sumeet; Santos, Emily de Oliveira; Veith, Johannes; Vendrow, Edward; Cojoc, Doru; Zenitani, Kengo; Robinson, Joshua; Tang, Longke; Li, Yuqi; Vendrow, Joshua; Fraga, Natanael Wildner; Kuchkin, Vladyslav; Maksimov, Andrey Pupasov; Marion, Pierre; Efremov, Denis; Lynch, Jayson; Liang, Kaiqu; Mikov, Aleksandar; Gritsevskiy, Andrew; Guillod, Julien; Demir, Gözdenur; Martinez, Dakotah; Pageler, Ben; Zhou, Kevin; Soori, Saeed; Press, Ori; Tang, Henry; Rissone, Paolo; Green, Sean R.; Brüssel, Lina; Twayana, Moon; Dieuleveut, Aymeric; Imperial, Joseph Marvin; Prabhu, Ameya; Yang, Jinzhou; Crispino, Nick; Rao, Arun; Zvonkine, Dimitri; Loiseau, Gabriel; Kalinin, Mikhail; Lukas, Marco; Manolescu, Ciprian; Stambaugh, Nate; Mishra, Subrata; Hogg, Tad; Bosio, Carlo; Coppola, Brian P; Salazar, Julian; Jin, Jaehyeok; Sayous, Rafael; Ivanov, Stefan; Schwaller, Philippe; Senthilkuma, Shaipranesh; Bran, Andres M; Algaba, Andres; Houte, Kelsey Van den; Van Der Sypt, Lynn; Verbeken, Brecht; Noever, David; Kopylov, Alexei; Myklebust, Benjamin; Li, Bikun; Schut, Lisa; Zheltonozhskii, Evgenii; Yuan, Qiaochu; Lim, Derek; Stanley, Richard; Yang, Tong; Maar, John; Wykowski, Julian; Oller, Martí; Sahu, Anmol; Ardito, Cesare Giulio; Hu, Yuzheng; Kamdoum, Ariel Ghislain Kemogne; Jin, Alvin; Vilchis, Tobias Garcia; Zu, Yuexuan; Lackner, Martin; Koppel, James; Sun, Gongbo; Antonenko, Daniil S.; Chern, Steffi; Zhao, Bingchen; Arsene, Pierrot; Cavanagh, Joseph M; Li, Daofeng; Shen, Jiawei; Crisostomi, Donato; Zhang, Wenjin; Dehghan, Ali; Ivanov, Sergey; Perrella, David; Kaparov, Nurdin; Zang, Allen; Sucholutsky, Ilia; Kharlamova, Arina; Orel, Daniil; Poritski, Vladislav; Ben-David, Shalev; Berger, Zachary; Whitfill, Parker; Foster, Michael; Munro, Daniel; Ho, Linh; Sivarajan, Shankar; Hava, Dan Bar; Kuchkin, Aleksey; Holmes, David; Rodriguez-Romero, Alexandra; Sommerhage, Frank; Zhang, Anji; Moat, Richard; Schneider, Keith; Kazibwe, Zakayo; Clarke, Don; Kim, Dae Hyun; Dias, Felipe Meneguitti; Fish, Sara; Elser, Veit; Kreiman, Tobias; Vilchis, Victor Efren Guadarrama; Klose, Immo; Anantheswaran, Ujjwala; Zweiger, Adam; Rawal, Kaivalya; Li, Jeffery; Nguyen, Jeremy; Daans, Nicolas; Heidinger, Haline; Radionov, Maksim; Rozhoň, Václav; Ginis, Vincent; Stump, Christian; Cohen, Niv; Poświata, Rafał; Tkadlec, Josef; Goldfarb, Alan; Wang, Chenguang; Padlewski, Piotr; Barzowski, Stanislaw; Montgomery, Kyle; Stendall, Ryan; Tucker-Foltz, Jamie; Stade, Jack; Rogers, T. Ryan; Goertzen, Tom; Grabb, Declan; Shukla, Abhishek; Givré, Alan; Ambay, John Arnold; Sen, Archan; Aziz, Muhammad Fayez; Inlow, Mark H; He, Hao; Zhang, Ling; Kaddar, Younesse; Ängquist, Ivar; Chen, Yanxu; Wang, Harrison K; Ramakrishnan, Kalyan; Thornley, Elliott; Terpin, Antonio; Schoelkopf, Hailey; Zheng, Eric; Carmi, Avishy; Brown, Ethan D. L.; Zhu, Kelin; Bartolo, Max; Wheeler, Richard; Stehberger, Martin; Bradshaw, Peter; Heimonen, JP; Sridhar, Kaustubh; Akov, Ido; Sandlin, Jennifer; Makarychev, Yury; Tam, Joanna; Hoang, Hieu; Cunningham, David M.; Goryachev, Vladimir; Patramanis, Demosthenes; Krause, Michael; Redenti, Andrew; Aldous, David; Lai, Jesyin; Coleman, Shannon; Xu, Jiangnan; Lee, Sangwon; Magoulas, Ilias; Zhao, Sandy; Tang, Ning; Cohen, Michael K.; Paradise, Orr; Kirchner, Jan Hendrik; Ovchynnikov, Maksym; Matos, Jason O.; Shenoy, Adithya; Wang, Michael; Nie, Yuzhou; Sztyber-Betley, Anna; Faraboschi, Paolo; Riblet, Robin; Crozier, Jonathan; Halasyamani, Shiv; Verma, Shreyas; Joshi, Prashant; Meril, Eli; Ma, Ziqiao; Andréoletti, Jérémy; Singhal, Raghav; Platnick, Jacob; Nevirkovets, Volodymyr; Basler, Luke; Ivanov, Alexander; Khoury, Seri; Gustafsson, Nils; Piccardo, Marco; Mostaghimi, Hamid; Chen, Qijia; Singh, Virendra; Khánh, Tran Quoc; Rosu, Paul; Szlyk, Hannah; Brown, Zachary; Narayan, Himanshu; Menezes, Aline; Roberts, Jonathan; Alley, William; Sun, Kunyang; Patel, Arkil; Lamparth, Max; Reuel, Anka; Xin, Linwei; Xu, Hanmeng; Loader, Jacob; Martin, Freddie; Wang, Zixuan; Achilleos, Andrea; Preu, Thomas; Korbak, Tomek; Bosio, Ida; Kazemi, Fereshteh; Chen, Ziye; Bálint, Biró; Lo, Eve J. Y.; Wang, Jiaqi; Nunes, Maria Inês S.; Milbauer, Jeremiah; Bari, M Saiful; Wang, Zihao; Ansarinejad, Behzad; Sun, Yewen; Durand, Stephane; Elgnainy, Hossam; Douville, Guillaume; Tordera, Daniel; Balabanian, George; Wolff, Hew; Kvistad, Lynna; Milliron, Hsiaoyun; Sakor, Ahmad; Eron, Murat; O., Andrew Favre D.; Shah, Shailesh; Zhou, Xiaoxiang; Kamalov, Firuz; Abdoli, Sherwin; Santens, Tim; Barkan, Shaul; Tee, Allison; Zhang, Robin; Tomasiello, Alessandro; De Luca, G. Bruno; Looi, Shi-Zhuo; Le, Vinh-Kha; Kolt, Noam; Pan, Jiayi; Rodman, Emma; Drori, Jacob; Fossum, Carl J; Muennighoff, Niklas; Jagota, Milind; Pradeep, Ronak; Fan, Honglu; Eicher, Jonathan; Chen, Michael; Thaman, Kushal; Merrill, William; Firsching, Moritz; Harris, Carter; Ciobâcă, Stefan; Gross, Jason; Pandey, Rohan; Gusev, Ilya; Jones, Adam; Agnihotri, Shashank; Zhelnov, Pavel; Mofayezi, Mohammadreza; Piperski, Alexander; Zhang, David K.; Dobarskyi, Kostiantyn; Leventov, Roman; Soroko, Ignat; Duersch, Joshua; Taamazyan, Vage; Ho, Andrew; Ma, Wenjie; Held, William; Xian, Ruicheng; Zebaze, Armel Randy; Mohamed, Mohanad; Leser, Julian Noah; Yuan, Michelle X; Yacar, Laila; Lengler, Johannes; Olszewska, Katarzyna; Di Fratta, Claudio; Oliveira, Edson; Jackson, Joseph W.; Zou, Andy; Chidambaram, Muthu; Manik, Timothy; Haffenden, Hector; Stander, Dashiell; Dasouqi, Ali; Shen, Alexander; Golshani, Bita; Stap, David; Kretov, Egor; Uzhou, Mikalai; Zhidkovskaya, Alina Borisovna; Winter, Nick; Rodriguez, Miguel Orbegozo; Lauff, Robert; Wehr, Dustin; Tang, Colin; Hossain, Zaki; Phillips, Shaun; Samuele, Fortuna; Ekström, Fredrik; Hammon, Angela; Patel, Oam; Farhidi, Faraz; Medley, George; Mohammadzadeh, Forough; Peñaflor, Madellene; Kassahun, Haile; Friedrich, Alena; Perez, Rayner Hernandez; Pyda, Daniel; Sakal, Taom; Dhamane, Omkar; Mirabadi, Ali Khajegili; Hallman, Eric; Okutsu, Kenchi; Battaglia, Mike; Maghsoudimehrabani, Mohammad; Amit, Alon; Hulbert, Dave; Pereira, Roberto; Weber, Simon; Handoko,; Peristyy, Anton; Malina, Stephen; Mehkary, Mustafa; Aly, Rami; Reidegeld, Frank; Dick, Anna-Katharina; Friday, Cary; Singh, Mukhwinder; Shapourian, Hassan; Kim, Wanyoung; Costa, Mariana; Gurdogan, Hubeyb; Kumar, Harsh; Ceconello, Chiara; Zhuang, Chao; Park, Haon; Carroll, Micah; Tawfeek, Andrew R.; Steinerberger, Stefan; Aggarwal, Daattavya; Kirchhof, Michael; Dai, Linjie; Kim, Evan; Ferret, Johan; Shah, Jainam; Wang, Yuzhou; Yan, Minghao; Burdzy, Krzysztof; Zhang, Lixin; Franca, Antonio; Pham, Diana T.; Loh, Kang Yong; Robinson, Joshua; Jackson, Abram; Giordano, Paolo; Petersen, Philipp; Cosma, Adrian; Colino, Jesus; White, Colin; Votava, Jacob; Vinnikov, Vladimir; Delaney, Ethan; Spelda, Petr; Stritecky, Vit; Shahid, Syed M.; Mourrat, Jean-Christophe; Vetoshkin, Lavr; Sponselee, Koen; Bacho, Renas; Yong, Zheng-Xin; de la Rosa, Florencia; Cho, Nathan; Li, Xiuyu; Malod, Guillaume; Weller, Orion; Albani, Guglielmo; Lang, Leon; Laurendeau, Julien; Kazakov, Dmitry; Adesanya, Fatimah; Portier, Julien; Hollom, Lawrence; Souza, Victor; Zhou, Yuchen Anna; Degorre, Julien; Yalın, Yiğit; Obikoya, Gbenga Daniel; Rai,; Bigi, Filippo; Boscá, M. C.; Shumar, Oleg; Bacho, Kaniuar; Recchia, Gabriel; Popescu, Mara; Shulga, Nikita; Tanwie, Ngefor Mildred; Lux, Thomas C. H.; Rank, Ben; Ni, Colin; Brooks, Matthew; Yakimchyk, Alesia; Huanxu,; Liu,; Cavalleri, Stefano; Häggström, Olle; Verkama, Emil; Newbould, Joshua; Gundlach, Hans; Brito-Santana, Leonor; Amaro, Brian; Vajipey, Vivek; Grover, Rynaa; Wang, Ting; Kratish, Yosi; Li, Wen-Ding; Gopi, Sivakanth; Caciolai, Andrea; de Witt, Christian Schroeder; Hernández-Cámara, Pablo; Rodolà, Emanuele; Robins, Jules; Williamson, Dominic; Cheng, Vincent; Raynor, Brad; Qi, Hao; Segev, Ben; Fan, Jingxuan; Martinson, Sarah; Wang, Erik Y.; Hausknecht, Kaylie; Brenner, Michael P.; Mao, Mao; Demian, Christoph; Kassani, Peyman; Zhang, Xinyu; Avagian, David; Scipio, Eshawn Jessica; Ragoler, Alon; Tan, Justin; Sims, Blake; Plecnik, Rebeka; Kirtland, Aaron; Bodur, Omer Faruk; Shinde, D. P.; Labrador, Yan Carlos Leyva; Adoul, Zahra; Zekry, Mohamed; Karakoc, Ali; Santos, Tania C. B.; Shamseldeen, Samir; Karim, Loukmane; Liakhovitskaia, Anna; Resman, Nate; Farina, Nicholas; Gonzalez, Juan Carlos; Maayan, Gabe; Anderson, Earth; Pena, Rodrigo De Oliveira; Kelley, Elizabeth; Mariji, Hodjat; Pouriamanesh, Rasoul; Wu, Wentao; Finocchio, Ross; Alarab, Ismail; Cole, Joshua; Ferreira, Danyelle; Johnson, Bryan; Safdari, Mohammad; Dai, Liangti; Arthornthurasuk, Siriphan; McAlister, Isaac C.; Moyano, Alejandro José; Pronin, Alexey; Fan, Jing; Ramirez-Trinidad, Angel; Malysheva, Yana; Pottmaier, Daphiny; Taheri, Omid; Stepanic, Stanley; Perry, Samuel; Askew, Luke; Rodríguez, Raúl Adrián Huerta; Minissi, Ali M. R.; Lorena, Ricardo; Iyer, Krishnamurthy; Fasiludeen, Arshad Anil; Clark, Ronald; Ducey, Josh; Piza, Matheus; Somrak, Maja; Vergo, Eric; Qin, Juehang; Borbás, Benjámin; Chu, Eric; Lindsey, Jack; Jallon, Antoine; McInnis, I. M. J.; Chen, Evan; Semler, Avi; Gloor, Luk; Shah, Tej; Carauleanu, Marc; Lauer, Pascal; Huy, Tran Đuc; Shahrtash, Hossein; Duc, Emilien; Lewark, Lukas; Brown, Assaf; Albanie, Samuel; Weber, Brian; Vaz, Warren S.; Clavier, Pierre; Fan, Yiyang; Silva, Gabriel Poesia Reis e; Long,; Lian,; Abramovitch, Marcus; Jiang, Xi; Mendoza, Sandra; Islam, Murat; Gonzalez, Juan; Mavroudis, Vasilios; Xu, Justin; Kumar, Pawan; Goswami, Laxman Prasad; Bugas, Daniel; Heydari, Nasser; Jeanplong, Ferenc; Jansen, Thorben; Pinto, Antonella; Apronti, Archimedes; Galal, Abdallah; Ze-An, Ng; Singh, Ankit; Jiang, Tong; Xavier, Joan of Arc; Agarwal, Kanu Priya; Berkani, Mohammed; Zhang, Gang; Du, Zhehang; Junior, Benedito Alves de Oliveira; Malishev, Dmitry; Remy, Nicolas; Hartman, Taylor D.; Tarver, Tim; Mensah, Stephen; Loume, Gautier Abou; Morak, Wiktor; Habibi, Farzad; Hoback, Sarah; Cai, Will; Gimenez, Javier; Montecillo, Roselynn Grace; Łucki, Jakub; Campbell, Russell; Sharma, Asankhaya; Meer, Khalida; Gul, Shreen; Gonzalez, Daniel Espinosa; Alapont, Xavier; Hoover, Alex; Chhablani, Gunjan; Vargus, Freddie; Agarwal, Arunim; Jiang, Yibo; Patil, Deepakkumar; Outevsky, David; Scaria, Kevin Joseph; Maheshwari, Rajat; Dendane, Abdelkader; Shukla, Priti; Cartwright, Ashley; Bogdanov, Sergei; Mündler, Niels; Möller, Sören; Arnaboldi, Luca; Thaman, Kunvar; Siddiqi, Muhammad Rehan; Saxena, Prajvi; Gupta, Himanshu; Fruhauff, Tony; Sherman, Glen; Vincze, Mátyás; Usawasutsakorn, Siranut; Ler, Dylan; Radhakrishnan, Anil; Enyekwe, Innocent; Salauddin, Sk Md; Muzhen, Jiang; Maksapetyan, Aleksandr; Rossbach, Vivien; Harjadi, Chris; Bahaloohoreh, Mohsen; Sparrow, Claire; Sidhu, Jasdeep; Ali, Sam; Bian, Song; Lai, John; Singer, Eric; Uro, Justine Leon; Bateman, Greg; Sayed, Mohamed; Menshawy, Ahmed; Duclosel, Darling; Bezzi, Dario; Jain, Yashaswini; Aaron, Ashley; Tiryakioglu, Murat; Siddh, Sheeshram; Krenek, Keith; Shah, Imad Ali; Jin, Jun; Creighton, Scott; Peskoff, Denis; EL-Wasif, Zienab; P, Ragavendran; Richmond, Michael; McGowan, Joseph; Patwardhan, Tejal; Sun, Hao-Yu; Sun, Ting; Zubić, Nikola; Sala, Samuele; Ebert, Stephen; Kaddour, Jean; Schottdorf, Manuel; Wang, Dianzhuo; Petruzella, Gerol; Meiburg, Alex; Medved, Tilen; ElSheikh, Ali; Hebbar, S Ashwin; Vaquero, Lorenzo; Yang, Xianjun; Poulos, Jason; Zouhar, Vilém; Bogdanik, Sergey; Zhang, Mingfang; Sanz-Ros, Jorge; Anugraha, David; Dai, Yinwei; Nhu, Anh N.; Wang, Xue; Demircali, Ali Anil; Jia, Zhibai; Zhou, Yuyin; Wu, Juncheng; He, Mike; Chandok, Nitin; Sinha, Aarush; Luo, Gaoxiang; Le, Long; Noyé, Mickaël; Pantidis, Ioannis; Qi, Tianbo; Purohit, Soham Sachin; Parcalabescu, Letitia; Nguyen, Thai-Hoa; Winata, Genta Indra; Ponti, Edoardo M.; Li, Hanchen; Dhole, Kaustubh; Park, Jongee; Abbondanza, Dario; Wang, Yuanli; Nayak, Anupam; Caetano, Diogo M.; Wong, Antonio A. W. L.; del Rio-Chanona, Maria; Kondor, Dániel; Francois, Pieter; Chalstrey, Ed; Zsambok, Jakob; Hoyer, Dan; Reddish, Jenny; Hauser, Jakob; Rodrigo-Ginés, Francisco-Javier; Datta, Suchandra; Shepherd, Maxwell; Kamphuis, Thom; Zhang, Qizheng; Kim, Hyunjun; Sun, Ruiji; Yao, Jianzhu; Dernoncourt, Franck; Krishna, Satyapriya; Rismanchian, Sina; Pu, Bonan; Pinto, Francesco; Wang, Yingheng; Shridhar, Kumar; Overholt, Kalon J.; Briia, Glib; Nguyen, Hieu; David,; Bartomeu, Soler; Pang, Tony CY; Wecker, Adam; Xiong, Yifan; Li, Fanfei; Huber, Lukas S.; Jaeger, Joshua; De Maddalena, Romano; Lù, Xing Han; Zhang, Yuhui; Beger, Claas; Kon, Patrick Tser Jern; Li, Sean; Sanker, Vivek; Yin, Ming; Liang, Yihao; Zhang, Xinlu; Agrawal, Ankit; Yifei, Li S.; Zhang, Zechen; Cai, Mu; Sonmez, Yasin; Cozianu, Costin; Li, Changhao; Slen, Alex; Yu, Shoubin; Park, Hyun Kyu; Sarti, Gabriele; Briański, Marcin; Stolfo, Alessandro; Nguyen, Truong An; Zhang, Mike; Perlitz, Yotam; Hernandez-Orallo, Jose; Li, Runjia; Shabani, Amin; Juefei-Xu, Felix; Dhingra, Shikhar; Zohar, Orr; Nguyen, My Chiffon; Pondaven, Alexander; Yilmaz, Abdurrahim; Zhao, Xuandong; Jin, Chuanyang; Jiang, Muyan; Todoran, Stefan; Han, Xinyao; Kreuer, Jules; Rabern, Brian; Plassart, Anna; Maggetti, Martino; Yap, Luther; Geirhos, Robert; Kean, Jonathon; Wang, Dingsu; Mollaei, Sina; Sun, Chenkai; Yin, Yifan; Wang, Shiqi; Li, Rui; Chang, Yaowen; Wei, Anjiang; Bizeul, Alice; Wang, Xiaohan; Arrais, Alexandre Oliveira; Mukherjee, Kushin; Chamorro-Padial, Jorge; Liu, Jiachen; Qu, Xingyu; Guan, Junyi; Bouyamourn, Adam; Wu, Shuyu; Plomecka, Martyna; Chen, Junda; Tang, Mengze; Deng, Jiaqi; Subramanian, Shreyas; Xi, Haocheng; Chen, Haoxuan; Zhang, Weizhi; Ren, Yinuo; Tu, Haoqin; Kim, Sejong; Chen, Yushun; Marjanović, Sara Vera; Ha, Junwoo; Luczyna, Grzegorz; Ma, Jeff J.; Shen, Zewen; Song, Dawn; Zhang, Cedegao E.; Wang, Zhun; Gendron, Gaël; Xiao, Yunze; Smucker, Leo; Weng, Erica; Lee, Kwok Hao; Ye, Zhe; Ermon, Stefano; Lopez-Miguel, Ignacio D.; Knights, Theo; Gitter, Anthony; Park, Namkyu; Wei, Boyi; Chen, Hongzheng; Pai, Kunal; Elkhanany, Ahmed; Lin, Han; Siedler, Philipp D.; Fang, Jichao; Mishra, Ritwik; Zsolnai-Fehér, Károly; Jiang, Xilin; Khan, Shadab; Yuan, Jun; Jain, Rishab Kumar; Lin, Xi; Peterson, Mike; Wang, Zhe; Malusare, Aditya; Tang, Maosen; Gupta, Isha; Fosin, Ivan; Kang, Timothy; Dworakowska, Barbara; Matsumoto, Kazuki; Zheng, Guangyao; Sewuster, Gerben; Villanueva, Jorge Pretel; Rannev, Ivan; Chernyavsky, Igor; Chen, Jiale; Banik, Deepayan; Racz, Ben; Dong, Wenchao; Wang, Jianxin; Bashmal, Laila; Gonçalves, Duarte V.; Hu, Wei; Bar, Kaushik; Bohdal, Ondrej; Patlan, Atharv Singh; Dhuliawala, Shehzaad; Geirhos, Caroline; Wist, Julien; Kansal, Yuval; Chen, Bingsen; Tire, Kutay; Yücel, Atak Talay; Christof, Brandon; Singla, Veerupaksh; Song, Zijian; Chen, Sanxing; Ge, Jiaxin; Ponkshe, Kaustubh; Park, Isaac; Shi, Tianneng; Ma, Martin Q.; Mak, Joshua; Lai, Sherwin; Moulin, Antoine; Cheng, Zhuo; Zhu, Zhanda; Zhang, Ziyi; Patil, Vaidehi; Jha, Ketan; Men, Qiutong; Wu, Jiaxuan; Zhang, Tianchi; Vieira, Bruno Hebling; Aji, Alham Fikri; Chung, Jae-Won; Mahfoud, Mohammed; Hoang, Ha Thi; Sperzel, Marc; Hao, Wei; Meding, Kristof; Xu, Sihan; Kostakos, Vassilis; Manini, Davide; Liu, Yueying; Toukmaji, Christopher; Paek, Jay; Yu, Eunmi; Demircali, Arif Engin; Sun, Zhiyi; Dewerpe, Ivan; Qin, Hongsen; Pflugfelder, Roman; Bailey, James; Morris, Johnathan; Heilala, Ville; Rosset, Sybille; Yu, Zishun; Chen, Peter E.; Yeo, Woongyeong; Jain, Eeshaan; Yang, Ryan; Chigurupati, Sreekar; Chernyavsky, Julia; Reddy, Sai Prajwal; Venugopalan, Subhashini; Batra, Hunar; Park, Core Francisco; Tran, Hieu; Maximiano, Guilherme; Zhang, Genghan; Liang, Yizhuo; Shiyu, Hu; Xu, Rongwu; Pan, Rui; Suresh, Siddharth; Liu, Ziqi; Gulati, Samaksh; Zhang, Songyang; Turchin, Peter; Bartlett, Christopher W.; Scotese, Christopher R.; Cao, Phuong M.; Nattanmai, Aakaash; McKellips, Gordon; Cheraku, Anish; Suhail, Asim; Luo, Ethan; Deng, Marvin; Luo, Jason; Zhang, Ashley; Jindel, Kavin; Paek, Jay; Halevy, Kasper; Baranov, Allen; Liu, Michael; Avadhanam, Advaith; Zhang, David; Cheng, Vincent; Ma, Brad; Fu, Evan; Do, Liam; Lass, Joshua; Yang, Hubert; Sunkari, Surya; Bharath, Vishruth; Ai, Violet; Leung, James; Agrawal, Rishit; Zhou, Alan; Chen, Kevin; Kalpathi, Tejas; Xu, Ziqi; Wang, Gavin; Xiao, Tyler; Maung, Erik; Lee, Sam; Yang, Ryan; Yue, Roy; Zhao, Ben; Yoon, Julia; Sun, Sunny; Singh, Aryan; Luo, Ethan; Peng, Clark; Osbey, Tyler; Wang, Taozhi; Echeazu, Daryl; Yang, Hubert; Wu, Timothy; Patel, Spandan; Kulkarni, Vidhi; Sundarapandiyan, Vijaykaarti; Zhang, Ashley; Le, Andrew; Nasim, Zafir; Yalam, Srikar; Kasamsetty, Ritesh; Samal, Soham; Yang, Hubert; Sun, David; Shah, Nihar; Saha, Abhijeet; Zhang, Alex; Nguyen, Leon; Nagumalli, Laasya; Wang, Kaixin; Zhou, Alan; Wu, Aidan; Luo, Jason; Telluri, Anwith; Yue, Summer; Wang, Alexandr; Hendrycks, Dan
2025-01-01
Abstract
Benchmarks are important tools for tracking the rapid advancement of large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks such as MMLU, which limits informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,700 questions across dozens of subjects, including mathematics, the humanities, and the natural sciences. It was developed by subject-matter experts around the world and comprises multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable but cannot be quickly answered via internet retrieval. State-of-the-art LLMs show low accuracy and poor calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To ground research and policymaking in a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.