
22 Large Language Models (LLMs), from BERT to GPT-4


A large language model (LLM) is a product of applying machine learning, a branch of artificial intelligence, to natural language processing (NLP). It typically consists of a neural network with a very large number of parameters, trained on vast amounts of text data. These models have begun to demonstrate the ability to handle a wide range of language tasks, including comprehension, writing, reasoning, and dialogue, and they may well grow beyond language to become a powerful foundation for many other AI tools. [1]

This article collects the 22 large language models published to date and summarizes their basic information, key technical parameters, and reference resources in the table below.

| No. | Name | Release date | Developer | Parameters (billions) | Dataset size | License type | References |
|-----|------|--------------|-----------|-----------------------|--------------|--------------|------------|
| 1 | BERT | October 2018 | Google | 0.34 | 3.3 billion words | A-1 | [5]-[8] |
| 2 | GPT-2 | February 2019 | OpenAI | 1.5 | 40 GB (~10 billion tokens) | A-2 | [9][10] |
| 3 | GPT-3 | May 2020 | OpenAI | 175 | 499 billion tokens | B-1 | [11][12] |
| 4 | GPT-Neo | March 2021 | EleutherAI | 2.7 | 825 GB | A-2 | [15][16] |
| 5 | GPT-J | June 2021 | EleutherAI | 6 | 402 billion tokens | A-1 | [17][18] |
| 6 | Megatron-Turing NLG | October 2021 | Microsoft and Nvidia | 530 | 338.6 billion tokens | C-1 | [19][20] |
| 7 | Ernie 3.0 Titan | December 2021 | Baidu | 260 | 4 TB | A-1 | [21][46] |
| 8 | Claude | December 2021 | Anthropic | 52 | 400 billion tokens | C-2 | [22][23] |
| 9 | GLaM | December 2021 | Google | 1,200 | 1.6 trillion tokens | C-2 | [24][25] |
| 10 | Gopher | December 2021 | DeepMind | 280 | 300 billion tokens | C-2 | [26][27] |
| 11 | LaMDA | January 2022 | Google | 137 | 1.56 trillion words (2.81 trillion tokens) | C-2 | [28][29] |
| 12 | GPT-NeoX | February 2022 | EleutherAI | 20 | 825 GB | A-1 | [30] |
| 13 | Chinchilla | March 2022 | DeepMind | 70 | 1.4 trillion tokens | C-2 | [31][32] |
| 14 | PaLM | April 2022 | Google | 540 | 768 billion tokens | C-2 | [32][33] |
| 15 | OPT | May 2022 | Meta | 175 | 180 billion tokens | A-4 | [34][35] |
| 16 | YaLM 100B | June 2022 | Yandex | 100 | 1.7 TB | A-1 | [36] |
| 17 | Minerva | June 2022 | Google | 540 | 38.5 billion tokens | C-2 | [37][38] |
| 18 | BLOOM | July 2022 | Hugging Face | 175 | 350 billion tokens (1.6 TB) | A-3 | [39] |
| 19 | AlexaTM | November 2022 | Amazon | 20 | 1.3 trillion tokens | B-1 | [40]-[42] |
| 20 | ChatGPT | November 2022 | OpenAI | <175 | Unknown | B-1 | [12]-[14] |
| 21 | LLaMA | February 2023 | Meta | 65 | 1.4 trillion tokens | A-4 | [43][44] |
| 22 | GPT-4 | March 2023 | OpenAI | Unknown | Unknown | B-1 | [45] |

Table 1. The 22 large language models

Notes on the table

The parameter count is the number of parameters that the training process can update, and it represents the model's learning capacity. Comparing models by parameter count presumes that they share a similar architecture and are built from the same basic building block, the Transformer. A single model name sometimes covers a family of sub-models of different sizes; the table above always quotes the parameter count of the largest sub-model.
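To make the parameter count concrete, here is a minimal Python sketch that loads a released checkpoint with the Hugging Face transformers library and counts its parameters. The library, the bert-base-uncased checkpoint, and the printed figures are illustrative choices for this example, not part of the original table.

```python
# Minimal sketch: count the parameters of a released model checkpoint.
# Assumes the Hugging Face `transformers` library and PyTorch are installed.
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-uncased")  # the smaller BERT variant

total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"total parameters:     {total / 1e6:.0f}M")      # roughly 110M for bert-base
print(f"trainable parameters: {trainable / 1e6:.0f}M")

# Table 1 lists 0.34 billion parameters for BERT, i.e. the larger bert-large
# variant, consistent with always quoting the largest sub-model of a family.
```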

The dataset size is the size of the uncompressed text corpus used to train the model. It is reported in three different ways: number of tokens, number of words, or storage size. A token is the basic unit of model input produced when the raw text is preprocessed. The three measures can be converted roughly as follows:

1 token ≈ 0.75 words ≈ 4 bytes

The dataset size represents the breadth of what the model has learned.
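As a quick way to compare the differently reported dataset sizes in Table 1, the sketch below applies this rule of thumb in Python. The constants and helper functions are assumptions made for illustration; real token counts depend on the tokenizer and the language mix of the corpus.

```python
# Rough conversions between tokens, words and bytes, based on the
# approximation 1 token ≈ 0.75 words ≈ 4 bytes given above.
WORDS_PER_TOKEN = 0.75   # approximate, varies with tokenizer and language
BYTES_PER_TOKEN = 4      # approximate, assumes mostly English text

def tokens_from_words(num_words: float) -> float:
    return num_words / WORDS_PER_TOKEN

def tokens_from_bytes(num_bytes: float) -> float:
    return num_bytes / BYTES_PER_TOKEN

# BERT's 3.3-billion-word corpus is roughly 4.4 billion tokens.
print(f"{tokens_from_words(3.3e9) / 1e9:.1f}B tokens")

# The 825 GB Pile (GPT-Neo, GPT-J, GPT-NeoX) comes out around 200 billion
# tokens by this rule of thumb; published token counts differ because each
# training run repeats and tokenizes the corpus differently.
print(f"{tokens_from_bytes(825e9) / 1e9:.0f}B tokens")
```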

Another important consideration is the depth of the model's learning, but there is no single comparable metric for it; readers can form their own estimates and comparisons from the evaluation results given in the reference resources.

The license types fall into the following categories:

A-1: open source, Apache 2.0 license [2];
A-2: open source, MIT license [3];
A-3: open source, Responsible AI License [4];
A-4: open source, limited to non-commercial research, with access by application;
B-1: proprietary, with an open web-based API;
C-1: proprietary, with restricted web-based access;
C-2: proprietary, with no public interface.

The reference resources are mainly papers, followed by officially published technical documents. For open-source models we give a link to the corresponding GitHub repository or model card.

References

[1] Wikipedia. "Large language model". Retrieved 2023-03-28.
[2] Apache. "Apache License, Version 2.0". Retrieved 2023-03-28.
[3] Open Source Initiative. "The MIT License". Retrieved 2023-03-28.
[4] BigScience. "The BigScience RAIL License". Retrieved 2023-03-28.
[5] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
[6] "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-11-27.
[7] Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What We Know About How BERT Works". Transactions of the Association for Computational Linguistics. 8: 842–866. arXiv:2002.12327.
[8] Hugging Face. Model Card for the BERT base model.
[9] Radford, Alec; Wu, Jeff; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (2019). "Language Models are Unsupervised Multitask Learners".
[10] OpenAI. GPT-2 Model Card.
[11] Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165.
[12] OpenAI. "API Reference". Retrieved 2023-03-28.
[13] OpenAI (November 30, 2022). "ChatGPT: Optimizing Language Models for Dialogue". Retrieved December 5, 2022.
[14] Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe (March 2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
[15] Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027.
[16] EleutherAI. GitHub repository for GPT-Neo.
[17] Forefront Team. "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Retrieved 2023-02-28.
[18] EleutherAI. GPT-J 6B Model Card.
[19] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". Microsoft Research.
[20] Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
[21] Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731.
[22] Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861.
[23] Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073.
[24] Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". Retrieved 2023-03-09.
[25] Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui. "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". arXiv:2112.06905.
[26] Jack Rae, Geoffrey Irving, Laura Weidinger. "Language modelling at scale: Gopher, ethical considerations, and retrieval". Retrieved 20 March 2023.
[27] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
[28] Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". Retrieved 2023-03-09.
[29] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, Yanqi Zhou, Chung-Ching Chang, Igor Krivokon, Will Rusch, Marc Pickett, Pranesh Srinivasan, Laichee Man, Kathleen Meier-Hellstern, Meredith Ringel Morris, Tulsee Doshi, Renelito Delos Santos, Toju Duke, Johnny Soraker, Ben Zevenbergen, Vinodkumar Prabhakaran, Mark Diaz, Ben Hutchinson, Kristen Olson, Alejandra Molina, Erin Hoffman-John, Josh Lee, Lora Aroyo, Ravi Rajakumar, Alena Butryna, Matthew Lamm, Viktoriya Kuzmina, Joe Fenton, Aaron Cohen, Rachel Bernstein, Ray Kurzweil, Blaise Aguera-Arcas, Claire Cui, Marian Croak, Ed Chi, Quoc Le. "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
[30] EleutherAI. GitHub repository for GPT-NeoX.
[31] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
[32] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". DeepMind Blog.
[33] Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". Retrieved 2023-03-09.
[34] Susan Zhang, Mona Diab, Luke Zettlemoyer. "Democratizing access to large-scale language models with OPT-175B". Retrieved 2023-03-28.
[35] Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068.
[36] Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22). GitHub repository for YaLM 100B. Retrieved 2023-03-18.
[37] Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858.
[38] "Minerva: Solving Quantitative Reasoning Problems with Language Models". Retrieved 20 March 2023.
[39] "bigscience/bloom · Hugging Face". Retrieved 2023-03-28.
[40] "20B-parameter Alexa model sets new marks in few-shot learning". Amazon Science. 2 August 2022. Retrieved 2023-03-28.
[41] Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448.
[42] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". 17 November 2022. Retrieved 13 March 2023.
[43] "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.
[44] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample. "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971.
[45] "GPT-4 Technical Report". OpenAI. 2023. Retrieved March 14, 2023.
[46] Baidu. ERNIE 3.0. Retrieved 2023-03-28.

            主站蜘蛛池模板: 美女扒开内裤无遮挡禁18| 开心激情站开心激情网六月婷婷| 高中女无套中出17p| 2019国产精品青青草原| 成人无码视频在线观看免费播放| 久久精产国品一二三产品| 美女的胸www又黄的网站| 亚洲日韩亚洲另类激情文学| 久久精品伊人波多野结衣| 国产精品久久久久久久影院| 国产成人精品视频不卡| 欧美精品v| 激情综合网激情五月俺也去| 色色97| 國產尤物AV尤物在線觀看| 亚洲a毛片| 伊人色在线视频| 欧美在线人视频在线观看| 欧美精品一区二区精品久久| 国产目拍亚洲精品二区| 欧洲熟妇熟女久久精品综合| 粉嫩av蜜臀一区二区三区| 亚洲全网成人资源在线观看| 浴室人妻的情欲hd三级国产| 免费看的一级毛片| yw尤物av无码国产在线观看| 丁香五月婷激情综合第九色| 欧美亚洲国产精品久久蜜芽| 亚洲国产超清无码专区| 一区二区三区激情都市| 中文有无人妻vs无码人妻激烈| 中文字幕日韩人妻高清在线| 亚洲夜色噜噜av在线观看| 被灌满精子的少妇视频| 欧美人与动牲交精品| 激情自拍校园春色中文| 2022最新国产在线不卡a| 色午夜久久男人操女人| 日韩毛片在线视频x| 一本久久a久久精品亚洲| 无码精品一区二区久久久|