
What Is a Tag

Updated: 2023-03-12 23:52:42


Posted March 12, 2023 (author: 愚公移山英文版)

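POS Tagging

POS tagging: part-of-speech tagging, also described in terms of word classes or lexical categories. The names vary, but they all refer to labeling each word with its part of speech.

NLTK ships an off-the-shelf tool that makes it easy to POS-tag a piece of text: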

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'), ('completely', 'RB'), ('different', 'JJ')]

The API documentation describes this interface as follows:

Use NLTK's currently recommended part of speech tagger to tag the given list of tokens.

I checked the code: pos_tag loads the standard treebank POS tagger.

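The Penn Treebank tag set:

CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential there
FW    Foreign word
IN    Preposition or subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNS   Noun, plural
NNP   Proper noun, singular
NNPS  Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PRP   Personal pronoun
PRP$  Possessive pronoun
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WP$   Possessive wh-pronoun
WRB   Wh-adverb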

With these abbreviations explained, the tags returned by the interface above are fairly easy to read.
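As a small illustration of reading such output, the sketch below decodes the tags of the example sentence with a hand-written glossary. `PENN_TAGS` and `describe` are hypothetical names introduced here, not part of NLTK, and the glossary covers only the tags that occur in the example.

```python
# Partial glossary of Penn Treebank tags (only those used in the example).
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "RB": "adverb",
    "IN": "preposition or subordinating conjunction",
    "NN": "noun, singular or mass",
    "JJ": "adjective",
}


def describe(tagged_tokens):
    """Return (word, tag, description) triples; unknown tags map to '?'."""
    return [(word, tag, PENN_TAGS.get(tag, "?")) for word, tag in tagged_tokens]


# The tagged sentence from the pos_tag example above.
tagged = [("And", "CC"), ("now", "RB"), ("for", "IN"),
          ("something", "NN"), ("completely", "RB"), ("different", "JJ")]

for word, tag, meaning in describe(tagged):
    print(f"{word:<12}{tag:<4}{meaning}")
```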

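Some of the corpora in NLTK are already annotated with parts of speech and can be used as training sets. Every tagged corpus has a tagged_words() method:

>>> nltk.corpus.brown.tagged_words()
[('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL'), ...]
>>> nltk.corpus.brown.tagged_words(simplify_tags=True)
[('The', 'DET'), ('Fulton', 'N'), ('County', 'N'), ...]

The simplify_tags argument belongs to NLTK 2. NLTK 3 replaced it with tagset='universal', whose tag names differ (for example NOUN instead of N).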

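Automatic Tagging

Next we look at the various automatic tagging methods. Because a tag depends on the context of a word, tagging works on sentences rather than on individual words. If the word were the unit, the last word of one sentence would influence the tag of the first word of the next sentence, which makes no sense. Working sentence by sentence avoids this error and keeps context from reaching across sentence boundaries.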

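We use the Brown corpus as the example:

>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')

This retrieves the tagged sentences and the untagged sentences separately, to be used as the validation set and the test set for the tagging algorithms, respectively.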

The Default Tagger

The simplest possible tagger assigns the same tag to each token.

>>> raw = 'I do not like green eggs and ham, I do not like them Sam I am!'
>>> tokens = nltk.word_tokenize(raw)
>>> default_tagger = nltk.DefaultTagger('NN')
>>> default_tagger.tag(tokens)
[('I', 'NN'), ('do', 'NN'), ('not', 'NN'), ('like', 'NN'), ('green', 'NN'),
('eggs', 'NN'), ('and', 'NN'), ('ham', 'NN'), (',', 'NN'), ('I', 'NN'),
('do', 'NN'), ('not', 'NN'), ('like', 'NN'), ('them', 'NN'), ('Sam', 'NN'),
('I', 'NN'), ('am', 'NN'), ('!', 'NN')]

This tagger really is that simple: it labels every token with whatever tag you give it. On its own it looks pointless, but it is useful as a backoff.
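To see why such a tagger is weak on its own, here is a minimal pure-Python sketch, independent of NLTK. The function names and the hand-tagged toy sentence are assumptions made for illustration only.

```python
def default_tag(tokens, tag="NN"):
    """Assign the same tag to every token."""
    return [(token, tag) for token in tokens]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = sum(p_tag == g_tag
                  for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


# Toy gold-standard sentence, hand-tagged for illustration only.
gold = [("I", "PRP"), ("like", "VBP"), ("green", "JJ"),
        ("eggs", "NNS"), ("and", "CC"), ("ham", "NN")]
tokens = [word for word, _ in gold]

predicted = default_tag(tokens)
print(predicted)
print(accuracy(predicted, gold))  # only 'ham' is tagged correctly: 1 of 6 tokens
```

The low score is expected: the tagger earns its place only as the last fallback for words that no better tagger can handle.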

The Regular Expression Tagger

The regular expression tagger assigns tags to tokens on the basis of matching patterns.

>>> patterns = [
...     (r'.*ing$', 'VBG'),                # gerunds
...     (r'.*ed$', 'VBD'),                 # simple past
...     (r'.*es$', 'VBZ'),                 # 3rd singular present
...     (r'.*ould$', 'MD'),                # modals
...     (r'.*\'s$', 'NN$'),                # possessive nouns
...     (r'.*s$', 'NNS'),                  # plural nouns
...     (r'^-?[0-9]+(\.[0-9]+)?$', 'CD'),  # cardinal numbers
...     (r'.*', 'NN')                      # nouns (default)
... ]
>>> regexp_tagger = nltk.RegexpTagger(patterns)
>>> regexp_tagger.tag(brown_sents[3])
[('``', 'NN'), ('Only', 'NN'), ('a', 'NN'), ('relative', 'NN'), ('handful', 'NN'),
('of', 'NN'), ('such', 'NN'), ('reports', 'NNS'), ('was', 'NNS'), ('received', 'VBD'),
("''", 'NN'), (',', 'NN'), ('the', 'NN'), ('jury', 'NN'), ('said', 'NN'), (',', 'NN'),
('``', 'NN'), ('considering', 'VBG'), ('the', 'NN'), ('widespread', 'NN'), ...]

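This tagger is a small step forward: you define rules as regular expressions, and a token that matches a rule gets the corresponding tag. Anything that matches no specific rule still falls through to the default.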
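The matching logic can be sketched in plain Python with the standard `re` module. This is not NLTK's implementation; `regexp_tag` is a hypothetical helper, and the sketch assumes the first matching pattern wins, which is why the order of the patterns matters.

```python
import re

# (pattern, tag) pairs, tried in order; the last one is a catch-all default.
PATTERNS = [
    (r".*ing$", "VBG"),                # gerunds
    (r".*ed$", "VBD"),                 # simple past
    (r".*es$", "VBZ"),                 # 3rd singular present
    (r".*ould$", "MD"),                # modals
    (r".*'s$", "NN$"),                 # possessive nouns
    (r".*s$", "NNS"),                  # plural nouns
    (r"^-?[0-9]+(\.[0-9]+)?$", "CD"),  # cardinal numbers
    (r".*", "NN"),                     # nouns (default)
]
COMPILED = [(re.compile(pattern), tag) for pattern, tag in PATTERNS]


def regexp_tag(tokens):
    """Tag each token with the tag of the first pattern that matches it."""
    tagged = []
    for token in tokens:
        for regex, tag in COMPILED:
            if regex.match(token):
                tagged.append((token, tag))
                break
    return tagged


print(regexp_tag(["reports", "was", "received", "considering", "3.14", "the"]))
```

Note that 'was' comes out as NNS, as in the output above: purely suffix-based rules cannot tell a plural noun from any other word ending in 's'.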

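The Lookup Tagger

Let's find the hundred most frequent words and store their most likely tag.

This method starts to have practical value: it tags by counting, for the most frequent words in the training corpus, which part of speech each word most often carries.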

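>>> fd = nltk.FreqDist(brown.words(categories='news'))
>>> cfd = nltk.ConditionalFreqDist(brown.tagged_words(categories='news'))
>>> most_freq_words = [word for word, _ in fd.most_common(100)]
>>> likely_tags = dict((word, cfd[word].max()) for word in most_freq_words)
>>> baseline_tagger = nltk.UnigramTagger(model=likely_tags)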

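This code takes the 100 most frequent words from the corpus, finds the tag each of them carries most often, and collects the result in the likely_tags dictionary. The dictionary is then passed to UnigramTagger as its model.

A UnigramTagger is a one-gram tagger: a simple tagger that ignores the surrounding context.

The biggest problem with this method is that tags are only specified for the top 100 words. What happens to all the other words?

This is where the default tagger from earlier becomes useful:

>>> baseline_tagger = nltk.UnigramTagger(model=likely_tags, backoff=nltk.DefaultTagger('NN'))

This partly solves the problem: words the model does not know are tagged by the default tagger.

The accuracy of this method depends entirely on the size of the model. With only the top 100 words the accuracy may be low, but it keeps rising as more words are included.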
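The lookup idea can be reproduced without NLTK. The sketch below is a simplified stand-in for the code above, not NLTK's implementation: `build_lookup_model` and `lookup_tag` are hypothetical names, and the tiny tagged corpus is hand-made for illustration.

```python
from collections import Counter, defaultdict


def build_lookup_model(tagged_words, size):
    """Map each of the `size` most frequent words to its most frequent tag."""
    word_counts = Counter(word for word, _ in tagged_words)
    tag_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        tag_counts[word][tag] += 1
    return {word: tag_counts[word].most_common(1)[0][0]
            for word, _ in word_counts.most_common(size)}


def lookup_tag(tokens, model, default="NN"):
    """Tag from the lookup table, backing off to a default tag."""
    return [(token, model.get(token, default)) for token in tokens]


# Toy tagged corpus (hand-made for illustration, Brown-style tags).
corpus = [("the", "AT"), ("jury", "NN"), ("said", "VBD"), ("the", "AT"),
          ("report", "NN"), ("was", "BEDZ"), ("the", "AT"), ("best", "JJT"),
          ("said", "VBD"), ("was", "BEDZ")]

model = build_lookup_model(corpus, size=3)
print(model)
print(lookup_tag(["the", "jury", "was", "happy"], model))
```

Raising `size` moves more words from the default tag to a counted tag, which is the effect described above.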

N-Gram Tagging

Unigram taggers are based on a simple statistical algorithm: for each token, assign the tag that is most likely for that particular token.

The lookup tagger above is built on the Unigram tagger. Here is the more general way to use a Unigram tagger:

>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)  # training
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'), ('type', 'NN'),
(',', ','), ('being', 'BEG'), ('on', 'IN'), ('the', 'AT'), ('ground', 'NN'),
('floor', 'NN'), ('so', 'QL'), ('that', 'CS'), ('entrance', 'NN'), ('is', 'BEZ'),
('direct', 'JJ'), ('.', '.')]

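You can train a Unigram tagger on an already tagged corpus.

An n-gram tagger is a generalization of a unigram tagger whose context is the current word together with the part-of-speech tags of the n-1 preceding tokens.

In other words, an n-gram tagger takes context into account: it uses the tags of the preceding n-1 words when tagging the current word.

Take the bigram tagger, a special case of the n-gram tagger, as an example: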

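>>> size = int(len(brown_tagged_sents) * 0.9)  # assumption: 90% training split
>>> train_sents = brown_tagged_sents[:size]
>>> bigram_tagger = nltk.BigramTagger(train_sents)
>>> bigram_tagger.tag(brown_sents[2007])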

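There is a problem here: if the context of a word in the sentence being tagged never occurred in the training set, the tagger cannot tag that word, even if the word itself did occur in training. Backoff is again the way to solve this: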

>>> t0 = nltk.DefaultTagger('NN')
>>> t1 = nltk.UnigramTagger(train_sents, backoff=t0)
>>> t2 = nltk.BigramTagger(train_sents, backoff=t1)
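The backoff chain can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how NLTK trains or stores its models: `train` and `tag_sentence` are hypothetical helpers, the bigram context is the pair (previous tag, current word), and the three training sentences are hand-made.

```python
from collections import Counter, defaultdict


def train(tagged_sents):
    """Count tags per word (unigram) and per (previous tag, word) (bigram)."""
    unigram = defaultdict(Counter)
    bigram = defaultdict(Counter)
    for sent in tagged_sents:
        prev_tag = None  # context never crosses a sentence boundary
        for word, tag in sent:
            unigram[word][tag] += 1
            bigram[(prev_tag, word)][tag] += 1
            prev_tag = tag
    return unigram, bigram


def tag_sentence(tokens, unigram, bigram, default="NN"):
    """Try the bigram context first, then the unigram table, then the default."""
    tagged = []
    prev_tag = None
    for word in tokens:
        if (prev_tag, word) in bigram:
            tag = bigram[(prev_tag, word)].most_common(1)[0][0]
        elif word in unigram:
            tag = unigram[word].most_common(1)[0][0]
        else:
            tag = default
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


# Toy training data (hand-made for illustration, Brown-style tags).
train_sents = [
    [("they", "PPSS"), ("can", "MD"), ("fish", "VB")],
    [("the", "AT"), ("can", "NN"), ("fell", "VBD")],
    [("we", "PPSS"), ("can", "MD"), ("swim", "VB")],
]
unigram, bigram = train(train_sents)
print(tag_sentence(["the", "can", "can", "rust"], unigram, bigram))
```

In the example, the first 'can' is resolved by its bigram context (after AT), the second falls back to the unigram table because that context was never seen, and 'rust' falls all the way back to the default tag.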

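Transformation-Based Tagging

N-gram taggers have two problems: the model takes up a fairly large amount of space, and when considering context the tagger only looks at the tags of the preceding words, not at the words themselves.

The tagger introduced next handles both problems better. It stores rules instead of a model, which saves a lot of space, and its rules are not limited to tags: they can also refer to the words themselves.

Brill tagging is a kind of transformation-based learning. The general idea is very simple: guess the tag of each word, then go back and fix the mistakes.

The following example shows how Brill tagging works: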

(1) replace NN with VB when the previous word is TO;

(2) replace TO with IN when the next tag is NNS.

Phrase    to   increase   grants   to   states   for   vocational   rehabilitation
Unigram   TO   NN         NNS      TO   NNS      IN    JJ           NN
Rule 1         VB
Rule 2                             IN
Output    TO   VB         NNS      IN   NNS      IN    JJ           NN

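The first step tags every word with a unigram tagger, and many of these tags may be wrong.

The rules are then applied to correct the tags that were guessed wrong in the first step, which yields a more accurate tagging.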
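The two steps can be traced in a few lines of Python. This is only a sketch of the example above, not a Brill tagger: `apply_rule`, `previous_is` and `next_is` are hypothetical helpers, rule (1) is read as "the previous word is tagged TO", and each rule is assumed to test its condition against the tags as they stood before that rule ran.

```python
def apply_rule(tags, from_tag, to_tag, condition):
    """Return a copy of tags with from_tag -> to_tag wherever condition holds.

    The condition is checked against the tags as they were before this rule ran.
    """
    return [to_tag if tag == from_tag and condition(tags, i) else tag
            for i, tag in enumerate(tags)]


def previous_is(wanted):
    """Condition: the tag of the preceding word equals `wanted`."""
    return lambda tags, i: i > 0 and tags[i - 1] == wanted


def next_is(wanted):
    """Condition: the tag of the following word equals `wanted`."""
    return lambda tags, i: i + 1 < len(tags) and tags[i + 1] == wanted


words = "to increase grants to states for vocational rehabilitation".split()
tags = ["TO", "NN", "NNS", "TO", "NNS", "IN", "JJ", "NN"]  # unigram guess

tags = apply_rule(tags, "NN", "VB", previous_is("TO"))  # rule (1)
tags = apply_rule(tags, "TO", "IN", next_is("NNS"))     # rule (2)
print(list(zip(words, tags)))
```

The resulting tags are TO VB NNS IN NNS IN JJ NN, the Output row of the example.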

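So how are these rules produced? They are generated automatically during the training phase.

During its training phase, the tagger guesses values for T1, T2, and C, to create thousands of candidate rules. Each rule is scored according to its net benefit: the number of incorrect tags that it corrects, less the number of correct tags it incorrectly modifies.

In other words, training first creates thousands of candidate rules of the form "replace T1 with T2 in context C". These candidates can be generated from simple statistics, so some of them may be inaccurate. Each rule is then used to fix mistakes and the result is compared with the correct tags: the number of tags it corrects minus the number of tags it breaks is the score of the rule. High-scoring rules are kept and low-scoring rules are discarded. Here are some example rules: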

NN -> VB if the tag of the preceding word is 'TO'

NN -> VBD if the tag of the following word is 'DT'

NN -> VBD if the tag of the preceding word is 'NNS'

NN -> NNP if the tag of words i-2...i-1 is '-NONE-'

NN -> NNP if the tag of the following word is 'NNP'

NN -> NNP if the text of words i-2...i-1 is 'like'

NN -> VBN if the text of the following word is '*-1'
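The net-benefit score described above can be computed as in the following sketch. The function names and the toy tag sequences are assumptions made for illustration. The candidate rule is the first one in the list, and the data is chosen so that the rule fixes two tags and breaks one.

```python
def apply_nn_to_vb_after_to(tags):
    """Candidate rule: NN -> VB if the tag of the preceding word is 'TO'."""
    return ["VB" if tag == "NN" and i > 0 and tags[i - 1] == "TO" else tag
            for i, tag in enumerate(tags)]


def net_benefit(before, after, gold):
    """Number of wrong tags the rule corrected minus correct tags it broke."""
    fixed = sum(b != g and a == g for b, a, g in zip(before, after, gold))
    broken = sum(b == g and a != g for b, a, g in zip(before, after, gold))
    return fixed - broken


# Toy data: initial guesses and the correct (gold) tags for the same tokens.
before = ["TO", "NN", "NNS", "TO", "NN", "TO", "NN"]
gold = ["TO", "VB", "NNS", "TO", "VB", "TO", "NN"]

after = apply_nn_to_vb_after_to(before)
print(after)
print(net_benefit(before, after, gold))  # 2 fixed - 1 broken = 1
```

A trainer along these lines would compute this score for every candidate rule and keep only those with a high enough net benefit.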

Published: 2023-03-12 23:52:40

Article link: http://www.newhan.cn/zhishi/a/167863636126939.html
