San Jose State University : Department of Linguistics and Language Development

CHLT Corpora

Members of the Center for Human Language Technology (CHLT) have access to the following speech and text corpora:

| Catalog ID | Description | Format |
|---|---|---|
| LDC94T4A | UN Parallel Text (Complete) | 3 DVD |
| LDC95T7 | Penn Treebank, Release 2 | download |
| LDC96L16 | CALLHOME Spanish Lexicon | download |
| LDC96L17 | CALLHOME Japanese Lexicon | download |
| LDC96S35 | CALLHOME Spanish Speech | 1 CD |
| LDC96T17 | CALLHOME Spanish Transcripts | download |
| LDC99T41 | Spanish Newswire Text, Volume 2 | 1 CD |
| LDC99L22 | Egyptian Colloquial Arabic Lexicon | download |
| LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | download |
| LDC2003T10 | Syntactically Annotated Idioms Dictionary | download |
| LDC2005S25 | Santa Barbara Corpus of Spoken American English | 1 DVD |
| LDC2005S26 | CSLU: 22 Languages Corpus | 2 DVD |
| LDC2005T01 | Chinese Treebank 5.0 | download |
| LDC2005T06 | Chinese News Translation Text Part 1 | download |
| LDC2005T10 | Chinese English News Magazine Parallel Text | 1 CD |
| LDC2005T12 | English Gigaword Second Edition | 2 DVD |
| LDC2005T13 | CCGbank | download |
| LDC2005T14 | Chinese Gigaword Second Edition | 1 DVD |
| LDC2005T23 | Chinese Proposition Bank 1.0 | download |
| LDC2005T28 | HARD 2004 Text | 1 DVD |
| LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus | online |
| LDC2005T35 | ANC Second Release | 2 DVD |
| LDC2006S34 | Russian through Switched Telephone Network (RuSTeN) | 1 DVD |
| LDC2006S42 | Korean Broadcast News Speech | 1 DVD |
| LDC2006T04 | Multiple Translation Chinese (MTC) Part 4 | download |
| LDC2006T12 | Spanish Gigaword First Edition | 1 DVD |
| LDC2006T13 | Web 1T 5-gram Version 1 | 6 DVD |
| LDC2006T17 | French Gigaword First Edition | 1 DVD |
| LDC2007S08 | CSLU: Foreign Accented English Release 1.2 | 1 DVD |
| LDC2007S15 | Nationwide Speech Project | 1 DVD |
| LDC2007T02 | English Chinese Translation Treebank v 1.0 | download |
| LDC2007T09 | ISI Chinese-English Automatically Extracted Parallel Text | download |
| LDC2007T40 | Arabic Gigaword Third Edition | 1 DVD |
| LDC2008S03 | STC-TIMIT 1.0 | 1 DVD |
| LDC2008S05 | 2005 NIST Language Recognition Evaluation | 1 DVD |
| ELRA-S0004 | BDLEX | 1 DVD |
| N/A | British National Corpus - XML Edition | 2 DVD |
