<?xml version="1.0" encoding="UTF-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
  <title type="html"><![CDATA[Lu Wei's Homepage (卢伟个人主页) - Linguistics]]></title>
  <subtitle type="html"><![CDATA[Web Resources for TCFL, L2, &amp; (Applied) Linguistics]]></subtitle>
  <id>http://www.luweixmu.com/home/</id>
  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/" /> 
  <link rel="self" type="application/atom+xml" href="http://www.luweixmu.com/home/atom.asp" /> 
  <generator uri="http://www.pjhome.net/" version="2.8">PJBlog3</generator> 
  <updated>2010-07-20T01:32:29+08:00</updated>

  <entry>
	  <title type="html"><![CDATA[New Publications of Linguistic Data Consortium]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-07-20T01:32:29+08:00</updated>
	  <published>2010-07-20T01:32:29+08:00</published>
		  <summary type="html"><![CDATA[In this newsletter:<br/>- 2010 Publication Pipeline Update -<br/>- Updated LDC Data Sheets and Papers Pages -<br/><br/>New publications:<br/>- LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 -<br/>- NIST 2004 Open Machine Translation (OpenMT) Evaluation -<br/><br/>2010 Publication Pipeline Update<br/><br/>Membership Year (MY) 2010 has included a strong selection of publications, including updates to the Arabic and Chinese treebanks, Spanish telephone speech and transcript data from the Fisher collection, and Chinese word n-grams collected from the web.&nbsp;&nbsp;Please consult our corpus catalog for a full list of publications distributed by LDC. As we are now in the second half of this membership year, we would like to provide information on what publications you can expect for the remainder of MY2010.&nbsp;&nbsp;Our pipeline includes the following:<br/><br/>Arabic Treebank Part 1 Version 4.1 ~ a revision of Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) (LDC2005T02) (ATB1), according to the new Arabic Treebank (ATB) annotation guidelines.&nbsp;&nbsp;The Arabic Treebank project consists of two distinct phases: (a) Part-of-Speech (POS) tagging, which divides the text into lexical tokens and gives relevant information about each token, such as lexical category, inflectional features, and a gloss, and (b) Arabic Treebanking, which characterizes the constituent structures of word sequences, provides categories for each non-terminal node, and identifies null elements, co-reference, traces, etc. 
Arabic Treebank Part 1 Version 4.1 represents the manual revision of the syntactic tree annotation in ATB1, the automatic revision and updating of certain part-of-speech tags, and the manual revision of certain targeted POS tags (function words, in particular).&nbsp;&nbsp;The source data consists of 734 newswire stories from Agence France Presse.<br/><br/>Microsoft Research India POS-Tagged Bengali ~ to support the task of Part-of-Speech (POS) tagging and other forms of data-driven linguistic research on Indian languages in general, Microsoft Research India has developed POS-labeled data for Hindi, Bengali, and Sanskrit as a part of the Indian Language – Part-of-Speech Tagset (IL-POST) project.&nbsp;&nbsp;The corpora are based on the IL-POST framework. IL-POST is a POS-tagset framework designed to cover the morphosyntactic details of Indian languages. It supports a three-level hierarchy of Categories, Types and Attributes. The Bengali corpus consists of two different levels of information for each lexical token: (a) lexical category and types, and (b) a set of morphological attributes and their associated values in the context.&nbsp;&nbsp;The data consists of 7168 manually annotated sentences (102933 words) targeted to cover written modern standard Bengali from various sources, including blogs, Multikulti, and Wikipedia.<br/><br/>TRECVID 2006 Keyframes and Transcripts ~ TREC Video Retrieval Evaluation (TRECVID) is sponsored by NIST to promote progress in content-based retrieval from digital video via open, metrics-based evaluation. 
The keyframes in this release were extracted for use in the NIST TRECVID 2006 Evaluation.&nbsp;&nbsp;The source data includes approximately 158.6 hours of English, Arabic and Chinese language video data collected by LDC from NBC, CNN, MSN, New Tang Dynasty TV, Phoenix TV, Lebanese Broadcasting Corp.,&nbsp;&nbsp;and China Central TV.&nbsp;&nbsp;The keyframes were selected by going to the middle frame of the shot boundary, then parsing left and right of that frame to locate the nearest I-Frame. This then became the keyframe and was extracted. Keyframes have been provided at both the subshot (NRKF) and master shot (RKF) levels. <br/><br/>Uda Walawe Asian Elephant Vocalizations ~ a partially annotated corpus of Asian Elephant communication/vocalization. The data set contains vocalizations primarily by adult female and juvenile Asian elephants. This corpus is intended to enable researchers in acoustic communication of elephants and other species to compare acoustic features and repertoire diversity to this population. Of particular interest is whether there may be regional dialects that differ among Asian elephant populations in the wild and in captivity. A second interest is in whether structural commonalities exist between this and other species that shed light on underlying social and ecological factors shaping communication systems. 
<br/><br/>2010 Subscription Members are automatically sent all MY2010 data as it is released.&nbsp;&nbsp;2010 Standard Members are entitled to request 16 corpora for free from MY2010.&nbsp;&nbsp; Non-members may license most data for research use.<br/><br/>Updated LDC Data Sheets and Papers Pages<br/><br/>LDC is pleased to announce that both our LDC Data Sheets and LDC Papers pages have recently been updated.&nbsp;&nbsp;On our Data Sheets page, you&#39;ll find our growing collection of LDC Data Sheets, each of which highlights a key aspect of the Consortium’s research and development tasks.&nbsp;&nbsp;Recent additions include a data sheet covering Arabic and English treebanking at LDC and one that provides an overview of LDC&#39;s role in sponsored projects.&nbsp;&nbsp; Our updated papers page contains several papers from LREC2010:&nbsp;&nbsp;Seventh International Conference on Language Resources and Evaluation, as well as other conferences and journals, dating from 1998 forward.&nbsp;&nbsp;Most papers are available for download in PDF format; presentation slides and posters are available for several papers as well.<br/><br/>On our Papers page, you can read about LDC&#39;s efforts to apply treebank annotation to Arabic broadcast news (Maamouri et al).&nbsp;&nbsp;Broadcast news (BN) transcript data posed new challenges; for instance, the transcript data included metadata which conveys information in addition to the text of what is being said.&nbsp;&nbsp; Some forms of metadata were ignored, such as indications of coughs or laughter, while others, such as speech effects including discourse markers and word fragments, were annotated.&nbsp;&nbsp;Annotators also had to handle indistinct audio signal wherein speech could be heard, but not fully understood, so the words could only be inferred from context rather than from the audio signal.&nbsp;&nbsp;In these cases, the annotation must convey information not contained in the audio signal that 
accounts for the annotation in that region.&nbsp;&nbsp;The improved Arabic Treebank (ATB) pipeline and revised annotation guidelines proved robust enough to carry out this task with few changes. This paper discusses where some adaptation was necessary and describes the overall pipeline as used in the production of BN ATB data.<br/><br/>Additionally, you can learn about LDC&#39;s role in resource creation for the Knowledge Base Population (KBP) Track of the Text Analysis Conference (TAC) organized by NIST (Simpson et al).&nbsp;&nbsp;The KBP track of TAC is a hybrid descendant of the TREC Question Answering track and the Automated Content Extraction (ACE) evaluation program and is designed to support development of systems that are capable of automatically populating a knowledge base with information about entities mined from unstructured text. An important component of the KBP evaluation is the Entity Linking task, where systems must accurately associate text mentions of unknown Person (PER), Organization (ORG), and Geopolitical (GPE) names to entries in a knowledge base. This paper describes the 2009 resource creation efforts, with particular focus on the selection and development of named entity mentions for the Entity Linking task evaluation.<br/><br/>New Publications<br/><br/>(1)&nbsp;&nbsp;The LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 was developed by researchers at LDC. SAMA 3.1 is based on, and updates, Tim Buckwalter&#39;s Buckwalter Arabic Morphological Analyzer (BAMA) 2.0 (LDC2004L02). Since this is the first public release of SAMA, it has been numbered continuously to reflect the continuity between this release and previous BAMA releases.&nbsp;&nbsp;SAMA 3.1 is a software tool for the morphological analysis of Standard Arabic. 
SAMA 3.1 considers each Arabic word token in all possible &#39;prefix-stem-suffix&#39; segmentations, and lists all known/possible annotation solutions, with assignment of all diacritic marks, morpheme boundaries (separating clitics and inflectional morphemes from stems), and all Part-of-Speech (POS) labels and glosses for each morpheme segment. The generated output may then be reviewed by users, and the most appropriate annotation selected from among several choices.<br/><br/>The software layer of SAMA 3.1 relies on a data layer that consists primarily of three Arabic-English lexicon files: prefixes (1328 entries), suffixes (945 entries), and stems (79318 entries representing 40654 lemmas). The lexicons are supplemented by three morphological compatibility tables used for controlling prefix-stem combinations (2497 entries), stem-suffix combinations (1632 entries), and prefix-suffix combinations (1180 entries). <br/><br/>The input format, output format, and data layer of SAMA 3.1 were designed to be backward compatible with BAMA. Incremental changes to the data layer in SAMA have resulted in: <br/><br/>- increased lexicon coverage in the dictionary files <br/>- important changes and additions to the inventory of POS tags <br/>- more possible solutions generated for numerous word forms <br/><br/>The software implementation has been updated to allow more input/output options, installation and configuration options, and smoother incorporation in other Perl tools/services. The structure of the dictionary and morphotactic tables has remained the same (the tables provided with SAMA 3.1 differ from the BAMA 2.0 tables only in size and content, not in format). 
Logical separation between the software layer and data layer allows the new software tools to be used with previous versions of the tables (instructions are provided with software documentation).&nbsp;&nbsp;The basic logic that implements the segmentation and analysis look-up for Arabic words is essentially unchanged since BAMA 2.0. <br/><br/>The data layer is now accessed through Berkeley DB, with result-caching enabled by default, leading to improved performance. Various utility scripts have also been added to the software package to facilitate more flexible interaction with tools and data.<br/><br/>LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 is distributed via web download. <br/><br/>2010 Subscription Members will automatically receive two copies of this corpus on disc, provided that they have submitted a completed copy of the User License Agreement for LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 (LDC2010L01).&nbsp;&nbsp;2010 Standard Members may request a copy as part of their 16 free membership corpora. As a Members-Only release, LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 is not available for non-member licensing.<br/><br/>(2)&nbsp;&nbsp;NIST 2004 Open Machine Translation (OpenMT) Evaluation is a package containing source data, reference translations, and scoring software used in the NIST 2004 OpenMT evaluation. It is designed to help evaluate the effectiveness of machine translation systems. The package was compiled and scoring software was developed by researchers at NIST, making use of newswire source data and reference translations collected and developed by LDC.<br/><br/>The objective of the NIST OpenMT evaluation series is to support research in, and help advance the state of the art of, machine translation (MT) technologies -- technologies that translate text between human languages. Input may include all forms of text. 
The goal is for the output to be an adequate and fluent translation of the original.&nbsp;&nbsp;The 2004 task was to evaluate translation from Chinese to English and from Arabic to English. Additional information about these evaluations may be found at the NIST Open Machine Translation (OpenMT) Evaluation web site. <br/><br/>This evaluation kit includes a single Perl script (mteval-v11a.pl) that may be used to produce a translation quality score for one (or more) MT systems. The script works by comparing the system output translation with a set of (expert) reference translations of the same source text. Comparison is based on finding sequences of words in the reference translations that match word sequences in the system output translation. <br/><br/>This corpus consists of 150 Arabic newswire documents, 150 Chinese newswire documents, and 29 Chinese &#34;prepared speech&#34; documents. For each language, the test set consists of two files: a source and a reference file. Each reference file contains four independent translations of the data set. The evaluation year, source language, test set, version of the data, and source vs. reference file are reflected in the file name. <br/><br/>NIST 2004 Open Machine Translation (OpenMT) Evaluation is distributed via web download. <br/><br/>2010 Subscription Members will automatically receive two copies of this corpus on disc.&nbsp;&nbsp;2010 Standard Members may request a copy as part of their 16 free membership corpora. 
Non-members may license this data for US$150.<br/><br/>Ilya Ahtaridis<br/>Membership Coordinator<br/><br/>Linguistic Data Consortium&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Phone: (215) 573-1275<br/>University of Pennsylvania&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Fax: (215) 573-2175<br/>3600 Market St., Suite 810&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ldc@ldc.upenn.edu<br/>Philadelphia, PA 19104 USA&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.ldc.upenn.edu" target="_blank" rel="external">http://www.ldc.upenn.edu</a><br/><br/>]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=167" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=167</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Publications of Linguistic Data Consortium]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-05-16T21:49:30+08:00</updated>
	  <published>2010-05-16T21:49:30+08:00</published>
		  <summary type="html"><![CDATA[In this newsletter:<br/>- Coming Soon: LDC Data Scholarship Program! -<br/>New publications:<br/>LDC2010S03 - 2003 NIST Speaker Recognition Evaluation -<br/>LDC2010T09 - ACE 2005 Mandarin SpatialML Annotations -<br/>LDC2010T10 - NIST 2002 Open Machine Translation (OpenMT) Evaluation -<br/>--------------------------------------------------------------------------------<br/>Coming Soon: LDC Data Scholarship Program!<br/>We are pleased to announce that the LDC Data Scholarship program is in the works! This program will provide university students with access to LDC data at no cost. Each year LDC distributes thousands of dollars worth of data at no or reduced cost to students who demonstrate a need for data, yet cannot secure funding.&nbsp;&nbsp;LDC will formalize this practice through the newly created LDC Data Scholarship program. <br/>Data scholarships will be offered each semester beginning with the fall 2010 semester (September - December 2010). Students will need to complete an application, which will include a data use proposal and letter of support from their faculty adviser.&nbsp;&nbsp;We anticipate that the selection process will be highly competitive.<br/>Stay tuned for further announcements in our newsletter and on our home page!<br/><br/>New Publications<br/>(1) 2003 NIST Speaker Recognition Evaluation was developed by researchers at NIST (National Institute of Standards and Technology). It consists of just over 120 hours of English conversational telephone speech used as training data and test data in the 2003 Speaker Recognition Evaluation (SRE), along with evaluation metadata and test set answer keys.<br/>2003 NIST Speaker Recognition Evaluation is part of an ongoing series of yearly evaluations conducted by NIST. These evaluations provide an important contribution to the direction of research efforts and the calibration of technical capabilities. 
They are intended to be of interest to all researchers working on the general problem of text-independent speaker recognition. To this end the evaluation was designed to be simple, to focus on core technology issues, to be fully supported, and to be accessible to those wishing to participate. <br/>This speaker recognition evaluation focused on the task of 1-speaker and 2-speaker detection, in the context of conversational telephone speech.&nbsp;&nbsp;The original evaluation consisted of three parts: 1-speaker detection &#34;limited data&#34;, 2-speaker detection &#34;limited data&#34;, and 1-speaker detection &#34;extended data&#34;. This corpus contains training and test data and supporting metadata (including answer keys) for only the 1-speaker &#34;limited data&#34; and 2-speaker &#34;limited data&#34; components of the original evaluation. The 1-speaker &#34;extended data&#34; component of the original evaluation (not included in this corpus) provided metadata only, to be used in conjunction with data from Switchboard-2 Phase II (LDC99S79) and Switchboard-2 Phase III Audio (LDC2002S06). The metadata (resources and answer keys) for the 1-speaker &#34;extended data&#34; component of the original 2003 SRE evaluation are available from the NIST Speech Group website for the 2003 Speaker Recognition Evaluation. <br/>The data in this corpus is a 120-hour subset of data first made available to the public as Switchboard Cellular Part 2 Audio (LDC2004S07), reorganized specifically for use in the 2003 NIST SRE.<br/>2003 NIST Speaker Recognition Evaluation is distributed on one DVD.<br/>2010 Subscription Members will automatically receive two copies of this corpus.&nbsp;&nbsp;2010 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1000.<br/><br/>(2)&nbsp;&nbsp;ACE 2005 Mandarin SpatialML Annotations was developed by researchers at The MITRE Corporation (MITRE). 
ACE 2005 Mandarin SpatialML Annotations applies SpatialML tags to a subset of the source Mandarin training data in ACE 2005 Multilingual Training Corpus (LDC2006T06). <br/>SpatialML is a mark-up language for representing spatial expressions in natural language documents. SpatialML focuses on geography and culturally relevant landmarks, rather than biology, cosmology, geology, or other regions of the spatial language domain. The goal is to allow for better integration of text collections with resources such as databases that provide spatial information about a domain, including gazetteers, physical feature databases and mapping services. <br/>The SpatialML annotation scheme is intended to emulate earlier progress on time expressions such as TIMEX2, TimeML, and the 2005 ACE guidelines. The main SpatialML tag is the PLACE tag, which encodes information about location. The central goal of SpatialML is to map location information in text to data from gazetteers and other databases to the extent possible by defining attributes in the PLACE tag. Therefore, semantic attributes such as country abbreviations, country subdivision and dependent area abbreviations (e.g., US states), and geo-coordinates are used to help establish such a mapping. The SpatialML guidelines are compatible with existing guidelines for spatial annotation and existing corpora within the ACE research program. <br/>This corpus consists of a 298-document subset of broadcast material from the ACE 2005 Multilingual Training Corpus (LDC2006T06) that has been tagged by a native Mandarin speaker according to version 2.3 of the SpatialML annotation guidelines, which are included in the documentation for this release. <br/>ACE 2005 Mandarin SpatialML Annotations is distributed via web download.<br/>2010 Subscription Members will automatically receive two copies of this corpus on disc.&nbsp;&nbsp;2010 Standard Members may request a copy as part of their 16 free membership corpora. 
Non-members may license this data for US$500.<br/><br/>(3)&nbsp;&nbsp;NIST 2002 Open Machine Translation (OpenMT) Evaluation is a package containing source data, reference translations, and scoring software used in the NIST 2002 OpenMT evaluation. It is designed to help evaluate the effectiveness of machine translation systems. The package was compiled and scoring software was developed by researchers at NIST, making use of newswire source data and reference translations collected and developed by LDC. <br/>The objective of the NIST OpenMT evaluation series is to support research in, and help advance the state of the art of, machine translation (MT) technologies -- technologies that translate text between human languages. Input may include all forms of text. The goal is for the output to be an adequate and fluent translation of the original. Additional information about these evaluations may be found at the NIST Open Machine Translation (OpenMT) Evaluation web site. <br/>This evaluation kit includes a single Perl script that may be used to produce a translation quality score for one (or more) MT systems. The script works by comparing the system output translation with a set of (expert) reference translations of the same source text. Comparison is based on finding sequences of words in the reference translations that match word sequences in the system output translation.<br/>The Chinese-language source text included in this corpus is a reorganization of data that was initially released to the public as Multiple-Translation Chinese (MTC) Part 2 (LDC2003T17). The Chinese-language reference translations are a reorganized subset of data from the same MTC corpus. The Arabic-language data (source text and reference translations) is a reorganized subset of data that was initially released to the public as Multiple-Translation Arabic (MTA) Part 1 (LDC2003T18). All source data for this corpus is newswire text. 
<br/>For each language, the test set consists of two files, a source and a reference file. Each reference file contains four independent translations of the data set. The evaluation year, source language, test set, version of the data, and source vs. reference file are reflected in the file name.<br/>NIST 2002 Open Machine Translation (OpenMT) Evaluation is distributed via web download.&nbsp;&nbsp;<br/>2010 Subscription Members will automatically receive two copies of this corpus on disc.&nbsp;&nbsp;2010 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$150.<br/>--------------------------------------------------------------------------------<br/>Ilya Ahtaridis<br/>Membership Coordinator<br/>--------------------------------------------------------------------<br/>Linguistic Data Consortium&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Phone: (215) 573-1275<br/>University of Pennsylvania&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Fax: (215) 573-2175<br/>3600 Market St., Suite 810&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ldc@ldc.upenn.edu<br/>Philadelphia, PA 19104 USA&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.ldc.upenn.edu" target="_blank" rel="external">http://www.ldc.upenn.edu</a>]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=166" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=166</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Book Recommendation: A Corpus-Based Analysis of the Register of English Linguistics (基于语料库的英语语言学语体分析)]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-04-03T21:29:42+08:00</updated>
	  <published>2010-04-03T21:29:42+08:00</published>
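		  <summary type="html"><![CDATA[A Corpus-Based Analysis of the Register of English Linguistics (基于语料库的英语语言学语体分析)<br/>Author: Gui Shichun (桂诗春)<br/>Publisher: Foreign Language Teaching and Research Press (外语教学与研究出版社)<br/>Publication date: 2009-12<br/>ISBN: 978-7-5600-9175-4<br/>Pages: 123<br/>Format: 16mo<br/>Edition: first edition, first printing<br/>Binding: paperback<br/>Summary:<br/>&nbsp;&nbsp;&nbsp;&nbsp;Drawing on many years of experience in teaching and researching linguistics, the noted applied linguist Professor Gui Shichun takes as his main aim "using corpora and corpus procedures to discover the linguistic patterns that help us understand how language builds texts", and gives an in-depth analysis of the "Corpus of English for Linguistics", a corpus built for teaching linguistics and applied linguistics.<br/>&nbsp;&nbsp;&nbsp;&nbsp;From a one-million-word sample of linguistics texts, the study derives the linguistic features of the register, as a reference for English teachers and graduate students in China when reading linguistics works and writing linguistics papers. An accompanying CD contains word-frequency rank lists, word-frequency distribution tables, and lists of overused and underused words.<br/>&nbsp;&nbsp;&nbsp;&nbsp;The book is a valuable reference for front-line teachers and graduate students of linguistics and applied linguistics, corpus researchers, and discourse researchers. <br/>About the author:<br/>&nbsp;&nbsp;&nbsp;&nbsp;Gui Shichun is a professor and doctoral supervisor at the Center for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, and serves as vice president of the China Foreign Language Teaching Research Association and president of the Guangdong Foreign Language Society, among other posts. He has previously served as a member of the second and third Discipline Appraisal Groups of the Academic Degrees Committee of the State Council; as a deputy to the fifth, sixth, and seventh Guangdong Provincial People's Congresses; and as deputy department head, department head, and president at Sun Yat-sen University and the Guangzhou Institute of Foreign Languages. His main research interests are applied linguistics, psycholinguistics, corpus linguistics, and language testing. His works include Psycholinguistics; Standardized Testing: Theory, Principles and Methods; Applied Linguistics; Applied Linguistics and English Teaching in China; An Outline of Experimental Psychophonetics; The Psychology of Chinese Students' English Learning; Methodology of Linguistics (co-authored); A New Psycholinguistics; Studies in Applied Linguistics; the Chinese Learner English Corpus (co-authored); and A CLEC-Based Analysis of Chinese Learners' English (co-edited). <br/>Contents:<br/>Preface<br/>1 Research background<br/>2 Research aims and methods<br/>2.1 Basic statistical analysis<br/>2.2 Multi-feature/multi-dimensional approach<br/>2.3 Keyness method<br/>3 Basic statistical analysis<br/>3.1 Word frequency analysis<br/>3.2 Lexical density of the corpus<br/>3.3 Average word length<br/>3.4 Coverage<br/>3.5 Rare words<br/>3.6 Sentence length<br/>3.7 Word classes<br/>3.8 Summary<br/>4 Analysis of grammatical features<br/>4.1 Grammatical features<br/>4.2 Factor analysis<br/>4.3 Lexicogrammar<br/>4.3.1 Nominalization<br/>4.3.2 Nouns<br/>4.3.3 Present tense<br/>4.3.4 Passives<br/>4.3.5 Past-participle Wh- deletion<br/>4.3.6 Prepositions<br/>4.3.7 Conjuncts<br/>4.3.8 Modification<br/>4.3.9 Split auxiliaries<br/>4.3.10 Impersonal constructions<br/>4.3.11 Modals<br/>4.4 Summary<br/>5 Analysis of lexical features<br/>5.1 Characteristics of overused words<br/>5.1.1 Predominance of nouns<br/>5.1.2 Use of function words<br/>5.1.3 Highlighting the research focus<br/>5.2 Word families<br/>5.2.1 Distribution of word families<br/>5.2.2 Common word families<br/>5.3 Collocation analysis<br/>5.3.1 Two principles<br/>5.3.2 Some examples<br/>5.4 Summary<br/>5.5 Lexical features of the linguistics register<br/>5.5.1 Defining language<br/>5.5.2 Classifying language<br/>5.5.3 Analytical language<br/>5.5.4 Modifying language<br/>5.5.5 Lexical bundles<br/>5.6 Characteristics of the specialized word list<br/>5.6.1 Nouns and derived adjectives<br/>5.6.2 Function words<br/>5.6.3 Proper nouns<br/>5.6.4 General-purpose vocabulary<br/>5.7 Underused words<br/>5.7.1 Personal pronouns<br/>5.7.2 Past-tense verbs<br/>5.7.3 Contractions<br/>5.7.4 Words related to time, years, and months<br/>5.7.5 Words related to personal life<br/>5.7.6 Words related to social life<br/>5.8 Summary<br/>6 Conclusions and teaching applications<br/>6.1 Conclusions<br/>6.2 Applications in teaching<br/>References]]></summary>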
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=162" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=162</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Book Recommendation: A Study of English Learners' Oral Communicative Ability from a Corpus Linguistics Perspective]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-04-03T21:26:25+08:00</updated>
	  <published>2010-04-03T21:26:25+08:00</published>
		  <summary type="html"><![CDATA[A Study of English Learners' Oral Communicative Ability: A Corpus Linguistics Perspective (英语学习者口语交际能力研究：语料库语言学视角)<br/>Author: Zhen Fengchao (甄凤超)<br/>Publisher: Shanghai Jiao Tong University Press (上海交通大学出版社)<br/>Publication date: 2009-10<br/>ISBN: 978-7-313-05575-0<br/>Pages: 270<br/>Format: 16mo<br/>Edition: first edition, first printing<br/>Binding: paperback<br/>Summary:<br/>&nbsp;&nbsp;&nbsp;&nbsp;From a corpus-linguistic perspective, this book examines four key components of Chinese learners' oral communicative ability in English: the ability to use prefabricated chunks, the ability to apply schematic knowledge through the target language, pragmatic competence, and strategic competence. Drawing on the College Learners' Spoken English Corpus (COLSEC), it compares learners at different levels of oral communicative ability with one another and with native English speakers along these four dimensions, identifying the characteristics and problems of Chinese learners' spoken English. The views, concepts, methods, and findings presented in the book offer valuable information for research on learner English, second language acquisition, and foreign language teaching. <br/>Contents:<br/>Chapter 1 Introduction<br/>1.1 Motivations for the book<br/>1.2 Research objectives<br/>1.3 Outline of the book<br/>Chapter 2 Critical Discussion of the Previous Studies on Learner Language<br/>2.1 Changes in current linguistic and SLA research<br/>2.2 Main features of CLC data<br/>2.3 Methodological approaches to CLC research<br/>2.4 Studies based on learner corpora<br/>2.5 Conversation analysis<br/>2.6 Communicative competence<br/>2.7 Conclusion<br/>Chapter 3 Defining Oral Communicative Ability<br/>3.1 Features of oral communication<br/>3.2 Accuracy, fluency and appropriacy<br/>3.3 Four essential components<br/>3.4 Conclusion<br/>Chapter 4 Research Methodology<br/>4.1 The COLSEC<br/>4.2 The data in the book<br/>4.3 Methods for data extraction and processing<br/>4.4 Summary<br/>Chapter 5 Findings and Analysis (1): Productive Vocabulary Size and the Ability to Use PCs<br/>5.1 Productive vocabulary size<br/>5.2 The ability to use PCs<br/>5.3 Summary<br/>Chapter 6 Findings and Analysis (2): the Ability to Manipulate Schemata<br/>6.1 Introduction<br/>6.2 Key words for the Use of Computers<br/>6.3 Key words for Going on Tour<br/>6.4 Discussion and conclusion<br/>Chapter 7 Findings and Analysis (3): Pragmatic Competence and Strategic Competence<br/>7.1 Pragmatic markers<br/>7.2 Interruptions and overlaps<br/>7.3 Strategies for taking, holding and yielding turns<br/>7.4 Reactive tokens<br/>7.5 Summary<br/>Chapter 8 Conclusion<br/>8.1 Summary of findings<br/>8.2 Implications for pedagogy<br/>8.3 Limitations of the book<br/>8.4 Directions for further researches<br/>Appendices<br/>Appendix A An error classification system<br/>Appendix B Error types in three groups<br/>Appendix C Collocates of IMPROVE in the position of n in the pattern V n from the SBNC<br/>Appendix D Collocates of KNOWLEDGE in the pattern v N from the SBNC]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=161" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=161</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Book Recommendation: System and Corpus: Exploring Connections]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-04-03T21:23:07+08:00</updated>
	  <published>2010-04-03T21:23:07+08:00</published>
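		  <summary type="html"><![CDATA[System and Corpus: Exploring Connections (系统与语料：二者关联探索), English reprint edition<br/>Original title: System And Corpus: Exploring Connections <br/>Original publisher: David Brown Book Company <br/>Authors: Geoff Thompson and Susan Hunston (UK)<br/>Series: Frontiers of Western Linguistics (西方语言学前沿书系) <br/>Publisher: World Publishing Corporation (世界图书出版公司) <br/>ISBN: 9787510014185 <br/>Price: ￥45.00<br/>Publication date: January 2010 <br/>Format: 24mo <br/>Pages: 326 <br/>Edition: first edition, first printing<br/>&nbsp;&nbsp;&nbsp;&nbsp;The papers collected in this book explore, from different angles, the synergy between systemic functional linguistics and corpus linguistics, showing how current linguistics research outside China approaches this important question. The contributors are specialists in systemic functional linguistics and corpus linguistics from the UK, Germany, Italy, Australia, Austria, Japan, and other countries; M. A. K. Halliday, the leading figure in systemic functional linguistics, wrote the afterword.<br/>&nbsp;&nbsp;&nbsp;&nbsp;The book is suited to those working in systemic functional linguistics and corpus linguistics, and to anyone interested in the two fields. <br/>About the authors<br/>&nbsp;&nbsp;&nbsp;&nbsp;Geoff Thompson teaches at the University of Liverpool, UK, and has taught English and applied linguistics in China and several other countries. His main works include Introducing Functional Grammar.<br/>&nbsp;&nbsp;&nbsp;&nbsp;Susan Hunston is a professor at the University of Birmingham, UK, with extensive experience in language teaching and applied linguistics research, and is a recognized expert in corpus analysis. Her main works include Corpora in Applied Linguistics.<br/>Contents<br/>Preface to the series Studies in Corpus and Computational Linguistics <br/>Reader's guide to System and Corpus: Exploring Connections <br/>Original table of contents <br/>Foreword <br/>List of contributors <br/>1 Introduction <br/>System and corpus: Two traditions with a common ground <br/>2 Corpus analysis: The state of the art and three types of unanswered questions <br/>3 Language as choice: What is choice? <br/>4 Idiomatic word choice and system: A contribution to the debate <br/>5 Incorporating system into corpus: On the relationship between corpora and systemic functional linguistics <br/>6 Frequency profiles of some basic grammatical systems: An interim report <br/>7 "Matching" corpus data and system networks: Using corpora to revise and extend the TRANSITIVITY network for English <br/>8 Multimodal corpus linguistics <br/>9 How to handle lexical semantics in systemic functional linguistics: A corpus-based study of the purposes of scope adjectives <br/>10 Expressions of "pain" in Japanese <br/>11 A corpus-based study of Thai conjunctions: Using textual resources <br/>12 From concordance lines to text: Evaluation of giving in the small Alma Mater corpus of donation appeals <br/>13 Culture-related linguistic differences in tourist websites: Emotion and fact, a corpus analysis within the Appraisal framework <br/>14 Afterword <br/>References <br/>Index ]]></summary>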
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=160" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=160</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Publications of Linguistic Data Consortium]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2010-01-26T15:28:41+08:00</updated>
	  <published>2010-01-26T15:28:41+08:00</published>
		  <summary type="html"><![CDATA[In this newsletter:<br/>- Newly Expanded Press Release Section - <br/>- Upcoming LDC Institute Seminar -<br/>New Publications:<br/>LDC2010T02<br/>- Czech Broadcast News MDE Transcripts -<br/>LDC2010T03<br/>- GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 -<br/>LDC2010T01<br/>- NIST Open Machine Translation 2008 Evaluation (MT08) Sel&#101;cted Reference and System Translations -<br/>--------------------------------------------------------------------------------<br/>Newly Expanded Press Release Section<br/><br/>Recall reading a newsletter article about the Reduced Licensing Fee but unsure what you did with the email?&nbsp;&nbsp; Curious as to which o&#114;ganization was the recipient of LDC&#39;s 15,000th corpus distribution nearly eight years ago?&nbsp;&nbsp;If so, be sure to visit LDC&#39;s newly expanded Press Release section on our What&#39;s New! What&#39;s Free! page to read about these topics and more.&nbsp;&nbsp;The Press Release section includes the articles of previous newsletters as well as major announcements from LDC.&nbsp;&nbsp;Information is o&#114;ganized into the following categories:<br/><br/>15th Anniversary Monthly Spotlight Archive - as part of our 15th Anniversary celebration in 2007, we highlighted one aspect of the LDC in our monthly newsletters. These features provided our members and data users with a glimpse of the broad range of LDC’s research activities. <br/><br/>Conference Attendance by LDC - recent publisher displays and conference participation by LDC.<br/><br/>Etc. - recent collaborations and grant awards plus other announcements.<br/><br/>Membership Mailbag Archive - to address the questions that our data users have asked, we introduced our Membership Mailbag series of newsletter articles in May 2008. 
This periodic series addresses frequently asked questions about LDC data, the LDC Intranet, and the benefits of an LDC membership.<br/><br/>Member Surveys - LDC conducted two end-of-year surveys to obtain feedback on satisfaction levels with LDC Membership and data releases as well as our corpus catalog, and to gather suggestions on future publications.<br/><br/>Milestones and Celebrations - information on our landmark corpora distributions and events to celebrate our 10th and 15th anniversary years.<br/><br/>Use of LDC Corpora in University Summer Schools - ways LDC corpora have been used for teaching purposes at university summer school programs.<br/><br/>The Press Release section will be up&#100;ated as new announcements are made so we anticipate that this will be a great resource for information about LDC.<br/><br/>- Upcoming LDC Institute Seminar -<br/><br/>The LDC Institute will hold its next session on&nbsp;&nbsp;Tuesday, January 26, 2010, from 10:00 a.m. to 12:00 p.m.&nbsp;&nbsp;in the LDC Conference Room at LDC&#39;s Philadelphia offices, 3600 Market Street, Suite 810. <br/><br/>The topic of this session will be the U.S. Supreme Court Corpus (SCOTUS) presented by Daniel Katz, J.D., M.P.P., Fellow in Empirical Legal Studies, Michigan Law School, PhD Candidate, Political Science and Public Policy, University of Michigan, and Michael Bommarito, PhD Student, Political Science: Methods &amp; Modeling, University of Michigan. <br/><br/>ABSTRACT: <br/>The corpus of Supreme Court written opinions is a rich linguistic resource. Not only does this corpus provide a longitudinal sample of formal American English, but it is also a source of text with identified authors and vote-coded sentiment. Despite this value and years of qualitative and quantitative material of the United States Supreme Court, no compiled corpus of these opinions is currently available to researchers. 
The purpose of this talk is (1) to describe efforts to compile both the complete corpus of Supreme Court Opinions and associated metadata, (2) to outline a number of our current research projects utilizing this data, and (3) to discuss any criticism, potential projects, o&#114; possible collaboration. <br/><br/>Refreshments will be provided. If you are in the area, we hope to see you there! <br/><br/>New Publications<br/><br/>(1)Czech Broadcast News MDE Transcripts was prepared by researchers at the University of West Bohemia, Pilsen, Czech Republic. It consists of metadata extraction (MDE) annotations for the approximately 26 hours of transcribed broadcast news speech in Czech Broadcast News Transcripts (LDC2004T01). The audio files corresponding to the transcripts in this corpus are contained in Czech Broadcast News Speech (LDC2004S01). Czech Broadcast News MDE Transcripts joins LDC&#39;s other holdings of Czech broadcast data: Czech Broadcast Conversation Speech (LDC2009S02), Czech Broadcast Conversation MDE Transcripts (LDC2009T20), Voice of America (VOA) Czech Broadcast News Audio (LDC2000S89) and Voice of America (VOA) Czech Broadcast News Transcripts (LDC2000T53). <br/><br/>The audio recordings were collected from February 1, 2000 through April 22, 2000 from three Czech radio stations and two television stations. The broadcasts included both public and commercial subjects and were presented in various styles, ranging from a formal style to a colloquial style more typical for commercial broadcast companies that do not primarily focus on news. <br/><br/>The goal of MDE research is to take raw speech recognition output and refine it into forms that are of more use to humans and to downstream automatic processes. In simple terms, this means the creation of automatic transcripts that are maximally readable. 
This readability might be achieved in a number of ways: removing non-content words like filled pauses and discourse markers from the text; removing sections of disfluent speech; and creating boundaries at natural breakpoints in the flow of speech so that each sentence or other meaningful unit of speech might be presented on a separate line within the resulting transcript. Natural capitalization, punctuation, standardized spelling and sensible conventions for representing speaker turns and identity are further elements in the readable transcript. <br/><br/>The transcripts and annotations in this corpus are stored in two formats: QAn (Quick Annotator) and RTTM. Character encoding in all files is ISO-8859-2.<br/><br/>Czech Broadcast News MDE Transcripts is distributed via web download.<br/><br/>2010 Subscription Members will automatically receive two copies of this corpus on disc. 2010 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$750.<br/><br/>(2) GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 was prepared by LDC and contains 223,000 characters (98 files) of Chinese newsgroup text and its translation selected from twenty-one sources. Newsgroups consist of posts to electronic bulletin boards, Usenet newsgroups, discussion groups and similar forums. This release was used as training data in Phase 1 (year 1) of the DARPA-funded GALE program. <br/><br/>Preparing the source data involved four stages of work: data scouting, data harvesting, formatting and data selection.<br/><br/>Data scouting involved manually searching the web for suitable newsgroup text. Data scouts were assigned particular topics and genres along with a production target in order to focus their web search. Formal annotation guidelines and a customized annotation toolkit helped data scouts to manage the search process and to track progress. 
<br/><br/>Data scouts logged their decisions about potential text of interest to a database. A nightly process queried the annotation database and harvested all designated URLs. Whenever possible, the entire site was downloaded, not just the individual thread o&#114; post located by the data scout. Once the text was downloaded, its format was standardized so that the data could be more easily integrated into downstream annotation processes. Typically, a new script was required for each new domain name that was identified. After scripts were run, an optional manual process corrected any remaining formatting problems.<br/><br/>The sel&#101;cted documents were then reviewed for content-suitability using a semi-automatic process. A statistical approach was used to rank a document&#39;s relevance to a set of already-sel&#101;cted documents labeled as &#34;good.&#34; An annotator then reviewed the list of relevance-ranked documents and sel&#101;cted those which were suitable for a particular annotation task o&#114; for annotation in general. These newly-judged documents in turn provided additional input for the generation of new ranked lists. <br/><br/>Manual sentence units/segments (SU) annotation was also performed as part of the transcription task. Three types of end of sentence SU were identified: statement SU, question SU, and incomplete SU. After transcription and SU annotation, files were reformatted into a human-readable translation format and assigned to professional translators for careful translation. Translators followed LDC&#39;s GALE Translation guidelines which describe the makeup of the translation team, the source data format, the translation data format, best practices for translating certain linguistic features and quality control procedures applied to completed translations. 
<br/><br/>GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 is distributed via web download.<br/><br/>2010 Subscription Members will automatically receive two copies of this corpus on disc. 2010 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1500.<br/><br/>(3) NIST Open Machine Translation 2008 Evaluation (MT08) Sel&#101;cted Reference and System Translations.&nbsp;&nbsp;NIST Open MT is an evaluation series to support research in, and help advance the state of the art of, technologies that translate text between human languages. Participants submit machine translation output of source language data to NIST (National Institute of Standards and Technology); the output is then evaluated with automatic and manual measures of quality against high quality human translations of the same source data. This program supports the growing interest in system combination approaches that generate improved translations from output of several different machine translation (MT) systems. MT system combination approaches require data sets composed of high-quality human reference translations and a variety of machine translations of the same text. The NIST Open Machine Translation 2008 Evaluation (MT08) Sel&#101;cted Reference and System Translations set addresses this need. <br/><br/>The data in this release consists of the human reference translations and corresponding machine translations for the NIST Open MT08 test sets, which consist of newswire and web data in the four MT08 language pairs:&nbsp;&nbsp;Arabic-to-English, Chinese-to-English, English-to-Chinese (newswire only) and Urdu-to-English. Two documents per language pair and genre were removed at random from the test sets for release. For the machine translations, only output from one submission per training condition (Constrained and Unconstrained training, wh&#101;re available) per participant is included. 
See section 2 of the MT08 Evaluation Plan for a description of the training conditions. The resulting data set has the following characteristics: <br/><br/>Arabic-to-English: 120 documents with 1312 segments, output from 17 machine translation systems. <br/>Chinese-to-English: 105 documents with 1312 segments, output from 23 machine translation systems. <br/>English-to-Chinese: 127 documents with 1830 segments, output from 11 machine translation systems. <br/>Urdu-to-English: 128 documents with 1794 segments, output from 12 machine translation systems. <br/>The data is o&#114;ganized and annotated in such a way that subsets for each language pair and/or data genre and/or training condition can be extracted and used separately, depending on the user&#39;s needs.<br/><br/>NIST Open Machine Translation 2008 Evaluation (MT08) Sel&#101;cted Reference and System Translations is distributed via web download.<br/><br/>2010 Subscription Members will automatically receive two copies of this corpus on disc. 2010 Standard Members may request a copy as part of their 16 free membership corpora. 
Non-members may license this data for US$200.<br/>-------------------------------------------------------------------------------<br/>Ilya Ahtaridis<br/>Membership Coordinator<br/>Linguistic Data Consortium<br/>University of Pennsylvania<br/>3600 Market St., Suite 810<br/>Philadelphia, PA 19104 USA<br/>Phone: (215) 573-1275<br/>Fax: (215) 573-2175<br/>ldc@ldc.upenn.edu<br/><a href="http://www.ldc.upenn.edu" target="_blank" rel="external">http://www.ldc.upenn.edu</a>]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=158" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=158</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Publications of Linguistic Data Consortium]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2009-12-21T21:56:40+08:00</updated>
	  <published>2009-12-21T21:56:40+08:00</published>
		  <summary type="html"><![CDATA[-&nbsp;&nbsp;LDC and Oxford University Receive Digging into Data Challenge Grant&nbsp;&nbsp;-<br/>-&nbsp;&nbsp;LDC to Close for Winter Break&nbsp;&nbsp;-<br/>-&nbsp;&nbsp;Planned Maintenance - Jan 10, 2010&nbsp;&nbsp;-<br/>New Publications:<br/>LDC2009T29<br/>-&nbsp;&nbsp;ACL Anthology Reference Corpus&nbsp;&nbsp;-<br/><br/>LDC2009T30<br/>-&nbsp;&nbsp;Arabic Gigaword Fourth Edition&nbsp;&nbsp;-<br/>--------------------------------------------------------------------------------<br/>LDC and Oxford University Receive Digging into Data Challenge Grant <br/>LDC and its research team partner Oxford University form one of eight international research teams awarded the first Digging into Data Challenge grants for projects that promote innovative humanities and social science research using large-scale data analysis. Four leading research agencies sponsor the international competition: the Joint Information Systems Committee (JISC) from the United Kingdom, the National Endowment for the Humanities and the National Science Foundation (NSF) from the United States, and the Social Sciences and Humanities Research Council from Canada. <br/><br/>LDC and Oxford University (with the participation of the British Library) have been funded by NSF and JISC, respectively, for a project entitled “Mining a Year of Speech,” which will focus on creating tools to enable rapid and flexible access to more than 9,000 hours of spoken audio files. Those files contain a wide variety of speech drawn from some of the leading British and American spoken word corpora, allowing for new kinds of linguistic analysis. 
<br/><br/>Further information about the Digging into Data Challenge can be found on the project website.<br/><br/>LDC to Close for Winter Break<br/><br/>LDC will be closed from Friday, December 25, 2009 through Friday, January 1, 2010 in accordance with the University of Pennsylvania Winter Break Policy.&nbsp;&nbsp;Our offices will reopen on Monday, January 4, 2010.&nbsp;&nbsp;Requests received for membership renewals and corpora will be processed at that time.<br/><br/>Best wishes for a happy and safe holiday season!<br/><br/>Planned Maintenance - Jan 10, 2010<br/><br/>Please take note: <br/><br/>As a result of planned electrical maintenance, LDC&#39;s website, including the Intranet and catalog, will not be accessible on Sunday, January 10, 2010 from 12 AM EST to approximately 4 AM EST.&nbsp;&nbsp;We apologize for any inconvenience this will cause.<br/><br/>New Publications<br/><br/>(1)&nbsp;&nbsp;ACL Anthology Reference Corpus is a digital archive of 10,921 research papers in computational linguistics sponsored by the Association for Computational Linguistics (ACL). Also available from the ACL, this release contains most of the papers that appeared up to February 2007 in the web-based ACL Anthology, a dynamic repository that currently hosts over 16,500 articles drawn from a range of conferences and workshops as well as past issues of the Computational Linguistics journal. The ACL Anthology Reference Corpus is designed to be a standard, real-world digital collection testbed for experiments in bibliographic and bibliometric research. <br/><br/>The ACL is the international scientific and professional society for scholars working on problems involving natural language and computation. Membership includes the ACL quarterly journal, Computational Linguistics, reduced registration at most ACL-sponsored conferences, discounts on ACL-sponsored publications and participation in ACL Special Interest Groups. 
Since 1988, Computational Linguistics has been the primary forum for research on computational linguistics and natural language processing. <br/><br/>The material in the ACL Anthology Reference Corpus was scanned at 600dpi grayscale for archival storage, down-sampled to 300dpi black-and-white, assembled into articles and stored in the PDF Image with Hidden Text format. Author and title metadata was extracted from the OCRed text and used to build HTML index pages. Older materials, such as conference proceedings from the 1960s and early volumes of Computational Linguistics, were manually digitized from microfiche slides. <br/><br/>The ACL Anthology Reference Corpus includes: <br/><br/>10,921 PDF files in the pdf/anthology-PDF tree <br/>13,551 files with metadata described in the metadata/anthology-XML tree <br/>84,542 pages in the PDF files <br/><br/>ACL Anthology Reference Corpus is distributed on four DVD-ROMs.<br/><br/>2009 Subscription Members will automatically receive two copies of this corpus.&nbsp;&nbsp;2009 Standard Members may request a copy as part of their 16 free membership corpora.&nbsp;&nbsp;Non-members may license this data for US$75.&nbsp;&nbsp;ACL Anthology Reference Corpus is made available for research-only use under the Creative Commons Attribution-Noncommercial Share Alike 3.0 license.<br/><br/>(2)&nbsp;&nbsp;Arabic Gigaword Fourth Edition is a comprehensive archive of Arabic newswire text that has been acquired over several years at LDC. Arabic Gigaword Fourth Edition includes all of the content of Arabic Gigaword Third Edition (LDC2007T40) as well as newly-collected data. In addition, three new sources have been added in the fourth edition: Al-Ahram, Asharq Al-Awsat and Al-Quds Al-Arabi. 
<br/><br/>Nine distinct international sources of Arabic newswire are represented here:<br/><br/>Al-Ahram (ahr_arb) <br/>Asharq Al-Awsat (aaw_arb) <br/>Agence France Presse (afp_arb) <br/>Assabah (asb_arb) <br/>Al Hayat (hyt_arb) <br/>An Nahar (nhr_arb) <br/>Al-Quds Al-Arabi (qds_arb) <br/>Ummah Press (umh_arb) <br/>Xinhua News Agency (xin_arb) <br/>The seven-character codes shown above represent both the directory names wh&#101;re the data files are found and the 7-letter prefix that appears at the beginning of every file name. The 7-letter codes consist of the three-character source name IDs and the three-character language code (&#34;arb&#34;) separated by an underscore (&#34;_&#34;) character.<br/><br/>These news services all use Modern Standard Arabic (MSA), so there should be a fairly limited scope for o&#114;thographic and lexical variation due to regional Arabic dialects. <br/><br/>New in the Fourth Edition<br/><br/>New Sources <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This release marks the first edition of Arabic Gigaword to include content from Al-Ahram, Asharq Al-Awsat and Al-Quds Al-Arabi covering the period from November 2006 through December 2008.&nbsp;&nbsp;<br/><br/>New Data for Existing Sources <br/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;This release contains all data collected by LDC from January 2007 through December 2008, except for Ummah Press for which data from January 2005 through December 2008 is included. 
<br/><br/>The table below shows data quantity by source under the following categories: data source (Source); the number of files per source (#Files); compressed file size (Gzip-MB); uncompressed file size (Totl-MB); the number of space-separated word tokens in the text (K-words); and the number of documents per source (#DOCs).<br/><br/>Arabic Gigaword Fourth Edition is distributed on one DVD-ROM.<br/><br/>2009 Subscription Members will automatically receive two copies of this corpus.&nbsp;&nbsp;2009 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$5000.<br/>--------------------------------------------------------------------------------<br/>Ilya Ahtaridis<br/>Membership Coordinator<br/>Linguistic Data Consortium<br/>University of Pennsylvania<br/>3600 Market St., Suite 810<br/>Philadelphia, PA 19104 USA<br/>Phone: (215) 573-1275<br/>Fax: (215) 573-2175<br/>ldc@ldc.upenn.edu<br/><a href="http://www.ldc.upenn.edu" target="_blank" rel="external">http://www.ldc.upenn.edu</a>]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=157" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=157</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Publications of Linguistic Data Consortium]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2009-11-21T23:56:49+08:00</updated>
	  <published>2009-11-21T23:56:49+08:00</published>
		  <summary type="html"><![CDATA[- LDC Incentives: Early Renewal Discounts for Membership Year (MY) 2010 -<br/>- LDC at NWAV 38 -<br/>New Publications:<br/>- 2007 NIST Language Recognition Evaluation Supplemental Training Set -<br/>- French Gigaword Second Edition -<br/>- NXT Switchboard Annotations -<br/>--------------------------------------------------------------------------------<br/>LDC Incentives:&nbsp;&nbsp;Early Renewal Discounts for Membership Year (MY) 2010<br/><br/>LDC appreciates the important contribution LDC members make through their continued support of the consortium.&nbsp;&nbsp;We would like to invite all current and previous members of LDC to renew for Membership Year (MY) 2010.&nbsp;&nbsp;For MY2010, LDC is pleased to maintain membership fees at last year’s rates; membership fees will not increase.&nbsp;&nbsp;Additionally, in last month&#39;s newsletter, we announced an LDC Incentives Package which will include a host of incentives to help lower the cost of LDC membership and data licensing fees.&nbsp;&nbsp;As part of this package, LDC will extend discounts to members who keep their membership current and who join early in the year.<br/><br/>The details of our Early Renewal Discounts for MY2010 are as follows: <br/><br/>Organizations that joined for MY2009 will receive a 5% discount when renewing. This discount will apply throughout 2010, regardless of time of renewal. MY2009 members renewing before March 1, 2010 will receive an additional 5% discount, for a total 10% discount off the membership fee. <br/>New members, as well as organizations that did not join for MY2009 but held membership in any previous membership year (1993-2008), will also be eligible for a 5% discount provided that they join or renew before March 1, 2010. <br/>The Membership Fee Table provides exact pricing information. 
<br/><br/><table border="1">
<tr><th>Membership</th><th>MY2010 Fee</th><th>MY2010 Fee with 5% Discount *</th><th>MY2010 Fee with 10% Discount **</th></tr>
<tr><td>Not-for-Profit, Standard</td><td>US$2400</td><td>US$2280</td><td>US$2160</td></tr>
<tr><td>Not-for-Profit, Subscription</td><td>US$3850</td><td>US$3657.50</td><td>US$3465</td></tr>
<tr><td>For-Profit, Standard</td><td>US$24000</td><td>US$22800</td><td>US$21600</td></tr>
<tr><td>For-Profit, Subscription</td><td>US$27500</td><td>US$26125</td><td>US$24750</td></tr>
</table>
<br/>*&nbsp;&nbsp; For MY2009 Members renewing for MY2010 and any previous year Member who renews before March 1, 2010<br/><br/>** For MY2009 Members renewing before March 1, 2010<br/><br/>Publications for MY2010 are still being planned but it will be another productive year with a broad selection of publications.&nbsp;&nbsp;The working titles of data sets we intend to provide include: <br/><br/>Arabic Treebank: Part 2 v 4.0<br/>Chinese Treebank 7.0<br/>Chinese Web N-gram Version 1.0<br/>Fisher Spanish<br/>LCTL Bengali<br/>NPS Chat Corpus <br/><br/>In addition to receiving new publications, current year members of the LDC also enjoy the benefit of licensing older data at reduced costs; current year for-profit members may use most data for commercial applications.<br/><br/>This past year, nearly 100 organizations who renewed membership or joined early received a discount on membership fees for MY2009.&nbsp;&nbsp;Taken together, these members saved over US$50,000!&nbsp;&nbsp;Be sure to keep an eye on your mail: all LDC members have been sent an invitation-to-join letter and a renewal invoice for MY2010.&nbsp;&nbsp;Renew early for MY2010 and save today!<br/><br/>LDC at NWAV 38<br/><br/>LDC exhibited at NWAV for the third straight year. We were delighted to interact with so many talented sociolinguistic researchers and to introduce numerous attendees to LDC and our data catalog. 
LDC distributed free copies of both the SLX Corpus of Classic Sociolinguistic Interviews, as per the terms of the Timebank grant, and the 2008 LDC Spoken Language Sampler, which is available for download here. We also distributed many of our newly minted data sheets, including one featuring the speech annotation tool XTrans. This tool is also freely available from our website in Linux and Windows formats.&nbsp;&nbsp; <br/><br/>LDC’s Executive Director Chris Cieri and Senior Associate Director Stephanie Strassel presented papers on the following topics:<br/><br/>·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Models of Phonological Variation for Multi-dialectal Communities: the case of L’Aquila<br/><br/>·&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Closer Still to a Robust, All Digital, Empirical, Reproducible Sociolinguistic Methodology<br/><br/>Thanks again to everyone who stopped by our display and we look forward to seeing you again next year!<br/><br/>New Publications<br/><br/>(1) 2007 NIST Language Recognition Evaluation Supplemental Training Set consists of 118 hours of conversational telephone speech segments in the following languages and dialects: Arabic (Egyptian colloquial), Bengali, Min Nan Chinese, Wu Chinese, Taiwan Mandarin, Cantonese, Russian, Mexican Spanish, Thai, Urdu and Tamil. <br/><br/>The goal of the NIST (National Institute of Standards and Technology) Language Recognition Evaluation (LRE) is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. NIST conducted three previous language recognition evaluations, in 1996, 2003 and 2005. The most significant differences between those evaluations and the 2007 task were the increased number of languages and dialects, the greater emphasis on a basic detection task for evaluation and the variety of evaluation conditions. 
Thus, in 2007, given a segment of speech and a language of interest to be detected (i.e., a target language), the task was to decide whether that target language was in fact spoken in the given telephone speech segment (yes or no), based on an automated analysis of the data contained in the segment. <br/><br/>The supplemental training material in this release consists of the following: <br/><br/>Approximately 53 hours of conversational telephone speech segments in Arabic (Egyptian colloquial), Bengali, Cantonese, Min Nan Chinese, Wu Chinese, Russian, Thai and Urdu. This material is taken from LDC&#39;s CALLHOME, CALLFRIEND and Mixer collections. <br/>Approximately 65 hours of full telephone conversations in Mandarin Chinese (Taiwan), Spanish (Mexican) and Tamil. This material was collected by Oregon Health and Science University (OHSU), Beaverton, Oregon. The test segments used in the 2005 NIST Language Recognition Evaluation were derived from these full conversations. <br/>In addition to the supplemental material contained in this release, the training data for the 2007 NIST Language Recognition Evaluation consisted of data from previous LRE evaluation test sets, namely, 2003 NIST Language Recognition Evaluation and 2005 NIST Language Recognition Evaluation.<br/><br/>2007 NIST Language Recognition Evaluation Supplemental Training Set is distributed on one DVD-ROM.<br/><br/>2009 Subscription Members will automatically receive two copies of this corpus.&nbsp;&nbsp;2009 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$1500.<br/><br/>(2) French Gigaword Second Edition is a comprehensive archive of newswire text data that has been acquired over several years by LDC. This second edition updates French Gigaword First Edition (LDC2006T17) and adds material collected from August 1, 2006 through December 31, 2008. 
<br/><br/>The two distinct international sources of French newswire in this edition, and the time spans of collection covered for each, are as follows: <br/><br/>Agence France-Presse (afp_fre) May 1994 - Dec 2008 <br/>Associated Press Worldstream, French (apw_fre) Nov 1994 - Dec 2008 <br/><br/>The seven-letter codes in parentheses include the three-character source name abbreviations and the three-character language code (&#34;fre&#34;) separated by an underscore (&#34;_&#34;) character. The three-letter language code conforms to LDC&#39;s internal convention based on the ISO 639-3 standard. These codes are used in the directory names where the data files are found and in the prefix that appears at the beginning of every data file name. They are also used (in all UPPER CASE) as the initial portion of the DOC &#34;id&#34; strings that uniquely identify each news story.<br/><br/>The overall totals for each source are summarized below. The &#34;Totl-MB&#34; numbers show the amount of data obtained when the files are uncompressed (i.e., approximately 15 gigabytes, total); the &#34;Gzip-MB&#34; column shows totals for compressed file sizes as stored on the DVD-ROM; and the &#34;K-wrds&#34; numbers are the number of whitespace-separated tokens (of all types) after all SGML tags are eliminated.<br/><br/><table border="1">
<tr><th>Source</th><th>#Files</th><th>Gzip-MB</th><th>Totl-MB</th><th>K-wrds</th><th>#DOCs</th></tr>
<tr><td>AFP_FRE</td><td>172</td><td>2408</td><td>4079</td><td>560000</td><td>2060803</td></tr>
<tr><td>APW_FRE</td><td>171</td><td>2280</td><td>1719</td><td>241324</td><td>872573</td></tr>
<tr><td>TOTAL</td><td>343</td><td>4688</td><td>5789</td><td>801324</td><td>2933376</td></tr>
</table>
<br/>The data has undergone a consistent level of quality control to eliminate out-of-band content and other obvious forms of corruption. 
Since the source data is generated manually on a daily basis, there will be a small percentage of human errors common to all sources: missing whitespace, incorrect or variant spellings, badly formed sentences, and so on, as are normally seen in newspapers. No attempt has been made to address this property of the data.<br/><br/>French Gigaword Second Edition is distributed on one DVD-ROM.<br/><br/>2009 Subscription Members will automatically receive two copies of this corpus. 2009 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$4000.<br/><br/>(3) NXT Switchboard Annotations brings together in NITE XML, a single XML format, the multiple layers of annotation performed on a transcript subset from Switchboard-1 Release 2 (LDC97S62). NXT Switchboard Annotations was developed in a collaboration among researchers from Edinburgh University, Stanford University and the University of Washington. <br/><br/>The original Switchboard corpus is a collection of spontaneous telephone conversations between previously unacquainted speakers of American English on a variety of topics chosen from a pre-determined list. A subset of one million words from those conversations was annotated for syntactic structure and disfluencies as part of the Penn Treebank project. Phonetic transcripts were generated by the International Computer Science Institute, University of California Berkeley and later corrected by the Institute for Signal Information Processing, Mississippi State University. The Penn Treebank transcripts provided the basis for the NXT Switchboard corpus, and the noun phrases from that subset were annotated for animacy. The Treebank transcript was then aligned with the corresponding subset from the corrected Mississippi State (MS-State) transcript in order to provide word timing information. 
Focus/contrast and prosodic annotations, as well as phone/syllable alignment, were next added to the annotations. The previous annotations of dialog acts and prosody were converted to NITE XML. Lastly, hand annotations for markables were added to provide information about their animacy and information structure, including coreferential links. <br/><br/>NXT is an open source toolkit that enables multiple linguistic annotations to be assembled into a unified database. It uses a stand-off XML data format that consists of several XML files that point to each other. The NXT format provides a data model that describes how the various annotations for a corpus relate to one another. For that reason, it does not impose any particular linguistic theory or any particular markup structure. Instead, users define their annotations in a &#34;metadata&#34; file that expresses their contents and how they relate to each other in terms of the graph structure for the corpus annotations overall. The relationships that can be defined in the data model draw annotations together into a set of intersecting trees, but also allow arbitrary links between annotations over the top of this structure, giving a representation that is highly expressive, easier to process than arbitrary graphs and structured in a way that helps data users. NXT&#39;s other core component is a query language designed specifically for working with data conforming to this data model. 
Together, the data model and query language allow annotations to be treated as one coherent set containing both structural and timing information.<br/><br/>NXT Switchboard Annotations is distributed via web download.<br/><br/>2009 Subscription Members will automatically receive two copies of this corpus. 2009 Standard Members may request a copy as part of their 16 free membership corpora. Non-members may license this data for US$25. NXT Switchboard Annotations is made available to LDC not-for-profit members and all non-members under the Creative Commons Attribution-Noncommercial Share Alike 3.0 license. NXT Switchboard Annotations is available to LDC&#39;s for-profit members under the terms of their For-Profit Membership Agreements.<br/>--------------------------------------------------------------------------------<br/>Ilya Ahtaridis<br/>Membership Coordinator<br/>-------------------------------------------------------------------<br/>Linguistic Data Consortium<br/>University of Pennsylvania<br/>3600 Market St., Suite 810<br/>Philadelphia, PA 19104 USA<br/>Phone: (215) 573-1275<br/>Fax: (215) 573-2175<br/>ldc@ldc.upenn.edu<br/><a href="http://www.ldc.upenn.edu" target="_blank" rel="external">http://www.ldc.upenn.edu</a>]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=155" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=155</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Book Recommendation: Second Language Pragmatic Acquisition]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2009-10-02T13:49:06+08:00</updated>
	  <published>2009-10-02T13:49:06+08:00</published>
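		  <summary type="html"><![CDATA[Second Language Pragmatic Acquisition: A Study of Chinese Learners&#39; Acquisition of English Request Speech Acts (第二语言语用习得：中国学习者英语请求语言行为习得研究)<br/>Author: Yang Xianju (杨仙菊) <br/>Publisher: National Defense Industry Press (国防工业出版社)<br/>Publication date: 2009<br/>ISBN: 9787118062663, 7118062669<br/>Price: ￥25.00<br/>Summary:<br/>Guided by the theory of Interlanguage Pragmatics, this book studies the characteristics and regularities of how learners whose native language is Chinese acquire English pragmatic competence. Using a cross-sectional design, it collects English request speech act discourse produced by learners at different levels of English proficiency and compares it with data from native speakers of English and of Chinese. The comparison shows that learners at different proficiency levels are at different stages of pragmatic development, and that their request speech acts display different interlanguage features. Besides language proficiency, the acquisition of L2 pragmatic competence is influenced by factors such as pragmatic transfer from the L1 and pragmatic input in classroom instruction. The findings offer some guidance for syllabus design and teaching practice in pragmatics instruction.<br/>Editor&#39;s recommendation:<br/>This book is a revision of the author&#39;s doctoral dissertation. Guided by the theory of Interlanguage Pragmatics, it studies the characteristics and regularities of how learners whose native language is Chinese acquire English pragmatic competence, as well as the factors that influence L2 pragmatic acquisition.<br/>Table of contents:<br/>Chapter One Introduction<br/>1.1 Research background<br/>1.2 Significance of the present study<br/>1.3 Organization of the book<br/><br/>Chapter Two Acquisitional Issues in Interlanguage Pragmatics<br/>2.1 Pragmatic competence<br/>2.1.1 Communicative competence<br/>2.1.2 A working definition of pragmatic competence<br/>2.2 The definition and scope of Interlanguage Pragmatics (ILP)<br/>2.3 The theoretical perspectives of L2 pragmatic acquisition<br/>2.3.1 Cognitive processing<br/>2.3.2 Relevance theory (RT)<br/>2.3.3 The complexification hypothesis<br/>2.4 Factors influencing L2 pragmatic acquisition<br/>2.4.1 Classroom instruction<br/>2.4.2 Pragmatic transfer<br/>2.4.3 Grammatical competence<br/>2.5 Methodology in doing acquisitional research<br/>2.5.1 Developmental research designs<br/>2.5.2 Data collection instruments<br/>2.6 Summary<br/><br/>Chapter Three Studies on the Speech Act of Requests<br/>3.1 Definition and taxonomy of requests<br/>3.1.1 Defining requests<br/>3.1.2 A taxonomy of requests<br/>3.2 Illocutionary aspects of requests<br/>3.2.1 Request strategies<br/>3.2.2 Internal modification (IM)<br/>3.2.3 External modification (EM)<br/>3.3 Studies on the speech act of requests<br/>3.3.1 Cross-cultural research on requests<br/>3.3.2 Developmental research on requests<br/>3.3.3 Studies on requests by Chinese researchers<br/>3.4 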
Summary<br/><br/>Chapter Four Research Design<br/>4.1 Research questions<br/>4.2 Informants<br/>4.2.1 Criteria in selecting informants<br/>4.2.2 Learner informants<br/>4.2.3 Native informants<br/>4.3 Research instruments<br/>4.3.1 Background questionnaire<br/>4.3.2 Developing request scenarios<br/>4.3.3 The production questionnaire and sociopragmatic assessment questionnaire<br/>4.4 Data collection<br/>4.5 Data analysis framework<br/>4.5.1 Request strategies<br/>4.5.2 Internal modification<br/>4.5.3 External modification<br/>4.5.4 Procedures of data analysis<br/>4.6 Summary<br/><br/>Chapter Five Data Analysis and Discussion<br/>5.1 Request strategies<br/>5.1.1 Total number of strategies<br/>5.1.2 Directness<br/>5.1.3 Conventional indirectness<br/>5.1.4 Non-conventional indirectness (Hints)<br/>5.1.5 Combination of individual strategies<br/>5.1.6 Opting out<br/>5.1.7 Summary and discussion of request strategies<br/>5.2 Internal modification<br/>5.2.1 Overall performance of internal modification<br/>5.2.2 Syntactic downgraders<br/>5.2.3 Lexical/phrasal downgraders<br/>5.2.4 Upgraders<br/>5.2.5 Combination of IM<br/>5.2.6 Zero IM<br/>5.2.7 Summary and discussion of internal modification<br/>5.3 External modification<br/>5.3.1 Overall performance of supportive moves<br/>5.3.2 Employment of supportive moves by situation<br/>5.3.3 Summary and discussion of external modification<br/>5.4 Situational variation in requesting behavior<br/>5.4.1 Sociopragmatic assessment questionnaire<br/>5.4.2 Assessment of social parameters across groups<br/>5.4.3 Assessment of social parameters across situations<br/>5.4.4 Perception of social parameters and choice of request strategies<br/>5.4.5 Summary and discussion of sociopragmatic competence<br/>5.5 The effect of L1 pragmatic transfer on L2 request acquisition<br/>5.5.1 Request strategies and L1 influence<br/>5.5.2 Internal modification and L1 influence<br/>5.5.3 External modification and L1 influence<br/>5.5.4 Summary of 
transfer effect<br/>5.6 Summary: Overall developmental features of English requests<br/><br/>Chapter Six The Role of Instruction in L2 Pragmatic Development<br/>6.1 Negative effect of instruction on L2 pragmatic development<br/>6.1.1 Metapragmatic input<br/>6.1.2 Classroom discourse<br/>6.1.3 Grammar-oriented Pedagogy<br/>6.2 Need for instruction of L2 pragmatics<br/>6.2.1 Curricula<br/>6.2.2 Teaching materials<br/>6.2.3 Instruction of L2 speech acts<br/>6.3 Summary<br/><br/>Chapter Seven Conclusion<br/>7.1 Summary of findings<br/>7.2 Factors influencing L2 pragmatic development<br/>7.3 Limitations of the present study and directions for future research<br/>References<br/>Appendix 1 Questionnaire for Chinese learners<br/>Appendix 2 Questionnaire for Chinese native speakers<br/>Appendix 3 Questionnaire for English native speakers<br/>……<br/>Excerpt:<br/>Motivated by an acquisitional approach to interlanguage research, the present study meets the need for developmental research of Interlanguage Pragmatics and the need for exploration into learners of English with different L1 backgrounds, in particular, Chinese, thus enriching the bulk of research in the field of both SLA and pragmatics. Thus far, a large number of Interlanguage Pragmatics studies have primarily focused on the comparison of the realization patterns of requests in two languages (e.g., Blum-Kulka &amp; Olshtain, 1986; Blum-Kulka et al., 1989; Carcia, 1993; Hill, 1997; Hassall, 2003; Byon, 2004) rather than exploring the various factors influencing pragmatic development (e.g., L1 transfer, input, instruction). So is the case in China. For example, substantial cross-cultural studies deal with the similarities and differences between Chinese and English realization patterns of requests, the universality of politeness phenomena, the social factors influencing the performance of requests, the pragmatic failures found in inter-cultural communication (including requests), and so on. However, few studies have explored how Chinese 
learners acquire English pragmatic knowledge.]]></summary>
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=151" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=151</id>
  </entry>	
		
  <entry>
	  <title type="html"><![CDATA[New Book Recommendation: The Oxford Handbook of Computational Linguistics]]></title>
	  <author>
		 <name>admin</name>
		 <uri>http://www.luweixmu.com/home/</uri>
		 <email>luwig@xmu.edu.cn</email>
	  </author>
	  <category term="" scheme="http://www.luweixmu.com/home/default.asp?cateID=8" label="语言学" /> 
	  <updated>2009-10-02T13:34:31+08:00</updated>
	  <published>2009-10-02T13:34:31+08:00</published>
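		  <summary type="html"><![CDATA[Chinese title: 牛津计算语言学手册<br/>English title: The Oxford Handbook of Computational Linguistics<br/>Series: Contemporary Foreign Linguistics and Applied Linguistics Library (当代国外语言学与应用语言学文库)<br/>Author: Mitkov (米特科夫) (UK)<br/>Publisher: Foreign Language Teaching and Research Press (外语教学与研究出版社)<br/>Publication date: 2009-09-11<br/>ISBN: 978-7-5600-8913-3<br/>Format: 16mo (16开)<br/>Pages: 836<br/>Binding: paperback<br/>Price: 109.90<br/>Description:<br/>This book is a handbook-style monograph on computational linguistics. It collects survey articles written by 50 scholars, including linguists, computer scientists, and language engineers. It comprehensively reflects the latest results in the main areas of computational linguistics abroad and offers a window onto developments in the field outside China. <br/>The chapters are consistent in style and well coordinated in content, forming a coherent whole. They use engaging examples to introduce difficult technical problems and avoid complicated mathematical formulas wherever possible, which makes the book especially suitable for readers with a humanities background. For readers who are interested in or new to computational linguistics, it is also an essential reference.<br/>Chapter contents:<br/>Preface <br/>Abbreviations <br/>Introduction <br/>PART I FUNDAMENTALS <br/>PART II PROCESSES, METHODS, AND RESOURCES <br/>PART III APPLICATIONS]]></summary>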
	  <link rel="alternate" type="text/html" href="http://www.luweixmu.com/home/article.asp?id=150" /> 
	  <id>http://www.luweixmu.com/home/default.asp?id=150</id>
  </entry>	
		
</feed>
