Sun Bo,
Doctoral student specializing in forensic linguistics, doctoral program in Foreign Linguistics and Applied Linguistics.
Research interests
Mainly discourse analysis, courtroom language research, language evidence and related areas.
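Current research topic
The current topic centers on text identification within language evidence, including authorship attribution and the determination of plagiarism in translation. Authorship attribution is an important aspect of language evidence. Howald (2008) divides authorship attribution techniques into two broad categories: the forensic stylistic approach and the stylometric approach. The former generally selects prescriptive variables; the literature on this approach seldom reports its accuracy, and the few recorded figures are only around 67-72% (e.g. Koppel and Schler, 2003). The latter focuses mainly on quantifiable variables and achieves higher accuracy, reaching above 80% (e.g. Baayen et al. 2002). In addition, in the United States, because of the Daubert standard, methods with a known or knowable error rate are more readily accepted by courts when language evidence is analyzed, so authorship attribution currently relies more on the stylometric approach. With this approach, researchers typically identify certain idiolectal features in the texts, treat them as variables for statistical analysis, and then draw a conclusion.
In authorship attribution, the forensic stylistic approach mostly looks for salient features directly in the linguistic forms at the end of the continuum, such as capitalization, punctuation and date format, and analyzes them as idiolectal features. As computers become ever more common, some of these features, such as date formats, are usually generated automatically by the system. Moreover, such features tend to be conspicuous and are easy to alter deliberately in order to disguise the author's identity. This makes authorship attribution more complex and less accurate, so a purely forensic stylistic approach is subject to certain limits.
The stylometric approach also starts from linguistic form, but it concentrates on quantifiable variables such as word length, sentence length, the frequency of function words and the ordering of parts of speech. It treats the known texts as samples and applies statistical techniques to infer that, owing to cognitive mechanisms and the author's social experience of language, the author has certain characteristic habits of expression, which are treated as idiolectal features and compared across texts. Because these habits are hard to notice intuitively, they are relatively stable, and the analysis is correspondingly more accurate.
The advice to doctoral and master's students who wish to pursue research in this direction is to gain a systematic understanding of forensic linguistic theory, study the latest scholarship in the field, and master the full range of statistical analysis techniques.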
Sun Bo, a doctoral student in the National Key Research Center for Linguistics and Applied Linguistics, conducts research in forensic linguistics. His main interests are discourse analysis, courtroom language research and language evidence. In preparation for his doctoral dissertation, he now focuses on text identification, including authorship attribution and plagiarism in translation.
Howald (2008) divides authorship attribution techniques into two broad categories: the forensic stylistic approach and the stylometric approach. The former usually selects prescriptive variables and, where accuracy is reported at all, shows comparatively low rates of 67-72% (Koppel and Schler 2003). The latter focuses mainly on quantifiable variables and performs better, with accuracy rates in the high 80% range (e.g. Baayen et al. 2002). In addition, under the Daubert criteria, scientific methods with known or potential error rates are more readily accepted in US courtrooms, so the stylometric approach is now the more widely used of the two. Researchers adopting it typically locate a set of characteristics in the texts, establish them as variables and conduct statistical analysis.
Researchers using the forensic stylistic approach mostly look for salient characteristics in the linguistic forms, such as upper and lower case, punctuation and date format. These characteristics are treated as idiolect and analyzed accordingly. With the widespread use of computers, however, some of them, such as date format, are generated automatically by computer systems. Moreover, because these characteristics are easily noticed, authors can change them without difficulty to disguise their identity. As a consequence, authorship attribution becomes more complex and its results less accurate than expected.
The stylometric approach, on the other hand, also searches for clues in linguistic forms, but researchers primarily select quantifiable variables from sample texts, such as word length, sentence length and the frequency of function words. They then apply statistical techniques to infer that the author of these texts has certain writing habits, which are treated as idiolect and compared across texts. Since most of these writing habits are scarcely perceptible, they tend to be more stable, and the accuracy of authorship attribution improves accordingly.
Doctoral and master's students interested in this area are advised to acquire a comprehensive knowledge of forensic linguistics and to read the latest literature in fields such as authorship attribution, plagiarism detection and forensic speaker identification. Proficiency in statistical techniques is also of great importance.
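As a rough illustration of the quantifiable variables named above (word length, sentence length, frequency of function words), the following Python sketch computes a small profile for a text and a crude distance between two profiles. It is only a sketch under stated assumptions, not the method of any study cited here: the function-word list, the names `stylometric_profile` and `function_word_distance`, and the choice of a plain absolute-difference distance are all hypothetical and chosen for brevity.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration); real stylometric studies use far larger lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def stylometric_profile(text):
    """Return mean word length, mean sentence length (in words) and the
    relative frequency of each function word in FUNCTION_WORDS."""
    # Naive sentence split on runs of ., ! or ?; abbreviations are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive tokenization: lowercase alphabetic words, optional inner apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words or not sentences:
        raise ValueError("text contains no words")
    counts = Counter(words)
    profile = {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }
    for fw in FUNCTION_WORDS:
        profile["freq_" + fw] = counts[fw] / len(words)
    return profile


def function_word_distance(p, q):
    """Sum of absolute differences between the function-word frequencies of
    two profiles (a crude comparison, not an established stylometric measure)."""
    keys = ["freq_" + fw for fw in FUNCTION_WORDS]
    return sum(abs(p[k] - q[k]) for k in keys)


if __name__ == "__main__":
    known = stylometric_profile("The cat sat. The dog ran to the park!")
    questioned = stylometric_profile("It is said that a storm is coming in.")
    print(known["mean_word_length"], known["mean_sentence_length"])
    print(function_word_distance(known, questioned))
```

In practice a profile built from an author's known texts would be compared with that of a questioned text, and any conclusion would rest on proper statistical testing with a known error rate rather than on a single raw distance.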