NSF ITR Program
ITR Project # 0122466
Multilingual Access to Large Spoken Archives
University of Southern California
IBM Thomas J. Watson Research Center
Johns Hopkins University
University of Maryland
University of West Bohemia
AITIA International, Inc.
Digital archiving of the spoken word is emerging as an important method for capturing the human experience; in the future a great deal of our cultural heritage will be archived in this form. If we are to learn from our past, teachers, students, historians, and others will need effective access to these resources. The enormous scale of these collections and the tremendous expense of manually cataloging multilingual audiovisual materials will make it impractical to rely on manual techniques alone. At present, however, fully automatic techniques are far from adequate.
We will overcome these difficulties by utilizing a unique collection assembled by the Shoah Visual History Foundation. Presently the world's largest coherent archive of videotaped oral histories, it contains 116,000 hours of digitized interviews in 32 languages from 52,000 survivors, liberators, rescuers and witnesses of the Nazi Holocaust.
We propose to dramatically improve access to large multilingual collections of recorded speech by advancing the state of the art in several technologies that work together to achieve this objective.
In all of these efforts, we will automate the transfer of capabilities developed originally for English to other languages. We will provide access to multilingual materials by combining knowledge-based and corpus-based techniques to extend existing thesauri to new languages and by supporting cross-language searching of manually prepared segment-level summaries and automatic speech recognition transcripts. Advancing the state of the art in this technology will produce significantly improved access to this collection as well as to other artifacts of our cultural heritage.
Site maintained by katyn [at] umd.edu.