What is Language Technology?
by Hans Uszkoreit
Language technology (LT) — often also referred to as human language technology — comprises computational methods, computer programs and electronic devices that are specialized for analyzing, producing, modifying and translating text and speech. Such systems must be based on some knowledge or mathematical models of human language. Therefore language technology defines the engineering branch of computational linguistics. At the same time, the successful methods and insights of language technology belong to the core of artificial intelligence, since language is a central element of human cognition.
The field of LT is currently experiencing a boom because of both strong scientific advances and a greatly increased market demand. Considerable progress can be attributed to new powerful techniques in machine learning but also to the increased availability of mass data and formalized world knowledge. A further factor is big-data technology, based on advances in hardware and algorithms for big-data processing.
computers to communicate with people.
Although existing LT systems are far from modeling the human language faculty, they have numerous possible applications. The goal is to create software solutions that master some aspects of human language use. Such products are going to change our lives. They are urgently needed for improving human-machine interaction, since the main obstacle in the interaction between people and computers, or between humans and technology in general, is a communication problem. Today's machines do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.
should listen and speak.
Natural language interfaces enable the user to communicate with the computer in French, English, German or another human language. Among the applications of such interfaces are database queries, information retrieval from texts or from the entire web, dialogue systems for decision support, and the control of cars, robots and any kind of machinery. Recent advances in the recognition of spoken language have given rise to the expectation that conversational interfaces will soon be the predominant scheme for human-technology interaction.
Communication with technology using spoken language will have a lasting impact upon the work environment, and completely new areas of application for information technology will open up. Through the emerging Internet of Things, buildings, vehicles and simple household devices will become responsive to human language queries and commands. However, spoken language needs to be combined with other modes of communication such as pointing with mouse or finger. If such multimodal communication is finally embedded in an effective general model of cooperation, we will have succeeded in turning the machine into a partner.
also help people communicate with each other.
Much older than communication problems between human beings and machines are those between people with different mother tongues. One of the original aims of computational linguistics has always been the fully automatic translation between human languages. From bitter experience scientists have realized that they are still far away from achieving the ambitious goal of correctly translating unrestricted texts. Nevertheless, they have been able to create software systems that simplify the work of human translators and clearly improve their productivity. Less-than-perfect automatic translations have become of great help to information seekers who have to search through large amounts of text in foreign languages. A big boost to automatic translation is about to happen due to recent progress in neural machine translation, i.e., the translation of human language by neural nets that have learned from large volumes of professional translations. For the first time, machines can now reach near-human translation quality for some languages and text types, and in a fraction of the time. The dreams of the babelfish interpreter or of ambient translation, which just happens when you speak wherever you happen to be, have come within reach.
the fabric of the web.
The rapid growth of the Internet/WWW and the emergence of the information society have posed exciting new challenges to language technology. Although the new media combine text, graphics, sound and movies, the whole world of multimedia information can only be structured, indexed and navigated through language. For browsing, navigating, filtering and processing the information on the web, we need software that can get at the contents of documents. Language technology for analyzing digital content is a necessary precondition for turning the wealth of digital information into collective knowledge. Today’s search will be replaced by immediate access, i.e., by answering questions and sometimes even by offering information before one can ask for it.
The increased multilinguality of the web constitutes an additional challenge for language technology. The global web can only be mastered with the help of multilingual tools for indexing, navigating and querying. Systems for crosslingual information and knowledge management will soon surmount language barriers for e-commerce, education and international cooperation.
combines ambitious visions and realistic applications.
The future of language technology will be determined by the growing need for user-friendly software. Even though the successful simulation of human language competence is not to be expected in the near future, researchers have numerous realistic short-term goals involving the design, realization and maintenance of systems which facilitate everyday work, such as grammar checkers for word processing and content authoring programs, intelligent email sorting and response generation, document categorization and summarization software, and systems for extracting selected information from large volumes of text. Thus work on language technology spans a wide spectrum of ambitious tasks ranging from the study of human language and thought via the development of novel computational techniques all the way to the marketing of profitable LT products.
The real boom is yet to come and since we won’t wait, it will come very soon.
The currently perceived upswing of LT is just the first stage of a much larger boom. The real explosion will happen in the near future when research manages to combine the following five components of future LT: powerful deep learning techniques, tractable semantic technologies, performant big-data technologies, large structured knowledge resources and very large volumes of meaningful data, especially language data enriched by context data.
The chatting companions Siri, Alexa and Google Home are just the first harbingers of a new age of technology and collective knowledge, in which the trend of growing alienation is reversed. This reversal will lead to a technology that interacts with its human masters in a friendly and intuitive manner and at the same time also connects people in a more natural way to the entire world around them by providing them access to all of human knowledge through the same smooth and effective means we use to interact with each other: listening, understanding and talking.
© 2010, 2016 Hans Uszkoreit