Speech by Guido Gybels: Changing lives through technology: a glimpse of the future for deaf and hard of hearing people



Good afternoon ladies and gentlemen, it is a pleasure to see colleagues from all across Europe gathered here today in Oslo. First of all, I would like to thank EFHOH, in particular Lillian and Marcel, for organising this event and for the invitation to speak.

But may I first spend a few words introducing myself and the organisation I work for. RNID, the Royal National Institute for Deaf People, is the largest charity representing the 9 million deaf and hard of hearing people in the UK. We are a membership charity with over 37,000 members, giving us the largest membership of all disability charities in the UK. We currently employ over 1,300 staff and we spent about 47 million pounds during our last financial year trying to dramatically improve the lives of deaf, hard of hearing and speech-impaired people.

Our mission is to achieve a radically better quality of life for deaf and hard of hearing people. We do this of course by campaigning and lobbying, by providing information and raising awareness and by delivering training courses and consultancy on deafness in particular and disability in general. In addition, we are the largest single communication support provider in the UK and our services include sign language interpreters, lip speakers, as well as speech-to-text operators and note takers. Furthermore, RNID runs educational programmes seeking lasting change in education as well as comprehensive employment programmes to help deaf people into work. We also operate care services for deaf and hard of hearing people with additional needs in care homes throughout the UK.

RNID of course also manages RNID Typetalk, the national telephone relay service that was founded in the late eighties, allowing textphone users to communicate with voice telephone users by translating text into voice and vice versa. Similarly, we established RNID SignTalk, a sign language relay service, a few years ago. RNID Products is the main supplier in the UK of equipment and products for deaf and hard of hearing people and we have extensive and unique programmes for social, biomedical and technical research.

My own New Technologies department is a research and development group consisting of engineers, scientists, computer programmers and researchers. We pursue every opportunity to harness information and communication technology to tear down the barriers to opportunity and fulfilment that sadly enough still exist in our modern world.

This brings me to my talk today. Technology, and information and communication technology specifically, plays an ever more dominating role in our daily lives. Being a fully enabled citizen in this modern world can no longer be separated from the extent to which we have access to, and interact with, services and products such as television, telecommunication and the Internet.

This new world is often called "the Information Society" and it has spectacularly changed the way we live, learn, work, and also entertain ourselves.

In terms of what this means for deaf and hard of hearing people, I would say that it is very much a two-edged sword. There are clear opportunities to tear down some of the barriers to opportunity and fulfilment that deaf and hard of hearing people face in daily life. But at the same time, there is also the danger of new barriers being raised and of the social and economic exclusion or deprivation that will result from these new obstacles.

This is something I have often spoken about in the past and many of you will have heard me discuss this dual nature of technological and scientific evolution. So, today, I want to deal with different aspects of the ICT revolution: I want to look ahead, to explore what the future might look like for deaf and hard of hearing people, rather than spending a lot of time talking about what went wrong in the past.

Don't get me wrong: it is important that we learn the lessons from the past, for example about how new products and services can result in serious societal exclusion for large groups of people when they are not developed with the needs and preferences of all users in mind. We all know the example of how GSM mobile telephones have revolutionised our lives: in countries like the UK, penetration of mobile phones is already above 100%, meaning that on average people have more than one mobile handset. Yet, because the needs of deaf and hard of hearing people were not fully taken into account as this technology was developed and then rolled out, we ended up with a situation where deaf and hard of hearing people faced new barriers to opportunity and fulfilment: mobile textphones were not part of the standard, there are problems with interference between mobile phones and hearing aids, there are shortcomings in audio quality and volume control, and so on. So, we must learn from these examples from the past to make sure we don't repeat the same mistakes in the future. And I will talk later on about how we as users, user groups and lobbying and campaigning organisations have to take responsibility for bringing these lessons to the attention of policy makers, the industry and all other stakeholders.

But today, I want to mainly look forward, to the future and to the opportunities that converged, digital technology offers in our battle to make the Information Society a more equal place for deaf and hard of hearing people.

In doing so, you will hear some terminology again and again and I will refer a lot to new, digital networks and products. So perhaps, it is useful to quickly explain a few basic facts about these digital technologies. You don't need to understand every detail of what I am about to show you, but it will hopefully help you to comprehend better what I am talking about.

The first thing that I would like to explain is the term digital technology. Nature itself is often not digital; rather it is "analogue", meaning continuous, with gradual changes across that continuum. An analogue audio signal in an analogue device, for example, is an electrical signal that directly represents the continuous changes in frequency and amplitude of the sound waves. In other words, the electrical signal is equivalent, similar to the sound waves.

A digital signal on the other hand consists of a clearly defined set of discrete values. Most digital systems today are binary systems and for the rest of my talk when I say digital, you can assume I mean binary digital.

Such a system only knows two distinct states: 0 and 1, sometimes also called true and false or high and low. Now of course, it is not really enough to have two digits or values to describe the real world: nature is more complex than that. But you can combine several digits to describe more than two states. The more digits you use, the more different values you can express. For example, if I use 2 digits, or bits as we usually call them, each of which can be 0 or 1, then I can in fact represent 4 different values: 00, 01, 10 and 11. By adding more digits, the number of values increases by powers of 2, so with 4 bits, I can already represent 16 values and when I use 8 bits this goes up to 256. What you should remember is: the more bits I use, the more values I can express. More values means more differentiation and thus more detail. In other words, if I use more bits, I can represent reality with higher accuracy.
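To make that arithmetic concrete, here is a minimal sketch in Python. It is only an illustration: the function names are made up for this example and were not part of the talk.

```python
def distinct_values(bits: int) -> int:
    """Number of different values that `bits` binary digits can represent."""
    return 2 ** bits


def all_patterns(bits: int) -> list[str]:
    """List every bit pattern of the given width, for example 00, 01, 10, 11."""
    return [format(n, f"0{bits}b") for n in range(distinct_values(bits))]


if __name__ == "__main__":
    # Every extra bit doubles the number of values that can be expressed.
    for width in (2, 4, 8, 24):
        print(width, "bits ->", distinct_values(width), "values")
    print(all_patterns(2))
```

Two bits give the four patterns listed above, and 24 bits give the roughly 16.7 million values that matter for the picture example that follows.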

Here is a simple example of that relationship between detail, or resolution as we like to call it, and the number of bits used to describe reality. This is a picture of some fat bloke in the gardens of Blenheim Palace. I have zoomed in on a part of the picture. The first picture is a fairly high-resolution image, using 24-bit colour: in other words the colour of each dot, each pixel in the picture, is expressed by 24 bits, so 24 ones and zeros. So, each pixel can have one colour out of a possible 16.7 million.

The second image is the same picture, but now only using 8 bits per pixel, or one third as many. The big picture still looks a lot like the original, but zooming in you can see that we have lost quite a bit of detail.

The last picture uses only 4 bits per pixel, leaving only 16 colours to choose from, and you can see that even the large image is now badly degraded and that our representation of reality has lost most of its detail.
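The loss of detail in these pictures can be sketched in a few lines of Python. This is a simplification under stated assumptions: it reduces a single 8-bit colour channel with evenly spaced levels, whereas real 8-bit and 4-bit images usually pick their colours from a palette. The function name `quantise` is hypothetical.

```python
def quantise(value: int, bits: int) -> int:
    """Reduce an 8-bit channel value (0-255) to `bits` bits of detail.

    The result is scaled back to the 0-255 range so it can be compared
    with the original value.
    """
    levels = 2 ** bits           # how many distinct shades remain
    step = 256 // levels         # width of each bucket on the 0-255 scale
    index = value // step        # bucket the original value falls into
    return index * step + step // 2   # midpoint shade that stands in for the bucket


if __name__ == "__main__":
    # Two slightly different shades stay distinct at 8 bits,
    # but collapse into the same shade at 2 bits.
    for bits in (8, 4, 2):
        print(bits, "bits:", quantise(100, bits), quantise(110, bits))
```

With fewer bits, neighbouring shades fall into the same bucket and can no longer be told apart, which is the effect visible when zooming in on the lower-quality pictures.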

Now, why do we use digital at all - if most of the world around us is analogue in nature, why not just use analogue technologies? There are several reasons for that, but one of the most important is the need to process and transmit information accurately. When you send an analogue signal from one point to another, or indeed when you process it within an electronic circuit, noise is added to the signal. This noise accumulates and step-by-step changes the signal. So, what goes in at one end is no longer the same as what comes out at the other end.

With digital technology on the other hand, adding noise does not change the information: a one remains a one, a zero remains a zero. I am slightly simplifying all of this, but the important thing to remember is that a digital signal can much more accurately be processed and transmitted.
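The difference can be sketched with a small simulation in Python. It carries the same simplification as the explanation above: the noise is assumed to stay well below half the gap between a zero and a one, so the digital stages never misread a bit. Real links do suffer occasional bit errors. All names and parameter values are illustrative.

```python
import random


def noisy_hop(signal, noise_level, rng):
    """Pass a signal through one stage that adds bounded random noise to every sample."""
    return [s + rng.uniform(-noise_level, noise_level) for s in signal]


def regenerate(signal):
    """Digital stage: decide for each sample whether it is a 0 or a 1."""
    return [1 if s > 0.5 else 0 for s in signal]


def transmit(bits, hops=20, noise_level=0.2, seed=1):
    """Send the same bits through `hops` stages, once as analogue and once as digital."""
    rng = random.Random(seed)
    analogue = [float(b) for b in bits]
    digital = list(bits)
    for _ in range(hops):
        # Analogue: the noise of every stage is carried along to the next one.
        analogue = noisy_hop(analogue, noise_level, rng)
        # Digital: each stage cleans the signal up again before passing it on.
        digital = regenerate(noisy_hop(digital, noise_level, rng))
    return analogue, digital


if __name__ == "__main__":
    original = [1, 0, 1, 1, 0, 0, 1, 0]
    analogue, digital = transmit(original)
    print("analogue:", [round(a, 2) for a in analogue])
    print("digital: ", digital)
```

The analogue samples drift away from their starting values a little more at every stage, while the digital bits arrive exactly as they were sent.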

Also, having discrete values means that you can build systems using very simple logic, as indeed is the case in computers: they use very simple concepts such as AND and OR circuits. This means that in many respects the design and handling of digital information is actually simpler than is the case for analogue information, and certainly designing the logic is a much simpler affair.
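As an illustration of how far such simple building blocks go, the following Python sketch combines AND, OR and NOT into a circuit that adds two bits (a "half adder"). This is a textbook example chosen for this write-up, not something shown in the talk.

```python
def AND(a: int, b: int) -> int:
    """1 only when both inputs are 1."""
    return a & b


def OR(a: int, b: int) -> int:
    """1 when at least one input is 1."""
    return a | b


def NOT(a: int) -> int:
    """Flip a single bit."""
    return 1 - a


def XOR(a: int, b: int) -> int:
    """1 when exactly one input is 1, built only from AND, OR and NOT."""
    return AND(OR(a, b), NOT(AND(a, b)))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits and return (sum bit, carry bit)."""
    return XOR(a, b), AND(a, b)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, "+", b, "->", half_adder(a, b))
```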

And of course, because digital information is nothing more than a series of ones and zeros, it doesn't really matter what you use the system for or what information it represents: digital systems are therefore more generic. For example, the traditional, analogue PSTN telephone system is a network whereby the signals sent over the phone line are electrical representations of speech. And that is what the network and all its components are designed for. If you want to send something else across this network, then you have to represent it as a telephony signal, which might not be very suitable. That is by the way what a computer modem does: it takes your digital information, your ones and zeros, and transforms it into telephone signals that can be sent over the analogue PSTN.

In contrast, a digital network like for example the Internet is nothing else but a way of getting ones and zeros from one point to another. The network does not really care what those ones and zeros represent and it does not need to know either. So, whether or not you're sending an email, talking over an Internet phone to someone, downloading a picture or watching a video clip on YouTube, it all boils down to ones and zeros going back and forward.

Sorry for putting you through all of this and thank you for your patience in sitting through my abridged version of an analogue versus digital technology seminar. Let's go back to the Information Society and what the future might bring for deaf and hard of hearing people.

As I have said many times before, the ability to communicate effectively, anytime, anywhere with anyone is not a gadget, it is a critical component of citizenship today. Clearly, access to telephony has been one of the major challenges for deaf and hard of hearing people and there is no doubt that being unable to communicate over the phone as effectively as hearing people can, has a serious impact on their employability and their economic status, but also on their social interactions.

The three most important challenges for hard of hearing people with regard to telephony concern voice quality, including tone and volume control; hearing aid compatibility; and the use of text as an alternative in one or both directions.

New technologies are addressing all three of these questions and I will briefly explain what we can expect to see in the future.

Let's talk about voice quality first. The hearing range for humans runs from about 20 Hz up to 18,000 to 20,000 Hz, although for speech the range is typically between about 60 Hz and 12,000 Hz. However, the traditional analogue telephone network, the PSTN, has a frequency range of only about 3,500 Hz. So, when transmitting speech over the telephone, this typical frequency range of approximately 12 kHz must be squeezed into a range less than one third of that size. As a result, someone's voice sounds distinctly different, with less contrast, when heard over the phone. That is why, for example, it is hard to spell over the telephone and we quickly revert to saying things like "G for Golf", "U for Uniform", etc.

Hearing people can still recognise speech even if it is compressed into a much smaller frequency range like this because there is quite a bit of redundancy in speech and in fact we recognise broader patterns in their context, rather than individual phonemes.

Remember the pictures that I showed earlier on? With speech telephony, we have a similar effect: the more I reduce the detail in the way I represent speech, the less it will sound like the real thing. At some point, the loss of detail will become so big that I can no longer understand the information.

With telephony, the basic voice signal from the traditional telephone network is already fairly low resolution: a lot of detail is already gone. But where hearing people can still manage to understand it, for people with a hearing loss it can mean that they will no longer be able to use the phone, or only with great difficulty.

There are a number of ways however in which converged, digital technology can offer a way forward. Remember again the story about the number of bits. Modern digital devices are much more powerful today than they used to be. So, they can more easily process more information and do it faster. Also, our networks have evolved and have become faster as well. The development of broadband technology means that we now have more bandwidth available at lower cost than we had 10 or 20 years ago.

So, newer telephone technologies such as 3G and VoIP can offer wideband voice telephony, based on higher bandwidth codecs, so using more bitrate, more bits, more ones and zeros, and thus offering better quality audio. This better quality means that a lot of hard of hearing people who could not use the phone before, now can use it because there is a broader frequency range and a higher sampling rate. This means that the speech is clearer and more comprehensible.
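The link between frequency range, sampling rate and bitrate can be sketched as follows in Python. The 8,000 samples per second at 8 bits per sample is the usual figure for digital narrowband telephony. The 16,000 samples per second at 16 bits used for the wideband case is an uncompressed example chosen for illustration; real wideband codecs compress the signal and need far fewer bits. The function names are hypothetical.

```python
def min_sampling_rate(highest_frequency_hz: int) -> int:
    """A signal must be sampled at least twice as fast as its highest frequency."""
    return 2 * highest_frequency_hz


def pcm_bitrate(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Bits per second needed to send uncompressed samples."""
    return sampling_rate_hz * bits_per_sample


if __name__ == "__main__":
    print("narrowband needs at least", min_sampling_rate(3500), "samples per second")
    print("narrowband:", pcm_bitrate(8000, 8), "bit/s")
    print("wideband (uncompressed example):", pcm_bitrate(16000, 16), "bit/s")
```

Doubling the sampling rate doubles the frequency range that can be carried, and it is this extra range, paid for with more bits, that makes wideband speech clearer.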

I have two examples of telephone speech to illustrate this. I know that not everyone will be able to hear this, so you will also see a graphic representation of the sound. This is the first example: traditional analogue telephony speech.

The second example uses a wideband audio codec, so uses more bits. Those who are able to hear the audio will notice that it is distinctly clearer and this is also visible in the graphic representation of the signal.

Unfortunately, it will take a while before you can call anyone anywhere and use these wideband audio codecs, because of course both ends of the conversation must support the newer codec and the network in between the two end points must have enough bandwidth as well. But high speed networks are becoming more and more standard and more and more newer devices or even software clients are now supporting these newer codecs. It will take some time, but the promise of better quality audio is going to become reality.

Nevertheless, for significant numbers of deaf and hard of hearing people, using voice telephony remains impossible, even with better quality audio. So, using text as an alternative to voice is then the next step.

I can hear you say that this is nothing new. And indeed, captioned telephones, whereby the incoming speech is also translated into text, and textphones are not exactly new concepts. And of course that is true, but captioned telephones and textphones have also failed in many respects.

Captioned telephones and text telephony in their traditional, analogue form are not mainstream products. And because they don't reside in the mainstream market, they haven't benefited from competition and mainstream developments. As a result, the user experience in text telephony is nowhere near what mainstream telephony users enjoy. Captioned telephone services and textphones are not widely available; in many parts of Europe they are even completely absent. Because of the existence of a plethora of textphone protocols, users face all kinds of difficulties in trying to call other textphone users. Can you imagine that hearing people would only be able to call other users if they were on the same local network and were using similar phones? Yet, this is exactly what textphone users have to put up with. Network support for setting up captioned or textphone calls is extremely limited, not to mention facilities to allow different textphones to work together or to make international calls. In the UK at least we have the TextDirect platform that allows non-compatible textphones to talk to each other, but it is not yet available across different networks, let alone in other parts of the world. As devices, textphones are not nearly as user-friendly and modern as other phone terminals, and textphone users certainly do not have the same kind of choice in equipment and services that voice users have. While hearing people can walk into any phone shop in the high street and buy whatever equipment they like, deaf and hard of hearing people do not have that freedom of choice.

And again, I know that I am repeating myself, but this is an important issue: without that freedom of choice, without the ability to call anyone, anytime in the same ubiquitous manner that hearing people can pick up a phone or use a mobile to call somebody, without that ability deaf and hard of hearing people are hampered in their ability to be fully enabled citizens of the modern world.

Now, again, I want to look ahead, rather than to the past. New technology is changing the face of real-time text as well. With Internet technology comes the promise of one single standard, one single technology for real-time text. This technology will not be, as was the case for analogue textphones, a specialised, non-mainstream facility, it is already embedded at the very core of new multimedia products and services. Arnoud and others, including myself, have worked with technical communities such as the Internet Engineering Task Force to make sure that the telecoms standards, networks and terminals of tomorrow can deal with text, or indeed any combination of text, voice, video, in any direction. And those networks of tomorrow will support text relay services, or video relay services, as well as telephone captioning so that a translation of one medium into another can automatically be provided, based on individual abilities and preferences of the calling parties. So, if you are a textphone user calling a hearing person, the network will automatically route any speech coming back from the hearing person through a text relay service and you will receive it in text. Or, in text and speech if you like. Or even with video as well. And every other party in the call will be able to do the same. And best of all, this will all go over your standard broadband connection, for no extra cost.

And while the full realisation of all of this will take some time, what I am describing is already starting to happen as we speak. RNID has developed mobile textphones for many years and they have been available in the UK since 2002. The newest generation of this software, called TalkByText Mobile Edition, uses this new, open Internet standard for real-time text. There is a Windows version of TalkByText that turns any PC into a textphone, but using the proper keyboard and the good quality screen of a computer, rather than having to buy some expensive, clumsy, hard to use and ugly analogue textphone. TalkByText is an Internet based product and it uses your broadband connection to communicate, not just with other Internet users, but with legacy textphones as well, by using a gateway in the network that acts as a bridge between the Internet and the traditional analogue telephone network. This software has been available since last year. There is a Web based version of it too and all these solutions use the same underlying, mainstream technology and the Internet to provide the connection between them and all of that a lot cheaper than the cost of buying and using an old-fashioned textphone.

Let me now turn my attention for a moment to television. Hardly a new service, but one that is amongst the most widespread technologies ever. Access to television is not just a luxury. It is essential for the modern citizen. And deaf and hard of hearing people have had to fight hard for their right to get accessible television. In the UK, we are fortunate to have very high levels of subtitling. There are now more than 90 channels that carry subtitling and for the biggest channels, the levels of subtitling have been well above 90% for many years. Other countries are catching up, although it is clear that a lot more needs to happen across Europe.

But so far, television has mostly been a unidirectional, linear affair. This is now changing. Already, digital television offers more interactivity. But the real revolution is in technologies such as IPTV, television over the Internet. When I say over the Internet, I don't just mean streaming television to a web browser. In future, IPTVs will look quite similar to your current television. But the broadcast will come over the Internet just as all the rest of your network services. And again, this opens the door for a whole raft of new applications. Subtitles on television today are pretty much a "one size fits all" affair. Everyone sees them in the same way. As a user, you have no control over where they appear, in what font, which size, and so on. In future, this will all change. The television subtitles will be separate text streams. You will be able to position them around the screen, or indeed onto a different screen if you like. You will be able to select the font size and the colour, whether you want them on a background or transparent, etc. And because they are just text streams over the Internet, it is no longer necessary for the broadcaster to deliver them. Indeed, you could create a competitive market where different providers specialise in subtitles. This could lead to more subtitling and better quality because of competition between providers. It would also allow subtitles to be shared between broadcasters and even countries.

Also, apart from higher quality audio, in future multichannel audio for IPTV could separate music and dialogue on different streams. So, if you happen to be watching one of those dreadful channels that plays obnoxious and loud background music over news items or other broadcasts, you would be able to just turn the music off at the press of a button or to set its volume independently to a lower level. Now, would that not benefit all of us?!

For those who rely on sign language, there is little signed content on television and hearing people often complain about the presence of a signer on television. With IPTV, the signer would again be a separate video stream, which you could just turn on at the touch of a button and position wherever you want on the screen, changing the size as well. Again, I'm not talking about Star Trek fantasies: RNID has worked for several years together with the BBC on this type of technology and we demonstrated it 2 years ago to the industry at the IBC exhibition in Amsterdam.

Now, I cannot speak about future technologies for deaf and hard of hearing people without at least mentioning speech recognition technology. Would it not be great if we could stick our mobiles under the noses of anyone and have their speech translated into text in real-time? Of course that would be grand, and there are some promises that speech recognition will deliver on. But there are also still significant problems that need to be overcome.

First, we must understand how speech recognition really works. It's important to realise that computers do not "understand" speech. Computers cannot think, they have no intelligence in the human sense. Computers can perform complex calculations very well, but they have no idea of what they do or what the data that they shuffle around incredibly fast, really means.

Speech recognition systems basically use complex algorithms to try and convert the audio that conveys spoken language into text. This is done by using statistical rules and pattern recognition to match a given speech utterance to syllables, words and word groups. The computer really has no understanding of the informational content, the meaning, of the speech itself (implicit knowledge is needed to differentiate between "I scream" and "ice cream" for example). It merely sees a pattern of ones and zeros that represent the audio signal and applies complex processing rules to match that pattern to words and phrases.

This will work reasonably well if the speech (the audio signal) is fairly clear, well structured and not obscured by lots of background noises or other artefacts. In other words, the audio quality needs to be very good and the spoken phrases must be structured properly according to the rules of grammar and syntax. That is why people can go out and buy dictation software for their personal computer. To use this software reliably, you will need to use a good quality microphone and train the software on your voice. You also need to be in a quiet environment, speak at a reasonable pace and articulate clearly, while making sure that your phrases have an acceptable structure. Under those conditions, you can get quite decent results out of speech recognition, although the actual success rate can differ quite a bit from one person to another.

Unfortunately, this is not how humans speak. Free, natural speech is unconstrained, unstructured and often many people are speaking at the same time. In the real world we seldom are alone in a quiet room with little background noise and people speaking at a reasonable pace, taking turns in a disciplined way. This means that the audio signal is confused and that the patterns the computer is looking for in order to determine what was said, are also mixed up. The result is that the recognition becomes very poor indeed.

There are broadly speaking two major types of speech recognition technology in use today. Firstly, there are the "command and control" systems that allow people to operate equipment through spoken commands. Such systems typically only need to recognise a fairly small set of commands, a few dozen to a few hundred. Also, the commands often follow certain patterns, for example naming a device or component, then following that by an action ("lights on", "volume up",...). Because the vocabulary is fairly small and the possible combinations limited, fairly high accuracy can be achieved, even in less than ideal acoustic circumstances. But such systems obviously cannot do anything outside the systems and commands that they have been designed for.
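Why a small vocabulary helps can be sketched in Python. This is a stand-in only: it works on a text guess that is assumed to come out of an imperfect recogniser, not on audio patterns, and it uses plain string similarity from the standard library rather than the statistical models that real systems use. The command list and the function name are invented for the example.

```python
import difflib

# A deliberately tiny vocabulary, as in a "command and control" system.
COMMANDS = ["lights on", "lights off", "volume up", "volume down"]


def recognise_command(hypothesis, commands=COMMANDS, cutoff=0.6):
    """Snap an imperfect guess to the closest known command.

    Returns None when nothing in the vocabulary is similar enough.
    """
    matches = difflib.get_close_matches(hypothesis.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for guess in ("lights onn", "volume op", "open the pod bay doors"):
        print(repr(guess), "->", recognise_command(guess))
```

Because there are only a handful of possible answers, even a garbled guess usually lands on the right command, while anything outside the vocabulary is simply not recognised.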

The second type of speech recognition technology is the large vocabulary dictation system. These systems can recognise thousands, even tens of thousands of words, but need much higher quality audio input and require much more processing power from the computer. To get acceptable results, the system must be trained on the speaker's voice, good quality microphones and other audio components must be used and the environment must be quiet and interference-free.

The other major restriction in using speech recognition freely on the move is the limited capacity of mobile devices when compared to desktops or server machines. While mobile phones and other handheld equipment have certainly become much more powerful over the years, they are still not anywhere close to desktop computers in terms of processing power, memory and quality of audio components. Also, mobile devices are often used while on the move, which is exactly the type of environment where lots of background noise and other artefacts will mess up the audio signal.

In summary, speech recognition today can be used reasonably well for things like command and control or large vocabulary dictation and for well-structured speech, i.e. speech that is grammatically and syntactically well formed and based on a high quality audio signal. For natural speech, as in human-to-human conversation, or in lectures and the like, as well as for recorded audio and so on, speech recognition performs very poorly indeed.

But nevertheless, we can see a point in the future where artificial intelligence and neural networks will start to solve some of these problems and plain processing power will help overcome some of the others. In the meantime, speech recognition is already used for applications such as subtitling, telephone captioning and relay and notetaking. At RNID, we are also working on a personal speech-to-text translator that deaf and hard of hearing people will be able to carry around with them.

I have now come to the last part of my speech, but before I finish, I want to address briefly the political and campaigning challenges that lie ahead of us.

I have tried to sketch a future of promise and hope. A future in which we harness technology to overcome the barriers to opportunity and fulfilment that deaf and hard of hearing people still face. To me, this is all about life chances. It is not about gadgets or fancy toys. It is about the Information Society and what it means to be a citizen in the modern world.

Unfortunately, technology alone is not enough. In fact, technology itself is neither good nor bad. A great deal depends on how we as human beings use science and technology. And this requires political action, this requires us to make sure that the needs of deaf and hard of hearing people are actually taken into account by decision makers in industry, government and everywhere else in society. Because the future is what we make of it. All the promises embedded in new technologies which I have set out here today can only become true if we make sure they become reality.

The future is what we make of it. If we don't ensure that these promises are achieved, then they will not be, and the opportunity to dramatically change the lives of deaf and hard of hearing people will be lost.

Already now there is a danger that High Definition Television in Europe will not implement some of the things I spoke about. If we don't push policy makers and industry to actually adopt a single real-time text standard and roll it out across the European Union, then it will not happen.

So, now that we have seen this promised land, and now that we know what technology can do, we must above all make sure that it will become reality.

To achieve that, we need to work together. We can debate amongst ourselves as much as we want, but we must speak with a united voice to the rest of the world. The Dutch attendees in this room have already seen the results of such collaboration and joint approach. Their SOAP! Consortium brought together organisations for deaf and hard of hearing people across the spectrum. And their joint campaign has worked. Because united we stand, but if we are divided we will lose.

So, we must join efforts, we must combine our strengths and resources. If we do that, then all of us will benefit from it, and ultimately every single individual in society will gain as well.

Technology and the problems of inclusion are a challenge that we all share. Above all, we need a common strategy and a common goal. This is not to say that there are no differences between different countries. But underneath those differences still reside the same fundamental principles of equality.

The rewards for all of us are potentially colossal. I have shown you a few glimpses of what a more inclusive future might look like. If we succeed, deaf and hard of hearing people will emerge as the ultimate winners of this battle. Let us make this vision come true.

Thank you very much for your attention.

Speech by Guido Gybels, given at the EFHOH 2007 AGM, Oslo, Norway, on 5 May 2007.

Slides for: Changing lives through technology: a glimpse of the future for deaf and hard of hearing people (Microsoft Powerpoint 2000 Show 7.91MB)