How to Create a Voice-Controlled Web App Using JavaScript and Web Speech API
A Step-by-Step Guide to Implementing Voice Commands in Your Web App
Voice-controlled applications are rapidly becoming an essential part of modern user interfaces. From smart assistants like Alexa to voice commands in mobile apps, interacting with technology via voice is no longer a futuristic idea — it’s here and now. But did you know that you can build your own voice-controlled web app with just JavaScript and the Web Speech API?
In this guide, we'll walk you through the steps of creating a simple voice-controlled web app that listens to your commands and performs actions based on them. You'll learn how to integrate the Web Speech API into your project and understand its key features.
What is the Web Speech API?
The Web Speech API provides two main functionalities:
Speech Recognition: Converts spoken words into text.
Speech Synthesis: Converts text into spoken words (text-to-speech).
In this tutorial, we'll focus on Speech Recognition to turn your voice into actionable commands in a web app.
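Although we won't use it further in this guide, Speech Synthesis is just as easy to try. Here is a minimal sketch of text-to-speech in the browser (the message text is arbitrary):
// Minimal text-to-speech sketch using the SpeechSynthesis API
const utterance = new SpeechSynthesisUtterance('Hello! This is text-to-speech in the browser.');
utterance.lang = 'en-US';                 // language of the spoken output
window.speechSynthesis.speak(utterance);  // queue the utterance to be spoken aloud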
Setting Up the Speech Recognition API
Speech recognition in the Web Speech API is supported in most modern browsers, with the broadest support in Chromium-based browsers such as Google Chrome and Edge. The main interface you'll work with for speech recognition is SpeechRecognition (or the prefixed webkitSpeechRecognition for cross-browser compatibility).
To get started, let’s create a basic HTML page with JavaScript to listen to voice commands and display recognized text.
Step 1: Basic HTML Structure
Here’s a simple HTML structure with a button to start voice recognition and a div to display the recognized text:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice-Controlled Web App</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
            background-color: #f4f4f4;
        }
        #output {
            margin-top: 20px;
            padding: 10px;
            border: 1px solid #ccc;
            background-color: white;
            width: 100%;
            max-width: 500px;
            min-height: 50px;
        }
        button {
            padding: 10px 20px;
            background-color: #007BFF;
            color: white;
            border: none;
            cursor: pointer;
        }
        button:hover {
            background-color: #0056b3;
        }
    </style>
</head>
<body>
    <h1>Voice-Controlled Web App</h1>
    <p>Click the button and speak. Your voice will be recognized and displayed as text!</p>
    <button id="startBtn">Start Voice Recognition</button>
    <div id="output">Recognized text will appear here...</div>
    <script src="app.js"></script>
</body>
</html>
This basic setup includes:
A button to start listening to voice input.
A div to display the recognized speech text.
Step 2: Using the Web Speech API
Now, let's implement the speech recognition logic in JavaScript. Create a file called app.js and add the following code:
// Check for browser compatibility
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (SpeechRecognition) {
    console.log('Your browser supports speech recognition.');

    // Create a new instance of SpeechRecognition
    const recognition = new SpeechRecognition();

    // Set recognition properties
    recognition.continuous = true;      // Keep listening until manually stopped
    recognition.interimResults = false; // Don't show interim results
    recognition.lang = 'en-US';         // Set language

    // Start recognition when the button is clicked
    const startBtn = document.getElementById('startBtn');
    const output = document.getElementById('output');

    startBtn.addEventListener('click', () => {
        recognition.start();
        console.log('Voice recognition started. Speak into the microphone.');
    });

    // Handle the result event
    recognition.addEventListener('result', (event) => {
        const transcript = event.results[event.resultIndex][0].transcript;
        output.textContent = transcript; // Display the recognized speech
        console.log('Recognized Text:', transcript);

        // Perform actions based on voice commands (optional)
        if (transcript.toLowerCase().includes('hello')) {
            output.textContent += ' - You said hello!';
        }
    });

    // Handle errors
    recognition.addEventListener('error', (event) => {
        console.error('Speech recognition error:', event.error);
    });
} else {
    console.log('Speech recognition is not supported in this browser.');
}
Key Components:
SpeechRecognition: We check for browser support using the SpeechRecognition or webkitSpeechRecognition API.
recognition.start(): Starts the speech recognition process.
recognition.addEventListener('result'): Fires every time speech is recognized, allowing us to capture the recognized text and display it.
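Because continuous is set to true, recognition keeps listening until you stop it. Here is a minimal sketch of a stop control; it assumes you add a button with the id stopBtn to the HTML, which isn't in the markup above:
// Hypothetical stop button: add <button id="stopBtn">Stop</button> to the page
const stopBtn = document.getElementById('stopBtn');
stopBtn.addEventListener('click', () => {
    recognition.stop(); // stop listening; a final 'result' event may still fire
    console.log('Voice recognition stopped.');
});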
Step 3: Adding Voice Commands
Now that we can recognize speech, let's take it one step further by allowing the web app to respond to specific commands. For example, we can listen for keywords like "background" to change the page’s background color.
recognition.addEventListener('result', (event) => {
    const transcript = event.results[event.resultIndex][0].transcript;
    output.textContent = transcript; // Display the recognized speech

    // Perform actions based on voice commands
    if (transcript.toLowerCase().includes('background')) {
        document.body.style.backgroundColor = getRandomColor();
        output.textContent += ' - Background color changed!';
    }
});

// Function to generate a random color
function getRandomColor() {
    const letters = '0123456789ABCDEF';
    let color = '#';
    for (let i = 0; i < 6; i++) {
        color += letters[Math.floor(Math.random() * 16)];
    }
    return color;
}
In this example:
If the user says something containing the word "background," the background color of the page will change to a random color.
We use a simple getRandomColor() function to generate the new background color.
Step 4: Extending the App with More Commands
You can extend the app with additional voice commands to perform more actions. Here are some ideas:
Voice navigation: Use commands like "go to about page" to navigate to different sections of your website (see the sketch after this list).
Form filling: Allow users to dictate form data such as name or email.
Media controls: Control music or video playback using voice commands like "play," "pause," or "next."
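As a rough sketch of the navigation idea above, assuming your page has a section with the id about (adjust the id to match your own markup):
// Inside the 'result' event handler: scroll to a section when the user says "go to about"
if (transcript.toLowerCase().includes('go to about')) {
    const aboutSection = document.getElementById('about'); // hypothetical section id
    if (aboutSection) {
        aboutSection.scrollIntoView({ behavior: 'smooth' });
        output.textContent += ' - Navigated to the About section!';
    }
}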
Let’s add a voice command to control font size:
// Inside the 'result' event handler:
if (transcript.toLowerCase().includes('bigger text')) {
    document.body.style.fontSize = 'larger';
    output.textContent += ' - Text size increased!';
}
if (transcript.toLowerCase().includes('smaller text')) {
    document.body.style.fontSize = 'smaller';
    output.textContent += ' - Text size decreased!';
}
Now, the app responds to voice commands like “bigger text” and “smaller text” by changing the font size of the page.
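One caveat: the larger and smaller keywords are relative to the inherited font size, so repeating the command won't keep scaling the text. If you want the commands to compound, here is a sketch that tracks a numeric size instead (the 16-pixel starting value is an assumption):
// Declare outside the 'result' handler so the value persists between commands
let currentFontSize = 16; // assumed starting size in px

// Inside the 'result' handler:
if (transcript.toLowerCase().includes('bigger text')) {
    currentFontSize += 2;
    document.body.style.fontSize = currentFontSize + 'px';
    output.textContent += ' - Text size increased!';
}
if (transcript.toLowerCase().includes('smaller text')) {
    currentFontSize = Math.max(8, currentFontSize - 2); // don't shrink below 8px
    document.body.style.fontSize = currentFontSize + 'px';
    output.textContent += ' - Text size decreased!';
}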
Step 5: Handling Multiple Languages
The Web Speech API also supports multiple languages. You can change the language by modifying the recognition.lang property. For example, to recognize speech in Spanish:
recognition.lang = 'es-ES'; // Set to Spanish (Spain)
By supporting multiple languages, you can create a more inclusive and global voice-controlled experience.
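If you want users to switch languages at runtime, here is a small sketch using a dropdown; it assumes a <select> element with the id langSelect, which isn't in the markup above:
// Hypothetical language picker: <select id="langSelect"> with options like "en-US", "es-ES", "fr-FR"
const langSelect = document.getElementById('langSelect');
langSelect.addEventListener('change', () => {
    recognition.stop();                  // end the current session
    recognition.lang = langSelect.value; // apply the newly selected language
    // the next click on the start button will listen in the new language
});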
Final Project: Voice-Controlled To-Do List
Let’s combine everything we’ve learned and build a simple voice-controlled to-do list. This app will allow users to add tasks by speaking them out loud.
HTML:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice-Controlled To-Do List</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            padding: 20px;
            background-color: #f4f4f4;
        }
        #output {
            margin-top: 20px;
            padding: 10px;
            border: 1px solid #ccc;
            background-color: white;
            width: 100%;
            max-width: 500px;
            min-height: 50px;
        }
        button {
            padding: 10px 20px;
            background-color: #007BFF;
            color: white;
            border: none;
            cursor: pointer;
        }
        ul {
            list-style-type: none;
        }
        li {
            padding: 10px 0;
        }
    </style>
</head>
<body>
    <h1>Voice-Controlled To-Do List</h1>
    <button id="startBtn">Start Voice Recognition</button>
    <ul id="todoList"></ul>
    <div id="output">Say "add task" to create a to-do.</div>
    <script src="app.js"></script>
</body>
</html>
JavaScript (app.js):
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

if (!SpeechRecognition) {
    console.log('Speech recognition is not supported in this browser.');
} else {
    const recognition = new SpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = false;
    recognition.lang = 'en-US';

    const startBtn = document.getElementById('startBtn');
    const output = document.getElementById('output');
    const todoList = document.getElementById('todoList');

    startBtn.addEventListener('click', () => {
        recognition.start();
    });

    recognition.addEventListener('result', (event) => {
        const transcript = event.results[event.resultIndex][0].transcript;
        output.textContent = transcript;

        if (transcript.toLowerCase().includes('add task')) {
            // Strip the "add task" trigger phrase (case-insensitively) to get the task text
            const task = transcript.replace(/add task/i, '').trim();
            if (task) {
                const li = document.createElement('li');
                li.textContent = task;
                todoList.appendChild(li);
                output.textContent += ' - Task added!';
            }
        }
    });

    recognition.addEventListener('error', (event) => {
        console.error('Speech recognition error:', event.error);
    });
}
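As an optional finishing touch, you could have the app confirm each new task out loud using the Speech Synthesis side of the API mentioned earlier. A sketch of what that might look like inside the "add task" branch:
// Inside the 'add task' branch: read the new task back to the user
const confirmation = new SpeechSynthesisUtterance('Added task: ' + task);
confirmation.lang = 'en-US';
window.speechSynthesis.speak(confirmation);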
Conclusion
With the basic foundation in place, you can now extend this further to create more sophisticated voice-controlled web applications. Whether it’s for accessibility, productivity, or just fun, the possibilities are vast!
Happy coding!