I work adjacent to a group that does speech recognition. There’s a massive amount of variation in regional dialects, and that’s before you get to non-native speakers. Then you have people like my mother-in-law, who doesn’t have an accent, but her diction and grammar are… unique.
If someone is speaking in full sentences, you can use context clues to infer intent, but it’s a lot more challenging when you’re just getting short spoken commands.
I suspect it’s a training/sample gap, but it’s likely going to be really hard to get to 100% accuracy.