I’m Aravind, the Lead Developer of Search Commands, and I’d like to tell you about our motivation and vision for Search Commands.
When we started this project, our goal was to advance the state of user interface technology to make it more natural. Initially, we thought that a “natural user interface” was mostly about speaking to the computer, but as we studied user problems in help query logs, newsgroups, books, etc., we came to realize that the real challenge was to make it easy for the computer to better understand the user, and for the user to better understand the application. A conceptual misunderstanding on the user’s part about how the application works (not EVER the user’s fault, by the way) means that neither written nor spoken words can represent the whole picture of the user’s goal. For example, if a user wanted to change the background image on all slides of a PowerPoint presentation and she wasn’t aware of the Master Slide feature, she might express her intent as “change background” instead of “set up a master slide”. For her to understand that there is a better option, there needs to be some way for the application to have a conversation with her to figure out what she wants.
Let’s take a real-life example: ordering food in a restaurant. When you are given the menu, you read through it, and while some items are clear, for others you might need additional information. When the waiter comes to take your order, you can:
i. Point to a dish you want.
ii. Say the number next to the dish (e.g., V32 for Mongolian Tofu).
iii. Ask the waiter to tell you which dishes fit particular criteria (e.g., the “vegetarian” options or the dishes that don’t have much spice).
iv. Have a conversation with the waiter where you learn more about the different dishes and the waiter gets a better idea of what you want. For example, listen to the specials, find out what ingredients they have, and pick the one you want.
Depending on your familiarity with the cuisine and the restaurant, you might pick different ways of ordering. In Starbucks I always get a 2% extra hot tall latte, but in an unfamiliar restaurant I would have to talk to the waiter to better understand the different options.
Imagine now that restaurants banned the use of language, so waiters can only take your order if you point to a dish you want or say its number on the menu. You are not allowed to ask any clarifying questions, so you are limited to reading the one-line descriptions of each item and pointing to the one you want to order.
In such a world, because of the limitations of how you can express your intent, the design of the menu becomes very important, and restaurants will have to think carefully about their dishes in terms of the ease of ordering rather than the merits of the dish itself. You might take the trouble of reading a book on how to order food in your favorite restaurant. That way when you go there, you will be able to point quickly to the dish you want. Experts might even offer courses in “how to order food.”
After several years of customers only pointing to menu items, restaurants will work on making user-friendly menu systems. They will add picture menus, icons for spice levels, etc. People will debate the merits of different menu designs:
• Some people would wonder why restaurants offer so many choices. Twenty years ago, the beverage menu had only three choices – coffee, tea, and hot chocolate. Now you can order 2% grande mochas with whipped cream. Does anyone really need so many choices? After all, no one orders more than ten different items from the menu. Shouldn’t the menu have these ten choices only and nothing else? Then we can just learn the numbers of the dishes we want, and they will be easy to order.
• Some people would have become experts in a particular menu. If the menu changes, they will feel frustrated – where did number 56 (Mongolian chicken with fried rice) go?
• Some restaurants will advertise their ease of ordering as the differentiating factor. Their simple, clean menus with a small number of items will appeal to people who don’t want to spend a lot of time reading the menu and looking for stuff to point to.
• After years of training, when asked if they would prefer to just tell their order to a waiter, some experts in the old system might think it is ridiculous to talk to the waiter. After all, ordering food is about pointing and choosing – if you need to talk, the restaurant menu is badly designed, and you should just go to another place to eat.
These examples sound silly, as we can’t relate to the idea that restaurant menus are difficult to use or that their design is important. That’s because the menu is only a starting point, and we can use language to find out more about different dishes and to place our order. There is no need for us to learn the positions of different dishes on the menu.
However, with computer software we are limited in our ability to express our intent, and are forced to spend a lot of time learning the menu design of applications, when we really only care about doing our work. Ideally, the menu should be the starting point, and you should be able to have a conversation with the application.
Even the best-designed menu will still need you to learn some concepts about the application. For example, I came to the U.S. as a grad student, and on my first day I went to a McDonald’s. As I’m a vegetarian, I read the menu to find something to eat that wouldn’t have meat in it, and I picked the cheeseburger!
Changing the interface of software applications to allow for language input and the ability to have a conversation is a very old and ambitious goal. The challenge for us was how to take a baby step and still solve a real problem. Search Commands is what we came up with. Given that building true language understanding is very hard, we decided to treat it as a search and UI problem. We would just create a list of keywords and map each keyword to different commands. We would use the frequency of command use to rank the results and use feedback to improve the system. That way, if someone types in a word we don’t have in our word list, we can add it later and the system keeps getting better. The second part is about dialog. For selected queries that are ambiguous and require the user to understand a concept, we built lightweight wizards or guided help (an example is the change background wizard in PowerPoint). In my next blog post, I’ll talk about some of our designs, our findings, and ideas for the future.
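As a rough illustration of the keyword-and-ranking idea described above, here is a minimal Python sketch. It is not the actual Search Commands implementation: the keyword table, the command names, and the `CommandSearch` class are all hypothetical, made up only to show how a keyword list, usage-frequency ranking, and a log of unknown words could fit together.

```python
from collections import Counter

# Hypothetical keyword -> commands table; entries are illustrative only.
KEYWORD_TO_COMMANDS = {
    "background": ["Format Background", "Slide Master"],
    "master": ["Slide Master"],
    "picture": ["Insert Picture", "Format Background"],
}


class CommandSearch:
    """Toy keyword search over commands, ranked by how often each is used."""

    def __init__(self, keyword_map):
        self.keyword_map = keyword_map
        self.usage = Counter()          # command -> number of times used
        self.unknown_words = Counter()  # query words missing from the table

    def record_use(self, command):
        """Feedback signal: the user actually ran this command."""
        self.usage[command] += 1

    def search(self, query):
        """Return matching commands, most frequently used first."""
        matches = set()
        for word in query.lower().split():
            commands = self.keyword_map.get(word)
            if commands is None:
                # Remember the word so it can be added to the table later.
                self.unknown_words[word] += 1
            else:
                matches.update(commands)
        # Sort by descending usage, then alphabetically for stable ties.
        return sorted(matches, key=lambda c: (-self.usage[c], c))


if __name__ == "__main__":
    search = CommandSearch(KEYWORD_TO_COMMANDS)
    search.record_use("Slide Master")
    search.record_use("Slide Master")
    search.record_use("Format Background")
    print(search.search("change background"))
    print(dict(search.unknown_words))
```

In this sketch, a query like "change background" surfaces the more frequently used command first, and the unmatched word "change" is recorded so that someone could later decide whether it deserves an entry in the keyword table.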