July 13th, 2007
Browser-based spelling autocorrection described in Microsoft Patent app
Technology described in a newly published Microsoft Patent application would enable automated spell-checking of text entered by the user in a networked Web browser.
As I read the Patent application, the idea is this: when you access the Web through a network-connected browser, the technology would automatically correct your spelling errors as you type.
For example, if you mistyped “I orderd this merchandice three weeks ago and it has not yet arrived” into a Web-based form on an ecommerce website, it would automatically correct your spelling. Or it could spell-check and autocorrect the text of an email by sending it to a spelling server before the email is delivered to its ultimate recipient.
The Patent app is entitled “Spell checking in network browser based applications.” The Abstract furnishes a useful primer about what is going on here:
Spell checking of a document in a network browser based application is performed automatically. Spell checking may be performed in a content page in response to user editing of the document text.
Text entered into a document through a browser application interface is divided into nodes. The nodes may be associated with a section, line or word of text. Each node may be assigned one or more parameters which may indicate whether the node has been spell checked or not.
Selected nodes are sent to a spell check service for spell checking. Correction information is received in response to the spell check request. Once a client device receives the correction information, words within the text range of the document that match identified misspelled words are processed. In one embodiment, matching words are highlighted with a visual indicator to indicate that they may be incorrectly spelled.
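To make that last step concrete, here is a minimal Python sketch of matching the misspelled words returned by a spell check service against the document text. The function name find_misspelled_spans, the regular expression used to define a "word," and the case-insensitive comparison are all my assumptions; the application does not say how the matching is implemented.

```python
import re
from typing import Iterable, List, Tuple

# Assumed definition of a "word": a run of letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def find_misspelled_spans(
    text: str, misspelled: Iterable[str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end, word) for each word in text that matches
    one of the misspelled words reported by the service."""
    targets = {word.lower() for word in misspelled}
    return [
        (match.start(), match.end(), match.group())
        for match in WORD.finditer(text)
        if match.group().lower() in targets
    ]


# Hypothetical correction information: the service flagged "jomps".
spans = find_misspelled_spans("The dog jomps over the fence.", ["jomps"])
print(spans)
```

A browser client could then use the character offsets to draw the visual indicator, such as an underline, on each matching word.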
More details are embodied in Figures 5A, 5B and 6 of this Patent application. So let’s go there now for the art and the relevant text for each figure.
FIG. 5A is an example of text to be spell checked. FIG. 5A includes text block 510 and cursor 520. In one embodiment, the text block 510 may be provided in an interface of a content page provided by an email service. The text in text block 510 includes three lines of text.
The first line of text is “The dog,” the second line of text is “jomps over,” and the last line of text is “the fence.” Cursor 520 is currently placed between the o and the m in the misspelled word “jomps” in the second line of text.
FIG. 5B is an example of a node tree generated from text entered by a user into an interface. The node tree that a browser builds from interface text is similar to that of an XML tree. In one embodiment, the node tree of FIG. 5B is generated from the text provided in FIG. 5A.
The node tree includes a root node of DIV, and three primary nodes of P. Each P node has a child node, or text node. The text content “The dog,” “jomps over” and “the fence.” are each associated with a particular text node. Each node, including the text nodes, can be associated with one or more node parameters.
In the embodiment illustrated, the node parameters may include a node number and an IsSpellChecked parameter. For example, the root node comprising the block tag DIV has a node number of zero, and the child node with the text “The dog” has a node number of two and an IsSpellChecked parameter set to true.
As discussed above, the IsSpellChecked parameter indicates if the text associated with the particular node has been spellchecked. In one embodiment, if a node is marked as IsSpellChecked=true, all of that node’s children nodes are assumed to be spellchecked as well.
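The node tree of FIG. 5B can be modeled roughly as follows. This is only a sketch: the Node class, its field names, and node numbers three through six are my assumptions (the text gives only numbers zero, one and two). A value of None stands in for a missing IsSpellChecked parameter, and the helper applies the rule that a node marked true covers its children as well.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    number: int
    tag: str                                  # "DIV", "P" or "#text"
    text: str = ""
    is_spell_checked: Optional[bool] = None   # None = parameter missing
    children: List["Node"] = field(default_factory=list)


def build_fig_5b_tree() -> Node:
    """Tree for the text of FIG. 5A; numbers 3 to 6 are assumed."""
    return Node(0, "DIV", children=[
        Node(1, "P", children=[Node(2, "#text", "The dog", True)]),
        Node(3, "P", children=[Node(4, "#text", "jomps over")]),
        Node(5, "P", children=[Node(6, "#text", "the fence.")]),
    ])


def unchecked_text(node: Node, inherited: bool = False) -> List[str]:
    """Collect text that is not covered by IsSpellChecked=true on the
    node itself or on any of its ancestors."""
    checked = inherited or node.is_spell_checked is True
    if node.tag == "#text":
        return [] if checked else [node.text]
    result: List[str] = []
    for child in node.children:
        result.extend(unchecked_text(child, checked))
    return result


print(unchecked_text(build_fig_5b_tree()))
```

Marking the root DIV as checked would make the helper return an empty list, since every descendant is then assumed to be spellchecked.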
FIG. 6 illustrates a flowchart of an embodiment of a process for sending text to a server to be spell checked. In general, the text being generated by a user is analyzed to determine portions that have not been spellchecked. The portions not spell-checked are selected and sent to a spell check server.
In one embodiment, less than the entire text may be analyzed. For example, a user may be generating a reply to an email. In this case, the content of the user’s reply is spell checked but the original email content is not spell checked. This limits the spell checking resources to the user generated text, thereby saving time and resources used to spell check the text.
First, a first node in a node tree is selected at step 610. In the node tree of FIG. 5B, the first node would be the P node with node number equaling one and having a child node of “The dog.” Next, a determination is made as to whether the selected node has an IsSpellChecked parameter equal to true at step 615.
This determination identifies nodes that should be spell checked. As discussed above, a node may have an IsSpellChecked parameter equal to true or false, or may be missing the IsSpellChecked parameter entirely. If the IsSpellChecked parameter for the selected node is true, then operation continues to step 645. If the parameter is not true or is not available for that node, operation continues to step 620.
Next, a determination is made as to whether the selected node has children nodes which have an IsSpellChecked parameter equal to true at step 620. With respect to FIG. 5B, the child node of the first node selected has a node number of two and includes the text content “The dog.”
The determination as to whether the child node has an IsSpellChecked parameter with a value of true is the same as that discussed above with respect to step 615. If the selected node has a child node with IsSpellChecked equal to true, operation continues to step 645. If none of the children nodes have IsSpellChecked equal to true, then operation continues to step 625.
For each child node not having IsSpellChecked set to true, the node content is added to an array at step 625. The array includes a list of text to be spell checked. Next, block tag content added to the array is converted into a space at step 630. This step is optional, as indicated by the dashed lines comprising the box at step 630 in FIG. 6.
Block tags may include a P tag, BR tag and a DIV tag. A P tag defines a paragraph, a BR tag inserts a simple line break, and a DIV tag defines a division of a section in a document. These block tags may be used in HTML, XHTML and other documents. The block tags are converted into a space in order to avoid concatenating words inadvertently.
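A short Python sketch shows why the space matters. The regular expression and the function name block_tags_to_spaces are my assumptions, not anything taken from the application; the sketch handles only the three block tags named above.

```python
import re

# Matches opening, closing and self-closing P, BR and DIV tags.
BLOCK_TAG = re.compile(r"</?(?:p|br|div)\b[^>]*>", re.IGNORECASE)


def block_tags_to_spaces(markup: str) -> str:
    """Replace each block tag with a space, then collapse the
    resulting runs of whitespace into single spaces."""
    text = BLOCK_TAG.sub(" ", markup)
    return " ".join(text.split())


markup = "<div><p>The dog</p><p>jomps over</p><p>the fence.</p></div>"

# Deleting the tags outright glues neighboring words together.
print(BLOCK_TAG.sub("", markup))
# Converting them to spaces keeps the words apart.
print(block_tags_to_spaces(markup))
```

With the tags simply deleted, "dog" and "jomps" would reach the spell checker as the single word "dogjomps," which is exactly the inadvertent concatenation the conversion is meant to prevent.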
Next, the IsSpellChecked parameter associated with the selected child node is set to true at step 635. This indicates that the particular node is considered to be spell checked. Next, boundary nodes associated with the text to be spell checked are updated at step 640. Boundary nodes mark the range of text covered by the spell check being generated. Thus, the boundary nodes indicate the first and last node of the particular range of text being checked.
A determination is made as to whether more nodes should be analyzed at step 645. In one embodiment, the additional nodes to be analyzed are additional child nodes for the root node.
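Steps 610 through 650 amount to a loop over the root node's children. Here is one way it might look in Python, following a literal reading of the text. The dictionary layout, the function names, and node numbers three through six are my assumptions, and the optional step 630 is left out.

```python
def fig_5b_tree():
    """Node tree of FIG. 5B as plain dictionaries (layout assumed)."""
    return {"number": 0, "tag": "DIV", "children": [
        {"number": 1, "tag": "P", "children": [
            {"number": 2, "text": "The dog", "IsSpellChecked": True}]},
        {"number": 3, "tag": "P", "children": [
            {"number": 4, "text": "jomps over"}]},
        {"number": 5, "tag": "P", "children": [
            {"number": 6, "text": "the fence."}]},
    ]}


def collect_unchecked(root):
    """Return (array of text, (first, last) boundary node numbers)."""
    pending = []            # the "array" of text to be spell checked
    first = last = None     # boundary nodes, tracked by node number
    for node in root["children"]:                  # steps 610 and 650
        if node.get("IsSpellChecked") is True:     # step 615
            continue
        children = node.get("children", [])
        if any(c.get("IsSpellChecked") is True for c in children):
            continue                               # step 620
        for child in children:
            pending.append(child["text"])          # step 625
            child["IsSpellChecked"] = True         # step 635
            if first is None:                      # step 640
                first = child["number"]
            last = child["number"]
    return pending, (first, last)


tree = fig_5b_tree()
print(collect_unchecked(tree))
print(collect_unchecked(tree))  # second pass finds nothing new
```

Because each collected node is marked as checked, a second pass over the same tree returns an empty array, which is how the scheme avoids resending text the user has not touched.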
In some instances, although additional child nodes may exist to be spell checked, other limitations may prevent the additional nodes from being added to the array. For example, the text sent to a server to be spell checked may be limited to a maximum size. In one embodiment, a maximum size of a spell check request may be 2K in memory size.
Thus, if additional text in the current node would cause the content of the array to exceed the 2K size, the array may be considered full. In this case, the flowchart would continue to step 655 and selection of nodes for a new array would begin with the current node. If more nodes exist to be analyzed, operation continues to step 650 wherein the next node is selected.
After selecting the next node, the flowchart of FIG. 6 returns to step 615. If no further nodes exist to be analyzed, operation continues to step 655. In one embodiment, although more nodes may exist in the node tree, analyzing the nodes may end before all the nodes have been checked.
This may be the case if a maximum amount of content has already been selected to be spell checked. In this instance, operation would proceed to step 655.
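The size cap could be handled along these lines. In this sketch I assume that "2K" means 2,048 bytes of UTF-8 text and that the separating spaces count toward the limit; the application spells out neither detail, and the function name batch_by_size is my own.

```python
from typing import List

MAX_REQUEST_BYTES = 2048  # assumed reading of the "2K" limit


def batch_by_size(texts: List[str],
                  limit: int = MAX_REQUEST_BYTES) -> List[List[str]]:
    """Group node text into arrays whose space-joined size stays
    within the limit. A node that would overflow the current array
    starts the next one."""
    batches: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        cost = len(text.encode("utf-8"))
        extra = cost + (1 if current else 0)   # +1 for the separator
        if current and size + extra > limit:
            batches.append(current)            # array is full
            current, size = [], 0
            extra = cost                       # no separator needed
        current.append(text)
        size += extra
    if current:
        batches.append(current)
    return batches


# A tiny limit makes the behavior easy to see.
print(batch_by_size(["aaaa", "bbbb", "cc"], limit=8))
```

Note that a single node larger than the limit still ends up in an array of its own here; how the real system treats that case is not covered in the text.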
The generated array is combined into a single string of text at step 655. The single string of text can later be processed by a spell checking service.
In one embodiment, elements within the array are separated with a space to avoid processing multiple words as a single word. After combining the array into a single string, the string and the boundary node information are submitted to a spell check server at step 660. In one embodiment, the string and boundary nodes are packaged in a request.
In some embodiments, the boundary nodes are optional and need not be packaged in the request. In this case, they are maintained at the client device in client memory, and the application would pair the boundary information to the set of words to be spell checked. The request is then sent to network server 140. Network server 140 receives the request and forwards the request to any of spell check servers 110-130.
In one embodiment, a document identifier may be packaged in the request in addition to the string and boundary node information. The document identifier identifies a current page or document for which the boundary nodes and text string apply. The document identifier can then be returned in a content response identifying the page.
Browser application 165 may use the document identifier to ensure that any corrections or processing of text was applied to the correct document.
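Pulling steps 655 and 660 together with the document identifier, a client might package and verify requests roughly like this. The JSON format, the field names text, boundary and documentId, and both function names are hypothetical; the application does not describe a wire format.

```python
import json
from typing import List, Optional, Tuple


def build_request(chunks: List[str], document_id: str,
                  boundary: Optional[Tuple[int, int]] = None) -> str:
    """Join the array into one space-separated string (step 655) and
    package it for submission (step 660)."""
    payload = {"text": " ".join(chunks), "documentId": document_id}
    if boundary is not None:       # boundary nodes are optional
        payload["boundary"] = list(boundary)
    return json.dumps(payload)


def response_matches_document(response: str,
                              current_document_id: str) -> bool:
    """Accept corrections only if they belong to the open document."""
    return json.loads(response).get("documentId") == current_document_id


request = build_request(["jomps over", "the fence."], "doc-42", (4, 6))
print(request)

# Hypothetical response echoing the document identifier.
reply = '{"documentId": "doc-42", "misspelled": ["jomps"]}'
print(response_matches_document(reply, "doc-42"))
```

Checking the echoed identifier guards against a slow response arriving after the user has moved on to a different message.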
Russell Shaw is an enterprise computing journalist, analyst and author based in Portland, Oregon. See his full profile and disclosure of his industry affiliations.













