org.apache.commons.codec.language

Class Soundex

public class Soundex extends Object implements StringEncoder

Encodes a string into a Soundex value. Soundex is an encoding used to relate similar names, but it can also be used as a general-purpose scheme to find words with similar phonemes.

Version: $Id: Soundex.java 130399 2004-07-07 23:15:24Z ggregory $

Author: Apache Software Foundation

Field Summary
int maxLength
The maximum length of a Soundex code - Soundex codes are only four characters by definition.
char[] soundexMapping
Every letter of the alphabet is "mapped" to a numerical value.
static Soundex US_ENGLISH
An instance of Soundex using the US_ENGLISH_MAPPING mapping.
static char[] US_ENGLISH_MAPPING
This is a default mapping of the 26 letters used in US English.
static String US_ENGLISH_MAPPING_STRING
This is a default mapping of the 26 letters used in US English.
Constructor Summary
Soundex()
Creates an instance using US_ENGLISH_MAPPING.
Soundex(char[] mapping)
Creates a soundex instance using the given mapping.
Method Summary
int difference(String s1, String s2)
Encodes the Strings and returns the number of characters in the two encoded Strings that are the same.
Object encode(Object pObject)
Encodes an Object using the soundex algorithm.
String encode(String pString)
Encodes a String using the soundex algorithm.
char getMappingCode(String str, int index)
Used internally by the Soundex algorithm.
int getMaxLength()
Returns the maxLength.
char[] getSoundexMapping()
Returns the soundex mapping.
char map(char ch)
Maps the given upper-case character to its Soundex code.
void setMaxLength(int maxLength)
Sets the maxLength.
void setSoundexMapping(char[] soundexMapping)
Sets the soundexMapping.
String soundex(String str)
Retrieves the Soundex code for a given String object.

Field Detail

maxLength

private int maxLength

Deprecated: This feature is not needed since the encoding size must be constant. Will be removed in 2.0.

The maximum length of a Soundex code - Soundex codes are only four characters by definition.

soundexMapping

private char[] soundexMapping
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.

US_ENGLISH

public static final Soundex US_ENGLISH
An instance of Soundex using the US_ENGLISH_MAPPING mapping.

See Also: US_ENGLISH_MAPPING

US_ENGLISH_MAPPING

public static final char[] US_ENGLISH_MAPPING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.

See Also: Soundex(char[])

US_ENGLISH_MAPPING_STRING

public static final String US_ENGLISH_MAPPING_STRING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode.

(This constant is provided as both an implementation convenience and to allow Javadoc to pick up the value for the constant values page.)

See Also: US_ENGLISH_MAPPING
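
As an illustration of how a mapping of this shape can be read, the sketch below looks up a letter's code by its offset from 'A'. The digit string is the standard American Soundex table; it is assumed, not confirmed by this page, to equal the value of US_ENGLISH_MAPPING_STRING. The class name MappingSketch and its map helper are hypothetical and are not part of commons-codec.

```java
final class MappingSketch {
    // Assumption: one digit per letter A..Z, following the standard
    // American Soundex table. '0' marks letters that are not encoded.
    static final String MAPPING = "01230120022455012623010202";

    // Hypothetical helper: returns the code for an upper-case letter A..Z.
    static char map(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return MAPPING.charAt(index);
    }

    public static void main(String[] args) {
        // B, F, P and V share one code group in this table.
        for (char ch : "BFPV".toCharArray()) {
            System.out.println(ch + " -> " + map(ch));
        }
        // Vowels map to '0', meaning "do not encode".
        System.out.println("A -> " + map('A'));
    }
}
```

Indexing by ch - 'A' only works for the 26 upper-case Latin letters, which is why anything else is rejected with an IllegalArgumentException in this sketch.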

Constructor Detail

Soundex

public Soundex()
Creates an instance using US_ENGLISH_MAPPING.

See Also: Soundex(char[]), US_ENGLISH_MAPPING

Soundex

public Soundex(char[] mapping)
Creates a soundex instance using the given mapping. This constructor can be used to provide an internationalized mapping for a non-Western character set. Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.

Parameters: mapping - Mapping array to use when finding the corresponding code for a given character

Method Detail

difference

public int difference(String s1, String s2)
Encodes the Strings and returns the number of characters in the two encoded Strings that are the same. This return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.

Parameters:
s1 - A String that will be encoded and compared.
s2 - A String that will be encoded and compared.

Returns: The number of characters in the two encoded Strings that are the same, from 0 to 4.

Throws: EncoderException if an error occurs encoding one of the strings

Since: 1.3

See Also: difference, MS T-SQL DIFFERENCE
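
The comparison step described above can be sketched as a position-by-position count over two codes. This is a minimal illustration and not the library's implementation: it takes codes that are already encoded, whereas the documented method encodes s1 and s2 first. The class name DifferenceSketch and the null handling are assumptions.

```java
final class DifferenceSketch {
    // Counts the positions at which two Soundex codes hold the same character.
    // Assumption: a null code is treated as having nothing in common.
    static int difference(String code1, String code2) {
        if (code1 == null || code2 == null) {
            return 0;
        }
        int length = Math.min(code1.length(), code2.length());
        int same = 0;
        for (int i = 0; i < length; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        // Identical four-character codes agree in every position.
        System.out.println(difference("S530", "S530")); // prints 4
        // Only the first two positions agree here.
        System.out.println(difference("R163", "R150")); // prints 2
    }
}
```

Because Soundex codes are four characters long, a count of this kind naturally falls in the range 0 through 4.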

encode

public Object encode(Object pObject)
Encodes an Object using the soundex algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type java.lang.String.

Parameters: pObject - Object to encode

Returns: An object (of type java.lang.String) containing the soundex code which corresponds to the String supplied.

Throws:
EncoderException if the parameter supplied is not of type java.lang.String
IllegalArgumentException if a character is not mapped

encode

public String encode(String pString)
Encodes a String using the soundex algorithm.

Parameters: pString - A String object to encode

Returns: A Soundex code corresponding to the String supplied

Throws: IllegalArgumentException if a character is not mapped

getMappingCode

private char getMappingCode(String str, int index)
Used internally by the Soundex algorithm. Consonants from the same code group separated by W or H are treated as one.

Parameters:
str - the cleaned working string to encode (in upper case).
index - the character position to encode

Returns: Mapping code for a particular character

Throws: IllegalArgumentException if the character is not mapped
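
The H/W rule can be sketched as a look-back of two positions. This is an illustration under stated assumptions, not the library's private method: the class name HwRuleSketch, the helper names, and the use of the NUL character as a "skip this letter" marker are all hypothetical, and the digit table is the standard American Soundex table.

```java
final class HwRuleSketch {
    // Assumption: standard American Soundex digits for A..Z.
    static final String MAPPING = "01230120022455012623010202";

    static char code(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return MAPPING.charAt(index);
    }

    // Returns the digit for str.charAt(index), or '\0' when the letter repeats
    // the code of the consonant that sits just before an intervening H or W.
    // str is expected to be cleaned and upper-cased already.
    static char mappingCode(String str, int index) {
        char mapped = code(str.charAt(index));
        if (index > 1 && mapped != '0') {
            char previous = str.charAt(index - 1);
            if (previous == 'H' || previous == 'W') {
                char beforeHw = str.charAt(index - 2);
                if (code(beforeHw) == mapped) {
                    return '\0';
                }
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        // In ASHCRAFT, S and C share a code group and are separated only by H,
        // so C is reported as a repeat of S.
        System.out.println(mappingCode("ASHCRAFT", 3) == '\0'); // prints true
        System.out.println(mappingCode("ASHCRAFT", 4));         // prints 6
    }
}
```

Directly adjacent letters with the same code, such as the S and C in ASCRAFT, are not collapsed by this helper; a caller would handle that case by comparing each code with the previous one.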

getMaxLength

public int getMaxLength()

Deprecated: This feature is not needed since the encoding size must be constant. Will be removed in 2.0.

Returns the maxLength. By definition, standard Soundex codes are four characters long.

Returns: int

getSoundexMapping

private char[] getSoundexMapping()
Returns the soundex mapping.

Returns: soundexMapping.

map

private char map(char ch)
Maps the given upper-case character to its Soundex code.

Parameters: ch - An upper-case character.

Returns: A Soundex code.

Throws: IllegalArgumentException if ch is not mapped.

setMaxLength

public void setMaxLength(int maxLength)

Deprecated: This feature is not needed since the encoding size must be constant. Will be removed in 2.0.

Sets the maxLength.

Parameters: maxLength - The maxLength to set

setSoundexMapping

private void setSoundexMapping(char[] soundexMapping)
Sets the soundexMapping.

Parameters: soundexMapping - The soundexMapping to set.

soundex

public String soundex(String str)
Retrieves the Soundex code for a given String object.

Parameters: str - String to encode using the Soundex algorithm

Returns: A soundex code for the String supplied

Throws: IllegalArgumentException if a character is not mapped
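
For orientation, the overall encoding can be sketched as follows. This is a self-contained illustration of standard American Soundex, written under assumptions and not taken from the commons-codec source: the class name SoundexSketch, the cleaning step (keep letters only, upper-case them), and the null and empty-string handling are all hypothetical choices.

```java
final class SoundexSketch {
    // Assumption: standard American Soundex digits for A..Z.
    static final String MAPPING = "01230120022455012623010202";

    static char code(char ch) {
        int index = ch - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + ch);
        }
        return MAPPING.charAt(index);
    }

    static String soundex(String str) {
        if (str == null) {
            return null;
        }
        // Keep letters only and upper-case them.
        StringBuilder cleaned = new StringBuilder();
        for (char ch : str.toCharArray()) {
            if (Character.isLetter(ch)) {
                cleaned.append(Character.toUpperCase(ch));
            }
        }
        if (cleaned.length() == 0) {
            return "";
        }
        // The code is the first letter followed by three digits, zero-padded.
        char[] out = {'0', '0', '0', '0'};
        out[0] = cleaned.charAt(0);
        char last = code(cleaned.charAt(0));
        int count = 1;
        for (int i = 1; i < cleaned.length() && count < out.length; i++) {
            char ch = cleaned.charAt(i);
            if (ch == 'H' || ch == 'W') {
                continue; // H and W do not separate letters of the same group
            }
            char mapped = code(ch);
            if (mapped != '0' && mapped != last) {
                out[count++] = mapped;
            }
            last = mapped; // a vowel resets 'last', so a repeat after it is kept
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));   // prints R163
        System.out.println(soundex("Rupert"));   // prints R163
        System.out.println(soundex("Ashcraft")); // prints A261
        System.out.println(soundex("Tymczak"));  // prints T522
    }
}
```

Robert and Rupert share a code, which is the "similar names" behaviour Soundex is designed for. A letter outside A to Z, such as an accented character, reaches code() in this sketch and triggers the IllegalArgumentException.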

commons-codec version 1.3 - Copyright © 2002-2004 - Apache Software Foundation