Windows Mobile Support

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Friday, 8 March 2013

Compare two person name (Are the names similar?)

Posted on 00:50 by Unknown

Working on a personal project I need a mechanism to compare two names and find out if they are similar. The first think that I done was to try to define a regular expression that could help me.
I end up studying different algorithms that can measure the distance between two sequences.  In this post I will talk about what algorithms I found and what I finally used.
First algorithm that I found was the based on Hammed distance. This algorithm measure the minimum number of substitutions that need to be done to change a string to another string. This is a great algorithm that works very well when we have string with the same length.
Another interesting algorithm was based on Euclidean distance and is used to measure the distance between two points. It is based on Pythagorean formula. It is a pretty complicated algorithm and in my case was too complicated. I need a simple solution, that doesn't consume a lot of resources. I don’t need exact match each time.
The 3th algorithm that I discover was Jaccard similarity coefficient. Using this solution you can obtain the similarity and diversity value of two sets – in our case two strings.
In the end I found what I need. The Levenshtein distance compare two sequences and measure the difference between them. It will give us how many changes we need to make to the first sequence to obtain the second one.
It is pretty similar to Hammed algorithm, but in my case this solution is good and resolves my problem. The algorithm returns a positive number. In my case when the returned value is 2 or lower I consider that the two names are similar.
Of course there are also a lot of solutions, different components that resolve this problem. In my case this was the perfect solution. The implementation in C# can be found at the end of the post.
You will observe that the algorithm is recursive. We can rewrite it in different ways of course.
Recursive version:
int LevenshteinDistance(string s, string t)
{
int n = s.Length;
int m = t.Length;

if (n == 0)
{
return m;
}
if (m == 0)
{
return n;
}

int cost = s[n - 1] == t[m - 1]
? 0
: 1;

return
Math.Min(
Math.Min(LevenshteinDistance(s.Substring(0, n - 1), t) + 1,
LevenshteinDistance(s, t.Substring(0, m - 1)) + 1),
LevenshteinDistance(s.Substring(0, n - 1), t.Substring(0, m - 1)) + cost);
}
Non recursive version:
int LevenshteinDistance(string s, string t)
{
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];

if (n == 0)
{
return m;
}

if (m == 0)
{
return n;
}

for (int i = 0; i <= n; d[i, 0] = i++){ }
for (int j = 0; j <= m; d[0, j] = j++){ }

for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
int cost = (t[j - 1] == s[i - 1])
? 0
: 1;
d[i, j] = Math.Min(
Math.Min(
d[i - 1, j] + 1,
d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}
Email ThisBlogThis!Share to XShare to FacebookShare to Pinterest
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • Service Bus Topic - Automatic forward messages from a subscription to a topic
    Windows Azure Service Bus Topic is a service that enables us to distribute the same messages to different consumers without having to know e...
  • Patterns in Windows Azure Service Bus - Message Splitter Pattern
    In one of my post about Service Bus Topics from Windows Azure I told you that I will write about a post that describe how we can design an a...
  • CDN is not the only solution to improve the page speed - Reverse Caching Proxy
    I heard more and more often think like this: “If your website is to slow, you should use a CDN.” Great, CDN is THE solution for any kind of ...
  • E-Learning Vendors Attempt to Morph Mobile
    The sign should read: " Don't touch! Wet Paint !" I had a good chuckle today after receiving my latest emailed copy of the eLe...
  • Content Types - Level 6: Rich Media
    Level 6: Rich Media NOTE: This is part 7 of 7 and the conclusion of this continuing series; please see earlier posts for more background inf...
  • Publishing our CellCast Widget for iPad
    The rush has been on this week as our development team worked to design a new version of our CellCast Widget specifically for Apple's up...
  • Content Types - Level 5: Courseware
    Level 5: Content and Courseware NOTE: This is part 6 of 7 in a continuing series; please see earlier posts for more background information. ...
  • SQL - UNION and UNION ALL
    I think that all of us used until now UNION in a SQLstatement. Using this operator we can combine the result of 2 queries. For example we wa...
  • Cum sa salvezi un stream direct intr-un fisier
    Cred ca este a 2-a oara când întâlnesc aceasta cerința in decurs de câteva săptămâni. Se da un stream și o locație unde trebuie salvat, se c...
  • Content Types - Level 4: Reference
    Level 4: Reference Materials & Static Content NOTE: This is part 5 of 7 in a continuing series; please see earlier posts for more backgr...

Categories

  • .NET
  • .NET nice to have
  • #if DEBUG
  • 15 iunie 2011
  • 15 octombrie 2011
  • 2011
  • abstracta
  • action
  • adaugare
  • ajax
  • Amsterdam
  • Android
  • aplicatii
  • App Fabric
  • Apple iSlate
  • array
  • as
  • ASP.NET
  • AsReadOnly
  • Assembly comun
  • async
  • Asynchronous programming
  • asyncron
  • Autofac
  • AutoMapper
  • az
  • Azure
  • Azure AppFabric Cache
  • Azure backup solution
  • Azure Storage Explorer
  • azure. cloud
  • backup
  • BCP utility
  • bing maps v7
  • BitArray
  • BlackBerry
  • blob
  • BlobContainerPublicAccessType
  • breakpoint
  • bucuresti
  • C#
  • cache
  • CallerMemberName
  • CellCast
  • Certificate
  • CES
  • change
  • ChannelFactory
  • clasa
  • classinitialize
  • clean code
  • click event
  • close
  • Cloud
  • Cluj
  • cluj-napoca
  • Code contracts
  • code retrat
  • codecamp
  • CollectionAssert
  • Compact Edition
  • compara
  • Comparer T .Default
  • CompareTo
  • comparison
  • comunitate
  • concurs
  • Conditional attribute
  • configurare
  • connection string
  • container
  • content type
  • control
  • Convert
  • convertAll
  • convertor
  • cross platform
  • CRUD
  • css
  • custom properties
  • custom request
  • DACPAC
  • Daniel Andres
  • data sync service
  • database
  • date time
  • datetime
  • debug
  • default
  • delegate
  • dependency injection
  • deploy
  • DeploymentItem
  • design patterns
  • Dev de Amsterdam
  • development stoage
  • dictionary
  • diferente
  • digging
  • director
  • Directory.Exist
  • disable
  • dispatcher
  • dispose
  • dropdown
  • dynamic
  • EF
  • email
  • encoding
  • entity framework
  • enum
  • enumerable
  • Environment.NewLine
  • error
  • error 404
  • error handling
  • eveniment
  • event
  • ews
  • excel
  • exception
  • exchange
  • exita
  • explicit
  • export
  • extension
  • field
  • File.Exist
  • finalize
  • fire and forget
  • Fluent interface pattern
  • format
  • func
  • GC.SuppressFinalize
  • generic
  • getdirectoryname
  • globalization
  • gmail
  • hackathon
  • Hadoop
  • handle
  • HTML
  • html 5
  • Html.ActionLink
  • http://www.blogger.com/img/blank.gif
  • HttpModule
  • IComparable
  • IE
  • ienumerable
  • IIS
  • image
  • implicit
  • import
  • int
  • internationalization
  • Internet Explorer
  • interop
  • Ioc
  • IP Filter
  • iPhone
  • iQuest
  • IStructuralEquatable
  • ITCamp
  • itspark
  • java script
  • javascript
  • July 2012
  • KeyedByTypeCollection
  • KeyNotFoundException
  • Kinect SDK
  • lambda expression
  • LightSwitch Microsoft Silverlight
  • linq
  • list
  • lista
  • lista servicii
  • liste
  • Live Connect
  • Live ID
  • load
  • localization
  • lock
  • m-learning
  • MAC
  • Mango
  • map
  • mapare
  • mapare propietati
  • messagequeue
  • meta properties
  • method
  • MethodImpl
  • Metro App
  • Microsoft
  • Microsoft Sync Framework
  • mlearning
  • mlearning devices
  • Mobile Apps
  • mobile in the cloud
  • mobile learning
  • mobile services
  • Mobile Web
  • mongoDb
  • monitorizare
  • msmq
  • multitasking
  • MVC
  • MVC 3
  • MVVM
  • namespace
  • nextpartitionkey
  • nextrowkey
  • Ninject
  • nivel acces
  • no result
  • normalize
  • nosql
  • null expcetion
  • null object pattern
  • NullReferenceException
  • OAuth API
  • office
  • offline
  • Open ID
  • openhackeu2011
  • operations
  • operator
  • optimization
  • option
  • outputcache
  • OutputCacheProvider
  • override
  • paginare
  • pagination
  • path
  • persistare
  • Portable Library tool
  • Post event – CodeCamp Cluj-Napoca
  • predicate
  • predictions
  • prezentare
  • process
  • proiect
  • property
  • propietati
  • query
  • ReadOnlyCollection
  • ReadOnlyDictionary
  • referinta
  • reflection
  • remote
  • reply command
  • request
  • request response
  • resouce
  • REST
  • REST Client
  • RESTSharp
  • ronua
  • rss
  • rulare
  • salvare in fisier
  • sc
  • schimbare timp
  • select
  • select nodes
  • send
  • serializare
  • serialization
  • Server.Transfer. Resposen.Redirect
  • service bus
  • ServiceBase
  • servicecontroller
  • sesiune
  • session
  • Session_End
  • Session_Start
  • setup
  • Sibiu
  • signalR
  • Silverlight
  • sincronizare
  • Single Responsibility Principle
  • SkyDrive
  • skype
  • smartphones
  • smtp
  • Snapguide
  • sniffer
  • socket
  • solid
  • spec#
  • sql
  • Sql Azure
  • SQL CE
  • sql server 2008 RC
  • SRP
  • startuptype
  • stateful
  • stateless
  • static
  • stergere
  • store
  • store procedure
  • stream
  • string
  • string.join
  • struct
  • StructuralEqualityComparer
  • submit
  • switch
  • Symbian
  • Synchronized
  • system
  • tabele
  • table
  • techEd 2012
  • tempdata
  • test
  • testcleanup
  • testinitialize
  • testmethod
  • thread
  • timer
  • ToLower
  • tool
  • tostring
  • Total Cost Calculator
  • trace ASP.NET
  • transcoding
  • tuplu
  • tutorial
  • TWmLearning
  • type
  • unit test
  • unittest
  • UrlParameter.Optional
  • Validate
  • validation
  • verificare
  • video
  • view
  • ViewBag
  • virtual
  • visual studio
  • VM role
  • Vunvulea Radu
  • wallpaper
  • WCF
  • WebBrower
  • WebRequest
  • where clause
  • Windows
  • windows 8
  • Windows Azure
  • Windows Azure Service Management CmdLets
  • windows live messenger
  • Windows Mobile
  • Windows Phone
  • windows service
  • windows store application
  • Windows Task
  • WinRT
  • word
  • workaround
  • XBox
  • xml
  • xmlns
  • XNA
  • xpath
  • YMesseger
  • Yonder
  • Zip

Blog Archive

  • ▼  2013 (139)
    • ►  November (17)
    • ►  October (12)
    • ►  September (10)
    • ►  August (7)
    • ►  July (8)
    • ►  June (15)
    • ►  May (12)
    • ►  April (17)
    • ▼  March (16)
      • (Part 2) Testing the limits of Windows Azure Servi...
      • Microsoft comes to Cluj-Napoca to talk about Windo...
      • (Part 1) Testing the limits of Windows Azure Servi...
      • Cluj IT Innovation Days, 03.22.2013 - Why people s...
      • [Post Event] Codecamp event in Cluj-Napoca, March 20
      • WCF Client and multiply IP address - How to spec...
      • Windows Azure Service Bus - Schedule Message Delivery
      • Shadows in a Windows 8 App
      • What are the steps done to authentication Windows ...
      • Microsoft in the Open-Source World
      • Different methods to cancel a Task
      • Compare two person name (Are the names similar?)
      • Today Software Magazine, number 9 - Microsoft in o...
      • Windows Azure - Filter Subscriptions feature
      • Codecamp event in Cluj-Napoca, March 20
      • Windows Azure Mobile Services supports Android
    • ►  February (9)
    • ►  January (16)
  • ►  2012 (251)
    • ►  December (9)
    • ►  November (19)
    • ►  October (26)
    • ►  September (13)
    • ►  August (35)
    • ►  July (28)
    • ►  June (27)
    • ►  May (24)
    • ►  April (18)
    • ►  March (17)
    • ►  February (20)
    • ►  January (15)
  • ►  2011 (127)
    • ►  December (11)
    • ►  November (20)
    • ►  October (8)
    • ►  September (8)
    • ►  August (8)
    • ►  July (10)
    • ►  June (5)
    • ►  May (8)
    • ►  April (9)
    • ►  March (14)
    • ►  February (20)
    • ►  January (6)
  • ►  2010 (26)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  June (2)
    • ►  May (1)
    • ►  April (4)
    • ►  March (1)
    • ►  February (1)
    • ►  January (14)
Powered by Blogger.

About Me

Unknown
View my complete profile