|
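I have two data sets: existing customers and potential customers.
My main goal is to find out whether any of the potential customers are already existing customers. However, customer names are not written consistently across the two data sets.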
Existing customers:

    Customer           ID
    Ed's Barbershop    1002
    GroceryTown        1003
    Candy Place        1004
    Handy Man          1005

Potential customers:

    Customer
    Eds Barbershop
    Grocery Town
    Candy Place
    Handee Man
    Beauty Salon
    The Apple Farm
    Igloo Ice Cream
    Ride-a-Long Bikes

I want to write a SELECT statement along these lines to achieve my goal:
    SELECT a.Customer, b.ID
    FROM PotentialCustomers a
    LEFT JOIN ExistingCustomers b
      ON a.Customer = b.Customer

with results like the following:

    Customer             ID
    Eds Barbershop       1002
    Grocery Town         1003
    Candy Place          1004
    Handee Man           1005
    Beauty Salon         NULL
    The Apple Farm       NULL
    Igloo Ice Cream      NULL
    Ride-a-Long Bikes    NULL

I am vaguely familiar with the concepts of Levenshtein distance and Double Metaphone, but I am not sure how to apply them here.
Ideally, I would like the JOIN part of the SELECT statement to read something like: LEFT JOIN ExistingCustomers as B WHERE a.Customer LIKE b.Customer, but I know that syntax is not correct.
Any suggestions are welcome!
Solution:
Here is a way to do this using Levenshtein distance:
Create this function (run this first):
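    CREATE FUNCTION ufn_levenshtein(@s1 nvarchar(3999), @s2 nvarchar(3999))
    RETURNS int
    AS
    BEGIN
        DECLARE @s1_len int, @s2_len int
        DECLARE @i int, @j int, @s1_char nchar, @c int, @c_temp int
        -- @cv1 holds the previous row of the distance matrix, @cv0 the row being built;
        -- each cell is stored as a 2-byte binary value.
        DECLARE @cv0 varbinary(8000), @cv1 varbinary(8000)

        SELECT @s1_len = LEN(@s1), @s2_len = LEN(@s2),
               @cv1 = 0x0000, @j = 1, @i = 1, @c = 0

        -- Initialise the first row: 0, 1, 2, ..., LEN(@s2)
        WHILE @j <= @s2_len
            SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1

        WHILE @i <= @s1_len
        BEGIN
            SELECT @s1_char = SUBSTRING(@s1, @i, 1),
                   @c = @i,
                   @cv0 = CAST(@i AS binary(2)),
                   @j = 1

            WHILE @j <= @s2_len
            BEGIN
                -- Cost via the cell to the left
                SET @c = @c + 1
                -- Cost via the diagonal cell (free when the characters match)
                SET @c_temp = CAST(SUBSTRING(@cv1, @j + @j - 1, 2) AS int)
                            + CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
                IF @c > @c_temp SET @c = @c_temp
                -- Cost via the cell above
                SET @c_temp = CAST(SUBSTRING(@cv1, @j + @j + 1, 2) AS int) + 1
                IF @c > @c_temp SET @c = @c_temp
                SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
            END

            SELECT @cv1 = @cv0, @i = @i + 1
        END

        RETURN @c
    END

(Function developed by Joseph Gama)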
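Before using the function in a join, it may help to call it directly on a few name pairs to get a feel for the distances it returns. The following is only a sketch: it assumes the function was created in the dbo schema, and the column aliases are made up for illustration. The name literals are taken from the sample data in the question. Two identical names should return 0, and the more two names differ, the larger the number.

```sql
-- Hypothetical spot checks; assumes dbo.ufn_levenshtein already exists.
SELECT dbo.ufn_levenshtein(N'Candy Place', N'Candy Place')  AS IdenticalNames,
       dbo.ufn_levenshtein(N'Handy Man',   N'Handee Man')   AS SimilarNames,
       dbo.ufn_levenshtein(N'Candy Place', N'Beauty Salon') AS UnrelatedNames;
```

Comparing the values for similar and unrelated names is one way to choose a sensible cut-off for your own data.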
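Then simply use this query to get the matches:

    SELECT A.Customer,
           b.ID,
           b.Customer
    FROM #POTENTIALCUSTOMERS a
    LEFT JOIN #ExistingCustomers b
      -- Spaces are stripped before comparing. Names within the cut-off count as a match;
      -- < 5 is an assumed value, tune it for your data.
      ON dbo.ufn_levenshtein(REPLACE(A.Customer, ' ', ''), REPLACE(B.Customer, ' ', '')) < 5;

Once the function has been created, here is the complete script: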
    IF OBJECT_ID('tempdb..#ExistingCustomers') IS NOT NULL
        DROP TABLE #ExistingCustomers;

    CREATE TABLE #ExistingCustomers
    (
        Customer VARCHAR(255),
        ID INT
    );

    INSERT INTO #ExistingCustomers VALUES ('Ed''s Barbershop', 1002);
    INSERT INTO #ExistingCustomers VALUES ('GroceryTown', 1003);
    INSERT INTO #ExistingCustomers VALUES ('Candy Place', 1004);
    INSERT INTO #ExistingCustomers VALUES ('Handy Man', 1005);

    IF OBJECT_ID('tempdb..#POTENTIALCUSTOMERS') IS NOT NULL
        DROP TABLE #POTENTIALCUSTOMERS;

    CREATE TABLE #POTENTIALCUSTOMERS
    (
        Customer VARCHAR(255)
    );

    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Eds Barbershop');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Grocery Town');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Candy Place');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Handee Man');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Beauty Salon');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('The Apple Farm');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Igloo Ice Cream');
    INSERT INTO #POTENTIALCUSTOMERS VALUES ('Ride-a-Long Bikes');

    SELECT A.Customer,
           b.ID,
           b.Customer
    FROM #POTENTIALCUSTOMERS a
    LEFT JOIN #ExistingCustomers b
      -- Names within the cut-off count as a match; < 5 is an assumed value, tune it for your data.
      ON dbo.ufn_levenshtein(REPLACE(A.Customer, ' ', ''), REPLACE(B.Customer, ' ', '')) < 5;

Here you can find a T-SQL example: http://www.kodyaz.com/articles/fuzzy-string-matching-using-levenshtein-distance-sql-server.aspx |
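One thing to watch with a threshold-based join: a potential customer can fall within the cut-off of more than one existing customer, which would return several rows for that name. A possible variation, sketched below, keeps only the closest existing customer per potential customer and also shows the distance so the cut-off can be tuned. It assumes the temp tables and dbo.ufn_levenshtein from the script above already exist; the aliases (p, e, d, m), the output column names, and the cut-off of 5 are assumptions for illustration, not part of the original answer.

```sql
-- Sketch only: closest existing customer per potential customer.
-- Assumes #POTENTIALCUSTOMERS, #ExistingCustomers and dbo.ufn_levenshtein exist.
SELECT p.Customer AS PotentialCustomer,
       m.ID,
       m.Customer AS ExistingCustomer,
       m.Distance
FROM #POTENTIALCUSTOMERS AS p
OUTER APPLY (
    -- For each potential customer, keep the single nearest existing name
    SELECT TOP (1)
           e.ID,
           e.Customer,
           d.Distance
    FROM #ExistingCustomers AS e
    CROSS APPLY (
        -- Compute the distance once, with spaces removed from both names
        SELECT dbo.ufn_levenshtein(REPLACE(p.Customer, ' ', ''),
                                   REPLACE(e.Customer, ' ', '')) AS Distance
    ) AS d
    WHERE d.Distance < 5          -- assumed cut-off; tune for your data
    ORDER BY d.Distance
) AS m;
```

OUTER APPLY behaves like the LEFT JOIN in the original query here: potential customers with no existing name inside the cut-off still appear, with NULL in the ID, ExistingCustomer and Distance columns.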
|