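Algorithm Examples for Computing Sequence Similarity in Python
Author: Eyizoha
This article presents Python examples of algorithms for measuring how similar two sequences are, including the fewest swap steps and the minimum swap distance needed to turn one sequence into the other. It gives implementation code along with the reasoning behind each approach.
Algorithms for Computing Sequence Similarity in Python
Code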
Location square deviation (LSD)
def location_square_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    lst = lst_1.copy()
    if lst_2 is not None:
        if n != len(lst_2):
            return False
        for i in range(n):
            # Use lst_2 as a mapping table: relabel lst_1 into lst so that
            # lst can be compared directly with [0, 1, 2, ...]
            lst[lst_1.index(lst_2[i])] = i
    s = 0
    for i in range(n):
        s += (lst[i] - i) ** 2
    s /= n
    return s
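The relabeling step shared by the location-based functions can be illustrated on a small input. The following is a minimal sketch, not the author's code: `relabel` is a hypothetical helper that uses a dictionary lookup instead of repeated `list.index` calls, and it assumes the elements are unique and hashable. Under that assumption it should produce the same relabeled list as the loop above.

```python
def relabel(lst_1, lst_2):
    """Hypothetical helper: replace each element of lst_1 by its index in lst_2."""
    # Assumes the elements are unique and hashable.
    position = {value: i for i, value in enumerate(lst_2)}
    return [position[value] for value in lst_1]


lst_1 = ["b", "c", "a"]
lst_2 = ["a", "b", "c"]
relabeled = relabel(lst_1, lst_2)  # [1, 2, 0]

# Location square deviation of the relabeled sequence against [0, 1, 2]:
# ((1-0)**2 + (2-1)**2 + (0-2)**2) / 3
lsd = sum((v - i) ** 2 for i, v in enumerate(relabeled)) / len(relabeled)
print(relabeled, lsd)  # [1, 2, 0] 2.0
```

After relabeling, comparing two arbitrary sequences reduces to comparing one permutation with the identity `[0, 1, 2, ...]`, which is why `lst_2` is optional in the functions of this article.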
Location mean deviation (LMD)
def location_mean_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    lst = lst_1.copy()
    if lst_2 is not None:
        if n != len(lst_2):
            return False
        for i in range(n):
            # Relabel lst_1 by the positions of its elements in lst_2
            lst[lst_1.index(lst_2[i])] = i
    s = 0
    for i in range(n):
        s += abs(lst[i] - i)
    s /= n
    return s
Swap deviation (SD)
def swap_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    lst = lst_1.copy()
    if lst_2 is not None:
        if n != len(lst_2):
            return False
        for i in range(n):
            lst[lst_1.index(lst_2[i])] = i
    count = 0
    # Count the cycles in the sequence; visited positions are marked with -1
    for i in range(n):
        if lst[i] == -1:
            continue
        p = i
        while lst[p] != -1:
            q = lst[p]
            lst[p] = -1
            p = q
        count += 1
    return n - count  # sequence length minus the number of cycles is the minimum number of swaps
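The rule "length minus number of cycles" can be checked by hand on a short permutation. The sketch below is an illustration only: `count_cycles` is a hypothetical helper that tracks visited positions in a separate list rather than overwriting entries with -1, and it assumes the input is a permutation of 0..n-1.

```python
def count_cycles(perm):
    """Hypothetical helper: count the cycles of a permutation of 0..n-1."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Follow the cycle that contains `start` until it closes.
        p = start
        while not seen[p]:
            seen[p] = True
            p = perm[p]
        cycles += 1
    return cycles


perm = [1, 2, 0, 3, 5, 4]
cycles = count_cycles(perm)      # cycles: (0 1 2), (3), (4 5)
min_swaps = len(perm) - cycles   # a cycle of length k needs k - 1 swaps
print(cycles, min_swaps)  # 3 3
```

Here the 3-cycle needs two swaps, the fixed point needs none, and the 2-cycle needs one, which adds up to 6 - 3 = 3.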
Swap distance deviation (SDD)
def swap_distance_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    lst = lst_1.copy()
    if lst_2 is not None:
        if n != len(lst_2):
            return False
        for i in range(n):
            lst[lst_1.index(lst_2[i])] = i
    swap_lst = []
    weight = 0
    while location_mean_deviation(lst) != 0:
        r_best = 0  # best swap gain found so far
        i_best = 0
        j_best = 0
        for i in range(n):
            for j in range(i+1, n):
                # Try every swap and look for the best one.
                # Swap gain r = decrease in location mean deviation ΔLMD(A,B) / swap distance (j-i).
                # Fixing the swap distance at 1 gives the fewest swap steps and the minimum swap count.
                r = ((abs(lst[i]-i)+abs(lst[j]-j)) - (abs(lst[j]-i)+abs(lst[i]-j)))/(j-i)
                if r > r_best:
                    r_best = r
                    i_best = i
                    j_best = j
        lst[i_best], lst[j_best] = lst[j_best], lst[i_best]
        weight += (j_best-i_best)
        swap_lst.append((i_best, j_best))
    # return swap_lst  # the steps that achieve the minimum swap distance (with swap distance 1, the fewest swap steps)
    return weight
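To see how the greedy choice works, it may help to evaluate the swap gain for every pair of a tiny sequence. The sketch below is only an illustration: `swap_gain` is a hypothetical helper that isolates the expression for `r` used in the inner loop above, and the input is assumed to be a permutation of 0..n-1.

```python
def swap_gain(lst, i, j):
    """Hypothetical helper: drop in total displacement per unit of swap distance."""
    before = abs(lst[i] - i) + abs(lst[j] - j)
    after = abs(lst[j] - i) + abs(lst[i] - j)
    return (before - after) / (j - i)


lst = [2, 1, 0]
n = len(lst)
gains = {(i, j): swap_gain(lst, i, j) for i in range(n) for j in range(i + 1, n)}
best = max(gains, key=gains.get)
print(gains)  # {(0, 1): 0.0, (0, 2): 2.0, (1, 2): 0.0}
print(best)   # (0, 2)
```

Swapping positions 0 and 2 sorts the sequence in one step at a swap distance of 2, while the two adjacent swaps bring no immediate gain, so the greedy loop picks (0, 2) first.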
Value square deviation (VSD)
def value_square_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    if lst_2 is not None:
        if n != len(lst_2):
            return False
    else:
        # Without a second sequence, compare against [0, 1, 2, ...]
        lst_2 = list(range(n))
    s = 0
    for i in range(n):
        s += (lst_1[i] - lst_2[i]) ** 2
    s /= n
    return s
Value mean deviation (VMD)
def value_mean_deviation(lst_1, lst_2=None):
    n = len(lst_1)
    if lst_2 is not None:
        if n != len(lst_2):
            return False
    else:
        # Without a second sequence, compare against [0, 1, 2, ...]
        lst_2 = list(range(n))
    s = 0
    for i in range(n):
        s += abs(lst_1[i] - lst_2[i])
    s /= n
    return s
Dot product ratio (DPR)
def dot_product_ratio(lst_1, lst_2=None):
    n = len(lst_1)
    if lst_2 is not None:
        if n != len(lst_2):
            return False
    else:
        # Without a second sequence, compare against [0, 1, 2, ...]
        lst_2 = list(range(n))
    s = 0
    max_s = 0
    for i in range(n):
        s += lst_1[i] * lst_2[i]
        max_s += lst_1[i] ** 2  # dot product of lst_1 with itself
    s /= max_s
    return s
Normalized dot product ratio (NDPR)
def normalization_dot_product_ratio(lst_1, lst_2=None):
    n = len(lst_1)
    if lst_2 is not None:
        if n != len(lst_2):
            return False
    else:
        # Without a second sequence, compare against [0, 1, 2, ...]
        lst_2 = list(range(n))
    s = (2*n-1)/(n+1)*dot_product_ratio(lst_1, lst_2)-(n-2)/(n+1)
    return s
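The constants in the normalization can be motivated as follows, assuming both inputs are permutations of 0..n-1. In that case the dot product ratio is largest (equal to 1) for identical sequences and smallest for a sequence paired with its reverse, where it works out to (n-2)/(2n-1). Rescaling that range linearly to [0, 1] gives (2n-1)/(n+1) * DPR - (n-2)/(n+1). The sketch below checks both endpoints numerically; `dpr` and `ndpr` are compact stand-ins written for this example, not the functions from the article.

```python
def dpr(a, b):
    """Stand-in for the dot product ratio: (a . b) / (a . a)."""
    return sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)


def ndpr(a, b):
    """Stand-in for the normalized ratio, assuming permutations of 0..n-1."""
    n = len(a)
    return (2 * n - 1) / (n + 1) * dpr(a, b) - (n - 2) / (n + 1)


n = 5
identity = list(range(n))
reverse = identity[::-1]

same = ndpr(identity, identity)    # expected to be close to 1
opposite = ndpr(reverse, identity)  # expected to be close to 0
print(round(same, 6), round(opposite, 6))
```

For inputs that are not permutations of 0..n-1, the result is not guaranteed to stay within [0, 1].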