String similarity can be computed with several algorithms. Common choices include the edit distance algorithm (Levenshtein distance) and Jaccard similarity.
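Jaccard similarity, the other measure named above, compares two sets instead of counting edit operations: it is the size of their intersection divided by the size of their union. Below is a minimal sketch that assumes overlapping character bigrams as the set elements. That is an illustrative choice; single characters or words would also work. The helper name `char_ngrams` is hypothetical.

```python
def char_ngrams(s, n=2):
    """Return the set of overlapping character n-grams of s (hypothetical helper)."""
    if len(s) < n:
        # Strings shorter than n contribute themselves as a single element.
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(s1, s2, n=2):
    """|A ∩ B| / |A ∪ B| over the character n-gram sets of s1 and s2."""
    a, b = char_ngrams(s1, n), char_ngrams(s2, n)
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return len(a & b) / len(a | b)


print(jaccard_similarity("hello", "hallo"))
```

For "hello" and "hallo" the bigram sets share only "ll" and "lo" out of six distinct bigrams, so the score is 2/6 ≈ 0.333. This is much lower than the edit-distance score for the same pair, which shows that the two measures are not interchangeable.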
The following example code computes string similarity with the edit distance (Levenshtein) algorithm:
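```python
def levenshtein_distance(s1, s2):
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn s1 into s2."""
    # Make s1 the longer string so each DP row has len(s2) + 1 entries.
    if len(s1) < len(s2):
        return levenshtein_distance(s2, s1)
    if len(s2) == 0:
        return len(s1)
    # previous_row[j] = distance between the empty prefix of s1 and s2[:j]
    previous_row = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1):
        current_row = [i + 1]
        for j, c2 in enumerate(s2):
            insertions = previous_row[j + 1] + 1
            deletions = current_row[j] + 1
            substitutions = previous_row[j] + (c1 != c2)  # cost 0 if chars match
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
    return previous_row[-1]


def similarity(s1, s2):
    """Normalize the edit distance to a score in [0, 1]; 1 means identical."""
    max_length = max(len(s1), len(s2))
    if max_length == 0:
        return 1.0  # two empty strings: avoid division by zero
    distance = levenshtein_distance(s1, s2)
    return 1 - distance / max_length


s1 = "hello"
s2 = "hallo"
similarity_score = similarity(s1, s2)
print(f"The similarity score between '{s1}' and '{s2}' is {similarity_score}")
```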
This code computes the similarity between the strings "hello" and "hallo" and prints:
The similarity score between 'hello' and 'hallo' is 0.8
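The 0.8 comes directly from the normalization in `similarity`. Turning "hello" into "hallo" takes one substitution (e → a), so the Levenshtein distance is 1. Both strings have length 5:

$$
\text{similarity}(s_1, s_2) = 1 - \frac{d_{\text{Lev}}(s_1, s_2)}{\max(|s_1|, |s_2|)} = 1 - \frac{1}{5} = 0.8
$$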
You can change `s1` and `s2` to compute the similarity of other strings.