BigQuery FARM_FINGERPRINT Collision case

The FARM_FINGERPRINT value in BigQuery is the same for two different strings. Any idea why? Both return -2660876244907183769.

SELECT id1, id2, id1 = id2 AS is_equal
FROM (
  SELECT
    FARM_FINGERPRINT(TO_JSON_STRING(STRUCT('19BD0AF0854E2B90E10080000A802438','599D7E2A47B31E20E10080000A7824B8','001','020','100'))) AS id1,
    FARM_FINGERPRINT(TO_JSON_STRING(STRUCT('DCE500729B5800F0E10080010A7824BA','5AF0A97293195320E10080010A782421','001','001','110'))) AS id2
)
google-bigquery hash
2021-11-24 00:09:05

In general, it is fairly easy to find collisions in any 64-bit hash, so no 64-bit hash can guarantee uniqueness when a large number of values is indexed. FARM_FINGERPRINT uses the Fingerprint64 function from the FarmHash library, which is a 64-bit, non-cryptographic hash. If collisions matter for your use case, you might as well use a hash function with a longer output, such as MD5 (128 bits), SHA256, or SHA512, which are also more widely standardized. Note that in BigQuery these return BYTES rather than INT64. See more hashing functions.
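To put rough numbers on "a large number of values", here is a minimal sketch of the standard birthday-bound approximation. It assumes hash outputs are uniformly distributed, which is an idealization (FarmHash is not a cryptographic hash), and the function name collision_probability is made up for this illustration.

```python
import math


def collision_probability(num_values: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among num_values hashes.

    Uses the birthday-bound approximation 1 - exp(-n(n-1) / (2 * 2^bits)),
    assuming every hash value is equally likely (an idealization).
    """
    space = 2.0 ** hash_bits
    exponent = num_values * (num_values - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for bits in (64, 128, 256):
        for n in (10**6, 10**9, 10**10):
            p = collision_probability(n, bits)
            print(f"{bits:>3}-bit hash, {n:.0e} values: p ~ {p:.3e}")
```

Under that assumption, a 64-bit hash reaches a collision chance of a few percent at about a billion values and is more likely than not to collide at ten billion, while the same counts with a 128-bit or 256-bit hash stay vanishingly small.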

A public issue was also opened in the issue tracker about a similar case, but it was eventually closed because collisions are bound to happen with any hash algorithm. With a longer hash, though, it may take a very long time before one turns up. See https://crypto.stackexchange.com/questions/47809/why-havent-any-sha-256-collisions-been-found-yet

2021-11-24 05:20:21

Thanks for the clarification!
Shawn

@Shawn, if this answered your question, consider accepting it by clicking the check mark on the left side. Also see What should I do when someone answers my question?
Dondi
