Before tackling the differences, let’s recap the similiarities:
2. Unlike their Excel counterparts, they will all return 0 if the substring under scrutiny cannot be found
To better illustrate the differences between the functions, I’ll be using the ‘TED Talks’ dataset from Kaggle, to look at the tags given to each Ted talk.
I want to identify the first position of both ‘technology’ and ‘TED‘, so that I could extract them in a separate string
(1) SQL Server: charindex() and patindex()
Consult the snippet below:
——————–SQL Server——————————————- | |
select | |
tags, | |
charindex('technology', tags) as char_position_tech, | |
patindex('technology', tags) as pat_position_tech, | |
patindex('%technology%', tags) as pat_position_tech_2, | |
charindex('Technology', tags) as char_position_Tech, | |
patindex('%Technology%', tags) as pat_position_Tech, | |
charindex('TED%', tags) as char_position_TED, | |
patindex('%TED%', tags) as pat_position_TED | |
from datageeking.dbo.ted_talks | |
/*tags |char_position_tech |pat_position_tech |pat_position_tech_2 | | |
————————————————————————————-|——————-|——————|——————–| | |
['alternative energy', 'energy', 'exploration', 'green', 'science', 'technology'] |70 |0 |70 | | |
['DNA', 'biology', 'business', 'genetics', 'life', 'science', 'technology'] |64 |0 |64 | | |
['TED Brain Trust', 'children', 'creativity', 'education', 'invention'] |0 |0 |0 | | |
['TEDx', 'business', 'creativity', 'culture', 'design', 'fashion', 'law', 'media'] |0 |0 |0 | | |
['TED Fellows', 'children', 'culture', 'film', 'politics', 'suicide', 'war'] |0 |0 |0 | | |
['AIDS', 'HIV', 'Vaccines', 'disease', 'global issues', 'health', 'medicine'] |0 |0 |0 | | |
['cities', 'music', 'performance', 'poetry', 'storytelling'] |0 |0 |0 | | |
['Internet', 'TEDx', 'law', 'music', 'online video', 'technology', 'web'] |55 |0 |55 | | |
tags |char_position_Tech |pat_position_Tech | | |
————————————————————————————-|——————-|——————| | |
['alternative energy', 'energy', 'exploration', 'green', 'science', 'technology'] |70 |70 | | |
['DNA', 'biology', 'business', 'genetics', 'life', 'science', 'technology'] |64 |64 | | |
['TED Brain Trust', 'children', 'creativity', 'education', 'invention'] |0 |0 | | |
['TEDx', 'business', 'creativity', 'culture', 'design', 'fashion', 'law', 'media'] |0 |0 | | |
['TED Fellows', 'children', 'culture', 'film', 'politics', 'suicide', 'war'] |0 |0 | | |
['AIDS', 'HIV', 'Vaccines', 'disease', 'global issues', 'health', 'medicine'] |0 |0 | | |
['cities', 'music', 'performance', 'poetry', 'storytelling'] |0 |0 | | |
['Internet', 'TEDx', 'law', 'music', 'online video', 'technology', 'web'] |55 |55 |x | |
tags |char_position_TED |pat_position_TED | |
————————————————————————————-|——————|———- | |
['alternative energy', 'energy', 'exploration', 'green', 'science', 'technology'] |0 |0 | |
['DNA', 'biology', 'business', 'genetics', 'life', 'science', 'technology'] |0 |0 | |
['TED Brain Trust', 'children', 'creativity', 'education', 'invention'] |0 |3 | |
['TEDx', 'business', 'creativity', 'culture', 'design', 'fashion', 'law', 'media'] |0 |3 | |
['TED Fellows', 'children', 'culture', 'film', 'politics', 'suicide', 'war'] |0 |3 | |
['AIDS', 'HIV', 'Vaccines', 'disease', 'global issues', 'health', 'medicine'] |0 |0 | |
['cities', 'music', 'performance', 'poetry', 'storytelling'] |0 |0 | |
['Internet', 'TEDx', 'law', 'music', 'online video', 'technology', 'web'] |0 |15 | |
*/ |
Whilst both are case insensitive and could easily be used interchangeably if you’re searching for a precise string, that’s not the case if you’re after a string that could have differences in spelling or vary slightly throughout the dataset
NB! Note that patindex() always requires you to encircle your string in % signs. Else, it won’t find it. Conversely, adding % signs to charindex() will return nothing but 0s
Charindex(), however, allows you to specify the starting position of your search. That’s not the case with patindex()
(2) MySQL: instr() and locate()
Alike charindex() in SQL Server, neither instr() or locate() will work as expected if a % sign is included in the search.
Both instr() and locate() are case-insensitive
/* MySQL */ | |
select | |
tags, | |
instr(tags, 'TED'), | |
locate('TED', tags), | |
locate('TED', tags, 5), | |
instr(tags, 'ted'), | |
locate('ted', tags), | |
locate('ted', tags, 5), | |
instr(tags, 'TED%'), | |
locate('TED%', tags), | |
locate('TED%', tags, 5) | |
from datageeking.ted_talks | |
/* | |
tags |instr(tags, 'TED') |locate('TED', tags) |locate('TED', tags, 5) | | |
———————————————-|——————-|——————–|———————–| | |
['children', 'creativity', 'culture', 'dance' |0 |0 |0 | | |
['alternative energy', 'cars', 'climate chang |0 |0 |0 | | |
['computers', 'entertainment', 'interface des |0 |0 |0 | | |
['MacArthur grant', 'activism', 'business', ' |0 |0 |0 | | |
['Africa', 'Asia', 'Google', 'demo', 'economi |0 |0 |0 | | |
['business', 'culture', 'entertainment', 'goa |0 |0 |0 | | |
['Christianity', 'God', 'atheism', 'comedy', |0 |0 |0 | | |
['architecture', 'collaboration', 'culture', |0 |0 |0 | | |
['God', 'TED Brain Trust', 'atheism', 'brain' |10 |10 |10 | | |
['Christianity', 'God', 'culture', 'happiness |0 |0 |0 | | |
['activism', 'architecture', 'collaboration', |0 |0 |0 | | |
['TED Prize', 'art', 'culture', 'entertainmen |3 |3 |0 | | |
['TED Prize', 'collaboration', 'disease', 'eb |3 |3 |0 | | |
['demo', 'design', 'interface design', 'techn |0 |0 |0 | | |
['children', 'design', 'education', 'entrepre |0 |0 |0 | | |
tags |instr(tags, 'ted') |locate('ted', tags) |locate('ted', tags, 5) | | |
———————————————-|——————-|——————–|———————–| | |
['children', 'creativity', 'culture', 'dance' |0 |0 |0 | | |
['alternative energy', 'cars', 'climate chang |0 |0 |0 | | |
['computers', 'entertainment', 'interface des |0 |0 |0 | | |
['MacArthur grant', 'activism', 'business', ' |0 |0 |0 | | |
['Africa', 'Asia', 'Google', 'demo', 'economi |0 |0 |0 | | |
['business', 'culture', 'entertainment', 'goa |0 |0 |0 | | |
['Christianity', 'God', 'atheism', 'comedy', |0 |0 |0 | | |
['architecture', 'collaboration', 'culture', |0 |0 |0 | | |
['God', 'TED Brain Trust', 'atheism', 'brain' |10 |10 |10 | | |
['Christianity', 'God', 'culture', 'happiness |0 |0 |0 | | |
['activism', 'architecture', 'collaboration', |0 |0 |0 | | |
['TED Prize', 'art', 'culture', 'entertainmen |3 |3 |0 | | |
['TED Prize', 'collaboration', 'disease', 'eb |3 |3 |0 | | |
['demo', 'design', 'interface design', 'techn |0 |0 |0 | | |
['children', 'design', 'education', 'entrepre |0 |0 |0 | | |
tags |instr(tags, 'TED%') |locate('TED%', tags) |locate('TED%', tags, 5) | | |
———————————————-|——————–|———————|————————| | |
['children', 'creativity', 'culture', 'dance' |0 |0 |0 | | |
['alternative energy', 'cars', 'climate chang |0 |0 |0 | | |
['computers', 'entertainment', 'interface des |0 |0 |0 | | |
['MacArthur grant', 'activism', 'business', ' |0 |0 |0 | | |
['Africa', 'Asia', 'Google', 'demo', 'economi |0 |0 |0 | | |
['business', 'culture', 'entertainment', 'goa |0 |0 |0 | | |
['Christianity', 'God', 'atheism', 'comedy', |0 |0 |0 | | |
['architecture', 'collaboration', 'culture', |0 |0 |0 | | |
['God', 'TED Brain Trust', 'atheism', 'brain' |0 |0 |0 | | |
['Christianity', 'God', 'culture', 'happiness |0 |0 |0 | | |
['activism', 'architecture', 'collaboration', |0 |0 |0 | | |
['TED Prize', 'art', 'culture', 'entertainmen |0 |0 |0 | | |
['TED Prize', 'collaboration', 'disease', 'eb |0 |0 |0 | | |
['demo', 'design', 'interface design', 'techn |0 |0 |0 | | |
['children', 'design', 'education', 'entrepre |0 |0 |0 | | |
*/ |
The only time when locate() can potentially return different results from instr() is when it’s been used in its 3-argument form and a starting position has been specified for the search.
By contrast, instr() always starts searching at the first position – there’s no option to define that yourself.
(3) PostgreSQL: position() and strpos()
Both position() and strpos() are case sensitive and neither endorses wild characters (i.e will work as expected if a % sign is included in the search string)
Also, neither of them supports an optional argument that will allow you to specify the starting position of your search, so the only difference between the two is the one in syntax
——————————–PostgreSQL—————————————– | |
select | |
tags, | |
position('technology' in tags) as position_tech, | |
strpos(tags, 'technology') as strpos_tech | |
position('Technology' in tags) as position_Tech, | |
strpos(tags, 'Technology') as strpos_Tech, | |
position('TED%' in tags) as position_TED_wild, | |
strpos(tags, 'TED%') as strpos_TED_wild | |
from public.ted_talks | |
/* tags |position_tech |strpos_tech | |
——————————————————————————————————————————|————–|—- | |
['computers'; 'entertainment'; 'interface design'; 'media'; 'music'; 'performance'; implicity'; oftware'; 'technology'] |108 |108 | |
['MacArthur grant'; 'activism'; 'business'; 'cities'; 'environment'; 'green'; 'inequality'; 'politics'; 'pollution'] |0 |0 | |
['Africa'; 'Asia'; 'Google'; 'demo'; 'economics'; 'global development'; 'global issues'; 'health'; 'math'; tatistics'] |0 |0 | |
['activism'; 'architecture'; 'collaboration'; 'culture'; 'design'; 'disaster relief'; 'global issues'; 'invention'] |0 |0 | |
['TED Prize'; 'art'; 'culture'; 'entertainment'; 'film'; 'global issues'; 'movies'; 'peace'; ocial change'] |0 |0 | |
['TED Prize'; 'collaboration'; 'disease'; 'ebola'; 'global issues'; 'health'; cience'; 'technology'] |89 |89 | |
['demo'; 'design'; 'interface design'; 'technology'] |41 |41 | |
['children'; 'design'; 'education'; 'entrepreneur'; 'global issues'; 'philanthropy'; ocial change'; 'technology'] |102 |102 | |
['entertainment'; 'music'; 'performance'; 'violin'; 'wunderkind'; 'youth'] |0 |0 | |
['creativity'; 'entertainment'; 'music'; 'performance'; 'piano'; 'wunderkind'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'invention'; 'media'; 'open-source'; 'technology'; 'wikipedia'] |80 |80 | |
['TED Fellows'; 'disease'; 'health care'; 'innovation'; 'invention'; 'medicine'; 'product design'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'global issues'; 'library'; 'open-source'; 'technology'] |86 |86 | |
['collaboration'; 'comedy'; 'community'; 'culture'; 'dance'; 'demo'; 'entertainment'; 'humor'; 'performance'] |0 |0 | |
tags |position_Tech |strpos_Tech | |
——————————————————————————————————————————|————–|—- | |
['computers'; 'entertainment'; 'interface design'; 'media'; 'music'; 'performance'; implicity'; oftware'; 'technology'] |0 |0 | |
['MacArthur grant'; 'activism'; 'business'; 'cities'; 'environment'; 'green'; 'inequality'; 'politics'; 'pollution'] |0 |0 | |
['Africa'; 'Asia'; 'Google'; 'demo'; 'economics'; 'global development'; 'global issues'; 'health'; 'math'; tatistics'] |0 |0 | |
['activism'; 'architecture'; 'collaboration'; 'culture'; 'design'; 'disaster relief'; 'global issues'; 'invention'] |0 |0 | |
['TED Prize'; 'art'; 'culture'; 'entertainment'; 'film'; 'global issues'; 'movies'; 'peace'; ocial change'] |0 |0 | |
['TED Prize'; 'collaboration'; 'disease'; 'ebola'; 'global issues'; 'health'; cience'; 'technology'] |0 |0 | |
['demo'; 'design'; 'interface design'; 'technology'] |0 |0 | |
['children'; 'design'; 'education'; 'entrepreneur'; 'global issues'; 'philanthropy'; ocial change'; 'technology'] |0 |0 | |
['entertainment'; 'music'; 'performance'; 'violin'; 'wunderkind'; 'youth'] |0 |0 | |
['creativity'; 'entertainment'; 'music'; 'performance'; 'piano'; 'wunderkind'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'invention'; 'media'; 'open-source'; 'technology'; 'wikipedia'] |0 |0 | |
['TED Fellows'; 'disease'; 'health care'; 'innovation'; 'invention'; 'medicine'; 'product design'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'global issues'; 'library'; 'open-source'; 'technology'] |0 |0 | |
['collaboration'; 'comedy'; 'community'; 'culture'; 'dance'; 'demo'; 'entertainment'; 'humor'; 'performance'] |0 |0 | |
tags |position_TED_wild |strpos_TED_wild | |
——————————————————————————————————————————|——————|—- | |
['computers'; 'entertainment'; 'interface design'; 'media'; 'music'; 'performance'; implicity'; oftware'; 'technology'] |0 |0 | |
['MacArthur grant'; 'activism'; 'business'; 'cities'; 'environment'; 'green'; 'inequality'; 'politics'; 'pollution'] |0 |0 | |
['Africa'; 'Asia'; 'Google'; 'demo'; 'economics'; 'global development'; 'global issues'; 'health'; 'math'; tatistics'] |0 |0 | |
['activism'; 'architecture'; 'collaboration'; 'culture'; 'design'; 'disaster relief'; 'global issues'; 'invention'] |0 |0 | |
['TED Prize'; 'art'; 'culture'; 'entertainment'; 'film'; 'global issues'; 'movies'; 'peace'; ocial change'] |0 |0 | |
['TED Prize'; 'collaboration'; 'disease'; 'ebola'; 'global issues'; 'health'; cience'; 'technology'] |0 |0 | |
['demo'; 'design'; 'interface design'; 'technology'] |0 |0 | |
['children'; 'design'; 'education'; 'entrepreneur'; 'global issues'; 'philanthropy'; ocial change'; 'technology'] |0 |0 | |
['entertainment'; 'music'; 'performance'; 'violin'; 'wunderkind'; 'youth'] |0 |0 | |
['creativity'; 'entertainment'; 'music'; 'performance'; 'piano'; 'wunderkind'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'invention'; 'media'; 'open-source'; 'technology'; 'wikipedia'] |0 |0 | |
['TED Fellows'; 'disease'; 'health care'; 'innovation'; 'invention'; 'medicine'; 'product design'] |0 |0 | |
['business'; 'collaboration'; 'culture'; 'global issues'; 'library'; 'open-source'; 'technology'] |0 |0 | |
['collaboration'; 'comedy'; 'community'; 'culture'; 'dance'; 'demo'; 'entertainment'; 'humor'; 'performance'] |0 |0 | |
* | |
***/ |
To sum up:
Syntax | Does it support optional arguments? | Does it support wild characters? | Is it case sensitive? |
CHARINDEX ( expressionToFind , expressionToSearch [ , start_location ] | yes | no | no |
PATINDEX ( ‘%pattern%’ , expression ) | no | yes | no |
INSTR (str,substr) | no | no | no |
LOCATE (substr,str,[pos]) | yes | no | no |
POSITION (substring in string) | no | no | yes |
STRPOS (string, substring) | no | no | yes |
Happy SQL-ing!
Awesome work!
LikeLike