String Functions: Nim vs Python
By Kaushal Modi
While learning the Nim language and trying to correlate that with my Python 3 knowledge, I came across this awesome comparison table of string manipulation functions between the two languages.
My utmost gratitude goes to the developers of Nim, Python, Org, ob-nim and ob-python, and of course Hugo, which allowed me to publish my notes in this presentable format.
Here are the code samples and their outputs. In each section below, you will find a Python code snippet, followed by its output, and then the same implementation in Nim, followed by the output of that.
Updates #
- Update to Nim 1.5.1.
- Update to Nim 1.1.1.
- Update to Nim 0.19.0 (just confirmed that all code blocks work as before; no modification was needed).
- Add the Better IsLower/IsUpper section; update Python to 3.7.0 and Nim to the latest devel as of today.
- Update the Nim snippet outputs using the improved echo! The echo output difference is notable in the .split examples. This fixes the issue about confusing echo outputs that I raised in Nim Issue #6225. Big thanks to @bluenote10 from GitHub for Nim PR #6825!
- Update the "Understanding the ^N syntax" example that gave incorrect output before Nim Issue #6223 got fixed.
- Update the .join example that did not work before Nim Issue #6210 got fixed.
- Use the binary operator ..< instead of the combination of the binary operator .. and the deprecated unary < operator.
String slicing #
All characters except last #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[:-1])
a bc def aghij cklm danopqrstuv adefwxyz zy
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# Always add a space around the .. and ..< operators
echo str[0 ..< str.high]
# or
echo str[0 .. ^2]
a bc def aghij cklm danopqrstuv adefwxyz zy
a bc def aghij cklm danopqrstuv adefwxyz zy
Understanding the ^N syntax #
var str = "abc"
# Always add a space around the .. and ..< operators
echo "1st char(0) to last, including \\0(^0) : ", str[0 .. ^0] # Interestingly, this also prints the NULL character in the output.. looks like "abc^@" in Emacs
echo "1st char(0) to last (^1) «3rd» : ", str[0 .. ^1]
echo "1st char(0) to 2nd-to-last(^2) «2nd» : ", str[0 .. ^2]
echo "1st char(0) to 3rd-to-last(^3) «1st» : ", str[0 .. ^3]
echo "1st char(0) to 4th-to-last(^4) «0th» : ", str[0 .. ^4]
# echo "1st char(0) to 4th-to-last(^4) «0th» : ", str[0 .. ^5] # Error: unhandled exception: value out of range: -1 [RangeError]
# echo "2nd char(1) to 4th-to-last(^4) «0th» : ", str[1 .. ^4] # Error: unhandled exception: value out of range: -1 [RangeError]
echo "2nd char(1) to 3rd-to-last(^3) «1st» : ", str[1 .. ^3]
echo "2nd char(1) to 2nd-to-last(^2) «2nd» : ", str[1 .. ^2]
echo "2nd char(1) to last, (^1) «3rd» : ", str[1 .. ^1]
echo "Now going a bit crazy .."
echo " 2nd-to-last(^2) «2nd» char to 3rd(2) : ", str[^2 .. 2]
echo " 2nd-to-last(^2) «2nd» char to last(^1) «3rd» : ", str[^2 .. ^1]
echo " 3rd-to-last(^3) «1st» char to 3rd(2) : ", str[^3 .. 2]
1st char(0) to last, including \0(^0) : abc
1st char(0) to last (^1) «3rd» : abc
1st char(0) to 2nd-to-last(^2) «2nd» : ab
1st char(0) to 3rd-to-last(^3) «1st» : a
1st char(0) to 4th-to-last(^4) «0th» :
2nd char(1) to 3rd-to-last(^3) «1st» :
2nd char(1) to 2nd-to-last(^2) «2nd» : b
2nd char(1) to last, (^1) «3rd» : bc
Now going a bit crazy ..
2nd-to-last(^2) «2nd» char to 3rd(2) : bc
2nd-to-last(^2) «2nd» char to last(^1) «3rd» : bc
3rd-to-last(^3) «1st» char to 3rd(2) : abc
- Notes
- It is recommended to always use a space around the .. and ..< binary operators to get consistent results (and no compilation errors!). Examples: [0 ..< str.high], [0 .. str.high], [0 .. ^2], [ .. ^2]. This is based on the tip by @Araq from GitHub (also one of the core devs of Nim). You will find the full discussion around this topic of dots and spaces in Nim Issue #6216.
"Special ascii chars like % . & $ are collected into a single operator token." – Araq
To repeat: Always add a space around the .. and ..< operators.
- As of 70ea45cdba, the < unary operator is deprecated! So do [0 ..< str.high] instead of [0 .. <str.high] (see Nim Issue #6788).
- With the example str variable value being "abc", earlier both str[0 .. ^5] and str[1 .. ^4] incorrectly returned an empty string! (see Nim Issue #6223). That got fixed in b74a5148a9. After the fix, those cause this error:
system.nim(3534) []
system.nim(2819) sysFatal
Error: unhandled exception: value out of range: -1 [RangeError]
- Also after this fix, str[0 .. ^0] outputs abc^@ (where ^@ is the representation of the NULL character).. very cool!
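For readers coming from Python, the ^N outputs above can be reproduced with ordinary Python slicing. The following is only a sketch: nim_slice is a hypothetical helper (not part of Nim or Python), it assumes a forward start index and N >= 1, and it does not model the ^0 case.

```python
def nim_slice(s: str, first: int, back: int) -> str:
    """Emulate Nim's s[first .. ^back] for back >= 1 (hypothetical helper)."""
    last = len(s) - back  # ^1 is the last index, ^2 the one before it, ...
    if last < first - 1:
        # Nim raises a RangeError here instead of returning an empty string.
        raise IndexError(f"slice {first} .. ^{back} is out of range for {s!r}")
    # Nim's .. is inclusive on both ends; Python's stop index is exclusive.
    return s[first:last + 1]


s = "abc"
print(nim_slice(s, 0, 1))  # same span as str[0 .. ^1]
print(nim_slice(s, 0, 2))  # same span as str[0 .. ^2]
print(nim_slice(s, 1, 2))  # same span as str[1 .. ^2]
```

The only real translation step is the +1 on the stop index, because the Nim range includes its last element and the Python slice does not.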
All characters except first #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[1:])
bc def aghij cklm danopqrstuv adefwxyz zyx
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# echo str[1 .. ] # Does not work.. Error: expression expected, but found ']'
# https://github.com/nim-lang/Nim/issues/6212
# Always add a space around the .. and ..< operators
echo str[1 .. str.high]
# or
echo str[1 .. ^1] # second(1) to last(^1)
bc def aghij cklm danopqrstuv adefwxyz zyx
bc def aghij cklm danopqrstuv adefwxyz zyx
All characters except first and last #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[1:-1])
bc def aghij cklm danopqrstuv adefwxyz zy
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# Always add a space around the .. and ..< operators
echo str[1 ..< str.high]
# or
echo str[1 .. ^2] # second(1) to second-to-last(^2)
bc def aghij cklm danopqrstuv adefwxyz zy
bc def aghij cklm danopqrstuv adefwxyz zy
Count #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.count('a'))
print(str.count('de'))
4
2
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.count('a')
echo str.count("de")
4
2
Starts/ends with #
Starts With #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.startswith('a'))
print(str.startswith('a\t'))
print(str.startswith('z'))
True
True
False
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.startsWith('a') # Recommended Nim style
# or
echo str.startswith('a')
# or
echo str.starts_with('a')
echo str.startsWith("a\t")
echo str.startsWith('z')
true
true
true
true
false
- Notes
- All Nim identifiers are case and underscore insensitive (except for the first character of the identifier), as seen in the above example. So any of startsWith, startswith or starts_with works the exact same way.
- That said, the camelCase variant (startsWith) is the preferred style in Nim.
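The identifier rule in the note above can be pictured as a normalization step. This Python sketch is an illustration only: nim_ident_key is a hypothetical name, and it models the rule as "keep the first character, then drop underscores and ignore case in the rest".

```python
def nim_ident_key(name: str) -> str:
    """Return a comparison key for a Nim-style identifier (hypothetical helper)."""
    # The first character is compared exactly; the rest ignores case and underscores.
    return name[:1] + name[1:].replace("_", "").lower()


for ident in ("startsWith", "startswith", "starts_with", "StartsWith"):
    print(ident, "->", nim_ident_key(ident))
```

The first three spellings map to the same key, while StartsWith differs because its first character differs.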
Ends With #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.endswith('x'))
print(str.endswith('yx'))
print(str.endswith('z'))
True
True
False
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.endsWith('x')
echo str.endsWith("yx")
echo str.endsWith('z')
true
true
false
Expand Tabs #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.expandtabs())
print(str.expandtabs(4))
a bc def aghij cklm danopqrstuv adefwxyz zyx
a bc def aghij cklm danopqrstuv adefwxyz zyx
import strmisc
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.expandTabs()
echo str.expandTabs(4)
a bc def aghij cklm danopqrstuv adefwxyz zyx
a bc def aghij cklm danopqrstuv adefwxyz zyx
Find/Index #
Find (from left) #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.find('a'))
print(str.find('b'))
print(str.find('c'))
print(str.find('zyx'))
print(str.find('aaa'))
0
2
3
41
-1
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.find('a')
echo str.find('b')
echo str.find('c')
echo str.find("zyx")
echo str.find("aaa")
0
2
3
41
-1
Find from right #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.rfind('a'))
print(str.rfind('b'))
print(str.rfind('c'))
print(str.rfind('zyx'))
print(str.rfind('aaa'))
32
2
15
41
-1
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.rfind('a')
echo str.rfind('b')
echo str.rfind('c')
echo str.rfind("zyx")
echo str.rfind("aaa")
32
2
15
41
-1
Index (from left) #
From the Python 3 docs:
Like find(), but raise ValueError when the substring is not found.
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.index('a'))
print(str.index('b'))
print(str.index('c'))
print(str.index('zyx'))
# print(str.index('aaa')) # Throws ValueError: substring not found
0
2
3
41
Nim does not have an error-raising index function like that out of the box, but something similar can be done with:
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# https://nim-lang.org/docs/strutils.html#find,string,string,Natural,Natural
# proc find(s, sub: string; start: Natural = 0; last: Natural = 0): int {..}
proc index(s, sub: auto; start: Natural = 0; last: Natural = 0): int =
  result = s.find(sub, start, last)
  if result < 0:
    raise newException(ValueError, "$1 not found in $2".format(sub, s))
echo str.index('a')
echo str.index('b')
echo str.index('c')
echo str.index("zyx")
# echo str.index("aaa") # Error: unhandled exception: aaa not found in a bc def aghij cklm danopqrstuv adefwxyz zyx [ValueError]
0
2
3
41
- Notes
- There is no Nim equivalent, but I came up with my own index proc for Nim above.
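The relationship between find and index that the Nim proc relies on can be checked on the Python side too. This is a small sketch with a hypothetical helper name, index_via_find, that wraps find the same way and compares the result with the built-in index.

```python
def index_via_find(s: str, sub: str, start: int = 0) -> int:
    """Like str.find, but raise ValueError when sub is missing (illustrative only)."""
    pos = s.find(sub, start)
    if pos < 0:
        raise ValueError(f"{sub!r} not found in {s!r}")
    return pos


text = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
for sub in ("a", "b", "c", "zyx"):
    # The wrapper and the built-in agree whenever the substring exists.
    print(sub, index_via_find(text, sub), text.index(sub))
```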
Index from right #
From the Python 3 docs:
Like rfind(), but raise ValueError when the substring is not found.
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.rindex('a'))
print(str.rindex('b'))
print(str.rindex('c'))
print(str.rindex('zyx'))
# print(str.rindex('aaa')) # Throws ValueError: substring not found
32
2
15
41
Nim does not have an error-raising rindex function like that out of the box, but something similar can be done with:
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# https://nim-lang.github.io/Nim/strutils.html#rfind%2Cstring%2Cstring%2CNatural
# proc rfind(s, sub: string; start: Natural = 0; last = - 1): int {..}
proc rindex(s, sub: auto; start: Natural = 0; last = - 1): int =
  result = s.rfind(sub, start, last)
  if result < 0:
    raise newException(ValueError, "$1 not found in $2".format(sub, s))
echo str.rindex('a')
echo str.rindex('b')
echo str.rindex('c')
echo str.rindex("zyx")
# echo str.rindex("aaa") # Error: unhandled exception: aaa not found in a bc def aghij cklm danopqrstuv adefwxyz zyx [ValueError]
32
2
15
41
- Notes
- There is no Nim equivalent, but I came up with my own rindex proc for Nim above.
String Predicates #
Is Alphanumeric? #
print('abc'.isalnum())
print('012'.isalnum())
print('abc012'.isalnum())
print('abc012_'.isalnum())
print('{}'.isalnum())
print('Unicode:')
print('ábc'.isalnum())
True
True
True
False
False
Unicode:
True
import std/[strutils, sequtils]
echo "abc".allIt(it.isAlphaNumeric())
echo "012".allIt(it.isAlphaNumeric())
echo "abc012".allIt(it.isAlphaNumeric())
echo "abc012_".allIt(it.isAlphaNumeric())
echo "{}".allIt(it.isAlphaNumeric())
echo "[Wrong] ", "ábc".allIt(it.isAlphaNumeric()) # Returns false! isAlphaNumeric works only for ascii.
true
true
true
false
false
[Wrong] false
TODO Figure out how to write unicode-equivalent of isAlphaNumeric #
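As a starting point for that TODO, here is a rough Python sketch of what a Unicode-aware alphanumeric check could look at: the general category of each character. The helper name is_alnum_unicode is hypothetical, and treating every L* and N* category as alphanumeric is an assumption that approximates, but is not guaranteed to match, Python's isalnum in all cases. A Nim version would need equivalent category data.

```python
import unicodedata


def is_alnum_unicode(s: str) -> bool:
    """Approximate a Unicode-aware isalnum via general categories (sketch)."""
    if not s:
        return False  # mirror Python: the empty string is not alphanumeric
    # Categories starting with L are letters; those starting with N are numbers.
    return all(unicodedata.category(ch)[0] in ("L", "N") for ch in s)


for sample in ("abc", "012", "abc012", "abc012_", "{}", "ábc"):
    print(repr(sample), is_alnum_unicode(sample), sample.isalnum())
```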
Is Alpha? #
print('abc'.isalpha())
print('012'.isalpha())
print('abc012'.isalpha())
print('abc012_'.isalpha())
print('{}'.isalpha())
print('Unicode:')
print('ábc'.isalpha())
True
False
False
False
False
Unicode:
True
import strutils except isAlpha
import std/[unicode, sequtils]
echo "abc".allIt(it.isAlphaAscii())
echo "012".allIt(it.isAlphaAscii())
echo "abc012".allIt(it.isAlphaAscii())
echo "abc012_".allIt(it.isAlphaAscii())
echo "{}".allIt(it.isAlphaAscii())
echo "Unicode:"
echo unicode.isAlpha("ábc")
# or
echo isAlpha("ábc") # unicode prefix is not needed
# because of import strutils except isAlpha
# or
echo "ábc".isAlpha() # from unicode
true
false
false
false
false
Unicode:
true
true
true
- Notes
- Thanks to the tip from @dom96 from GitHub on the use of except in import:
import strutils except isAlpha
import unicode
- That prevents the ambiguous call error shown below, as we are specifying to import everything from strutils except for the isAlpha proc. Thus the unicode version of the isAlpha proc is used automatically.
nim_src_28505flZ.nim(14, 13) Error: ambiguous call; both strutils.isAlpha(s: string)[declared in lib/pure/strutils.nim(289, 5)] and unicode.isAlpha(s: string)[declared in lib/pure/unicode.nim(1416, 5)] match for: (string)
Is Digit? #
print('abc'.isdigit())
print('012'.isdigit())
print('abc012'.isdigit())
print('abc012_'.isdigit())
print('{}'.isdigit())
False
True
False
False
False
import std/[strutils, sequtils]
echo "abc".allIt(it.isDigit())
echo "012".allIt(it.isDigit())
echo "abc012".allIt(it.isDigit())
echo "abc012_".allIt(it.isDigit())
echo "{}".allIt(it.isDigit())
false
true
false
false
false
Better IsLower/IsUpper #
Nim Issue #7963 did not get resolved as I would have liked. This section has isLowerAsciiPlus, isUpperAsciiPlus, isLowerPlus and isUpperPlus procs that accept a string input and replace their non-Plus equivalents from the strutils and unicode modules.
import strutils except isLower, isUpper
import unicode
template isCaseImpl(s, charProc) =
  var hasAtleastOneAlphaChar = false
  if s.len == 0: return false
  for c in s:
    var charIsAlpha = c.isAlphaAscii()
    if not hasAtleastOneAlphaChar:
      hasAtleastOneAlphaChar = charIsAlpha
    if charIsAlpha and (not charProc(c)):
      return false
  return hasAtleastOneAlphaChar
proc isLowerAsciiPlus(s: string): bool =
  ## Checks whether ``s`` is lower case.
  ##
  ## This checks ASCII characters only.
  ##
  ## Returns true if all alphabetical characters in ``s`` are lower
  ## case. Returns false if none of the characters in ``s`` are
  ## alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  isCaseImpl(s, isLowerAscii)
proc isUpperAsciiPlus(s: string): bool =
  ## Checks whether ``s`` is upper case.
  ##
  ## This checks ASCII characters only.
  ##
  ## Returns true if all alphabetical characters in ``s`` are upper
  ## case. Returns false if none of the characters in ``s`` are
  ## alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  isCaseImpl(s, isUpperAscii)
template runeCaseCheck(s, runeProc) =
  ## Common code for rune.isLower and rune.isUpper.
  if len(s) == 0: return false
  var
    i = 0
    rune: Rune
    hasAtleastOneAlphaRune = false
  while i < len(s):
    fastRuneAt(s, i, rune, doInc=true)
    var runeIsAlpha = isAlpha(rune)
    if not hasAtleastOneAlphaRune:
      hasAtleastOneAlphaRune = runeIsAlpha
    if runeIsAlpha and (not runeProc(rune)):
      return false
  return hasAtleastOneAlphaRune
proc isLowerPlus(s: string): bool =
  ## Checks whether ``s`` is lower case.
  ##
  ## Returns true if all alphabetical runes in ``s`` are lower case.
  ## Returns false if none of the runes in ``s`` are alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  runeCaseCheck(s, isLower)
proc isUpperPlus(s: string): bool =
  ## Checks whether ``s`` is upper case.
  ##
  ## Returns true if all alphabetical runes in ``s`` are upper case.
  ## Returns false if none of the runes in ``s`` are alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  runeCaseCheck(s, isUpper)
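To make the rule used by these procs easier to compare with Python, here is the same logic ported to Python as a sketch. The name is_lower_plus is hypothetical. It requires at least one alphabetic character and rejects any alphabetic character that is not lower case. For the sample strings in this post it agrees with str.islower, though it may differ for letters that have no case at all.

```python
def is_lower_plus(s: str) -> bool:
    """Port of the 'Plus' rule: some letter exists and every letter is lower case."""
    has_alpha = False
    for ch in s:
        if ch.isalpha():
            has_alpha = True
            if not ch.islower():
                return False  # an alphabetic character that is not lower case
    return has_alpha  # False for empty or letter-free strings


samples = ["abc", "aBc", "012", "à", "a b", "ab?!", "1, 2, 3 go!", " ", ""]
for sample in samples:
    print(repr(sample), is_lower_plus(sample), sample.islower())
```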
Is Lower? #
print('a'.islower())
print('A'.islower())
print('abc'.islower())
print('Abc'.islower())
print('aBc'.islower())
print('012'.islower())
print('{}'.islower())
print('ABC'.islower())
print('À'.islower())
print('à'.islower())
print('a b'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('ab?!'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('1, 2, 3 go!'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print(' '.islower()) # checking this proc on a non-alphabet char
print('(*&#@(^#$ '.islower()) # checking this proc on a non-alphabet string
True
False
True
False
False
False
False
False
False
True
True
True
True
False
False
<<islower_isupper_plus>>
echo 'a'.isLowerAscii()
echo 'A'.isLowerAscii()
echo "abc".isLowerAsciiPlus()
echo "Abc".isLowerAsciiPlus()
echo "aBc".isLowerAsciiPlus()
echo "012".isLowerAsciiPlus()
echo "{}".isLowerAsciiPlus()
echo "ABC".isLowerAsciiPlus()
echo "À".isLowerAsciiPlus()
echo "[Wrong] ", "à".isLowerAsciiPlus() # Returns false! As the name suggests, works only for ascii.
echo "À".isLowerPlus()
echo isLowerPlus("à")
echo "à".isLowerPlus()
echo "a b".isLowerAsciiPlus()
echo "a b".isLowerPlus()
echo "ab?!".isLowerPlus()
echo "1, 2, 3 go!".isLowerPlus()
echo ' '.isLowerAscii() # checking this proc on a non-alphabet char
echo ' '.Rune.isLower() # checking this proc on a non-alphabet Rune
echo "(*&#@(^#$ ".isLowerPlus() # checking this proc on a non-alphabet string
true
false
true
false
false
false
false
false
false
[Wrong] false
false
true
true
true
true
true
true
false
false
false
DONE Presence of space and punctuations in string makes isLower return false #
Nim Issue #7963 did not get resolved as I would have liked. So I just rolled my own procs in Better IsLower/IsUpper to fix this issue.
- Notes
- isLower from strutils is deprecated. Use isLowerAscii instead, or isLower from unicode (as done above).
- To check whether a non-ASCII letter is in lower case, use unicode.isLower.
Is Upper? #
print('a'.isupper())
print('A'.isupper())
print('abc'.isupper())
print('Abc'.isupper())
print('aBc'.isupper())
print('012'.isupper())
print('{}'.isupper())
print('ABC'.isupper())
print('À'.isupper())
print('à'.isupper())
print('A B'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('AB?!'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('1, 2, 3 GO!'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print(' '.isupper()) # checking this function on a non-alphabet char
print('(*&#@(^#$ '.isupper()) # checking this proc on a non-alphabet string
False
True
False
False
False
False
False
True
True
False
True
True
True
False
False
<<islower_isupper_plus>>
echo 'a'.isUpperAscii()
echo 'A'.isUpperAscii()
echo "abc".isUpperAsciiPlus()
echo "Abc".isUpperAsciiPlus()
echo "aBc".isUpperAsciiPlus()
echo "012".isUpperAsciiPlus()
echo "{}".isUpperAsciiPlus()
echo "ABC".isUpperAsciiPlus()
echo "[Wrong] ", "À".isUpperAsciiPlus() # Returns false! As the name suggests, works only for ascii.
echo "à".isUpperAsciiPlus()
echo "À".isUpperPlus() # from unicode
echo isUpperPlus("À")
echo "à".isUpperPlus() # from unicode
echo "A B".isUpperAsciiPlus() #
echo "A B".isUpperPlus()
echo "AB?!".isUpperPlus()
echo "1, 2, 3 GO!".isUpperPlus()
echo ' '.isUpperAscii() # checking this proc on a non-alphabet char
echo ' '.Rune.isUpper() # checking this proc on a non-alphabet Rune
echo "(*&#@(^#$ ".isUpperPlus() # checking this proc on a non-alphabet string
false
true
false
false
false
false
false
true
[Wrong] false
false
true
true
false
true
true
true
true
false
false
false
DONE Presence of space and punctuations in string makes isUpper return false #
Nim Issue #7963 did not get resolved as I would have liked. So I just rolled my own procs in Better IsLower/IsUpper to fix this issue.
Is Space? #
print(''.isspace())
print(' '.isspace())
print('\t'.isspace())
print('\r'.isspace())
print('\n'.isspace())
print(' \t\n'.isspace())
print('abc'.isspace())
print('Testing with ZERO WIDTH SPACE unicode character below:')
print(''.isspace())
False
True
True
True
True
True
False
Testing with ZERO WIDTH SPACE unicode character below:
False
import strutils except isSpace
import std/[unicode, sequtils]
proc isSpaceAscii(s: string): bool =
  if s == "":
    return false
  s.allIt(it.isSpaceAscii())
echo "".isSpaceAscii() # empty string has to be in double quotes
echo ' '.isSpaceAscii()
echo '\t'.isSpaceAscii()
echo '\r'.isSpaceAscii()
echo "\n".isSpaceAscii() # \n is a string, not a character in Nim
echo " \t\n".isSpaceAscii()
echo "abc".isSpaceAscii()
echo "Testing with ZERO WIDTH SPACE unicode character below:"
echo "[Wrong] ", "".isSpaceAscii() # Returns false! As the name suggests, works only for ascii.
echo "".isSpace() # from unicode
false
true
true
true
true
true
false
Testing with ZERO WIDTH SPACE unicode character below:
[Wrong] false
false
- Notes
- An empty string gives a false result for both the Python and Nim variants of isspace.
- \n is a string, not a character in Nim, because based on the OS, \n can consist of one or more characters.
- isSpace from strutils is deprecated. Use isSpaceAscii instead, or isSpace from unicode (as done above).
- To check whether a non-ASCII character is whitespace, use unicode.isSpace.
- Interestingly, Nim’s isSpace from the unicode module returns true for the ZERO WIDTH SPACE unicode character (0x200b) as input, but Python’s isspace returns false. I believe Python’s behavior here is incorrect.
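To see where Python's answer for ZERO WIDTH SPACE comes from, the character's Unicode properties can be inspected directly. The Python docs describe isspace as checking for general category Zs or a bidirectional class of WS, B, or S. The snippet below only prints what the standard unicodedata module reports for U+200B.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE, written as an escape so it stays visible

print(unicodedata.name(ZWSP))           # ZERO WIDTH SPACE
print(unicodedata.category(ZWSP))       # Cf (format character, not Zs)
print(unicodedata.bidirectional(ZWSP))  # BN (not WS, B, or S)
print(ZWSP.isspace())                   # False
```

Neither property falls in the set that isspace accepts, which explains the False result on the Python side.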
Is Title? #
print(''.istitle())
print('T'.istitle())
print('Dž'.istitle())
print('The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?'.istitle()) # Python's output is wrong
print('this is not a title'.istitle())
print('This Is A Title'.istitle())
print('This Is À Title'.istitle())
print('This Is Not a Title'.istitle())
False
True
True
False
False
True
True
False
import std/[unicode, strformat]
# https://github.com/nim-lang/Nim/issues/14348#issuecomment-629414257
proc isTitle(s: string): bool =
  proc isUpperOrTitle(r: Rune): bool = r.isUpper() or r.isTitle()
  var
    alphaSeen = false
  for word in s.split(): # Split s into a sequence of words
    result = true
    var
      upperSeen = false
    let
      runes = word.toRunes()
    for r in runes:
      if not r.isAlpha():
        continue
      alphaSeen = true
      if not upperSeen:
        if r.isUpperOrTitle():
          upperSeen = true
        else:
          return false
      else:
        if r.isUpperOrTitle():
          return false
  if not alphaSeen:
    return false
echo "".isTitle()
echo "T".isTitle()
echo "Dž".isTitle()
echo "The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?".isTitle()
echo "this is not a title".isTitle()
echo "This Is A Title".isTitle()
echo "This Is À Title".isTitle()
echo "This Is Not a Title".isTitle()
false
true
true
true
false
true
true
false
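The Nim isTitle proc above and Python's istitle disagree on the long sentence. To show why, here is a sketch of the same word-by-word rule ported to Python, using the hypothetical name is_title. It skips non-letters inside a word instead of treating them as word boundaries, which is the assumption that lets "Can’t" pass.

```python
def is_title(s: str) -> bool:
    """Word-wise title check that ignores non-letters inside words (sketch)."""
    alpha_seen = False
    for word in s.split():  # split on whitespace only
        upper_seen = False
        for ch in word:
            if not ch.isalpha():
                continue  # punctuation and digits do not start a new word
            alpha_seen = True
            starts_upper = ch.isupper() or ch.istitle()
            if not upper_seen:
                if not starts_upper:
                    return False  # first letter of the word must be upper/title case
                upper_seen = True
            elif starts_upper:
                return False  # later letters must not be upper/title case
    return alpha_seen


sentence = "The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?"
print(is_title(sentence), sentence.istitle())
```

Python's built-in treats the apostrophe as a boundary, so the lower case "t" after it makes istitle fail for that sentence.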
Join #
print(' '.join(['a', 'b', 'c']))
print('xx'.join(['a', 'b', 'c']))
a b c
axxbxxc
import strutils
echo "Sequences:"
# echo @["a", "b", "c"].join(' ') # Error: type mismatch: got (seq[string], char)
echo @["a", "b", "c"].join(" ")
echo join(@["a", "b", "c"], " ")
echo @["a", "b", "c"].join("xx")
echo @['a', 'b', 'c'].join("") # join characters to form strings
echo "Lists:"
echo ["a", "b", "c"].join(" ") # Works after Nim issue # 6210 got fixed.
echo (["a", "b", "c"].join(" ")) # Works!
echo join(["a", "b", "c"], " ") # Works!
var list = ["a", "b", "c"]
echo list.join(" ") # Works too!
Sequences:
a b c
a b c
axxbxxc
abc
Lists:
a b c
a b c
a b c
a b c
- Notes
- The second argument to join, the separator, has to be a string; it cannot be a character.
- echo ["a", "b", "c"].join(" ") did not work prior to the fix in ddc131cf07 (see Nim Issue #6210), but now it does!
Justify with filling #
Center Justify with filling #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.center(80))
print(str.center(80, '*'))
a bc def aghij cklm danopqrstuv adefwxyz zyx
******************a bc def aghij cklm danopqrstuv adefwxyz zyx******************
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.center(80)
echo str.center(80, '*')
# or
echo center(str, 80, '*')
a bc def aghij cklm danopqrstuv adefwxyz zyx
******************a bc def aghij cklm danopqrstuv adefwxyz zyx******************
******************a bc def aghij cklm danopqrstuv adefwxyz zyx******************
Left Justify with filling #
print('abc'.ljust(2, '*'))
print('abc'.ljust(10, '*'))
abc
abc*******
import strutils
echo "abc".alignLeft(2, '*')
echo "abc".alignLeft(10, '*')
abc
abc*******
Right Justify with filling #
print('abc'.rjust(10, '*'))
*******abc
import strutils
echo "abc".align(10, '*')
*******abc
Zero Fill #
print('42'.zfill(5))
print('-42'.zfill(5))
print(' -42'.zfill(5))
00042
-0042
0 -42
import strutils
echo "Using align:"
echo "42".align(5, '0')
echo "-42".align(5, '0')
echo "Using zfill:"
proc zfill(s: string; count: Natural): string =
  let strlen = len(s)
  if strlen < count:
    # Guard against an empty string before looking at the first character
    if strlen > 0 and s[0] == '-':
      result = "-"
      result.add("0".repeat(count-strlen))
      result.add(s[1 .. s.high])
    else:
      result = "0".repeat(count-strlen)
      result.add(s)
  else:
    result = s
echo "42".zfill(5)
echo "-42".zfill(5)
echo " -42".zfill(5)
Using align:
00042
00-42
Using zfill:
00042
-0042
0 -42
- Notes
- The align in Nim does not do the right thing, as the Python zfill does, when filling zeroes on the left in strings representing negative numbers.
- There is no Nim equivalent, but I came up with my own zfill proc for Nim above.
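For comparison, the sign handling of the custom zfill can be sketched in Python and checked against the built-in. The helper name zfill_minus_only is hypothetical; it only treats a leading "-" specially. One difference worth knowing: Python's zfill also inserts the padding after a leading "+".

```python
def zfill_minus_only(s: str, width: int) -> str:
    """Zero-fill on the left, keeping a leading '-' in front (sketch)."""
    pad = "0" * max(width - len(s), 0)
    if s.startswith("-"):
        return "-" + pad + s[1:]  # padding goes between the sign and the digits
    return pad + s


for sample in ("42", "-42", " -42", "+42"):
    print(repr(sample), zfill_minus_only(sample, 5), sample.zfill(5))
```

The two agree on the first three inputs and differ on "+42", so a full port of Python's behavior would need to handle the plus sign as well.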
Case conversion #
To Lower #
print('a'.lower())
print('A'.lower())
print('abc'.lower())
print('Abc'.lower())
print('aBc'.lower())
print('012'.lower())
print('{}'.lower())
print('ABC'.lower())
print('À'.lower())
print('à'.lower())
a
a
abc
abc
abc
012
{}
abc
à
à
import strutils except toLower
import unicode
echo 'a'.toLowerAscii()
echo 'A'.toLowerAscii()
echo "abc".toLowerAscii()
echo "Abc".toLowerAscii()
echo "aBc".toLowerAscii()
echo "012".toLowerAscii()
echo "{}".toLowerAscii()
echo "ABC".toLowerAscii()
echo "[Wrong] ", "À".toLowerAscii() # Does not work! As the name suggests, works only for ascii.
echo "à".toLowerAscii()
echo "À".toLower() # from unicode
echo "à".toLower() # from unicode
a
a
abc
abc
abc
012
{}
abc
[Wrong] À
à
à
à
- Notes
- toLower from strutils is deprecated. Use toLowerAscii instead, or toLower from unicode (as done above).
- To convert a non-ASCII letter to lower case, use unicode.toLower.
To Upper #
print('a'.upper())
print('A'.upper())
print('abc'.upper())
print('Abc'.upper())
print('aBc'.upper())
print('012'.upper())
print('{}'.upper())
print('ABC'.upper())
print('À'.upper())
print('à'.upper())
A
A
ABC
ABC
ABC
012
{}
ABC
À
À
import strutils except toUpper
import unicode
echo 'a'.toUpperAscii()
echo 'A'.toUpperAscii()
echo "abc".toUpperAscii()
echo "Abc".toUpperAscii()
echo "aBc".toUpperAscii()
echo "012".toUpperAscii()
echo "{}".toUpperAscii()
echo "ABC".toUpperAscii()
echo "À".toUpperAscii()
echo "[Wrong] ", "à".toUpperAscii() # Does not work! As the name suggests, works only for ascii.
echo "À".toUpper() # from unicode
echo "à".toUpper() # from unicode
A
A
ABC
ABC
ABC
012
{}
ABC
À
[Wrong] à
À
À
- Notes
- toUpper from strutils is deprecated. Use toUpperAscii instead, or toUpper from unicode (as done above).
- To convert a non-ASCII letter to upper case, use unicode.toUpper.
Capitalize #
str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.capitalize())
A bc def aghij cklm danopqrstuv adefwxyz zyx
import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.capitalizeAscii
# or
echo capitalizeAscii(str)
A bc def aghij cklm danopqrstuv adefwxyz zyx
A bc def aghij cklm danopqrstuv adefwxyz zyx
To Title #
print('convert this to title á û'.title())
Convert This To Title Á Û
import unicode
echo "convert this to title á û".title()
Convert This To Title Á Û
Swap Case #
print('Swap CASE example á û Ê'.swapcase())
print('Swap CASE example á û Ê'.swapcase().swapcase())
sWAP case EXAMPLE Á Û ê
Swap CASE example á û Ê
import unicode
echo "Swap CASE example á û Ê".swapcase()
echo "Swap CASE example á û Ê".swapcase().swapcase()
sWAP case EXAMPLE Á Û ê
Swap CASE example á û Ê
- Notes
- See this SO Q/A to read about a few cases where s.swapcase().swapcase()==s is not true (at least for Python).
Strip #
Left/leading and right/trailing Strip #
print('«' + ' spacious '.strip() + '»')
print('«' + '\n string \n \n\n'.strip() + '»')
print('«' + '\n'.strip() + '»')
print('www.example.com'.strip('cmowz.'))
print('mississippi'.strip('mipz'))
«spacious»
«string»
«»
example
ssiss
import strutils
echo "«" & " spacious ".strip() & "»"
echo "«" & "\n string \n \n\n".strip() & "»"
echo "«" & "\n".strip() & "»"
echo "www.example.com".strip(chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(chars={'m', 'i', 'p', 'z'})
«spacious»
«string»
«»
example
ssiss
- Notes
- Python strip takes a string as an argument to specify the letters that need to be stripped off the input string. But Nim strip requires a Set of characters.
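Although Python's strip takes a string, it uses that string as a set of characters, not as a prefix or suffix. This sketch, with the hypothetical helper strip_chars, makes the set explicit, which is closer to how the Nim call reads.

```python
def strip_chars(s: str, chars: set) -> str:
    """Strip leading and trailing characters that belong to chars (sketch)."""
    start, end = 0, len(s)
    while start < end and s[start] in chars:
        start += 1  # skip leading characters in the set
    while end > start and s[end - 1] in chars:
        end -= 1  # skip trailing characters in the set
    return s[start:end]


print(strip_chars("www.example.com", {"c", "m", "o", "w", "z", "."}))
print(strip_chars("mississippi", set("mipz")))
# Order and repetition in Python's argument do not matter:
print("mississippi".strip("zpim") == "mississippi".strip("mipz"))
```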
Left/leading Strip #
print('«' + ' spacious '.lstrip() + '»')
print('www.example.com'.lstrip('cmowz.'))
print('mississippi'.lstrip('mipz'))
«spacious »
example.com
ssissippi
import strutils
echo "«", " spacious ".strip(trailing=false), "»"
echo "www.example.com".strip(trailing=false, chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(trailing=false, chars={'m', 'i', 'p', 'z'})
«spacious »
example.com
ssissippi
Right/trailing Strip #
print('«' + ' spacious '.rstrip() + '»')
print('www.example.com'.rstrip('cmowz.'))
print('mississippi'.rstrip('mipz'))
« spacious»
www.example
mississ
import strutils
echo "«", " spacious ".strip(leading=false), "»"
echo "www.example.com".strip(leading=false, chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(leading=false, chars={'m', 'i', 'p', 'z'})
« spacious»
www.example
mississ
Partition #
First occurrence partition #
print('ab:ce:ef:ce:ab'.partition(':'))
print('ab:ce:ef:ce:ab'.partition('ce'))
('ab', ':', 'ce:ef:ce:ab')
('ab:', 'ce', ':ef:ce:ab')
import strmisc
echo "ab:ce:ef:ce:ab".partition(":") # The argument is a string, not a character
echo "ab:ce:ef:ce:ab".partition("ce")
("ab", ":", "ce:ef:ce:ab")
("ab:", "ce", ":ef:ce:ab")
Right partition or Last occurrence partition #
print('ab:ce:ef:ce:ab'.rpartition(':'))
print('ab:ce:ef:ce:ab'.rpartition('ce'))
('ab:ce:ef:ce', ':', 'ab')
('ab:ce:ef:', 'ce', ':ab')
import strmisc
echo "ab:ce:ef:ce:ab".rpartition(":") # The argument is a string, not a character
# or
echo "ab:ce:ef:ce:ab".partition(":", right=true)
echo "ab:ce:ef:ce:ab".rpartition("ce")
# or
echo "ab:ce:ef:ce:ab".partition("ce", right=true)
("ab:ce:ef:ce", ":", "ab")
("ab:ce:ef:ce", ":", "ab")
("ab:ce:ef:", "ce", ":ab")
("ab:ce:ef:", "ce", ":ab")
Replace #
print('abc abc abc'.replace(' ab', '-xy'))
print('abc abc abc'.replace(' ', '')) # Strip all spaces
print('abc abc abc'.replace(' ab', '-xy', 0))
print('abc abc abc'.replace(' ab', '-xy', 1))
print('abc abc abc'.replace(' ab', '-xy', 2))
abc-xyc-xyc
abcabcabc
abc abc abc
abc-xyc abc
abc-xyc-xyc
import strutils
echo "abc abc abc".replace(" ab", "-xy")
echo "abc abc abc".replace(" ", "") # Strip all spaces
# echo "abc abc abc".replace(" ab", "-xy", 0) # Invalid, does not expect a count:int argument
# echo "abc abc abc".replace(" ab", "-xy", 1) # Invalid, does not expect a count:int argument
# echo "abc abc abc".replace(" ab", "-xy", 2) # Invalid, does not expect a count:int argument
abc-xyc-xyc
abcabcabc
- Notes
- Nim does not allow specifying the number of occurrences to be replaced using a count argument as in the Python version of replace.
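A count-limited replace is straightforward to build on top of find. The sketch below is written in Python so it can be checked against the built-in; replace_n is a hypothetical helper, and the same loop structure could be ported to Nim. It assumes a non-empty search string and a non-negative count.

```python
def replace_n(s: str, old: str, new: str, count: int) -> str:
    """Replace at most count occurrences of old with new, left to right (sketch)."""
    if not old:
        raise ValueError("old must be a non-empty string in this sketch")
    parts = []
    start = 0
    done = 0
    while done < count:
        idx = s.find(old, start)
        if idx < 0:
            break  # no more occurrences
        parts.append(s[start:idx])
        parts.append(new)
        start = idx + len(old)  # continue searching after the replaced span
        done += 1
    parts.append(s[start:])  # keep the untouched remainder
    return "".join(parts)


for n in (0, 1, 2):
    print(replace_n("abc abc abc", " ab", "-xy", n))
```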
Split #
Split (from left) #
print('1,2,3'.split(','))
print('1,2,3'.split(',', maxsplit=1))
print('1,2,,3,'.split(','))
print('1::2::3'.split('::'))
print('1::2::3'.split('::', maxsplit=1))
print('1::2::::3::'.split('::'))
['1', '2', '3']
['1', '2,3']
['1', '2', '', '3', '']
['1', '2', '3']
['1', '2::3']
['1', '2', '', '3', '']
import strutils
echo "1,2,3".split(',')
echo "1,2,3".split(',', maxsplit=1)
echo "1,2,,3,".split(',')
echo "1::2::3".split("::")
echo "1::2::3".split("::", maxsplit=1)
echo "1::2::::3::".split("::")
@["1", "2", "3"]
@["1", "2,3"]
@["1", "2", "", "3", ""]
@["1", "2", "3"]
@["1", "2::3"]
@["1", "2", "", "3", ""]
Split from right #
rsplit behaves just like split unless the maxsplit argument is given.
print('1,2,3'.rsplit(','))
print('1,2,3'.rsplit(',', maxsplit=1))
print('1,2,,3,'.rsplit(','))
print('1::2::3'.rsplit('::'))
print('1::2::3'.rsplit('::', maxsplit=1))
print('1::2::::3::'.rsplit('::'))
['1', '2', '3']
['1,2', '3']
['1', '2', '', '3', '']
['1', '2', '3']
['1::2', '3']
['1', '2', '', '3', '']
import strutils
echo "1,2,3".rsplit(',')
echo "1,2,3".rsplit(',', maxsplit=1)
echo "1,2,,3,".rsplit(',')
echo "1::2::3".rsplit("::")
echo "1::2::3".rsplit("::", maxsplit=1)
echo "1::2::::3::".rsplit("::")
@["1", "2", "3"]
@["1,2", "3"]
@["1", "2", "", "3", ""]
@["1", "2", "3"]
@["1::2", "3"]
@["1", "2", "", "", "3", ""]
Split Lines #
print('ab c\n\nde fg\rkl\r\n'.splitlines())
print('ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True))
['ab c', '', 'de fg', 'kl']
['ab c\n', '\n', 'de fg\r', 'kl\r\n']
import strutils
echo "ab c\n\nde fg\rkl\r\n".splitLines()
echo "ab c\n\nde fg\rkl\r\n".splitLines(keepEol = true)
@["ab c", "", "de fg", "kl", ""]
@["ab c\n", "\n", "de fg\r", "kl\r\n", ""]
- Notes
- The Nim version creates separate splits for the \r and \n. Note the last "" split created by Nim, but not by Python for the same input string.
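If the Nim-style result with the trailing empty string is wanted on the Python side, one option is to split on the line terminators with a regular expression instead of calling splitlines. This is only a sketch: the pattern covers \r\n, \r and \n, whereas Python's splitlines recognizes additional line boundaries.

```python
import re

text = "ab c\n\nde fg\rkl\r\n"

# \r\n is listed first so that it is consumed as a single terminator.
nim_like = re.split(r"\r\n|\r|\n", text)

print(nim_like)           # ['ab c', '', 'de fg', 'kl', '']
print(text.splitlines())  # ['ab c', '', 'de fg', 'kl']
```

The difference is the final empty element: a plain split reports the empty piece after the last terminator, while splitlines drops it.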
Convert #
See the encodings module for equivalents of the Python decode and encode functions.
Others #
There is no equivalent for the Python translate function in Nim as of writing this.
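For reference, this is what translate does on the Python side: it maps or deletes characters one by one according to a table, usually built with str.maketrans. The strings used here are made up for illustration.

```python
# Map 'e' -> 'i' and 'l' -> 'p', and delete every '!'.
table = str.maketrans("el", "ip", "!")

print("hello!".translate(table))  # hippo
```

A Nim port would need some per-character mapping of its own; the snippet is only meant to show the behavior that is missing.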