
String Functions: Nim vs Python

Wed Aug 9, 2017

While learning the Nim language and trying to correlate that with my Python 3 knowledge, I came across this awesome comparison table of string manipulation functions between the two languages.

My utmost gratitude goes to the developers of Nim, Python, Org, ob-nim and ob-python, and of course Hugo which allowed me to publish my notes in this presentable format.

Here are the code samples and their outputs. In each section below, you will find a Python code snippet, followed by its output, and then the same implementation in Nim, followed by the output of that.

Updates #

<2021-05-27 Thu>: Update to Nim 1.5.1
<2020-02-10 Mon>: Update to Nim 1.1.1
<2018-10-03 Wed>: Update to Nim 0.19.0 (just confirmed that all code blocks work as before—needed no modification).
<2018-06-28 Thu>: Add Better IsLower/IsUpper section; update Python to 3.7.0 and Nim to the latest devel as of today.
<2017-12-13 Wed>: Update the Nim snippets output using the improved echo! The echo output difference is notable in the .split examples. This fixes the issue about confusing echo outputs that I raised in Nim Issue #6225. Big thanks to @bluenote10 from GitHub for Nim PR #6825!
<2017-11-29 Wed>: Update the Understanding the ^N syntax example that gave incorrect output before Nim Issue #6223 got fixed.
<2017-11-28 Tue>: Update the .join example that did not work before Nim Issue #6210 got fixed.
<2017-11-27 Mon>: Use the binary operator ..< instead of the combination of binary operator .. and the deprecated unary < operator.

String slicing #

All characters except last #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[:-1])

a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy

var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# Always add a space around the .. and ..< operators
echo str[0 ..< str.high]
# or
echo str[0 .. ^2]

a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy
a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy

Understanding the `^N` syntax #

var str = "abc"
# Always add a space around the .. and ..< operators
echo "1st char(0) to last, including \\0(^0) : ", str[0 .. ^0] # Interestingly, this also prints the NULL character in the output.. looks like "abc^@" in Emacs
echo "1st char(0) to last       (^1) «3rd»  : ", str[0 .. ^1]
echo "1st char(0) to 2nd-to-last(^2) «2nd»  : ", str[0 .. ^2]
echo "1st char(0) to 3rd-to-last(^3) «1st»  : ", str[0 .. ^3]
echo "1st char(0) to 4th-to-last(^4) «0th»  : ", str[0 .. ^4]
# echo "1st char(0) to 5th-to-last(^5)        : ", str[0 .. ^5] # Error: unhandled exception: value out of range: -1 [RangeError]
# echo "2nd char(1) to 4th-to-last(^4) «0th»  : ", str[1 .. ^4] # Error: unhandled exception: value out of range: -1 [RangeError]
echo "2nd char(1) to 3rd-to-last(^3) «1st»  : ", str[1 .. ^3]
echo "2nd char(1) to 2nd-to-last(^2) «2nd»  : ", str[1 .. ^2]
echo "2nd char(1) to last,      (^1) «3rd»  : ", str[1 .. ^1]
echo "Now going a bit crazy .."
echo " 2nd-to-last(^2) «2nd» char to 3rd(2)         : ", str[^2 .. 2]
echo " 2nd-to-last(^2) «2nd» char to last(^1) «3rd» : ", str[^2 .. ^1]
echo " 3rd-to-last(^3) «1st» char to 3rd(2)         : ", str[^3 .. 2]

1st char(0) to last, including \0(^0) : abc
1st char(0) to last       (^1) «3rd»  : abc
1st char(0) to 2nd-to-last(^2) «2nd»  : ab
1st char(0) to 3rd-to-last(^3) «1st»  : a
1st char(0) to 4th-to-last(^4) «0th»  :
2nd char(1) to 3rd-to-last(^3) «1st»  :
2nd char(1) to 2nd-to-last(^2) «2nd»  : b
2nd char(1) to last,      (^1) «3rd»  : bc
Now going a bit crazy ..
 2nd-to-last(^2) «2nd» char to 3rd(2)         : bc
 2nd-to-last(^2) «2nd» char to last(^1) «3rd» : bc
 3rd-to-last(^3) «1st» char to 3rd(2)         : abc

Notes

It is recommended to always use a space around the .. and ..< binary operators to get consistent results (and no compilation errors!). Examples: [0 ..< str.high], [0 .. str.high], [0 .. ^2], [ .. ^2]. This is based on the tip by @Araq from GitHub (also one of the core devs of Nim). You will find the full discussion around this topic of dots and spaces in Nim Issue #6216.
Special ascii chars like % . & $ are collected into a single operator token. – Araq
To repeat: Always add a space around the .. and ..< operators.
As of 70ea45cdba, the < unary operator is deprecated! So do [0 ..< str.high] instead of [0 .. <str.high] (see Nim Issue #6788).
With the example str variable value being "abc", both str[0 .. ^5] and str[1 .. ^4] used to return an empty string, which was incorrect (see Nim Issue #6223). That has since been fixed in b74a5148a9. After the fix, those cause this error:
```
system.nim(3534)         []
system.nim(2819)         sysFatal
Error: unhandled exception: value out of range: -1 [RangeError]
```
Also after this fix, str[0 .. ^0] outputs abc^@ (where ^@ is the representation of NULL character).. very cool!
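
For readers mapping this back to Python, the inclusive Nim slice `str[a .. ^n]` can be modelled with ordinary Python slicing. This is only a sketch: `nim_slice` and its parameters are hypothetical names, and raising `ValueError` is my stand-in for Nim's `RangeError`.

```python
def nim_slice(s, a, b, a_back=False, b_back=False):
    """Model Nim's inclusive s[a .. b]; *_back=True means the index was written as ^a or ^b."""
    start = len(s) - a if a_back else a
    end = len(s) - b if b_back else b  # inclusive end index
    length = end - start + 1
    if length < 0:
        # Nim reports "value out of range: -1 [RangeError]" in this situation
        raise ValueError(f"value out of range: {length}")
    return s[start:start + length]

s = "abc"
print(nim_slice(s, 0, 1, b_back=True))  # models str[0 .. ^1]
print(nim_slice(s, 0, 2, b_back=True))  # models str[0 .. ^2]
print(nim_slice(s, 2, 2, a_back=True))  # models str[^2 .. 2]
```

The length check is what separates `str[0 .. ^4]` (length 0, an empty string) from `str[0 .. ^5]` (length -1, an error).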

All characters except first #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[1:])

	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx

var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# echo str[1 .. ] # Does not work.. Error: expression expected, but found ']'
# https://github.com/nim-lang/Nim/issues/6212
# Always add a space around the .. and ..< operators
echo str[1 .. str.high]
# or
echo str[1 .. ^1] # second(1) to last(^1)

	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx
	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx

All characters except first and last #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str[1:-1])

	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy

var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# Always add a space around the .. and ..< operators
echo str[1 ..< str.high]
# or
echo str[1 .. ^2] # second(1) to second-to-last(^2)

	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy
	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zy

Count #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.count('a'))
print(str.count('de'))

4
2

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.count('a')
echo str.count("de")

4
2

Starts/ends with #

Starts With #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.startswith('a'))
print(str.startswith('a\t'))
print(str.startswith('z'))

True
True
False

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.startsWith('a') # Recommended Nim style
# or
echo str.startswith('a')
# or
echo str.starts_with('a')
echo str.startsWith("a\t")
echo str.startsWith('z')

true
true
true
true
false

Notes

All Nim identifiers are case and underscore insensitive (except for the first character of the identifier), as seen in the above example. So any of startsWith or startswith or starts_with would work the exact same way.
Though, it has to be noted that using the camelCase variant (startsWith) is preferred in Nim.
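
To make the identifier rule concrete, here is a small Python model of it. `nim_ident_key` is a hypothetical helper of mine, not something Nim or Python ships: it keeps the first character as-is and ignores case and underscores in the rest.

```python
def nim_ident_key(name: str) -> str:
    """Hypothetical helper: two Nim identifiers are treated as equal when their keys match."""
    # The first character is compared exactly; the rest ignores case and underscores.
    return name[0] + name[1:].replace("_", "").lower()

for ident in ("startsWith", "startswith", "starts_with"):
    print(ident, "->", nim_ident_key(ident))
```

All three spellings map to the same key, while an identifier that differs only in its first character (such as `Foo` versus `foo`) does not.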

Ends With #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.endswith('x'))
print(str.endswith('yx'))
print(str.endswith('z'))

True
True
False

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.endsWith('x')
echo str.endsWith("yx")
echo str.endsWith('z')

true
true
false

Expand Tabs #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.expandtabs())
print(str.expandtabs(4))

a       bc      def     aghij   cklm    danopqrstuv     adefwxyz        zyx
a   bc  def aghij   cklm    danopqrstuv adefwxyz    zyx

import strmisc
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.expandTabs()
echo str.expandTabs(4)

a       bc      def     aghij   cklm    danopqrstuv     adefwxyz        zyx
a   bc  def aghij   cklm    danopqrstuv adefwxyz    zyx

Find/Index #

Find (from left) #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.find('a'))
print(str.find('b'))
print(str.find('c'))
print(str.find('zyx'))
print(str.find('aaa'))

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.find('a')
echo str.find('b')
echo str.find('c')
echo str.find("zyx")
echo str.find("aaa")

Find from right #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.rfind('a'))
print(str.rfind('b'))
print(str.rfind('c'))
print(str.rfind('zyx'))
print(str.rfind('aaa'))

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.rfind('a')
echo str.rfind('b')
echo str.rfind('c')
echo str.rfind("zyx")
echo str.rfind("aaa")

Index (from left) #

From Python 3 docs,

Like find(), but raise ValueError when the substring is not found.

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.index('a'))
print(str.index('b'))
print(str.index('c'))
print(str.index('zyx'))
# print(str.index('aaa')) # Throws ValueError: substring not found

Nim does not have an error raising index function like that out-of-box, but something like that can be done with:

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# https://nim-lang.org/docs/strutils.html#find,string,string,Natural,Natural
# proc find(s, sub: string; start: Natural = 0; last: Natural = 0): int {..}
proc index(s, sub: auto; start: Natural = 0; last: Natural = 0): int =
  result = s.find(sub, start, last)
  if result<0:
    raise newException(ValueError, "$1 not found in $2".format(sub, s))

echo str.index('a')
echo str.index('b')
echo str.index('c')
echo str.index("zyx")
# echo str.index("aaa") # Error: unhandled exception: aaa not found in a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx [ValueError]

Notes

No Nim equivalent, but I came up with my own index proc for Nim above.
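
The same pattern (call find, then raise when the result is negative) can be sketched in Python to show how index relates to find. The name `index_via_find` is hypothetical and exists only for this illustration.

```python
def index_via_find(s: str, sub: str, start: int = 0) -> int:
    """Like s.find(sub, start), but raise ValueError when sub is absent."""
    result = s.find(sub, start)
    if result < 0:
        raise ValueError(f"{sub!r} not found in {s!r}")
    return result

text = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(index_via_find(text, "zyx") == text.index("zyx"))
```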

Index from right #

From Python 3 docs,

Like rfind(), but raise ValueError when the substring is not found.

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.rindex('a'))
print(str.rindex('b'))
print(str.rindex('c'))
print(str.rindex('zyx'))
# print(str.rindex('aaa')) # Throws ValueError: substring not found

Nim does not have an error raising rindex function like that out-of-box, but something like that can be done with:

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
# https://nim-lang.github.io/Nim/strutils.html#rfind%2Cstring%2Cstring%2CNatural
# proc rfind(s, sub: string; start: Natural = 0; last = - 1): int {..}
proc rindex(s, sub: auto; start: Natural = 0; last = - 1): int =
  result = s.rfind(sub, start, last)
  if result<0:
    raise newException(ValueError, "$1 not found in $2".format(sub, s))

echo str.rindex('a')
echo str.rindex('b')
echo str.rindex('c')
echo str.rindex("zyx")
# echo str.rindex("aaa") # Error: unhandled exception: aaa not found in a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx [ValueError]

Notes

No Nim equivalent, but I came up with my own rindex proc for Nim above.

String Predicates #

Is Alphanumeric? #

print('abc'.isalnum())
print('012'.isalnum())
print('abc012'.isalnum())
print('abc012_'.isalnum())
print('{}'.isalnum())
print('Unicode:')
print('ábc'.isalnum())

True
True
True
False
False
Unicode:
True

import std/[strutils, sequtils]

echo "abc".allIt(it.isAlphaNumeric())
echo "012".allIt(it.isAlphaNumeric())
echo "abc012".allIt(it.isAlphaNumeric())
echo "abc012_".allIt(it.isAlphaNumeric())
echo "{}".allIt(it.isAlphaNumeric())
echo "[Wrong] ", "ábc".allIt(it.isAlphaNumeric()) # Returns false! isAlphaNumeric works only for ascii.

true
true
true
false
false
[Wrong] false

TODO Figure out how to write unicode-equivalent of `isAlphaNumeric` #

Is Alpha? #

print('abc'.isalpha())
print('012'.isalpha())
print('abc012'.isalpha())
print('abc012_'.isalpha())
print('{}'.isalpha())
print('Unicode:')
print('ábc'.isalpha())

True
False
False
False
False
Unicode:
True

import strutils except isAlpha
import std/[unicode, sequtils]

echo "abc".allIt(it.isAlphaAscii())
echo "012".allIt(it.isAlphaAscii())
echo "abc012".allIt(it.isAlphaAscii())
echo "abc012_".allIt(it.isAlphaAscii())
echo "{}".allIt(it.isAlphaAscii())
echo "Unicode:"
echo unicode.isAlpha("ábc")
# or
echo isAlpha("ábc") # unicode prefix is not needed
                    # because of import strutils except isAlpha
# or
echo "ábc".isAlpha() # from unicode

true
false
false
false
false
Unicode:
true
true
true

Notes

Thanks to the tip from @dom96 from GitHub on the use of except in import:
```
import strutils except isAlpha
import unicode
```
That prevents the ambiguous call error shown below, because we import everything from strutils except the isAlpha proc. The unicode version of the isAlpha proc is then used automatically.
nim_src_28505flZ.nim(14, 13) Error: ambiguous call; both strutils.isAlpha(s: string)[declared in lib/pure/strutils.nim(289, 5)] and unicode.isAlpha(s: string)[declared in lib/pure/unicode.nim(1416, 5)] match for: (string)

Is Digit? #

print('abc'.isdigit())
print('012'.isdigit())
print('abc012'.isdigit())
print('abc012_'.isdigit())
print('{}'.isdigit())

False
True
False
False
False

import std/[strutils, sequtils]

echo "abc".allIt(it.isDigit())
echo "012".allIt(it.isDigit())
echo "abc012".allIt(it.isDigit())
echo "abc012_".allIt(it.isDigit())
echo "{}".allIt(it.isDigit())

false
true
false
false
false

Better IsLower/IsUpper #

Nim Issue #7963 did not get resolved as I would have liked. This section defines the isLowerAsciiPlus, isUpperAsciiPlus, isLowerPlus and isUpperPlus procs, which accept a string input and replace their non-Plus equivalents from the strutils and unicode modules.

import strutils except isLower, isUpper
import unicode

template isCaseImpl(s, charProc) =
  var hasAtleastOneAlphaChar = false
  if s.len == 0: return false
  for c in s:
    var charIsAlpha = c.isAlphaAscii()
    if not hasAtleastOneAlphaChar:
      hasAtleastOneAlphaChar = charIsAlpha
    if charIsAlpha and (not charProc(c)):
      return false
  return hasAtleastOneAlphaChar

proc isLowerAsciiPlus(s: string): bool =
  ## Checks whether ``s`` is lower case.
  ##
  ## This checks ASCII characters only.
  ##
  ## Returns true if all alphabetical characters in ``s`` are lower
  ## case.  Returns false if none of the characters in ``s`` are
  ## alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  isCaseImpl(s, isLowerAscii)

proc isUpperAsciiPlus(s: string): bool =
  ## Checks whether ``s`` is upper case.
  ##
  ## This checks ASCII characters only.
  ##
  ## Returns true if all alphabetical characters in ``s`` are upper
  ## case.  Returns false if none of the characters in ``s`` are
  ## alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  isCaseImpl(s, isUpperAscii)

template runeCaseCheck(s, runeProc) =
  ## Common code for rune.isLower and rune.isUpper.
  if len(s) == 0: return false

  var
    i = 0
    rune: Rune
    hasAtleastOneAlphaRune = false

  while i < len(s):
    fastRuneAt(s, i, rune, doInc=true)
    var runeIsAlpha = isAlpha(rune)
    if not hasAtleastOneAlphaRune:
      hasAtleastOneAlphaRune = runeIsAlpha
    if runeIsAlpha and (not runeProc(rune)):
      return false
  return hasAtleastOneAlphaRune

proc isLowerPlus(s: string): bool =
  ## Checks whether ``s`` is lower case.
  ##
  ## Returns true if all alphabetical runes in ``s`` are lower case.
  ## Returns false if none of the runes in ``s`` are alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  runeCaseCheck(s, isLower)

proc isUpperPlus(s: string): bool =
  ## Checks whether ``s`` is upper case.
  ##
  ## Returns true if all alphabetical runes in ``s`` are upper case.
  ## Returns false if none of the runes in ``s`` are alphabetical.
  ##
  ## Returns false if ``s`` is an empty string.
  runeCaseCheck(s, isUpper)
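
The rule these procs implement (ignore non-alphabetic characters, require at least one alphabetic character, and require every alphabetic character to have the right case) can be modelled in a few lines of Python. This is a sketch with hypothetical names `is_lower_plus` and `is_upper_plus`; for the inputs used in this post it agrees with Python's own islower and isupper.

```python
def is_lower_plus(s: str) -> bool:
    """True if s has at least one letter and every letter is lower case."""
    letters = [ch for ch in s if ch.isalpha()]
    return bool(letters) and all(ch.islower() for ch in letters)

def is_upper_plus(s: str) -> bool:
    """True if s has at least one letter and every letter is upper case."""
    letters = [ch for ch in s if ch.isalpha()]
    return bool(letters) and all(ch.isupper() for ch in letters)

for sample in ("a b", "ab?!", "1, 2, 3 go!", "012", ""):
    print(repr(sample), is_lower_plus(sample))
```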

Is Lower? #

print('a'.islower())
print('A'.islower())
print('abc'.islower())
print('Abc'.islower())
print('aBc'.islower())
print('012'.islower())
print('{}'.islower())
print('ABC'.islower())
print('À'.islower())
print('à'.islower())
print('a b'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('ab?!'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('1, 2, 3 go!'.islower()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print(' '.islower()) # checking this proc on a non-alphabet char
print('(*&#@(^#$ '.islower()) # checking this proc on a non-alphabet string

True
False
True
False
False
False
False
False
False
True
True
True
True
False
False

<<islower_isupper_plus>>

echo 'a'.isLowerAscii()
echo 'A'.isLowerAscii()
echo "abc".isLowerAsciiPlus()
echo "Abc".isLowerAsciiPlus()
echo "aBc".isLowerAsciiPlus()
echo "012".isLowerAsciiPlus()
echo "{}".isLowerAsciiPlus()
echo "ABC".isLowerAsciiPlus()
echo "À".isLowerAsciiPlus()
echo "[Wrong] ", "à".isLowerAsciiPlus() # Returns false! As the name suggests, works only for ascii.
echo "À".isLowerPlus()
echo isLowerPlus("à")
echo "à".isLowerPlus()
echo "a b".isLowerAsciiPlus()
echo "a b".isLowerPlus()
echo "ab?!".isLowerPlus()
echo "1, 2, 3 go!".isLowerPlus()
echo ' '.isLowerAscii() # checking this proc on a non-alphabet char
echo ' '.Rune.isLower() # checking this proc on a non-alphabet Rune
echo "(*&#@(^#$ ".isLowerPlus() # checking this proc on a non-alphabet string

true
false
true
false
false
false
false
false
false
[Wrong] false
false
true
true
true
true
true
true
false
false
false

DONE Presence of space and punctuations in string makes isLower return false #

Nim Issue #7963 did not get resolved as I would have liked. So I just rolled my own procs in Better IsLower/IsUpper to fix this issue.

Notes

isLower from strutils is deprecated. Use isLowerAscii instead, or isLower from unicode (as done above).
To check whether a non-ASCII letter is in lower case, use unicode.isLower.

Is Upper? #

print('a'.isupper())
print('A'.isupper())
print('abc'.isupper())
print('Abc'.isupper())
print('aBc'.isupper())
print('012'.isupper())
print('{}'.isupper())
print('ABC'.isupper())
print('À'.isupper())
print('à'.isupper())
print('A B'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('AB?!'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print('1, 2, 3 GO!'.isupper()) # Precedent for https://github.com/nim-lang/Nim/issues/7963
print(' '.isupper()) # checking this function on a non-alphabet char
print('(*&#@(^#$ '.isupper()) # checking this proc on a non-alphabet string

False
True
False
False
False
False
False
True
True
False
True
True
True
False
False

<<islower_isupper_plus>>

echo 'a'.isUpperAscii()
echo 'A'.isUpperAscii()
echo "abc".isUpperAsciiPlus()
echo "Abc".isUpperAsciiPlus()
echo "aBc".isUpperAsciiPlus()
echo "012".isUpperAsciiPlus()
echo "{}".isUpperAsciiPlus()
echo "ABC".isUpperAsciiPlus()
echo "[Wrong] ", "À".isUpperAsciiPlus() # Returns false! As the name suggests, works only for ascii.
echo "à".isUpperAsciiPlus()
echo "À".isUpperPlus() # from unicode
echo isUpperPlus("À")
echo "à".isUpperPlus() # from unicode
echo "A B".isUpperAsciiPlus()  #
echo "A B".isUpperPlus()
echo "AB?!".isUpperPlus()
echo "1, 2, 3 GO!".isUpperPlus()
echo ' '.isUpperAscii() # checking this proc on a non-alphabet char
echo ' '.Rune.isUpper() # checking this proc on a non-alphabet Rune
echo "(*&#@(^#$ ".isUpperPlus() # checking this proc on a non-alphabet string

false
true
false
false
false
false
false
true
[Wrong] false
false
true
true
false
true
true
true
true
false
false
false

DONE Presence of space and punctuations in string makes isUpper return false #

Nim Issue #7963 did not get resolved as I would have liked. So I just rolled my own procs in Better IsLower/IsUpper to fix this issue.

Is Space? #

print(''.isspace())
print(' '.isspace())
print('\t'.isspace())
print('\r'.isspace())
print('\n'.isspace())
print(' \t\n'.isspace())
print('abc'.isspace())
print('Testing with ZERO WIDTH SPACE unicode character below:')
print('\u200b'.isspace())

False
True
True
True
True
True
False
Testing with ZERO WIDTH SPACE unicode character below:
False

import strutils except isSpace
import std/[unicode, sequtils]

proc isSpaceAscii(s: string): bool =
  if s == "":
    return false
  s.allIt(it.isSpaceAscii())

echo "".isSpaceAscii() # empty string has to be in double quotes
echo ' '.isSpaceAscii()
echo '\t'.isSpaceAscii()
echo '\r'.isSpaceAscii()
echo "\n".isSpaceAscii() # \n is a string, not a character in Nim
echo " \t\n".isSpaceAscii()
echo "abc".isSpaceAscii()
echo "Testing with ZERO WIDTH SPACE unicode character below:"
echo "[Wrong] ", "\u200b".isSpaceAscii() # Returns false! As the name suggests, works only for ascii.
echo "\u200b".isSpace() # from unicode

false
true
true
true
true
true
false
Testing with ZERO WIDTH SPACE unicode character below:
[Wrong] false
false

Notes

Empty string results in a false result for both Python and Nim variants of isspace.
\n is a string, not a character in Nim, because, depending on the OS, \n can consist of one or more characters.
isSpace from strutils is deprecated. Use isSpaceAscii instead, or isSpace from unicode (as done above).
To check whether a non-ASCII character is whitespace, use unicode.isSpace.
Interestingly, Nim’s isSpace from unicode module returns true for ZERO WIDTH SPACE unicode character (0x200b) as input, but Python’s isspace returns false. I believe Python’s behavior here is incorrect.
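
To see what Python bases its answer on, the character's Unicode properties can be inspected with the standard unicodedata module. The Python docs describe isspace as true for characters whose general category is Zs or whose bidirectional class is WS, B or S. The snippet below only prints the properties; it does not settle which behavior is preferable.

```python
import unicodedata

zwsp = "\u200b"  # ZERO WIDTH SPACE
print(unicodedata.name(zwsp))           # official character name
print(unicodedata.category(zwsp))       # general category
print(unicodedata.bidirectional(zwsp))  # bidirectional class
print(zwsp.isspace())
```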

Is Title? #

print(''.istitle())
print('T'.istitle())
print('ǅ'.istitle())
print('The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?'.istitle()) # Python's output is wrong
print('this is not a title'.istitle())
print('This Is A Title'.istitle())
print('This Is À Title'.istitle())
print('This Is Not a Title'.istitle())

False
True
True
False
False
True
True
False

import std/[unicode, strformat]

# https://github.com/nim-lang/Nim/issues/14348#issuecomment-629414257
proc isTitle(s: string): bool =
  proc isUpperOrTitle(r: Rune): bool = r.isUpper() or r.isTitle()

  var
    alphaSeen = false
  for word in s.split(): # Split s into a sequence of words
    result = true
    var
      upperSeen = false
    let
      runes = word.toRunes()
    for r in runes:
      if not r.isAlpha():
        continue
      alphaSeen = true
      if not upperSeen:
        if r.isUpperOrTitle():
          upperSeen = true
        else:
          return false
      else:
        if r.isUpperOrTitle():
          return false
  if not alphaSeen:
    return false

echo "".isTitle()
echo "T".isTitle()
echo "ǅ".isTitle()
echo "The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?".isTitle()
echo "this is not a title".isTitle()
echo "This Is A Title".isTitle()
echo "This Is À Title".isTitle()
echo "This Is Not a Title".isTitle()

false
true
true
true
false
true
true
false
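
The two languages disagree on the long sentence because Python's istitle expects an uppercase letter after any uncased character, so the t after the apostrophe in Can’t fails the check. A word-based check modelled on the Nim proc above can be sketched in Python; `is_title_words` is a hypothetical name, and this is an illustration rather than a drop-in replacement for istitle.

```python
def is_title_words(s: str) -> bool:
    """Hypothetical word-based title check, modelled on the Nim proc above."""
    alpha_seen = False
    for word in s.split():
        upper_seen = False
        for ch in word:
            if not ch.isalpha():
                continue  # digits and punctuation never affect the result
            alpha_seen = True
            is_upper_or_title = ch.isupper() or ch.istitle()
            if not upper_seen:
                if not is_upper_or_title:
                    return False  # first letter of a word must be upper/title case
                upper_seen = True
            elif is_upper_or_title:
                return False  # later letters must not be upper/title case
    return alpha_seen

sentence = "The Quick? (“Brown”) Fox Can’t Jump 32.3 Feet, Right?"
print(sentence.istitle(), is_title_words(sentence))
```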

Join #

print(' '.join(['a', 'b', 'c']))
print('xx'.join(['a', 'b', 'c']))

a b c
axxbxxc

import strutils
echo "Sequences:"
# echo @["a", "b", "c"].join(' ') # Error: type mismatch: got (seq[string], char)
echo @["a", "b", "c"].join(" ")
echo join(@["a", "b", "c"], " ")
echo @["a", "b", "c"].join("xx")
echo @['a', 'b', 'c'].join("") # join characters to form strings
echo "Lists:"
echo ["a", "b", "c"].join(" ") # Works after Nim Issue #6210 got fixed.
echo (["a", "b", "c"].join(" ")) # Works!
echo join(["a", "b", "c"], " ") # Works!
var list = ["a", "b", "c"]
echo list.join(" ") # Works too!

Sequences:
a b c
a b c
axxbxxc
abc
Lists:
a b c
a b c
a b c
a b c

Notes

The second argument to join, the separator, has to be a string; it cannot be a character.
echo ["a", "b", "c"].join(" ") did not work prior to the fix in ddc131cf07 (see Nim Issue #6210), but now it does!

Justify with filling #

Center Justify with filling #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.center(80))
print(str.center(80, '*'))

                  a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx
******************a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx******************

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.center(80)
echo str.center(80, '*')
# or
echo center(str, 80, '*')

                  a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx
******************a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx******************
******************a	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx******************

Left Justify with filling #

print('abc'.ljust(2, '*'))
print('abc'.ljust(10, '*'))

abc
abc*******

import strutils
echo "abc".alignLeft(2, '*')
echo "abc".alignLeft(10, '*')

abc
abc*******

Right Justify with filling #

print('abc'.rjust(10, '*'))

*******abc

import strutils
echo "abc".align(10, '*')

*******abc

Zero Fill #

print('42'.zfill(5))
print('-42'.zfill(5))
print(' -42'.zfill(5))

00042
-0042
0 -42

import strutils
echo "Using align:"
echo "42".align(5, '0')
echo "-42".align(5, '0')

echo "Using zfill:"
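proc zfill(s: string; count: Natural): string =
  ## Left-pads `s` with zeros up to `count` characters, keeping a leading sign first.
  let strlen = len(s)
  if strlen < count:
    # Guard against an empty string before reading s[0]; Python's zfill also handles '+'.
    if strlen > 0 and s[0] in {'+', '-'}:
      result.add(s[0])
      result.add("0".repeat(count-strlen))
      result.add(s[1 .. s.high])
    else:
      result = "0".repeat(count-strlen)
      result.add(s)
  else:
    result = s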

echo "42".zfill(5)
echo "-42".zfill(5)
echo " -42".zfill(5)

Using align:
00042
00-42
Using zfill:
00042
-0042
0 -42

Notes

Nim's align does not handle the sign the way Python's zfill does when left-filling strings that represent negative numbers with zeroes.
No Nim equivalent, but I came up with my own zfill proc for Nim above.
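
For reference, the behavior being reproduced can be written out in Python and compared against the built-in zfill. `zfill_sketch` is a hypothetical name. Note that Python treats a leading + the same way as a leading -, and only when the sign is the very first character.

```python
def zfill_sketch(s: str, width: int) -> str:
    """Pad s with zeros on the left; a leading sign stays in front of the padding."""
    pad = "0" * max(0, width - len(s))
    if s[:1] in ("+", "-"):  # s[:1] is "" for an empty string, so this is safe
        return s[0] + pad + s[1:]
    return pad + s

for sample in ("42", "-42", " -42", "+42", ""):
    print(repr(sample), zfill_sketch(sample, 5) == sample.zfill(5))
```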

Case conversion #

To Lower #

print('a'.lower())
print('A'.lower())
print('abc'.lower())
print('Abc'.lower())
print('aBc'.lower())
print('012'.lower())
print('{}'.lower())
print('ABC'.lower())
print('À'.lower())
print('à'.lower())

a
a
abc
abc
abc
012
{}
abc
à
à

import strutils except toLower
import unicode
echo 'a'.toLowerAscii()
echo 'A'.toLowerAscii()
echo "abc".toLowerAscii()
echo "Abc".toLowerAscii()
echo "aBc".toLowerAscii()
echo "012".toLowerAscii()
echo "{}".toLowerAscii()
echo "ABC".toLowerAscii()
echo "[Wrong] ", "À".toLowerAscii() # Does not work! As the name suggests, works only for ascii.
echo "à".toLowerAscii()
echo "À".toLower() # from unicode
echo "à".toLower() # from unicode

a
a
abc
abc
abc
012
{}
abc
[Wrong] À
à
à
à

Notes

toLower from strutils is deprecated. Use toLowerAscii instead, or toLower from unicode (as done above).
To convert a non-ASCII letter to lower case, use unicode.toLower.

To Upper #

print('a'.upper())
print('A'.upper())
print('abc'.upper())
print('Abc'.upper())
print('aBc'.upper())
print('012'.upper())
print('{}'.upper())
print('ABC'.upper())
print('À'.upper())
print('à'.upper())

A
A
ABC
ABC
ABC
012
{}
ABC
À
À

import strutils except toUpper
import unicode
echo 'a'.toUpperAscii()
echo 'A'.toUpperAscii()
echo "abc".toUpperAscii()
echo "Abc".toUpperAscii()
echo "aBc".toUpperAscii()
echo "012".toUpperAscii()
echo "{}".toUpperAscii()
echo "ABC".toUpperAscii()
echo "À".toUpperAscii()
echo "[Wrong] ", "à".toUpperAscii() # Does not work! As the name suggests, works only for ascii.
echo "À".toUpper() # from unicode
echo "à".toUpper() # from unicode

A
A
ABC
ABC
ABC
012
{}
ABC
À
[Wrong] à
À
À

Notes

toUpper from strutils is deprecated. Use toUpperAscii instead, or toUpper from unicode (as done above).
To convert a non-ASCII letter to upper case, use unicode.toUpper.

Capitalize #

str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
print(str.capitalize())

A	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx

import strutils
var str = "a\tbc\tdef\taghij\tcklm\tdanopqrstuv\tadefwxyz\tzyx"
echo str.capitalizeAscii
# or
echo capitalizeAscii(str)

A	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx
A	bc	def	aghij	cklm	danopqrstuv	adefwxyz	zyx

To Title #

print('convert this to title á û'.title())

Convert This To Title Á Û

import unicode
echo "convert this to title á û".title()

Convert This To Title Á Û

Swap Case #

print('Swap CASE example á û Ê'.swapcase())
print('Swap CASE example á û Ê'.swapcase().swapcase())

sWAP case EXAMPLE Á Û ê
Swap CASE example á û Ê

import unicode
echo "Swap CASE example á û Ê".swapcase()
echo "Swap CASE example á û Ê".swapcase().swapcase()

sWAP case EXAMPLE Á Û ê
Swap CASE example á û Ê

Notes

See this SO Q/A to read about a few cases where s.swapcase().swapcase()==s is not true (at least for Python).
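
As an illustration (my own example, not necessarily one from the linked answer), the German sharp s has no single-character uppercase form in Python's case mapping, so a round trip through swapcase does not restore it:

```python
s = "straße"
once = s.swapcase()       # ß expands to two uppercase letters
twice = once.swapcase()   # which lower to "ss", not back to ß
print(once, twice, twice == s)
```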

Strip #

Left/leading and right/trailing Strip #

print('«' + '   spacious   '.strip() + '»')
print('«' + '\n string \n \n\n'.strip() + '»')
print('«' + '\n'.strip() + '»')
print('www.example.com'.strip('cmowz.'))
print('mississippi'.strip('mipz'))

«spacious»
«string»
«»
example
ssiss

import strutils
echo "«" & "   spacious   ".strip() & "»"
echo "«" & "\n string \n \n\n".strip() & "»"
echo "«" & "\n".strip() & "»"
echo "www.example.com".strip(chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(chars={'m', 'i', 'p', 'z'})

«spacious»
«string»
«»
example
ssiss

Notes

Python strip takes a string as an argument to specify the letters that need to be stripped off the input string. But Nim strip requires a Set of characters.
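
Although Python takes a string here, it treats that string as an unordered set of characters, not as a prefix or suffix, which is why the Nim set literal is a natural match. A minimal sketch of that behavior follows; `strip_chars` and its `leading`/`trailing` parameters are hypothetical names chosen to mirror the Nim call.

```python
def strip_chars(s: str, chars: set, leading: bool = True, trailing: bool = True) -> str:
    """Remove characters found in `chars` from the selected ends of s."""
    start, end = 0, len(s)
    if leading:
        while start < end and s[start] in chars:
            start += 1
    if trailing:
        while end > start and s[end - 1] in chars:
            end -= 1
    return s[start:end]

print(strip_chars("www.example.com", set("cmowz.")))
print(strip_chars("mississippi", set("mipz"), trailing=False))
```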

Left/leading Strip #

print('«' + '   spacious   '.lstrip() + '»')
print('www.example.com'.lstrip('cmowz.'))
print('mississippi'.lstrip('mipz'))

«spacious   »
example.com
ssissippi

import strutils
echo "«", "   spacious   ".strip(trailing=false), "»"
echo "www.example.com".strip(trailing=false, chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(trailing=false, chars={'m', 'i', 'p', 'z'})

«spacious   »
example.com
ssissippi

Right/trailing Strip #

print('«' + '   spacious   '.rstrip() + '»')
print('www.example.com'.rstrip('cmowz.'))
print('mississippi'.rstrip('mipz'))

«   spacious»
www.example
mississ

import strutils
echo "«", "   spacious   ".strip(leading=false), "»"
echo "www.example.com".strip(leading=false, chars={'c', 'm', 'o', 'w', 'z', '.'})
echo "mississippi".strip(leading=false, chars={'m', 'i', 'p', 'z'})

«   spacious»
www.example
mississ

Partition #

First occurrence partition #

print('ab:ce:ef:ce:ab'.partition(':'))
print('ab:ce:ef:ce:ab'.partition('ce'))

('ab', ':', 'ce:ef:ce:ab')
('ab:', 'ce', ':ef:ce:ab')

import strmisc
echo "ab:ce:ef:ce:ab".partition(":") # The argument is a string, not a character
echo "ab:ce:ef:ce:ab".partition("ce")

("ab", ":", "ce:ef:ce:ab")
("ab:", "ce", ":ef:ce:ab")

Right partition or Last occurrence partition #

print('ab:ce:ef:ce:ab'.rpartition(':'))
print('ab:ce:ef:ce:ab'.rpartition('ce'))

('ab:ce:ef:ce', ':', 'ab')
('ab:ce:ef:', 'ce', ':ab')

import strmisc
echo "ab:ce:ef:ce:ab".rpartition(":") # The argument is a string, not a character
# or
echo "ab:ce:ef:ce:ab".partition(":", right=true)
echo "ab:ce:ef:ce:ab".rpartition("ce")
# or
echo "ab:ce:ef:ce:ab".partition("ce", right=true)

("ab:ce:ef:ce", ":", "ab")
("ab:ce:ef:ce", ":", "ab")
("ab:ce:ef:", "ce", ":ab")
("ab:ce:ef:", "ce", ":ab")

Replace #

print('abc abc abc'.replace(' ab', '-xy'))
print('abc abc abc'.replace(' ', '')) # Strip all spaces
print('abc abc abc'.replace(' ab', '-xy', 0))
print('abc abc abc'.replace(' ab', '-xy', 1))
print('abc abc abc'.replace(' ab', '-xy', 2))

abc-xyc-xyc
abcabcabc
abc abc abc
abc-xyc abc
abc-xyc-xyc

import strutils
echo "abc abc abc".replace(" ab", "-xy")
echo "abc abc abc".replace(" ", "") # Strip all spaces
# echo "abc abc abc".replace(" ab", "-xy", 0) # Invalid, does not expect a count:int argument
# echo "abc abc abc".replace(" ab", "-xy", 1) # Invalid, does not expect a count:int argument
# echo "abc abc abc".replace(" ab", "-xy", 2) # Invalid, does not expect a count:int argument

abc-xyc-xyc
abcabcabc

Notes

Nim does not allow specifying the number of occurrences to be replaced using a count argument as in the Python version of replace.
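
If a count-limited replace is needed, it can be built from find and slicing. The sketch below shows the idea in Python so it can be checked against the built-in; `replace_n` is a hypothetical name, and the empty-pattern case is deliberately left out.

```python
def replace_n(s: str, old: str, new: str, count: int = -1) -> str:
    """Replace at most `count` occurrences of `old` (all of them if count < 0)."""
    if not old:
        raise ValueError("empty pattern is not handled in this sketch")
    parts = []
    pos = 0
    while count != 0:
        hit = s.find(old, pos)
        if hit < 0:
            break
        parts.append(s[pos:hit])  # text before the match
        parts.append(new)
        pos = hit + len(old)      # continue searching after the match
        count -= 1
    parts.append(s[pos:])         # whatever is left
    return "".join(parts)

for n in (0, 1, 2):
    print(replace_n("abc abc abc", " ab", "-xy", n))
```

The same loop should carry over to Nim using find with a start index, though I have not ported it here.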

Split #

Split (from left) #

print('1,2,3'.split(','))
print('1,2,3'.split(',', maxsplit=1))
print('1,2,,3,'.split(','))
print('1::2::3'.split('::'))
print('1::2::3'.split('::', maxsplit=1))
print('1::2::::3::'.split('::'))

['1', '2', '3']
['1', '2,3']
['1', '2', '', '3', '']
['1', '2', '3']
['1', '2::3']
['1', '2', '', '3', '']

import strutils
echo "1,2,3".split(',')
echo "1,2,3".split(',', maxsplit=1)
echo "1,2,,3,".split(',')
echo "1::2::3".split("::")
echo "1::2::3".split("::", maxsplit=1)
echo "1::2::::3::".split("::")

@["1", "2", "3"]
@["1", "2,3"]
@["1", "2", "", "3", ""]
@["1", "2", "3"]
@["1", "2::3"]
@["1", "2", "", "3", ""]

Split from right #

rsplit behaves just like split unless the maxsplit argument is given.

print('1,2,3'.rsplit(','))
print('1,2,3'.rsplit(',', maxsplit=1))
print('1,2,,3,'.rsplit(','))
print('1::2::3'.rsplit('::'))
print('1::2::3'.rsplit('::', maxsplit=1))
print('1::2::::3::'.rsplit('::'))

['1', '2', '3']
['1,2', '3']
['1', '2', '', '3', '']
['1', '2', '3']
['1::2', '3']
['1', '2', '', '3', '']

import strutils
echo "1,2,3".rsplit(',')
echo "1,2,3".rsplit(',', maxsplit=1)
echo "1,2,,3,".rsplit(',')
echo "1::2::3".rsplit("::")
echo "1::2::3".rsplit("::", maxsplit=1)
echo "1::2::::3::".rsplit("::")

@["1", "2", "3"]
@["1,2", "3"]
@["1", "2", "", "3", ""]
@["1", "2", "3"]
@["1::2", "3"]
@["1", "2", "", "", "3", ""]

Split Lines #

print('ab c\n\nde fg\rkl\r\n'.splitlines())
print('ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True))

['ab c', '', 'de fg', 'kl']
['ab c\n', '\n', 'de fg\r', 'kl\r\n']

import strutils
echo "ab c\n\nde fg\rkl\r\n".splitLines()
echo "ab c\n\nde fg\rkl\r\n".splitLines(keepEol = true)

@["ab c", "", "de fg", "kl", ""]
@["ab c\n", "\n", "de fg\r", "kl\r\n", ""]

Notes

The Nim version creates separate splits for the \r and \n. Note the last "" split created by Nim, but not by Python for the same input string.
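
The trailing "" appears when the text ends with a line terminator and the terminators are treated as separators. Python's splitlines drops that final empty piece, while a plain separator-based split keeps it. The sketch below models the Nim output above with a regular expression; `split_lines_keep_trailing` is a hypothetical name.

```python
import re

def split_lines_keep_trailing(text: str) -> list:
    """Split on \\r\\n, \\r or \\n and keep the empty piece after a final terminator."""
    return re.split(r"\r\n|\r|\n", text)

text = "ab c\n\nde fg\rkl\r\n"
print(text.splitlines())
print(split_lines_keep_trailing(text))
```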

Convert #

See the encodings module for equivalents of Python decode and encode functions.

Others #

There is no equivalent of the Python translate function in Nim as of writing this (<2017-08-09 Wed>).
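
For context, this is what translate does on the Python side: it applies a per-character mapping in one pass, where a value of None deletes the character.

```python
# Map 'a' -> '1' and 'b' -> '2'; delete every 'c'.
table = str.maketrans({"a": "1", "b": "2", "c": None})
translated = "aabbcc".translate(table)
print(translated)
```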
