<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>regex on
A Scripter's Notes</title><link>https://scripter.co/tags/regex/</link><description>Recent content in regex
on A Scripter's Notes</description><language>en-us</language><managingEditor>kaushal.modi@gmail.com (Kaushal Modi)</managingEditor><webMaster>kaushal.modi@gmail.com (Kaushal Modi)</webMaster><lastBuildDate>Wed, 22 Apr 2026 08:24:58 -0400</lastBuildDate><generator>Hugo -- gohugo.io</generator><docs>https://validator.w3.org/feed/docs/rss2.html</docs><atom:link href="https://scripter.co/tags/regex/index.xml" rel="self" type="application/rss+xml"/><item><title>grep -Po</title><link>https://scripter.co/grep-po/</link><description>&lt;blockquote>Using &lt;code>grep&lt;/code> to do substring extraction in shell scripts.&lt;/blockquote>&lt;div class="ox-hugo-toc toc">
&lt;div class="heading">Table of Contents&lt;/div>
&lt;ul>
&lt;li>&lt;a href="#grep-po-problem-statement">Problem statement&lt;/a>&lt;/li>
&lt;li>&lt;a href="#solution-using-grep-po">Solution using &lt;code>grep -Po&lt;/code>&lt;/a>&lt;/li>
&lt;li>&lt;a href="#arriving-to-this-solution">Arriving at this solution&lt;/a>&lt;/li>
&lt;li>&lt;a href="#summary">Summary&lt;/a>&lt;/li>
&lt;/ul>
&lt;/div>
&lt;!--endtoc-->
&lt;p>I like &lt;a href="https://en.wikipedia.org/wiki/Regular_expression">regular expressions&lt;/a>
&lt;span class="sidenote-number">&lt;small class="sidenote">
I recommend using &lt;a href="https://regex101.com/">https://regex101.com/&lt;/a> to practice regular
expressions of different flavors (PCRE2, PCRE, Python, etc.) whether
or not you are new to using &lt;abbr aria-label=" regular expression" tabindex=0>regex&lt;/abbr>.
&lt;/small>&lt;/span>
as they allow me to be concise and specific about what I need to
search for.&lt;/p>
&lt;p>I have liked using regular expressions for many years, ever since
I learned Perl about fifteen years ago. I am writing this post because I
remember the delight I felt when I realized that I could use the
familiar Perl regular expressions to do string parsing in shell
scripts. I am not exactly sure, but I probably learned about this
&lt;code>grep -Po&lt;/code> trick from &lt;em>stackexchange&lt;/em> (&lt;a href="#citeproc_bib_item_1">camh, 2011&lt;/a>).&lt;/p>
&lt;h2 id="grep-po-problem-statement">Problem statement&amp;nbsp;&lt;a class="headline-hash no-text-decoration" href="#grep-po-problem-statement">#&lt;/a>&lt;/h2>
&lt;p>Say I am parsing a log file that contains a line like &lt;code>web report: https://foo.bar/detail.html&lt;/code>, and I need to extract the
&lt;code>https://foo.bar&lt;/code> part into a shell script variable.&lt;/p>
&lt;h2 id="solution-using-grep-po">Solution using &lt;code>grep -Po&lt;/code>&amp;nbsp;&lt;a class="headline-hash no-text-decoration" href="#solution-using-grep-po">#&lt;/a>&lt;/h2>
&lt;div class="note">
&lt;p>This solution requires a GNU &lt;code>grep&lt;/code> build that supports &lt;code>-P&lt;/code>, i.e. one
compiled with &lt;code>libpcre&lt;/code>.
&lt;span class="sidenote-number">&lt;small class="sidenote">
&lt;em>GNU grep&lt;/em> gained the PCRE (&lt;code>-P&lt;/code>) feature back &lt;a href="https://git.savannah.gnu.org/cgit/grep.git/commit/?id=05860b2d966701a5a9f70a650d32b30ae2612eeb">in 2000&lt;/a>.
&lt;/small>&lt;/span>
I have yet to come across a system that did not have such a
&lt;code>grep&lt;/code> version installed.&lt;/p>
&lt;/div>
&lt;p>I&amp;rsquo;ll throw the solution out here and then dig into the details.&lt;/p>
&lt;p>&lt;a id="code-snippet--grepPo-example">&lt;/a>&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s1">&amp;#39;def\nabc\n&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> grep -Po &lt;span class="s1">&amp;#39;a\K.(?=c)&amp;#39;&lt;/span> &lt;span class="c1"># =&amp;gt; b&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="src-block-caption">
&lt;span class="src-block-number">&lt;a href="#code-snippet--grepPo-example">Code Snippet 1&lt;/a>:&lt;/span>
Extracting "b" from "abc" using &lt;code>grep -Po&lt;/code>
&lt;/div>
&lt;p>The &lt;em>grep&lt;/em> switches used here are:&lt;/p>
&lt;dl>
&lt;dt>&lt;code>-P&lt;/code>&lt;/dt>
&lt;dd>Interpret the pattern as a (P)erl-compatible regular expression (&lt;code>--perl-regexp&lt;/code>). This allows us to use the
&lt;a href="https://www.regular-expressions.info/lookaround.html">&lt;em>look around&lt;/em> regex&lt;/a> syntax like &lt;code>(?=..)&lt;/code> and escape sequences like
&lt;code>\K&lt;/code> (&lt;a href="#citeproc_bib_item_2">“perlre - Perl regular expressions,” n.d.&lt;/a>).&lt;/dd>
&lt;dt>&lt;code>-o&lt;/code>&lt;/dt>
&lt;dd>Print (o)nly the matched portion of each matching line (&lt;code>--only-matching&lt;/code>), one match per output line.&lt;/dd>
&lt;/dl>
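&lt;p>As a small illustration of how the two switches combine, here is a sketch with a made-up input line, assuming a GNU &lt;code>grep&lt;/code> with &lt;code>-P&lt;/code> support. When a line contains more than one match, &lt;code>-o&lt;/code> prints each match on its own line. The variable name &lt;code>matches&lt;/code> is only for illustration.&lt;/p>

```shell
# Assumption: GNU grep built with PCRE (-P) support.
# The input line "abc axc" is made up for this illustration.
matches=$(printf 'abc axc\n' | grep -Po 'a\K.(?=c)')

# Each match found by -o ends up on a separate output line.
printf '%s\n' "$matches"
```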
&lt;h2 id="arriving-to-this-solution">Arriving at this solution&amp;nbsp;&lt;a class="headline-hash no-text-decoration" href="#arriving-to-this-solution">#&lt;/a>&lt;/h2>
&lt;p>Now I&amp;rsquo;ll start with a basic example and build up to the &lt;a href="#code-snippet--grepPo-example">above
solution&lt;/a>.&lt;/p>
&lt;dl>
&lt;dt>Problem&lt;/dt>
&lt;dd>Let&amp;rsquo;s say I have this text with two lines &amp;ldquo;def&amp;rdquo; and &amp;ldquo;abc&amp;rdquo;
and I want&lt;span class="org-target" id="org-target--wanted-grep-output">&lt;/span> to output whatever character is between &amp;ldquo;a&amp;rdquo; and &amp;ldquo;c&amp;rdquo;.&lt;/dd>
&lt;/dl>
&lt;!--listend-->
&lt;ul>
&lt;li>
&lt;p>Below, the regular expression for matching any character between &amp;ldquo;a&amp;rdquo;
and &amp;ldquo;c&amp;rdquo; ( &lt;code>'a.c'&lt;/code> ) is correct, but plain &lt;em>grep&lt;/em> prints every matching line in full,
so the output is the whole &amp;ldquo;abc&amp;rdquo; line. Here &lt;code>printf&lt;/code> is used instead of &lt;code>echo&lt;/code> because not every shell&amp;rsquo;s &lt;code>echo&lt;/code> turns &lt;code>\n&lt;/code> into a newline.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s1">&amp;#39;def\nabc\n&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> grep &lt;span class="s1">&amp;#39;a.c&amp;#39;&lt;/span> &lt;span class="c1"># =&amp;gt; abc&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Now we add the &lt;em>grep&lt;/em> &lt;code>-o&lt;/code> switch so that it outputs only the
matched portion. As the regex is &lt;code>'a.c'&lt;/code>, the &lt;code>-o&lt;/code> switch
outputs every part of the input that matches it, so the output is
&amp;ldquo;abc&amp;rdquo;. That is still not what we &lt;a href="#org-target--wanted-grep-output">wanted&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">printf&lt;/span> &lt;span class="s1">&amp;#39;def\nabc\n&amp;#39;&lt;/span> &lt;span class="p">|&lt;/span> grep -o &lt;span class="s1">&amp;#39;a.c&amp;#39;&lt;/span> &lt;span class="c1"># =&amp;gt; abc&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>Now we bring in the powerful Perl regex feature &lt;em>positive
lookahead&lt;/em>.
&lt;span class="sidenote-number">&lt;small class="sidenote">
Positive lookahead is used when you want to match something &lt;span class="underline">only
if&lt;/span> it&amp;rsquo;s followed by something else. Its syntax looks like &lt;code>q(?=u)&lt;/code>,
where the expression matches a &lt;code>q&lt;/code> that is followed by a &lt;code>u&lt;/code>, without
making the &lt;code>u&lt;/code> part of the match &amp;ndash; &lt;a href="https://www.regular-expressions.info/lookaround.html">reference&lt;/a>.
&lt;/small>&lt;/span>
With &lt;code>(?=c)&lt;/code>, the &amp;ldquo;c&amp;rdquo; must follow but is left out of the match. This is still not exactly what we want because &amp;ldquo;a&amp;rdquo; is still
part of the match, so the output is now &amp;ldquo;ab&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;abc&amp;#34;&lt;/span> &lt;span class="p">|&lt;/span> grep -Po &lt;span class="s1">&amp;#39;a.(?=c)&amp;#39;&lt;/span> &lt;span class="c1"># =&amp;gt; ab&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;li>
&lt;p>All we need now is a construct that marks a point in the regex
and says &amp;ldquo;don&amp;rsquo;t consider anything before this as part of the
match&amp;rdquo;. That is the &lt;code>\K&lt;/code> construct, described in the &lt;a href="https://perldoc.perl.org/perlre#Lookaround-Assertions">Perl regular
expressions doc&lt;/a> as:&lt;/p>
&lt;blockquote>
&lt;p>There is a special form of this construct, called &lt;code>\K&lt;/code> (available
since Perl 5.10.0), which causes the regex engine to &amp;ldquo;keep&amp;rdquo;
everything it had matched prior to the &lt;code>\K&lt;/code> and not include it in
matched string. This effectively provides non-experimental
variable-length lookbehind of any length.&lt;/p>
&lt;/blockquote>
&lt;p>And thus we have the final solution:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-shell" data-lang="shell">&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;abc&amp;#34;&lt;/span> &lt;span class="p">|&lt;/span> grep -Po &lt;span class="s1">&amp;#39;a\K.(?=c)&amp;#39;&lt;/span> &lt;span class="c1"># =&amp;gt; b&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;/li>
&lt;/ul>
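&lt;p>The four steps above can also be run side by side. This is only a sketch: the &lt;code>stage&lt;/code> helper is hypothetical, it assumes a GNU &lt;code>grep&lt;/code> with &lt;code>-P&lt;/code> support, and it feeds the input with &lt;code>printf&lt;/code> so that &lt;code>\n&lt;/code> is a real newline in every shell.&lt;/p>

```shell
# Hypothetical helper (not from the original post): run one stage of the
# build-up against the same two-line input, "def" and "abc".
# Assumption: GNU grep built with PCRE (-P) support.
stage() {
    # "$@" holds the grep options and the regex for this stage
    printf 'def\nabc\n' | grep "$@"
}

stage 'a.c'             # abc  (the whole matching line)
stage -o 'a.c'          # abc  (only the matched text)
stage -Po 'a.(?=c)'     # ab   (lookahead keeps "c" out of the match)
stage -Po 'a\K.(?=c)'   # b    (\K also drops the leading "a")
```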
&lt;h2 id="summary">Summary&amp;nbsp;&lt;a class="headline-hash no-text-decoration" href="#summary">#&lt;/a>&lt;/h2>
&lt;p>Taking the example from the &lt;a href="#grep-po-problem-statement">problem statement&lt;/a>, this will work:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-bash" data-lang="bash">&lt;span class="line">&lt;span class="cl">&lt;span class="nv">string&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="s2">&amp;#34;web report: https://foo.bar/detail.html&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nv">substring&lt;/span>&lt;span class="o">=&lt;/span>&lt;span class="k">$(&lt;/span>grep -Po &lt;span class="s1">&amp;#39;web report:\s*\K.*?(?=/detail\.html)&amp;#39;&lt;/span> &lt;span class="o">&amp;lt;&amp;lt;&amp;lt;&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">string&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="k">)&lt;/span>
&lt;/span>&lt;/span>&lt;span class="line">&lt;span class="cl">&lt;span class="nb">echo&lt;/span> &lt;span class="s2">&amp;#34;&lt;/span>&lt;span class="si">${&lt;/span>&lt;span class="nv">substring&lt;/span>&lt;span class="si">}&lt;/span>&lt;span class="s2">&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre tabindex="0" class="chroma">&lt;code class="language-text" data-lang="text">&lt;span class="line">&lt;span class="cl">https://foo.bar
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>
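&lt;p>In a script it can help to know whether anything matched at all. The following is a minimal sketch, assuming a GNU &lt;code>grep&lt;/code> with &lt;code>-P&lt;/code> support. The function name &lt;code>extract_base_url&lt;/code> is hypothetical, and it relies on &lt;code>grep&lt;/code> returning a non-zero exit status when there is no match.&lt;/p>

```shell
# Hypothetical helper (not from the original post): print the part of a
# "web report:" line that precedes /detail.html.
# Assumption: GNU grep built with PCRE (-P) support.
extract_base_url() {
    # grep is the last command in the pipeline, so its exit status
    # (non-zero when nothing matches) becomes the function's status.
    printf '%s\n' "$1" | grep -Po 'web report:\s*\K.*?(?=/detail\.html)'
}

if url=$(extract_base_url "web report: https://foo.bar/detail.html"); then
    echo "found: ${url}"
else
    echo "no match"
fi
```

&lt;p>The same function can be reused for every log line, and the &lt;code>if&lt;/code> branch keeps an empty result from silently flowing into later commands.&lt;/p>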
&lt;h2 id="references">References&amp;nbsp;&lt;a class="headline-hash no-text-decoration" href="#references">#&lt;/a>&lt;/h2>
&lt;div class="csl-bib-body">
&lt;div class="csl-entry">&lt;a id="citeproc_bib_item_1">&lt;/a>camh. (2011). Can grep output only specified groupings that match? [Website]. In &lt;i>Unix stackexchange&lt;/i>. &lt;a href="https://unix.stackexchange.com/a/13472/57923">https://unix.stackexchange.com/a/13472/57923&lt;/a>&lt;/div>
&lt;div class="csl-entry">&lt;a id="citeproc_bib_item_2">&lt;/a>perlre - Perl regular expressions. (n.d.). [Website]. In &lt;i>Perldoc 5.34.0&lt;/i>. Retrieved February 16, 2022, from &lt;a href="https://perldoc.perl.org/perlre">https://perldoc.perl.org/perlre&lt;/a>&lt;/div>
&lt;/div></description><author>Kaushal.Modi@fakeEmailToMakeValidatorHappy.com (Kaushal Modi)</author><category domain="https://scripter.co/categories/unix">unix</category><category domain="https://scripter.co/categories/shell">shell</category><category domain="https://scripter.co/tags/grep">grep</category><category domain="https://scripter.co/tags/regex">regex</category><category domain="https://scripter.co/tags/string">string</category><category domain="https://scripter.co/tags/perl">perl</category><category domain="https://scripter.co/tags/100daystooffload">100DaysToOffload</category><guid>https://scripter.co/grep-po/</guid><pubDate>Wed, 16 Feb 2022 21:34:00 -0500</pubDate></item></channel></rss>