Jekyll2020-09-21T22:29:07+02:00http://slawomir.net//slawomir.netI'm a Linux Hooligan and this blog is about funny, interesting and weird faces of Software Engineering.Heirloom on journal cover2020-09-21T00:00:00+02:002020-09-21T00:00:00+02:00http://slawomir.net/2020/09/21/cover-heirloom<p><a href="/assets/Programista_90.jpg">
<img src="/assets/Programista_90.jpg" style="height: 250px; float: left; margin-right: 1em;" />
</a></p>
<p>A couple of months ago I wrote an article for the “Programista” journal (a Polish one) about how the DEFLATE
algorithm works under the bonnet. Apart from describing DEFLATE, it illustrates a clever use of
<a href="https://github.com/pauldmccarthy/indexed_gzip">indexed_gzip</a> to decompress a random part of
a gzipped file without decompressing what comes before it. The article made it to the cover, and I thought it was
a good chance to hide some information there. I decided to put an heirloom for my children on it. I wonder
what their reaction will be when they’re in their 20s and somebody tells them :)</p>Automating full-page web screenshots without ads and other crap2020-09-21T00:00:00+02:002020-09-21T00:00:00+02:00http://slawomir.net/2020/09/21/command-line-page-screenshots<p>Have you ever hit a wall with your idea/project? I recall that the other day I heard the following words
about my project.</p>
<blockquote>
<p>we won’t invest, because we aren’t sure if that boat will become a ship in the future</p>
</blockquote>
<p>Cruel words, right? Well, later on it emerged that their judgement was right. Not every boat
becomes a ship; just a few do. One of the most exceptional examples is the Internet. It was a boat
and it became a ship. Comparing it to a ship is actually not fair, but you get the idea.</p>
<p>When a boat becomes a ship it can accommodate many more people, it requires more power to operate,
it looks much better, it is more robust and more powerful, it offers more services, it’s less
maneuverable, etc. All of this is true for the Internet too. The target audience is much broader and
therefore the goals become different. Revenue streams are different too. Last but not least, technologies
and solutions from the past are not suitable anymore, and that creates new technical challenges for
the people who work with them.</p>
<p>In this post I want to describe how and why I automated taking full-page screenshots of web pages
without advertisements, GDPR notifications, cookie/privacy alerts, etc. This used to be a pretty easy
thing to do. That, unfortunately, doesn’t hold anymore.</p>
<p><img src="/assets/webscreenshot.png" alt="webscreenshot" /></p>
<hr />
<p>Information on the Internet has always been ephemeral. What if you find some information on a web page
and store the URL for the future? You may go back after a while only to discover it’s already gone (404). It may
have been moved somewhere else, or maybe it’s simply not available anymore. Yes, you can use a search engine
to find another source of information on the topic, but only if the URL contains some keywords,
or you were cautious enough to copy the page title along with the URL. I created a small system that makes
full-page screenshots of pages to circumvent this whole problem.</p>
<p>In the past you could simply use <code class="highlighter-rouge">curl</code> to create your own copy of a page, but unfortunately nowadays it
is not that straightforward:</p>
<ul>
<li>a lot of pages require JavaScript, and a significant part of them require JavaScript just to load the content</li>
<li>some pages are protected (e.g. against DDoS) and will check the browser</li>
<li>there is a lot of bloat (GDPR, cookie and privacy policies, full-page advertisements)</li>
<li>pages are resource-heavy</li>
<li>pages require a lot of content that comes from 3rd-party services (CDNs etc.)</li>
</ul>
<p>You can use a service like <a href="https://archive.is">archive.is</a>, but for the sake of this article I’m gonna
assume you want your own local copy. There are many ways to make a copy of a website, but
I’m going to focus on the most primitive one: full-page screenshots. I find screenshots easy to
preview and easy to share.</p>
<p>So what are the challenges of automating web page screenshots?</p>
<ol>
<li>in order for everything to work right we need an underlying browser to render the page for us</li>
<li>we need to take a full-page screenshot, so simple screen grabbing won’t work</li>
<li>page contents are fetched asynchronously, so we need to somehow instrument the browser to take the
screenshot only after the page is fully loaded</li>
<li>we need to hide all GDPR, ToS and cookie windows before taking a screenshot</li>
<li>since we want an automated solution, we look for a headless one (no running X’es)</li>
</ol>
<p>Points 1, 2, 3 and 5 are solved by using a headless automated browser like PhantomJS. We will use
the <a href="https://github.com/maaaaz/webscreenshot">webscreenshot</a> Python package as a wrapper. It contains
a convenient script to take full-page screenshots.</p>
<p>To solve point 4, we will use <a href="https://github.com/epitron/mitm-adblock">mitm-adblock</a>, which uses
<a href="https://mitmproxy.org/">mitmproxy</a> under the bonnet. Basically it forms an HTTP(S) proxy that will
reject JavaScript scripts according to Adblock rules. These are the same rules that are used by
browser extensions like Adblock, uBlock etc.</p>
<p>The picture above illustrates how the system works. After cloning <code class="highlighter-rouge">mitm-adblock</code> we <code class="highlighter-rouge">cd</code> into its
directory. When running it for the first time, we should execute <code class="highlighter-rouge">update-blocklists</code> to update the Adblock
rules. Then we execute the <code class="highlighter-rouge">go</code> script in the background (or in the foreground in another terminal).</p>
<p>The second step is to pull <code class="highlighter-rouge">webscreenshot</code> and <code class="highlighter-rouge">cd</code> into its directory. Assuming that a list of URLs is
prepared and available in the file <code class="highlighter-rouge">/tmp/links.txt</code>, we do the following:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span><span class="nb">link </span><span class="k">in</span> <span class="si">$(</span><span class="nb">cat</span> /tmp/links.txt<span class="si">)</span><span class="p">;</span> <span class="k">do </span>python3 webscreenshot.py <span class="nt">-P</span> <span class="s1">'http://localhost:8118'</span> <span class="s2">"</span><span class="nv">$link</span><span class="s2">"</span><span class="p">;</span> <span class="k">done</span>
</code></pre></div></div>
<p>Here <code class="highlighter-rouge">localhost:8118</code> is the endpoint of our <code class="highlighter-rouge">mitmproxy</code>. Depending on how many links we have, it may
take some time. When it finishes, we should have all of our screenshots available in the <code class="highlighter-rouge">screenshots</code>
subdirectory.</p>
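<p>If the link list may contain characters that trip up the shell loop above, an equivalent driver can be written in Python. This is only a sketch under the same assumptions as above (a <code class="highlighter-rouge">webscreenshot.py</code> in the current directory and the proxy on <code class="highlighter-rouge">localhost:8118</code>); it merely builds the commands, so they can be inspected or parallelized before running:</p>

```python
import subprocess
from pathlib import Path

def screenshot_commands(links_file, proxy="http://localhost:8118"):
    """Build one webscreenshot invocation per URL listed in links_file."""
    commands = []
    for link in Path(links_file).read_text().split():
        commands.append(["python3", "webscreenshot.py", "-P", proxy, link])
    return commands

# To actually run them:
# for cmd in screenshot_commands("/tmp/links.txt"):
#     subprocess.run(cmd, check=True)
```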
<p>And that’s all!</p>
<h3 id="caveats-and-further-steps">Caveats and further steps</h3>
<p>The described solution is definitely not complete, but it was enough for my pet project. Problems that
I’ve encountered include:</p>
<ul>
<li>some pages require logging in. This could be solved in numerous ways: e.g. hooking into
<code class="highlighter-rouge">webscreenshot.js</code>, custom logic in <code class="highlighter-rouge">mitmproxy</code>, or injecting appropriate cookies.</li>
<li>Adblock rules don’t cover everything. I had to hack <code class="highlighter-rouge">mitm-adblock</code> a little to block e.g. the <code class="highlighter-rouge">optad360</code>
and <code class="highlighter-rouge">statsforads</code> sites</li>
<li>some sites, e.g. Twitter, don’t work well with this solution. This can be solved by putting
custom code in <code class="highlighter-rouge">webscreenshot.js</code>, though</li>
</ul>
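<p>The extra blocking can be as simple as a host check layered on top of the Adblock rules. This is a hypothetical helper in the spirit of that hack, not mitm-adblock’s actual API; the domains are written with a guessed <code class="highlighter-rouge">.com</code> TLD purely for illustration:</p>

```python
from urllib.parse import urlparse

# Extra domains to drop unconditionally, on top of the Adblock lists.
EXTRA_BLOCKED = {"optad360.com", "statsforads.com"}

def should_block(url, extra=EXTRA_BLOCKED):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == domain or host.endswith("." + domain)
               for domain in extra)
```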
<p>There are also some further enhancements I can see:</p>
<ul>
<li>scraping page HTML, so one can do a quick <code class="highlighter-rouge">grep</code> to find information</li>
<li>using <em>tesseract</em> or another OCR on page screenshots to extract the visible text instead of raw HTML</li>
<li>cropping page screenshots to exclude meaningless whitespace, to save disk space (my screenshot dir is about 700MB)</li>
</ul>babla: command line translation tool (Polish-English)2020-09-21T00:00:00+02:002020-09-21T00:00:00+02:00http://slawomir.net/2020/09/21/babla<p>This is gonna be pretty short. Some time ago I created a small script that uses the <a href="https://bab.la">bab.la</a>
web service to translate words between Polish and English. Some people from my team are already using
it and find it convenient.</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pip3 <span class="nb">install </span>babla
<span class="nv">$ </span>babla sumienny
conscientious
dutiful
assiduous
faithful
</code></pre></div></div>Automating vimdiff’s HTML diff (TOhtml)2020-08-20T00:00:00+02:002020-08-20T00:00:00+02:00http://slawomir.net/2020/08/20/automating-vimdiff-tohtml<p>Proofs of concept are often done pretty differently than other kinds of work. The rules described in
the (in)famous <a href="https://gist.github.com/rondy/af1dee1d28c02e9a225ae55da2674a6f">“Effective engineer”</a> are even more important to succeed with PoCs. At the end of the day
it all gravitates around “value added divided by effort spent”.</p>
<p>I was doing a quick PoC project that lets the user paste a log fragment and see which other log
fragments in the database are most similar to the pasted one. But apart from seeing the list, the
user wants to inspect the differences too.</p>
<p>The simplest solution is to generate a diff, e.g. in HTML, and present it to the user. But what is the best
way to achieve that quickly? By using an existing tool!</p>
<p>This post is about automating <code class="highlighter-rouge">vimdiff</code>’s <code class="highlighter-rouge">TOhtml</code> using the <code class="highlighter-rouge">headlessvim</code> library. Apart from
presenting the actual solution, I cover some of the details of how Vim loads configuration files, how
we can pass commands to it, etc.</p>
<p><img src="/assets/headlessvim.png" alt="vimdiff in action" /></p>
<hr />
<p>The problem that my PoC solves is Nokia-specific. No worries, though. We can use Linux kernel
logs to explain what is going on.</p>
<p>Imagine there’s a bug in the kernel you’re using. You inspect <code class="highlighter-rouge">dmesg</code> and see some multi-line crash.
Now, to make it hard, there’s no Google/DuckDuckGo. But you have access to gazillions of logs from
executions from other machines with a description of the solution attached. What if you could search
for similar crashes in these logs and check if the root cause is the same?</p>
<p>Describing my algorithm that finds the most similar log fragments is out of the scope of this post. I’m just
gonna describe the last step, done by the user: inspecting the diff between the queried log fragment and
the match that has been found.</p>
<h2 id="subprocess">Subprocess</h2>
<p>The standard way to compare two files in <code class="highlighter-rouge">vimdiff</code> is:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vimdiff fileA fileB
</code></pre></div></div>
<p>and that is equivalent to:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vim <span class="nt">-d</span> fileA fileB
</code></pre></div></div>
<p>Now, the problem is that we need to somehow instruct Vim to automatically execute some commands
right after loading the files. Fortunately there’s the <code class="highlighter-rouge">-c</code> switch:</p>
<blockquote>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> -c {command}
{command} will be executed after the first file has been read. {command} is interpreted as an Ex
command. If the {command} contains spaces it must be enclosed in double quotes (this depends on
the shell that is used). Example: Vim "+set si" main.c
Note: You can use up to 10 "+" or "-c" commands.
</code></pre></div> </div>
</blockquote>
<p>So it becomes:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vim <span class="nt">-d</span> fileA fileB <span class="nt">-c</span> <span class="s1">'TOhtml'</span> <span class="nt">-c</span> <span class="s1">'sav! output.html'</span> <span class="nt">-c</span> <span class="s1">'qall!'</span>
</code></pre></div></div>
<p>We also want it to work on “vanilla Vim”, i.e. without plugins and rc files. Additionally, it would
be bad if the invocation had side effects on the disk. We achieve these goals by adding
<code class="highlighter-rouge">-i NONE -n -N -u vimrc</code>, where:</p>
<ul>
<li><code class="highlighter-rouge">-i NONE</code> - disables writing the viminfo file</li>
<li><code class="highlighter-rouge">-n</code> - don’t use swap files. Thanks to this we can execute parallel <code class="highlighter-rouge">vim</code> processes without the risk
of seeing the recovery message</li>
<li><code class="highlighter-rouge">-N</code> - disables <code class="highlighter-rouge">vi</code> compatibility</li>
<li><code class="highlighter-rouge">-u vimrc</code> - instructs <code class="highlighter-rouge">vim</code> to load its configuration from the <code class="highlighter-rouge">vimrc</code> file in the current working directory</li>
</ul>
<p>We need to create a <code class="highlighter-rouge">vimrc</code> file for this to work. If we used <code class="highlighter-rouge">NONE</code> as the argument to <code class="highlighter-rouge">-u</code>, the
<code class="highlighter-rouge">TOhtml</code> command wouldn’t work. What we need to do instead is mimic how Vim is initialized in
our distro.</p>
<p>I’m working on Debian, so my vimrc file consists of just a single line:</p>
<div class="language-vim highlighter-rouge"><div class="highlight"><pre class="highlight"><code>runtime<span class="p">!</span> debian<span class="p">.</span><span class="k">vim</span>
</code></pre></div></div>
<p>This <code class="highlighter-rouge">debian.vim</code> file comes from the runtime path, which is set to <code class="highlighter-rouge">/usr/share/vim/vim82/</code> on my
box. I’m not a magician: I copied that line from <code class="highlighter-rouge">/etc/vim/vimrc</code>.</p>
<p>The final version is:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vim <span class="nt">-i</span> NONE <span class="nt">-n</span> <span class="nt">-N</span> <span class="nt">-u</span> vimrc <span class="nt">-d</span> fileA fileB <span class="nt">-c</span> <span class="s1">'TOhtml'</span> <span class="nt">-c</span> <span class="s1">'sav! output.html'</span> <span class="nt">-c</span> <span class="s1">'qall!'</span>
</code></pre></div></div>
<p>And voilà! You can now open the <code class="highlighter-rouge">output.html</code> file and see what was generated. Upon running the
command you’ll see the <code class="highlighter-rouge">vimdiff</code> interface for a fraction of a second.</p>
<p>Putting this into a Python script is pretty straightforward:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">subprocess</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">Popen</span><span class="p">([</span>
<span class="s">"/usr/bin/vim"</span><span class="p">,</span> <span class="s">"-i"</span><span class="p">,</span> <span class="s">"NONE"</span><span class="p">,</span> <span class="s">"-n"</span><span class="p">,</span> <span class="s">"-N"</span><span class="p">,</span> <span class="s">"-u"</span><span class="p">,</span> <span class="s">"vimrc"</span><span class="p">,</span> <span class="s">"-d"</span><span class="p">,</span> <span class="s">"fileA"</span><span class="p">,</span> <span class="s">"fileB"</span><span class="p">,</span> <span class="s">"-c"</span><span class="p">,</span> <span class="s">"TOhtml"</span><span class="p">,</span>
<span class="s">"-c"</span><span class="p">,</span> <span class="s">"sav! output.html"</span><span class="p">,</span> <span class="s">"-c"</span><span class="p">,</span> <span class="s">"qall!"</span>
<span class="p">])</span>
<span class="n">p</span><span class="o">.</span><span class="n">communicate</span><span class="p">()</span>
</code></pre></div></div>
<p>But this solution is bad, because:</p>
<ul>
<li>it’s fragile as hell</li>
<li>it explicitly uses <code class="highlighter-rouge">subprocess</code>, so it is literally asking for a thin wrapper</li>
<li>it requires a TTY and takes it over. Watching the server logs looks really funny :)</li>
</ul>
<h2 id="headlessvim">headlessvim</h2>
<p>Fortunately there’s a library that uses <code class="highlighter-rouge">pyte</code> (a fake terminal implemented in Python) and wraps
the Vim process in a handy API. Just look at how much better the code is when we use that library:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">headlessvim</span>
<span class="k">with</span> <span class="n">headlessvim</span><span class="o">.</span><span class="nb">open</span><span class="p">(</span><span class="n">args</span><span class="o">=</span><span class="n">f</span><span class="s">"-N -i NONE -n -u ./vimrc -d {fileA_path} {fileB_path}"</span><span class="p">)</span> <span class="k">as</span> <span class="n">vim</span><span class="p">:</span>
<span class="n">vim</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="s">"TOhtml"</span><span class="p">)</span>
<span class="n">vim</span><span class="o">.</span><span class="n">command</span><span class="p">(</span><span class="n">f</span><span class="s">"sav! {output_path}"</span><span class="p">)</span>
</code></pre></div></div>
<p>The library was initially developed to help write unit tests for Vim plugins, but it can be used in
many more scenarios, including ours.</p>Accessing globals after wrong code.interact() call2020-06-26T00:00:00+02:002020-06-26T00:00:00+02:00http://slawomir.net/2020/06/26/python-interact-no-globals<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">foo</span><span class="p">():</span>
<span class="k">global</span> <span class="n">database_index</span>
<span class="n">bar</span><span class="p">()</span>
<span class="kn">import</span> <span class="nn">code</span><span class="p">;</span> <span class="n">code</span><span class="o">.</span><span class="n">interact</span><span class="p">(</span><span class="n">local</span><span class="o">=</span><span class="nb">locals</span><span class="p">())</span></code></pre></figure>
<p>Have you ever called <code class="highlighter-rouge">code.interact()</code> and forgotten to pass <code class="highlighter-rouge">local=locals()</code> or
<code class="highlighter-rouge">local={**globals(), **locals()}</code>? Most of the time you can just exit the interactive console, add the
missing parameter and run the program again. But what if the program had been executing for a couple of hours
before the interactive console started? You might want to access e.g. global variables without
running it again. Fortunately Python is a language for adults, so it’s totally doable.</p>
<hr />
<p>The first option to access all the variables is to use <code class="highlighter-rouge">sys._getframe</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">sys</span>
<span class="o">>>></span> <span class="n">frame</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">_getframe</span><span class="p">(</span><span class="mi">6</span><span class="p">)</span> <span class="c1"># in my setup it happens to be 6th frame
</span><span class="o">>>></span> <span class="n">database_index</span> <span class="o">=</span> <span class="n">frame</span><span class="o">.</span><span class="n">f_globals</span><span class="p">[</span><span class="s">'database_index'</span><span class="p">]</span>
</code></pre></div></div>
<p>But in the help of <code class="highlighter-rouge">_getframe</code> we can read:</p>
<blockquote>
<p>This function should be used for internal and specialized purposes only.</p>
</blockquote>
<p>Fortunately there’s another module that also does what we want: <code class="highlighter-rouge">inspect</code>. It’s a little bit
more verbose, but it is not private.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="kn">import</span> <span class="nn">inspect</span>
<span class="o">>>></span> <span class="n">database_index</span> <span class="o">=</span> <span class="n">inspect</span><span class="o">.</span><span class="n">stack</span><span class="p">()[</span><span class="mi">6</span><span class="p">]</span><span class="o">.</span><span class="n">frame</span><span class="o">.</span><span class="n">f_globals</span><span class="p">[</span><span class="s">'database_index'</span><span class="p">]</span>
</code></pre></div></div>
<p>But frankly speaking, if you look at <code class="highlighter-rouge">inspect</code>’s code you’ll discover that it uses <code class="highlighter-rouge">sys._getframe</code> under the
bonnet. So my suggestion is to:</p>
<ul>
<li>use <code class="highlighter-rouge">sys._getframe</code> in emergency situations (like in interactive console)</li>
<li>use <code class="highlighter-rouge">inspect</code> module if you are doing this in a script</li>
</ul>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">stack</span><span class="p">(</span><span class="n">context</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
<span class="s">"""Return a list of records for the stack above the caller's frame."""</span>
<span class="k">return</span> <span class="n">getouterframes</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">_getframe</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span> <span class="n">context</span><span class="p">)</span></code></pre></figure>
<p>Obviously, instead of <code class="highlighter-rouge">code.interact</code> an alternative can be used: <code class="highlighter-rouge">pdb.set_trace</code>. It doesn’t
suffer from such problems at all.</p>
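<p>For completeness, the <code class="highlighter-rouge">pdb</code> route looks like this (a sketch; inside the debugger you can walk frames with <code class="highlighter-rouge">up</code>/<code class="highlighter-rouge">down</code>, so the globals of any frame are reachable):</p>

```python
import pdb

def long_running_job():
    partial_result = 42  # stand-in for hours of computation
    pdb.set_trace()      # debugger prompt with full frame access:
                         #   `p database_index`, `up`, `interact`, ...
    return partial_result
```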
<h3 id="bonus-python-frames-and-surprising-setter-of-f_lineno">Bonus: python frames and surprising setter of <code class="highlighter-rouge">f_lineno</code></h3>
<p>Out of curiosity I looked into the <em>CPython</em> sources. It looks like the <code class="highlighter-rouge">_getframe</code> function does a simple O(n)
stack traversal. It retrieves frames from the current thread state.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span> <span class="nf">sys__getframe_impl</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">module</span><span class="p">,</span> <span class="kt">int</span> <span class="n">depth</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PyFrameObject</span> <span class="o">*</span><span class="n">f</span> <span class="o">=</span> <span class="n">_PyThreadState_GET</span><span class="p">()</span><span class="o">-></span><span class="n">frame</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">depth</span> <span class="o">></span> <span class="mi">0</span> <span class="o">&&</span> <span class="n">f</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">f</span><span class="o">-></span><span class="n">f_back</span><span class="p">;</span>
<span class="o">--</span><span class="n">depth</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">...</span>
<span class="n">Py_INCREF</span><span class="p">(</span><span class="n">f</span><span class="p">);</span>
<span class="k">return</span> <span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="p">)</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p><code class="highlighter-rouge">_PyThreadState_GET</code>, as the name suggests, returns the thread state object, where <code class="highlighter-rouge">frame</code> is one of the most
important fields. A quick look at the definition of the frame struct reveals what can potentially be done
with it: <code class="highlighter-rouge">f_back</code>, <code class="highlighter-rouge">f_code</code>, <code class="highlighter-rouge">f_globals</code>, <code class="highlighter-rouge">f_locals</code>, <code class="highlighter-rouge">f_lineno</code>, etc. My inner hacker woke up and
I tried to change <code class="highlighter-rouge">f_lineno</code> of a frame to see what happens:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">>>></span> <span class="n">sys</span><span class="o">.</span><span class="n">_getframe</span><span class="p">()</span><span class="o">.</span><span class="n">f_lineno</span> <span class="o">=</span> <span class="mi">1</span>
<span class="nb">ValueError</span><span class="p">:</span> <span class="n">f_lineno</span> <span class="n">can</span> <span class="n">only</span> <span class="n">be</span> <span class="nb">set</span> <span class="n">by</span> <span class="n">a</span> <span class="n">trace</span> <span class="n">function</span></code></pre></figure>
<p>This error is baked right into <em>CPython</em>! Apparently the <code class="highlighter-rouge">frame_setlineno</code> function bails out when
the caller is not a trace function. From the function’s docs we can also learn that <code class="highlighter-rouge">f_lineno</code>
is used by the tracing mechanism. They also describe some cases where you cannot jump:</p>
<ul>
<li>Lines with an ‘except’ statement on them can’t be jumped to, because
they expect an exception to be on the top of the stack.</li>
<li>Lines that live in a ‘finally’ block can’t be jumped from or to, since
the END_FINALLY expects to clean up the stack after the ‘try’ block.</li>
<li>‘try’, ‘with’ and ‘async with’ blocks can’t be jumped into because
the blockstack needs to be set up before their code runs.</li>
<li>‘for’ and ‘async for’ loops can’t be jumped into because the
iterator needs to be on the stack.</li>
<li>Jumps cannot be made from within a trace function invoked with a
‘return’ or ‘exception’ event since the eval loop has been exited at
that time.</li>
</ul>
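<p>The flip side of that error message is that a trace function <em>is</em> allowed to set <code class="highlighter-rouge">f_lineno</code> — this is how <code class="highlighter-rouge">pdb</code>’s <code class="highlighter-rouge">jump</code> command works. A minimal sketch (line numbers are computed from <code class="highlighter-rouge">co_firstlineno</code>, so it stays self-contained):</p>

```python
import sys

def demo():
    x = 1
    x = 999        # the tracer below jumps over this line
    return x

SKIP = demo.__code__.co_firstlineno + 2   # line of "x = 999"

def tracer(frame, event, arg):
    if event == "line" and frame.f_lineno == SKIP:
        frame.f_lineno = SKIP + 1         # legal only inside a trace function
    return tracer

sys.settrace(tracer)
try:
    result = demo()   # the assignment of 999 never runs
finally:
    sys.settrace(None)
```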
<p>I can only say that whatever detail you pick, it becomes a rabbit hole. This is so beautiful.</p>
<p align="center">
<img src="/assets/recursion.gif" />
</p>tee equivalent as a Python class2020-06-25T00:00:00+02:002020-06-25T00:00:00+02:00http://slawomir.net/2020/06/25/python-tee<p>Do you know the <code class="highlighter-rouge">tee</code> program? Its <code class="highlighter-rouge">man</code> page reads:</p>
<blockquote>
<p>tee - read from standard input and write to <strong>standard output and files</strong></p>
</blockquote>
<p>It makes it easy to split the output of one program into both <em>stdout</em> and files. It’s a nice UNIX
tool. Recently I was doing a code review and it turned out that an equivalent of such a thing may be pretty
useful in Python programs too:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"file1.txt"</span><span class="p">)</span> <span class="k">as</span> <span class="n">f1</span><span class="p">,</span> <span class="n">tee</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">"file2.txt"</span><span class="p">))</span> <span class="k">as</span> <span class="n">f2</span><span class="p">:</span>
<span class="n">shutil</span><span class="o">.</span><span class="n">copyfileobj</span><span class="p">(</span><span class="n">f1</span><span class="p">,</span> <span class="n">f2</span><span class="p">)</span>
<span class="k">if</span> <span class="n">f2</span><span class="o">.</span><span class="n">tail</span> <span class="ow">not</span> <span class="ow">in</span> <span class="p">(</span><span class="s">'</span><span class="se">\r</span><span class="s">'</span><span class="p">,</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">):</span>
<span class="n">f2</span><span class="o">.</span><span class="n">fileobj</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span></code></pre></figure>
<p>It allows us to do extra work along the way, so we can employ it for e.g. simultaneous hash calculation or other jobs.</p>
<hr />
<p>I came up with this idea whilst reviewing some code. I saw the following function (anonymized).</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"> <span class="k">def</span> <span class="nf">_some_private_method</span><span class="p">(</span><span class="n">cls</span><span class="p">,</span> <span class="n">paths</span><span class="p">:</span> <span class="n">Iterable</span><span class="p">[</span><span class="nb">str</span><span class="p">]):</span>
<span class="n">special_paths</span> <span class="o">=</span> <span class="nb">filter</span><span class="p">(</span><span class="n">is_special_path</span><span class="p">,</span> <span class="n">paths</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">FILEPATH</span><span class="p">,</span> <span class="s">"wb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">out_file</span><span class="p">:</span>
<span class="k">for</span> <span class="n">path</span> <span class="ow">in</span> <span class="n">special_paths</span><span class="p">:</span>
<span class="n">LOGGER</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="n">f</span><span class="s">"Adding {path}"</span><span class="p">)</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="s">"rb"</span><span class="p">)</span> <span class="k">as</span> <span class="n">additional_file</span><span class="p">:</span>
<span class="n">shutil</span><span class="o">.</span><span class="n">copyfileobj</span><span class="p">(</span><span class="n">additional_file</span><span class="p">,</span> <span class="n">out_file</span><span class="p">)</span>
<span class="n">additional_file</span><span class="o">.</span><span class="n">seek</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">last_byte</span> <span class="o">=</span> <span class="n">additional_file</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="k">if</span> <span class="n">last_byte</span> <span class="o">!=</span> <span class="n">b</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span> <span class="ow">and</span> <span class="n">last_byte</span> <span class="o">!=</span> <span class="n">b</span><span class="s">"</span><span class="se">\r</span><span class="s">"</span><span class="p">:</span>
<span class="n">out_file</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">b</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">)</span></code></pre></figure>
<p>I don’t like such code. The <code class="highlighter-rouge">seek</code> hack is obscure. What can be done to make it better? What if we
simply remembered the last byte copied by <code class="highlighter-rouge">shutil.copyfileobj</code>?</p>
<p>Unfortunately, <code class="highlighter-rouge">copyfileobj</code> accepts only two <em>fileobj</em>s and a buffer size. Recently I was experimenting
with <code class="highlighter-rouge">indexed_gzip</code> and I had to roll my own copy of <code class="highlighter-rouge">copyfileobj</code> that, apart from copying the
data, also calculated an md5 hash and the number of bytes copied.</p>
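<p>Such a copy helper is easy to hand-roll. Below is an illustrative sketch, not the actual <code class="highlighter-rouge">indexed_gzip</code> code; the name <code class="highlighter-rouge">copy_with_md5</code> is made up:</p>

```python
import hashlib
import io

def copy_with_md5(src, dst, length=64 * 1024):
    """Copy src to dst in chunks, returning (bytes_copied, md5_hexdigest)."""
    md5 = hashlib.md5()
    nbytes = 0
    while True:
        chunk = src.read(length)
        if not chunk:
            break
        md5.update(chunk)      # hash the data as it passes through
        nbytes += len(chunk)
        dst.write(chunk)
    return nbytes, md5.hexdigest()

src = io.BytesIO(b"hello world")
dst = io.BytesIO()
nbytes, digest = copy_with_md5(src, dst)
print(nbytes)  # 11
```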
<p>An alternative is to wrap one of the arguments with something that will do whatever we want. Let’s
focus on the problem at hand: adding a newline if necessary.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">@</span><span class="n">dataclass</span>
<span class="k">class</span> <span class="nc">MyTee</span><span class="p">:</span>
<span class="n">fileobj</span><span class="p">:</span> <span class="n">io</span><span class="o">.</span><span class="n">BufferedReader</span>
<span class="n">tail</span><span class="p">:</span> <span class="nb">bytes</span> <span class="o">=</span> <span class="n">field</span><span class="p">(</span><span class="n">init</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">tail</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">:]</span> <span class="c1"># without a colon it would not become bytes()
</span> <span class="bp">self</span><span class="o">.</span><span class="n">fileobj</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span></code></pre></figure>
<p>If we need to remember the last <em>k</em> bytes, we can simply use a <code class="highlighter-rouge">collections.deque</code> (with <code class="highlighter-rouge">maxlen=k</code>) as <code class="highlighter-rouge">tail</code> and it will
work as a circular buffer.</p>
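<p>For illustration, here is how a deque with <code class="highlighter-rouge">maxlen</code> behaves as a circular buffer over the last three bytes:</p>

```python
from collections import deque

# A deque with maxlen acts as a circular buffer:
# old bytes fall off the left as new ones are appended.
tail = deque(maxlen=3)
for chunk in (b"hello", b" wor", b"ld\n"):
    tail.extend(chunk)

print(bytes(tail))  # b'ld\n'
```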
<p>In order to make it look like the first listing, we need to add a trivial context manager:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">@</span><span class="n">contextlib</span><span class="o">.</span><span class="n">contextmanager</span>
<span class="k">def</span> <span class="nf">tee</span><span class="p">(</span><span class="n">fileobj</span><span class="p">):</span>
<span class="k">yield</span> <span class="n">MyTee</span><span class="p">(</span><span class="n">fileobj</span><span class="p">)</span></code></pre></figure>
<p>And voilà!</p>
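<p>Putting the pieces together, here is a minimal runnable sketch of the whole mechanism. It works on binary streams (<code class="highlighter-rouge">io.BytesIO</code> stands in for real files), and <code class="highlighter-rouge">tail</code> gets a default value so it exists before the first write:</p>

```python
import contextlib
import io
import shutil
from dataclasses import dataclass, field

@dataclass
class MyTee:
    fileobj: io.IOBase
    tail: bytes = field(default=b"", init=False)  # default so tail exists before any write

    def write(self, data):
        if data:
            self.tail = data[-1:]  # slice, not index, so it stays bytes
        return self.fileobj.write(data)

@contextlib.contextmanager
def tee(fileobj):
    yield MyTee(fileobj)

src = io.BytesIO(b"no trailing newline")
dst = io.BytesIO()
with tee(dst) as t:
    shutil.copyfileobj(src, t)
    if t.tail not in (b"\r", b"\n"):
        t.fileobj.write(b"\n")  # fix up the missing newline

print(dst.getvalue())  # b'no trailing newline\n'
```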
<p>This mechanism can be improved further to make it even more flexible.</p>Roccat Suora driver for Linux > 4.11.02020-06-24T00:00:00+02:002020-06-24T00:00:00+02:00http://slawomir.net/2020/06/24/roccat-suora-linux<p>Recently I decided it was time to install a dedicated driver for my keyboard, to programmatically control
its LED behavior. I have a Roccat Suora keyboard. Fortunately, all of the code is already available
<a href="https://sourceforge.net/projects/roccat/files/">here</a>. However, the kernel module failed to compile
because <code class="highlighter-rouge">signal_pending</code> was undeclared. I had to add the following code in <code class="highlighter-rouge">hid-roccat.c</code> and
it worked like a charm.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <linux/version.h>
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0))
</span> <span class="cp">#include <linux/sched/signal.h>
#endif</span></code></pre></figure>
<p><code class="highlighter-rouge">signal_pending</code> was moved to <code class="highlighter-rouge">linux/sched/signal.h</code> in kernel 4.11.</p>
<hr />Explanation of C++ expression on Code::Dive T-Shirts2019-11-29T00:00:00+01:002019-11-29T00:00:00+01:00http://slawomir.net/2019/11/29/cpp-code-dive-t-shirts-expression<p><img src="/assets/code-dive-2019-t-shirt.jpg" alt="Code::Dive 2019 T-Shirt" /></p>
<p>This year the Code::Dive conference was held in Wrocław for the sixth time. It is amazing how all of this
has unfolded, especially given the fact that I was involved in it from its very beginning. In the last
two years I had too little time and too few ideas to give talks, but I managed to make a small contribution. The
task was to prepare an eye-catching slogan or something similar to put on T-Shirts for conference
attendees.</p>
<p>C++ roots are something I share with the conference, so I suggested we should put a fancy C++
expression on the T-Shirts:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(+[](){})();
</code></pre></div></div>
<p>For non-C++ folks it might look like a sequence of random characters, but it is a perfectly valid
C++ expression. For C++ programmers it isn’t even odd, except for the plus sign. Lots of attendees
questioned this syntax, but they kept coming back saying “indeed, it compiles! What the hell
is that?”.</p>
<p>This is what makes it somewhat special - a single annoying character that leaves you unable
to explain what’s going on. Frankly speaking, this isn’t something I came up with out of the blue.
I saw it many years ago, it caught my attention, and I recalled it when I was thinking about
the T-Shirt.</p>
<p>For the sake of this post, let’s see how we can decompose the expression to make it much simpler,
and what the + sign actually does.</p>
<p>In the middle of the expression we have an anonymous function, a.k.a. a lambda. The C++ lambda syntax feels like a
compromise between readability and expressiveness.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[] () {}
</code></pre></div></div>
<p>In order to write an anonymous function, the programmer has to specify three things. This is basic C++:</p>
<ul>
<li><code class="highlighter-rouge">[]</code> - what variables from the outer scope the function will capture (capture list)</li>
<li><code class="highlighter-rouge">()</code> - function parameters</li>
<li><code class="highlighter-rouge">{}</code> - function body</li>
</ul>
<p>In our case we’re not capturing any variables, we’re not getting any parameters and we’re doing
nothing, hence it becomes three empty blocks - <code class="highlighter-rouge">[](){}</code>.</p>
<p>Now we know what the core of the expression does. Let’s use the lambda symbol to denote it and see
how it simplifies the overall expression:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(+λ)();
</code></pre></div></div>
<p>We’re left with the plus sign, parentheses and a semicolon - nothing to worry about, except the plus.
Let’s explain the plus sign, or unary <code class="highlighter-rouge">operator+</code> to be precise. It isn’t simple without referring
to the standard. A good explanation was given in <a href="https://stackoverflow.com/questions/17822131/resolving-ambiguous-overload-on-function-pointer-and-stdfunction-for-a-lambda">this SO question</a>.</p>
<p>Generally speaking, the evaluation of a lambda expression depends on whether it captures or not. In our
case it doesn’t capture. Let’s see what the standard has to say about that:</p>
<blockquote>
<p>The closure type for a lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function having the same parameter and return types as the closure type’s function call operator. The value returned by this conversion function shall be the address of a function that, when invoked, has the same effect as invoking the closure type’s function call operator.</p>
</blockquote>
<p>So when a compiler encounters a lambda with an empty capture list, it will create a class that has that
specific conversion function. In other words, there exists a method that converts such a lambda
to a pointer to function.</p>
<p>Now let’s see how unary <code class="highlighter-rouge">operator+</code> works:</p>
<blockquote>
<p>The operand of the unary + operator shall have arithmetic, unscoped enumeration, or pointer type and the result is the value of the argument.</p>
</blockquote>
<p>The wording may be unintuitive, but it boils down to:</p>
<pre><code class="language-C++">+1 == 1;
+var == var;
+ptr == ptr;
</code></pre>
<p>Hopefully it’s clear now :). For the record, I went even further in making the expression more
cryptic - e.g. <code class="highlighter-rouge">((void)(([](auto)->void{})(+[](){})));</code> - but… as the famous Zen of
Python states: Simple is better than complex.</p>
<p>This may come as a surprise to some of you, but C++ is not even close to other languages when it
comes to syntax oddities. Please take a look at Malbolge:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> (=<`#9]~6ZY32Vx/4Rs+0No-&Jk)"Fh}|Bcy?`=*z]Kw%oG4UUS0/@-ejc(:'8dc
</code></pre></div></div>savis - visualize SQLAlchemy models without fuss2019-11-09T00:00:00+01:002019-11-09T00:00:00+01:00http://slawomir.net/2019/11/09/savis<p><img src="/assets/savis-erd.png" alt="Simple ERD" /></p>
<p>Some time ago I realized that our project is a victim of the NoSQL hype (hey! <a href="https://en.wikipedia.org/wiki/Hype_cycle">hype cycle</a>).
It was actually my fault, since I introduced it. There was a specific motivation behind
that decision, but that’s something I would like to keep for a separate post.</p>
<p>A couple of days ago I started working on a plan to migrate to SQL. I extracted all of
the keys and their respective schemas from our NoSQL store and started doing ML. By ML I don’t
mean Machine Learning, but Manual Labor :-).</p>
<p>Our project grew surprisingly big in terms of the number of keys and the relations between them.
One option would be to rewrite everything from scratch. But I didn’t want to
do that. In my experience such big rewrites almost always backfire. I was looking at
how to split the keyspace so we could proceed in a more iterative way. I was experimenting
a lot, and what I missed was an easy way to write models and see the ERD (entity relationship
diagram).</p>
<p>Ideally what I was looking for would:</p>
<ul>
<li>let me control the data (yes, online ERD tools, I’m looking at you)</li>
<li>let me write model/entity definitions only once</li>
<li>let me create ER diagram without running database</li>
<li>let me test these models in action without any modifications</li>
</ul>
<p>None of the online and offline tools matched my requirements. The only project I found
was <a href="https://github.com/Alexis-benoist/eralchemy">eralchemy</a>, which is great, but you either have to run a database, or write the models
in a custom markdown format.</p>
<p>Maybe there is something I couldn’t find on the internet, who knows. But at that time
I decided to write a small pet project that would satisfy me. It’s called <code class="highlighter-rouge">savis</code> -
SqlAlchemy VISualizer. The tool is <a href="https://github.com/szborows/savis">available at GitHub</a>. The rest of the post describes
how it was built.</p>
<p>Eralchemy is definitely a great tool. It’s close to what I wanted, but in order to
obtain an ER diagram you need to either
<ol style="list-style-type: lower-alpha;">
<li>write your models, run migrations, extract schema</li>
<li>create a copy of your models in markdown notation</li>
</ol>
<p>We don’t want to run a database, because we might want to change our models rapidly
and see the impact immediately. This leaves us with option b only. But how do we maintain
only one definition of our models? It’s simple: we write a program that
reads Python source files, extracts all of the models and prints them out in the target
format.</p>
<p>This could be done by treating Python files as if they were plain text files,
but that wouldn’t be bulletproof. But hey! Python’s motto is “Batteries included”!
It comes with a library we can use to do this the right way - ast. We’re going to use
its parser to convert a textual file into an Abstract Syntax Tree. Then it’s all about using
the tree to find all of the classes, filtering out those which aren’t models, extracting
class members and producing the final output. Let’s see how we do all of this.</p>
<p>First, we need to look for candidate files. We can use globbing to do this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">file_</span> <span class="ow">in</span> <span class="n">input_dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s">'**/*.py'</span><span class="p">):</span>
<span class="n">process_file</span><span class="p">(</span><span class="n">file_</span><span class="p">)</span>
</code></pre></div></div>
<p>We’re using a construct called a <a href="http://www.tldp.org/LDP/abs/html/globbingref.html">glob</a>.
It looks like a regex, but it isn’t one. Whenever you use bash you can pass such a
glob expression - e.g. in order to recursively find files with the <code class="highlighter-rouge">.py</code> extension you
can use the following spell (note that in bash, <code class="highlighter-rouge">**</code> requires <code class="highlighter-rouge">shopt -s globstar</code>). I highly recommend reading the linked documentation, as even
seasoned software engineers aren’t aware of some of its features. See <code class="highlighter-rouge">man glob</code> for
further details.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls</span> <span class="k">**</span>/<span class="k">*</span>.py
</code></pre></div></div>
<p>Neat, isn’t it?
Python’s <code class="highlighter-rouge">glob</code> module (and <code class="highlighter-rouge">pathlib.Path.glob</code>, used above) does the same. Speaking of paths… I recommend the <code class="highlighter-rouge">pathlib</code>
library, which lets you work with paths like a boss. Representing paths with strings
can be cumbersome. Maybe in our case it would be like using a cannonball against a fly,
but knowing/using more libraries won’t hurt.</p>
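<p>As a quick illustration (using a throwaway temporary directory), <code class="highlighter-rouge">pathlib.Path.glob</code> handles the recursive pattern for us:</p>

```python
import tempfile
from pathlib import Path

# Build a throwaway directory tree to demonstrate recursive globbing.
root = Path(tempfile.mkdtemp())
(root / "pkg").mkdir()
(root / "pkg" / "models.py").write_text("# models live here")
(root / "app.py").write_text("# app entry point")
(root / "README.txt").write_text("not a python file")

# '**/*.py' matches .py files at any depth below root.
py_files = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))
print(py_files)  # ['app.py', 'pkg/models.py']
```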
<p>Once we have a path to a file that may contain model definitions, we should look for
them! A model is a Python class that has an extra field: <code class="highlighter-rouge">__tablename__</code>. We will make
use of this requirement.</p>
<p>But how do we convert a file into something we can work on? How do we use the <code class="highlighter-rouge">ast</code> library?
As it turns out, it’s pretty simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">file_</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">root</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(),</span> <span class="n">file_</span><span class="p">)</span>
</code></pre></div></div>
<p>The <code class="highlighter-rouge">ast</code> library parses the file and returns the tree root. From now on we can continue
working on that tree. We have to traverse it recursively to find classes that
are indeed SQLAlchemy models. At some point we’ll want to iterate over the models
to generate their representation. <code class="highlighter-rouge">for model in get_models(tree)</code> looks like
the pythonic way, so our implementation should be a generator.</p>
<p>All nodes in the tree that aren’t classes should be omitted. Since each node
is of a specific type, we can filter nodes with an <code class="highlighter-rouge">isinstance</code> check. If the node isn’t
of type <code class="highlighter-rouge">ast.ClassDef</code>, we should recurse, because there might still be class
definitions deeper down. Consider the following example. This is, by the way, a good example of
why grep-like processing is a bad idea.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="n">var</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
<span class="k">if</span> <span class="n">var</span><span class="p">:</span>
<span class="k">class</span> <span class="nc">Bar</span><span class="p">:</span>
<span class="k">pass</span>
<span class="k">return</span> <span class="n">Bar</span><span class="p">()</span>
</code></pre></div></div>
<p>The second ingredient is the <code class="highlighter-rouge">__tablename__</code> class member. If it’s present in the class
definition, then we’re talking about an SQLAlchemy model. Here’s the code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">find_models</span><span class="p">(</span><span class="n">node</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">ClassDef</span><span class="p">):</span>
<span class="k">if</span> <span class="s">'__tablename__'</span> <span class="ow">in</span> <span class="n">get_member_names</span><span class="p">(</span><span class="n">node</span><span class="p">):</span>
<span class="k">yield</span> <span class="n">node</span>
<span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="s">'body'</span><span class="p">):</span>
<span class="k">for</span> <span class="n">child</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">body</span><span class="p">:</span>
<span class="k">yield</span> <span class="k">from</span> <span class="n">find_models</span><span class="p">(</span><span class="n">child</span><span class="p">)</span>
</code></pre></div></div>
<p>Some of the nodes won’t have a <code class="highlighter-rouge">body</code> attribute, so we have to filter those out
too. I believe the code is straightforward, so let’s continue with how
to implement <code class="highlighter-rouge">get_member_names</code>. This function extracts the names of
all class members.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">get_member_names</span><span class="p">(</span><span class="n">node</span><span class="p">):</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="s">'body'</span><span class="p">):</span>
<span class="k">return</span>
<span class="k">for</span> <span class="n">member</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">body</span><span class="p">:</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">member</span><span class="p">,</span> <span class="s">'targets'</span><span class="p">)</span> <span class="ow">or</span> <span class="ow">not</span> <span class="n">member</span><span class="o">.</span><span class="n">targets</span><span class="p">:</span>
<span class="k">continue</span>
<span class="k">for</span> <span class="n">target</span> <span class="ow">in</span> <span class="n">member</span><span class="o">.</span><span class="n">targets</span><span class="p">:</span>
<span class="k">yield</span> <span class="n">target</span><span class="o">.</span><span class="nb">id</span>
</code></pre></div></div>
<p>Again, we’re checking whether the node has a body. Then we’re iterating over
all children of that particular node. We’re looking for <code class="highlighter-rouge">ast.Assign</code> members
(e.g. <code class="highlighter-rouge">variable = value</code>), but there’s no need to check the type - we can directly
check for the <code class="highlighter-rouge">targets</code> attribute. Nodes that have it represent
assignments. A target is, simply speaking, a variable the value will be
written to. We iterate over the targets (Python supports chained assignments like
<code class="highlighter-rouge">a = b = 1</code>, where there is more than one target) and yield them. Voilà!</p>
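<p>To see both functions in action, here is a condensed, runnable variant of them (the sample models are made up, and <code class="highlighter-rouge">getattr</code> with a default stands in for the explicit <code class="highlighter-rouge">hasattr</code> checks):</p>

```python
import ast

# Hypothetical input: two classes, only one of which looks like a model.
SOURCE = """
class User:
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)

class Helper:
    pass
"""

def get_member_names(node):
    # Yield names assigned at class-body level (ast.Assign targets).
    for member in getattr(node, "body", []):
        for target in getattr(member, "targets", []):
            yield target.id

def find_models(node):
    # A class with a __tablename__ member is treated as a model.
    if isinstance(node, ast.ClassDef) and "__tablename__" in get_member_names(node):
        yield node
    for child in getattr(node, "body", []):
        yield from find_models(child)

tree = ast.parse(SOURCE)
print([model.name for model in find_models(tree)])  # ['User']
```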
<p>Frankly speaking, we’re almost done. We need to extract the types of the class
members, check which parameters are passed to them, etc. All fields that
represent SQL columns will be <code class="highlighter-rouge">sqlalchemy.Column</code> instances, <code class="highlighter-rouge">primary_key</code>
is the keyword argument used to denote a primary key, and so on. We just need to take all of
this into account. The full code is available <a href="https://github.com/szborows/savis/blob/master/savis.py">here</a>.</p>
<p>What does it finally give us? A markdown file that eralchemy can understand and
display. It’s not complicated. It’s not complete either. But it works :-).</p>
<p><img src="/assets/savis-erd.png" alt="Simple ERD" /></p>GarageTalks: Taming Kubernetes jobs with Python2019-09-24T00:00:00+02:002019-09-24T00:00:00+02:00http://slawomir.net/2019/09/24/taming-k8s-jobs-with-python<p>Recently Nokia launched a fancy new <a href="https://www.meetup.com/pl-PL/GarageTalks-tech-attitudes/events/264547887/">Garage Talks</a> meetup. It gravitates around cloud technologies, development tools, architecture etc. I gave a talk last time about Kubernetes jobs and how you can create and control them using the official Python client. Slides can be found <a href="https://slawomir.net/p/taming-k8s-jobs-with-python">here</a>.</p>HexIT Escape Room for IT geeks - escape if you can (Wrocław, Poland)2018-10-03T15:55:00+02:002018-10-03T15:55:00+02:00http://slawomir.net/2018/10/03/hexit-escape-room-escape-if-you-can<a href="https://2.bp.blogspot.com/-cOTA7d1VD3Q/W7TIftmcHlI/AAAAAAAAAhs/W3gwT3IoSiQJl0kyVYWyYPuvc5cWOhzyQCLcBGAs/s1600/hexit1.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="517" data-original-width="775" height="212" src="https://2.bp.blogspot.com/-cOTA7d1VD3Q/W7TIftmcHlI/AAAAAAAAAhs/W3gwT3IoSiQJl0kyVYWyYPuvc5cWOhzyQCLcBGAs/s320/hexit1.jpg" width="320" /></a>I'm glad to announce that we have launched an escape room that targets IT people (developers, testers etc). It has been running for one month now and about 30 teams (3-4 people each) have already enjoyed it.<br /><br />Treat yourself and pay us a visit! Based on the reactions of other teams I can guarantee a remarkable experience. You don't need to be a "hackerman" to complete the room, but if you are you will do so faster ;-). 
Teams can be mixed too (but at least one person with basic programming skills is rather required).<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VTK_kUVsFaU/W7TIfhRytTI/AAAAAAAAAhw/toWob_ITl0cMB_43zabxLGxYl7-FmS9SgCLcBGAs/s1600/hexit2.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1440" data-original-width="960" height="320" src="https://1.bp.blogspot.com/-VTK_kUVsFaU/W7TIfhRytTI/AAAAAAAAAhw/toWob_ITl0cMB_43zabxLGxYl7-FmS9SgCLcBGAs/s320/hexit2.jpg" width="213" /></a></div><br /><b>Room location & partnership with Let Me Out.</b><br /><br />ul. Bernardyńska 4 (close to Galeria Dominikańska), IInd floor (map below)<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><img border="0" data-original-height="658" data-original-width="981" height="214" src="https://2.bp.blogspot.com/-HHPAI8r1-YM/W7TJlUwHgSI/AAAAAAAAAh4/ZSzYJ7OzGKIzMKAD5eLcLBAqpvdsBcyfACLcBGAs/s320/hexit-map.png" style="margin-left: auto; margin-right: auto;" width="320" /></td></tr><tr><td class="tr-caption" style="text-align: center;"><a href="https://www.google.com/maps/place/Bernardy%C5%84ska+4,+11-400+Wroc%C5%82aw,+Polska/@51.1101787,17.0294945,15.5z/data=!4m5!3m4!1s0x470fc277e01358ed:0x494fb3fbd9f272a!8m2!3d51.1104811!4d17.0413637" target="_blank">Link to Google Maps</a></td></tr></tbody></table><div class="separator" style="clear: both; text-align: center;"></div><br /><br /><span style="font-size: large;"><span style="background-color: yellow;">Book here: <a href="http://letmeout.pl/">letmeout.pl</a><span style="background-color: white;"> (select Wrocław)</span></span></span><br /><br />I'm the author and creator of the room but I'm not running the business. 
The company that operates the room is Let Me Out and has excellent portfolio of other escape rooms in lot of Polish cities and Brussels.<br /><br /><b>Room theme</b><br />It goes like this: <br /><i>Another country is trying to become an atomic superpower through the development of nuclear weapons, which consequently results in the destabilization of the region and the escalation of the international conflict on an unprecedented scale. The world is on the verge of the outbreak of World War III. The only salvation is to infect the secret plantation of uranium treatment with a computer virus. Will a group of programmers be able to prevent nuclear war in 90 minutes?</i> <br /><br />I received some suggestions that the room itself should be marketed as "an ordinary escape room with extra IT riddles". And this is actually what I wanted to build. Not to give a desk, PC and Jira for the players, but give them a nice mix of good background story with many different IT riddles.<br />Solve the riddles while saving the world! :-)<br /><br /><b>Easter Egg</b><br />In the room there are some easter eggs. One of them will let you listen to some famous song. The code is what comes out of `1900 + 80 + 9` and you need to properly enter it. You'll know where once you're there ;)<br /><br /><b>Room name</b><br />Funny fact about the name: it incorporates four things:<br /><ul><li>Hex, as a reference to hexadecimal numbers (you'll see some of them ;-)</li><li>Hex, as a uranium industry jargon name for <a href="https://en.wikipedia.org/wiki/Uranium_hexafluoride" target="_blank">Uranium Hexafluoride</a></li><li>Exit, related to Escape word</li><li>IT - information technology</li></ul>bjkI'm glad to announce that we have launched an escape room that targets IT people (developers, testers etc). So far it works well for one month and about 30 teams (3-4 people) have already enjoyed it.Please yourself and pay us a visit! 
<br /><b>Global app variables in connexion & aiohttp</b> (2018-08-17, http://slawomir.net/2018/08/17/global-app-variables-in-connexion)<br /><br />tl;dr: use <span style="font-family: "Courier New", Courier, monospace;">pass_context_arg_name</span> and <span style="font-family: "Courier New", Courier, monospace;">api.subapp</span><br /><br />Nowadays microservice architecture seems to be the default way distributed applications are built. Also, people have started to treat APIs as first-class citizens. Hence, it's no surprise that projects like <a href="https://github.com/OAI/OpenAPI-Specification" target="_blank">Swagger/OpenAPI</a> are gaining popularity on a daily basis.<br /><br />One of the Python OpenAPI implementations that I discovered recently is <a href="https://github.com/zalando/connexion/" target="_blank">Connexion</a>. The advantages of using OpenAPI are obvious: e.g. you can decouple the endpoint schema from the app logic and have a single place where the whole API is described. Even the fact that there's a Swagger UI for API users can be quite beneficial.<br /><br />In the past I've looked at different frameworks like django-rest, but nothing seemed as simple as Connexion. I decided to play with it right after discovering that the guys from Zalando had added support for aiohttp (an asynchronous HTTP server) - the framework we use extensively in our projects.<br /><br />So what's the problem? What is this post about? Although Connexion is great, it is undocumented (or my DuckDuckGo-fu sucks and this is in fact just not well documented) how to glue it together with the way global variables are handled in aiohttp - using the app as a container for globals.
Consider the following snippet:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: #cc0000;">async</span><b> </b><span style="color: #cc0000;">def</span> handler(request):</span><br /><span style="font-family: "Courier New", Courier, monospace;">    <span style="color: blue;"># this is how aiohttp creators recommend to access global variables</span></span><br /><span style="font-family: "Courier New", Courier, monospace;">    <span style="color: blue;"># e.g. database handle</span></span><br /><span style="font-family: "Courier New", Courier, monospace;">    request.app[<span style="color: magenta;">'redis_con'</span>].incr(<span style="color: magenta;">'visits'</span>)</span><br /><span style="font-family: "Courier New", Courier, monospace;">    <span style="color: #cc0000;">return</span> web.Response(body=<span style="color: magenta;">b'hello'</span>)</span><br /><br />Nothing much more than an ordinary aiohttp handler that uses the <span style="font-family: "Courier New", Courier, monospace;">redis_con</span> global. Unfortunately, using globals with Connexion is not that straightforward. An example of how Connexion handlers look (the following comes from the Connexion docs):<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: #cc0000;">def</span> example(name: str) -> str:</span><br /><span style="font-family: "Courier New", Courier, monospace;">    <span style="color: #cc0000;">return</span> <span style="color: magenta;">'Hello {name}'</span>.format(name=name)</span><br /><br />There's no request parameter! It took me some time to find out how to let Connexion pass the request (the aiohttp context) to handlers.
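<p>The reason the plain handler never sees the request is that Connexion builds the call from the handler's signature and passes only the parameters the function declares. A toy sketch of that dispatch idea in pure Python (this is an illustration, not Connexion's actual code; <code>dispatch</code> and the sample handlers are made up):</p>

```python
import inspect

def dispatch(handler, params, context, context_arg_name="request"):
    """Call handler with its declared params; inject the context only if
    the signature accepts it (by the configured name or via **kwargs)."""
    sig = inspect.signature(handler)
    kwargs = {k: v for k, v in params.items() if k in sig.parameters}
    accepts_var_kw = any(p.kind is inspect.Parameter.VAR_KEYWORD
                         for p in sig.parameters.values())
    if context_arg_name in sig.parameters or accepts_var_kw:
        kwargs[context_arg_name] = context
    return handler(**kwargs)

def example(name: str) -> str:            # never sees the context
    return 'Hello {name}'.format(name=name)

def handler(name: str, **kwargs) -> str:  # receives it via **kwargs
    return '{} (app={})'.format(name, kwargs['request']['app'])

print(dispatch(example, {'name': 'world'}, {'app': 'globals'}))  # -> Hello world
```

<p>With <code>pass_context_arg_name='request'</code> the real framework behaves analogously: the context is injected only when the handler can accept it, e.g. via <code>**kwargs</code>.</p>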
I had to dig into the source code to figure out the following:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: #cc0000;">def</span> start(redis_con):</span><br /><span style="font-family: "Courier New", Courier, monospace;">    app = connexion.AioHttpApp(__name__, specification_dir=<span style="color: magenta;">'swagger/'</span>)<br />    api = app.add_api(<span style="color: magenta;">'api.yml'</span>, pass_context_arg_name=<span style="color: magenta;">'request'</span>)<br />    api.subapp[<span style="color: magenta;">'redis_con'</span>] = redis_con<br />    app.run()</span><br /><br />We're passing the <span style="font-family: "Courier New", Courier, monospace;">pass_context_arg_name</span> parameter, and it turns out that for aiohttp the context is the request. The unintuitive thing is the subapp part. We need to use it in order to set the global. This part I found in the <span style="font-family: "Courier New", Courier, monospace;">aiohttp_jinja2.setup</span> function. Now, we can use it in handlers like the following:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: #cc0000;">async def</span> handler(*args, **kwargs):</span><br /><span style="font-family: "Courier New", Courier, monospace;">    kwargs[<span style="color: magenta;">'request'</span>].app[<span style="color: magenta;">'redis_con'</span>].incr(<span style="color: magenta;">'visits'</span>)</span><br /><span style="font-family: "Courier New", Courier, monospace;">    <span style="color: #cc0000;">return</span> web.Response(body=<span style="color: magenta;">b'hello'</span>)</span><br /><br />That's all. It seems like an easy thing, but I couldn't find it anywhere online.
<br /><b>Handling multiple identical USB ethernet adapters (Raspberry PI, udev)</b> (2018-07-21, http://slawomir.net/2018/07/21/handling-multiple-identical-usb)<br /><br />You have to build a simple ethernet-connected chain of devices and continuously check that it's healthy. In order to save money and time you decide to replace individual devices (say, Raspberries) with multiple USB ethernet adapters. You buy Chinese ones. What could go wrong?<br /><br />We're building an <a href="https://en.wikipedia.org/wiki/Escape_room" target="_blank">escape room</a>. There are plenty of them in Wrocław but ours is special, because it's dedicated to IT guys. Random people would have a lot of trouble solving even the first riddles. These riddles are supposed to be great fun for tech people.<br /><br />I don't want to spoil what the riddles are. Let's stay with the technical problem I had at hand. Multiple devices need to be accessible in some specific configuration to solve one of the riddles. It made no sense to have these devices if their only purpose was to respond to some ICMP packet (certainly there is an even more low-level solution, but we needed something easy and reliable now). We decided to limit the number of these devices and to attach USB ethernet adapters to each.
My colleague bought some Chinese adapters like the ones in the picture below, and problems emerged immediately.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3P2R5w94bA8/W1OYBSTzSfI/AAAAAAAAAg0/IyWd2pDG0jAl0bw6jHZkIyZ-UES2DLeywCLcBGAs/s1600/20180717_223254.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1200" height="320" src="https://1.bp.blogspot.com/-3P2R5w94bA8/W1OYBSTzSfI/AAAAAAAAAg0/IyWd2pDG0jAl0bw6jHZkIyZ-UES2DLeywCLcBGAs/s320/20180717_223254.jpg" width="240" /></a></div><br />BTW, a funny fact: the CE marks on some devices (I'm not sure about this one) may not actually be CE marks but "China Export" marks. You can read more about it <a href="https://www.ybw.com/vhf-marine-radio-guide/warning-dont-get-confused-between-the-ce-mark-and-the-china-export-mark-4607" target="_blank">here</a>.<br /><br /><b>Perfect hardware clones!</b><br /><br />So what's the problem? Well... when I plugged in the first adapter I made some configuration changes in Raspbian and was happy that everything worked flawlessly. However, a couple of days later I connected a second adapter to the same device, and that was when the problem surfaced. All of these USB adapters had the same MAC address. To make it even worse, after inspecting what's in /sys, I was sure that all of the USB parameters were also identical. In other words, these devices were perfect clones. The ROM was the same for all of them! And btw, one out of 8 was not working at all.<br /><br />Why is this a problem? It's because if the names are the same, the kernel will rename the network interface to something like rename{number} and there's no reliable way to tell which interface is connected to which cable.
Sadly, they also share the same MAC, so if you connect all adapters to the same switch, funny things will start to happen!<br /><br /><b>U<strike>boot</strike>dev for the rescue</b><br /><br />I'm not that into Linux, but I immediately knew where to look - udev. I was afraid that there wouldn't be a way to differentiate between adapters at the udev level, and I was right.<br /><br />However, a silly solution is possible (maybe not silly; if something is silly but it works, it means it's not silly ;-): differentiate the USB ports rather than the devices themselves.<br /><br />I started to read the documentation and found that you can create rules based on ports, like the following:<br /><br /><span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">SUBSYSTEM=="net", KERNELS=="1-2:1.0", ATTR{address}=="00:e0:4c:53:44:58"</span></span><br /><br />net is the subsystem we want. The USB port must be provided in the KERNELS parameter (the S at the end is both intentional and crucial). By providing the address attribute you may further target only those Chinese adapters you have on the desk.<br /><br />Finding out the USB ports proved to be a slightly tricky task. You can do it using the udevadm utility.<br />I have prepared a diagram for my RPi 3:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-tP85WsmJIt8/W1OZjIDFTII/AAAAAAAAAhA/HYpi3sXIn_8NlpCHCm48tqSbOmTE7GAjACLcBGAs/s1600/rpi-usb-ports.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="186" data-original-width="566" height="131" src="https://4.bp.blogspot.com/-tP85WsmJIt8/W1OZjIDFTII/AAAAAAAAAhA/HYpi3sXIn_8NlpCHCm48tqSbOmTE7GAjACLcBGAs/s400/rpi-usb-ports.png" width="400" /></a></div><br /><br />Please take note that this may be different in your case.
The reason is that it all depends on:<br /><ul><li>hardware revision</li><li>firmware version</li><li>kernel version</li><li>kernel module versions</li></ul>Once we know these USB "addresses" we can write the rules. The rules are below. I'd like to additionally emphasize two things:<br /><ul><li>you can target using <span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">ATTR{address}=="mac-here"</span></span>, but apparently there's no way to change it (<span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">ATTR{address}="new-mac"</span></span> doesn't work)</li><li>changing the MAC address is still possible (e.g. <span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">ifconfig <ifname> hw ether ...</span></span>) and you can even use the name you set, but you must use absolute paths to executables!</li></ul><br /><span style="font-size: x-small;"><span style="font-family: "courier new" , "courier" , monospace;">SUBSYSTEM=="net", KERNELS=="1-1.2:1.0", ATTR{address}=="00:e0:4c:53:44:58", NAME="kabelek1", RUN+="/sbin/ifconfig kabelek1 hw ether 00:e0:4c:00:00:01"<br />SUBSYSTEM=="net", KERNELS=="1-1.4:1.0", ATTR{address}=="00:e0:4c:53:44:58", NAME="kabelek2", RUN+="/sbin/ifconfig kabelek2 hw ether 00:e0:4c:00:00:02"<br />SUBSYSTEM=="net", KERNELS=="1-1.3:1.0", ATTR{address}=="00:e0:4c:53:44:58", NAME="kabelek3", RUN+="/sbin/ifconfig kabelek3 hw ether 00:e0:4c:00:00:03"<br />SUBSYSTEM=="net", KERNELS=="1-1.5:1.0", ATTR{address}=="00:e0:4c:53:44:58", NAME="kabelek4", RUN+="/sbin/ifconfig kabelek4 hw ether 00:e0:4c:00:00:04"</span></span><br /><br />And voilà! You are free to connect a lot of adapters to a single Raspberry.
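<p>To find the KERNELS value for a given interface you can walk its parent devices with <span style="font-family: "courier new" , "courier" , monospace;">udevadm info --attribute-walk --path=/sys/class/net/<ifname></span>. A small sketch of picking the values out of such a walk; the SAMPLE below is abridged and the device paths in it are made up:</p>

```python
import re

# Abridged, hypothetical output of:
#   udevadm info --attribute-walk --path=/sys/class/net/eth1
# (the real walk prints one block per parent device)
SAMPLE = '''
  looking at device '/devices/platform/soc/usb1/1-1/1-1.2/1-1.2:1.0/net/eth1':
    KERNEL=="eth1"
    SUBSYSTEM=="net"
  looking at parent device '/devices/platform/soc/usb1/1-1/1-1.2/1-1.2:1.0':
    KERNELS=="1-1.2:1.0"
    SUBSYSTEMS=="usb"
  looking at parent device '/devices/platform/soc/usb1/1-1/1-1.2':
    KERNELS=="1-1.2"
'''

def kernels_chain(walk_output):
    """Extract KERNELS== values, closest parent first; the first entry
    (the USB interface node, with the :1.0 suffix) is the one that the
    rules above match on."""
    return re.findall(r'KERNELS=="([^"]+)"', walk_output)

print(kernels_chain(SAMPLE))  # -> ['1-1.2:1.0', '1-1.2']
```

<p>Any of the printed parent values would match, but the interface-level one (with the <span style="font-family: "courier new" , "courier" , monospace;">:1.0</span> suffix) pins the rule to a single port, which is the whole point here.</p>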
You still need to maintain the coupling between USB ports and Ethernet cables, and you will also need to do something with the cables ;)<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-DhbDhJjMNoo/W1OYBGL_E8I/AAAAAAAAAgw/8djGnlBeu5MD2ySJzaNMMWeiMdTa52OIwCLcBGAs/s1600/20180717_223305.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1200" data-original-width="1600" height="240" src="https://3.bp.blogspot.com/-DhbDhJjMNoo/W1OYBGL_E8I/AAAAAAAAAgw/8djGnlBeu5MD2ySJzaNMMWeiMdTa52OIwCLcBGAs/s320/20180717_223305.jpg" width="320" /></a></div><br />This is how my desk looked when I was figuring things out.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-p7ecq027W4M/W0J50vEJN9I/AAAAAAAAAgg/mPtaMlX2NkkeT0n7dobNA3eLPfWH3ZO1ACLcBGAs/s1600/20180621_002046.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1600" data-original-width="1200" height="400" src="https://2.bp.blogspot.com/-p7ecq027W4M/W0J50vEJN9I/AAAAAAAAAgg/mPtaMlX2NkkeT0n7dobNA3eLPfWH3ZO1ACLcBGAs/s400/20180621_002046.jpg" width="300" /></a></div>To summarize, almost everything can be done, and if something really can't, then you can somehow work around it. However, I believe this trick is just a palliative. The Chinese adapters can backfire at any time, so if you require reliability, you should look for other hardware.
<br /><b>Preconfigured Jenkins cluster in Docker Swarm (proxy, accounts, plugins)</b> (2018-01-03, http://slawomir.net/2018/01/03/preconfigured-jenkins-cluster-in-docker)<br /><br />In recent years a lot of popular technologies have been adapted so that they can run in Docker containers. Our industry even coined a new verb - dockerization. When something is dockerized we usually expect it to behave like a self-contained app that is controlled with either command line switches or environment variables. We also assume that apart from this kind of customization the dockerized thing is zero-conf - it will start right away with no further magic spells.<br /><br />It's just awesome when things work that way. Unfortunately there are exceptions, and Jenkins is one of them.
The problem with Jenkins is that even when you start it from within a container, you still need to:<br /><ul><li>open the configuration wizard (it's a web page) </li><li>prove that you're the right person: pass its challenge by reading some magic file and pasting its content into the configuration wizard</li><li>configure the proxy, if you're behind one</li><li>select plugins to be installed during initialization</li><li>set up an admin account </li></ul>Pretty bad. It resembles an installation wizard like in Windows. Phew. A couple of weeks ago I was trying to check out how well Jenkins would solve one of our data transformation (ETL) problems and was unsure how many times it would be deployed. Hence I needed to do something about this installation process so that it sucks less. All of the building blocks were already on the table: Terraform, Ansible and Docker Swarm. The missing part was a pre-configured dockerized Jenkins running in the Swarm.<br /><br /><a href="https://4.bp.blogspot.com/-HZ6nEhx-jNE/WkX9YNH32CI/AAAAAAAAAew/5TH6mL0cuEUmXnSV1Sf2px2ODb5ZzJctwCLcBGAs/s1600/satanspbeach.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="1181" data-original-width="767" height="320" src="https://4.bp.blogspot.com/-HZ6nEhx-jNE/WkX9YNH32CI/AAAAAAAAAew/5TH6mL0cuEUmXnSV1Sf2px2ODb5ZzJctwCLcBGAs/s320/satanspbeach.jpg" width="207" /></a>So this post, in a DuckDuckGo-friendly list, explains how to:<br /><ul><li>pre-configure Jenkins with a custom user (admin) account</li><li>pre-configure Jenkins with a proxy</li><li>pre-configure Jenkins with specified plugins</li><li>run Jenkins master and slaves entirely in Docker Swarm with Jenkins' own Swarm plugin for automatic master-slave connection establishment</li><li>allow Jenkins jobs to execute other Docker containers nearby (the daemon's sock trick)</li></ul><br /><br /><br /><br /><br /><span style="font-size: xx-small;"><span style="color: #999999;">
http://www.rustypants.net/wp-content/uploads/2008/10/satanspbeach.jpg</span></span><br /><br /><br /><h4>Abandon all hope, ye who enter here.</h4>I remember that in one of the C projects (<strike>not sure which one it was, but perhaps something from GNU, maybe RMS</strike> update: <a href="https://gist.github.com/danmilon/4719562" target="_blank">it was xterm</a>) there was this comment: "abandon all hope, ye who enter here". <strike>It also mentioned how many people have ignored this warning and tried to refactor something.</strike> I have the same reflections w.r.t. configuring Jenkins without custom Groovy scripts. I was reluctant to learn a new language, but eventually this seemed like the most reasonable way to continue.<br /><br />Of course, all of the following problems can be solved in a troglodyte way too. E.g. you can configure everything by hand, extract the Jenkins home directory, tar-gz it and re-use it. But that brings a couple of other problems. Also, surprisingly, a fresh Jenkins home weighed about 70 MB in my case. I always thought that it's just a bunch of XML files, but perhaps it's not that straightforward. Since primitive solutions didn't work right away, I decided to stop for a while and try to solve the problem "the right way".<br /><br /><h4>System overview & requirements.</h4>The system is simple: there's one master (a brilliant example of a SPOF, but nobody cares, since you're unsure of the project's future) and a number of workers (slaves). We want the workers to register with the master automatically. Unfortunately this is not possible using a plain JNLP solution, because you need to register the worker in the master prior to establishing a link. In theory you could do some <span style="font-family: "courier new" , "courier" , monospace;">curl</span> magic, but fortunately there's a plugin that does it for you - Jenkins Swarm (not to be confused with Docker Swarm, as it has literally nothing to do with it).
The Jenkins Swarm plugin consists of two things: a plugin for the Jenkins master and a Java JAR for the slaves.<br />So we're set up. Jenkins Swarm will take care of auto-connecting the slaves. Now, we must run a dockerized version of these slaves and put them into Docker Swarm. But before we talk about slaves, let's handle the master.<br /><br /><h4>Jenkins master with plugins, proxy, and extra configuration.</h4>Let me paste the Dockerfile and explain it line by line.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>FROM</b></span> jenkins/jenkins:2.89.1-alpine</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ARG</b></span> proxy</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ENV</b></span> http_proxy=$proxy https_proxy=$proxy</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>USER</b></span> root</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>RUN</b></span> apk update && apk add python3</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> requirements.txt /tmp/requirements.txt</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>RUN</b></span> pip3 install -r /tmp/requirements.txt</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>USER</b></span> jenkins</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family:
"courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> plugins.txt /plugins.txt</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>RUN</b></span> /usr/local/bin/install-plugins.sh swarm:3.6 workflow-aggregator:2.5</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ENV</b></span> JAVA_OPTS=<span style="color: red;">"-Djenkins.install.runSetupWizard=false"</span></span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> security.groovy /usr/share/jenkins/ref/init.groovy.d/security.groovy</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> proxy.groovy /usr/share/jenkins/ref/init.groovy.d/proxy.groovy</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> executors.groovy /usr/share/jenkins/ref/init.groovy.d/executors.groovy</span><br /><br />We must start with some Jenkins image in order to customize it. In my case that's slim Alpine Linux version 2.89.1. Then there's build argument for the proxy. You can ignore this part if you're not behind one.<br /><br />Before we modify the image, we need to switch to root user. After we're done we should switch it back to jenkins fo better security (if you wonder how to check it without base image Dockerfile, <span style="font-family: "courier new" , "courier" , monospace;">docker history</span> command is your friend). In my case I'm also installing some <span style="font-family: "courier new" , "courier" , monospace;">python3</span> stuff defined in <span style="font-family: "courier new" , "courier" , monospace;">requirements.txt</span> dependency file. 
If you're not willing to add any packages to the system, you can skip this entire part too.<br /><br />Then we approach configuring plugins. In various places on the Internet you can find advice to use <span style="font-family: "courier new" , "courier" , monospace;">/usr/local/bin/plugins.sh</span>, but believe me, you don't want to do this, as it installs plugins without their dependencies. The newer <span style="font-family: "courier new" , "courier" , monospace;">install-plugins.sh</span> script takes care of dependencies for you. In our case we're installing two plugins. You might want to install just the essential one - the swarm plugin.<br /><br />Now, four nonstandard lines. I believe that setting <span style="font-family: "courier new" , "courier" , monospace;">runSetupWizard</span> to <span style="font-family: "courier new" , "courier" , monospace;">false</span> is self-explanatory. The remaining lines are there for account setup, proxy configuration and executors configuration.<br /><br />Let's start with setting up the admin account. Groovy, here we go!
<br /><br /><span style="color: magenta;"><b><span style="font-family: "courier new" , "courier" , monospace;">#!groovy</span></b></span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> jenkins.model.*</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> hudson.security.*</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> jenkins.security.s2m.AdminWhitelistRule</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> instance = Jenkins.getInstance()</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> user = <span style="color: purple;"><b>new</b></span> File(<span style="color: red;">"/run/secrets/jenkinsUser"</span>).text.trim()</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> pass = <span style="color: purple;"><b>new</b></span> File(<span style="color: red;">"/run/secrets/jenkinsPassword"</span>).text.trim()</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> hudsonRealm = <span style="color: purple;"><b>new</b></span> HudsonPrivateSecurityRealm(<span style="color: magenta;">false</span>)</span><br /><span style="font-family: "courier new" , "courier" , monospace;">hudsonRealm.createAccount(user, pass)</span><br /><span style="font-family: 
"courier new" , "courier" , monospace;">instance.setSecurityRealm(hudsonRealm)</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> strategy = <span style="color: purple;"><b>new</b></span> FullControlOnceLoggedInAuthorizationStrategy()</span><br /><span style="font-family: "courier new" , "courier" , monospace;">instance.setAuthorizationStrategy(strategy)</span><br /><span style="font-family: "courier new" , "courier" , monospace;">instance.save()</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">Jenkins.instance.getInjector().getInstance(AdminWhitelistRule.class).setMasterKillSwitch(<span style="color: magenta;">false</span>)</span><br /><br />I'm not Groovy expert so don't judge me by the code above. I have started with just knowledge that it runs over JVM :). It's actually looks like nice managed language. The good part is that, as in Python, the code mostly speaks for itself. Hudson Legacy is visible here as well. I won't go into details - if you want to know from where all of this magic comes, pay a visit to <a href="http://javadoc.jenkins.io/" target="_blank">official docs</a>. Don't forget that you can also use infamous Jenkins console. I found Groovy's <span style="font-family: "courier new" , "courier" , monospace;">dump</span> built-in very helpful too.<br />So the above script will actually setup an admin account, but doesn't hardwire anything. 
Both the username and password come from <a href="https://docs.docker.com/engine/swarm/secrets/" target="_blank">Docker Secrets</a>, which let you manage sensitive data in your Swarm cluster nicely.<br /><br />Now, the second script is for the proxy:<br /><br /><span style="color: magenta;"><b><span style="font-family: "courier new" , "courier" , monospace;">#!groovy</span></b></span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> jenkins.model.*</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> hudson.*</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> instance = Jenkins.getInstance()</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>def</b></span> pc = <span style="color: purple;"><b>new</b></span> hudson.ProxyConfiguration(<span style="color: red;">"1.2.3.4"</span>, <span style="color: magenta;">8080</span>, <span style="color: magenta;">null</span>, <span style="color: magenta;">null</span>, <span style="color: red;">"localhost,*.your.intranet.com"</span>);</span><br /><span style="font-family: "courier new" , "courier" , monospace;">instance.proxy = pc;</span><br /><span style="font-family: "courier new" , "courier" , monospace;">instance.save()</span><br /><br />There's some magic here too. It sets up the proxy <span style="font-family: "courier new" , "courier" , monospace;">1.2.3.4:8080</span> but with specified exceptions. Then it modifies the Jenkins instance (which seems to be a singleton).<br /><br />And finally, the executors part. 
I wanted this one so the master is not used as a worker at all.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>import</b></span> jenkins.model.*</span><br /><span style="font-family: "courier new" , "courier" , monospace;">Jenkins.instance.setNumExecutors(<span style="color: magenta;">0</span>)</span><br /><br /><h4>Slaves.</h4>Now, since the master is ready, let's configure the slaves. Their Dockerfile is as follows.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>FROM</b></span> docker:17.03-rc</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ARG</b></span> proxy</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ENV</b></span> https_proxy=$proxy http_proxy=$proxy no_proxy=<span style="color: red;">"localhost,*.your.intranet.com"</span></span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>RUN</b></span> apk --update add openjdk8-jre git python3</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>RUN</b></span> wget -O swarm-client.jar http://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/swarm-client/3.3/swarm-client-3.3.jar</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>ENV</b></span> http_proxy= https_proxy=</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>COPY</b></span> entrypoint.sh /</span><br /><span style="font-family: "courier new" , "courier" , 
monospace;"><span style="color: purple;"><b>RUN</b></span> chmod +x /entrypoint.sh</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>CMD</b></span> [<span style="color: red;">"/entrypoint.sh"</span>]</span><br /><br />This time base image is docker, because we want to have docker installed within this docker container (so this container can spawn other containers). After setting proxies (the part that is not mandatory) we must download Java Runtime Environment version 8 and download swarm-client JAR. I'm using version 3.3 which is accessible through URL as for today.<br />Finally, there's an entrypoint that will execute swarm-client and do all the magic, but it heavily relies on Docker Secret named <span style="font-family: "courier new" , "courier" , monospace;">jenkinsSwarm</span>, which should look like following.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">-master http://master_address:8080 -password jenkinsUser -username jenkinsPassword</span><br /><br /><br />Here master_address must be known to slave machines (e.g. in <span style="font-family: "courier new" , "courier" , monospace;">/etc/hosts</span>, Consul or something). You should also include username and password - the same ones that you share in other Docker Swarm secrets.<br /><br />If you're using Ansible like I do, it's pretty straightforward to utilize variables instead not to hardcode credentials. 
For instance <span style="font-family: "courier new" , "courier" , monospace;">ansible-vault</span> can be used for this.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">entrypoint.sh</span> itself is almost one-liner:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>mkdir</b></span> /tmp/jenkins</span><br /><span style="font-family: "courier new" , "courier" , monospace;"><span style="color: purple;"><b>java</b></span> -jar swarm-client.jar -labels=docker -executors=1 -fsroot=/tmp/jenkins -name=docker-<span style="color: red;">$(hostname)</span> <span style="color: red;">$(cat /run/secrets/jenkinsSwarm)</span></span><br /><br />It assumes that it's running in the Swarm and can access <span style="font-family: "courier new" , "courier" , monospace;">/run/secrets/jenkinsSwarm</span> (the line that's pasted above).<br /><br /><h4>Glueing it all together.</h4>Building blocks are already in place. Now it's time to glue everything together. I don't want to go into details here, because this is not primary topic of this blog post. If you're interested in how personally I did everything please let me know in comments, so I will create GitHub repo. Let me however give you some important hints:<br /><ul><li>if you want slave to be able to spawn other containers (on the same host on which the slave is running), you must bind mount <span style="font-family: "courier new" , "courier" , monospace;">docker.sock</span> file, e.g. like this: <span style="color: red;"><span style="font-family: "courier new" , "courier" , monospace;">"/var/run/docker.sock:/var/run/docker.sock"</span></span>. There's more to this, though! Docker daemon will not allow <span style="font-family: "courier new" , "courier" , monospace;">jenkins</span> user to spawn containers, so you must somehow circumvent this problem. 
I'm circumventing this by adding the <span style="font-family: "courier new" , "courier" , monospace;">jenkins</span> user to the docker group, but this works only because there's a 1:1 mapping between the host and the container.</li><li>you should have three secrets in the Docker Swarm cluster: <span style="font-family: "courier new" , "courier" , monospace;">jenkinsUser</span>, <span style="font-family: "courier new" , "courier" , monospace;">jenkinsPassword</span> and <span style="font-family: "courier new" , "courier" , monospace;">jenkinsSwarm</span> with the username, password, and swarm-client.jar arguments respectively</li><li>machines must be able to communicate. For internal JNLP communication, port <span style="color: magenta;"><span style="font-family: "courier new" , "courier" , monospace;">50000/tcp</span></span> must be opened.</li><li>if you set the deployment mode to global in the <span style="font-family: "courier new" , "courier" , monospace;">docker-compose.yml</span> file (if you're using one), then you will have as many slaves as machines in the cluster, which can be nice</li><li>if you're gonna stick to this solution for a longer period of time I recommend thinking about horizontal scaling out and in: it should be as simple as adding/removing machines from the cluster: just one <span style="font-family: "courier new" , "courier" , monospace;">terraform</span> command followed by an <span style="font-family: "courier new" , "courier" , monospace;">ansible-playbook</span> spell.</li></ul><br /><br />Hopefully this post helps you with setting up a Jenkins cluster that simply works. If you'd like to see the code, let me know in the comments!bjkAirflow Docker with Xcom push and pull2017-12-08T22:36:00+01:002017-12-08T22:36:00+01:00http://slawomir.net/2017/12/08/airflow-docker-with-xcom-push-and-pullRecently, in one of the projects I'm working on, we started to research technologies that can be used to design and execute data processing flows. The amount of data to be processed is counted in terabytes, hence we were aiming at solutions that can be deployed in the cloud. 
Solutions from the Apache umbrella like Hadoop, Spark, or Flink were on the table from the very beginning, but we also looked at others like Luigi or Airflow, because our use case was neither MapReducable nor stream-based.<br /><br />Airflow caught our attention and we decided to give it a shot just to see if we could create a PoC with it*. In order to execute the PoC faster rather than slower, we planned to provision a Swarm cluster for it.<br /><br />In Airflow you can find a couple of so-called operators that allow you to execute actions. There are operators for Bash or Python, but you can also find something for e.g. Hive. Fortunately there is also a Docker operator for us.<br /><br /><b>Local PoC</b><br />The PoC started on my laptop, not in the cluster. Thankfully, DockerOperator allows you to pass the URL of the docker daemon, so moving from laptop to cluster is close to just changing one parameter. Nice! <br /><br />If you want to run the Airflow server locally from inside a container, have it running as non-root (you should!) and bind docker.sock from the host into the container, you must create a docker group in the container that mirrors the docker group on your host and then add e.g. the airflow user to this group. That does the trick...<br /><br />So just running DockerOperator is not black magic. However, if your containers need to exchange data it gets a little bit more tricky.<br /><br /><b>Xcom push/pull</b><br />The push part is simple and documented. Just set the <span style="font-family: "Courier New", Courier, monospace;">xcom_push</span> parameter to <span style="font-family: "Courier New", Courier, monospace;">True</span> and the last line of the container's stdout will be published by Airflow as if it had been pushed programmatically. It seems this is the natural Airflow way.<br /><br />Pull is not that obvious. Perhaps because it's not documented. You can't just read stdin or something. 
The way to do this involves connecting two dots:<br /><ul><li>the command parameter can be Jinja2-templated</li><li>one of the macros allows you to do xcom_pull </li></ul>So you need to prepare your containers in a special way so they can pull/push. Let's start with a container that pushes something:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: purple;">FROM</span> debian<br /><span style="color: purple;">ENTRYPOINT</span> echo <span style="color: red;">'{"i_am_pushing": "json"}'</span></span><br /><br />Simple enough. Now the pulling container:<br /><br /><span style="font-family: "Courier New", Courier, monospace;"><span style="color: purple;">FROM</span> debian<br /><span style="color: purple;">COPY</span> ./entrypoint /<br /><span style="color: purple;">ENTRYPOINT</span> [<span style="color: red;">"/entrypoint"</span>]</span><br /><br />The entrypoint script can be whatever you like and will get the JSON as <span style="font-family: "Courier New", Courier, monospace;">$1</span>. A crucial (and also easy to miss) requirement for this to work is that <span style="font-family: "Courier New", Courier, monospace;">ENTRYPOINT</span> must use the exec form. Yes, there are two forms of <span style="font-family: "Courier New", Courier, monospace;">ENTRYPOINT</span>. If you use the one without the array, then parameters will not be passed to the container!<br /><br />Finally, you can glue things together and you're done. The <span style="font-family: "Courier New", Courier, monospace;">ti</span> macro allows us to get the data pushed by another task. 
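Before moving on, a side note on that exec-form pitfall: it can be reproduced without Docker at all. A shell-form ENTRYPOINT is effectively wrapped by Docker as <span style="font-family: "Courier New", Courier, monospace;">/bin/sh -c "command"</span>, and any appended arguments only populate <span style="font-family: "Courier New", Courier, monospace;">$0</span>, <span style="font-family: "Courier New", Courier, monospace;">$1</span>, ... which the command never references. A minimal sketch, assuming a POSIX <span style="font-family: "Courier New", Courier, monospace;">/bin/sh</span> and <span style="font-family: "Courier New", Courier, monospace;">/bin/echo</span> are available:

```python
import subprocess

# Shell form: ENTRYPOINT echo hello  ->  Docker runs /bin/sh -c "echo hello" <args...>
# The extra argument only fills $0, which "echo hello" never references, so it is lost.
shell_form = subprocess.run(["/bin/sh", "-c", "echo hello", "dropped-arg"],
                            capture_output=True, text=True).stdout.strip()

# Exec form: ENTRYPOINT ["echo", "hello"]  ->  args are appended straight to argv.
exec_form = subprocess.run(["/bin/echo", "hello", "visible-arg"],
                           capture_output=True, text=True).stdout.strip()

print(shell_form)  # hello
print(exec_form)   # hello visible-arg
```

This is exactly why the pulling container above must use the array form: the rendered xcom value is appended as an argument, and only the exec form lets it reach the entrypoint.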
<span style="font-family: "Courier New", Courier, monospace;">ti</span> stands for <span style="font-family: "Courier New", Courier, monospace;">task_instance</span>.<br /><br /><span style="font-family: "Courier New", Courier, monospace;">dag = DAG(<span style="color: red;">'docker'</span>, default_args=default_args, schedule_interval=timedelta(<span style="color: purple;">1</span>))<br /><br />t1 = DockerOperator(task_id=<span style="color: red;">'docker_1'</span>, dag=dag, image=<span style="color: red;">'docker_1'</span>, xcom_push=<span style="color: purple;">True</span>)<br /><br />t2 = DockerOperator(task_id=<span style="color: red;">'docker_2'</span>, dag=dag, image=<span style="color: red;">'docker_2'</span>, command=<span style="color: red;">'{{ ti.xcom_pull(task_ids="docker_1") }}'</span>)<br /><br />t2.set_upstream(t1)</span><br /><br /><br /><b>Conclusion</b><br />Docker can be used in Airflow along with Xcom push/pull functionality. It isn't very convenient and is not well documented I would say, but at least it works. <br /><br />If time permits I'm going to create PR for documenting pull op. I don't know how it works out, because in Airflow GH project there are 237 PRs now and some of them are there since May 2016!<br /><br /><br />* the funny thing is that we considered Jenkins too! ;-)bjkRecently, in one projects I'm working on, we started to research technologies that can be used to design and execute data processing flows. Amount of data to be processed is counted in terabytes, hence we were aiming at solutions that can be deployed in the cloud. Solutions from Apache umbrella like Hadoop, Spark, or Flink were at the table from the very beginning, but we also looked at others like Luigi or Airflow, because our use case was neither MapReducable nor stream-based.Airflow caught our attention and we decided to give it a shot just to see if we can create PoC using it*. 
In order to execute PoC faster rather than slower, we planned to provision Swarm cluster for this.In the Airflow you can find couple of so-called operators that allow you to execute actions. There are operators for Bash or Python, but you can also find something for e.g. Hive. Fortunately there is also Docker operator for us.Local PoCPoC started on my laptop and not in the cluster. Thankfully, DockerOperator allows you to pass URL to docker daemon, so moving from laptop to cluster is close to just changing one parameter. Nice! If you want to run Airflow server locally from inside container, and have it running as non-root (you should!) and you bind docker.sock from host to the container, you must create docker group in the container that mirrors docker group on your host and then add e.g. airflow user to this group. That does the trick...So just running DockerOperator is not black magic. However, if your containers need to exchange data it starts to be a little bit more tricky.Xcom push/pullThe push part is simple and documented. Just set xcom_push parameter to True and last line of container stdout will be published by Airflow as it was pushed programatically. It looks that this is natural Airflow way.Pull is not that obvious. Perhaps because it's not documented. You can't read stdin or something. The way to do this involves joining two dots:command parameter can be Jinja2-templatedone of the macros allows you to do xcom_pull So you need to prepare your containers in a special way so they can pull/push. Let's start with a container that pushes something:FROM debianENTRYPOINT echo '{"i_am_pushing": "json"}'Simple enough. Now pulling container:FROM debianCOPY ./entrypoint /ENTRYPOINT ["/entrypoint"]Entrypoint script can be whatever and will get the JSON as $1. Crucial (and also easy to miss) thing that is required for it to work is that ENTRYPOINT must use exec form. Yes, there are two forms of ENTRYPOINT. 
If you use the one without array, then parameters will not be passed to the container!Finally, you can glue things together and you're done. The ti macro allows us to get data pushed by other task. ti stands for task_instance.dag = DAG('docker', default_args=default_args, schedule_interval=timedelta(1))t1 = DockerOperator(task_id='docker_1', dag=dag, image='docker_1', xcom_push=True)t2 = DockerOperator(task_id='docker_2', dag=dag, image='docker_2', command='{{ ti.xcom_pull(task_ids="docker_1") }}')t2.set_upstream(t1)ConclusionDocker can be used in Airflow along with Xcom push/pull functionality. It isn't very convenient and is not well documented I would say, but at least it works. If time permits I'm going to create PR for documenting pull op. I don't know how it works out, because in Airflow GH project there are 237 PRs now and some of them are there since May 2016!* the funny thing is that we considered Jenkins too! ;-)Tests stability S09E11 (Docker, Selenium)2017-11-30T02:47:00+01:002017-11-30T02:47:00+01:00http://slawomir.net/2017/11/30/tests-stability-s09e11-docker-selenium<br />If you're experienced in setting up automated testing with Selenium and Docker you'll perhaps agree with me that it's not the most stable thing in the world. Actually it's far far away from any stable island - right in the middle of "the sea of instability".<br /><br />When you think about failures in automated testing and how they develop when the system is growing it can resemble drugs. Seriously. When you start, occasional failures are ignored. You close your eyes and click "Retry". Innocent. But after some time it snowballs into a problem. And you find yourself with a blind fold put on but you can't remember buying it.<br /><br />This post is small story how in one of small projects we started with occasional failures and ended up with... well... you'll see. 
Read on ;).<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-81RoPbSc1WU/Wh9fOozuFJI/AAAAAAAAAeM/nZkiiBwKzGgNz8J1Ql_O5il4QG09h_T8gCLcBGAs/s1600/2881603057_820af9d26a.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="278" data-original-width="318" src="https://4.bp.blogspot.com/-81RoPbSc1WU/Wh9fOozuFJI/AAAAAAAAAeM/nZkiiBwKzGgNz8J1Ql_O5il4QG09h_T8gCLcBGAs/s1600/2881603057_820af9d26a.jpg" /></a></div><br /><br />For the past couple of months I kept thinking that "others have worse setups and live", but today it all culminated: I achieved the fourth degree of density and decided to stop being quiet.<br /><br /><b>Disclaimer</b><br />In the middle of this post you might start to think that our environment is simply broken. That's damn right. The cloud in which we're running is not very stable. Sometimes it behaves like it's sulking. There are problems with proxies too. And finally we add Docker and Selenium to the mixture. I think a testimonial from one of our engineers sums it all up:<br /><blockquote class="tr_bq">if retry didn’t fix it for the 10<sup>th</sup> time, then there’s definitely something wrong</blockquote>And one more thing must be noted as well. The project I'm referring to is just a side one.
It's an attempt to innovate some process, unsupported by the business whatsoever.<br /><br /><b>The triggers</b><br />I was pressing the "Retry" button yet again on two of the e2e jobs and saw the following.<br /><br /><span style="font-size: small;"><span style="font-family: "courier new" , "courier" , monospace;">// job 1<br />couldn't stat /proc/self/fd/18446744073709551615: stat /proc/self/fd/23: no such file or directory<br /><br />// job 2<br />Service 'frontend' failed to build: readlink /proc/4304/exe: no such file or directory</span></span><br /><br />What the hell is this? We had never seen it before, and now it apparently became commonplace in our CI pipeline (it was the nth retry).<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-S4wr265VQKg/Wh9cme1pQjI/AAAAAAAAAd0/rNcT_ZsLojYNWcDrZlfYPvLgyDP3J9RQACLcBGAs/s1600/Mad_scientist.svg.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1589" data-original-width="1094" height="400" src="https://4.bp.blogspot.com/-S4wr265VQKg/Wh9cme1pQjI/AAAAAAAAAd0/rNcT_ZsLojYNWcDrZlfYPvLgyDP3J9RQACLcBGAs/s400/Mad_scientist.svg.png" width="275" /></a></div><br />So the big number after /fd/ is the 64-bit value of -1. Perhaps something in Selenium uses some function that returns an error and then calls the stat syscall, passing -1 as an argument. The function's return value was not checked!<br />The second error message is most probably related to Docker. Something tries to find where the executable for some PID is. Why?<br /><br />The "Retry" solution did not work this time. Re-deploying the e2e workers didn't help either. I thought that now was the time to get some insight into what is actually happening and how many failures were caused by the unstable environment.<br /><br />Luckily we're running on GitLab, which provides a reasonable API. Read on to see what I've found.
I personally find it hilarious.<br /><br /><b>Insight into failures</b><br />It's extremely easy to make use of the GitLab CI API (thanks, GitLab guys!). I extracted JSON objects for every job in every pipeline recorded in our project and started playing with the data.<br /><br />The first thing I checked was how many failures there are per particular type of test. Names are anonymized a little because I'm unsure whether this is sensitive data or not. Better safe than sorry!<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://2.bp.blogspot.com/-LGpOoRQrCmA/Wh9OnGVswlI/AAAAAAAAAdY/-sKo7w-EDeUf8i02u2CfyCJX9S9y3OFVgCLcBGAs/s1600/Figure_1.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="872" data-original-width="1600" height="347" src="https://2.bp.blogspot.com/-LGpOoRQrCmA/Wh9OnGVswlI/AAAAAAAAAdY/-sKo7w-EDeUf8i02u2CfyCJX9S9y3OFVgCLcBGAs/s640/Figure_1.png" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Fig 1: Successful/failed jobs, per job name</td></tr></tbody></table>I knew that some tests were failing often, but these results say that in some cases almost 50% of the jobs fail! Insane! BTW, we recently split some of the long-running e2e test suites into smaller jobs, which is observable in the figure.<br />But one could argue that maybe this is because of bugs in the code. Let's see. In order to tell, we must analyze the data based on commit hashes: how many commits in particular jobs were executed multiple times and finished with different statuses.
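The counting described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original script, and the job dictionaries are deliberately simplified compared to what the GitLab API actually returns:

```python
from collections import defaultdict

def flaky_stats(jobs):
    """jobs: iterable of dicts with 'commit', 'name' and 'status' keys.

    Returns (number of (commit, job) pairs with at least one success,
             total number of failures among those pairs).
    """
    by_pair = defaultdict(lambda: {'success': 0, 'failed': 0})
    for job in jobs:
        if job['status'] in ('success', 'failed'):
            by_pair[(job['commit'], job['name'])][job['status']] += 1
    # A pair that eventually succeeded but also failed means the code was
    # fine and the environment (or flakiness) caused the failures.
    with_success = {k: v for k, v in by_pair.items() if v['success'] > 0}
    failures = sum(v['failed'] for v in with_success.values())
    return len(with_success), failures

# Toy data in the simplified shape assumed above.
jobs = [
    {'commit': 'd7f43f9c', 'name': 'e2e-7', 'status': 'failed'},
    {'commit': 'd7f43f9c', 'name': 'e2e-7', 'status': 'failed'},
    {'commit': 'd7f43f9c', 'name': 'e2e-7', 'status': 'success'},
    {'commit': 'aaaa0000', 'name': 'unit',  'status': 'success'},
]
pairs, env_failures = flaky_stats(jobs)  # 2 pairs, 2 suspicious failures
```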
In other words: we look for the situations in which the job status varied even without changes in the code.<br /><br />The numbers for our repository are:<br /><ul><li>number of (commit, job) pairs with at least one success: <b>23550</b></li><li>total number of failures for these pairs: <b>1484</b></li></ul><br />In other words, the unstable environment was responsible for at least ~<b>6.30%</b> of the observed failures. It might look like a small number, but if you take into account that a single job can last 45 minutes, it adds up to a lot of wasted time. Especially since failure notifications aren't always handled immediately. I also have a hunch that at some point people started to click "Retry" just to be sure the problem was not with the environment.<br /><br />My top 5 picks among all of these failures are below.<br /><br /><span style="font-family: "courier new" , "courier" , monospace;"><br /></span><span style="font-family: "courier new" , "courier" , monospace;">hash:job | #tot | success/fail | users clicking "Retry"</span><br /><span style="font-family: "courier new" , "courier" , monospace;">----------------------------------------------------------------</span><br /><span style="font-family: "courier new" , "courier" , monospace;">d7f43f9c:e2e-7 | 19 | ( 1/17) | user-6,user-7,user-5<br />2fcecb7c:e2e-7 | 16 | ( 8/ 8) | user-6,user-7<br />2c34596f:other-1 | 14 | ( 1/13) | user-8<br />525203c6:other-13 | 12 | ( 1/ 8) | user-13,user-11<br />3457fbc5:e2e-6 | 11 | ( 2/ 9) | user-14</span><br /><br />So, for instance - commit d7f43f9c failed on job e2e-7 17 times, and three distinct users tried to make it pass by clicking the "Retry" button over and over. And finally they made it! Ridiculous, isn't it?<br /><br />And speaking of time, I've also checked jobs that ran for an enormous amount of time.
Winners are:<br /><br /><span style="font-family: "courier new" , "courier" , monospace;">job:status | time (hours)</span><br /><span style="font-family: "courier new" , "courier" , monospace;">---------------------------------</span><br /><span style="font-family: "courier new" , "courier" , monospace;">other-2:failed | 167.30<br />other-8:canceled | 118.89<br />other-4:canceled | 27.19<br />e2e-7:success | 26.12<br />other-1:failed | 26.01</span><br /><br />Perhaps these are just outliers. Histograms would give better insight. But even if they are outliers, they're crazy outliers.<br /><br /><br />I have also attempted to detect the reason for each failure, but this is a more complex problem to solve. It requires parsing logs and guessing which line was the first one indicating an error condition. Then comes the second guess - whether the problem originated from the environment or from the code.<br />Maybe such a task could be handled by the (in)famous machine learning. Actually, there are more items that could be achieved with ML support. The simplest examples are:<ul><li>estimating whether the job will fail</li><ul><li>also, providing the reason for the failure</li><li>if the failure originated from a faulty environment, what exactly was it? </li></ul><li>estimating the time for the pipeline to finish</li><li>auto-retry in case of an env-related failure</li></ul><br /><b>Conclusions</b><br />Apparently my e2e test environment has been much more unstable than I ever thought. The lesson learned is that if you get used to solving problems by retrying, you lose the sense of how much trouble you're in.<br /><br />As with any other engineering problem, you first need to gather data and then decide what to do next. Based on the numbers I have now, I'm planning to implement some ideas to make life easier.<br /><br />While analyzing the data I had moments when I couldn't stop laughing to myself. But the reality is sad. It started with occasional failures and ended with a continuous problem.
And we weren't doing much about it. The problem was not that we were effed in the ass. The problem was that we started to arrange our place there. Insights will help us get out.<br /><br />Share your ideas in comments. If we bootstrap discussion I'll do my best to share the code I have in GitHub.C++: on the dollar sign2017-04-26T00:15:00+02:002017-04-26T00:15:00+02:00http://slawomir.net/2017/04/26/c-on-dollar-signIn most programming languages there are sane rules that specify what can be an identifier and what cannot. Most of the time it's even intuitive - it's just something that matches <span style="font-family: "Courier New",Courier,monospace;">[_a-zA-Z][a-zA-Z0-9]*</span>.
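As a quick illustration of that common rule, sketched in Python (note that most languages also allow '_' in the continuation part, so the pattern below includes it):

```python
import re

# The usual identifier rule: letter or underscore first, then letters,
# digits or underscores.
IDENT = re.compile(r'^[_a-zA-Z][_a-zA-Z0-9]*$')

assert IDENT.match('foo_bar')
assert IDENT.match('_x42')
assert not IDENT.match('4chan')   # cannot start with a digit
assert not IDENT.match('$this')   # '$' is rejected by the common rule
```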
There are languages that allow more (e.g. $ in PHP/JS, or <a href="http://www.originlab.com/doc/LabTalk/guide/String-registers" target="_blank">% in LabTalk</a>). How about C++? The answer to this question may be a little surprising.<br /><br />Almost a year ago I had a little argument with a friend of mine about whether the dollar sign is allowed within C++ identifiers. In other words, it was about whether e.g. <span style="background-color: #eeeeee; font-family: "courier new" , "courier" , monospace;">int $this = 1;</span> is legal C++ or not.<br />Basically, I was stating that it's not possible. On the other hand, my friend was recalling some friend of his who mentioned that dollars are fine.<br /><br />The first line of defense is of course the nearest compiler. I decided to fire one up and simply check what happens if I compile the following fragment of code.<br /><br /><pre id="vimCodeElement" style="background-color: seashell; font-size: 13px; white-space: pre-wrap;">auto $foo() {
    int $bar = 1;
    return $bar;
}</pre><br />At the time I had gcc-4.9.3 installed on my system (a prehistoric version, I know ;-).
For the record, the command was like this: <span style="background-color: #f3f3f3; font-family: "courier new" , "courier" , monospace;">g++ dollar.cpp -std=c++1y -c -Wall -Wextra -Werror</span>.<br /><br />And to my surprise... it compiled without a single complaint. Moreover, clang and MSVC gulped it down without complaining as well. Well, Sławek - I said to myself - even if you've been mastering something for years, there's still much to surprise you. BTW such a conclusion puts titles like the following in a much funnier light.<br /><br /><div style="text-align: center;"><a href="http://2.bp.blogspot.com/-Etlaj1acncA/VmoA1CV9RVI/AAAAAAAAATs/b0lkpzmhllk/s1600/41Cozm-LkhL._SX333_BO1%252C204%252C203%252C200_.jpg" imageanchor="1"><img border="0" height="200" src="https://2.bp.blogspot.com/-Etlaj1acncA/VmoA1CV9RVI/AAAAAAAAATs/b0lkpzmhllk/s200/41Cozm-LkhL._SX333_BO1%252C204%252C203%252C200_.jpg" width="134" /></a></div><br />It was a normal office day and we had other work to get done, so I reluctantly accepted this as just another dark corner. After a couple of hours I forgot about the situation and let it resurface... a couple of weeks later.<br /><br />So, fast forward a couple of weeks. I was preparing something related to C++ and accidentally found a reference to the dollar sign in the GCC documentation. It was a nice feeling, because I knew I would fill this hole in my knowledge in a matter of minutes. So what was the reason compilers were happily accepting dollar signs?<br />Let me put here an excerpt from the GCC documentation, which speaks for itself :)<br /><blockquote class="tr_bq"><blockquote class="tr_bq"><i>GCC allows the ‘<samp>$</samp>’ character in identifiers as an extension for most targets. This is true regardless of the <samp>std=</samp> switch, since this extension cannot conflict with standards-conforming programs.
When preprocessing assembler, however, dollars are not identifier characters by default.</i><br /><i>Currently the targets that by default do not permit ‘<samp>$</samp>’ are AVR, IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX operating system.</i><br /><i>You can override the default with <samp>-fdollars-in-identifiers</samp> or <samp>fno-dollars-in-identifiers</samp>. See <a href="https://gcc.gnu.org/onlinedocs/cpp/fdollars-in-identifiers.html#fdollars-in-identifiers">fdollars-in-identifiers</a>.</i></blockquote></blockquote><br />I think the three most important things are:<br /><ol><li>This doesn't work in macros.</li><li>It doesn't seem to be correlated with the -std switch.</li><li>Some architectures do not permit it at all.</li></ol>What got me thinking is this list of architectures. It took me a couple of minutes to find out that e.g. the assembler for ARM doesn't allow the dollar sign. So any assembly code generated by GCC for ARM would not assemble if a dollar sign was used. That's a plausible explanation of why GCC doesn't allow this character for all architectures. It doesn't explain why compilers allow it for the others, though.<br /><br />GCC could theoretically mitigate the problem on particular architectures by replacing $ signs with some other character, but then a bunch of other problems would appear: possible name conflicts, name mangling/demangling yielding incorrect values, and finally it wouldn't be possible to export such "changed" symbols from a library. In other words: disaster.<br /><br />What about the standard?<br /><br />After thinking about it for a minute I had a strong need to see what exactly an identifier means. So I opened <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf" target="_blank">N3797</a> and quickly found the section I was looking for, namely (surprise-surprise) <i>2.11 Identifiers</i>.
So what does this section say?<br /><br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://4.bp.blogspot.com/-7zMTrtfoHtM/VmoDbUry6AI/AAAAAAAAAT0/bhUAD-ArE8c/s1600/identifiers.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://4.bp.blogspot.com/-7zMTrtfoHtM/VmoDbUry6AI/AAAAAAAAAT0/bhUAD-ArE8c/s400/identifiers.png" width="321" /></a></div><br /><br />Right after formal definition there is an explanation which refers to sections E.1 and E.2. But that's not important here. There is one more thing that appears in the formal definition and it's extremely easy to miss this one. It's "other implementation-defined characters". What does it mean? Yup - the compiler is allowed to allow any other character to be used within identifiers at will.<br /><br />P.s. surprisingly cppcheck 1.71 doesn't report $ sign in identifiers as a problem at all.Getting all parent directories of a path2017-01-06T19:28:00+01:002017-01-06T19:28:00+01:00http://slawomir.net/2017/01/06/getting-all-parent-directories-of-file<span style="color: #e06666;">edit: reddit updates</span> <br /><br />Few minutes ago I needed to solve trivial problem of getting all parent directories of a path. It's very easy to do it imperatively, but it would simply not satisfy me.
Hence, I challenged myself to do it declaratively in Python.<br /><br />The problem is simple, but let me put an example on the table, so it's even easier to imagine what we are talking about.<br /><br />Given some path, e.g.<br /><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">/home/szborows/code/the-best-project-in-the-world</span></span><br /><br />you want to get the following list of parents:<br /><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">/home/szborows/code</span></span><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">/home/szborows</span></span><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">/home</span></span><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">/</span></span><br /><br />It's trivial to do this using <span style="font-family: "courier new" , "courier" , monospace;">split</span> and then a for loop. How to make it more declarative?<br />Thinking more mathematically (mathematicians will perhaps cry out to heaven for vengeance after reading on, but let me at least try...), we simply want all subsets of some ordered set S that form a prefix w.r.t. S. So we can simply generate pairs of numbers <span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">(1, y)</span></span>, representing all prefixes, where y belongs to <span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">[1, len S)</span></span><span style="font-family: inherit;">. 
We can actually ignore this constant 1 and just operate on numbers.</span><br /><span style="font-family: inherit;">In Python, to generate numbers starting from len(path) and going down, we can simply utilize <span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">range()</span></span> and <span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">[::-1]</span></span> (this reverses collections; it's an idiom). Then <span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">join()</span></span> can be used on the split path, but with slicing from 1 to y. That's it. And now a demonstration:</span><br /><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">>>> path = '/a/b/c/d' </span></span><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">>>> <b>['/' + '/'.join(path.split('/')[1:l]) for l in range(len(path.split('/')))[::-1] if l]</b></span></span><br /><span style="color: purple;"><span style="font-family: "courier new" , "courier" , monospace;">['/a/b/c', '/a/b', '/a', '/']</span></span><br /><br />But what about performance? Which one will be faster - the imperative or the declarative approach?
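(A side note before the benchmarks: since Python 3.4 the standard library can also do this declaratively - the parents property of pathlib's pure paths yields exactly this list.)

```python
from pathlib import PurePosixPath

# .parents walks up the directory tree, root included.
parents = [str(p) for p in PurePosixPath('/a/b/c/d').parents]
print(parents)  # → ['/a/b/c', '/a/b', '/a', '/']
```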
Intuition suggests that the imperative version will win, but let's check.<br /><br />In the picture below you can see timeit (n=1000000) results for my machine (i5-6200U, Python 3.5.2+) for three paths:<br /><br /><pre><code>short_path = '/lel'
regular_path = '/jakie/jest/srednie/zagniezdzenie?'
long_path = '/z/lekka/dlugasna/sciezka/co/by/pierdzielnik/mial/troche/roboty'</code></pre><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-r4QOnEA3dzI/WHF7305erQI/AAAAAAAAAbA/rCCsLEj9r9cwgfC69p3jhLuVaDtZIkhfQCLcB/s1600/Rplots.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="400" src="https://3.bp.blogspot.com/-r4QOnEA3dzI/WHF7305erQI/AAAAAAAAAbA/rCCsLEj9r9cwgfC69p3jhLuVaDtZIkhfQCLcB/s400/Rplots.png" width="400" /></a></div>Implementations used:<br /><br /><pre><code>def imper1(path):
    result = []
    for i in range(1, len(path.split('/'))):
        y = '/'.join(path.split('/')[:i]) or '/'
        result.append(y)
    return result

def imper2(path):
    i = len(path) - 1
    l = []
    while i > 0:
        while i != 0 and path[i] != '/':
            i -= 1
        l.append(path[:i] or '/')
        i -= 1
    return l

def decl1(path):
    return ['/' + '/'.join(path.split('/')[1:l])
            for l in range(len(path.split('/')))[::-1] if l]

def decl2(path):
    return ['/' + '/'.join(path.split('/')[1:-l])
            for l in range(-len(path.split('/')) + 1, 1) if l]

# decl3 hidden. read on ;-)</code></pre><br /><br />It started with imper1 and decl1. I noticed that the imperative version was faster. I tried to speed up the declarative function by replacing [::-1] with some number tricks. It helped, but not to the extent I anticipated. Then I thought about speeding up imper1 by using lower-level constructs. Unsurprisingly, while loops and explicit checks were faster. Let me ignore decl3 for now and play a little with CPython bytecode.<br /><br />Looking at my results, not everything is so obvious: decl{1,2} turned out to have decent performance with the 4-part path, which looks like a reasonable average case.<br /><br />I disassembled decl1 and decl2 to see the difference in bytecode. The diff is shown below (decl1 on the left, decl2 on the right).<br /><br /><pre><code>30 CALL_FUNCTION  1 (1 positional, 0 keyword pair) | 30 CALL_FUNCTION  1 (1 positional, 0 keyword pair)
33 CALL_FUNCTION  1 (1 positional, 0 keyword pair) | 33 CALL_FUNCTION  1 (1 positional, 0 keyword pair)
36 CALL_FUNCTION  1 (1 positional, 0 keyword pair) | 36 UNARY_NEGATIVE
39 LOAD_CONST     0 (None)                         | 37 LOAD_CONST     4 (1)
42 LOAD_CONST     0 (None)                         | 40 BINARY_ADD
45 LOAD_CONST     5 (-1)                           | 41 LOAD_CONST     4 (1)
48 BUILD_SLICE    3                                | 44 CALL_FUNCTION  2 (2 positional, 0 keyword pair)
51 BINARY_SUBSCR                                   |</code></pre><br />As we can see, [::-1] is implemented as three loads and a build-slice operation. I think this could be optimized if we had a special opcode like e.g. BUILD_REV_SLICE.
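Before going further, a quick sanity check that the implementations above agree. A plain-Python sketch (same functions as listed earlier; note that imper1 yields parents in ascending order while the declarative versions yield them in descending order, so the comparison is on sorted lists):

```python
def imper1(path):
    result = []
    for i in range(1, len(path.split('/'))):
        y = '/'.join(path.split('/')[:i]) or '/'
        result.append(y)
    return result

def decl1(path):
    return ['/' + '/'.join(path.split('/')[1:l])
            for l in range(len(path.split('/')))[::-1] if l]

def decl2(path):
    return ['/' + '/'.join(path.split('/')[1:-l])
            for l in range(-len(path.split('/')) + 1, 1) if l]

# The three benchmark paths from the timings above.
for p in ('/lel',
          '/jakie/jest/srednie/zagniezdzenie?',
          '/z/lekka/dlugasna/sciezka/co/by/pierdzielnik/mial/troche/roboty'):
    # every implementation produces the same set of parent paths
    assert sorted(imper1(p)) == sorted(decl1(p)) == sorted(decl2(p))

print(decl1('/a/b/c/d'))  # -> ['/a/b/c', '/a/b', '/a', '/']
```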
My slightly optimized decl2 is faster because one UNARY_NEGATIVE plus one BINARY_ADD is less work than a LOAD_CONST, BUILD_SLICE and BINARY_SUBSCR. The performance gain here is pretty obvious - no matter what, decl2 must be faster.<br /><br />What about decl2 vs imper1?<br />It's more complicated, and it was a little surprising that the visibly shorter bytecode can be slower than its longer counterpart. Here is imper1 disassembled:<br /><br /><pre><code> 3      0 BUILD_LIST               0
        3 STORE_FAST               1 (result)

 4      6 SETUP_LOOP              91 (to 100)
        9 LOAD_GLOBAL              0 (range)
       12 LOAD_CONST               1 (1)
       15 LOAD_GLOBAL              1 (len)
       18 LOAD_FAST                0 (path)
       21 LOAD_ATTR                2 (split)
       24 LOAD_CONST               2 ('/')
       27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
       30 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
       33 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
       36 GET_ITER
  >>   37 FOR_ITER                59 (to 99)
       40 STORE_FAST               2 (i)

 5     43 LOAD_CONST               2 ('/')
       46 LOAD_ATTR                3 (join)
       49 LOAD_FAST                0 (path)
       52 LOAD_ATTR                2 (split)
       55 LOAD_CONST               2 ('/')
       58 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
       61 LOAD_CONST               0 (None)
       64 LOAD_FAST                2 (i)
       67 BUILD_SLICE              2
       70 BINARY_SUBSCR
       71 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
       74 JUMP_IF_TRUE_OR_POP     80
       77 LOAD_CONST               2 ('/')
  >>   80 STORE_FAST               3 (y)

 6     83 LOAD_FAST                1 (result)
       86 LOAD_ATTR                4 (append)
       89 LOAD_FAST                3 (y)
       92 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
       95 POP_TOP
       96 JUMP_ABSOLUTE           37
  >>   99 POP_BLOCK

 7 >> 100 LOAD_FAST                1 (result)
      103 RETURN_VALUE</code></pre><br /><br />The culprit was a LOAD_CONST in decl{1,2} that loads the list comprehension as a code object.
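That hidden code object can be located directly among the function's constants. A sketch, reusing decl2 from above; note that on Python 3.12+ comprehensions are inlined into the enclosing function, so the separate code object may be absent there:

```python
import dis
import types

def decl2(path):
    return ['/' + '/'.join(path.split('/')[1:-l])
            for l in range(-len(path.split('/')) + 1, 1) if l]

# The compiled comprehension body is stored as a constant of decl2.
comp = next((c for c in decl2.__code__.co_consts
             if isinstance(c, types.CodeType)), None)

if comp is not None:        # absent on 3.12+, where comprehensions are inlined
    print(comp.co_name)     # the comprehension's own code object, '<listcomp>'
    dis.dis(comp)
```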
Let's see how it looks, just for the record.<br /><br /><pre><code>>>> dis.dis(decl2.__code__.co_consts[1])
 21      0 BUILD_LIST               0
         3 LOAD_FAST                0 (.0)
    >>   6 FOR_ITER                51 (to 60)
         9 STORE_FAST               1 (l)
        12 LOAD_FAST                1 (l)
        15 POP_JUMP_IF_FALSE        6
        18 LOAD_CONST               0 ('/')
        21 LOAD_CONST               0 ('/')
        24 LOAD_ATTR                0 (join)
        27 LOAD_DEREF               0 (path)
        30 LOAD_ATTR                1 (split)
        33 LOAD_CONST               0 ('/')
        36 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
        39 LOAD_CONST               1 (1)
        42 LOAD_FAST                1 (l)
        45 UNARY_NEGATIVE
        46 BUILD_SLICE              2
        49 BINARY_SUBSCR
        50 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
        53 BINARY_ADD
        54 LIST_APPEND              2
        57 JUMP_ABSOLUTE            6
    >>  60 RETURN_VALUE</code></pre><br /><br />So this is what list comprehensions look like when compiled to bytecode. Nice! Now the performance results make more sense. In the project I was working on, my function for getting all parent paths was called in one place and contributed perhaps less than 5% of the execution time of the whole application. It would not make sense to optimize this piece of code. But it was a delightful journey into the internals of CPython, wasn't it?<br /><br />Now, let's get back to decl3. What have I done to make my declarative implementation 2x faster on the average case and for the right-side outliers? Well... I just reluctantly resigned from putting everything in one line and saved path.split('/') into a separate variable. That's it.<br /><br />So what are the learnings?<ul><li>the declarative method turned out to be faster than a hand-crafted imperative one employing low-level constructs.<br />Why? Good question!
Maybe because the bytecode generator knows how to produce optimized code when it encounters a list comprehension? But I have written no CPython code, so it's only my speculation.</li><li>trying to put everything in one line can hurt - in the described case the split() function was the major performance drag</li></ul>reddit-related updates:<br />Dunj3 outpaced me ;) - his implementation, which is better both w.r.t. "declarativeness" and performance: <br /><pre><code>list(itertools.accumulate(path.split('/'), curry(os.sep.join)))
</code></pre><br /><span style="color: #999999;"><span style="font-size: xx-small;">syntax highlighting done with https://tohtml.com/python/ </span></span>Logstash + filebeat: Invalid Frame Type, received: 12017-01-03T09:02:00+01:002017-01-03T09:02:00+01:00http://slawomir.net/2017/01/03/logstash-filebeat-invalid-frame-typePost for googlers that stumble on the same issue - it seems that "overconfiguration" is not a great idea for Filebeat and Logstash.<br /><br />I've decided to explicitly set ssl.verification_mode to none in my Filebeat config and then I got the following Filebeat and Logstash errors:<br /><br /><span style="font-family: "Courier New",Courier,monospace;">filebeat_1 | 2017/01/03 07:43:49.136717 single.go:140: ERR Connecting error publishing events (retrying): EOF<br />filebeat_1 | 2017/01/03 07:43:50.152824 single.go:140: ERR Connecting error publishing events (retrying): EOF<br />filebeat_1 | 2017/01/03 07:43:52.157279 single.go:140: ERR Connecting error publishing events (retrying): EOF<br />filebeat_1 | 2017/01/03 07:43:56.173144 single.go:140:
ERR Connecting error publishing events (retrying): EOF <br />filebeat_1 | 2017/01/03 07:44:04.189167 single.go:140: ERR Connecting error publishing events (retrying): EOF</span><br /><br /><br /><span style="font-family: "Courier New",Courier,monospace;">logstash_1 | 07:42:35.714 [Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600} <br />logstash_1 | 07:43:49.135 [nioEventLoopGroup-4-1] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3 <br />logstash_1 | 07:43:49.139 [nioEventLoopGroup-4-1] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1 <br />logstash_1 | 07:43:50.150 [nioEventLoopGroup-4-2] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3 <br />logstash_1 | 07:43:50.154 [nioEventLoopGroup-4-2] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1 <br />logstash_1 | 07:43:52.156 [nioEventLoopGroup-4-3] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3 <br />logstash_1 | 07:43:52.157 [nioEventLoopGroup-4-3] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1 <br />logstash_1 | 07:43:56.170 [nioEventLoopGroup-4-4] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 3 <br />logstash_1 | 07:43:56.175 [nioEventLoopGroup-4-4] ERROR org.logstash.beats.BeatsHandler - Exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 1</span><br /><br 
/>It seems it's better to stay quiet with Filebeat :) Hopefully this helped to resolve your issue.
std::queue's big default footprint in assembly code2016-12-13T23:07:00+01:002016-12-13T23:07:00+01:00http://slawomir.net/2016/12/13/stdqueues-big-default-footprint-inRecently I've been quite busy and now I'm kind of scrounging my way back into the C++ world. A friend of mine told me about the <a href="http://www.includeos.org/" target="_blank">IncludeOS</a> project and I thought that it may be a pretty good exercise to put my hands on the keyboard and help in this wonderful project.<br /><br />To be honest, the learning curve is quite steep (or I'm getting too old to learn so fast) and I'm still distracted by a lot of other things, so no big deliverables so far... but just by watching the discussion on <a href="https://gitter.im/hioa-cs/IncludeOS" target="_blank">Gitter</a> and integrating it with what I know, I spotted a probably obvious but slightly surprising thing about <span style="font-family: "courier new" , "courier" , monospace;">std::queue</span>.<br /><br />std::queue is not a container. Wait, what?, you ask. It's a container adapter. It doesn't have an implementation of its own. Instead, it takes another container, uses it as underlying storage and just provides a convenient interface for the end user. By the way, it isn't the only one.
There are others like <span style="font-family: "courier new" , "courier" , monospace;">std::stack</span> and <span style="font-family: "courier new" , "courier" , monospace;">std::priority_queue</span>, to name a few.<br /><br />One of the dimensions in which C++ shines is its options for customizing stuff. We can customize things like memory allocators. In container adapters we can swap out the underlying container if we decide that the one chosen by the library writers isn't a good match for us.<br /><br />By default, perhaps because std::queue requires fast access at both the beginning and the end, its underlying container is <span style="font-family: "courier new" , "courier" , monospace;">std::deque</span>. <span style="font-family: "courier new" , "courier" , monospace;">std::deque</span> provides O(1) complexity for pushing/popping at both ends. Perfect match, isn't it?<br /><br />Well, yes - if you care about performance at the cost of increased binary size. As it turns out, by simply changing <span style="font-family: "courier new" , "courier" , monospace;">std::deque</span> to <span style="font-family: "courier new" , "courier" , monospace;">std::vector</span>:<br /><br /><b><span style="font-family: "courier new" , "courier" , monospace;">std::queue<<span style="color: #38761d;">int</span>> qd; </span></b><br /><b><span style="font-family: "courier new" , "courier" , monospace;">std::queue<<span style="color: #38761d;">int</span>, std::vector<<span style="color: #38761d;">int</span>>> qv;</span></b><br /><br />The generated assembly for x86-64 clang 3.8 (-O3 -std=c++14) is 502 and 144 lines long, respectively.<br /><br />I know that in most contexts binary size is a secondary consideration, but I still believe it's an interesting fact that the difference is so big. In other words, there must be a lot of things going on under the bonnet of <span style="font-family: "courier new" , "courier" , monospace;">std::deque</span>.
I don't recommend changing deque to vector in production - it can seriously damage your performance.<br /><br />You can play around with the code here: <a href="https://godbolt.org/g/XaLhS7">https://godbolt.org/g/XaLhS7</a> (code based on <a href="https://github.com/Voultapher" target="_blank">Voultapher</a>'s example).