• Preconfigured Jenkins cluster in Docker Swarm (proxy, accounts, plugins)

    In recent years a lot of popular technologies have been adjusted so they can run in Docker containers. Our industry even coined a new verb - dockerization. When something is dockerized we usually expect it to behave like a self-contained app that is controlled with either command line switches or environment variables. We also assume that apart from this kind of customization the dockerized thing is zero-conf - it will start right away with no further magic spells.

    It's just awesome when things work that way. Unfortunately there are exceptions and Jenkins is one of them. The problem with Jenkins is that even when you start it from within a container, you still need to:
    • open the configuration wizard (it's a web page)
    • prove that you're the guy: pass its challenge by reading some magic file and pasting its content into the configuration wizard
    • configure the proxy, if you're behind one
    • select plugins to be installed during initialization
    • set up an admin account
    Pretty bad. It resembles an installation wizard like in Windows. Phew. A couple of weeks ago I was trying to check out how well Jenkins would solve one of our data transformation (ETL) problems and was unsure how many times it would be deployed. Hence I needed to do something about this installation process so it sucks less. All of the building blocks were already on the table: Terraform, Ansible and Docker Swarm. The missing part was a pre-configured dockerized Jenkins running in the Swarm.

    So this post, in a DuckDuckGo-friendly list, explains how to:
    • pre-configure Jenkins with a custom user (admin) account
    • pre-configure Jenkins with a proxy
    • pre-configure Jenkins with specified plugins
    • run Jenkins master and slaves entirely in Docker Swarm with Jenkins' own Swarm plugin for automatic master-slave connection establishment
    • allow Jenkins jobs to execute other Docker containers nearby (the daemon's sock trick)


    Abandon all hope, ye who enter here.

    I remember that in one of the C projects (I'm not sure which one, but perhaps something from GNU, maybe RMS; update: it was xterm) there was this comment: "abandon all hope, ye who enter here". It also mentioned how many people had ignored this warning and tried to refactor something. I have the same reflections w.r.t. configuring Jenkins without custom Groovy scripts. I was reluctant to learn a new language, but eventually this seemed like the most reasonable way to continue.

    Of course, all of the following problems can be solved in a troglodyte way too. E.g. you can configure everything by hand, extract the Jenkins home directory, targz it and re-use it. But that brings a couple of other problems. Also, surprisingly, a fresh Jenkins home weighed about 70MB in my case. I always thought that it's just a bunch of XML files, but perhaps it's not that straightforward. Since primitive solutions didn't work right away, I decided to stop for a while and try to solve the problem "the right way".

    System overview & requirements.

    The system is simple: there's one master (a brilliant example of a SPOF, but nobody cares, since you're unsure of the future) and a number of workers (slaves). We want workers to register with the master automatically. Unfortunately this is not possible using the plain JNLP solution, because you need to register the worker in the master prior to establishing a link. In theory you could do some curl magic, but fortunately there's a plugin that does it for you - Jenkins Swarm (not to be confused with Docker Swarm, as it has literally nothing to do with it). The Jenkins Swarm plugin consists of two things: a plugin for the Jenkins master and a Java JAR for the slaves.
    So we're set up. Jenkins Swarm will take care of auto-connecting slaves. Now, we must run a dockerized version of these slaves and put them into Docker Swarm. But before we talk about slaves, let's handle the master.

    Jenkins master with plugins, proxy, and extra configuration.

    Let me paste the Dockerfile and explain it line by line.

    FROM jenkins/jenkins:2.89.1-alpine

    ARG proxy
    ENV http_proxy=$proxy https_proxy=$proxy

    USER root
    RUN apk update && apk add python3
    COPY requirements.txt /tmp/requirements.txt
    RUN pip3 install -r /tmp/requirements.txt

    USER jenkins

    RUN /usr/local/bin/install-plugins.sh swarm:3.6 workflow-aggregator:2.5

    ENV JAVA_OPTS="-Djenkins.install.runSetupWizard=false"
    COPY security.groovy /usr/share/jenkins/ref/init.groovy.d/security.groovy
    COPY proxy.groovy /usr/share/jenkins/ref/init.groovy.d/proxy.groovy
    COPY executors.groovy /usr/share/jenkins/ref/init.groovy.d/executors.groovy

    We must start with some Jenkins image in order to customize it. In my case that's the slim Alpine Linux version of 2.89.1. Then there's a build argument for the proxy. You can ignore this part if you're not behind one.

    Before we modify the image, we need to switch to the root user. After we're done we should switch back to jenkins for better security (if you wonder how to check the default user without the base image's Dockerfile, the docker history command is your friend). In my case I'm also installing some python3 stuff defined in the requirements.txt dependency file. If you're not willing to add any packages to the system, you can skip this entire part too.

    Then, we approach configuring plugins. In various places on the Internet you can find advice to use /usr/local/bin/plugins.sh, but believe me, you don't want to do this, as it installs plugins without their dependencies. The newer install-plugins.sh script takes care of dependencies for you. In our case we're installing two plugins. You might want to install just the essential one - the swarm plugin.

    Now, four nonstandard lines. I believe that setting runSetupWizard to false is self-explanatory. The rest of the lines are there for account setup, proxy configuration and executors configuration.

    Let's start with setting up admin account. Groovy here we go!


    import jenkins.model.*
    import hudson.security.*
    import jenkins.security.s2m.AdminWhitelistRule

    def instance = Jenkins.getInstance()

    def user = new File("/run/secrets/jenkinsUser").text.trim()
    def pass = new File("/run/secrets/jenkinsPassword").text.trim()

    def hudsonRealm = new HudsonPrivateSecurityRealm(false)
    hudsonRealm.createAccount(user, pass)
    instance.setSecurityRealm(hudsonRealm)

    def strategy = new FullControlOnceLoggedInAuthorizationStrategy()
    strategy.setAllowAnonymousRead(false)
    instance.setAuthorizationStrategy(strategy)
    instance.save()


    I'm not a Groovy expert so don't judge me by the code above. I started with just the knowledge that it runs on the JVM :). It actually looks like a nice managed language. The good part is that, as in Python, the code mostly speaks for itself. The Hudson legacy is visible here as well. I won't go into details - if you want to know where all of this magic comes from, pay a visit to the official docs. Don't forget that you can also use the infamous Jenkins console. I found Groovy's dump built-in very helpful too.
    So the above script will actually set up an admin account, but doesn't hardwire anything. Both the username and password come from Docker Secrets, which lets you manage sensitive data in your Swarm cluster nicely.

    Now, the second script is for proxy:


    import jenkins.model.*
    import hudson.*

    def instance = Jenkins.getInstance()
    def pc = new hudson.ProxyConfiguration("", 8080, null, null, "localhost,*.your.intranet.com");
    instance.proxy = pc;

    There's some magic here too. It sets up the proxy, but with specified exceptions. Then it modifies the Jenkins instance (which seems to be a singleton).

    And finally, the executors part. I wanted this one so the master is not used as a worker at all.

    import jenkins.model.*

    Jenkins.getInstance().setNumExecutors(0)


    Now, since the master is ready, let's configure slaves. Their Dockerfile is as follows.

    FROM docker:17.03-rc

    ARG proxy
    ENV https_proxy=$proxy http_proxy=$proxy no_proxy="localhost,*.your.intranet.com"

    RUN apk --update add openjdk8-jre git python3
    RUN wget -O swarm-client.jar http://repo.jenkins-ci.org/releases/org/jenkins-ci/plugins/swarm-client/3.3/swarm-client-3.3.jar

    ENV http_proxy= https_proxy=
    COPY entrypoint.sh /
    RUN chmod +x /entrypoint.sh
    CMD ["/entrypoint.sh"]

    This time the base image is docker, because we want docker installed within this Docker container (so this container can spawn other containers). After setting proxies (the part that is not mandatory) we install Java Runtime Environment version 8 and download the swarm-client JAR. I'm using version 3.3, which is accessible through the above URL as of today.
    Finally, there's an entrypoint that will execute swarm-client and do all the magic, but it heavily relies on a Docker Secret named jenkinsSwarm, which should look like the following.

    -master http://master_address:8080 -username jenkinsUser -password jenkinsPassword

    Here master_address must be known to the slave machines (e.g. via /etc/hosts, Consul or something). You should also include the username and password - the same ones that you share in the other Docker Swarm secrets.

    If you're using Ansible like I do, it's pretty straightforward to use variables instead of hardcoding credentials. For instance, ansible-vault can be used for this.

    entrypoint.sh itself is almost a one-liner:

    #!/bin/sh
    mkdir /tmp/jenkins
    java -jar swarm-client.jar -labels=docker -executors=1 -fsroot=/tmp/jenkins -name=docker-$(hostname) $(cat /run/secrets/jenkinsSwarm)

    It assumes that it's running in the Swarm and can access /run/secrets/jenkinsSwarm (the line pasted above).

    Glueing it all together.

    The building blocks are already in place. Now it's time to glue everything together. I don't want to go into details here, because this is not the primary topic of this blog post. If you're interested in how I personally did everything, please let me know in the comments and I will create a GitHub repo. Let me, however, give you some important hints:
    • if you want a slave to be able to spawn other containers (on the same host on which the slave is running), you must bind mount the docker.sock file, e.g. like this: "/var/run/docker.sock:/var/run/docker.sock". There's more to this, though! The Docker daemon will not allow the jenkins user to spawn containers, so you must somehow circumvent this problem. I'm doing it by adding the jenkins user to the docker group, but this works only because there's a 1:1 mapping between the host and the container.
    • you should have three secrets in Docker Swarm cluster: jenkinsUser, jenkinsPassword and jenkinsSwarm with username, password, and swarm-client.jar arguments respectively
    • machines must be able to communicate. For internal JNLP communication, port 50000/tcp must be opened.
    • if you set the deployment mode to global in the docker-compose.yml file (if you're using one), then you will have as many slaves as machines in the cluster, which can be nice
    • if you're gonna stick to this solution for a longer period of time, I recommend thinking about horizontal scaling out and in: it should be as simple as adding/removing machines from the cluster: just one terraform command followed by an ansible-playbook spell.

    Hopefully this post helps you with setting up a Jenkins cluster that simply works. If you'd like to see the code, let me know in the comments!
  • Airflow Docker with Xcom push and pull

    Recently, in one of the projects I'm working on, we started to research technologies that can be used to design and execute data processing flows. The amount of data to be processed is counted in terabytes, hence we were aiming at solutions that can be deployed in the cloud. Solutions from the Apache umbrella like Hadoop, Spark, or Flink were on the table from the very beginning, but we also looked at others like Luigi or Airflow, because our use case was neither MapReducable nor stream-based.

    Airflow caught our attention and we decided to give it a shot just to see if we can create a PoC using it*. In order to execute the PoC faster rather than slower, we planned to provision a Swarm cluster for this.

    In Airflow you can find a couple of so-called operators that allow you to execute actions. There are operators for Bash or Python, but you can also find something for e.g. Hive. Fortunately, there is also a Docker operator for us.

    Local PoC
    The PoC started on my laptop and not in the cluster. Thankfully, DockerOperator allows you to pass a URL to the docker daemon, so moving from laptop to cluster is close to just changing one parameter. Nice!

    If you want to run the Airflow server locally from inside a container, have it running as non-root (you should!), and bind docker.sock from the host to the container, you must create a docker group in the container that mirrors the docker group on your host and then add e.g. the airflow user to this group. That does the trick...

    So just running DockerOperator is not black magic. However, if your containers need to exchange data, it starts to be a little bit more tricky.

    Xcom push/pull
    The push part is simple and documented. Just set the xcom_push parameter to True and the last line of the container's stdout will be published by Airflow as if it was pushed programmatically. It looks like this is the natural Airflow way.

    Pull is not that obvious. Perhaps because it's not documented. You can't read stdin or anything like that. The way to do this involves connecting two dots:
    • command parameter can be Jinja2-templated
    • one of the macros allows you to do xcom_pull
    So you need to prepare your containers in a special way so they can pull/push. Let's start with a container that pushes something:

    FROM debian
    ENTRYPOINT echo '{"i_am_pushing": "json"}'

    Simple enough. Now pulling container:

    FROM debian
    COPY ./entrypoint /
    ENTRYPOINT ["/entrypoint"]

    The entrypoint script can be whatever and will get the JSON as $1. A crucial (and also easy to miss) thing required for it to work is that ENTRYPOINT must use the exec form. Yes, there are two forms of ENTRYPOINT. If you use the one without the array, then parameters will not be passed to the container!
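    As an illustration, the pulling entrypoint could be a tiny Python script (a hypothetical sketch - any language that reads its first argument works; the i_am_pushing key matches the pushing container above):

    ```python
    #!/usr/bin/env python3
    # hypothetical /entrypoint for the pulling container
    import json
    import sys

    def parse_payload(arg):
        # the xcom-pulled JSON rendered into `command` arrives as argv[1]
        return json.loads(arg)["i_am_pushing"]

    if __name__ == "__main__" and len(sys.argv) > 1:
        print(parse_payload(sys.argv[1]))
    ```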

    Finally, you can glue things together and you're done. The ti macro allows us to get data pushed by the other task. ti stands for task_instance.

    dag = DAG('docker', default_args=default_args, schedule_interval=timedelta(1))

    t1 = DockerOperator(task_id='docker_1', dag=dag, image='docker_1', xcom_push=True)

    t2 = DockerOperator(task_id='docker_2', dag=dag, image='docker_2', command='{{ ti.xcom_pull(task_ids="docker_1") }}')


    Docker can be used in Airflow along with the Xcom push/pull functionality. It isn't very convenient and is not well documented, I would say, but at least it works.

    If time permits I'm going to create a PR for documenting the pull op. I don't know how that will work out, because in the Airflow GH project there are 237 PRs now and some of them have been there since May 2016!

    * the funny thing is that we considered Jenkins too! ;-)
  • Tests stability S09E11 (Docker, Selenium)

    If you're experienced in setting up automated testing with Selenium and Docker, you'll perhaps agree with me that it's not the most stable thing in the world. Actually, it's far, far away from any stable island - right in the middle of "the sea of instability".

    When you think about failures in automated testing and how they develop as the system grows, it can resemble drugs. Seriously. When you start, occasional failures are ignored. You close your eyes and click "Retry". Innocent. But after some time it snowballs into a problem. And you find yourself with a blindfold on, but you can't remember buying it.

    This post is a small story of how in one of our small projects we started with occasional failures and ended up with... well... you'll see. Read on ;).

    For the past couple of months I had been thinking "others have worse setups and live with it", but today it all culminated, I achieved the fourth degree of density and decided to stop being quiet.

    In the middle of this post you might start to think that our environment is simply broken. That's damn right. The cloud in which we're running is not very stable. Sometimes it behaves like it's sulking. There are problems with proxies too. And finally, we add Docker and Selenium to the mixture. I think a testimonial from one of our engineers sums it all up:
    if retry didn’t fix it for the 10th time, then there’s definitely something wrong
    And now something must be noted as well. The project I'm referring to is just a side one. It's an attempt to innovate some process, unsupported by the business whatsoever.

    The triggers
    I was pressing the "Retry" button yet another time on two of the e2e jobs and saw the following.

    // job 1
    couldn't stat /proc/self/fd/18446744073709551615: stat /proc/self/fd/23: no such file or directory

    // job 2
    Service 'frontend' failed to build: readlink /proc/4304/exe: no such file or directory

    What the hell is this? We had never seen this before and now apparently it has become commonplace in our CI pipeline (it was the nth retry).

    So this big number after /fd/ is the 64-bit value of -1. Perhaps something in Selenium uses some function that returns an error and then tries to call the stat syscall, passing -1 as an argument. The function's return value was not checked!
    The second error message is most probably related to Docker. Something tries to find where the executable for some PID is. Why?

    The "Retry" solution did not work this time. Re-deploying the e2e workers also didn't help. I thought that now was the time to get some insight into what is actually happening and how many failures were caused by the unstable environment.

    Luckily we're running on GitLab, which provides reasonable API. Read on to see what I've found. I personally find it hilarious.

    Insight into failures
    It's extremely easy to make use of the GitLab CI API (thanks, GitLab guys!). I extracted JSON objects for every job in every pipeline that was recorded in our project and started playing with the data.

    The first thing I checked was how many failures there are per particular type of test. Names are anonymized a little because I'm unsure whether this is sensitive data or not. Better safe than sorry!

    Fig 1: Successful/failed jobs, per job name
    I knew that some tests were failing often, but these results show that in some cases almost 50% of the jobs fail! Insane! BTW we recently split some long-running e2e test suites into smaller jobs, which is observable in the figure.
    But now we can argue that maybe this is because of bugs in the code. Let's see. In order to tell, we must analyze the data based on commit hashes: how many commits in particular jobs were executed multiple times and finished with a different status. In other words: we look for situations in which, even without changes in the code, the job status was varying.

    The numbers for our repository are:
    • number of (commit, job) pairs with at least one success: 23550
    • total number of failures for these pairs: 1484

    In other words, the unstable environment was responsible for at least ~6.30% of observable failures. It might look like a small number, but if you take into account that a single job can last for 45 minutes, it becomes a lot of wasted time. Especially since failure notifications aren't always handled immediately. I also have a hunch that at some point people started to click "Retry" just to be sure the problem was not with the environment.
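    For the record, the counting itself can be sketched in a few lines of Python. The job dicts below are a toy stand-in - the field names in the real GitLab API response differ, so treat this as a sketch of the metric, not of the API:

    ```python
    from collections import defaultdict

    def flaky_stats(jobs):
        """For every (commit, job name) pair, collect all recorded statuses;
        then count pairs that succeeded at least once, plus the failures
        recorded for those same pairs (i.e. environment-suspect failures)."""
        statuses = defaultdict(list)
        for job in jobs:
            statuses[(job["commit"], job["name"])].append(job["status"])
        with_success = [v for v in statuses.values() if "success" in v]
        failures = sum(v.count("failed") for v in with_success)
        return len(with_success), failures

    # toy data: one flaky (commit, job) pair and one stable pair
    jobs = [
        {"commit": "d7f43f9c", "name": "e2e-7", "status": "failed"},
        {"commit": "d7f43f9c", "name": "e2e-7", "status": "success"},
        {"commit": "2c34596f", "name": "other-1", "status": "success"},
    ]
    print(flaky_stats(jobs))  # (2, 1)
    ```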

    My top 5 picks among all of these failures are below.

    hash:job           | #tot | success/fail | users clicking "Retry"
    d7f43f9c:e2e-7     | 19   |  ( 1/17)     | user-6,user-7,user-5
    2fcecb7c:e2e-7     | 16   |  ( 8/ 8)     | user-6,user-7
    2c34596f:other-1   | 14   |  ( 1/13)     | user-8
    525203c6:other-13  | 12   |  ( 1/ 8)     | user-13,user-11
    3457fbc5:e2e-6     | 11   |  ( 2/ 9)     | user-14

    So, for instance - commit d7f43f9c was failing on job e2e-7 17 times and three distinct users tried to make it pass by clicking "Retry" button over and over. And finally they made it! Ridiculous, isn't it?

    And speaking of time, I've also checked jobs that lasted for an enormous amount of time. The winners are:

    job:status        |  time (hours)
    other-2:failed    |  167.30
    other-8:canceled  |  118.89
    other-4:canceled  |  27.19
    e2e-7:success     |  26.12
    other-1:failed    |  26.01

    Perhaps these are just outliers. Histograms would give better insight. But even if they're outliers, they're crazy outliers.

    I have also attempted to detect the reason for each failure, but this is a more complex problem to solve. It requires parsing logs and guessing which line was the first one indicating an error condition. Then a second guess - about whether the problem originated from the environment or the code.
    Maybe such a task could be somehow handled by the (in)famous machine learning. Actually, there are more items that could be achieved with ML support. The two simplest examples are:
    • estimating whether the job will fail
      • also, providing the reason for the failure
      • if the failure originated from a faulty environment, what exactly was it?
    • estimating the time for the pipeline to finish
    • auto-retry in case of env-related failures

    Apparently I've been having a much more unstable e2e test environment than I ever thought. The lesson learned is that if you get used to solving problems by retrying, you lose the sense of how big a trouble you're in.

    As with any other engineering problem, you first need to gather data and decide what to do next. Based on the numbers I have now, I'm planning to implement some ideas to make life easier.

    While analyzing the data I had moments when I couldn't stop laughing to myself. But the reality is sad. It started with occasional failures and ended with a continuous problem. And we weren't doing much about it. The problem was not that we were effed in the ass. The problem was that we started to arrange our place there. Insights will help us get out.

    Share your ideas in the comments. If we bootstrap a discussion, I'll do my best to share the code I have on GitHub.
  • C++: on the dollar sign

    In most programming languages there are sane rules that specify what can be an identifier and what cannot. Most of the time it's even intuitive - it's just something that matches [_a-zA-Z][_a-zA-Z0-9]*. There are languages that allow more (e.g. $ in PHP/JS, or % in LabTalk). How about C++? The answer to this question may be a little surprising.

    Almost a year ago I had this little argument with a friend of mine about whether the dollar sign is allowed in C++ identifiers. In other words, it was about whether e.g. int $this = 1; is legal C++ or not.
    Basically, I was stating that it's not possible. On the other hand, my friend was recalling some friend of his who mentioned that dollars are fine.

    The first line of defense is of course the nearest compiler. I decided to fire one up and simply check what happens if I compile the following fragment of code.

    auto $foo() {
        int $bar = 1;
        return $bar;
    }

    At the time I had gcc-4.9.3 installed on my system (a prehistoric version, I know ;-). For the record, the command was like this: g++ dollar.cpp -std=c++1y -c -Wall -Wextra -Werror.

    And to my surprise... it compiled without a single complaint. Moreover, clang and MSVC gulped it down without complaining as well. Well, Sławek - I said to myself - even if you've been mastering something for years, there's still much to surprise you. BTW such a conclusion puts titles like the following in a much funnier light.

    It was a normal office day and we had other work to get done, so I reluctantly accepted this just as another dark corner. After a couple of hours I forgot about the situation and let it resurface... a couple of weeks later.

    So, fast forward a couple of weeks. I was preparing something related to C++ and accidentally found a reference to the dollar sign in the GCC documentation. It was a nice feeling, because I knew I would fill this hole in my knowledge in a matter of minutes. So what was the reason compilers were happily accepting dollar signs?
    Let me put here excerpt from GCC documentation, which speaks for itself :)
    GCC allows the ‘$’ character in identifiers as an extension for most targets. This is true regardless of the std= switch, since this extension cannot conflict with standards-conforming programs. When preprocessing assembler, however, dollars are not identifier characters by default.
    Currently the targets that by default do not permit ‘$’ are AVR, IP2K, MMIX, MIPS Irix 3, ARM aout, and PowerPC targets for the AIX operating system.
    You can override the default with -fdollars-in-identifiers or fno-dollars-in-identifiers. See fdollars-in-identifiers.

    I think the three most important things are:
    1. It doesn't work when preprocessing assembler.
    2. It doesn't seem to be correlated with the -std switch.
    3. Some architectures do not permit it at all.
    What got me thinking is this list of architectures. And it took me a couple of minutes to find out that e.g. the assembler for ARM doesn't allow the dollar sign. So any assembly code generated by GCC for ARM would not assemble if a dollar sign was used. That's a plausible explanation for why GCC doesn't allow such a character for all architectures. It doesn't explain why compilers allow it for the others, though.

    GCC could theoretically mitigate the problem with particular architectures by replacing $ signs with some other character, but then a bunch of other problems would appear: possible name conflicts, name mangling/demangling would yield incorrect values, and finally it wouldn't be possible to export such "changed" symbols from a library. In other words: disaster.

    What about the standard?

    After thinking about it for a minute I had a strong need to see what exactly identifier means. So I opened N3797 and quickly found the section I was looking for, namely (surprise-surprise) 2.11 Identifiers. So what does this section say?

    Right after the formal definition there is an explanation which refers to sections E.1 and E.2. But that's not important here. There is one more thing that appears in the formal definition and it's extremely easy to miss. It's "other implementation-defined characters". What does it mean? Yup - the compiler is allowed to permit any other character to be used within identifiers at will.

    P.S. surprisingly, cppcheck 1.71 doesn't report the $ sign in identifiers as a problem at all.
  • Getting all parent directories of a path

    edit: reddit updates

    A few minutes ago I needed to solve the trivial problem of getting all parent directories of a path. It's very easy to do imperatively, but that would simply not satisfy me. Hence, I challenged myself to do it declaratively in Python.

    The problem is simple, but let me put an example on the table, so it's even easier to imagine what we are talking about.

    Given some path, e.g.

    /a/b/c/d

    You want to have the following list of parents:

    ['/a/b/c', '/a/b', '/a', '/']
    It's trivial to do this using split and then some for loop. How to make it more declarative?
    Thinking more mathematically (mathematicians will perhaps cry out to heaven for vengeance after reading on, but let me at least try...) we simply want to get all of the subsets of some ordered set S that form a prefix w.r.t. S. So we can simply generate pairs of numbers (1, y), representing all prefixes, where y belongs to [1, len S). We can actually ignore the constant 1 and just operate on numbers.
    In Python, to generate numbers starting from len(path) and going down we can simply utilize range() and [::-1] (this reverses collections; it's an idiom). Then join() can be used on the split path, but with slicing from 1 to y. That's it. And now a demonstration:

    >>> path = '/a/b/c/d'
    >>> ['/' + '/'.join(path.split('/')[1:l]) for l in range(len(path.split('/')))[::-1] if l]
    ['/a/b/c', '/a/b', '/a', '/']

    But what about performance? Which one will be faster - the imperative or the declarative approach? Intuition suggests that the imperative version will win, but let's check.

    In the picture below you can see timeit (n=1000000) results for my machine (i5-6200U, Python 3.5.2+) for three paths:

    short_path = '/lel'
    regular_path = '/jakie/jest/srednie/zagniezdzenie?'
    long_path = '/z/lekka/dlugasna/sciezka/co/by/pierdzielnik/mial/troche/roboty'

    Implementations used:

    def imper1(path):
        result = []
        for i in range(1, len(path.split('/'))):
            y = '/'.join(path.split('/')[:i]) or '/'
            result.append(y)
        return result

    def imper2(path):
        i = len(path) - 1
        l = []
        while i > 0:
            while i != 0 and path[i] != '/':
                i -= 1
            l.append(path[:i] or '/')
            i -= 1
        return l

    def decl1(path):
        return ['/' + '/'.join(path.split('/')[1:l])
                for l in range(len(path.split('/')))[::-1] if l]

    def decl2(path):
        return ['/' + '/'.join(path.split('/')[1:-l])
                for l in range(-len(path.split('/'))+1, 1) if l]
    # decl3 hidden. read on ;-)
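    The timings themselves come from a timeit harness along these lines (a sketch: decl1 is repeated so the snippet runs standalone, and the iteration count is reduced from the original n=1000000 to keep the run short):

    ```python
    import timeit

    def decl1(path):
        # repeated from the listing above so this snippet is self-contained
        return ['/' + '/'.join(path.split('/')[1:l])
                for l in range(len(path.split('/')))[::-1] if l]

    regular_path = '/jakie/jest/srednie/zagniezdzenie?'

    # 10k iterations instead of 1M, just to keep the example quick
    elapsed = timeit.timeit(lambda: decl1(regular_path), number=10_000)
    print('decl1 on regular_path: %.4fs' % elapsed)
    ```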

    It started with imper1 and decl1. I noticed that the imperative version was faster. I tried to speed up the declarative function by replacing [::-1] with some number tricks. It helped, but not to the extent I anticipated. Then, I thought about speeding up imper1 by using lower-level constructs. Unsurprisingly, while loops and checks were faster. Let me temporarily ignore decl3 for now and play a little with CPython bytecode.

    Looking at my results, not everything is so obvious. decl{1,2} turned out to have decent performance with a 4-part path, which looks like a reasonable average.

    I disassembled decl1 and decl2 to see the difference in bytecode. The diff is shown below.

    30 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 30 CALL_FUNCTION    1 (1 positional, 0 keyword pair)
    33 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 33 CALL_FUNCTION    1 (1 positional, 0 keyword pair)
    36 CALL_FUNCTION    1 (1 positional, 0 keyword pair) | 36 UNARY_NEGATIVE
    39 LOAD_CONST       0 (None)                         | 37 LOAD_CONST       4 (1)  
    42 LOAD_CONST       0 (None)                         | 40 BINARY_ADD
    45 LOAD_CONST       5 (-1)                           | 41 LOAD_CONST       4 (1)  
    48 BUILD_SLICE      3                                | 44 CALL_FUNCTION    2 (2 positional, 0 keyword pair)

    As we can see, [::-1] is implemented as three loads and a build slice operation. I think this could be optimized if we had a special opcode like e.g. BUILD_REV_SLICE. My little-optimized decl2 is faster because one UNARY_NEGATIVE and one BINARY_ADD is less than LOAD_CONST, BUILD_SLICE and BINARY_SUBSCR. The performance gain here is pretty obvious. No matter what, decl2 must be faster.

    What about decl2 vs imper1?
    That's more complicated, and it was a little surprise that such longer bytecode can be faster than its seemingly shorter counterpart.

      3           0 BUILD_LIST               0        
                  3 STORE_FAST               1 (result)
      4           6 SETUP_LOOP              91 (to 100)                     
                  9 LOAD_GLOBAL              0 (range)
                 12 LOAD_CONST               1 (1)
                 15 LOAD_GLOBAL              1 (len)
                 18 LOAD_FAST                0 (path)
                 21 LOAD_ATTR                2 (split)
                 24 LOAD_CONST               2 ('/')
                 27 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 30 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 33 CALL_FUNCTION            2 (2 positional, 0 keyword pair)
                 36 GET_ITER                        
            >>   37 FOR_ITER                59 (to 99)
                 40 STORE_FAST               2 (i)

      5          43 LOAD_CONST               2 ('/')
                 46 LOAD_ATTR                3 (join)
                 49 LOAD_FAST                0 (path)
                 52 LOAD_ATTR                2 (split)
                 55 LOAD_CONST               2 ('/')
                 58 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 61 LOAD_CONST               0 (None)
                 64 LOAD_FAST                2 (i)
                 67 BUILD_SLICE              2
                 70 BINARY_SUBSCR
                 71 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 74 JUMP_IF_TRUE_OR_POP     80
                 77 LOAD_CONST               2 ('/')
            >>   80 STORE_FAST               3 (y)

      6          83 LOAD_FAST                1 (result)
                 86 LOAD_ATTR                4 (append)
                 89 LOAD_FAST                3 (y)
                 92 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 95 POP_TOP
                 96 JUMP_ABSOLUTE           37
            >>   99 POP_BLOCK

      7     >>  100 LOAD_FAST                1 (result)
                103 RETURN_VALUE

    The culprit was the LOAD_CONST in decl{1,2} that was loading the list comprehension as a code object. Let's see how it looks, just for the record.

    >>> dis.dis(decl2.__code__.co_consts[1])
     21           0 BUILD_LIST               0
                  3 LOAD_FAST                0 (.0)
            >>    6 FOR_ITER                51 (to 60)
                  9 STORE_FAST               1 (l)
                 12 LOAD_FAST                1 (l)
                 15 POP_JUMP_IF_FALSE        6
                 18 LOAD_CONST               0 ('/')
                 21 LOAD_CONST               0 ('/')
                 24 LOAD_ATTR                0 (join)
                 27 LOAD_DEREF               0 (path)
                 30 LOAD_ATTR                1 (split)
                 33 LOAD_CONST               0 ('/')
                 36 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 39 LOAD_CONST               1 (1)
                 42 LOAD_FAST                1 (l)
                 45 UNARY_NEGATIVE
                 46 BUILD_SLICE              2
                 49 BINARY_SUBSCR
                 50 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
                 53 BINARY_ADD
                 54 LIST_APPEND              2
                 57 JUMP_ABSOLUTE            6
            >>   60 RETURN_VALUE

    So this is what list comprehensions look like when converted to bytecode. Nice! Now the performance results make more sense. In the project I was working on, my function for getting all parent paths was called in one place and perhaps contributed less than 5% of the execution time of the whole application. It would not make sense to optimize this piece of code. But it was a delightful journey into the internals of CPython, wasn't it?

    Now, let's get back to decl3. What have I done to make my declarative implementation 2x faster on the average case and for the right-side outliers? Well... I just reluctantly resigned from putting everything in one line and saved path.split('/') into a separate variable. That's it.
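    decl3 itself was kept hidden above; based on that description, it could look along these lines (a sketch: decl2 with the repeated split saved into a variable):

    ```python
    def decl3(path):
        # same as decl2, but path.split('/') is computed only once
        parts = path.split('/')
        return ['/' + '/'.join(parts[1:-l])
                for l in range(-len(parts) + 1, 1) if l]

    print(decl3('/a/b/c/d'))  # ['/a/b/c', '/a/b', '/a', '/']
    ```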

    So what are the learnings?
    • the declarative method turned out to be faster than a hand-crafted imperative one employing low-level constructs.
      Why? Good question! Maybe because the bytecode generator knows how to produce optimized code when it encounters a list comprehension? But I have written no CPython code, so it's only my speculation.
    • trying to put everything in one line can hurt - in the described case the repeated split() call was the major performance dragger
    reddit-related updates:
    Dunj3 outpaced me ;) - his implementation, which is better both w.r.t. "declarativeness" and performance:
    list(itertools.accumulate(path.split('/'), curry(os.sep.join)))

      syntax highlighting done with https://tohtml.com/python/