Do you know the tee program? Its man page reads:
tee - read from standard input and write to standard output and files
It makes it easy to split the output of one program to both stdout and files. It’s a nice UNIX tool. Recently, while doing a code review, I realized that an equivalent could be pretty useful in Python programs too:
It lets us do extra work along the way, e.g. calculating a hash at the same time or some other job.
I came up with this idea while reviewing some code. I saw the following function (anonymized).
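A hypothetical sketch of the kind of function in question, assuming it copies a file with copyfileobj and then seeks back to check whether the last byte was a newline. The name append_with_newline and its parameters are illustrative and are not taken from the reviewed code.

```python
import io
import os
from shutil import copyfileobj


def append_with_newline(fsrc, fdst):
    """Copy fsrc to fdst, then make sure the output ends with a newline."""
    copyfileobj(fsrc, fdst)
    if fsrc.tell() == 0:
        return  # empty source (read from the start): nothing to check
    # The "seek hack": step back to the last byte and read it again.
    fsrc.seek(-1, os.SEEK_END)
    if fsrc.read(1) != b"\n":
        fdst.write(b"\n")


src = io.BytesIO(b"no trailing newline")
dst = io.BytesIO()
append_with_newline(src, dst)
```

Note that this only works when the source is seekable, which is one more reason to look for something else.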
I don’t like such code. The seek hack is obscure. What can be done to make it better? What if we simply remembered the last byte copied by copyfileobj?
copyfileobj accepts only two file objects and a buffer size. Recently I was experimenting with indexed_gzip and I had to roll my own copy of copyfileobj that, apart from copying the data, also calculated an MD5 hash and the number of bytes copied.
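Such a hand-rolled copy loop might look roughly like the sketch below. The function name copy_with_md5 and the default buffer size are assumptions made for illustration, and nothing here depends on indexed_gzip itself.

```python
import hashlib
import io


def copy_with_md5(fsrc, fdst, length=64 * 1024):
    """Copy fsrc to fdst; return (MD5 hex digest, number of bytes copied)."""
    md5 = hashlib.md5()
    copied = 0
    while True:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
        md5.update(buf)  # hash the same chunk that was just written
        copied += len(buf)
    return md5.hexdigest(), copied


src = io.BytesIO(b"hello world")
dst = io.BytesIO()
digest, copied = copy_with_md5(src, dst)
```

It works, but every new requirement means another copy of the same loop.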
An alternative is to wrap one of the arguments with something that does whatever we want. Let’s focus on the problem at hand: adding a newline if necessary.
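A minimal sketch of such a wrapper, assuming that copyfileobj only ever calls read() on its source. The class name TeeReader and the tail argument (any object with an extend() method) are hypothetical names chosen for this example.

```python
import io
from shutil import copyfileobj


class TeeReader:
    """Wrap a binary file object and record everything read from it in tail."""

    def __init__(self, fileobj, tail):
        self._fileobj = fileobj
        self._tail = tail  # anything with extend(), e.g. a bytearray

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._tail.extend(data)  # side channel: remember what passed through
        return data


tail = bytearray()
src = io.BytesIO(b"last line without newline")
dst = io.BytesIO()
copyfileobj(TeeReader(src, tail), dst)
# No seeking needed: the last byte copied is already in tail.
if tail and tail[-1:] != b"\n":
    dst.write(b"\n")
```

With a plain bytearray the whole input ends up in memory, so this version is only reasonable for small files.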
If we need to remember the last k characters, we can simply use tail, and it will work as a circular buffer.
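One way to get this behaviour, assuming tail is a collections.deque created with maxlen=k (the text does not say which container is used), is sketched below. Extending a deque with a bytes object stores the individual byte values as integers.

```python
from collections import deque

k = 3
tail = deque(maxlen=k)  # older items fall off the left end automatically

for chunk in (b"abc", b"de", b"f\n"):
    tail.extend(chunk)  # adds each byte of the chunk as an int

last_bytes = bytes(tail)  # the last k bytes seen so far
```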
To make it look like the first listing, we need to add a trivial context manager:
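A sketch of what that could look like, assuming the wrapper is used in a with statement and is called tee. The class body is a hypothetical reconstruction and is repeated in full so that the snippet runs on its own.

```python
import io
from collections import deque
from shutil import copyfileobj


class tee:
    """File-object wrapper that copies everything it reads into tail."""

    def __init__(self, fileobj, tail):
        self._fileobj = fileobj
        self._tail = tail

    def read(self, size=-1):
        data = self._fileobj.read(size)
        self._tail.extend(data)
        return data

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Nothing to clean up; the wrapped file is left open for its owner.
        return False  # do not suppress exceptions


tail = deque(maxlen=1)  # remember only the last byte
src = io.BytesIO(b"abc")
dst = io.BytesIO()
with tee(src, tail) as f:
    copyfileobj(f, dst)
if tail and tail[0] != ord("\n"):
    dst.write(b"\n")
```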
This mechanism can be improved further, e.g. to make it more flexible.
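One possible direction, sketched under the assumption that the extra work can be expressed as callables taking a chunk of bytes: the wrapper accepts any number of sinks and calls each of them on every chunk. The name CallbackTee is made up for this example.

```python
import hashlib
import io
from shutil import copyfileobj


class CallbackTee:
    """Wrap a file object and pass every chunk read to each sink callable."""

    def __init__(self, fileobj, *sinks):
        self._fileobj = fileobj
        self._sinks = sinks

    def read(self, size=-1):
        data = self._fileobj.read(size)
        for sink in self._sinks:
            sink(data)  # also called with b"" at end of file
        return data


md5 = hashlib.md5()
sizes = []
src = io.BytesIO(b"hello world")
dst = io.BytesIO()
copyfileobj(CallbackTee(src, md5.update, lambda chunk: sizes.append(len(chunk))), dst)
total = sum(sizes)
```

This gives the hash and the byte count from the earlier hand-rolled loop without touching copyfileobj at all.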