tee equivalent as a Python class
Do you know the tee program? Its man page reads:
tee - read from standard input and write to standard output and files
It makes it easy to split the output of one program into both stdout and files. It’s a nice UNIX tool. Recently I was doing a code review and it turned out that an equivalent of it can be pretty useful in Python programs too. A tee-like wrapper can do extra work while the data passes through, so we can employ it for, e.g., simultaneous hash calculation or some other job.
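To make the idea concrete, here is a minimal sketch of such a wrapper. The class name Tee and its sinks argument are assumptions made for this illustration; each sink is simply a callable that receives every chunk written, such as the update method of a hashlib object.

```python
import hashlib
import io


class Tee:
    """Wrap a writable file object and pass every chunk to extra sinks."""

    def __init__(self, fileobj, *sinks):
        self.fileobj = fileobj
        # Each sink is a callable taking the chunk, e.g. a hash's update().
        self.sinks = sinks

    def write(self, data):
        for sink in self.sinks:
            sink(data)
        return self.fileobj.write(data)


# Usage: write to a buffer and hash the same bytes in a single pass.
digest = hashlib.sha256()
out = io.BytesIO()
tee = Tee(out, digest.update)
tee.write(b"hello ")
tee.write(b"world")
```

Because only write is implemented, the wrapper works anywhere the consumer calls nothing but write on its destination.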
I came up with this idea while reviewing some code. I saw the following function (anonymized).
I don’t like such code. The seek hack is obscure. What can be done to make it better? What if we simply remembered the last byte copied by shutil.copyfileobj?
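For illustration, a seek-based check of this kind could look roughly like the sketch below. It is a hypothetical reconstruction, not the reviewed code: the function name copy_with_trailing_newline is made up, and it assumes a seekable source opened in binary mode.

```python
import io
import shutil


def copy_with_trailing_newline(src, dst):
    """Copy src to dst, then append a newline if src did not end with one."""
    shutil.copyfileobj(src, dst)
    # The "seek hack": step back to the last byte of the source to inspect it.
    if src.tell() > 0:
        src.seek(-1, io.SEEK_END)
        if src.read(1) != b"\n":
            dst.write(b"\n")


src = io.BytesIO(b"no newline at end")
dst = io.BytesIO()
copy_with_trailing_newline(src, dst)
```

The check only works on seekable sources and re-reads data that has just been copied, which is what makes it feel like a hack.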
Unfortunately, copyfileobj accepts only two file objects and a buffer size. Recently I was experimenting with indexed_gzip and had to roll my own copy of copyfileobj that, apart from copying the data, also calculated an md5 hash and the number of bytes copied.
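A hand-rolled copy loop of that sort might look like the sketch below. The function name copy_with_md5 and the default chunk size are assumptions for this example; it uses plain in-memory streams rather than indexed_gzip.

```python
import hashlib
import io


def copy_with_md5(src, dst, length=64 * 1024):
    """Copy src to dst in chunks; return (md5 hex digest, bytes copied)."""
    md5 = hashlib.md5()
    total = 0
    while True:
        chunk = src.read(length)
        if not chunk:
            break
        dst.write(chunk)
        md5.update(chunk)
        total += len(chunk)
    return md5.hexdigest(), total


payload = b"x" * 100_000
target = io.BytesIO()
checksum, copied = copy_with_md5(io.BytesIO(payload), target)
```

It works, but every new requirement means yet another variant of the same loop.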
An alternative is to wrap one of the arguments with something that will do whatever we want. Let’s focus on the problem at hand: adding a newline if necessary.
If we need to remember the last k characters, we can simply use a collections.deque as tail; with maxlen set to k it works as a circular buffer.
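The wrapper with a deque tail could be sketched as follows. The class name TailWriter is an assumption, and the example sticks to binary data, so the tail holds integer byte values. Skipping the newline for an empty source is a design choice of this sketch.

```python
import io
import shutil
from collections import deque


class TailWriter:
    """Forward writes to fileobj and remember the last k bytes written."""

    def __init__(self, fileobj, k=1):
        self.fileobj = fileobj
        # A deque with maxlen drops the oldest items once it is full.
        self.tail = deque(maxlen=k)

    def write(self, data):
        # Only the last k bytes of a chunk can end up in the tail.
        self.tail.extend(data[-self.tail.maxlen:])
        return self.fileobj.write(data)


src = io.BytesIO(b"no newline at end")
dst = io.BytesIO()
writer = TailWriter(dst)
shutil.copyfileobj(src, writer)
# Add the newline only if something was written and it did not end with one.
if writer.tail and bytes(writer.tail) != b"\n":
    dst.write(b"\n")
```

No seeking is needed, so the source can be a pipe or any other non-seekable stream.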
To make it look like the first listing, we need to add a trivial context manager:
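One possible shape for it is sketched below. The class name EnsureTrailingNewline is hypothetical, and writing the missing newline in __exit__ is an assumption of this sketch about where the fix-up belongs.

```python
import io
import shutil
from collections import deque


class EnsureTrailingNewline:
    """Writable wrapper that appends a newline on exit if one is missing."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.tail = deque(maxlen=1)  # remembers only the last byte written

    def write(self, data):
        self.tail.extend(data[-1:])
        return self.fileobj.write(data)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Fix up only on a clean exit and only if any data was written.
        if exc_type is None and self.tail and bytes(self.tail) != b"\n":
            self.fileobj.write(b"\n")
        return False  # never swallow exceptions


src = io.BytesIO(b"first line\nsecond line")
dst = io.BytesIO()
with EnsureTrailingNewline(dst) as writer:
    shutil.copyfileobj(src, writer)
```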
And voilà!
This mechanism can be improved further, for example to make it more flexible.