
As usual, be attentive and suspicious when using such tools, as their suggestions might not be appropriate. For example, let's take the following code:

    with open("foo") as a:
        contents = a.read(100)
Running it through refurb produces the following suggestion:

> main.py:1:1 [FURB101]: Use `y = Path(x).read_text()` instead of `with open(x, ...) as f: y = f.read()`



I'm not sure I understand what value pathlib is providing here. The explanation is to avoid the context manager / multi-line.

Why though? What is saving you one line really getting? It's not really improving readability in any way.

An alternative one liner could be

    contents = open("foo").read(100)
No need to import another library and use its features.

On top of that, pathlib has some speed issues. Pathlib has some really neat advantages and should be favoured over most of the os module's file handling, but not as a replacement for a simple open.


Pathlib opens and closes the file. With your version, unless I'm mistaken, the file does not get closed.


Python is reference counted and it is very clear in this code that a reference to the file isn't leaked, so the file will be reliably closed at the end of the statement.

Some people don't like to rely on the reference counter but I think in simple cases like this where only a temporary is created it is quite safe.
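A quick sketch of that behaviour (CPython only; the throwaway temp file is made up for illustration). The temporary file object is finalized, and therefore closed, as soon as the expression finishes, which we can observe via the ResourceWarning its destructor emits:

```python
import os
import tempfile
import warnings

# Throwaway file purely for demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello")
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ResourceWarning)
    # The temporary file object's refcount drops to zero as soon as the
    # expression finishes, so CPython finalizes (and closes) it right away,
    # emitting an "unclosed file" ResourceWarning in the process.
    contents = open(path).read(100)

assert contents == "hello"
assert any(issubclass(w.category, ResourceWarning) for w in caught)

os.remove(path)
```

On PyPy or other non-refcounted implementations the finalization (and the warning) may happen much later, which is the point of contention below.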


> it is very clear in this code that a reference to the file isn't leaked

Is it? What if an exception is thrown? See e.g. this SO answer[1]:

> However if an exception is thrown [...] then a reference to the file will be kept as part of the stack trace in the traceback object and the file will not be closed, at least until the next exception is thrown.

So your code, if called from somewhere else inside a try block, leaks a filehandle potentially indefinitely.

[1]: https://stackoverflow.com/a/2404671/125921
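Here's a sketch of the failure mode that SO answer describes (the function and temp file are made up, and the timing relies on CPython's reference counting). While the exception is being handled, the traceback keeps the frame, and therefore the open file, alive:

```python
import os
import tempfile
import weakref

fd, path = tempfile.mkstemp()
os.close(fd)

ref = None  # weak reference to the file object, so we can observe its fate

def leaky(path):
    global ref
    f = open(path)
    ref = weakref.ref(f)
    raise RuntimeError("boom")  # f stays referenced by the traceback's frame

try:
    leaky(path)
except RuntimeError:
    # During exception handling, the traceback keeps leaky()'s frame
    # (and the open file in its locals) alive:
    assert ref() is not None and not ref().closed

# Python 3 clears the handled exception when the except block exits, so in
# CPython the frame is freed and the file is finally closed only now.
assert ref() is None

os.remove(path)
```

If the exception is stored somewhere (or the traceback kept), the file handle lives that much longer, which a `with` block would have prevented.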


> the file will be reliably closed at the end of the statement.

The only guarantee CPython makes is that it may eventually close the file once all references are gone. The fact that it does it immediately and reliably in this specific case is an implementation detail and should not be relied upon.

Use context managers when things need to be closed and save yourself trouble down the line.


CPython relies on reference counting.

Other Python implementations, like PyPy, do not.


I looked at the example for some time now, and I don't understand what exactly the problem is. Is it the missing `import pathlib` and then `pathlib.Path...`? Is using `read_text()` bad style?


I'm by no means a Python expert, but as far as I can tell, those two snippets do wildly different things. `.read(100)` would read 100 bytes from the file, `.read_text()` takes the entire file and returns a decoded string. Running the former on a 100GB file will work OK; running the latter on the same file will lead to OOM (unless you actually have 100+ GB of free memory).

Basically, they are not equivalent.
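A small sketch of the difference (the 1000-character file is made up for illustration):

```python
import os
import tempfile
from pathlib import Path

# Hypothetical 1000-character file.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 1000)
os.close(fd)

with open(path) as f:
    partial = f.read(100)      # at most 100 characters

full = Path(path).read_text()  # the entire file, decoded into one string

assert len(partial) == 100
assert len(full) == 1000

os.remove(path)
```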


As I understand it, the snippets in the suggestions are not supposed to be applied verbatim, but are generic proposals that need to be tweaked for the specific example (or ignored completely).

Note that it doesn't say to replace `.read(100)` with `.read_text()`; it uses the default version (without arguments) for both. (That being said, `read_text` doesn't have a limit argument, but that is a completely different issue.)

So you're right - if you're applying the suggestion verbatim without thinking, you will get incorrect results; but the suggestions were not intended for that.


I find that linters that have a non-trivial false-positive rate get very annoying very quick.


The example shown on the Refurb GitHub page is a little unfortunate. As you say, there is also a potential scalability problem with the original design if the input file is too large to hold in memory in its entirety.

In the example, that issue could be avoided with minimal cost by reading the file lazily:

    with open(filename) as f:
        for line in f:
            do_same_things_with(line)
The specific change Refurb suggests instead doesn’t have this benefit.

However, in general switching to a lazy design won’t always be as safe and easy as it is here, so I’m not sure what I’d ideally want a tool to say in this situation. You’d need much more sophisticated analysis of the original code’s design if you wanted to do something like warning about unnecessarily reading an entire file and suggesting the lazy alternative. It doesn’t seem like a good idea to warn about all uses of the functions that will read the whole file at once, because often with Python scripts that’s a simple and effective way to get the required job done, and the added complexity of a more scalable design isn’t needed or justified.

Writing good style checkers is hard. :-)


The open() variant reads the first 100 text characters, not 100 bytes, because open() works in “text mode” by default, so the two are not _that_ different. Your general point still stands, though: the memory issue is the main problem with the “suggestion” here.


Yes, I completely missed the `100` parameter to `read`. And as a sibling comment said, I also understand them to be suggestions, but in this case, I think `read()` and `read(param)` should be treated as two separate cases, or at least this caveat made clear.


Also, I don't know if this was part of your critique, but are you also mentioning that the "a" variable is not in the suggestion? The reason behind that is long names, such as "super_special_file" would greatly increase the length of the error messages, making it hard to see the "meat" of what is being changed. Hence the x and y naming.

Perhaps I should add a flag to allow verbatim (or close to it) errors. Recreating the source lines exactly will be pretty hard, but I think we can do better here.


Perhaps substitute the actual variables if they are simple and short, otherwise do something like:

> main.py:1:1 [FURB101]: Use `y = pathlib.Path(x).read_text()` instead of `with open(x, ...) as f: y = f.read()`, where y=my_important_data and x=super_special_file

Btw: thanks, your tool suggested a handful of things I didn't know about, and pushed me towards using pathlib instead of relying on my muscle memory to do the laborious `os.path.join(..., ...)`.

One request I would have is to spell out the full name of suggested imports. I had never heard of `contextlib.suppress`. And in the above message, I think it will be obvious that you could do `from pathlib import Path` instead of the literal text in the suggestion.
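For anyone curious, a quick sketch of the two features mentioned (the paths are made up):

```python
import os
from contextlib import suppress
from pathlib import Path

# Path's "/" operator replaces os.path.join:
p = Path("data") / "raw" / "input.txt"
assert str(p) == os.path.join("data", "raw", "input.txt")

# contextlib.suppress replaces a try/except/pass block:
with suppress(FileNotFoundError):
    os.remove("no-such-file-hopefully")  # no error even if the file is absent
```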


I think this would be a good change. I agree that the imported names should be spelled out: You shouldn't have to go hunting down the docs before you go and change anything. Again, this would increase horizontal space, but I think it is for the better (the errors are already pretty long anyways).


Yes, Refurb error messages are not gospel: make sure the changes make sense! Thank you for bringing this bug up; I will make sure to address it tomorrow.



