Python : Problem reading filename with brackets/long path name

Question

Python : Problem reading filename with brackets/long path name

I am trying to read excel file with pandas.

df=pd.read_excel('abcd (xyz-9) Interim Report 01-03-18.xlsx')

which gives me file not found error. If I remove brackets and rename file to 'abcd Interim Report 01-03-18.xlsx', then it works fine.

I tried renaming with shutil but it gives me the same error

shutil.copyfile('abcd (xyz-9) Interim Report 01-03-18.xlsx','test.xlsx')

I tried

1. pd.read_excel('abcd ^(xyz-9) Interim Report 01-03-18.xlsx')
2. pd.read_excel('abcd \\(xyz-9\\) Interim Report 01-03-18.xlsx')

EDIT:

The file seem to work on local drive but not on network drive even if I change the cwd to the file location.

On using glob and os.path.exists:

for i in range(0,1):
    for filename in glob.glob(fpath+"\\"+ldir[i]+"\\"+"*Interim*.xlsx"):
        print(filename)
        print(os.path.exists(filename))

\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd (xyz-9) Interim Report 01-03-18.xlsx
False

\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd Interim Report 01-03-18.xlsx
True

On using glob and os.stat:

import ctypes

for i in range(0,1):
    for filename in glob.glob(fpath+"\\"+ldir[i]+"\\"+"*Interim*.xlsx"):
        print(filename)
        try:
            print(os.stat(filename))
        except OSError as e: 
            ntstatus = ctypes.windll.ntdll.RtlGetLastNtStatus()
            print('winerror:', e.winerror) 
            print('ntstatus:', hex(ntstatus & (2**32-1)))

\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd (xyz-9) Interim Report 01-03-18.xlsx
winerror: 3
ntstatus: 0x80000006

\\Africa-me.xxx.com\Africa-me\xxx\xxx\xxx\xxx\06 xxx\02 xxx, xxx and xxxx xxx\03 xxx\04 xxx\05 xx xx & xx\12 2018 xx\06 xx xxx\\\AAA-61\abcd Interim Report 01-03-18.xlsx
os.stat_result(st_mode=33206, st_ino=15624813576354602, st_dev=3657573641, st_nlink=1, st_uid=0, st_gid=0, st_size=726670, st_atime=1563172745, st_mtime=1523347973, st_ctime=1563170560)

python

windows

pandas

spyder

shutil

asked on Stack Overflow Jul 15, 2019 by

Pradeep Tummala • edited Jul 16, 2019 by

Eryk Sun

2 Answers

The os.stat test shows that accessing the path with brackets fails with ERROR_PATH_NOT_FOUND (3), which is either from a missing path component or a path that's too long. We know it's not a problem with finding the final path component, since in this case we expect the error to be ERROR_FILE_NOT_FOUND (2). We know it's not a problem with reserved characters, since in this case we expect the error to be ERROR_INVALID_NAME (123). Also, in both of these cases, the Windows API would have to make an NT system call, but we see that the last NT status is STATUS_NO_MORE_FILES (0x80000006), which is from the os.listdir call in glob.glob. Thus the problem is likely that the path is too long.

For brevity or privacy, the question seems to shorten some path component names to "xxx". Probably if expanded out to the true names, we'd see that the path with "(xyz-9)" is at least 260 characters, which is one more than the maximum allowed length for a DOS path, MAX_PATH - 1 (259) characters.

We can access a long path by converting it to an extended path, which is a Unicode string that begins with the "\\?\" device-path prefix, or "\\?\UNC\" for a UNC path. First, since bytes paths in Windows are limited to MAX_PATH characters, we have to decode a bytes path to Unicode. Next we have to normalize and qualify the path via os.path.abspath because extended paths bypass normalization when accessed. Here's an extpath function to convert a DOS path to an extended path:

import os

try:
    from os import fsdecode
except ImportError: # Probably Python 2.x
    import sys
    def fsdecode(filename):
        if isinstance(filename, type(u'')):
            return filename
        elif isinstance(filename, bytes):
            return filename.decode(sys.getfilesystemencoding(), 'strict')
        raise TypeError('expected string, not {}'.format(
                type(filename).__name__))

def extpath(path):
    path = os.path.abspath(fsdecode(path))
    if not path.startswith(u'\\\\?\\'):
        if path.startswith(u'\\\\.\\'):
            path = u'\\\\?\\' + path[4:]
        elif path.startswith(u'\\\\'):
            path = u'\\\\?\\UNC\\' + path[2:]
        else:
            path = u'\\\\?\\' + path
    return path

Background

At the core of Windows NT (i.e. all Windows versions since XP) is the NTOS operating system, which uses the NT kernel. This is similar to the way 16-bit Windows was layered over DOS in the 1980s and early 1990s. But NTOS is more tightly coupled to Windows than DOS was, and it's a more capable OS (e.g. support for symmetric multiprocessing, preemptive multithreading, virtual memory, asynchronous I/O, secured objects, and multiple users with simultaneous logons and sessions).

In some ways Windows still maintains its MS-DOS roots. In particular, for disk device and filesystem paths, the Windows API uses DOS paths instead of NT object paths. This includes DOS drives "A:" through "Z:" and UNC paths such as "\\server\share\path". The Windows API normalizes DOS paths to replace forward slash with backslash; resolve relative paths (i.e. paths with no root directory or no drive) using the process working directory or per-drive working directory; resolve "." and ".." components; and trim trailing spaces and dots from the final component.

When accessing a DOS path, Windows converts it to a device path that begins with one of the WINAPI device prefixes, "\\.\" and "\\?\" (e.g. "C:\Windows" -> "\\?\C:\Windows"). For drive-letter paths and relative paths (but not UNC paths), it reserves a small set of DOS device names in the final path component (e.g. "C:\Temp\con" -> "\\.\con", and "nul" -> "\\.\nul"). For UNC paths, it uses the "UNC" device mountpoint (e.g. \\server\share\path -> "\\?\UNC\server\share\path"). Before a call passes to the NT domain, the WINAPI device prefix ("\\.\" or "\\?\") gets replaced by the NTAPI device prefix ("\??\").

These device-namespace prefixes are shorthand for the caller's local mountpoint directory in NT's object namespace (i.e. "\Sessions\0\DosDevices\<Caller's Logon Session ID>"). The local mountpoint directory implicitly shadows the global mountpoint directory (i.e. "\Global??"). For example, "C:\Temp" becomes "\??\C:\Temp", which evaluates as "\Global??\C:\Temp" if global "C:" isn't locally shadowed. The global mountpoint can be referenced explicitly via the "Global" object link (e.g. "\\?\Global\C:\Temp"), which should always be available.

Classically, DOS path normalization uses string buffers with space for no more than MAX_PATH (260) characters. In Windows 10, this legacy limit is lifted (in most cases) if long paths are enabled for the system and the application manifest declares that it's long-path aware. The "python[w].exe" executable in Python 3.6+ has this manifest setting.

If long DOS paths aren't enabled, most file API functions still support long paths. We just have to use a WINAPI device path that begins with the "\\?\" prefix, which is called an extended path. This is similar to a regular device path that begins with the "\\.\" prefix, except an extended path does not get normalized when accessed. The downside is that we have to implement our own path normalization. In Python this is implemented by os.path.abspath. We still have to manually rewrite UNC paths, but that's a simple matter of replacing a leading "\\" with "\\?\UNC\".

Note that the working directory does not support device paths, extended or not. The system has undefined behavior if we set a device path as the working directory. So don't use them with Python's os.chdir or the cwd parameter of subprocess.Popen. This means we can't get around the limit on using long relative paths by setting an extended path as the working directory. In Windows 10, if long DOS paths are enabled for the process, the working directory does support long paths, but it still only supports regular DOS paths (UNC and drive-letter paths), not device paths.

answered on Stack Overflow Jul 16, 2019 by

Eryk Sun • edited Jul 16, 2019 by

Eryk Sun

I always try to use python's pathlib module whenever I deal with files. There seems to be an error with the initial a in the filename

import pandas as pd                                                        
import pathlib                                                             
path_to_file = pathlib.Path("g:\Python\abcd (xyz-9) Interim Report 01-03-18.xlsx")
df = pd.read_exces(path_to_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'pandas' has no attribute 'read_exces'
df = pd.read_excel(path_to_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
    return func(*args, **kwargs)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
    io = ExcelFile(io, engine=engine)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
    self._reader = self._engines[engine](self._io)
  File "C:\Program Files\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
    self.book = xlrd.open_workbook(filepath_or_buffer)
  File "C:\Program Files\Python37\lib\site-packages\xlrd\__init__.py", line 111, in open_workbook
    with open(filename, "rb") as f:
OSError: [Errno 22] Invalid argument: 'g:\\Python\x07bcd (xyz-9) Interim Report 01-03-18.xlsx'

It seems that because of the preceding backslash, the initial a was at some point interpreted as the control symbol \x07 U+0007 : ALERT [BEL].

that's why it is necessary to use the raw string when defining the path, as Dawid suggested

path_to_file = pathlib.Path(r"g:\Python\abcd (xyz-9) Interim Report 01-03-18.xlsx")

answered on Stack Overflow Jul 15, 2019 by

Zababa • edited Jul 15, 2019 by

Zababa

User contributions licensed under CC BY-SA 3.0