Issue while trying to read a text file in databricks using Local File API's rather than Spark API

0

I'm trying to read a small txt file which is added as a table to the default db on Databricks. While trying to read the file via Local File API, I get a FileNotFoundError, but I'm able to read the same file as Spark RDD using SparkContext.

Please find the code below:

with open("/FileStore/tables/boringwords.txt", "r") as f_read:
  for line in f_read:
    print(line)

This gives me the error:

FileNotFoundError                         Traceback (most recent call last)
<command-2618449717515592> in <module>
----> 1 with open("dbfs:/FileStore/tables/boringwords.txt", "r") as f_read:
      2   for line in f_read:
      3     print(line)

FileNotFoundError: [Errno 2] No such file or directory: 'dbfs:/FileStore/tables/boringwords.txt'

Where as, I have no problem reading the file using SparkContext:

boring_words = sc.textFile("/FileStore/tables/boringwords.txt")
set(i.strip() for i in boring_words.collect())

And as expected, I get the result for the above block of code:

Out[4]: {'mad',
 'mobile',
 'filename',
 'circle',
 'cookies',
 'immigration',
 'anticipated',
 'editorials',
 'review'}

I was also referring to the DBFS documentation here to understand the Local File API's limitations but of no lead on the issue. Any help would be greatly appreciated. Thanks!

apache-spark databricks pyspark sparkapi
2021-11-24 06:16:55
3
0

The problem is that you're using the open function that works only with local files, and doesn't know anything about DBFS, or other file systems. To get this working, you need to use DBFS local file API and append the /dbfs prefix to file path: /dbfs/FileStore/....:

with open("/dbfs/FileStore/tables/boringwords.txt", "r") as f_read:
  for line in f_read:
    print(line)
2021-11-24 07:56:14
0

Alternatively you can simply use the built-in csv method:

df = spark.read.csv("dbfs:/FileStore/tables/boringwords.txt")
2021-11-24 08:51:27
0

Alternatively we can use dbutils

files = dbutils.fs.ls('/FileStore/tables/')
li = []
for fi in files: 
  print(fi.path)

Example ,

enter image description here

2021-11-24 18:26:17

In other languages

This page is in other languages

Русский
..................................................................................................................
Italiano
..................................................................................................................
Polski
..................................................................................................................
Română
..................................................................................................................
한국어
..................................................................................................................
हिन्दी
..................................................................................................................
Français
..................................................................................................................
Türk
..................................................................................................................
Česk
..................................................................................................................
Português
..................................................................................................................
ไทย
..................................................................................................................
中文
..................................................................................................................
Español
..................................................................................................................
Slovenský
..................................................................................................................