Therefore, instead of using (under "Step 2: Use a notebook to list and read shared tables" in the above URL):

client = delta_sharing.SharingClient(f"/dbfs/<dbfs-path>/config.share")


I am using:

credentials = dbutils.secrets.get(scope='redacted', key='redacted')

profile = delta_sharing.protocol.DeltaSharingProfile.from_json(credentials)

client = delta_sharing.SharingClient(profile=profile)


The above works fine. I can list the tables. Now I would like to load a table using Spark. The documentation suggests using

delta_sharing.load_as_spark(f"<profile-path>#<share-name>.<schema-name>.<table-name>", version=<version-as-of>)

But that relies on having stored the contents of the credential file in a folder in DBFS and using that path for <profile-path>. Is there an alternative way to do this with the "profile" variable I am using? By the way, the code is bold instead of formatted in code blocks because I kept getting errors that prevented me from posting.
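One possible workaround, sketched under assumptions: keep the profile JSON in the secret scope, write it to a short-lived local file at runtime, and build the "<profile-path>#<share-name>.<schema-name>.<table-name>" URL from that file's path. It is an assumption, not verified here, that the Spark connector behind load_as_spark can read a driver-local path such as /tmp. If it cannot, the file would have to go somewhere Spark can read. The helper name write_profile_and_build_url is hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def write_profile_and_build_url(
    profile_json: str,
    share: str,
    schema: str,
    table: str,
    directory: Optional[str] = None,
) -> str:
    """Write a Delta Sharing profile to a private temp file and return a table URL.

    Hypothetical helper: mkstemp creates the file readable only by the current
    user; delete the file once the table has been loaded.
    """
    json.loads(profile_json)  # fail early if the secret is not valid JSON
    fd, path = tempfile.mkstemp(suffix=".share", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(profile_json)
    return f"{path}#{share}.{schema}.{table}"


# Usage in a notebook (assumes dbutils and the delta_sharing package are available):
# import delta_sharing
# credentials = dbutils.secrets.get(scope="redacted", key="redacted")
# url = write_profile_and_build_url(credentials, "<share-name>", "<schema-name>", "<table-name>")
# df = delta_sharing.load_as_spark(url)
```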

Apache Spark proficiency https://community.m.eheci.com/t5/get-started-discussions/apache-spark-proficient/m-p/39550

What is the best way to become proficient in Apache Spark?

I want to connect my workspace to a PostgreSQL database, and I was following this article. The instructions under "Create a connection" fail because my connection requires a database name, but database name is not a supported option. What options do I have for connecting to my database?
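If the article is the Lakehouse Federation one, my understanding (an assumption worth checking against the current docs) is that the database name does not go on the connection at all. The connection holds host, port and credentials, and the database is supplied when a foreign catalog is created from that connection. The helper below is hypothetical and only builds the two SQL statements. To stay simple, it refuses values that would need escaping.

```python
from typing import Tuple


def _literal(value: str) -> str:
    # Keep the sketch simple: refuse characters that would need SQL escaping.
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in option value: {value!r}")
    return f"'{value}'"


def postgres_federation_sql(
    connection: str,
    catalog: str,
    host: str,
    port: int,
    user: str,
    password: str,
    database: str,
) -> Tuple[str, str]:
    """Return (CREATE CONNECTION, CREATE FOREIGN CATALOG) statements for PostgreSQL."""
    create_connection = (
        f"CREATE CONNECTION {connection} TYPE postgresql OPTIONS ("
        f"host {_literal(host)}, port {_literal(str(port))}, "
        f"user {_literal(user)}, password {_literal(password)})"
    )
    # The database name is given on the foreign catalog, not on the connection.
    create_catalog = (
        f"CREATE FOREIGN CATALOG {catalog} USING CONNECTION {connection} "
        f"OPTIONS (database {_literal(database)})"
    )
    return create_connection, create_catalog


# Usage in a notebook (names are hypothetical; read the password from a secret scope):
# for stmt in postgres_federation_sql("pg_conn", "pg_cat", "<host>", 5432, "<user>",
#                                     dbutils.secrets.get("<scope>", "<key>"), "<database>"):
#     spark.sql(stmt)
```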


Case 2: However, when a Single User Access mode cluster is activated (in the screenshot, labeled as dataengineer1@d...), dataengineer1 can view all schemas and tables. This is not the desired behavior.



I'm hoping to find a solution that ensures that, even in Single User access mode, users can only access the schemas and tables they have permission for.

Any insights or suggestions would be greatly appreciated. I value the expertise of this community and look forward to your responses.

Thank you


I used "AWSQuickstartCloudformationLambda" to create a workspace on the AWS side in my trial environment. The stack was created successfully, but on the Databricks side the compute cluster does not start. The error says:

Aws authorization failed: Failed to communicate with Aws, code: UnauthorizedOperation message: You are not authorized to perform this operation. Encoded authorization failure message: Unable to locate credentials. You can configure credentials by running "aws configure"

Is there a problem with the CloudFormation template? I would assume the integration works if the CloudFormation stack succeeds. Any help would be appreciated.

Workspace region https://community.m.eheci.com/t5/get-started-discussions/workspace-region/m-p/39344

ERROR: Your workspace region is not yet supported for model serving, please see https://docs.m.eheci.com/machine-learning/model-serving/index.html#region-availability for a list of supported regions.

The account is in ap-south-1, and as far as I can see there is no cross (X) for that region. Does an X mean available or not available?

Also, can an account and a workspace be in different regions? If so, how can I check and change that?

Hello,

I am new to Databricks and just started using it for a work project. I have been creating test notebooks for practice, but when I try to rename one, either by clicking its title or by choosing Edit from the File menu, I get "Element rename failed: Method not allowed".

Also when I try to move a notebook to another folder, it shows a similar message: "Method not allowed".

Does anybody know what's going on? I really need this for the project. Thanks!

After reading this post, I used the following init script to install GDAL on Databricks Runtime 12.2 LTS:



dbutils.fs.put("/databricks/scripts/gdal_install.sh", """#!/bin/bash
sudo add-apt-repository -y ppa:ubuntugis/ppa
sudo apt-get update
sudo apt-get install -y cmake gdal-bin libgdal-dev python3-gdal
""", True)




The init script ran and the cluster started properly, but when I run "import gdal" in a notebook, I get the following error:

ModuleNotFoundError: No module named 'gdal'

I also tried installing GDAL on the cluster from a Maven repository, but that did not work either.

What can I do to get GDAL installed properly?

Thank you.
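Two things seem worth checking. Neither has been verified on 12.2 LTS. First, GDAL's Python bindings are normally imported as osgeo.gdal rather than a top-level gdal module. Second, the apt package python3-gdal installs into the system Python. That may not be the environment notebooks run in, so installing the pip "GDAL" package at a version matching libgdal-dev is a common alternative. The helper import_first is hypothetical and simply reports which import path resolves in the current interpreter.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first module in candidates that imports successfully, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Modern GDAL bindings live under osgeo; a bare "gdal" module only exists in legacy installs.
gdal = import_first(["osgeo.gdal", "gdal"])
if gdal is None:
    print("GDAL bindings are not importable from this notebook's Python environment")
else:
    print("Using", gdal.__name__)
```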


We have an ADF (Azure Data Factory) pipeline that runs a set of activities before an Azure Databricks notebook is called. When the notebook is called, the pipeline launches a new job cluster for each job, using a Standard_F4 node with a single worker. Launching the cluster alone takes about 7 minutes, which adds to the overall ADF pipeline run time.

How can we reduce the cluster launch time?

Note: Our ADF pipeline has an event-based trigger that runs whenever a new file arrives in ADLS, so we cannot keep a cluster created and running all the time because of the cost.
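One commonly suggested option is an instance pool. A pool keeps a few VMs warm so that job clusters drawn from it skip most of the VM provisioning time. The trade-off is that idle pool VMs still incur cloud VM cost. The sketch below builds a request body for the Instance Pools API (POST /api/2.0/instance-pools/create) and sends it with the standard library. The host, token and pool name are placeholders. It is an assumption, to be checked, that your ADF Azure Databricks linked service is configured to use an existing instance pool.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def instance_pool_payload(
    name: str,
    node_type_id: str = "Standard_F4",
    min_idle_instances: int = 1,
    idle_autotermination_minutes: int = 15,
    spark_version: Optional[str] = None,
) -> Dict[str, Any]:
    """Request body for a warm instance pool (values are illustrative)."""
    payload: Dict[str, Any] = {
        "instance_pool_name": name,
        "node_type_id": node_type_id,
        # VMs kept ready at all times; these are billed by the cloud provider while idle.
        "min_idle_instances": min_idle_instances,
        # Idle VMs beyond min_idle_instances are released after this many minutes.
        "idle_instance_autotermination_minutes": idle_autotermination_minutes,
    }
    if spark_version:
        # Preloading the runtime image on pool VMs can shorten start-up further.
        payload["preloaded_spark_versions"] = [spark_version]
    return payload


def create_instance_pool(host: str, token: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/instance-pools/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Setting min_idle_instances to 0 avoids paying for permanently idle VMs. VMs released by a finished job still stay warm for the autotermination window, which may be enough if files arrive in bursts.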

@Sujitha Hi Sujitha, could you please let us know when the Databricks rewards portal will be available, and confirm that the points already credited there will remain the same? Please update us on these two points.

Databricks Community reward store is not working https://community.m.eheci.com/t5/get-started-discussions/databrickscommunity-reward-store-is-not-working/m-p/38993

Hi Guys,


Does anybody know when the Databricks community reward store portal will open?

I see it's still under construction


@Kaniz @Sujitha 



Hello,

When I make a GET request to "/api/2.1/jobs/runs/list" to get the list of job runs, the response has no "prev_page_token" or "next_page_token" fields, even though it contains "has_more: true".

Screenshot 2023-08-03 at 09.23.13.png
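Until the tokens show up, one workaround is to fall back to the offset and limit parameters, which runs/list has accepted alongside has_more. Why the tokens are missing (for example, a rollout or API-version difference) is an assumption, not something I have confirmed. The sketch below follows next_page_token when it is present and otherwise advances an offset. The fetch function is injected so the paging logic can be tested without a workspace. The host and token in make_fetch are placeholders, and limit=20 is chosen to stay inside the endpoint's cap.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator

Fetch = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_job_runs(fetch: Fetch, limit: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every run, following next_page_token when present, else offset."""
    params: Dict[str, Any] = {"limit": limit}
    offset = 0
    while True:
        page = fetch(params)
        runs = page.get("runs", [])
        yield from runs
        if not page.get("has_more"):
            return
        token = page.get("next_page_token")
        if token:
            params = {"limit": limit, "page_token": token}
        elif runs:
            # No token in the response: fall back to offset-based paging.
            offset += len(runs)
            params = {"limit": limit, "offset": offset}
        else:
            return  # has_more but nothing to advance on; stop instead of looping forever


def make_fetch(host: str, token: str) -> Fetch:
    def fetch(params: Dict[str, Any]) -> Dict[str, Any]:
        url = f"{host.rstrip('/')}/api/2.1/jobs/runs/list?{urllib.parse.urlencode(params)}"
        req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    return fetch


# runs = list(iter_job_runs(make_fetch("https://<workspace-host>", "<token>")))
```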

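DLT pipeline unable to find custom libraries/wheel packages https://community.m.eheci.com/t5/get-started-discussions/dlt-pipeline-unable-to-find-custom-libraries-wheel-packages/m-p/38987

We have a DLT pipeline, and we need to import custom libraries packaged in wheel files.

We are on Azure DBX, and we use Azure DevOps CI/CD to build and deploy the wheel packages to our DBX environment.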


At the top of our DLT notebook, we install the wheel package as follows:

%pip install /dbfs/Libraries/whls/{wheel_file_name}.whl

When the pipeline runs, we get the following error:

CalledProcessError: Command 'pip --disable-pip-version-check install /dbfs/Libraries/whls/{wheel_file_name}.whl' returned non-zero exit status 1.,None,Map(),Map(),List(),List(),Map())

And from the logs you can see that the file is not accessible:

Python interpreter will be restarted.
WARNING: Requirement '/dbfs/Libraries/whls/{wheel_file_name}.whl' looks like a filename, but the file does not exist
Processing /dbfs/Libraries/whls/{wheel_file_name}.whl
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/dbfs/Libraries/whls/{wheel_file_name}.whl'

This happens even though the file exists, as we can confirm in the DBFS Explorer UI.

We listed the folders and files accessible from the DLT pipeline node and got the following:

Files in the ROOT Directory: ['mnt', 'tmp', 'local_disk0', 'dbfs', 'Volumes', 'Workspace', . . . . .]
Files in the ROOT/dbfs Directory: []

As you can see, /dbfs looks empty: it contains none of the folders or files that we can see and access in the DBFS Explorer UI.


Volumes and Workspace files are accessible from the pipeline, but:

- Uploading to Volumes fails with "Error uploading" and no further details, even when uploading manually from the UI.

- Workspace/shared...: the files are accessible, but pushing wheel files there automatically from our CI/CD pipelines doesn't work, so we have to upload them manually.


Any ideas on how we can overcome this, so that we can upload the wheel files to the DBX environment via Azure DevOps and import them in our DLT pipelines?
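One route that might fit a CI/CD step is the Workspace API import endpoint (POST /api/2.0/workspace/import), which takes base64-encoded content. My assumption, not verified for DLT, is that format "AUTO" stores a non-notebook upload such as a .whl as a workspace file. The pipeline could then install it from the /Workspace/... path. The endpoint has a request size limit, so check it for large wheels. The folder /Shared/libs and the helper names are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def workspace_import_payload(wheel_bytes: bytes, workspace_path: str, overwrite: bool = True) -> Dict[str, Any]:
    """Request body for POST /api/2.0/workspace/import."""
    return {
        "path": workspace_path,  # API path such as "/Shared/libs/<wheel>.whl" (no /Workspace prefix)
        # AUTO: the service picks the object type; a non-notebook file is expected to become a workspace file.
        "format": "AUTO",
        "content": base64.b64encode(wheel_bytes).decode("ascii"),
        "overwrite": overwrite,
    }


def import_to_workspace(host: str, token: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    req = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body else {}


# In the Azure DevOps step (placeholders):
# with open("dist/<wheel>.whl", "rb") as f:
#     import_to_workspace("https://<workspace-host>", "<token>",
#                         workspace_import_payload(f.read(), "/Shared/libs/<wheel>.whl"))
# Then, at the top of the DLT notebook:
# %pip install /Workspace/Shared/libs/<wheel>.whl
```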


Thursday, August 3, 2023, 07:13:11 GMT
I have a scheduled task to run a workflow.

Task 1 computes some parameters and then these are to be consumed by a reporting task: Task 2.

I want Task 2 to report "failure" if Task 1 fails. However creating a workflow dependency means that Task 2 will not run if Task 1 fails.

Any suggestions for handling the parameter sharing and the dependency between Task 1 and Task 2, while still allowing Task 2 to run even when Task 1 fails?

Edit: Now attaching screenshot showing Task 2 being skipped on Task 1's failure.
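A possible approach, assuming your workspace supports the per-task "run_if" condition: keep the dependency, but set Task 2's run_if to ALL_DONE so that it runs once Task 1 has finished, whether it succeeded or failed. Parameters can then travel through task values, with a fallback when Task 1 never set them. The sketch below only builds Jobs API 2.1 task definitions. The task keys and notebook paths are hypothetical, and compute settings (existing_cluster_id, job_cluster_key or new_cluster) are omitted.

```python
from typing import Any, Dict, List


def reporting_job_tasks(task1_notebook: str, task2_notebook: str) -> List[Dict[str, Any]]:
    """Task definitions where Task 2 runs after Task 1 regardless of its outcome."""
    return [
        {"task_key": "task_1", "notebook_task": {"notebook_path": task1_notebook}},
        {
            "task_key": "task_2",
            "depends_on": [{"task_key": "task_1"}],
            # ALL_DONE: run once every dependency has finished, succeeded or failed.
            "run_if": "ALL_DONE",
            "notebook_task": {"notebook_path": task2_notebook},
        },
    ]


# Inside the Task 1 notebook (assumes dbutils; the value is hypothetical):
# dbutils.jobs.taskValues.set(key="params", value={"threshold": 0.5})
#
# Inside the Task 2 notebook: the default is returned when Task 1 failed before setting the key.
# params = dbutils.jobs.taskValues.get(taskKey="task_1", key="params", default={}, debugValue={})
# if not params:
#     ...  # report "failure"
```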

We are trying to get the cluster life_cycle_state using the API, and we are able to get the following values:

Are there any other values apart from the ones above? Any information would be a great help.
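For the Clusters API (for example GET /api/2.0/clusters/get), the field I know of is "state", not life_cycle_state. life_cycle_state is the name used on job runs, which have a different set of values. To my knowledge, the cluster states documented at the time of writing are the eight listed below. Treat the list as an assumption that may grow over time. The sketch therefore maps any unrecognised value to UNKNOWN instead of failing.

```python
from enum import Enum


class ClusterState(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    RESTARTING = "RESTARTING"
    RESIZING = "RESIZING"
    TERMINATING = "TERMINATING"
    TERMINATED = "TERMINATED"
    ERROR = "ERROR"
    UNKNOWN = "UNKNOWN"


def parse_cluster_state(raw: str) -> ClusterState:
    try:
        return ClusterState(raw)
    except ValueError:
        # A value added after this list was written: degrade to UNKNOWN rather than crash.
        return ClusterState.UNKNOWN
```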

Hi Team,

Can you help us understand:

1) How Liquid Clustering performs compared with Z-ordering and partitioning (performance benchmarks)?

2) How much it saves in cost compared with Z-ordering and partitioning?

Regards,
Phanindra
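Published numbers depend heavily on data and query shape, so a small benchmark on your own tables may be more telling. Below is a minimal timing harness. It warms up each query, then reports the median wall-clock time per layout. The table names and DDL in the comments are hypothetical. Each callable must force execution (for example with .collect()), and caching between runs can skew results.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(queries: Dict[str, Callable[[], object]], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Return the median wall-clock seconds for each labelled callable."""
    results: Dict[str, float] = {}
    for label, run in queries.items():
        for _ in range(warmup):
            run()  # warm-up runs are not timed
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        results[label] = statistics.median(timings)
    return results


# Hypothetical layouts of the same data, created beforehand with Spark SQL:
#   CREATE TABLE events_lc CLUSTER BY (event_date, customer_id) AS SELECT * FROM events
#   CREATE TABLE events_part PARTITIONED BY (event_date) AS SELECT * FROM events
#   CREATE TABLE events_z AS SELECT * FROM events; OPTIMIZE events_z ZORDER BY (event_date, customer_id)
# q = "SELECT count(*) FROM {} WHERE customer_id = 42"
# print(benchmark({t: (lambda t=t: spark.sql(q.format(t)).collect())
#                  for t in ["events_lc", "events_part", "events_z"]}))
```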

