missing-QuestionPost topics

Web scraping in Databricks

AleksandraFrolo — Wed, 16 Aug 2023 07:54:38 GMT

Hello, what is the easiest way to do web scraping in Databricks? Let's imagine that from this link, http://automated.pythonanywhere.com , I need to grab the element "/html/body/div[1]/div/h1[1]" and return its text. How can I do it? Can somebody write simple code with an explanation? Thank you!
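
One possible approach, sketched with only the Python standard library so it runs in a notebook cell without extra installs. The names `PathTextExtractor`, `text_at_path` and `scrape` are hypothetical helpers, not part of any Databricks API. The sketch assumes reasonably well-formed HTML and supports only simple absolute paths like the one in the question; it is not a full XPath engine. If the page fills the element in with JavaScript, a plain HTTP fetch will not see that text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathTextExtractor(HTMLParser):
    """Collect the text of the first element matching a simple absolute path."""

    def __init__(self, path):
        super().__init__()
        # "/html/body/div[1]/div/h1[1]" -> [("html", None), ..., ("h1", 1)]
        self.target = []
        for step in path.strip("/").split("/"):
            name, _, index = step.partition("[")
            self.target.append((name.lower(), int(index.rstrip("]")) if index else None))
        self.stack = []      # (tag, position among same-named siblings)
        self.counts = [{}]   # one sibling counter per open nesting level
        self.parts = []      # text fragments found inside the target element
        self.done = False

    def _inside_target(self):
        if self.done or len(self.stack) < len(self.target):
            return False
        return all(tag == name and (index is None or index == position)
                   for (tag, position), (name, index) in zip(self.stack, self.target))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        self.stack.append((tag, siblings[tag]))
        self.counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1][0] != tag:
            return
        if len(self.stack) == len(self.target) and self._inside_target():
            self.done = True  # the target element itself just closed
        self.stack.pop()
        self.counts.pop()

    def handle_data(self, data):
        # Text of the target and of any element nested inside it.
        if self._inside_target():
            self.parts.append(data)


def text_at_path(html, path):
    """Return the text at `path` in an HTML string, or "" if nothing matches."""
    parser = PathTextExtractor(path)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def scrape(url, path):
    """Download a page and return the text at `path` (needs network access)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return text_at_path(html, path)
```

Calling `scrape("http://automated.pythonanywhere.com", "/html/body/div[1]/div/h1[1]")` would then return the heading text, provided the cluster can reach the site. For messy real-world HTML, a dedicated parser such as lxml or BeautifulSoup is usually more forgiving than this sketch.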

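I cannot view the Lakehouse Fundamentals training videos; I only see the menu, and clicking does not open anything

pg1 — Tue, 06 Jun 2023 19:45:49 GMT

I am not able to view the Lakehouse Fundamentals training videos. I only see the menu, and clicking does not open anything.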

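Granting permissions on multiple tables/views in a schema at once

Randomname — Thu, 08 Jun 2023 15:39:04 GMT

Is there a way to grant permissions on multiple tables/views using a wildcard? For example, something like GRANT SELECT ON *_view TO user?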
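
As far as I know, GRANT does not accept wildcards in object names, so a common workaround is to list the objects and issue one GRANT per match. A minimal sketch of that idea follows; `build_grants` is a hypothetical helper, and the schema, view and principal names are made up for illustration. It only builds the statements. In a notebook you would presumably collect the names from something like `SHOW VIEWS IN my_schema` and pass each statement to `spark.sql`.

```python
from fnmatch import fnmatchcase


def build_grants(object_names, pattern, principal, schema,
                 privilege="SELECT", object_type="TABLE"):
    """Return one GRANT statement per object whose name matches `pattern`.

    `pattern` uses shell-style wildcards, e.g. "*_view".
    """
    statements = []
    for name in sorted(object_names):
        if fnmatchcase(name, pattern):
            statements.append(
                f"GRANT {privilege} ON {object_type} {schema}.{name} TO `{principal}`"
            )
    return statements


# Hypothetical object names; in practice these come from the catalog.
names = ["sales_view", "orders_view", "orders", "view_audit"]
for statement in build_grants(names, "*_view", "user@example.com",
                              "my_schema", object_type="VIEW"):
    print(statement)
```

If every object in the schema should be readable, granting the privilege on the schema itself may be simpler than matching names.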

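How to use Databricks and Ibis with Spark

ccbeloy — Thu, 08 Jun 2023 19:00:21 GMT

Ibis now includes a PySpark backend. However, using databricks-connect does not seem to work with Ibis. This is the sample code that throws an error:

    from databricks.connect import DatabricksSession
    from databricks.sdk.core import Config

    config = Config(profile="dev")
    spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
    df = spark.read.parquet("/somefile.parquet")
    df.createOrReplaceTempView("sometable")

    import ibis
    from ibis import _
    ibis_con = ibis.pyspark.connect(spark)

The above throws an error:

Python39\site-packages\pyspark\sql\connect\session.py", line 532, in sparkContext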

raise NotImplementedError("sparkContext() is not implemented.")

NotImplementedError: sparkContext() is not implemented.

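Feature differences between managed and external tables in UC

carlosjrestr — Fri, 09 Jun 2023 01:04:20 GMT

Hi community, is there a summary or detailed guide on the feature differences between managed and external tables in Unity Catalog? Looking at the Databricks documentation, I cannot find any specific feature that is supported for managed tables but not for external tables. Assuming I have external tables in Delta format, which features would I be unable to use if I do not convert them to managed tables? Thanks, CR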

I cannot access a Delta table in Databricks. Error: org.apache.spark.sql.AnalysisException:

Karthe — Fri, 09 Jun 2023 06:58:10 GMT

Hello, I am not able to access a Delta table from the database. When I try to read the table with the spark.read.table command, I get the following error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

Here is the content from error log:

Fri Jun 9 05:57:38 2023 Connection to spark from PID 1377

Fri Jun 9 05:57:38 2023 Initialized gateway on port 44015

Fri Jun 9 05:57:38 2023 Connected to spark.

Tried to attach usage logger `pyspark.databricks.pandas.usage_logger`, but an exception was raised: is not a callable object

Fri Jun 9 06:04:15 2023 Connection to spark from PID 1629

Fri Jun 9 06:04:15 2023 Initialized gateway on port 37083

Fri Jun 9 06:04:16 2023 Connected to spark.

Tried to attach usage logger `pyspark.databricks.pandas.usage_logger`, but an exception was raised: is not a callable object

Fri Jun 9 06:13:51 2023 Connection to spark from PID 1924

Fri Jun 9 06:13:51 2023 Initialized gateway on port 33139

Fri Jun 9 06:13:51 2023 Connected to spark.

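When creating a UDF in Databricks SQL, how do I declare a local variable? Is it something like this? CREATE OR REPLACE FUNCTION len() SET myString = 'my value'; RETURNS INT RETURN length(myString);

RV — Fri, 09 Jun 2023 13:26:17 GMT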

Databricks Slack channel

Oliver_Angelil — Sun, 11 Jun 2023 12:01:38 GMT

I would like to know if there is a Databricks Slack channel. If not, would anyone be interested in joining one?

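When will I get my Databricks Data Engineer Associate voucher?

pradyumn9999 — Sun, 11 Jun 2023 16:55:18 GMT

It has been a day since I passed the exam, and I still have not received the badge from Databricks.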

Did not receive the certificate for Databricks Certified Associate Developer for Apache Spark 3.0

Gaurav007 — Mon, 12 Jun 2023 11:58:38 GMT

Hi,

I passed the exam for Databricks Certified Associate Developer for Apache Spark 3.0 with 85% on 10 Jun 2023. I received an email that mentioned the badge and credentials, but I did not receive a certificate with it. I have also raised a ticket: #00334153

Please send me the certificate by email.

How is everyone's project going?

Michelle_ -_Devp — Mon, 12 Jun 2023 15:27:34 GMT

There are only a few days left until the deadline. Have you picked a project/topic? Has anyone started building or finished something? Don't forget the demo video for your project submission: https://devpost.com/submit-to/18245-so-you-think-you-can-hack/manage/submissions

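Multiple drivers for the same JDBC client added as cluster libraries in Databricks?

shan_chandra — Mon, 12 Jun 2023 21:05:25 GMT

How does Databricks treat the JARs when we upload multiple drivers for the same JDBC client (for example Oracle)? Which one will be considered on the classpath?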

Accessing content in a DataFrame: loc/iloc or [][]?

AleksandraFrolo — Tue, 13 Jun 2023 08:02:16 GMT

Hello, Task: I want to understand which method is better for accessing the content of a DataFrame.

My piece of code:

    print("First approach: ", df["Purchase Address"][0])
    print("Second approach: ", df.loc[0, "Purchase Address"])

These lines give the same result. I find the first version more comfortable to use. Are there any recommendations in pandas on how to access content?
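
A small sketch of where the two forms stop being interchangeable. The pandas documentation generally steers toward a single `.loc` / `.iloc` call rather than chained `[][]` indexing, mainly because of assignment. The data and the non-default row labels below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Purchase Address": ["1 Main St", "2 Oak Ave"], "Quantity": [1, 2]},
    index=[10, 11],  # hypothetical non-default row labels
)

# .loc selects by label, .iloc by position; each is a single lookup.
by_label = df.loc[10, "Purchase Address"]
by_position = df.iloc[0, df.columns.get_loc("Purchase Address")]

# Chained indexing first builds a Series, then looks up the label 0 in it.
# With these row labels there is no label 0, so the lookup raises KeyError.
try:
    df["Purchase Address"][0]
    chained_ok = True
except KeyError:
    chained_ok = False

# For assignment, a single .loc call updates the frame itself, whereas
# chained assignment may end up writing to a temporary copy.
df.loc[10, "Quantity"] = 5

print(by_label, by_position, chained_ok, df.loc[10, "Quantity"])
```

With the default 0, 1, 2, ... index the two lines in the question do return the same value, so for quick reads either works; `.loc` and `.iloc` just make it explicit whether a label or a position is meant.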

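Automating cluster creation

Vidisha — Tue, 13 Jun 2023 10:41:58 GMT

I am new to Databricks, and my lead told me that we currently create clusters manually to run notebooks. Please write a Python script to automate this for me, i.e. create the clusters automatically.

Can anyone help me write the script using PySpark in Databricks? I have to use Azure cloud services for this.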
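
Clusters are normally created through the Databricks REST API, CLI or SDK rather than through PySpark itself. Below is a minimal sketch against the Clusters API `clusters/create` endpoint using only the standard library. The helper names, the workspace URL, the token and the example runtime and node type values are placeholders and assumptions, and the API version in the path should be checked against the current REST API reference for your workspace.

```python
import json
from urllib.request import Request, urlopen


def build_create_cluster_request(host, token, cluster_name, spark_version,
                                 node_type_id, num_workers=1,
                                 autotermination_minutes=30):
    """Build (but do not send) the HTTP request that creates a cluster."""
    payload = {
        "cluster_name": cluster_name,
        "spark_version": spark_version,   # a runtime version string
        "node_type_id": node_type_id,     # an Azure VM type
        "num_workers": num_workers,
        "autotermination_minutes": autotermination_minutes,
    }
    return Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_cluster(request):
    """Send the request and return the new cluster's ID (needs network access)."""
    with urlopen(request, timeout=60) as response:
        return json.load(response)["cluster_id"]


# Placeholder values; replace with your workspace URL and a real access token.
request = build_create_cluster_request(
    host="https://adb-0000000000000000.0.azuredatabricks.net",
    token="<personal-access-token>",
    cluster_name="nightly-notebooks",
    spark_version="13.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=2,
)
print(request.full_url)
```

Passing `request` to `create_cluster` would submit it. Keeping the token in a secret scope or environment variable rather than in the script is the safer choice.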

dbutils command keeps running

ravin619 — Tue, 13 Jun 2023 10:29:06 GMT

Hello, I am running the command below on my cluster:

    dbutils.fs.ls("abfss://demo@###.dfs.core.window.net")

I did the spark.conf.set steps before running the above command.

It keeps on running for almost 30 mins and still shows as 'Running command'.

I have restarted the cluster many times and tried changing the resource runtime as well.

Please note I'm using azure free subscription plan

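H3 hex ID from h3.geo_to_h3 is not the same as the one from Databricks Mosaic

kll — Tue, 13 Jun 2023 18:18:34 GMT

I am testing the Databricks Mosaic spatial grid indexing method to obtain the H3 hex for a given lat, long.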

    # Get the latitude and longitude
    latitude = 37.7716736
    longitude = -122.4485852

    # Get the resolution
    resolution = 7

    # Get the H3 hex ID
    h3_hex_id = grid_longlatascellid(lit(latitude), lit(longitude), lit(resolution)).hex

    # Print the H3 hex ID
    print(h3_hex_id)

    Column<'grid_longlatascellid(CAST(37.7716736 AS DOUBLE), CAST(-122.4485852 AS DOUBLE), 7)[hex]'>

How do I see the actual hex id in the code above?

According to the docs, the `h3 hex id` returned by `grid_longlatascellid` looks different from what is returned by the `h3.geo_to_h3` method.

    h3.geo_to_h3(float(latitude), float(longitude), 7)

    '872830829ffffff'

    df = spark.createDataFrame([{'lon': 30., 'lat': 10.}])
    df.select(grid_longlatascellid('lon', 'lat', lit(10))).show(1, False)
    +----------------------------------+
    |grid_longlatascellid(lon, lat, 10)|
    +----------------------------------+
    |                623385352048508927|
    +----------------------------------+

How do I obtain the `h3 hex id` using Databricks Mosaic library? I have the following imports and configurations:

    import h3
    from mosaic import enable_mosaic
    enable_mosaic(spark, dbutils)
    from mosaic import *
    spark.conf.set("spark.databricks.labs.mosaic.index.system", "H3")
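
One likely explanation, offered as a sketch rather than a confirmed answer: an H3 index is a 64-bit integer, and the familiar hex ID is that same integer written in hexadecimal. The long value in the documented output and a string such as '872830829ffffff' would then be two spellings of the same kind of ID. The helper names below are hypothetical and use plain Python only.

```python
def cell_id_to_hex(cell_id: int) -> str:
    """Write a 64-bit H3 cell ID as a lowercase hex string."""
    return format(cell_id, "x")


def hex_to_cell_id(hex_id: str) -> int:
    """Parse an H3 hex string back into its integer cell ID."""
    return int(hex_id, 16)


# The long value shown in the documented Mosaic output (lon=30, lat=10, res 10).
print(cell_id_to_hex(623385352048508927))

# Round trip for the hex ID returned by h3.geo_to_h3 in the question.
print(hex_to_cell_id("872830829ffffff"))
```

Two other things are worth checking in the snippets above. `print(h3_hex_id)` shows a `Column` because the expression is lazy; the value only exists once it is evaluated inside something like `df.select(...)`. Also, the documented example passes longitude first ('lon', 'lat'), while the first snippet passes latitude first, which would index a different location than `h3.geo_to_h3(latitude, longitude, 7)`.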

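Unable to perform PATCH operation on the Users API in preview

karthik_p — Tue, 13 Jun 2023 22:57:41 GMT

Hi team, we want to lock down user access to the workspace. We are able to get users, groups, their attributes and so on, but when we perform a PATCH operation it throws a 500 error. In the request body, we provide entitlements as the path.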

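maxFilesPerTrigger not working in the bronze to silver layer

Sanjay — Wed, 14 Jun 2023 09:40:04 GMT

Hello, I am using a Matillion architecture in which Auto Loader picks up files from AWS S3 and saves them in Delta Lake. The next layer picks up the changes from Delta Lake and does some processing. I am able to set the batch size in Auto Loader and it works. But in the bronze to silver layer I am unable to set the batch limit, and it picks up all the files. This is my code for the bronze to silver layer:

    (spark.readStream.format("delta")
        .option("useNotification", "true")
        .option("includeExistingFiles", "true")
        .option("allowOverwrites", True)
        .option("ignoreMissingFiles", True)
        .option("maxFilesPerTrigger", 100)
        .load(bronze_path)
        .writeStream
        .option("checkpointLocation", silver_checkpoint_path)
        .trigger(processingTime="1 minute")
        .foreachBatch(foreachBatchFunction)
        .start())

I would appreciate any help. Regards, Sanjay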

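Linking Databricks with an SSAS multidimensional project

Mehala — Thu, 15 Jun 2023 11:43:12 GMT

Hello, I would like to link Databricks with an SSAS multidimensional project. I want to use Databricks as the provider and connection string. Is this possible in an SSAS multidimensional project? Which provider do I need to use?

Or is there any other workaround to achieve this scenario?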

@Hubert Dudek, @Werner Stinckens, @Aviral Bhardwaj, @Omkar G, @Taha Hussain, @Adam Pavlacka, @Ananth Arunachalam, @Vidula Khanna, @Jose Alfonso, @Kaniz Fatma

Checklist or documentation for migrating from Dev to Prod

Enzo_Bahrami — Fri, 16 Jun 2023 17:29:11 GMT

Hi everyone!

So we have a dev environment in Databricks and want to migrate it to prod.

I need to go over every single table, schema, notebook, and artifact in Databricks and make sure that nothing is hard-coded, for example, and that nothing compromises the prod environment.

Do you have any checklist or resource to help in this regard? Maybe a checklist of best practices and what to look over. I want to prepare a diagnosis of the current status of the project.

Thank you all!