Load Data From Cosmos DB To Azure SQL

Firewall Azure SQL

  1. In the Azure portal, open your SQL Database, go to Overview, and click Set server firewall

  2. Click Selected networks

    1. Enable Allow Azure services and resources to access this server

    2. The FreeForAll rule allows every IP address. In general this should not be done, but it is necessary if you use Databricks in the Community Edition and do not know its IP address

Connection to Cosmos DB

# Connection-related
host = "cosmosdbdrone2.cassandra.cosmos.azure.com"
port = 10350
ssl_enabled = True
username = "cosmosdbdrone2"
password = "lQ1zSS59G5Kf8AKU06Asf23i4jQn1v2b7Z9cFx0s078jw9UKjsQJBwnZnXz7eEc2ZcfECmsSgKCrACDb9CaELA=="

# Throughput-related...adjust as needed
batch_size = 1
remote_connections_per_executor = 10
concurrent_writes = 1000
concurrent_reads = 512
grouping_buffer_size = 1000
keep_alive_ms = 600000000

# Usage example
spark.conf.set("spark.cassandra.connection.host", host)
spark.conf.set("spark.cassandra.connection.port", str(port))
spark.conf.set("spark.cassandra.connection.ssl.enabled", str(ssl_enabled))
spark.conf.set("spark.cassandra.auth.username", username)
spark.conf.set("spark.cassandra.auth.password", password)

spark.conf.set("spark.cassandra.output.batch.size.rows", str(batch_size))
spark.conf.set("spark.cassandra.connection.remoteConnectionsPerExecutor", str(remote_connections_per_executor))
spark.conf.set("spark.cassandra.output.concurrent.writes", str(concurrent_writes))
spark.conf.set("spark.cassandra.concurrent.reads", str(concurrent_reads))
spark.conf.set("spark.cassandra.output.batch.grouping.buffer.size", str(grouping_buffer_size))
spark.conf.set("spark.cassandra.connection.keep_alive_ms", str(keep_alive_ms))
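The individual spark.conf.set calls above can also be grouped into a plain dictionary and applied in one loop, which keeps the connection and throughput settings in one place. A minimal sketch, assuming the same Spark Cassandra Connector keys as above; the helper name and the idea of passing the setter in (so it works with spark.conf.set or any test double) are illustrative, not part of the original notebook:

```python
def apply_cassandra_settings(conf_setter, host, port, username, password,
                             ssl_enabled=True, batch_size=1,
                             remote_connections_per_executor=10,
                             concurrent_writes=1000, concurrent_reads=512,
                             grouping_buffer_size=1000, keep_alive_ms=600000000):
    """Apply the Cosmos DB Cassandra API settings through the given setter
    (e.g. spark.conf.set). All values are converted to strings, as Spark
    configuration values are strings."""
    settings = {
        "spark.cassandra.connection.host": host,
        "spark.cassandra.connection.port": port,
        "spark.cassandra.connection.ssl.enabled": ssl_enabled,
        "spark.cassandra.auth.username": username,
        "spark.cassandra.auth.password": password,
        "spark.cassandra.output.batch.size.rows": batch_size,
        "spark.cassandra.connection.remoteConnectionsPerExecutor": remote_connections_per_executor,
        "spark.cassandra.output.concurrent.writes": concurrent_writes,
        "spark.cassandra.concurrent.reads": concurrent_reads,
        "spark.cassandra.output.batch.grouping.buffer.size": grouping_buffer_size,
        "spark.cassandra.connection.keep_alive_ms": keep_alive_ms,
    }
    for key, value in settings.items():
        conf_setter(key, str(value))
    return settings
```

On a live cluster this would be called as `apply_cassandra_settings(spark.conf.set, host, port, username, password)`, with any throughput overrides passed as keyword arguments.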

Load Data into Dataframe

df = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table="dronetable", keyspace="dronedb") \
    .load()

Connect to Azure SQL

jdbcHostname = "sqlserverdrone.database.windows.net"
jdbcPort = "1433"
jdbcDatabase = "azuresqldrone"
properties = {
    "user": "cnagl",
    "password": "Noodeihi!123",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

url = "jdbc:sqlserver://{0}:{1};database={2}".format(jdbcHostname, jdbcPort, jdbcDatabase)
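The URL built here follows the SQL Server JDBC scheme `jdbc:sqlserver://<host>:<port>;database=<db>`. As a sanity check, the construction can be factored into a small helper (the function name is illustrative, not from the original notebook):

```python
def make_jdbc_url(hostname, port, database):
    """Build a SQL Server JDBC URL in the form used above."""
    return "jdbc:sqlserver://{0}:{1};database={2}".format(hostname, port, database)

url = make_jdbc_url("sqlserverdrone.database.windows.net", "1433", "azuresqldrone")
# → "jdbc:sqlserver://sqlserverdrone.database.windows.net:1433;database=azuresqldrone"
```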

Load into Azure SQL

df.write.jdbc(url=url, table="drohne", mode="overwrite", properties=properties)
