
No More Tableau Downtime: Metadata API for Proactive Data Health

In today’s world, the reliability of data solutions is everything. When we build dashboards and reports, we expect the numbers reflected there to be correct and up to date. Based on these numbers, insights are drawn and actions are taken. If, for any unforeseen reason, the dashboards are broken or the numbers are incorrect, it becomes a fire-fight to fix everything. If the issues are not fixed in time, it damages the trust placed in the data team and their solutions.

But why would dashboards be broken or have wrong numbers? If the dashboard was built correctly the first time, then 99% of the time the issue comes from the data that feeds the dashboards — from the data warehouse. Some possible scenarios are:

  • A few ETL pipelines failed, so the new data has not yet landed
  • A table has been replaced with a new one
  • Some columns in a table have been dropped or renamed
  • Schemas in the data warehouse have changed
  • And many more.

There is still a chance that the issue is on the Tableau side, but in my experience, most of the time it is due to some change in the data warehouse. Even though we know the root cause, it’s not always straightforward to start working on a fix. There is no central place where you can check which Tableau data sources rely on specific tables. If you have the Tableau Data Management add-on, it could help, but from what I know, it’s hard to find the dependencies of custom SQL queries used in data sources.

In any case, the add-on is expensive and most companies don’t have it. The real pain begins when you have to go through all the data sources manually to start fixing things. On top of that, you have a queue of users impatiently waiting for a quick fix. The fix itself might not be difficult; it is just time-consuming.

What if we could anticipate these issues and identify impacted data sources before anyone notices a problem? Wouldn’t that just be great? Well, there is a way now with the Tableau Metadata API. The Metadata API uses GraphQL, a query language for APIs that returns only the data that you’re interested in. For more info on what’s possible with GraphQL, do check out GraphQL.org.

In this blog post, I’ll show you how to connect to the Tableau Metadata API using Python’s Tableau Server Client (TSC) library to proactively identify data sources using specific tables, so that you can act fast before any issues arise. Once you know which Tableau data sources are affected by a specific table, you can make some updates yourself or alert the owners of those data sources about the upcoming changes so they can be prepared for it.

Connecting to the Tableau Metadata API

Let’s connect to the Tableau Server using TSC. We need to import all the libraries required for the exercise.

### Import all required libraries
import tableauserverclient as t
import pandas as pd
import json
import ast
import re

In order to connect to the Metadata API, you will have to first create a personal access token in your Tableau account settings. Then update the <token-name> and <token-secret> placeholders with the token you just created, and update <site-id> with your Tableau site name. If the connection is established successfully, “Connected” will be printed in the output window.

### Connect to Tableau server using personal access token
tableau_auth = t.PersonalAccessTokenAuth("<token-name>", "<token-secret>", 
                                           site_id="<site-id>")
server = t.Server("https://dub01.online.tableau.com/", use_server_version=True)

with server.auth.sign_in(tableau_auth):
        print("Connected")

Let’s now get a list of all the data sources published on your site. There are many attributes you can fetch, but for the current use case, let’s keep it simple and only get the id, name, and owner contact information for every data source. This will be our master list, to which we will add all other information.

############### Get all the list of data sources on your Site

all_datasources_query = """ {
  publishedDatasources {
    name
    id
    owner {
    name
    email
    }
  }
}"""
with server.auth.sign_in(tableau_auth):
    result = server.metadata.query(
        all_datasources_query
    )

Since I want this blog to be focused on how to proactively identify which data sources are affected by a specific table, I won’t be going into the nuances of the Metadata API. To better understand how the query works, you can refer to Tableau’s own very detailed Metadata API documentation.

One thing to note is that the Metadata API returns data in JSON format. Depending on what you are querying, you’ll end up with multiple nested JSON lists, and it can get very tricky to convert this into a pandas dataframe. For the above metadata query, you will end up with a result that looks like the one below (this is mock data, just to give you an idea of what the output looks like):

{
  "data": {
    "publishedDatasources": [
      {
        "name": "Sales Performance DataSource",
        "id": "f3b1a2c4-1234-5678-9abc-1234567890ab",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Orders DataSource",
        "id": "a4d2b3c5-2345-6789-abcd-2345678901bc",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      },
      {
        "name": "Product Returns and Profitability",
        "id": "c5e3d4f6-3456-789a-bcde-3456789012cd",
        "owner": {
          "name": "Alice Johnson",
          "email": "[email protected]"
        }
      },
      {
        "name": "Customer Segmentation Analysis",
        "id": "d6f4e5a7-4567-89ab-cdef-4567890123de",
        "owner": {
          "name": "Charlie Lee",
          "email": "[email protected]"
        }
      },
      {
        "name": "Regional Sales Trends (Custom SQL)",
        "id": "e7a5f6b8-5678-9abc-def0-5678901234ef",
        "owner": {
          "name": "Bob Smith",
          "email": "[email protected]"
        }
      }
    ]
  }
}

We need to convert this JSON response into a dataframe so that it’s easy to work with. Notice that we need to extract the name and email of the owner from inside the owner object.

### We need to convert the response into dataframe for easy data manipulation

col_names = result['data']['publishedDatasources'][0].keys()
master_df = pd.DataFrame(columns=col_names)

for i in result['data']['publishedDatasources']:
    tmp_dt = {k:v for k,v in i.items()}
    master_df = pd.concat([master_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the owner name and email from the owner object
master_df['owner_name'] = master_df['owner'].apply(lambda x: x.get('name') if isinstance(x, dict) else None)
master_df['owner_email'] = master_df['owner'].apply(lambda x: x.get('email') if isinstance(x, dict) else None)

master_df.reset_index(inplace=True)
master_df.drop(['index','owner'], axis=1, inplace=True)
print('There are ', master_df.shape[0] , ' datasources in your site')
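
As an aside, pandas’ json_normalize can do the same flattening in a couple of lines; the rename below is only there so the column names line up with the master_df built above, and master_df_alt is just an illustrative name:

### Alternative: flatten the nested owner object with pandas.json_normalize
master_df_alt = pd.json_normalize(result['data']['publishedDatasources'])
master_df_alt = master_df_alt.rename(columns={'owner.name': 'owner_name', 'owner.email': 'owner_email'})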

This is how the structure of master_df would look:

Sample output of code

Once we have the main list ready, we can go ahead and start getting the names of the tables embedded in the data sources. If you are an avid Tableau user, you know that there are two ways of selecting tables in a Tableau data source: one is to directly choose the tables and establish a relationship between them, and the other is to use a custom SQL query with one or more tables to produce a new resultant table. Therefore, we need to address both cases.

Processing of Custom SQL query tables

Below is the query to get the list of all custom SQL queries used on the site along with their data sources. Notice that I have filtered the list to get only the first 500 custom SQL queries. In case there are more in your org, you will have to use an offset to get the next set of custom SQL queries. There is also the option of using the cursor method for pagination when you want to fetch a large list of results; a rough sketch of that approach follows the query below. For the sake of simplicity, I just use the offset method, as I know there are fewer than 500 custom SQL queries used on the site.

# Get the data sources and the table names from all the custom sql queries used on your Site

custom_table_query = """  {
  customSQLTablesConnection(first: 500){
    nodes {
        id
        name
        downstreamDatasources {
        name
        }
        query
    }
  }
}
"""

with server.auth.sign_in(tableau_auth):
    custom_table_query_result = server.metadata.query(
        custom_table_query
    )
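
In case your site does have more than 500 custom SQL queries, here is a rough sketch of the cursor approach; it assumes customSQLTablesConnection exposes the standard pageInfo fields (hasNextPage, endCursor) described in Tableau’s Metadata API pagination documentation:

### Sketch: cursor-based pagination (only needed if there are more than 500 custom SQL queries)
all_nodes = []
cursor = None
with server.auth.sign_in(tableau_auth):
    while True:
        after_clause = ', after: "' + cursor + '"' if cursor else ''
        page_query = """ {
          customSQLTablesConnection(first: 500""" + after_clause + """){
            nodes {
                id
                name
                downstreamDatasources {
                name
                }
                query
            }
            pageInfo {
                hasNextPage
                endCursor
            }
          }
        }
        """
        page_result = server.metadata.query(page_query)
        connection = page_result['data']['customSQLTablesConnection']
        all_nodes.extend(connection['nodes'])
        if not connection['pageInfo']['hasNextPage']:
            break
        cursor = connection['pageInfo']['endCursor']
# all_nodes now holds every custom SQL table node and can stand in for
# custom_table_query_result['data']['customSQLTablesConnection']['nodes'] below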

Based on our mock data, this is how the output would look:

{
  "data": {
    "customSQLTablesConnection": {
      "nodes": [
        {
          "id": "csql-1234",
          "name": "RegionalSales_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Regional Sales Trends (Custom SQL)"
            }
          ],
          "query": "SELECT r.region_name, SUM(s.sales_amount) AS total_sales FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id GROUP BY r.region_name"
        },
        {
          "id": "csql-5678",
          "name": "ProfitabilityAnalysis_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Product Returns and Profitability"
            }
          ],
          "query": "SELECT p.product_category, SUM(s.profit) AS total_profit FROM ecommerce.sales_data.Sales s JOIN ecommerce.sales_data.Products p ON s.product_id = p.product_id GROUP BY p.product_category"
        },
        {
          "id": "csql-9101",
          "name": "CustomerSegmentation_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Segmentation Analysis"
            }
          ],
          "query": "SELECT c.customer_id, c.location, COUNT(o.order_id) AS total_orders FROM ecommerce.sales_data.Customers c JOIN ecommerce.sales_data.Orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.location"
        },
        {
          "id": "csql-3141",
          "name": "CustomerOrders_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT o.order_id, o.customer_id, o.order_date, o.sales_amount FROM ecommerce.sales_data.Orders o WHERE o.order_status = 'Completed'"
        },
        {
          "id": "csql-3142",
          "name": "CustomerProfiles_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT c.customer_id, c.customer_name, c.segment, c.location FROM ecommerce.sales_data.Customers c WHERE c.active_flag = 1"
        },
        {
          "id": "csql-3143",
          "name": "CustomerReturns_CustomSQL",
          "downstreamDatasources": [
            {
              "name": "Customer Orders DataSource"
            }
          ],
          "query": "SELECT r.return_id, r.order_id, r.return_reason FROM ecommerce.sales_data.Returns r"
        }
      ]
    }
  }
}

Just like before, when we were creating the master list of data sources, here too we have nested JSON for the downstream data sources, from which we need to extract only the “name” part. The “query” column holds the entire custom SQL statement. Using a regex pattern, we can easily search for the names of the tables used in the query.

We know that the table names always come after a FROM or a JOIN clause, and they generally follow the format <database>.<schema>.<table>. The database part is optional and most of the time not used. There were a few queries that used a longer naming format, and for those I ended up getting only the database and schema names, and not the complete table name.

Once we have extracted the names of the data sources and the names of the tables, we need to merge the rows per data source, as there can be multiple custom SQL queries used in a single data source.

### Convert the custom sql response into dataframe
col_names = custom_table_query_result['data']['customSQLTablesConnection']['nodes'][0].keys()
cs_df = pd.DataFrame(columns=col_names)

for i in custom_table_query_result['data']['customSQLTablesConnection']['nodes']:
    tmp_dt = {k:v for k,v in i.items()}

    cs_df = pd.concat([cs_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

# Extract the data source name where the custom sql query was used
cs_df['data_source'] = cs_df.downstreamDatasources.apply(lambda x: x[0]['name'] if x and 'name' in x[0] else None)
cs_df.reset_index(inplace=True)
cs_df.drop(['index','downstreamDatasources'], axis=1,inplace=True)

### We need to extract the table names from the sql query. We know the table name comes after a FROM or JOIN clause
# Note that the table name can be of the format <database>.<schema>.<table>
# Depending on the format in which tables are referenced, you may have to modify the regex expression

def extract_tables(sql):
    # Regex to match database.schema.table or schema.table (with optional [bracketed] parts), avoiding aliases
    pattern = r'(?:FROM|JOIN)\s+((?:\[\w+\]|\w+)\.(?:\[\w+\]|\w+)(?:\.(?:\[\w+\]|\w+))?)\b'
    matches = re.findall(pattern, sql, re.IGNORECASE)
    return list(set(matches))  # Unique table names

cs_df['customSQLTables'] = cs_df['query'].apply(extract_tables)
cs_df = cs_df[['data_source','customSQLTables']]

# We need to merge datasources as there can be multiple custom sqls used in the same data source
cs_df = cs_df.groupby('data_source', as_index=False).agg({
    'customSQLTables': lambda x: list(set(item for sublist in x for item in sublist))  # Flatten & make unique
})

print('There are ', cs_df.shape[0], 'datasources with custom sqls used in it')
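
As a quick sanity check, here is what extract_tables returns for the first mock custom SQL query shown earlier (sorted, since sets have no guaranteed order):

# Quick sanity check of the regex on the first mock custom SQL query
sample_sql = ("SELECT r.region_name, SUM(s.sales_amount) AS total_sales "
              "FROM ecommerce.sales_data.Sales s "
              "JOIN ecommerce.sales_data.Regions r ON s.region_id = r.region_id "
              "GROUP BY r.region_name")
print(sorted(extract_tables(sample_sql)))
# ['ecommerce.sales_data.Regions', 'ecommerce.sales_data.Sales']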

After we perform all the above operations, this is how the structure of cs_df would look:

Sample output of code

Processing of regular Tables in Data Sources

Now we need to get the list of all the regular tables used in a data source that are not part of a custom SQL query. There are two ways to go about it: either use the publishedDatasources object and check for upstreamTables, or start from the databaseTables object and look at the data sources downstream of each table (a rough sketch of that second approach is shown below). I’ll go with the first method because I want the results at a data source level (basically, I want some code ready to reuse when I want to check a specific data source in more detail). Here again, for the sake of simplicity, instead of going for pagination, I’m looping through each data source to make sure I have everything. We get the upstreamTables nested inside the fields object, so that has to be cleaned out.
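
For reference, the table-first approach would look roughly like the sketch below. I’m assuming here that the databaseTables connection exposes name, schema, fullName and downstreamDatasources, so double-check the field names against Tableau’s Metadata API reference for your server version:

### Sketch of the table-first approach (not used further in this post)
tables_query = """ {
  databaseTablesConnection(first: 500) {
    nodes {
        name
        schema
        fullName
        downstreamDatasources {
        name
        }
    }
  }
}
"""
with server.auth.sign_in(tableau_auth):
    tables_result = server.metadata.query(tables_query)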

############### Get the data sources with the regular table names used in your site

### It's best to extract the table information for every data source and then merge the results.
# Since we only get the table information nested under fields, in case there are hundreds of fields 
# used in a single data source, we will hit the response limits and will not be able to retrieve all the data.

data_source_list = master_df.name.tolist()

col_names = ['name', 'id', 'extractLastUpdateTime', 'fields']
ds_df = pd.DataFrame(columns=col_names)

with server.auth.sign_in(tableau_auth):
    for ds_name in data_source_list:
        query = """ {
            publishedDatasources (filter: { name: """"+ ds_name + """" }) {
            name
            id
            extractLastUpdateTime
            fields {
                name
                upstreamTables {
                    name
                }
            }
            }
        } """
        ds_name_result = server.metadata.query(
        query
        )
        for i in ds_name_result['data']['publishedDatasources']:
            tmp_dt = {k:v for k,v in i.items() if k != 'fields'}
            tmp_dt['fields'] = json.dumps(i['fields'])
        ds_df = pd.concat([ds_df, pd.DataFrame.from_dict(tmp_dt, orient='index').T])

ds_df.reset_index(inplace=True)

This is how the structure of ds_df would look:

Sample output of code

We now need to flatten the fields object and extract the field names as well as the table names. Since the table names repeat multiple times, we have to deduplicate and keep only the unique ones.

# Function to extract the values of fields and upstream tables in json lists
def extract_values(json_list, key):
    values = []
    for item in json_list:
        values.append(item[key])
    return values

ds_df["fields"] = ds_df["fields"].apply(ast.literal_eval)
ds_df['field_names'] = ds_df.apply(lambda x: extract_values(x['fields'],'name'), axis=1)
ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_values(x['fields'],'upstreamTables'), axis=1)

# Function to extract the unique table names 
def extract_upstreamTable_values(table_list):
    values = set()
    for inner_list in table_list:
        for item in inner_list:
            if 'name' in item:
                values.add(item['name'])
    return list(values)

ds_df['upstreamTables'] = ds_df.apply(lambda x: extract_upstreamTable_values(x['upstreamTables']), axis=1)
ds_df.drop(["index","fields"], axis=1, inplace=True)

Once we do the above operations, the final structure of ds_df would look something like this:

Sample output of code

We have all the pieces and now we just have to merge them together:

###### Join all the data together
master_data = pd.merge(master_df, ds_df, how="left", on=["name","id"])
master_data = pd.merge(master_data, cs_df, how="left", left_on="name", right_on="data_source")

# Save the results to analyse further
master_data.to_excel("Tableau Data Sources with Tables.xlsx", index=False)

This is our final master_data:

Sample Output of code

Table-level Impact Analysis

Let’s say there were some schema changes on the “Sales” table and you want to know which data sources will be impacted. Then you can simply write a small function that checks whether a table is present in either of the two columns, upstreamTables or customSQLTables, like the one below.

def filter_rows_with_table(df, col1, col2, target_table):
    """
    Filters rows in df where target_table is part of any value in either col1 or col2 (supports partial match).
    Returns full rows (all columns retained).
    """
    return df[
        df.apply(
            lambda row: 
                (isinstance(row[col1], list) and any(target_table in item for item in row[col1])) or
                (isinstance(row[col2], list) and any(target_table in item for item in row[col2])),
            axis=1
        )
    ]
# As an example 
filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')

Below is the output. You can see that 3 data sources will be impacted by this change. You can also alert the data source owners Alice and Bob in advance about this so they can start working on a fix before something breaks on the Tableau dashboards.

Sample output of code
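
To take the alerting one step further, a small groupby over the filtered rows gives each owner the list of their impacted data sources, ready to drop into an email. This sketch only uses columns we already have in master_data:

# Group the impacted data sources by owner to drive the alert emails
impacted = filter_rows_with_table(master_data, 'upstreamTables', 'customSQLTables', 'Sales')
alerts = (impacted.groupby(['owner_name', 'owner_email'])['name']
                  .apply(list)
                  .reset_index(name='impacted_datasources'))
print(alerts)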

You can check out the complete version of the code in my GitHub repository here.

This is just one of the potential use cases of the Tableau Metadata API. You can also extract the field names used in custom SQL queries and add them to the dataset to get a field-level impact analysis. One can also monitor stale data sources with the extractLastUpdateTime field, to see whether they have any issues or need to be archived because they are no longer used. We can also use the dashboards object to fetch information at a dashboard level.
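
For instance, a stale-extract check is only a few lines on top of the data we already have. This is a minimal sketch, assuming extractLastUpdateTime comes back as an ISO timestamp (and is empty for data sources without extracts); extract_refreshed_at is just an illustrative column name:

# Sketch: flag data sources whose extract has not been refreshed in the last 30 days
master_data['extract_refreshed_at'] = pd.to_datetime(master_data['extractLastUpdateTime'], errors='coerce', utc=True)
cutoff = pd.Timestamp.now(tz='UTC') - pd.Timedelta(days=30)
stale = master_data[master_data['extract_refreshed_at'] < cutoff]
print(stale[['name', 'owner_email', 'extract_refreshed_at']])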

Final Thoughts

If you have come this far, kudos. This is just one use case of automating Tableau data management. It’s time to reflect on your own work and think about which other tasks you could automate to make your life easier. I hope this mini-project served as an enjoyable learning experience in understanding the power of the Tableau Metadata API. If you liked reading this, you might also like another one of my blog posts about Tableau, on some of the challenges I faced when dealing with big .

Also do check out my previous blog where I explored building an interactive, database-powered app with Python, Streamlit, and SQLite.


Before you go…

Follow me so you don’t miss any new posts I write in the future; you will find more of my articles on my . You can also connect with me on LinkedIn or Twitter!

Shape
Shape
Stay Ahead

Explore More Insights

Stay ahead with more perspectives on cutting-edge power, infrastructure, energy,  bitcoin and AI solutions. Explore these articles to uncover strategies and insights shaping the future of industries.

Shape

Microsoft’s largest quantum site to be built in Denmark

With this strategic move, Denmark will become Microsoft’s global quantum hub. According to the company, the expansion of the Lyngby laboratory will enable the complete core components of the Majorana chip to be manufactured directly on site. This research is based on years of cooperation with leading Danish research institutions,

Read More »

Extreme plots enterprise marketplace for AI agents, tools, apps

Extreme Networks this week previewed an AI marketplace where it plans to offer a curated catalog of AI tools, agents and applications. Called Extreme Exchange, it’s designed to give enterprise customers a way to discover, deploy, and create AI agents, microapps, and workflows in minutes rather than developing such components

Read More »

Energy Department Announces $355 Million to Expand Domestic Production of Critical Minerals and Materials

WASHINGTON—The U.S. Department of Energy (DOE) today announced $355 million for two notices of funding opportunities issued by DOE’s Office of Fossil Energy (FE) to expand domestic production of critical materials essential for advancing U.S. energy production, manufacturing, transportation and national defense. The first funding opportunity provides up to $275 million for American industrial facilities capable of producing valuable minerals from existing industrial and coal byproducts. The second provides up to $80 million to establish Mine of the Future proving grounds for real-world testing of next-generation mining technologies. The Department announced in August its intent to invest $1 billion to advance and scale mining, processing, and manufacturing technologies, delivering on President Trump’s Executive Orders, Unleashing American Energy and Immediate Measures to Increase American Mineral Production. These actions will secure America’s critical material supply chain, increase domestic mineral production, reduce reliance on foreign sources, and strengthen U.S. energy independence. “For too long, the United States has relied on foreign nations for the minerals and materials that power our economy,” said U.S. Secretary of Energy Chris Wright. “We have these resources here at home, but years of complacency ceded America’s mining and industrial base to other nations. Thanks to President Trump’s leadership, we are reversing that trend, rebuilding America’s ability to mine, process, and manufacture the materials essential to our energy and economic security.” “The Mine of the Future – Proving Ground Initiative will be among the Department of Energy’s first major investments into mining technology research and development in almost four decades,” said U.S. Department of Energy Assistant Secretary of the Office of Fossil Energy Kyle Haustveit. “This effort will help establish the United States as the world’s leading producer and processor of non-fuel minerals—creating economic prosperity in fossil energy communities across the country while strengthening critical mineral supply chains for

Read More »

Ukraine Drones Hit Russian Black Sea Oil Terminal

(Update) November 14, 2025, 9:45 AM GMT+1: Article updated with additional details. Ukrainian drones attacked Russia’s giant Black Sea port of Novorossiysk overnight, prompting a state of emergency, as Moscow launched a massive air strike on Kyiv that killed four and damaged several residential buildings. Falling drone debris caused a fire at the Russian depot located at Transneft PJSC’s Sheskharis oil terminal, the regional emergency service said on Telegram early Friday. The blaze was put out after more than 50 units of firefighting equipment were deployed at the site, authorities said, but provided no details on the damage. Novorossiysk Mayor Andrey Kravchenko announced the state of emergency on Telegram. Transneft didn’t immediately respond to a request for comment on the situation at the facility. Global benchmark Brent spiked as much as 3 percent in a rapid move toward $65 a barrel, before paring gains. A container terminal located in the port of Novorossiysk was damaged by falling debris, but continued to operate normally, Delo Group, which runs that facility, said in a statement on Telegram. Russia’s largest grain terminal, also operated by Delo Group, was impacted by drone debris, but continues to function, the Interfax news service reported, citing the terminal’s chief executive officer. Drones hit an unidentified civilian ship in the port of Novorossiysk as well, regional emergency services said, without specifying the type of the vessel. The city’s mayor reported damage to at least three residential buildings in separate statements on Telegram.  In Ukraine, four people were killed after Russia launched about 430 drones and 18 missiles – including ballistic ones – in the strike, President Volodymyr Zelenskiy said on the X platform Friday. Dozens of apartment buildings were damaged in the capital Kyiv, he said. At least 26 people were injured, including two children, and several residential buildings were damaged,

Read More »

Repsol Mulls Merger for $19B Upstream Unit

Repsol SA is considering a reverse merger of its upstream unit with potential partners including US energy producer APA Corp., people with knowledge of the matter said, as it seeks ways to list the business in New York. The Spanish oil and gas company has held exploratory discussions with APA, formerly known as Apache Corp., about the possibility of a deal, according to the people. It has also held initial talks with other potential merger partners for the business, they said.  Any deal could help Repsol bulk up the portfolio of its upstream business and provide it a faster route to becoming publicly traded.  APA shares surged as much as 7.3 percent in New York. The stock has gained about 16 percent over the past 12 months, giving the company a market value of roughly $9 billion. Repsol shares gained as much as 2.2 percent.  Repsol agreed in 2022 to sell a 25 percent stake in the upstream division to private equity firm EIG Global Energy Partners LLC in a deal valuing the business at $19 billion including debt. The transaction was aimed at helping the unit further expand in the US, while also raising funds for Repsol to invest in low-carbon activities.  Executives have said they’re preparing the upstream unit for a potential “liquidity event,” such as a public listing, in 2026. Repsol Chief Executive Officer Josu Jon Imaz told analysts last month that company is considering options including an IPO of the business, a reverse merger with a US-listed group or the introduction of a new private investor.  Deliberations are ongoing and there’s no certainty they will lead to a transaction, the people said, asking not to be identified because the information is private. Repsol continues to study a variety of options for the business and it may still opt for an

Read More »

Trump Lifts More Arctic Drilling Curbs

The Trump administration rescinded restrictions on oil drilling in Alaska’s mammoth state petroleum reserve, reversing a move by former President Joe Biden that put an estimated 8.7 billion barrels of recoverable oil off limits. The policy reversal finalized Thursday applies to the 23 million-acre National Petroleum Reserve-Alaska.  Biden in 2024, designated 13 million acres of the reserve as “special areas,” limiting future oil and gas leasing, while maintaining leasing prohibitions on 10.6 million acres of the NPR-A. The move complicated future oil drilling and production in the reserve, where ConocoPhillips is pushing to explore for more oil near its Willow project. Other active companies have included Santos Ltd., Repsol SA and Armstrong Oil & Gas Inc. The US Interior Department had already reopened the nearby Arctic National Wildlife Refuge to oil and gas leasing, following a directive Donald Trump issued after his inauguration. Increasing US production of fossil fuels has been at the center of Trump’s energy agenda, starting with an early executive order compelling a host of policy changes meant to expand Alaska’s oil, natural gas and mineral development. “This action restores common-sense management and ensures responsible development benefits both Alaska and the nation,” Interior Secretary Doug Burgum said in a statement, adding that the latest move would “strengthen American Energy Dominance and reduce reliance on foreign oil.” Alaska has forecast that crude production from the reserve will climb to 139,600 barrels per day in fiscal 2033, up from 15,800 barrels per day in fiscal 2023. The Interior Department announced last month it was opening the entire coastal plain of Alaska’s Arctic National Wildlife Refuge, some 1.56 million acres, to oil and gas leasing and planned to hold a lease sale this winter in the state petroleum reserve. What do you think? We’d love to hear from you, join the conversation

Read More »

TotalEnergies Wins 15-Year Google Contract to Supply Renewable Power

TotalEnergies SE has signed a deal to supply Google a total of 1.5 terawatt hours (TWh) of certified green electricity for 15 years to support the tech giant’s data center operations in Ohio. The power will come from the Montpelier solar project in Ohio, which is “nearing completion” and will be connected to the PJM grid system, a joint statement said. “The deal reflects Google’s strategy of enabling new, carbon-free energy to the grid systems where they operate”, the statement said. “It also aligns with TotalEnergies’ strategy to deliver tailored energy solutions for data centers, which accounted for almost three percent of the world’s energy demand in 2024”. “TotalEnergies is deploying a 10-GW portfolio in the United States, with onshore solar, wind and battery storage projects, one GW of which is located in the PJM market in the northeast of the country, and four GW on the ERCOT market in Texas”, the statement added. Stephane Michel, TotalEnergies president for gas, renewables and power at TotalEnergies, said, “This agreement illustrates TotalEnergies’ ability to meet the growing energy demands of major tech companies by leveraging its integrated portfolio of renewable and flexible assets. It also contributes to achieving our target of 12 percent profitability in the power sector”. This is the second data-center green power supply agreement announced by TotalEnergies this month. On November 4 it said it had bagged a 10-year contract to supply Data4 data centers in Spain with a total of 610 gigawatt hours (GWh) of renewable electricity starting 2026. The power will come from Spanish wind and solar farms with a combined capacity of 30 MW. The plants “are about to start production”, a joint statement said. “As European leader in the data center industry, Data4 is now established in six countries, and announced its plan to invest nearly EUR 2 billion [$2.32 billion] by 2030 to

Read More »

Meren Bumps Up Production Guidance

Meren Energy Inc on Thursday raised its projected entitlement output for 2025 from 32,000-37,000 barrels of oil equivalent per day (boepd) to 34,500-37,500 boepd. The Vancouver, Canada-based company, which explores and develops oil and gas in Africa, also revised up its forecast for working-interest production from 28,000-33,000 boepd to 30,000-33,000 boepd. Meren, which currently derives its production offshore Nigeria, defines entitlement production as “calculated using the economic interest methodology and includes cost recovery oil, royalty oil and profit oil”. Working-interest production, according to Meren, is derived by multiplying project volumes by the company’s effective working interest in each license. In the third quarter, Meren, which this year rebranded from Africa Oil Corp, produced 35,600 boepd, down from 41,200 boepd in Q3 2024. Meren derives its production from Akpo and Egina, both operated by TotalEnergies SE, and Chevron Corp-operated Agbami. Production enhancement and exploration activities are progressing in the fields. “Following the break to the Akpo/Egina (PPL 2/3) drilling campaign in Q3 2025, efforts are underway to recommence the campaign”, Meren said. “As previously communicated, this break will allow for the interpretation of 4D seismic data to enhance the maturation of future infill well opportunities. Accordingly, the aim is to secure a deepwater drilling rig within the gap and start with the drilling of the Akpo Far East near-field prospect, followed by the drilling of further development wells on Akpo and Egina fields. “Akpo Far East is an infrastructure-led exploration opportunity that in case of commercial exploration success, presents an attractive short cycle, high-return investment opportunity that would utilize the existing Akpo facilities. Akpo Far East prospect has an unrisked, best estimate, gross field prospective resource volume of 143.6 MMboe. The targeted hydrocarbons are predicted to be light, high gas-oil-ratio oil equivalent to those found in the Akpo field. If successful,

Read More »

Arista, Palo Alto bolster AI data center security

“Based on this inspection, the NGFW creates a comprehensive, application-aware security policy. It then instructs the Arista fabric to enforce that policy at wire speed for all subsequent, similar flows,” Kotamraju wrote. “This ‘inspect-once, enforce-many’ model delivers granular zero trust security without the performance bottlenecks of hairpinning all traffic through a firewall or forcing a costly, disruptive network redesign.” The second capability is a dynamic quarantine feature that enables the Palo Alto NGFWs to identify evasive threats using Cloud-Delivered Security Services (CDSS). “These services, such as Advanced WildFire for zero-day malware and Advanced Threat Prevention for unknown exploits, leverage global threat intelligence to detect and block attacks that traditional security misses,” Kotamraju wrote. The Arista fabric can intelligently offload trusted, high-bandwidth “elephant flows” from the firewall after inspection, freeing it to focus on high-risk traffic. When a threat is detected, the NGFW signals Arista CloudVision, which programs the network switches to automatically quarantine the compromised workload at hardware line-rate, according to Kotamraju: “This immediate response halts the lateral spread of a threat without creating a performance bottleneck or requiring manual intervention.” The third feature is unified policy orchestration, where Palo Alto Networks’ management plane centralizes zone-based and microperimeter policies, and CloudVision MSS responds with the offload and enforcement of Arista switches. “This treats the entire geo-distributed network as a single logical switch, allowing workloads to be migrated freely across cloud networks and security domains,” Srikanta and Barbieri wrote. Lastly, the Arista Validated Design (AVD) data models enable network-as-a-code, integrating with CI/CD pipelines. AVDs can also be generated by Arista’s AVA (Autonomous Virtual Assist) AI agents that incorporate best practices, testing, guardrails, and generated configurations. “Our integration directly resolves this conflict by creating a clean architectural separation that decouples the network fabric from security policy. This allows the NetOps team (managing the Arista

Read More »

AMD outlines ambitious plan for AI-driven data centers

“There are very beefy workloads that you must have that performance for to run the enterprise,” he said. “The Fortune 500 mainstream enterprise customers are now … adopting Epyc faster than anyone. We’ve seen a 3x adoption this year. And what that does is drives back to the on-prem enterprise adoption, so that the hybrid multi-cloud is end-to-end on Epyc.” One of the key focus areas for AMD’s Epyc strategy has been our ecosystem build out. It has almost 180 platforms, from racks to blades to towers to edge devices, and 3,000 solutions in the market on top of those platforms. One of the areas where AMD pushes into the enterprise is what it calls industry or vertical workloads. “These are the workloads that drive the end business. So in semiconductors, that’s telco, it’s the network, and the goal there is to accelerate those workloads and either driving more throughput or drive faster time to market or faster time to results. And we almost double our competition in terms of faster time to results,” said McNamara. And it’s paying off. McNamara noted that over 60% of the Fortune 100 are using AMD, and that’s growing quarterly. “We track that very, very closely,” he said. The other question is are they getting new customer acquisitions, customers with Epyc for the first time? “We’ve doubled that year on year.” AMD didn’t just brag, it laid out a road map for the next two years, and 2026 is going to be a very busy year. That will be the year that new CPUs, both client and server, built on the Zen 6 architecture begin to appear. On the server side, that means the Venice generation of Epyc server processors. Zen 6 processors will be built on 2 nanometer design generated by (you guessed

Read More »

Building the Regional Edge: DartPoints CEO Scott Willis on High-Density AI Workloads in Non-Tier-One Markets

When DartPoints CEO Scott Willis took the stage on “the Distributed Edge” panel at the 2025 Data Center Frontier Trends Summit, his message resonated across a room full of developers, operators, and hyperscale strategists: the future of AI infrastructure will be built far beyond the nation’s tier-one metros. On the latest episode of the Data Center Frontier Show, Willis expands on that thesis, mapping out how DartPoints has positioned itself for a moment when digital infrastructure inevitably becomes more distributed, and why that moment has now arrived. DartPoints’ strategy centers on what Willis calls the “regional edge”—markets in the Midwest, Southeast, and South Central regions that sit outside traditional cloud hubs but are increasingly essential to the evolving AI economy. These are not tower-edge micro-nodes, nor hyperscale mega-campuses. Instead, they are regional data centers designed to serve enterprises with colocation, cloud, hybrid cloud, multi-tenant cloud, DRaaS, and backup workloads, while increasingly accommodating the AI-driven use cases shaping the next phase of digital infrastructure. As inference expands and latency-sensitive applications proliferate, Willis sees the industry’s momentum bending toward the very markets DartPoints has spent years cultivating. Interconnection as Foundation for Regional AI Growth A key part of the company’s differentiation is its interconnection strategy. Every DartPoints facility is built to operate as a deeply interconnected environment, drawing in all available carriers within a market and stitching sites together through a regional fiber fabric. Willis describes fiber as the “nervous system” of the modern data center, and for DartPoints that means creating an interconnection model robust enough to support a mix of enterprise cloud, multi-site disaster recovery, and emerging AI inference workloads. The company is already hosting latency-sensitive deployments in select facilities—particularly inference AI and specialized healthcare applications—and Willis expects such deployments to expand significantly as regional AI architectures become more widely

Read More »

Key takeaways from Cisco Partner Summit

Brian Ortbals, senior vice president at World Wide Technology, one of Cisco’s biggest and most important partners, stated: “Cisco engaged partners early in the process and took our feedback along the way. We believe now is the right time for these changes as it will enable us to capitalize on the changes in the market.” The reality is, the more successful its more-than-half-a-million partners are, the more successful Cisco will be. Platform approach is coming together: When Jeetu Patel took the reins as chief product officer, one of his goals was to make the Cisco portfolio a “force multiplier.” Patel has stated repeatedly that, historically, Cisco acted more as a technology holding company with good products in networking, security, collaboration, data center and other areas. In this case, product breadth was not an advantage, as everything had to be sold as “best of breed,” which is a tough ask of the salesforce and partner community. Since then, there have been many examples of the coming together of the portfolio to create products that leverage the breadth of the platform. The latest is the Unified Edge appliance, an all-in-one solution that brings together compute, networking, storage and security. Cisco has been aggressive with AI products in the data center, and Cisco Unified Edge complements that work with a device designed to bring AI to edge locations. It is ideally suited for retail, manufacturing, healthcare, factories and other industries where it’s more cost-effective and performant to run AI where the data lives.

Read More »

AI networking demand fueled Cisco’s upbeat Q1 financials

Customers are very focused on modernizing their enterprise network infrastructure in preparation for inferencing and AI workloads, Robbins said. “These things are always multi-year efforts,” he added, and this is only the beginning. The AI opportunity: “As we look at the AI opportunity, we see customer use cases growing across training, inferencing, and connectivity, with secure networking increasingly critical as workloads move from the data center to end users, devices, and agents at the edge,” Robbins said. “Agents are transforming network traffic from predictable bursts to persistent high-intensity loads, with agentic AI queries generating up to 25 times more network traffic than chatbots.” “Instead of pulling data to and from the data center, AI workloads require models and infrastructure to be closer to where data is created and decisions are made, particularly in industries such as retail, healthcare, and manufacturing.” Robbins pointed to last week’s introduction of Cisco Unified Edge, a converged platform that integrates networking, compute and storage to help enterprise customers more efficiently handle data from AI and other workloads at the edge. “Unified Edge enables real-time inferencing for agentic and physical AI workloads, so enterprises can confidently deploy and manage AI at scale,” Robbins said. On the hyperscaler front, “we see a lot of solid pipeline throughout the rest of the year. The use cases, we see it expanding,” Robbins said. “Obviously, we’ve been selling networking infrastructure under the training models. We’ve been selling scale-out. We launched the P200-based router that will begin to address some of the scale-across opportunities.” Cisco has also seen great success with its pluggable optics, Robbins said. “All of the hyperscalers now are officially customers of our pluggable optics, so we feel like that’s a great opportunity. They not only plug into our products, but they can be used with other companies’

Read More »

When the Cloud Leaves Earth: Google and NVIDIA Test Space Data Centers for the Orbital AI Era

On November 4, 2025, Google unveiled Project Suncatcher, a moonshot research initiative exploring the feasibility of AI data centers in space. The concept envisions constellations of solar-powered satellites in Low Earth Orbit (LEO), each equipped with Tensor Processing Units (TPUs) and interconnected via free-space optical laser links. Google’s stated objective is to launch prototype satellites by early 2027 to test the idea and evaluate scaling paths if the technology proves viable. Rather than a commitment to move production AI workloads off-planet, Suncatcher represents a time-bound research program designed to validate whether solar-powered, laser-linked LEO constellations can augment terrestrial AI factories, particularly for power-intensive, latency-tolerant tasks. The 2025–2027 window effectively serves as a go/no-go phase to assess key technical hurdles including thermal management, radiation resilience, launch economics, and optical-link reliability. If these milestones are met, Suncatcher could signal the emergence of a new cloud tier: one that scales AI with solar energy rather than substations. Inside Google’s Suncatcher Vision: Google has released a detailed technical paper titled “Towards a Future Space-Based, Highly Scalable AI Infrastructure Design.” The accompanying Google Research blog describes Project Suncatcher as “a moonshot exploring a new frontier” – an early-stage effort to test whether AI compute clusters in orbit can become a viable complement to terrestrial data centers. The paper outlines several foundational design concepts. Orbit and Power: Project Suncatcher targets Low Earth Orbit (LEO), where solar irradiance is significantly higher and can remain continuous in specific orbital paths. Google emphasizes that space-based solar generation will serve as the primary power source for the TPU-equipped satellites. Compute and Interconnect: Each satellite would host Tensor Processing Unit (TPU) accelerators, forming a constellation connected through free-space optical inter-satellite links (ISLs). Together, these would function as a disaggregated orbital AI cluster, capable of executing large-scale batch and training workloads. Downlink

Read More »

Microsoft will invest $80B in AI data centers in fiscal 2025

And Microsoft isn’t the only one that is ramping up its investments into AI-enabled data centers. Rival cloud service providers are all investing in either upgrading or opening new data centers to capture a larger chunk of business from developers and users of large language models (LLMs). In a report published in October 2024, Bloomberg Intelligence estimated that demand for generative AI would push Microsoft, AWS, Google, Oracle, Meta, and Apple to devote a combined $200 billion to capex in 2025, up from $110 billion in 2023. Microsoft is one of the biggest spenders, followed closely by Google and AWS, Bloomberg Intelligence said. Its estimate of Microsoft’s capital spending on AI, at $62.4 billion for calendar 2025, is lower than Smith’s claim that the company will invest $80 billion in the fiscal year to June 30, 2025. Both figures, though, are far higher than Microsoft’s 2020 capital expenditure of “just” $17.6 billion. The majority of the increased spending is tied to cloud services and the expansion of AI infrastructure needed to provide compute capacity for OpenAI workloads. Separately, last October Amazon CEO Andy Jassy said his company planned total capex spend of $75 billion in 2024 and even more in 2025, with much of it going to AWS, its cloud computing division.

Read More »

John Deere unveils more autonomous farm machines to address skilled labor shortage

Self-driving tractors might be the path to self-driving cars. John Deere has revealed a new line of autonomous machines and tech across agriculture, construction and commercial landscaping. The Moline, Illinois-based company has been in business for 187 years, yet it has become a regular among the non-tech companies showing off technology at the big tech trade show in Las Vegas, and it is back at CES 2025 with more autonomous tractors and other vehicles. This is not something we usually cover, but John Deere has a lot of data that is interesting in the big picture of tech. The message from the company is that there aren’t enough skilled farm laborers to do the work that its customers need. It’s been a challenge for most of the last two decades, said Jahmy Hindman, CTO at John Deere, in a briefing. Much of the tech will come this fall and after that. He noted that the average farmer in the U.S. is over 58 and works 12 to 18 hours a day to grow food for us. And he said the American Farm Bureau Federation estimates there are roughly 2.4 million farm jobs that need to be filled annually, and the agricultural workforce continues to shrink. (This is my hint to the anti-immigration crowd.) John Deere’s autonomous 9RX Tractor. Farmers can oversee it using an app. While each of these industries experiences its own set of challenges, a commonality across all is skilled labor availability. In construction, about 80% of contractors struggle to find skilled labor. And in commercial landscaping, 86% of landscaping business owners can’t find labor to fill open positions, he said. “They have to figure out how to do

Read More »

2025 playbook for enterprise AI success, from agents to evals

2025 is poised to be a pivotal year for enterprise AI. The past year has seen rapid innovation, and this year will see the same. This has made it more critical than ever to revisit your AI strategy to stay competitive and create value for your customers. From scaling AI agents to optimizing costs, here are the five critical areas enterprises should prioritize for their AI strategy this year. 1. Agents: the next generation of automation. AI agents are no longer theoretical. In 2025, they’re indispensable tools for enterprises looking to streamline operations and enhance customer interactions. Unlike traditional software, agents powered by large language models (LLMs) can make nuanced decisions, navigate complex multi-step tasks, and integrate seamlessly with tools and APIs. At the start of 2024, agents were not ready for prime time, making frustrating mistakes like hallucinating URLs. They started getting better as frontier large language models themselves improved. “Let me put it this way,” said Sam Witteveen, cofounder of Red Dragon, a company that develops agents for other companies and recently reviewed the 48 agents it built last year. “Interestingly, the ones that we built at the start of the year, a lot of those worked way better at the end of the year just because the models got better.” Witteveen shared this in the video podcast we filmed to discuss these five big trends in detail. Models are getting better and hallucinating less, and they’re also being trained to do agentic tasks. Another feature that the model providers are researching is a way to use the LLM as a judge, and as models get cheaper (something we’ll cover below), companies can use three or more models to
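The LLM-as-a-judge pattern mentioned above is easy to prototype. Below is a minimal, illustrative Python sketch (not from the article): judge_answer and the judge model names are hypothetical, and you would pass in a thin wrapper around whichever model API you actually use. The idea is simply to have several cheaper models score the same output and average the result.

### Minimal LLM-as-a-judge sketch (illustrative only; not tied to any specific provider)
JUDGE_PROMPT = (
    "You are grading an AI agent's answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with a single integer score from 1 (poor) to 5 (excellent)."
)

def judge_answer(question, answer, judge_models, call_model):
    ### call_model(model_name, prompt) -> str is supplied by the caller,
    ### e.g. a thin wrapper around your LLM client of choice
    scores = []
    for model_name in judge_models:
        reply = call_model(model_name, JUDGE_PROMPT.format(question=question, answer=answer))
        try:
            scores.append(int(reply.strip()))
        except ValueError:
            continue  ### ignore judges that don't return a clean integer
    return sum(scores) / len(scores) if scores else None

### Example with a dummy client so the sketch runs end-to-end
dummy_client = lambda model_name, prompt: "4"
print(judge_answer("What is our refund policy?", "Full refunds within 30 days.",
                   ["judge-a", "judge-b", "judge-c"], dummy_client))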

Read More »

OpenAI’s red teaming innovations define new essentials for security leaders in the AI era

OpenAI has taken a more aggressive approach to red teaming than its AI competitors, demonstrating its security teams’ advanced capabilities in two areas: multi-step reinforcement learning and external red teaming. OpenAI recently released two papers that set a new competitive standard for improving the quality, reliability and safety of AI models in these two techniques and more. The first paper, “OpenAI’s Approach to External Red Teaming for AI Models and Systems,” reports that specialized teams outside the company have proven effective in uncovering vulnerabilities that might otherwise have made it into a released model because in-house testing techniques may have missed them. In the second paper, “Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning,” OpenAI introduces an automated framework that relies on iterative reinforcement learning to generate a broad spectrum of novel, wide-ranging attacks. Going all-in on red teaming pays practical, competitive dividends: It’s encouraging to see competitive intensity in red teaming growing among AI companies. When Anthropic released its AI red team guidelines in June of last year, it joined AI providers including Google, Microsoft, Nvidia, OpenAI, and even the U.S.’s National Institute of Standards and Technology (NIST), which all had released red teaming frameworks. Investing heavily in red teaming yields tangible benefits for security leaders in any organization. OpenAI’s paper on external red teaming provides a detailed analysis of how the company strives to create specialized external teams that include cybersecurity and subject matter experts. The goal is to see if knowledgeable external teams can defeat models’ security perimeters and find gaps in their security, biases and controls that prompt-based testing couldn’t find. What makes OpenAI’s recent papers noteworthy is how well they define using human-in-the-middle

Read More »