As businesses increasingly rely on data to drive decisions, the ability to extract meaningful insights from unstructured text is becoming essential. Snowflake Cortex, a suite of AI capabilities integrated into the Snowflake Data Cloud, lets users perform advanced text analysis without extensive machine learning expertise. In my recent work, I explored Snowflake Cortex’s translate, summarize, sentiment, and classify_text functions and built two small projects to analyze sentiment and classify text. This blog shares my experience, code snippets, and insights from working with these tools.
Snowflake Cortex is an intelligent, fully managed service that offers machine learning and AI solutions to Snowflake users. It provides two kinds of functions:
SQL and Python functions that leverage large language models for understanding, querying, translating, summarizing, and generating free-form text.
SQL functions that perform predictive analysis, such as forecasting and anomaly detection, using machine learning to help you gain insights into your structured data and accelerate everyday analytics.
Snowflake Cortex provides a set of serverless functions that leverage advanced large language models (LLMs) to process and analyze text directly within Snowflake. These functions are accessible via SQL, making them easy to integrate into existing data workflows. Let’s begin by creating a database and using its PUBLIC schema.
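The setup step can be sketched as follows. The database name `cortex_demo` is a placeholder I chose for illustration, not a name from the original project; a new Snowflake database comes with a PUBLIC schema by default.

```sql
-- Hypothetical database name; replace with your own.
CREATE DATABASE IF NOT EXISTS cortex_demo;

-- Make the new database and its default PUBLIC schema the active context.
USE DATABASE cortex_demo;
USE SCHEMA PUBLIC;
```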
The SNOWFLAKE.CORTEX.TRANSLATE function in Snowflake Cortex translates text from one language into another. It does not cover an open-ended number of languages: the supported languages are listed in the Snowflake documentation, and that list can change between releases, so check it for the current set rather than relying on a fixed count.
Snowflake Cortex’s TRANSLATE function uses ISO 639-1 language codes for specifying source and target languages. Below is a list of some commonly supported language codes for the SNOWFLAKE.CORTEX.TRANSLATE function; confirm each against Snowflake’s documentation before relying on it: en: English; de: German; es: Spanish; fr: French; it: Italian; ja: Japanese; ko: Korean; pt: Portuguese; zh: Chinese (Simplified); ru: Russian; ar: Arabic; hi: Hindi; nl: Dutch; sv: Swedish; pl: Polish.
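A minimal sketch of a TRANSLATE call, using literal sample sentences that I made up for illustration. The function takes the text, a source language code, and a target language code; passing an empty string as the source asks Cortex to detect the language.

```sql
-- Translate a German sentence into English with an explicit source language.
SELECT SNOWFLAKE.CORTEX.TRANSLATE(
    'Guten Morgen, wie geht es Ihnen?',  -- sample input text (illustrative)
    'de',                                -- source language code
    'en'                                 -- target language code
) AS translated_text;

-- Let Cortex detect the source language by passing an empty string.
SELECT SNOWFLAKE.CORTEX.TRANSLATE(
    'Bonjour tout le monde',
    '',
    'en'
) AS translated_text;
```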
The SNOWFLAKE.CORTEX.SUMMARIZE function takes a piece of text as input and produces a shorter version that captures the key points or main ideas. It leverages advanced LLMs (e.g., Mixtral, Llama, or others supported by Cortex) to understand the context and meaning of the text, ensuring the summary is coherent and relevant. The function is executed via SQL, making it accessible to users familiar with Snowflake’s query interface, and it operates serverlessly, so no external infrastructure is needed.
Purpose: To reduce long text into a concise summary, saving time and enabling quick insights.
Key Benefit: It automates text summarization, eliminating the need for manual review or complex NLP pipelines.
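As a rough illustration, SUMMARIZE takes a single text argument. The table `support_tickets` and its columns in the second query are hypothetical names used only to show how the function applies to a column.

```sql
-- Summarize a literal piece of text (sample text is illustrative).
SELECT SNOWFLAKE.CORTEX.SUMMARIZE(
    'The customer reported that the mobile app crashes whenever a photo is '
    || 'uploaded. The issue started after the latest update and affects both '
    || 'Android and iOS. Reinstalling the app did not help, and the customer '
    || 'has asked for a refund if the problem is not fixed within a week.'
) AS summary;

-- Summarize a text column; support_tickets and ticket_body are assumed names.
SELECT ticket_id,
       SNOWFLAKE.CORTEX.SUMMARIZE(ticket_body) AS ticket_summary
FROM support_tickets
LIMIT 10;
```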
Use Cases:
The SNOWFLAKE.CORTEX.SENTIMENT function in Snowflake Cortex is a machine learning tool that analyzes the emotional tone of text and returns a sentiment score between -1 and 1, where values near -1 indicate negative sentiment, values near 0 indicate neutral sentiment, and values near 1 indicate positive sentiment. It’s part of Snowflake Cortex’s suite of AI and ML functions, allowing users to perform sentiment analysis directly within the Snowflake Data Cloud using SQL, without needing external tools or infrastructure.
Purpose: To evaluate the sentiment of text data, such as customer reviews, support tickets, or social media posts, to understand emotions or opinions.
How It Works: The function uses a large language model (LLM) or specialized ML model to process the input text, analyzing its content, context, and tone. It returns a numerical score between -1 and 1 that reflects the emotional sentiment; turning that score into a label such as positive, negative, or neutral is left to the caller.
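A minimal sketch of a SENTIMENT call on literal text. The sample sentences are illustrative only; the function takes one text argument and returns a floating-point score.

```sql
-- Score two sample sentences; the result is a float between -1 and 1.
SELECT SNOWFLAKE.CORTEX.SENTIMENT(
    'The delivery was late, but the support team was very helpful.'
) AS mixed_review_score;

SELECT SNOWFLAKE.CORTEX.SENTIMENT(
    'Absolutely loved this product, it works perfectly.'
) AS positive_review_score;
```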
Use Cases:
The SNOWFLAKE.CORTEX.CLASSIFY_TEXT function in Snowflake Cortex is a machine learning tool designed to categorize text into user-defined labels or categories based on its content. It’s part of Snowflake Cortex’s suite of AI and ML functions, enabling users to perform text classification tasks directly within the Snowflake Data Cloud using SQL, without needing external machine learning frameworks or infrastructure.
Purpose: To automatically assign text to one of a set of predefined categories (labels), such as “positive/negative,” “urgent/non-urgent,” or custom labels like “complaint,” “praise,” or “inquiry,” based on its meaning and context.
How It Works: The function leverages a large language model (LLM) or specialized classification model to analyze the input text’s semantics, tone, and patterns. It compares the text to the provided labels and returns the most appropriate one as an object with a label field.
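A sketch of a CLASSIFY_TEXT call under the assumption that you want one of three custom labels. The sample sentence and labels are illustrative. The function returns an object, so the example pulls out the `label` field and casts it to a string.

```sql
-- Classify a sample sentence into one of the supplied categories.
SELECT SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
    'My order arrived damaged and nobody has replied to my emails.',
    ['complaint', 'praise', 'inquiry']   -- candidate labels (illustrative)
)['label']::STRING AS predicted_label;
```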
Benefits:
Use Cases:
This small project gives an idea of sentiment analysis and of how SNOWFLAKE.CORTEX.SENTIMENT works. I analyzed sentiments from a dataset of social media comments. The goal was to compute sentiment scores and categorize comments as positive, negative, or neutral.
Load the data into the table using the COPY INTO command.
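The loading step might look like the sketch below. The table name `social_comments`, its columns, the stage `@comments_stage`, and the file name are all assumptions for illustration, since the original dataset layout is not described here; adjust the file format options to match your file.

```sql
-- Hypothetical target table for the comments.
CREATE TABLE IF NOT EXISTS social_comments (
    comment_id   INTEGER,
    comment_text STRING
);

-- Load a CSV file from a (hypothetical) named stage into the table.
COPY INTO social_comments
FROM @comments_stage/comments.csv
FILE_FORMAT = (
    TYPE = CSV
    SKIP_HEADER = 1                      -- skip the header row
    FIELD_OPTIONALLY_ENCLOSED_BY = '"'   -- comments may contain commas
);
```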
Add a sentiment score column to the table for the sentiment analysis.
Update the column by using SNOWFLAKE.CORTEX.SENTIMENT(column_name).
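These two steps can be sketched as follows, assuming a table named `social_comments` with a text column `comment_text` (both hypothetical names).

```sql
-- Add a column to hold the score returned by Cortex.
ALTER TABLE social_comments ADD COLUMN sentiment_score FLOAT;

-- Score every comment; SENTIMENT returns a float between -1 and 1.
UPDATE social_comments
SET sentiment_score = SNOWFLAKE.CORTEX.SENTIMENT(comment_text);
```

Because the update calls the model once per row, it is worth trying it on a small sample first to get a feel for run time and credit usage.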
Add a column that classifies each comment as positive, negative, or neutral based on its sentiment score.
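One way to bucket the scores is a CASE expression. The cut-offs of 0.2 and -0.2 below are assumptions chosen for illustration, not thresholds defined by Snowflake, and the table and column names are hypothetical.

```sql
-- Add a column for the sentiment category.
ALTER TABLE social_comments ADD COLUMN sentiment_category STRING;

-- Map the numeric score to a label using illustrative thresholds.
UPDATE social_comments
SET sentiment_category = CASE
    WHEN sentiment_score IS NULL THEN NULL        -- leave unscored rows empty
    WHEN sentiment_score >  0.2  THEN 'positive'
    WHEN sentiment_score < -0.2  THEN 'negative'
    ELSE 'neutral'
END;
```

Widening or narrowing the neutral band changes the distribution noticeably, so it helps to inspect a few borderline comments before settling on thresholds.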
Check the results and distributions by grouping the comments by classification type.
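A grouping query along these lines shows the distribution, again assuming the hypothetical `social_comments` table with a `sentiment_category` column.

```sql
-- Count comments per category and compute each category's share of the total.
SELECT sentiment_category,
       COUNT(*) AS comment_count,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_total
FROM social_comments
GROUP BY sentiment_category
ORDER BY comment_count DESC;
```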
The analysis revealed the proportion of positive, negative, and neutral comments, which could help businesses gauge public sentiment toward a product or topic. The sentiment function’s accuracy in detecting nuanced tones was impressive, though I noticed it occasionally struggled with sarcasm or mixed emotions, which is a known challenge in sentiment analysis.
In the second project, I used the classify_text function to classify IMDB movie reviews as positive or negative, working from a dataset of labeled reviews.
Add a column for the classification and use the classify_text function to label the reviews.
I compared the assigned classifications with the original sentiment labels to evaluate accuracy.
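A sketch of the labeling step, assuming a table `imdb_reviews` with a text column `review_text` (hypothetical names). The optional third argument passes a task description to guide the model; the wording of that description is my own example.

```sql
-- Add a column for the label predicted by Cortex.
ALTER TABLE imdb_reviews ADD COLUMN predicted_label STRING;

-- Classify each review and keep only the label field of the returned object.
UPDATE imdb_reviews
SET predicted_label = SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
    review_text,
    ['positive', 'negative'],
    {'task_description': 'Classify the overall sentiment of this movie review.'}
)['label']::STRING;
```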
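The comparison can be sketched as a simple agreement rate. The column `original_label` is a hypothetical name for the sentiment label that ships with the dataset, and `predicted_label` is the assumed column holding the Cortex output.

```sql
-- Share of reviews where the predicted label matches the dataset label.
SELECT COUNT(*) AS total_reviews,
       COUNT_IF(LOWER(predicted_label) = LOWER(original_label)) AS matching_reviews,
       ROUND(
           100.0 * COUNT_IF(LOWER(predicted_label) = LOWER(original_label))
                 / NULLIF(COUNT(*), 0),
           2
       ) AS accuracy_pct
FROM imdb_reviews;

-- Inspect a few disagreements to see where the classifier struggles.
SELECT review_text, original_label, predicted_label
FROM imdb_reviews
WHERE LOWER(predicted_label) <> LOWER(original_label)
LIMIT 20;
```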
The classify_text function performed well in assigning accurate labels to reviews, especially when the text clearly expressed a positive or negative opinion. The ability to customize the task description and labels made the function versatile. However, I observed that very short reviews or those with ambiguous language sometimes led to misclassifications, highlighting the importance of clear input text.
Working with Snowflake Cortex was a seamless experience, thanks to its SQL-based interface and integration with Snowflake’s data platform.
Snowflake Cortex is a game-changer for organizations looking to incorporate AI-driven text analysis into their data workflows. My experiments with translate, summarize, sentiment, and classify_text, along with the two projects, demonstrated how these tools can unlock insights from text data with minimal effort. Whether you’re analyzing customer feedback, summarizing reports, or classifying reviews, Cortex provides a robust, scalable solution. I’m excited to explore more advanced use cases, such as combining Cortex with Snowflake’s Snowpark for custom machine learning models, in future projects.
If you’re curious about Snowflake Cortex, I encourage you to try it out in your Snowflake environment. The possibilities for text analysis are vast, and Cortex makes it easier than ever to get started!