DataScava

DataScava is an unstructured text data miner built on patented deterministic methods that keep the human in command.

DataScava is an advanced unstructured text mining solution that pinpoints high-quality data with user-controlled business and domain language. It evolved from TalentBrowser, where we patented deterministic methods to structure, measure, filter, and curate nonlinear content without requiring training data or manual labeling. By enabling you to harness your expertise, working standalone or alongside other solutions, DataScava helps you:

* Structure, measure, filter, match, route, sort, and rank raw text automatically
* Feed explainable, auditable outputs into AI, LLMs, ML, RPA, BI, Research, TA, and BAU applications
* Create domain-specific data pipelines upstream, audit and measure results downstream
* Get structured, high-quality datasets and outputs you can act on
* Stay in control with results you can see, refine, and trust

How It Works:

DataScava applies three complementary methodologies that focus on your business language and expertise, not generic language models:

DSLP | Domain-Specific Language Processing – Structures and measures user-defined key terms exactly, with no disambiguation

TTT | Tailored Topics Taxonomies – Import, build, or select vocabularies and data types that reflect your expertise

WTS | Weighted Topic Scoring – Prioritize outcomes with transparent, explainable scoring that reflects your rules and thresholds
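As a rough illustration only, the three ideas above can be sketched in a few lines of Python. This is not DataScava's patented implementation; the taxonomy, weights, threshold, sample documents, and function names (`count_terms`, `score`, `rank`) are all hypothetical and chosen just to show exact term counting, a user-defined topic vocabulary, and transparent weighted scoring working together.

```python
import re
from collections import Counter

# Hypothetical user-defined taxonomy: topic -> exact key terms (illustrative only).
TAXONOMY = {
    "risk": ["credit risk", "market risk", "var"],
    "python": ["python", "pandas"],
}

# Hypothetical user-assigned topic weights and minimum score (assumptions).
WEIGHTS = {"risk": 2.0, "python": 1.0}
THRESHOLD = 3.0

# Made-up sample documents.
DOCUMENTS = {
    "a": "Credit risk and market risk models written in Python.",
    "b": "A pandas tutorial.",
    "c": "VaR reporting for credit risk, built with Python and pandas.",
}


def count_terms(text, taxonomy):
    """Count exact, case-insensitive whole-term matches for each topic."""
    lowered = text.lower()
    counts = Counter()
    for topic, terms in taxonomy.items():
        for term in terms:
            # Match the literal term only; no stemming or disambiguation.
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            counts[topic] += len(re.findall(pattern, lowered))
    return counts


def score(counts, weights):
    """Weighted sum of topic counts; every contribution is traceable."""
    return sum(weights.get(topic, 0.0) * n for topic, n in counts.items())


def rank(documents, taxonomy, weights, threshold):
    """Keep documents at or above the threshold, highest score first."""
    scored = []
    for doc_id, text in documents.items():
        counts = count_terms(text, taxonomy)
        total = score(counts, weights)
        if total >= threshold:
            # Keep the per-topic counts so a match can be audited later.
            scored.append((doc_id, total, dict(counts)))
    return sorted(scored, key=lambda row: row[1], reverse=True)
```

Calling `rank(DOCUMENTS, TAXONOMY, WEIGHTS, THRESHOLD)` returns each retained document with its total score and per-topic counts, so a reader can see which terms produced the match.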

Together, they form a patented approach we call “Profile Matching of Unstructured Documents,” modeled after a contour profile gauge carpentry tool, because DataScava measures language based on your priorities.

The DataScava Difference:

* Less time, more accuracy – Filters and categorizes automatically
* Precision at scale – Numeric results you can trust across industries
* Transparency over black-box AI – See and audit exactly why a file matched
* Scalable and domain-specific – Refine vocabularies, taxonomies, and scoring
* Human in Command – Automation works alongside your expertise

06/27/2023

Here's our article published in CDO Magazine: "Machines in the Conversation: The Case for a More Data-Centric AI," commissioned by DataScava and TalentBrowser from Scott Spangler, former IBM Watson Chief Data Scientist, IBM Distinguished Engineer, and author of the book "Mining the Talk: Unlocking the Business Value in Unstructured Information." See our DataScava website for Scott's full series of articles about our three methods that pinpoint the data you care about:

DSIndex Domain-Specific Language Processing (alternative to NLP)
DSTopics Tailored Topics Taxonomies (encapsulate your expertise)
DSMatch Weighted Topic Scoring (filter and match using your criteria)

https://www.cdomagazine.tech//opinion-analysis/machines-in-the-conversation-the-case-for-a-more-data-centric-ai

04/26/2023

In this article, Scott explains the drawbacks of using a pure Machine Learning/NLP approach to RPA and then explains the need for customer understanding through three fundamental capabilities: classification of content, characterization of the customer, and customization of features. Scott illustrates how DataScava technology can be employed to fill in these critical gaps and provide a better customer experience by readily capturing existing in-house expertise.

DataScava commissioned IBM Distinguished Engineer Scott Spangler to discuss his views on the difference between knowing and understanding when it comes to implementing Robotic Process Automation. Scott is a former IBM Watson Health Researcher, Chief Data Scientist, and the author of the book “...

04/26/2023

DataScava commissioned IBM Distinguished Engineer Scott Spangler to discuss his views on the importance of fully utilizing unstructured data in Business Intelligence analytics. Scott is a former IBM Watson Health Researcher, Chief Data Scientist, and the author of the book "Mining the Talk: Unlocking the Business Value in Unstructured Information."

This article provides an overview of the most common current approaches to analyzing unstructured information in BI — Machine Learning, Generic Taxonomies and Text Mining — highlighting the specific drawbacks of each. It then explains the importance of subject matter expert-driven taxonomies and illustrates how DataScava can be used to build and deploy these taxonomies at scale and mine unstructured data to maximize business value.


04/26/2023

I'm here with Scott Spangler, former IBM Watson Health Researcher, Chief Data Scientist, Distinguished Engineer and author of “Mining the Talk: Unlocking the Business Value in Unstructured Information.” We’re discussing how DataScava's domain-specific approach to unstructured text mining complements real-world big data applications in Artificial Intelligence and Machine Learning.

November 15, 2021

02/26/2023

Here's our new article "Machines in the Conversation: The Case for a More Data-Centric AI."

DataScava commissioned IBM Distinguished Engineer Scott Spangler to discuss his views on the latest developments in generative AI. Scott is a former IBM Chief Data Scientist and author of the book "Mining the Talk: Unlocking the Business Value in Unstructured Information."

In this article, he argues that too much focus on generative AI distracts from the important value a more Data-Centric AI approach can provide to business applications. He then discusses the key technologies that we use to enable such an approach within the organization:

✔️ Topic models which reflect the primary areas of focus;
✔️ Flexible topic scoring to encode the organization's priorities;
✔️ Customized text processing that mirrors the way people actually communicate in the industry.

Learn more at https://datascava.com/


03/17/2022

This is the first in a series of articles DataScava commissioned from the brilliant Scott Spangler, former IBM Watson Health Researcher, Chief Data Scientist, and author of the book 💥 “Mining the Talk: Unlocking the Business Value in Unstructured Information.” 💥

Scott discusses how DataScava's patented domain-specific approach to unstructured text data mining complements real-world applications in AI, ML, RPA, BI, Talent and Research. Read his Q&A and the full series on our website.

01/12/2020

"As companies double down on business initiatives built around technologies like predictive analytics, machine learning, and cognitive computing, there’s one element they ignore at their peril — humans," according to two MIT experts. Here are their 5 steps to keep your AI "people-centered."

1. Classify what you’re trying to accomplish with AI.
2. Embrace transparency, explainability and reversibility.
3. Establish data advocates because “garbage in, garbage out” holds.
4. Practice “mindful monitoring” of your data sets.
5. Ground your expectations and focus on a people-oriented AI agenda.

Artificial intelligence should amplify human strengths. Here’s a roadmap for developing systems that do just that.

08/15/2019

Got unstructured text data? Check out our guest post for KDnuggets describing how our Domain-Specific Language Processing and patented Weighted Topic Scoring can be your alternative or adjunct to NLP to mine valuable information from data by following your guidance and using the language of your business.

Processing unstructured text data in real-time is challenging when applying NLP or NLU. Find out how Domain-Specific Language Processing can also help mine valuable information from data by following your guidance and using the language of your business.

06/06/2019

DataScava's patented “Weighted Topic Scoring” finds the most relevant documents from large data sets, providing highly precise results you can see, control, and measure. Visit our website to learn how: https://datascava.com/

04/16/2019

If your performance plan includes any of the following:

1. Increase Efficiency of Data Scientists
2. Increase Accuracy of Unstructured Data Searches
3. Reduce Risk of AI/ML Suggested Actions
4. Satisfy Regulators' Requirements for Explaining AI/ML Output
5. Build leading-edge systems to identify the most viable documents within large unstructured data sets

If you strive for excellence in any of these areas, read on.

Real-time mining of unstructured textual content isn’t simple. Available solutions don’t work well unless they’re fine-tuned to meet your specific needs and address the unique quirks in your company’s information.

Address

315 Madison Avenue
New York, NY
10017

Opening Hours

Monday 8am - 6pm
Tuesday 8am - 6pm
Wednesday 8am - 6pm
Thursday 8am - 6pm
Friday 8am - 6pm
