Please Note: This schedule is in US Eastern Time (EST, UTC-5, for the January summit; EDT, UTC-4, for the July summit)
Generative AI Summit 2023
LIVE: Thursday, 20th July
Data Engineering Summit 2023
LIVE: Wednesday, 18th January
10:00 - 10:05
Welcome To ODSC Generative AI Summit!
10:05 - 10:50
Responsible AI In The Age Of Generative AI Panel

Generative AI has changed the landscape for the ethical and responsible use of AI in business or academic settings. This panel will highlight modern difficulties of implementing generative AI, appropriate and tangible solutions, and what we can look forward to in the future.

Elizabeth M. Adams
Affiliate Fellow | Stanford Institute (HAI)
Eli Chen
CTO & Co-Founder | Credo AI
Tracy Ring
CDO/Global GenAI Lead | Accenture AI
10:20 - 10:50
LLMOps for Enterprise: Key Challenges when Deploying for Production

Generative AI and LLMs are definitely buzzwords, but how can your organisation get value from these potentially game-changing technologies? Furthermore, how can you overcome the key challenges this technology poses to organisations around the world? In this session, the Seldon technical team is here to cut through the noise and dive into the key opportunities and challenges of Generative AI and LLMs, as well as best-practice approaches to deploying these models at scale.

From model inference and environmental impact to audit and privacy, LLMOps is essential for the responsible deployment of LLMs at scale. At Seldon, we’re already helping organisations deploy LLMs, making this easier, cheaper and faster for our customers. We’ve brought together experts from across the Seldon technical team: CTO Clive Cox, MLOps Engineer Sherif Akoush, and Solutions Engineer Andrew Wilson, to tackle this tricky topic from all angles.

Clive Cox
CTO | Seldon
Andrew Wilson
Solutions Engineer | Seldon
Sherif Akoush
MLOps Engineer | Seldon
10:55 - 11:25
Unlocking the Power of Generative AI with MosaicML

Generative AI has taken the world by storm; however, challenges with technical complexity, security, and cost have limited its adoption by many organizations. In this session, we will explore how MosaicML’s full-stack platform for generative AI makes it easy and efficient for developers to build and deploy models in a secure environment. We will take a deeper look into real-world examples of how businesses are using their proprietary data to train and deploy LLMs and other generative AI models with MosaicML.

Hagay Lupesko
VP Engineering | MosaicML
11:00 - 11:30
The MLOps Stack in a Gen-AI World

As companies continue to embrace GenAI models, streamlining ML pipelines and productionizing models becomes crucial to making GenAI work for you as a business. In a nutshell, GenAI MLOps is a comprehensive approach to the GenAI model pipeline that ensures each stage is production-ready and, no less important, properly monitored and maintained in production. If your models are doing great in experimentation but you are still trying to put all the production pieces together, this session might help you understand what’s going wrong and how to fix it. Working according to this methodology lets data scientists iterate rapidly, which is at the core of a successful GenAI project.

Learn how to:
– Maintain a centralized, production-focused model registry (a minimal sketch follows this list)
– Monitor and track your Gen AI models in your production environment
– Continuously enhance your Gen AI capabilities and accuracy in production
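
For readers unfamiliar with these pieces, here is a minimal sketch of the registry step, using MLflow purely as an illustration; Qwak's own platform exposes similar concepts through its own APIs, which are not shown here.

```python
# A minimal, hedged sketch of a centralized model registry workflow using
# MLflow (an assumption -- the talk is platform-agnostic about tooling).
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

with mlflow.start_run():
    # Register the model under one centralized name so production serving
    # always resolves to an explicit, versioned artifact.
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="genai-support-classifier"
    )
    # Metrics logged alongside the model become the baseline that
    # production monitoring is later compared against.
    mlflow.log_metric("train_accuracy", model.score(X, y))
```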

Yuval Fernbach
Co-founder & CTO | Qwak
11:30 - 12:00
On Brains, Waves and Representations

Generative AI has made enormous strides in recent years. In this talk, I will discuss how to build meaningful inductive biases into models for spatio-temporal data domains, such as video. We first generalize the idea of equivariance to a much looser and learnable constraint, and then add a prior that latent variable representations should evolve as PDEs, in particular waves. We find that this idea leads to a new form of disentangling. We also show that it is surprisingly easy to get wavelike dynamics in the latent representations, and that neurons develop a form of orientation selectivity and topography. All in all, we argue that this brain-inspired inductive bias might help the learning of sequence data.
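
For reference, the simplest instance of such a PDE prior is the classical wave equation (our notation, not necessarily the speaker's):

```latex
% Latent state z(x, t) constrained to evolve as a wave:
\frac{\partial^2 z}{\partial t^2} = c^2 \, \nabla^2 z
```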

Max Welling Ph.D
Distinguished Scientist | Microsoft Research
11:35 - 12:20
Generative Adversarial Networks 101 (Tutorial)

Generative models are at the heart of DeepFakes, and can be used to synthesize, replace, or swap attributes of images. Learn the basics of Generative Adversarial Networks, the famous GANs, from the ground up: autoencoders, latent spaces, generators, discriminators, GANs, DCGANs, WGANs, and more. The main goal of this session is to show you how GANs work: we will learn about latent spaces and how to use them to generate synthetic data while discussing implementation and training details, such as Wasserstein distance and gradient penalty. We will use Google Colab and work our way together into building and training GANs. You should be comfortable using Jupyter notebooks and Numpy, and training simple models in PyTorch.
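
As a taste of what the tutorial covers, here is a minimal GAN training loop in PyTorch on toy 2-D data; the tutorial's own notebooks, and its WGAN material (Wasserstein loss with gradient penalty in place of the BCE losses below), go considerably further.

```python
# Minimal GAN sketch: a generator maps latent vectors to samples, a
# discriminator scores them, and the two are trained adversarially.
# Dimensions and hyperparameters are illustrative, not the tutorial's.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # toy 2-D data instead of images

G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def real_batch(n=64):
    # Stand-in for real training data: points on a noisy circle.
    theta = torch.rand(n, 1) * 6.2832
    return torch.cat([theta.cos(), theta.sin()], dim=1) + 0.05 * torch.randn(n, 2)

for step in range(1000):
    x_real = real_batch()
    z = torch.randn(x_real.size(0), latent_dim)
    x_fake = G(z)
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)

    # Discriminator step: score real data as 1, generated data as 0.
    loss_d = bce(D(x_real), ones) + bce(D(x_fake.detach()), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: fool the (just-updated) discriminator.
    loss_g = bce(D(x_fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```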

Daniel Voigt Godoy
Data Scientist And Author | Independent
12:05 - 12:35
Machines vs. Minds: Navigating the Future of Generative AI

What is generative AI? How does machine creativity relate to human creativity? What will become of us in the age of creative machines? We will delve into the essence of generative AI, drawing on comparisons to our own brains. The discussion will examine the unique strengths of humans and machines, and explore the potential for effective collaboration between us and AI systems. A vision of the future of creativity will be presented, along with a discussion of the potential risks brought on by these powerful creative machines.

Maya Ackerman
CEO and Co-Founder | WaveAI
12:30 - 13:15
Pretrain Vision and Language Foundation Models on AWS (Tutorial)

Whether they are intimidating or exciting, high-performing or expensive, the future of machine learning and artificial intelligence is clearly trending towards foundation models. In this session, we’ll dive into this topic, exploring both the beneficial and the challenging aspects of this technology today. In particular, we’ll learn about key technologies available on AWS that help you pretrain the foundation models of the future. From distributed training to custom accelerators, reward modeling to reinforcement learning, learn how to create your own state-of-the-art models.

Emily Webber
Principal ML Solutions Architect | AWS
12:40 - 13:10
Generative Large Language Models and Hallucinations

Generative Large Language Models (LLMs) such as GPT4 and ChatGPT have revolutionized the field of artificial intelligence with their impressive capabilities. However, a major challenge that these models present is their tendency to ‘hallucinate’ confidently, meaning they can create plausible-sounding yet false information. For businesses aiming to implement these LLMs into enterprise or end-user applications, it is crucial to address this hallucination problem to ensure the delivery of accurate, reliable information.

This talk aims to delve into the intricacies of the hallucination problem in LLMs and shed light on effective strategies to overcome it. We will explore how LLMs, in their quest to provide relevant and comprehensive responses, often generate information that sounds accurate but may not necessarily be factual or grounded in reality.

The crux of our discussion will be the innovative solution of Truth Checker models. These models serve as a second layer of scrutiny that can discern the accuracy of the information generated by LLMs. By cross-verifying the output against a vast array of trusted and verifiable sources, they ensure the veracity of the data provided by the LLMs.
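
The abstract does not specify how the Truth Checker models are built. As a generic illustration of the verification pattern it describes, the sketch below scores a generated claim against a trusted passage with an off-the-shelf natural-language-inference model; the model name and threshold are assumptions, and this is not Got It AI's system.

```python
# Generic output-verification sketch: check whether a trusted source
# entails an LLM-generated claim using an off-the-shelf NLI model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def is_supported(claim: str, source_passage: str, threshold: float = 0.8) -> bool:
    # MNLI-style models classify premise/hypothesis pairs as
    # ENTAILMENT / NEUTRAL / CONTRADICTION.
    result = nli({"text": source_passage, "text_pair": claim})
    return result["label"] == "ENTAILMENT" and result["score"] >= threshold

print(is_supported(
    claim="The Eiffel Tower is in Paris.",
    source_passage="The Eiffel Tower is a landmark on the Champ de Mars in Paris.",
))
```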

Chandra Khatri
Co-Founder | Got It AI
13:15 - 13:45
Recent Advances in Diffusion Generative Models

Generative models are typically based on explicit representations of probability distributions (e.g., autoregressive models or VAEs) or implicit sampling procedures (e.g., GANs). I will present an alternative approach based on directly modeling the vector field of gradients of the data distribution (scores), which underlies recent score-based diffusion models. This framework allows flexible architectures and requires neither sampling during training nor adversarial training methods. Additionally, score-based diffusion generative models enable exact likelihood evaluation through connections with neural ODEs, achieving state-of-the-art sample quality and excellent likelihoods on image datasets. I will discuss numerical and distillation methods to accelerate sampling and their application to inverse problem solving.
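
For context, this is the standard score-based SDE formulation (Song et al.) that the framework builds on: a forward noising process paired with a reverse-time sampler that needs only the learned score.

```latex
% Forward (noising) SDE and its reverse-time counterpart; the model
% learns the score \nabla_x \log p_t(x), which the reverse SDE needs:
dx = f(x, t)\,dt + g(t)\,dw \\
dx = \left[ f(x, t) - g(t)^2 \, \nabla_x \log p_t(x) \right] dt + g(t)\,d\bar{w}
```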

Stefano Ermon Ph.D
Assistant Professor | Stanford University
13:20 - 13:50
Generative AI with Hugging Face

Generative AI is a rapidly growing field, but it can be shrouded in mystery and jargon, making it difficult for non-technical professionals to understand. This talk aims to demystify generative AI and introduce you to building generative AI models and applications with Hugging Face open-source solutions. By the end of this talk, you will better understand how generative AI works and how it can be applied in various industries, such as marketing and customer service. You will also have a high-level understanding of the underlying models, enabling you to make more informed decisions about using generative AI in your business.

Julien Simon
Chief Evangelist | Hugging Face
13:50 - 14:20
BloombergGPT: A Large Language Model for Finance

The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in the literature. In this work, we present BloombergGPT, a 50-billion-parameter language model trained on a wide range of financial data. We construct a 363-billion-token dataset based on Bloomberg’s extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general-purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology.

Ozan Irsoy Ph.D
Research Scientist | Bloomberg
13:55 - 14:25
How To Train Your Vicuna – Finetuning, Serving, and Evaluating LLMs In The Wild

Since Meta released the LLaMA weights and OpenAI announced GPT-4, the landscape of open large language models (LLMs) is seeing rapid changes every day.

In this talk, I will share our recent experience in fine-tuning, evaluating, and serving the chatbot Vicuna, widely considered the open-source chatbot closest to ChatGPT (GPT-3.5-turbo) even today. I will briefly explain how we curated a high-quality dataset and fine-tuned LLaMA into Vicuna. I will then discuss how we serve Vicuna, together with many other chatbots, in the Chatbot Arena (https://arena.lmsys.org/), achieving high throughput and low latency given only a limited number of university-donated GPUs. I’ll also discuss emerging systems and ML challenges in serving and evaluating LLMs, and our ongoing efforts. This is joint work with members of the LMSYS Org team at https://lmsys.org.
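
FastChat, the LMSYS serving stack, exposes an OpenAI-compatible API; below is a hedged sketch of querying a locally served Vicuna. The endpoint, port, and model name are placeholders for your own deployment, not the Chatbot Arena service.

```python
# Sketch of querying a locally served Vicuna through an OpenAI-compatible
# API server (base_url, api_key, and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vicuna-7b-v1.5",
    messages=[{"role": "user", "content": "Summarize what LLM serving involves."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```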

Hao Zhang
Assistant Professor | UCSD
14:25 - 14:55
Matching Identities Using Large Language Models

Almost every application in the world depends on understanding the relationships between people and companies. From master data management to anti-money laundering to deduplicating your Salesforce instance, many applications depend on the capacity to efficiently search through databases of personal or corporate names and understand who is likely the same entity. A single individual can be referred to by various name variants, which may be written using different scripts, aliases, or nicknames. In this talk, we will introduce a new method for name matching using a large language model that works at the byte level, which we fine-tuned to embed personal names in a vector space for name retrieval tasks. We will outline the fine-tuning process, discuss results from test sets with multiple scripts, and compare our LLM to some strong baseline models.
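
As a generic illustration of embedding-based name retrieval: embed every candidate name into a vector space, then rank by cosine similarity to the query. The fine-tuned byte-level model from the talk is not publicly specified here, so an off-the-shelf encoder stands in.

```python
# Generic name-retrieval sketch; the encoder is a stand-in, not the
# byte-level LLM described in the talk.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in encoder

names = ["Catherine Havasi", "Katherine Havasi", "K. Havasi", "Jun Rao"]
name_vecs = encoder.encode(names, normalize_embeddings=True)

def match(query: str, top_k: int = 3):
    q = encoder.encode([query], normalize_embeddings=True)[0]
    scores = name_vecs @ q  # cosine similarity, since vectors are normalized
    best = np.argsort(-scores)[:top_k]
    return [(names[i], float(scores[i])) for i in best]

print(match("Cathy Havasi"))
```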

Catherine Havasi
Chief Of Innovation | Babel Street
Kfir Bar
Chief Scientist | Babel Street
14:30 - 15:00
Text to Insights: Building Real-Time Analytics Systems with Generative AI

Text to SQL is a long-standing challenge in the NLP community, but advancements in Generative AI, particularly large language models (LLMs), have brought us closer than ever before. Join us to explore the complexities of building Text to SQL systems, focusing on open models.

We will discuss various approaches for constructing these systems, including LLM types, fine-tuning methods, and data augmentation techniques for training optimal models that generate SQL from text and describe query results. Discover how to avoid pitfalls by using retrieval-augmented generation (RAG) and providing context with metadata.

Moreover, we will demonstrate how to integrate a text to SQL system with Apache Spark structured streaming to create a real-time insight engine that maintains data freshness. Throughout the session, we will guide you through the end-to-end process of building such a system using open source tools and models.
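
A minimal sketch of the metadata-as-context idea: inline the retrieved table schema into the prompt so the model generates grounded SQL. The `generate` callable stands in for any LLM client and is an assumption, as are the table definitions.

```python
# Text-to-SQL with schema metadata provided as context. `generate` is a
# placeholder for whatever LLM completion call you use.
SCHEMA = """
Table orders(order_id INT, customer_id INT, amount DOUBLE, ts TIMESTAMP)
Table customers(customer_id INT, region STRING)
"""

def build_prompt(question: str) -> str:
    return (
        "You translate questions into ANSI SQL.\n"
        f"Available tables:\n{SCHEMA}\n"
        "Return only the SQL query.\n"
        f"Question: {question}\nSQL:"
    )

def text_to_sql(question: str, generate) -> str:
    # generate(prompt) -> str; grounding the prompt in real table metadata
    # is what keeps the model from inventing columns.
    return generate(build_prompt(question)).strip().rstrip(";")

# e.g. text_to_sql("Total revenue by region this week?", my_llm_call)
```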

Avinash Sooriyarachchi
Solutions Architect | Databricks
Dillon Bostwick
Senior Solutions Architect | Databricks
15:00 - 15:45
Government Policy in Generative AI Panel

With the explosion of AI over the past year, and specifically of generative AI, laws and regulations have been forced to adapt in a short amount of time. In this panel, the experts will discuss the difficulties raised by the rise of Generative AI, what’s being done to address the need for improved governance, and what the future holds for laws and regulations surrounding artificial intelligence.

Brian Drake
Federal CTO | Accrete AI
David Danks Ph.D
Professor | UC San Diego
Eric Xing Ph.D
Professor | Carnegie Mellon
15:05 - 15:35
Accelerating Virtual Twins using Generative AI and Synthetic Clinical Trial Data

Clinical trial data remain largely siloed, given concerns over patient privacy and clinical trial sponsor identity disclosure. Advances in generative AI have enabled generative adversarial networks and variational autoencoders that generate synthetic data from real data, where these synthetic datasets mimic the properties and trends of the real data without disclosing patient-specific information. In the context of healthcare datasets, synthetic data presents a “Virtual Twin” of real clinical trial data, preserving the clinical insights, endpoints and outcomes of interest present in the real clinical trial data while, most importantly, protecting patient privacy and trial sponsor anonymity. In this talk, we will discuss (i) open-source generative models to create synthetic data, (ii) cross-industry use cases for synthetic data with a specific focus on healthcare (data augmentation, test data creation, ML model improvements), and (iii) a suite of metrics to evaluate synthetic data quality, focusing on fidelity, utility, and privacy. The key takeaways will focus on how synthetic data and generative AI can accelerate the growth of a healthy ecosystem for data sharing and continued innovation in healthcare and other industries.

Afrah Shafquat Ph.D
Senior Data Scientist II | Medidata AI
15:50 - 16:00
Wrap Up Summary
10:00 - 10:35
Beyond Monitoring: The Rise of Data Observability

Broken data is costly, time-consuming to fix, and nowadays an all-too-common reality for even the most advanced data teams. In this talk, I’ll introduce this problem, called “data downtime” (periods of time when data is partial, erroneous, missing or otherwise inaccurate), and discuss how to eliminate it in your data ecosystem with end-to-end data observability. Drawing parallels to application observability in software engineering, data observability is a critical component of the modern DataOps workflow and the key to ensuring data trust at scale. I’ll share why data observability matters when it comes to building a better data quality strategy and highlight tactics you can use to address it today.
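
As a deliberately tiny, concrete example of detecting data downtime, here is a freshness check against an SLA; the table, column, and SLA are placeholders.

```python
# Minimal "data downtime" detector: alert when a table violates its
# freshness SLA. Works with sqlite3/DuckDB-style connections that
# expose execute() directly.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=6)

def check_freshness(conn) -> bool:
    latest = conn.execute(
        "SELECT MAX(updated_at) FROM analytics_orders"
    ).fetchone()[0]
    stale = latest is None or datetime.now(timezone.utc) - latest > FRESHNESS_SLA
    if stale:
        print(f"DATA DOWNTIME: analytics_orders last updated at {latest}")
    return not stale
```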

Shane Murray
Field CTO | Monte Carlo
10:40 - 11:15
Streaming Featurization with Ibis, Substrait and Apache Arrow

In this talk, you’ll learn how Two Sigma and Voltron Data are collaborating to improve the performance of featurization workflows using the Ibis, Substrait, and Arrow software stack. Wes McKinney and David Palaitis have been working together since 2016 on the design and implementation of high-performance data engines for processing unstructured, high-volume, streaming datasets for use in machine learning algorithms. While Palaitis has focused on using these tools to support machine learning at Two Sigma, McKinney has built a new business around the open-source computing libraries that are critical to high-performance featurization for quant finance workloads.
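
A small Ibis example in the spirit of the talk: the query is written once as a backend-agnostic expression and executed by an engine such as DuckDB. This assumes a recent Ibis version; data and column names are illustrative.

```python
# Backend-agnostic Ibis expression executed by the default engine.
import ibis

trades = ibis.memtable(
    {"sym": ["A", "A", "B"], "px": [10.0, 12.0, 7.5]}, name="trades"
)
expr = trades.group_by("sym").aggregate(avg_px=trades.px.mean())

# The same expression could be compiled for another backend; here the
# default backend (DuckDB) materializes the result.
print(expr.execute())
```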

Wes McKinney
CTO and Co-Founder | Voltron Data
David Palaitis
Managing Director | Two Sigma
11:20 - 11:55
Applying Engineering Best Practices in Data Lakes Architectures

Engineering best practices are methods and tools that allow high-quality, high-velocity software development. Among the most common application development best practices are the agile methodology, continuous integration and continuous deployment (CI/CD), and production monitoring. Do those practices apply to data engineers working over high-scale data lakes?
In this talk, we will show how adapting those practices to data lakes is a must: it provides us with a safe environment to operate in and produces higher-quality data in less time. Our time can then be spent on actual data engineering and less on manual plumbing of data pipelines.

The set of tools that allow us to implement engineering best practices, such as data version control, orchestration, data quality platforms, and data monitoring tools, is right at our fingertips. We will show how to combine those tools to create robust data products over data lakes.

Einat Orr
CEO and Co-Founder | Treeverse
11:20 - 11:55
Thrive in the Data Tooling Tornado: Lead, Hire, and Execute Better by Escaping Older Industrial Antipatterns

The data tooling landscape has exploded to hundreds of products. New ones emerge almost daily. In this environment, firms struggle just to meet legacy business commitments. And the exciting next-gen projects, leveraging analytics and machine learning (AI), are notoriously unsuccessful: their failure rate is estimated as high as 85 percent!

In this talk, we’ll learn why many of these challenges result from outdated anti-patterns held over from 20th-century industry. These older patterns emphasize efficiency over effectiveness and are not appropriate for 2023 — leading to results both ineffective and inefficient.

We’ll look at adjustments in approach that make it easier for data teams to hire, manage, retain, and execute effectively using modern data tooling — all while gaining that sought-after efficiency.

This session is aimed at medium and large businesses and will be especially useful *outside* of the “big tech” software/SaaS industry.

Adam Breindel
Independent Consultant
12:00 - 12:35
From BI to AI: Lakehouse is the modern data architecture

Industries are constantly evolving their infrastructure to fit the large amounts of data they collect from multiple sources, and they are increasingly adopting a new data architecture pattern in which a single system can house data for many different workloads. They have been storing all their data in cloud data lakes, but when it comes to decision making, they have to move that data somewhere else. This creates complexity, multiple copies, delays, and new failure modes. Moreover, enterprise use cases include machine learning and AI, for which neither data lakes nor warehouses are ideal.

The lakehouse combines the strengths of data warehouses and data lakes into a single system, allowing data teams to accelerate their use cases by working with one system rather than accessing many, eliminating data silos and duplication of data while offering reliability and cost efficiency. The lakehouse is built on open formats such as Delta Lake, which provide support for advanced analytics and AI with performance and reliability guarantees. In this talk, we will cover the evolution of the modern data architecture, its foundational principles, and some production examples of lakehouses.
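
A minimal illustration of the open-format claim, assuming a Spark session with the Delta Lake package configured; paths are placeholders.

```python
# One copy of the data serves both BI-style queries and reproducible ML
# reads (via time travel) -- no export to a separate warehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-demo").getOrCreate()

df = spark.range(0, 1000).withColumnRenamed("id", "user_id")
df.write.format("delta").mode("overwrite").save("/tmp/lakehouse/users")

# BI query and ML feature read hit the same files -- no copies.
spark.read.format("delta").load("/tmp/lakehouse/users").count()

# Time travel: pin an ML training run to an exact table version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load("/tmp/lakehouse/users")
```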

Vini Jaiswal
Developer Advocate | Databricks
12:00 - 12:35
Demystifying Data Mesh – Tackling common misconceptions about Data Mesh

Data Mesh is moving along its hype cycle, and more and more vendors are branding their products as data mesh solutions. Inherently, this is wrong: data mesh is about federating responsibilities by acknowledging a distributed landscape. In this session, Wannes will address this and other misconceptions and return to the core concepts of data mesh.

Wannes Rosiers
Product Manager | Ratio
12:40 - 13:15
Reliable Pipelines and High Quality Data Without the Toil

Bad data sucks. But it’s a struggle keeping data fresh and high quality as pipelines get complicated. Data observability is the ability to understand what’s happening to, and within, your pipelines at all times. It enables data engineers to identify pipeline issues sooner, spot pipeline performance opportunities more easily, and reduce toilsome maintenance work.

Data observability techniques were pioneered by large-scale data teams at companies like Uber, Airbnb, and Intuit. But today they’re accessible to teams of nearly any size.

In this talk you’ll hear about the history of data quality testing and data observability inside Uber, the differences between data observability and other methods like data pipeline tests, how techniques developed there can be applied by data engineers anywhere, and an overview of both commercially available and open source tools available today.

Kyle Kirwan
Co-Founder and CEO | Bigeye
12:40 - 13:15
Data: Planning to Implementation

How can businesses leverage big data, fast data, traditional data and modern data for decision making? How can businesses realize value from data? What are the capabilities needed for enterprise data management? “Data: Planning to Implementation” will provide a strategic perspective on the “why, what, where, when, how and whom” of data management across industry.

Balaji Raghunathan
VP of Digital Experience | ITC Infotech
13:20 - 13:55
Building Trusted Platforms by Protecting Data

In an age of unparalleled innovation, our technical computing power has far outpaced our moral processing power. The challenge in the coming decade will be sustaining the engagement and monetization of data while also building for safety, trust and transparency. Protecting user data is not just about compliance with regulations but also about business maturity. Ensuring sound data governance is critical to help avoid the excesses that have hurt the reputation of the tech industry. In this talk, using data privacy as a vehicle, Nishant Bhajaria will connect the global regulatory landscape and high-scale technical operations to privacy tools, processes and metrics. The end goal is to evolve technical innovation so that it preserves the gains of the bottom-up agile revolution while protecting consumers and growing the business.

Nishant Bhajaria
Director of Privacy Engineering | Uber
13:20 - 13:55
Automated Data Classification

Automating Data Classification is key to a successful data privacy program. Data privacy policies apply to specific types of data and without knowing which datasets contain this regulated data, it is impossible to protect it. In any vast and dynamic data estate, manual labeling or classification of data is impractical. This talk will cover the challenges and different approaches for automating data classification.
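
A toy version of the idea: sample column values and match them against known sensitive-data shapes. Production systems layer ML classifiers, column-name signals, and human feedback on top of rules like these; the patterns and threshold below are illustrative.

```python
# Rule-based column classification sketch: label a column by the fraction
# of sampled values matching known sensitive-data patterns.
import re

PATTERNS = {
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "US_SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "PHONE": re.compile(r"^\+?[\d\-\s()]{7,15}$"),
}

def classify_column(sample_values, min_hit_rate=0.8):
    for label, pattern in PATTERNS.items():
        hits = sum(bool(pattern.match(v)) for v in sample_values)
        if sample_values and hits / len(sample_values) >= min_hit_rate:
            return label
    return "UNCLASSIFIED"

print(classify_column(["ada@example.com", "bob@example.org"]))  # EMAIL
```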

Alex Gorelik
Distinguished Engineer | LinkedIn
14:00 - 14:35
Getting into Data Engineering

Curious about becoming a data engineer? This talk will cover the key things you should consider about a career in data engineering, particularly against the backdrop of 2023’s economic climate.

Joe Reis
CEO | Ternary Data
14:00 - 14:35
Leveraging Data in Motion in a Cloud-first World

Apache Kafka has emerged as the de facto standard event streaming platform in enterprise architectures. Many business applications are moving away from data-at-rest to an event-driven architecture so that they can leverage data in real time as new events occur. More than 80% of the Fortune 100 are building their businesses on this new platform. In this talk, I will first share the story behind Kafka: how it was invented, what problem it was trying to solve, and how it has been evolving. Then, I will talk about how making Kafka cloud-native creates new opportunities for building one system of record, and cover some real-world use cases.
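
A minimal event-producing sketch against Kafka using the confluent-kafka client; the broker address and topic are placeholders for a local cluster.

```python
# Produce a single event to a Kafka topic and wait for acknowledgement.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called asynchronously once the broker acknowledges (or rejects) the event.
    if err is not None:
        print(f"delivery failed: {err}")

producer.produce("orders", key="order-42", value=b'{"amount": 99.5}',
                 callback=on_delivery)
producer.flush()  # block until all buffered events are acknowledged
```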

Jun Rao
Co-Founder | Confluent
14:40 - 15:15
Spark, Cloud, DBT, Data Warehouses

We are going to discuss current Data Engineering trends and how the industry is moving toward a new data stack. I will first discuss the tech stack most companies use today, why there is a need for a shift, and how the industry is moving towards data warehouses and delta lakes.

Navdeep Kaur
Founder | Techno Avengers
14:40 - 15:15
Assessing Data Quality: The 3 Facets of Data “Fitness”

While most of us are used to assessing the quality of data for gaps, errors, and other data integrity problems, understanding whether the information we have is “fit” for our intended purpose can be a little trickier. In this session, I’ll cover the three essential facets of data “fitness” that can help you ensure that your data can really give you the answers you want.

Susan McGregor
Associate Research Scholar | Columbia University's Data Science Institute
15:20 - 15:55
Building a Data Mesh: Strategies and Best Practices for Navigating the Implementation of a Data Mesh

Data mesh is a new approach to thinking about data, based on a distributed architecture for data management that promotes decentralized ownership and control of data assets. It emphasizes the use of domain-driven design and self-service access to data, with the goal of improving the quality and usability of data for business decision-making. In this talk, we will explore the principles and practices of data mesh and how to implement it in an organization.

Hajar Khizou
Lead Data Engineer | SustainCERT
15:25 - 15:55
Using Compression, Deduplication & Encryption for Better Data Management

Cloud storage footprints are in the exabytes and growing exponentially, and companies pay billions of dollars to store and retrieve data. In this talk, we will cover some of the space and time optimizations that have historically been applied to on-premise file storage, and how they can be applied to objects stored in the cloud.

Deduplication and compression are techniques that have traditionally been used to reduce the amount of storage used by applications. Data encryption is table stakes for any remote storage offering, and today cloud providers support both client-side and server-side encryption.

Combining compression, encryption, and deduplication for object stores in the cloud is challenging due to the nature of overwrites and versioning, but the right strategy can save an organization millions. We will cover strategies for employing these techniques depending on whether an organization prefers client-side or server-side encryption, and discuss online and offline deduplication of objects. Companies such as Box and Netflix employ a subset of these techniques to reduce their cloud footprint and provide agility in their cloud operations.
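
A toy content-addressed store combining the three techniques, for illustration only; real systems must also confront the tension between deduplicating by plaintext hash and client-side encryption that the talk alludes to.

```python
# Chunks are deduplicated by hash, compressed, then encrypted client-side.
import hashlib
import zlib
from cryptography.fernet import Fernet

store: dict[str, bytes] = {}            # chunk hash -> encrypted, compressed bytes
fernet = Fernet(Fernet.generate_key())  # client-side key; never leaves the client

def put(data: bytes, chunk_size: int = 4096) -> list[str]:
    manifest = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i : i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:          # dedup: identical chunks stored once
            store[digest] = fernet.encrypt(zlib.compress(chunk))
        manifest.append(digest)
    return manifest

def get(manifest: list[str]) -> bytes:
    return b"".join(zlib.decompress(fernet.decrypt(store[d])) for d in manifest)

m = put(b"hello world " * 1000)
assert get(m) == b"hello world " * 1000
```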

Tejas Chopra
Senior Software Engineer | Netflix