The event, organized by the IIT Alumni SoCal chapter and the global IIT AI ML Forum, featured a discussion on the implications of DeepSeek, a new AI model, on industries and AI technology. Key speakers included Tushar Kant, Smita Baga, Ashish Bansal, and Kaushal Mishra, who explored DeepSeek's cost-effectiveness, reasoning capabilities, and potential to democratize AI across sectors like healthcare, fintech, and retail. The conversation also touched on the technical nuances of model distillation, security concerns, and the evolving landscape of AI hardware, emphasizing the need for strategic adoption and innovation in AI applications.
Introduction and Event Overview
- The event is organized by the IIT Alumni SoCal chapter and the global IIT AI ML Forum.
- The SoCal chapter has over 3,000 IIT alumni and was officially recognized as a nonprofit in 2018.
- The chapter has five tracks for its agenda: Technology Forum, Entrepreneurship, Career Development, Industry Insights, and Social Connections.
- Upcoming events include a valuation session on startups, a career networking workshop, and a summer picnic.
"This event has been organized in joint collaboration between IIT alumni SoCal chapter and the global IIT AI ml Forum."
- The event is a collaborative effort aimed at bringing valuable sessions to the community.
Panel Introductions
- Tushar Kant is the moderator, with extensive experience in AI and product management at major tech companies.
- Panelists include Smi Baga, Ais Bunel, and Kel Mra, each bringing unique expertise in AI, business strategy, and enterprise solutions.
"Tushar has been in the AI industry and product management across the tech Giants... His specialization is Gen AI, computer vision, reinforcement learning, personalization."
- Tushar's experience and specialization make him an ideal moderator for discussions on AI advancements.
Deep Seek and Cost Efficiency
- Deep Seek claims to train models with comparable performance to OpenAI at a fraction of the cost.
- The architecture uses a mixture of experts, reducing compute needs but requiring more memory.
- Innovations include mixed precision training, using smaller precision numbers to reduce memory bandwidth needs.
"The architecture that deep seek is using is called a mixture of experts... This means that if you think about the total number of neurons in the panel it is like n times the number of people and the panel."
- The mixture of experts architecture allows for efficient resource use by activating only relevant experts for specific tasks.
Impact on Startups and SMBs
- Deep Seek's cost-effectiveness is empowering startups and SMBs to leverage AI without extensive resources.
- Startups are using Deep Seek to offer competitive products at lower costs, gaining market entry with more clients.
- The democratization of AI is enabling smaller players to innovate and compete with tech giants.
"It has significantly leveled the playing field... what tech Giants could do before now startups and small businesses are looking forward to it."
- The availability of cost-effective AI models is allowing smaller companies to innovate and compete in the AI space.
Enterprise Adoption and Caution
- Enterprises are cautiously exploring Deep Seek within controlled environments for internal use.
- Concerns include performance, accuracy, and the lack of SLA guarantees for customer-facing applications.
- Deep Seek is primarily used for operational efficiency and cost reduction, with a focus on augmenting rather than replacing human roles.
"Most companies try to use it in development rather than in production... it is only going to augment human."
- Enterprises are leveraging AI for operational improvements while maintaining human oversight and control.
Technical Challenges and Model Limitations
- Current AI models, including Deep Seek, struggle with certain tasks like math and geometry, requiring significant fine-tuning.
- The models often perform well on benchmarks but may not translate to real-world applications without additional training.
- The lack of inherent multimodal capabilities limits the models' effectiveness in tasks requiring visual understanding.
"Our testing shows that unfortunately none of the models including Deep Seek come close to what is required even for the grade eight Maths."
- Despite advancements, AI models still face challenges in specific domains, necessitating further development and customization.
Open Source Models and Enterprise Support
- Open source models like Llama offer flexibility but lack the direct support and SLAs provided by proprietary models.
- Enterprises must weigh the benefits of community support against the risks of using unsupported models in production.
- The decision to adopt open source models depends on factors like cost, flexibility, and the ability to manage and customize models internally.
"In case of AI there a lot of Open Source stuff like in case of gen you have Llama for which there is not really any company stepping up to say hey it's going to be managed Llama from us."
- The reliance on open source models requires enterprises to assess their internal capabilities and risk tolerance for unsupported solutions.
Open Source and Privacy Concerns
- Open Source models like Deep Seek have revitalized web search by integrating advanced models such as perplexity.
- Startups leverage open source due to its cost-effectiveness, though they may initially overlook privacy concerns.
- Tech giants use open source internally for prototyping before moving to closed-source models.
"Startups do not have much money to play with, and you know what the big companies bring to the table."
- Startups prioritize cost-effectiveness over privacy, while large companies use open source models internally for testing.
Reinforcement Learning and Supervised Fine-Tuning
- Deep Seek's innovation lies in using reinforcement learning without supervised fine-tuning to build reasoning capabilities.
- Traditional models relied on supervised fine-tuning for training, whereas Deep Seek allows models to learn through trial and error.
- The process mirrors human learning, where limited information is provided, forcing fundamental understanding.
"The reasoning aspect of it of this model is in some ways a bolt-on."
- Reinforcement learning allows models to self-correct and improve reasoning, similar to human critical thinking.
Multi-Step Reasoning and Emergent Behavior
- Deep Seek incorporates multi-step reasoning, akin to human problem-solving methods.
- The model can identify and correct its reasoning path, demonstrating emergent behavior.
- This method is crucial in fields like education, where critical thinking is emphasized over rote learning.
"Reasoning... is a multi-step process for any of these models."
- Multi-step reasoning helps models break down complex problems into manageable steps, enhancing their problem-solving capabilities.
Industry Applications and Opportunities
- Open-source models democratize AI, offering opportunities across various sectors like retail, healthcare, and manufacturing.
- Startups are leveraging these models to innovate in areas such as drug discovery and e-commerce platforms.
- The potential for emergent behavior in customer interactions and decision-making processes presents new business opportunities.
"The most innovation happening in creating new software, new reasoning models, trying to predict..."
- Industries are exploring open-source models for innovative applications, especially where customer interaction and autonomous decision-making are involved.
Model Distillation and Business Implications
- Model distillation allows for creating smaller, efficient models from larger ones, reducing computational costs.
- Startups benefit from model distillation as it facilitates faster prototyping and deployment across platforms.
- This technique is particularly beneficial in domains with extensive data, such as finance and legal sectors.
"Model distillation is fine as long as you're using it for your own internal use cases."
- Model distillation provides a cost-effective means for startups to develop and refine products, offering competitive advantages.
Security Concerns with Open Source Models
- Security remains a critical concern with open-source models, particularly regarding data used for training and model integrity.
- Ensuring robust security measures is vital to protect sensitive information and maintain trust in AI applications.
"Security of data that is used for training, security of model, security of results..."
- Addressing security concerns is crucial for the widespread adoption and trust in open-source AI models, especially in sensitive industries.
Security Concerns in AI Models
- Security is often overlooked in AI models, but it is a critical issue as these models become more prevalent in sensitive areas like healthcare.
- There are two main aspects of security: model tampering and bias introduction. Models can be tampered with by altering inputs, and biases can be injected to change outputs.
- As AI models become more common, evaluating them on security alongside cost and performance becomes crucial.
"Security is frequently overlooked and it is a very big issue... as these AI models become more commonplace... I think security becomes even more important."
- The quote emphasizes the growing importance of security in AI models as they are increasingly used in critical applications.
Microsoft's Approach to AI Security
- Microsoft prioritizes security, building a responsible AI framework before designing models.
- Security is a primary concern for any production-level AI, regardless of the industry.
- Microsoft has established itself as a leader in security by integrating strong defenses and sharing these with others.
"Security comes first and Microsoft happens to be... the largest security company in the world."
- Microsoft’s commitment to security is foundational, positioning it as a leader in AI security practices.
Challenges and Solutions in LLM Security
- Data residency and unauthorized access are major concerns when using models like Deep Seek.
- Companies often ban certain models due to security concerns, though hosting locally or through secure cloud services can mitigate these issues.
- Cloud service providers offer varying levels of security guarantees depending on the control they have over the models.
"If you're accessing it through an API and through a chatbot... data residency is the biggest question."
- Hosting models locally or through secure services can alleviate data residency concerns, a major security issue.
Governance and Cybersecurity in AI
- Strengthening governance policies is crucial to prevent unauthorized model access and backdoor attacks.
- Continuous testing of boundaries and securing hybrid infrastructures are necessary for robust cybersecurity.
- Synthetic data generation poses security concerns, as patterns can still reveal sensitive information.
"Security as we know... we always have to be very, very, very cautious about security... we have to strengthen our governance policies."
- Strong governance and continuous testing are essential to maintain security in AI systems.
Distinguishing Security from Privacy
- Security and privacy are often conflated but are distinct; both need to be handled independently.
- Security involves ensuring the integrity of the model, while privacy concerns the protection of data.
- Practitioners must focus on sandboxing environments to prevent unauthorized access and malicious activities.
"We must not conflate privacy with security... these are two independent aspects that both need to be handled."
- Differentiating between security and privacy helps in addressing each aspect effectively, ensuring comprehensive protection.
Prompt Injection and Jailbreaking in AI Models
- Prompt injection and jailbreaking are significant security concerns, often leading to unauthorized access to system prompts.
- The probabilistic nature of AI models makes it challenging to predict and prevent these issues.
- Strong guardrails and narrow use cases are recommended to mitigate the risks associated with prompt injection and jailbreaking.
"Almost all prompts have been jailbroken by somebody... the same input may generate three different outputs at three different points in time."
- The unpredictable nature of AI outputs necessitates robust security measures to protect against prompt injection and jailbreaking.
Evolving Paradigms in AI Security
- The shift from white-box to black-box testing in AI security involves continuous monitoring and applying psychological tests to systems.
- AI systems are treated analogously to human behavior, with monitoring strategies akin to managing Insider threats.
- This approach reflects the evolving nature of AI security, focusing on system behavior over time.
"We are moving from an AI security and safety standpoint... to a model of continuous monitoring."
- Continuous monitoring and treating AI systems like human behavior reflect the dynamic nature of AI security.
Impact of AI on Hardware Market Dynamics
- Despite advancements in AI models, GPUs remain crucial for training due to their specialized architecture.
- CPUs are versatile but lack the specialized capabilities of GPUs for efficient training.
- Inference tasks may use CPUs, but the demand for GPUs in training persists due to their optimized performance for specific tasks.
"I don't have any good news for the CPU people... GPUs are a special purpose computer machine."
- The specialized nature of GPUs ensures their continued relevance in AI training, despite advancements in model efficiency.
AI Applications in Healthcare
- AI is being integrated into telehealth, mental health support, and conversational AI assistants to enhance patient care.
- AI-driven digital twins and synthetic data generation are used to simulate medical scenarios and test new treatments.
- These applications demonstrate AI's potential to transform healthcare delivery and patient engagement.
"AI can provide bite-size therapy... and conversational AI assistants can understand concerns based on personality."
- AI applications in healthcare offer innovative solutions for patient support and medical research, highlighting its transformative potential.
These notes provide a comprehensive overview of the key themes and discussions from the transcript, offering insights into the complexities and advancements in AI security, applications, and market dynamics.
Role of CPUs and GPUs in AI Inferencing
- CPUs have played a significant role in AI inferencing, but the field remains complex.
- The architecture for influencing AI models can execute multiple large language models (LLMs) concurrently, which may aid in reasoning and executing a mixture of experts.
- The dichotomy exists between concurrent execution of LLMs on high-performance systems versus breaking problems into smaller parts that require less computational power.
"Given that Sambanova architecture can execute multiple large language models concurrently, will that help with reasoning?"
- The concurrent execution of multiple LLMs may aid in reasoning by allowing for the execution of a mixture of experts.
Sambanova and Data Flow Architecture
- Sambanova employs a unique data flow architecture providing ten times better performance than Nvidia, with significantly lower energy consumption and data center space requirements.
- The architecture originated from MIT and was further developed at Stanford, gaining traction as an alternative to Nvidia for security and governance reasons.
- Sambanova is focusing on building a global cloud infrastructure using this architecture, expanding into markets like India.
"They have picked an architecture called data flow architecture which is unique in the industry... it's giving 10 times better performance than Nvidia."
- Sambanova's data flow architecture offers superior performance and efficiency compared to Nvidia, making it a compelling alternative in the AI space.
Chip Design and AI Model Scaling
- Designing chips for AI models is challenging due to the rapid pace of machine learning advancements.
- The Chinchilla scaling laws from DeepMind establish a linear relationship between the number of tokens and computational power required for training.
- Models can vary in memory and compute balance, impacting their efficiency and application.
"There is a relationship between the number of tokens that you want to train on and the compute or the flops that you will need."
- The Chinchilla scaling laws highlight the importance of balancing memory and compute resources in AI model training.
Challenges in Concurrent Execution of LLMs
- Running the same query across multiple models for voting can increase token costs and latency.
- Variability in token production among models complicates the feasibility of using a composition of experts in production environments.
"It's not really feasible to do this composition of experts for production and latency use cases."
- The practical challenges of concurrent execution of LLMs include increased costs and latency, limiting its application in real-world scenarios.
Qualcomm's AI Strategy
- Qualcomm is leveraging its expertise in chipsets to enter the AI space, focusing on inferencing in smartphones, PCs, and IoT.
- The company aims to optimize AI performance, balancing high-end capabilities with cost-effective solutions.
- Qualcomm is working on building an enablement layer to enhance its AI offerings and compete with established players like Nvidia.
"Qualcomm has been a chipset provider... now the question has been how to leverage that expertise to get into more into the AI space."
- Qualcomm is strategically expanding into AI by utilizing its chipset expertise and focusing on optimization and enablement.
Rebranding and Market Positioning in AI
- Companies need to balance their established brand identities with new AI capabilities, positioning themselves as AI-enabled rather than purely AI-focused.
- Startups and larger enterprises must integrate AI into their offerings while maintaining their core brand values.
- Investors are cautiously optimistic about AI, balancing excitement with a wait-and-watch approach to new developments.
"You can't really rebrand themselves from being a medical company to be an AI company. It is medicine with AI as an assistant."
- Companies should integrate AI as an enhancement to their existing offerings, maintaining their core brand identity while leveraging AI capabilities.
Future Prospects and Strategic Insights
- The AI landscape is rapidly evolving, and companies must continuously adapt to new technologies and business models.
- Strategic partnerships and collaborations, such as those facilitated by industry forums, are crucial for staying competitive.
- The focus should remain on solving real-world business problems with AI, ensuring long-term success and relevance.
"Nothing takes over the world in a second... focus on what value or what business problem you're trying to solve."
- The strategic focus should be on addressing genuine business needs with AI, ensuring sustainable growth and adaptation in a dynamic market.