2022-01-03

Web3, Web 3.0, and the Semantic Web


In March 1989, Sir Tim Berners-Lee laid out his vision for the World Wide Web in “Information Management: A Proposal”. In its initial phase (1990-2005), commonly referred to as Web 1.0, web users were passive consumers of a global web of static hypertext documents. Beginning around 2005, Web 2.0 added the ability for users to interact and collaborate. Users could publish, comment, and share content via dynamic websites, social media networks, blogs, image-sharing sites, video-sharing sites, and rating systems on eCommerce and content sites. Social media and the mobile web resulted in a massive increase in users who were always connected.

That, in turn, resulted in a massive increase in the amount of data users generated, which could be used to create better and more targeted marketing, UX, and product development. Machine learning provided new ways to produce actionable intelligence from this Big Data. There was also an increase in collaboration and access to information for companies that deployed Web 2.0 technologies:

Half of respondents report that Web 2.0 technologies have fostered in-company interactions across geographic borders; 45 percent cite interactions across functions, and 39 percent across business units.  ~McKinsey Quarterly

Although Web 2.0 brought big benefits for businesses and users, it wasn’t all rainbows and unicorns. Allowing users to publish content and interact with apps did not result in a more democratized, decentralized web. The end result of Web 2.0 was a web dominated by a few digital monopolies. Users’ data belongs not to the users who generate it but to companies like Google, Amazon, and Facebook. These digital behemoths own, process, package, and sell user data, which is then used to manipulate user behavior in ways users are rarely aware of. The bigger surprise was that more connectivity and instant access to information did not bring people closer together and make them happier and smarter; in fact, it seems to have done the opposite.

Information is power, and certainly, having more information about clients has been a powerful tool for companies. Being able to process mammoth amounts of data has been a boon to science and research. But, Web 2.0 also enabled an explosion of conspiracy theories, intentional misinformation, and pseudo-science that has had a disastrous effect on social cohesion, mental health, and political stability.

It has been theorized that an infinite number of monkeys banging on an infinite number of typewriters would eventually reproduce the written works of Shakespeare. Thanks to the Internet, we now know this is not true.  ~Robert Wilensky (commonly misattributed to Kurt Vonnegut)

It’s difficult to fully measure the balance of the benefits of Web 2.0 against its societal costs. Still, technology is evolving rapidly, so before we have time to figure that out, we will be facing a brand-new paradigm with a whole new slate of opportunities and unintended consequences.

A Brief History of the Web

Web 1.0: the read-only web
Users could search for information and view it.
Web 2.0: the read-write web
Users could now create, transact, and collaborate but were dependent on the permission of platform providers, who required that you hand over your personal data in exchange for access. The power resides in the center, in the hands of the platform owners.
Web 3.0: the read-write-execute web
Open, trustless, permissionless, and ubiquitous. The power is distributed to the user, who no longer needs the permission of the digital monopolies to create, transact, and collaborate.

There is no governing body that defines what is and isn’t Web 3.0, so let’s look at the trends that are shaping it.

Edge Computing

The computing paradigm of Web 2.0 concentrates data and applications in centralized, cloud-based servers. Microsoft Azure, Amazon Web Services, and Google Cloud provide scalable cloud-based data center computing, storage, application services, security, and bandwidth accessible from anywhere, 24/7.

Data generated on the edge of the network - on mobile phones, users’ computers, IoT devices, and sensors - is sent to a centralized server for storage and processing; the results can then be sent back to the edge device. That model has worked very well but is now under strain because of the massive increase in connected devices, an expanding tsunami of data, and the increasing computational demands of AI. Bandwidth constraints, latency, and network disruptions become limiting factors when moving this geometrically increasing load across a network with finite capacity. If your connected device has to request and receive instructions from a centralized server on the other side of the world before it can do anything, you will perceive this delay as latency.
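
A toy simulation in Python makes the point (the 150 ms round-trip and 5 ms compute figures below are illustrative assumptions, not measurements): when a request must cross the network, the round trip alone can dominate the delay the user perceives.

```python
# Toy illustration of cloud round-trip latency vs. on-device processing.
# The timing constants are assumptions for demonstration, not benchmarks.
import time

CLOUD_RTT_S = 0.150     # assumed network round trip to a distant data center
EDGE_COMPUTE_S = 0.005  # assumed on-device processing time

def respond_via_cloud() -> float:
    start = time.perf_counter()
    time.sleep(CLOUD_RTT_S)      # send the request and wait for the reply
    time.sleep(EDGE_COMPUTE_S)   # server-side processing (simplified)
    return time.perf_counter() - start

def respond_on_device() -> float:
    start = time.perf_counter()
    time.sleep(EDGE_COMPUTE_S)   # all processing happens on the edge device
    return time.perf_counter() - start

print(f"cloud round trip: {respond_via_cloud() * 1000:.0f} ms")
print(f"on-device:        {respond_on_device() * 1000:.0f} ms")
```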

This is why startups like Atlazo are developing hyper-low-power systems-on-chip that can power AI and machine learning workloads in miniaturized smart devices. Syntiant’s new NDP120 Neural Decision Processor uses neural processing to enable battery-powered devices to run multiple applications without a major impact on the battery. In the same vein, it’s rumored that Amazon is working on an AI chip for Alexa that will enable it to process voice commands on the device rather than sending them to Amazon’s servers for processing.

If you can’t get the data to the data center from the edge, move the data center to the edge. The increasing computing power and storage capacity of edge devices make it possible to offload varying amounts of data and processing responsibilities from the center to the edge. To get an idea of the speed at which processing power on the edge is increasing, consider that it took 117 state-of-the-art Sun computers to render the CGI for the movie Toy Story in 1995. Those 117 computers had a combined total of 1 billion transistors. There are 15 billion transistors in the Apple A15 Bionic chip that powers the iPhone 13.

Every single edge device will be a data center…Every single cell tower will be a data center, every base station…Every single car… truck, shuttle will be a data center. ~Jen-Hsun ‘Jensen’ Huang, CEO of NVIDIA

Edge computing will allow networks to efficiently divide up the workload and reduce data traffic between the server and the edge devices.

Trust

Internet transactions require trust. Just as data and processing services are centralized in the cloud computing model, trust is currently centralized with trusted gatekeepers and intermediaries. If you use Facebook, you trust Facebook not to abuse the personal data you entrust it with. If you search with Google, you entrust Google with your data and search activities. If you use online brokerage or banking services, you are entrusting large financial institutions with your money and credit information.

According to the 2021 Edelman Trust Barometer, trust in institutions in the world’s two largest economies, the USA and China, has dropped precipitously. Edelman Trust Index 2020-2021:

Edelman trust index for USA and China

According to a Washington Post poll, tech companies are doing even worse: 73% of internet users do not trust Facebook with their personal data, and Instagram and TikTok are not doing much better. The poll asked: How much do you trust each of the following companies or services to responsibly handle your personal information and data on your Internet activity?

Consumer trust in tech brands chart

You could make the case that there is a crisis of trust in the intermediaries and institutions tasked with maintaining trust in society, online and off.

No longer trust or want to pay for the services of trust providers? Eliminate the need for them. In a trustless system, participants do not need to know or trust each other or a third party for the system to function. The $3 trillion cryptocurrency market has made blockchain the 900-lb gorilla of trustless systems. Blockchain smart contracts are now being used successfully in healthcare, finance, IoT, and the arts.

A Blockchain is a type of distributed ledger/database. A Blockchain collects data in groups, known as blocks. Each block contains the cryptographic hash of the previous block, linking the blocks into a growing chain of blocks, hence the name Blockchain. Each block in the chain confirms the integrity of the previous block, all the way back to the initial block, which is known as the genesis block.

Blocks can be used to record transactions across many computers in a peer-to-peer network. Blocks in the chain cannot be altered retroactively without altering all subsequent blocks. This ensures a high degree of data security and integrity and enables any participant in the Blockchain to verify and audit transactions right back to the genesis block.
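
A minimal Python sketch makes the hash-linking idea concrete (illustrative only; real Blockchains add consensus mechanisms such as proof-of-work or proof-of-stake, and peer-to-peer replication). Tampering with any block breaks every link after it:

```python
import hashlib
import json
import time

def hash_block(block: dict) -> str:
    """Return the SHA-256 hash of a block's canonical JSON encoding."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(data: str, previous_hash: str) -> dict:
    """Create a block that records `data` and links to its predecessor's hash."""
    return {"timestamp": time.time(), "data": data, "previous_hash": previous_hash}

# The genesis block has no predecessor, so its previous_hash is a fixed constant.
chain = [make_block("genesis", previous_hash="0" * 64)]

# Each new block stores the hash of the block before it, forming the chain.
for tx in ["Alice pays Bob 5", "Bob pays Carol 2"]:
    chain.append(make_block(tx, previous_hash=hash_block(chain[-1])))

def verify(chain: list) -> bool:
    """Re-hash every block and confirm each link, back to the genesis block."""
    return all(chain[i]["previous_hash"] == hash_block(chain[i - 1])
               for i in range(1, len(chain)))

print(verify(chain))                     # True
chain[1]["data"] = "Alice pays Bob 500"  # tamper with one block...
print(verify(chain))                     # False: every later link is broken
```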

Global blockchain technology market chart

I think trust in itself is actually just a bad thing all around. Trust implies that you’re placing some sort of authority in somebody else, or in some organization, and they will be able to use this authority in some arbitrary way. ~Gavin Wood, founder of the Web3 Foundation, co-founder of Ethereum

Blockchain can operate on a centralized cloud-based system, but the centralized architecture creates performance issues because of latency, bandwidth limits, and network disruptions. Low-latency edge computing can cure the performance issues with Blockchain, while Blockchain can provide security and consistent management of distributed data for applications like cryptocurrency, smart grids, and IoT. Blockchain and Edge Computing seem made for each other and are shaping up to be two key components of Web 3.0.

The Semantic Web

The web borrowed its page-centric content model from print. Unstructured content, front end, and back end are all integrated into a monolithic model in which content is bound exclusively to the page it lives on. When websites were essentially hyperlinked virtual versions of magazines, brochures, and catalogs, this model worked well. The explosion of connected IoT devices, web and mobile apps, voice applications, and VR and AR technologies has now moved content far beyond the confines of the page.

Back in 2001, an article titled “The Semantic Web” was published in Scientific American magazine. The authors, Tim Berners-Lee (the inventor of the World Wide Web), Ora Lassila, and James Hendler, described a new form of Web content meaningful to computers that they thought would “unleash a revolution of new possibilities”. The idea created a lot of buzz but never got off the ground. Now it seems that Web 3.0 heralds a return of the concept of a Semantic Web.

The new demands on content have resulted in the development of ‘Headless’ Content Management Systems that are front-end agnostic as well as a move from unstructured to structured content. Semantically marked-up content managed by front-end agnostic systems frees content from the page and makes content 'meaningful to computers' in a way similar to what Berners-Lee envisioned in 2001.

This will enable a create once, publish anywhere model for content. HTML and CSS markup have traditionally been used to define styles and control web page layout. Semantic markup, by contrast, is meant to define meaning. It conveys information about each element of content to human and machine users and is separate from, and completely agnostic to, presentation. Content is decoupled from and no longer dependent on presentation, device, or platform.

Traditional CMS vs Headless CMS

This makes it possible for machines, without the aid of advanced AI, to identify content according to meaning rather than keywords. Add in AI, and the possibilities for how content can be aggregated, presented, and distributed become endless.

Google, Microsoft, and Yahoo have all been working on semantic search for more than a decade, using structured data and parsing unstructured data to give it structure. Semantic search is based on meaning rather than keywords. Google is encouraging developers to use schema.org markup (which is based on RDF) to add structure to content, and has made available a Structured Data Markup Helper tool to aid content creators in marking up content.
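
As a rough illustration (the metadata values below are invented placeholders), schema.org structured data is commonly expressed as JSON-LD; this short Python snippet builds the kind of description a page would embed in a script tag, stating what the content is independently of how any front end presents it:

```python
import json

# Hypothetical schema.org description of an article, serialized as JSON-LD.
# The field values are placeholders; schema.org defines many more types.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Web3, Web 3.0, and the Semantic Web",
    "datePublished": "2022-01-03",
    "author": {"@type": "Person", "name": "Jane Doe"},  # placeholder author
}

# A page would embed this output in a <script type="application/ld+json">
# tag, where search engines and other machine readers can parse the meaning.
print(json.dumps(article, indent=2))
```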

Third-Wave AI

NVIDIA introduced the first GPU (Graphics Processing Unit) in 1999. The first applications were graphics-intensive processes like video games and 3D CAD. In 2012, University of Toronto Ph.D. student Alex Krizhevsky fed 1.2 million images into a deep learning neural network powered by two NVIDIA GeForce gaming cards for the ImageNet Large Scale Visual Recognition Challenge. His model blew away the competition and bettered the previous year’s winner by a shockingly large margin, demonstrating the clear superiority of GPUs over CPUs in deep learning AI applications.

NVIDIA had already laid the groundwork in 2006 with the release of CUDA, a parallel computing platform and programming model that makes using a GPU for general-purpose computing simple and elegant. Deep learning frameworks that now use CUDA for GPU support include TensorFlow, Torch, PyTorch, Keras, MXNet, and Caffe2. By providing a cost-effective source of computing power, NVIDIA GPUs and CUDA have been the key enablers of the exponential growth of deep learning AI. This has brought AI into the mainstream of computing and made it a key technology in everything from edge computing to medical research.
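
A small sketch shows what this looks like in practice (assuming PyTorch is installed, ideally with CUDA support): application code simply asks for the GPU, and the framework and CUDA hide the parallelization details.

```python
# Minimal sketch: the same computation runs unchanged on CPU or GPU.
import torch

# Use a CUDA-capable GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Large matrix multiplication - the kind of parallel workload GPUs excel at.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b

print(f"Computed a {c.shape[0]}x{c.shape[1]} product on: {device}")
```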

AI computing is the future of computing ~Jen-Hsun ‘Jensen’ Huang, CEO of NVIDIA

AI is advancing at a blistering pace. In 2018, OpenAI released the first Generative Pre-trained Transformer (GPT), a neural network machine learning model. GPT-1 worked with 117 million parameters; GPT-3, released in 2020, scaled that up to 175 billion. Wu Dao 2.0, developed by the Beijing Academy of Artificial Intelligence (BAAI), has a capacity of 1.75 trillion parameters. GPT-4, expected in 2022, is rumored to have as many as 100 trillion parameters.

GPT neural networks number of parameters

Google has developed two specialized AI models: LaMDA (Language Model for Dialogue Applications), which will enable chatbots to engage in human-like levels of conversation, and MUM (Multitask Unified Model), which will be a revolutionary advance for the Google search engine. MUM could make SEO obsolete, as its natural language abilities could make keywords and backlinks irrelevant.

DARPA (Defense Advanced Research Projects Agency), the folks who invented the internet, is investing $2 billion to accelerate the development of the ‘third wave’ of AI.

Today, machines lack contextual reasoning capabilities, and their training must cover every eventuality – which is not only costly but ultimately impossible.

We want to explore how machines can acquire human-like communication and reasoning capabilities, with the ability to recognize new situations and environments and adapt to them. ~Dr Steven Walker, Director of DARPA

The capabilities of these neural networks dwarf anything we have seen up until now. ARK Invest estimates that deep learning will add $30 trillion to global equity market capitalization in the next 15-20 years.

The amazing pace of development in AI will have profound implications for Web 3.0, as AI now figures into all the trends that are converging to enable it. In Web 2.0, machine learning began to enter the mainstream and have a measurable impact on bottom lines. While AI played a supporting role in Web 2.0, it will be the tip of the spear of this next digital transformation.

What is Web 3.0?

What are the characteristics and implications of the digital universe commonly referred to as Web 3.0 or Web3? What will the impact be of the confluence of all these aforementioned trends?

Large Scale Disintermediation

Trustless: Peer-to-peer transactions without the need for a trusted middleman.

Permissionless: No gatekeepers or platforms requiring permission to collaborate, communicate, and transact.

Web 2.0 created a digital surveillance state where Big Brother, in the form of companies like Facebook and Google, tracks your every move. In order to search or to socialize online, you must agree to surrender your privacy and your data. Google and Facebook can then process, package, and sell your data. Without the need for a middleman, users can retain ownership of their data, and with Blockchain technologies like smart contracts and NFTs, or technologies like Solid, users can share and/or monetize their data securely.

Decentralization

Computation, Data, and Intelligence are moving from the center to the edge. This puts more power in the hands of users. Users will no longer be dependent on big centralized services to search, store, and process data and execute applications.

The Intelligent Web

AI with natural language processing capability will totally transform the digital experience. Conversational interaction with search, chatbots, virtual assistants, and applications that closely mimics human-to-human interaction is a seismic change in machine-human interaction. Computers that can understand information and content as humans do, and then create new content and package and present information according to the intelligence they derive, are another world-changing advance.

These capabilities are not dependent on machines achieving human-level AGI (Artificial General Intelligence). With the latest advances in AI, machine behavior online can be virtually indistinguishable from human behavior, even though those machines don’t in reality possess general intelligence. AGI is also not a requirement for machines to perform and learn autonomously, without human mediation. Some experts think the singularity (machine intelligence that matches and then surpasses human intelligence) may arrive in the next decade; others think it will never be possible. But the advances in AI already baked into the cake are sufficient to completely reshape our world and way of life.

The user experience a decade from now will look absolutely nothing like the digital experience of Web 2.0. All of the models for monetizing current web interactions will go out the window and will have to be replaced with completely new business models if businesses are to survive and thrive.

Data Integrity and Security

Blockchain is considered by most observers to be a key component in providing data integrity and security in a decentralized Web 3.0.

  • Immutable: Once data is written and verified, it cannot be erased or replaced. Tampering with data, whether by a rogue administrator or an outside hacker, is exceedingly difficult. No system is unhackable, but Blockchain raises the bar on security.
  • Transparent: Blockchain relies on transparency in lieu of regulation. All transactions involving the underlying asset of a token on the Blockchain are 100% transparent for any participant to view and don’t require a forensic accountant to understand.
  • Traceable: An audit trail exists for all transactions, right back to the point of origin. This can be applied to investment vehicles, products in supply chains, or content.
  • Resilient: Edge Computing and the Blockchain distributed ledger together make for a network that is much more difficult to disrupt than one dependent on centralized data centers.

Cooked books, insider trading, dangerously faulty products in the supply chain: many of the abuses that regulators attempt to police are only possible because of a lack of immutability, transparency, and traceability. Transparency, immutability, and traceability are powerful incentives for good behavior and can bring huge benefits to investors, consumers, and businesses, saving billions of dollars.

Privacy

The promise of users owning and controlling their own data will enable user-controlled levels of privacy, an option not available to users of Web 2.0. In applications where Blockchain is employed, its immutability and transparency would seem to be at odds with privacy, but there are ways to design Blockchain systems that comply with privacy standards like GDPR, especially if users own their data and share it via smart contracts for specific uses that they control.

Tim Berners-Lee, who brought us the World Wide Web and the concept of the Semantic Web, has a new project addressing data privacy, ownership, and security. Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone's Pod, they control which people and applications can access it. Solid does not use Blockchain, but it can interact with and make use of blockchain initiatives. Solid technology is seeing some traction: the Government of Flanders announced that it is working on giving every citizen a Solid Pod.
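
At the protocol level, a Pod is essentially an HTTPS server the user controls. The sketch below is a heavily simplified illustration: the Pod URL and token are hypothetical placeholders, and real Solid apps authenticate via Solid-OIDC, usually through a client library rather than raw HTTP.

```python
# Simplified sketch of reading and writing a resource in a Solid Pod.
# POD_RESOURCE and ACCESS_TOKEN are hypothetical placeholders.
import requests

POD_RESOURCE = "https://alice.example.org/profile/preferences.ttl"  # hypothetical
ACCESS_TOKEN = "..."  # obtained through the Pod owner's authorization flow

# Read a resource the owner has granted this app access to.
resp = requests.get(
    POD_RESOURCE,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
)
print(resp.status_code, resp.headers.get("Content-Type"))

# Write it back: the data stays in the user's Pod, not on the app's servers.
requests.put(
    POD_RESOURCE,
    data='@prefix ex: <http://example.org/> . ex:theme ex:value "dark" .',
    headers={
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "text/turtle",
    },
)
```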

Web 3.0 technologies have the potential to enable users to have both privacy and personalization. That would mean users no longer have to trade privacy for personalization. Good news for users, potentially bad news for companies whose business model depends on selling users’ data.

Prepare for Historic Levels of Disruption

It’s not hard to find posts gushing over the imminent digital utopia that Web 3.0 will usher in. A democratic, decentralized web where the power is in the hands of the users. Certainly, there is the potential for unprecedented benefits for users. Businesses that can adapt to the new paradigm and take advantage of the new opportunities will also see huge benefits.

However, just a cursory reading of history, or even a look back at Web 2.0, will tell you that disruptive change brings unknowable, unforeseeable, and unintended consequences. The bigger the disruption, the bigger the consequences. It’s impossible to predict what the full impact of Web 3.0 will be, but it has the potential to cause major disruptions to our financial, commercial, social, and political systems. That’s a very complex ecosystem, and when you start moving the puzzle pieces around it’s impossible to say what the knock-on effects, good or bad, will be. Think back to how Web 2.0 completely transformed our world through social media, mobile apps, cloud computing, machine learning, and big data, and now imagine exponentially bigger change at a much faster pace.

The pandemic proved that businesses could be incredibly agile and could effect dramatic change virtually overnight. Those skills will be critical as we transition to Web 3.0. You should prepare for the trends that can be reasonably anticipated, but you must also be prepared to respond rapidly to large changes that cannot be predicted.