Virtual Database technology transforms the Web into a database, using adapters (called Wrappers) that analyze the HTML from Web sites to present them as relational data sources. As data sources become self-describing, VDB technology will be able to create the adapters automatically, and virtual databases spanning several hundred thousand data sources will become possible. In fact, the entire Web could become one unified database, fulfilling Junglee's vision.
The database application operates on this unified schema, issuing SQL queries through the JDBC or ODBC API; the application itself can be built using standard RAD tools such as Delphi, PowerBuilder, Visual Basic, or similar Java toolkits.
The VDB is accessed through the VDB Server and is administered through the browser-based VDB Console. The VDB also contains, for each external data source, a wrapper that interfaces the data source to the VDB Server.
A wrapper makes an arbitrary external data source, such as a web site, behave like an RDBMS, while the VDB Server integrates these separate relational databases into a unified Virtual Database (VDB).
Wrappers are software programs that wrap around an information source. They are mostly used to fill out forms on the web automatically and, as such, can be used to access databases over the web. A wrapper interfaces with a web site, typically using HTTP and HTML or XML. The wrapper is accessed via the JDBC API, through which clients can issue SQL queries. A SQL query issued to the wrapper might result in the wrapper filling out an HTML form on the web site, navigating and parsing the resulting HTML pages, and transforming the data into rows in a relational table. The wrapper uses data transformation rules to format the data to fit the schema, and data validation rules to ensure data integrity.
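The parsing, transformation, and validation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual wrapper implementation: the sample page, the three-column schema, and the names TableExtractor, TRANSFORMS, is_valid, and wrap are all hypothetical, and the form submission over HTTP is replaced by a hard-coded HTML string.

```python
from html.parser import HTMLParser

# Hypothetical result page, standing in for what a site might return after
# the wrapper submits its search form. A real wrapper would fetch it over HTTP.
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><th>Location</th><th>Salary</th></tr>
  <tr><td> Database Engineer </td><td>Sunnyvale, CA</td><td>$85,000</td></tr>
  <tr><td>Web Developer</td><td>Seattle, WA</td><td>n/a</td></tr>
</table>
"""


class TableExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells and stay empty
                self.rows.append(self._row)
            self._row = None


def parse_salary(text):
    """Transformation rule: keep the digits, return None if there are none."""
    digits = "".join(ch for ch in text if ch.isdigit())
    return int(digits) if digits else None


# Assumed relational schema and one transformation rule per column.
SCHEMA = ("title", "location", "salary")
TRANSFORMS = {"title": str.strip, "location": str.strip, "salary": parse_salary}


def is_valid(row):
    """Validation rule: a row needs a non-empty title and a numeric salary."""
    return bool(row["title"]) and row["salary"] is not None


def wrap(page):
    """Turn an HTML result page into validated rows matching SCHEMA."""
    parser = TableExtractor()
    parser.feed(page)
    rows = []
    for cells in parser.rows:
        row = {col: TRANSFORMS[col](cell) for col, cell in zip(SCHEMA, cells)}
        if is_valid(row):
            rows.append(row)
    return rows
```

Calling wrap(SAMPLE_PAGE) yields a single row for the Database Engineer listing; the second listing is dropped because its salary fails the validation rule.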
When the VDBMS receives the query, the query processor component decomposes the query, determines the fragments to be sent down to the individual data sources, and combines their results. The query result cache caches results from data sources for performance.
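A rough sketch of this flow in Python follows. It is deliberately simplified and makes several assumptions: a "query" is reduced to a single column-equals-value filter, the same fragment is sent to every source, and the Source and QueryProcessor classes are hypothetical names rather than part of any VDB product.

```python
class Source:
    """Hypothetical wrapped data source holding its rows in memory."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.calls = 0  # counts how often the source is actually contacted

    def select(self, column, value):
        self.calls += 1
        return [row for row in self.rows if row.get(column) == value]


class QueryProcessor:
    """Sends a query fragment to each source and combines the results."""

    def __init__(self, sources):
        self.sources = sources
        self.cache = {}  # (source name, column, value) -> cached rows

    def query(self, column, value):
        combined = []
        for source in self.sources:
            key = (source.name, column, value)
            if key not in self.cache:
                # Cache miss: push the fragment down to the data source.
                self.cache[key] = source.select(column, value)
            combined.extend(self.cache[key])
        return combined
```

Repeating the same query is then answered from the cache without contacting the sources again. A real query result cache would also need an expiry policy so that stale results are eventually refreshed; that is omitted here.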
Virtual Database Definition (VDD) objects are used to describe database sources, how to connect to a database source, and the application programming interfaces (APIs) and appropriate primary object layer to use when accessing these database sources. To help in database access management, db Objects also includes a database definition or dictionary repository utility program (VPVddm20.exe). These VDD objects support serialization for persistent storage to a disk file.
The utility program maintains VDD objects that define connections to a given database, reads in existing database dictionaries or schemas, automatically creates symbolic references for all tables and fields, and matches database definition fields with Data Definition object references.
Virtual Database Definition (VDD) objects, as maintained by the Definition utility program (VPVddm20.exe), also store the type of underlying database (e.g. Access, SQL Server, Oracle, or Multi Value), the type of application programming interface (API) along with the primary object layer used to operate against the database (e.g. DAO, ADO, or the Vantage my Server protocol), and any connection parameters and location information for the database. Once defined, these VDD objects are "attached" to database objects provided by the Database Operation Services DLL (VPDb20.dll) during runtime.
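To make the idea of a serializable definition object concrete, here is a minimal Python sketch. It is not the API of the product described above: the class name, its field names, and the choice of JSON as the on-disk format are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class VirtualDatabaseDefinition:
    """Hypothetical stand-in for a VDD object."""

    name: str
    database_type: str  # e.g. "Access", "SQL Server", "Oracle"
    api: str            # e.g. "DAO" or "ADO"
    connection: dict = field(default_factory=dict)  # parameters and location

    def save(self, path):
        """Serialize the definition to a disk file (JSON is an assumption)."""
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(asdict(self), handle, indent=2)

    @classmethod
    def load(cls, path):
        """Rebuild a definition from a file written by save()."""
        with open(path, encoding="utf-8") as handle:
            return cls(**json.load(handle))
```

A definition saved with save() and read back with load() compares equal to the original, which is the property persistent storage needs.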
Tasks of a virtual database:
For a virtual database to be complete, it should be able to do all the tasks that a normal database does. Because the World Wide Web is involved, there will also be some extra tasks.
First of all, it should be possible to model the web. The virtual database must also be queryable, and the information that is queried for should be extracted and integrated.
Because the main activity of the virtual database is on the web, the integrated information is best presented in the form of a web page, which keeps the design consistent.
Security is also an important issue: sensitive information that travels over the web must be protected.
Advantages and Disadvantages of virtual databases:
A big advantage of using virtual databases is that the information is always up to date, provided the sources themselves are up to date. Even if the information changes rapidly, the virtual database always gets the latest available version. The information is not copied from the sources to the virtual database; it is extracted directly from the source, so no continuous updates are necessary.
There is also only one point of entry into the virtual database. This does not mean that the database can only be used from one place. Quite the contrary: as long as a computer with a network connection and a browser is available, the virtual database is available. The big advantage here is that an upgrade only touches the virtual database itself; no client software has to be changed or maintained.
A final big advantage is the flexibility of the information representation. One is no longer limited to linear documents: a document can link to other documents, and other media such as pictures, sound, and video can be included.
There are also disadvantages to using a virtual database.
One disadvantage of a virtual database is that searches may be slightly slower than when conducted against a single database. Information that has to be exchanged over the net will sometimes travel slowly.
Moreover, the availability of the information is not guaranteed. Servers can go down, and authors can change information in the blink of an eye.
The queries used to extract information are often complex. They have to be decomposed for the various sources they will query. This involves not only new query languages but, in most cases, wrappers as well. Most of the query languages are not meant for casual users, and the wrappers have to be generated and then updated when necessary.
The VDB at Work: Real-World Applications
Canopy has applied VDB technology in several key domains: Employment Classifieds, Consumer Shopping, Real Estate, and Apartment listings.
The Job Canopy VDB application integrates job listings from over 700 data sources, including employer web sites, flat files, and legacy data feeds. The schema for this VDB includes 31 attributes of interest to employers and job seekers, including job title, job category, job location, and contact information. These data sources are scoured each week to ensure that the information is always fresh. Listings from different employers are normalized to have the same set of fields and the same vocabulary.
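Normalizing listings to a common set of fields and a common vocabulary can be sketched as two lookup tables. The Python below is illustrative only; the alias and vocabulary entries, the field names job_title and job_location, and the normalize function are hypothetical and are not taken from the actual Job Canopy schema.

```python
# Hypothetical mapping from source-specific field names to schema fields.
FIELD_ALIASES = {
    "position": "job_title",
    "title": "job_title",
    "city": "job_location",
    "location": "job_location",
}

# Hypothetical controlled vocabulary: source wording -> canonical wording.
VOCABULARY = {
    "sw engineer": "Software Engineer",
    "software eng.": "Software Engineer",
}


def normalize(listing):
    """Rename fields to the shared schema and map values to shared terms."""
    normalized = {}
    for key, value in listing.items():
        schema_field = FIELD_ALIASES.get(key.lower(), key.lower())
        text = value.strip()
        normalized[schema_field] = VOCABULARY.get(text.lower(), text)
    return normalized
```

With these tables, a listing that uses "Position" and "SW Engineer" and another that uses "title" and "Software Eng." both end up with the same job_title value, so they can be searched and compared as one relation.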
The Shop Canopy VDB application allows comparison shopping over 40 merchants in 8 categories, including Books, Music, Computer Hardware, and Consumer Electronics. Shop Canopy is deployed on the Yahoo! Visa
The Shop Canopy application brings together buyers and sellers online to create marketplaces on the Web. Shop Canopy allows consumers to easily access and compare product and pricing information from merchants simultaneously, and then link to a specific merchant's site to make a purchase. VDB technology reduces the time spent looking for specific items by searching through affiliated online merchants and compiling a single list of all the vendors that offer the specified item, plus availability, shipping, pricing, and other information helpful for making product choices.
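Compiling a single list of vendors for one item might look like the following Python sketch. The data layout, the compare_offers function, and the decision to sort by price plus shipping are assumptions chosen for illustration; the text does not say how Shop Canopy orders its results.

```python
def compare_offers(item, merchants):
    """Collect every merchant's offer for an item into one sorted list.

    merchants maps a merchant name to a list of offer dicts, each with the
    keys "item", "price", "shipping", and "in_stock" (a hypothetical layout).
    """
    offers = []
    for merchant, catalog in merchants.items():
        for offer in catalog:
            if offer["item"] == item:
                offers.append({"merchant": merchant, **offer})
    # Assumed ordering: cheapest total cost (price plus shipping) first.
    return sorted(offers, key=lambda offer: offer["price"] + offer["shipping"])
```

Each entry in the returned list keeps the merchant name next to its price, shipping, and availability, so a shopper can compare vendors in one place before following a link to the chosen merchant.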