As the driving force in big data, Tel Aviv Sourasky Medical Center has been maintaining, expanding, and scaling its database for over 10 years, following international and Israeli standards for medical data protection, privacy and confidentiality such as US HIPAA and EU GDPR. This extensive database contains a vast amount of data collected during patient encounters throughout the hospital – from inpatient stays, ambulatory visits, imaging scans, laboratory tests, and other specialties.
Thanks to a major investment, the Medical Center's I-Medata AI Center is expanding the platform to a groundbreaking cloud-based infrastructure, which will support big data projects into the future.
Data ocean sources
The data ocean features data from sources such as: electronic medical records, diagnoses, surgeries, medical procedures, imaging scans, laboratory results, medications, vital signs, management information, operational information, financial data, etc.
Well-architected data infrastructure
The new, innovative big data infrastructure integrates and processes data from multiple sources, enabling innovators to develop AI products.
Data infrastructure foundation
Foundation infrastructure: SQL Data Warehouse (DWH)
- Populated, managed, enhanced, and scaled up over the past 10+ years
- Managed as an operational database
- Leverages the ability to integrate, process, and analyze clinical and other information
- Built and managed by the Medical Center's Business Intelligence Team
Innovative infrastructure and systems: Sandbox, K2View Fabric, MDClone.
Innovative data infrastructure
- Big data database
- Resides on cloud servers
- Defines the individual patient as the core data unit
- Enables innovators to process complex data at high speeds
- Supports creating and running sophisticated AI models
MDClone: Data mining for researchers
MDClone is an infrastructure for managing big data. The system's data structure is different from a regular SQL data warehouse in that the data is not structured as a relational database, but rather arranged along a timeline.
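To make the timeline idea concrete, here is a minimal sketch of patient data held as dated events rather than as rows spread across relational tables. It is an illustration only and does not reflect MDClone's actual data model or API: the `Event` and `PatientTimeline` names, their fields, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Event:
    """One dated entry on a patient's timeline (hypothetical shape)."""
    when: date
    kind: str   # e.g. "diagnosis", "lab", "medication"
    value: str


@dataclass
class PatientTimeline:
    """All events for one patient, kept in chronological order."""
    patient_id: str
    events: list[Event] = field(default_factory=list)

    def add(self, event: Event) -> None:
        # Re-sort after each insert so time-window queries stay simple.
        self.events.append(event)
        self.events.sort(key=lambda e: e.when)

    def between(self, start: date, end: date) -> list[Event]:
        """Return events with start <= when <= end, oldest first."""
        return [e for e in self.events if start <= e.when <= end]


# Invented sample data, added out of order on purpose.
timeline = PatientTimeline("patient-001")
timeline.add(Event(date(2023, 3, 10), "lab", "HbA1c 7.9%"))
timeline.add(Event(date(2023, 1, 5), "diagnosis", "type 2 diabetes"))
timeline.add(Event(date(2023, 6, 2), "medication", "metformin"))

first_quarter = timeline.between(date(2023, 1, 1), date(2023, 3, 31))
```

With this layout, a question such as "what happened to this patient in the first quarter" becomes a single date-range filter instead of a join across several tables.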
MDClone's main benefit is that it gives researchers and product developers easy access to data, which supports Research and Development Department projects and promotes data use.
MDClone user interface
MDClone enables researchers and product developers who don't have SQL knowledge to access and retrieve data from the data warehouse.
The database's minimized structure gives researchers the ability to accurately define the research population and produce big data subsets in minimal time, which shortens the research process significantly.
MDClone: Synthetic data
The MDClone system presents anonymous data and also offers an option to present synthetic data. Synthetic data does not correspond to source data, so there is no way for patients to be identified. However, from a statistical perspective, the data produces results that are identical to the source data.
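The general idea behind synthetic data can be sketched with a deliberately simple example: fit summary statistics to the source values, then draw new values from a distribution with the same statistics. This is not MDClone's method, which is not described here. The sample readings are invented, and a single column with a normal distribution is an assumption made purely for illustration.

```python
import random
import statistics

# Hypothetical source values (e.g. systolic blood pressure readings).
source = [118.0, 125.0, 131.0, 122.0, 140.0, 135.0, 128.0, 119.0]

# Summarize the source with its mean and sample standard deviation.
mu = statistics.mean(source)
sigma = statistics.stdev(source)

# Draw new values from a normal distribution with the same parameters.
# No synthetic value is tied to any individual source record.
rng = random.Random(0)  # fixed seed so the run is reproducible
synthetic = [rng.gauss(mu, sigma) for _ in range(5000)]

synthetic_mean = statistics.mean(synthetic)
synthetic_stdev = statistics.stdev(synthetic)

print(f"source:    mean={mu:.2f}, stdev={sigma:.2f}")
print(f"synthetic: mean={synthetic_mean:.2f}, stdev={synthetic_stdev:.2f}")
```

A production-grade generator would also need to preserve relationships between variables, not just per-column statistics. The sketch only shows why aggregate results can match while individual records do not.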
K2View Fabric
K2View Fabric is a state-of-the-art platform for integrating and managing data distributed across multiple source systems. The system provides data to each end user or application in real time.
The benefits of the unified data architecture include the real-time ability to:
- Shorten data query and optimization time
- Conduct queries using external query tools at both the individual logical unit (entity) level and the integrated data level
- Create expert systems
- Run smart algorithms to identify, prioritize, and predict results
The platform presents an alternative approach to combining, storing, and retrieving data held across various systems. The warehouse is structured at the logical unit level and is synchronized in real time with the core systems.
The logical units represent the core entities of the organization (such as an inpatient). Fabric creates a distributed database that combines the data from the source systems and presents all of it for the defined logical unit.
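As a rough sketch of the logical-unit concept, the example below gathers rows from several source systems under the entity they belong to, so that everything known about one patient can be read in a single lookup. It does not use or represent the K2View Fabric API. The source names, field names, the `build_logical_units` helper, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts from three separate source systems, keyed by patient_id.
admissions = [
    {"patient_id": "p1", "ward": "cardiology"},
    {"patient_id": "p2", "ward": "oncology"},
]
lab_results = [
    {"patient_id": "p1", "test": "troponin", "result": "elevated"},
    {"patient_id": "p1", "test": "CBC", "result": "normal"},
]
medications = [
    {"patient_id": "p2", "drug": "ondansetron"},
]


def build_logical_units(sources):
    """Group rows from every source under the entity (patient) they belong to."""
    # Each new entity starts with an empty list for every source system.
    units = defaultdict(lambda: {name: [] for name in sources})
    for name, rows in sources.items():
        for row in rows:
            # Build a new dict without the key field; the input rows are not mutated.
            record = {k: v for k, v in row.items() if k != "patient_id"}
            units[row["patient_id"]][name].append(record)
    return dict(units)


units = build_logical_units(
    {"admissions": admissions, "lab_results": lab_results, "medications": medications}
)

# Everything known about patient p1, across all sources, in one lookup.
print(units["p1"])
```

Organizing data per entity in this way is what allows queries at the individual logical-unit level without joining across source systems at query time.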
R&D Sandbox
A sandbox is a development environment designed specifically to solve business problems related to product development. Tel Aviv Sourasky Medical Center uses the R&D Sandbox to solve clinical and operational product development challenges through data manipulation and analysis.
Similar to real sandboxes, where children play, dream, and build to their hearts' content without disturbing their surroundings, the virtual sandbox is a development environment for data science and data analytics that does not disturb real data.
This environment is equipped with a range of "toys" (tools) to perform any and all calculations, processing, data manipulation, integration, and data analyses that the researchers dream up. The environment is located on a private cloud secured by the organization's firewall. This infrastructure empowers researchers to reap the benefits of the cloud and maintain data integrity without compromising information confidentiality or disturbing source systems. The Sandbox contains tools for conducting projects, performing research, and developing data-driven products.
Examples are presented in the Sandbox's Research Rooms.